Song-Chun Zhu
162 papers · 2011–2026 · 14 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (11)
Keyword Pioneer
Hot Topic Early Bird
Academic Marathon (14)
Renaissance Researcher (6)
Interdisciplinary Bridge
Conference Loyalist (27)
Keyword Trendsetter Combo (8)
Dynamic Duo (42)
Triple Crown
Topic Pioneer
Grand Slam
Deep Specialist (29)
Topic Evolution
Keyword Champion (4)
Unstoppable (13)
Prolific Year (14)
The Questioner
Century Club (158)
Keyword Collector (638)
Trend Setter
Conference Pioneer
Conferences
CVPR (42)
NIPS (27)
ICCV (22)
AAAI (19)
ICLR (12)
ACL (9)
ICML (9)
ECCV (6)
EMNLP (5)
IJCAI (4)
IJCNLP (3)
CORL (2)
NAACL (1)
RSS (1)
Keywords
energy-based model (17)
generative model (17)
markov chain monte carlo (14)
object detection (9)
scene understanding (9)
video understanding (8)
and-or graph (7)
reinforcement learning (6)
unsupervised learning (6)
visual reasoning (6)
large language model (6)
representation learning (6)
convolutional neural network (6)
3d reconstruction (5)
multi-agent system (5)
symbolic reasoning (5)
visual question answering (5)
neural network (5)
human pose estimation (4)
hierarchical model (4)
Papers
v-HUB: A Benchmark for Video Humor Understanding from Vision and Sound
ACL 2026
TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents
AAAI 2026
JurisBench: A Deep Benchmark for Assessing Large Language Models in Professional Legal Practice
ACL 2026
Video Echoed in Music: Semantic, Temporal, and Rhythmic Alignment for Video-to-Music Generation
AAAI 2026
ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models
CORL 2025
Differentiable Information Enhanced Model-Based Reinforcement Learning
AAAI 2025
Enhancing LLM-Based Social Bot via an Adversarial Learning Framework
EMNLP 2025
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
ICLR 2025
Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
ICLR 2025
ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection
ACL 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
CVPR 2025
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior
CVPR 2025
METASCENES: Towards Automated Replica Creation for Real-world 3D Scans
CVPR 2025
Fast Peer Adaptation with Context-aware Exploration
ICML 2024
INTERPRET: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning
RSS 2024
ProAgent: Building Proactive Cooperative Agents with Large Language Models
AAAI 2024
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
NIPS 2024
Learning to Balance Altruism and Self-interest Based on Empathy in Mixed-Motive Games
NIPS 2024
RulE: Knowledge Graph Reasoning with Rule Embedding
ACL 2024
CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update
CVPR 2024
LangSuit·E: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments
ACL 2024
Neural-Symbolic Recursive Machine for Systematic Generalization
ICLR 2024
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World
ICLR 2024
CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents
ICLR 2024
Mars: Situated Inductive Reasoning in an Open-World Environment
NIPS 2024
PhyRecon: Physically Plausible Neural Scene Reconstruction
NIPS 2024
AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making
NIPS 2024
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning
ICML 2024
An Embodied Generalist Agent in 3D World
ICML 2024
A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics
ICLR 2023
Learning non-Markovian Decision-Making from State-only Sequences
NIPS 2023
Evaluating and Inducing Personality in Pre-trained Language Models
NIPS 2023
Learning Energy-Based Prior Model with Diffusion-Amortized MCMC
NIPS 2023
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
NIPS 2023
Diplomat: A Dialogue Dataset for Situated PragMATic Reasoning
NIPS 2023
On the Complexity of Bayesian Generalization
ICML 2023
SQA3D: Situated Question Answering in 3D Scenes
ICLR 2023
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
ICLR 2023
ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic 3D Scenes
ICCV 2023
X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events
ICCV 2023
Diffusion-Based Generation, Optimization, and Planning in 3D Scenes
CVPR 2023
Learning from the Tangram to Solve Mini Visual Tasks
AAAI 2022
Learning V1 Simple Cells with Vector Representation of Local Content and Matrix Representation of Local Motion
AAAI 2022
ValueNet: A New Dataset for Human Value Driven Dialogue System
AAAI 2022
MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control
NIPS 2022
Emergent Graphical Conventions in a Visual Communication Game
NIPS 2022
Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning
NIPS 2022
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
NIPS 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
NIPS 2022
Latent Diffusion Energy-Based Model for Interpretable Text Modelling
ICML 2022
COAT: Measuring Object Compositionality in Emergent Representations
ICML 2022
Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning
ECCV 2022
RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
ICLR 2022
MCMC Should Mix: Learning Energy-Based Model with Neural Transport Latent Space MCMC
ICLR 2022
Learning Probabilistic Models from Generator Latent Spaces with Hat EBM
NIPS 2022
GRICE: A Grammar-based Dataset for Recovering Implicature and Conversational rEasoning
IJCNLP 2021
Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning
IJCNLP 2021
SocAoG: Incremental Graph Parsing for Social Relation Inference in Dialogues
IJCNLP 2021
Generative PointNet: Deep Energy-Based Learning on Unordered Point Sets for 3D Generation, Reconstruction and Classification
CVPR 2021
ACRE: Abstract Causal REasoning Beyond Covariation
CVPR 2021
Mind the Context: The Impact of Contextualization in Neural Module Networks for Grounding Visual Referring Expressions
EMNLP 2021
Learning Neural Representation of Camera Pose with Matrix Representation of Pose Shift via View Synthesis
CVPR 2021
Learning by Fixing: Solving Math Word Problems with Weak Supervision
AAAI 2021
Learning Cycle-Consistent Cooperative Networks via Alternating MCMC Teaching for Unsupervised Cross-Domain Translation
AAAI 2021
Stochastic Security: Adversarial Defense Using Long-Run Dynamics of Energy-Based Models
ICLR 2021
Spatio-Temporal Self-Supervised Representation Learning for 3D Point Clouds
ICCV 2021
YouRefIt: Embodied Reference Understanding With Language and Gesture
ICCV 2021
VLGrammar: Grounded Grammar Induction of Vision and Language
ICCV 2021
Unsupervised Foreground Extraction via Deep Region Competition
NIPS 2021
On Path Integration of Grid Cells: Group Representation and Isotropic Scaling
NIPS 2021
Iterative Teacher-Aware Learning
NIPS 2021
SocAoG: Incremental Graph Parsing for Social Relation Inference in Dialogues
ACL 2021
Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning
ACL 2021
GRICE: A Grammar-based Dataset for Recovering Implicature and Conversational rEasoning
ACL 2021
Robust Visual Reasoning via Language Guided Neural Module Networks
NIPS 2021
SMART: A Situation Model for Algebra Story Problems via Attributed Grammar
AAAI 2021
CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization
EMNLP 2021
Learning Triadic Belief Dynamics in Nonverbal Communication From Videos
CVPR 2021
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
CVPR 2021
CoCoX: Generating Conceptual and Counterfactual Explanations via Fault-Lines
AAAI 2020
On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models
AAAI 2020
Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning
ICML 2020
Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning
AAAI 2020
Structured Attention for Unsupervised Dialogue Structure Induction
EMNLP 2020
Motion-Based Generator Model: Unsupervised Disentanglement of Appearance, Trackable and Intrackable Motions in Dynamic Patterns
AAAI 2020
LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities
ECCV 2020
Learning Multi-layer Latent Variable Model via Variational Optimization of Short Run MCMC for Approximate Inference
ECCV 2020
A Competence-aware Curriculum for Visual Concepts Learning via Question Answering
ECCV 2020
Theory-Based Causal Transfer: Integrating Instance-Level Induction and Abstract-Level Structure Learning
AAAI 2020
Words Aren't Enough, Their Order Matters: On the Robustness of Grounding Visual Referring Expressions
ACL 2020
Joint Training of Variational Auto-Encoder and Latent Energy-Based Model
CVPR 2020
Learning Latent Space Energy-Based Prior Model
NIPS 2020
Inducing Hierarchical Compositional Model by Sparsifying Generator Network
CVPR 2020
DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare
ICCV 2019
Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model
NIPS 2019
Learning Perceptual Inference by Contrasting
NIPS 2019
PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points
NIPS 2019
MetaStyle: Three-Way Trade-off among Speed, Flexibility, and Quality in Neural Style Transfer
AAAI 2019
Learning Dynamic Generator Model by Alternating Back-Propagation through Time
AAAI 2019
Mirroring without Overimitation: Learning Functionally Equivalent Manipulation Actions
AAAI 2019
Recognizing Unseen Attribute-Object Pair with Generative Model
AAAI 2019
RAVEN: A Dataset for Relational and Analogical Visual REasoNing
CVPR 2019
Reasoning Visual Dialogs With Structural and Partial Observations
CVPR 2019
Divergence Triangle for Joint Training of Generator Model, Energy-Based Model, and Inferential Model
CVPR 2019
Unsupervised Disentangling of Appearance and Geometry by Deformable Generator Network
CVPR 2019
Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning
ICCV 2019
Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense
ICCV 2019
Learning Grid Cells as Vector Representation of Self-Position Coupled with Matrix Representation of Self-Motion
ICLR 2019
Inferring Shared Attention in Social Scene Videos
CVPR 2018
Learning Descriptor Networks for 3D Shape Synthesis and Analysis
CVPR 2018
Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation
NIPS 2018
Learning Human-Object Interactions by Graph Parsing Neural Networks
ECCV 2018
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image
ECCV 2018
Learning Generative ConvNets via Multi-Grid Modeling and Sampling
CVPR 2018
Interpretable Convolutional Neural Networks
CVPR 2018
Where and Why Are They Looking? Jointly Inferring Human Attention and Intentions in Complex Tasks
CVPR 2018
Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data for Future Prediction
ICML 2018
A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects
CVPR 2018
Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification
CVPR 2018
Human-Centric Indoor Scene Synthesis Using Stochastic Grammar
CVPR 2018
Mining Object Parts From CNNs via Active Question-Answering
CVPR 2017
Single-Image 3D Scene Parsing Using Geometric Commonsense
IJCAI 2017
Inferring Human Attention by Learning Latent Intentions
IJCAI 2017
Predicting Human Activities Using Stochastic Grammar
ICCV 2017
Jointly Recognizing Object Fluents and Tasks in Egocentric Videos
ICCV 2017
Monocular 3D Human Pose Estimation by Predicting Depth on Joints
ICCV 2017
Learning Human Utility from Video Demonstrations for Deductive Planning in Robotics
CORL 2017
Generative Hierarchical Learning of Sparse FRAME Models
CVPR 2017
Synthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet
CVPR 2017
CERN: Confidence-Energy Recurrent Network for Group Activity Recognition
CVPR 2017
What Is Where: Inferring Containment Relations from Videos
IJCAI 2016
Learning Social Affordance for Human-Robot Interaction
IJCAI 2016
A Theory of Generative ConvNet
ICML 2016
Inferring Forces and Learning Human Utilities From Videos
CVPR 2016
Multi-View People Tracking via Hierarchical Trajectory Composition
CVPR 2016
Recognizing Car Fluents From Video
CVPR 2016
Grounded Semantic Role Labeling
NAACL 2016
Jointly Learning Grounded Task Structures from Language Instruction and Visual Demonstration
EMNLP 2016
Mining And-Or Graphs for Graph Matching and Object Discovery
ICCV 2015
Attributed Grammars for Joint Estimation of Human Attributes, Part and Pose
ICCV 2015
Automated Facial Trait Judgment and Election Outcome Prediction: Social Dimensions of Face
ICCV 2015
Joint Action Recognition and Pose Estimation From Video
CVPR 2015
Learning Inhomogeneous FRAME Models for Object Patterns
CVPR 2014
Single-View 3D Scene Parsing by Attributed Grammar
CVPR 2014
Visual Persuasion: Inferring Communicative Intents of Images
CVPR 2014
Cross-view Action Modeling, Learning and Recognition
CVPR 2014
Unsupervised Learning of Dictionaries of Hierarchical Compositional Models
CVPR 2014
Modeling 4D Human-Object Interactions for Event and Object Recognition
ICCV 2013
Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
CVPR 2013
Scene Parsing by Integrating Function, Geometry and Appearance Models
CVPR 2013
Integrating Grammar and Segmentation for Human Pose Estimation
CVPR 2013
Concurrent Action Detection with Structural Prediction
ICCV 2013
Discriminatively Trained And-Or Tree Models for Object Detection
CVPR 2013
Weakly Supervised Learning for Attribute Localization in Outdoor Scenes
CVPR 2013
Inferring "Dark Matter" and "Dark Energy" from Videos
ICCV 2013
Cosegmentation and Cosketch by Unsupervised Learning
ICCV 2013
Unsupervised Structure Learning of Stochastic And-Or Grammars
NIPS 2013
Monte Carlo Tree Search for Scheduling Activity Recognition
ICCV 2013
Modeling Occlusion by Discriminative AND-OR Structures
ICCV 2013
Learning Near-Optimal Cost-Sensitive Decision Policy for Object Detection
ICCV 2013
Human Attribute Recognition by Rich Appearance Dictionary
ICCV 2013
Image Parsing with Stochastic Scene Grammar
NIPS 2011