Chuang Gan

152 papers · 2015–2026 · 12 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (11) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🌍 Conference Polyglot (12)

🏃 Academic Marathon (10) 🏠 Conference Loyalist (30) 🌟 Keyword Trendsetter Combo (3) 🏆 Grand Slam 👑 Triple Crown 🤝 Dynamic Duo (29) 🌱 Topic Pioneer 🔬 Deep Specialist (26) 🧬 Topic Evolution 🏆 Keyword Champion (2) ⚡ Prolific Year (27) 🗃️ Keyword Collector (463) ❓ The Questioner (2) 💎 Century Club (151) 📈 Trend Setter 🔥 Unstoppable (11) 🚀 Conference Pioneer

Conferences

ICLR (33) NIPS (30) CVPR (28) ICML (17) ICCV (14) AAAI (9) ECCV (7) ACL (5) CORL (4) EMNLP (3) IJCAI (1) RSS (1)

Top co-authors

Joshua B. Tenenbaum (29) Zhenfang Chen (22) Peihao Chen (17) Yikang Shen (16) Yilun Du (15) Antonio Torralba (14) Yining Hong (11) Song Han (11) Wenbing Huang (11) Mingkui Tan (10)

Research topics

Synthesis (1)

Keywords

video understanding (10) action recognition (9) large language model (9) reinforcement learning (8) vision-language model (7) self-supervised learning (7) multimodal learning (6) visual question answering (6) convolutional neural network (6) multi-modal learning (6) neural network (5) question answering (4) unsupervised learning (4) weakly supervised learning (4) representation learning (4) transfer learning (4) diffusion model (4) visual reasoning (4) semantic segmentation (4) embodied ai (4)

Papers

Tailored Primitive Initialization is the Secret Key to Reinforcement Learning · ACL 2026
Scaling Autonomous Agents via Automatic Reward Modeling And Planning · ICLR 2025
TopoGaussian: Inferring Internal Topology Structures from Visual Clues · ICLR 2025
Learning 4D Embodied World Models · ICCV 2025
VCA: Video Curious Agent for Long Video Understanding · ICCV 2025
Articulate AnyMesh: Open-vocabulary 3D Articulated Objects Modeling · CORL 2025
RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text · ICCV 2025
DELTA: Dense Efficient Long-Range 3D Tracking for Any Video · ICLR 2025
SafeDiffuser: Safe Planning with Diffusion Probabilistic Models · ICLR 2025
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search · ICML 2025
3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning · CVPR 2025
CommVQ: Commutative Vector Quantization for KV Cache Compression · ICML 2025
AdaWorld: Learning Adaptable World Models with Latent Actions · ICML 2025
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences · CVPR 2025
UniMuMo: Unified Text, Music, and Motion Generation · AAAI 2025
Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training · ACL 2025
COMBO: Compositional World Models for Embodied Multi-Agent Cooperation · ICLR 2025
ABNet: Adaptive explicit-Barrier Net for Safe and Scalable Robot Learning · ICML 2025
HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments · ICLR 2024
UBSoft: A Simulation Platform for Robotic Skill Learning in Unbounded Soft Environments · CORL 2024
RoboDreamer: Learning Compositional World Models for Robot Imagination · ICML 2024
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos · ICML 2024
3D-VLA: A 3D Vision-Language-Action Generative World Model · ICML 2024
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation · ICML 2024
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery · ICML 2024
Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning · AAAI 2024
Speech Self-Supervised Learning Using Diffusion Model Synthetic Data · ICML 2024
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision · NIPS 2024
Building Cooperative Embodied Agents Modularly with Large Language Models · ICLR 2024
DIFFTACTILE: A Physics-based Differentiable Tactile Simulator for Contact-rich Robotic Manipulation · ICLR 2024
SALMON: Self-Alignment with Instructable Reward Models · ICLR 2024
Thin-Shell Object Manipulations With Differentiable Physics Simulations · ICLR 2024
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance · CVPR 2024
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World · CVPR 2024
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge · CVPR 2024
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation · CVPR 2024
GENOME: Generative Neuro-Symbolic Visual Reasoning by Growing and Reusing Modules · ICLR 2024
FlexAttention for Efficient High-Resolution Vision-Language Models · ECCV 2024
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding · ICLR 2024
SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization · NIPS 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs · NIPS 2024
Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge · NIPS 2024
Aligning Large Multimodal Models with Factually Augmented RLHF · ACL 2024
Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting · NIPS 2024
Physically Compatible 3D Object Modeling from a Single Image · NIPS 2024
EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction · ICCV 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision · NIPS 2023
3D-LLM: Injecting the 3D World into Large Language Models · NIPS 2023
DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics · NIPS 2023
Adaptive Online Replanning with Diffusion Models · NIPS 2023
DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models · NIPS 2023
Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties · NIPS 2023
JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions · ACL 2023
On the Forward Invariance of Neural ODEs · ICML 2023
Learning Neural Constitutive Laws from Motion Observations for Generalizable PDE Dynamics · ICML 2023
Reparameterized Policy Learning for Multimodal Trajectory Optimization · ICML 2023
Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners · CVPR 2023
Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention · CVPR 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis From Videos · CVPR 2023
3D Concept Learning and Reasoning From Multi-View Images · CVPR 2023
EC2: Emergent Communication for Embodied Control · CVPR 2023
Masked Motion Encoding for Self-Supervised Video Representation Learning · CVPR 2023
Learning Situation Hyper-Graphs for Video Question Answering · CVPR 2023
Sparse Universal Transformer · EMNLP 2023
PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification · ICLR 2023
FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation · ICLR 2023
SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments · ICLR 2023
Planning with Large Language Models for Code Generation · ICLR 2023
DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics · ICLR 2023
Hyper-Decision Transformer for Efficient Online Policy Adaptation · ICLR 2023
RoboNinja: Learning an Adaptive Cutting Policy for Multi-Material Objects · RSS 2023
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions · ICCV 2023
Learning Vision-and-Language Navigation from YouTube Videos · ICCV 2023
3D Concept Grounding on Neural Fields · NIPS 2022
ComPhy: Compositional Physical Reasoning of Objects and Events from Videos · ICLR 2022
Network Augmentation for Tiny Deep Learning · ICLR 2022
SNAKE: Shape-aware Neural 3D Keypoint Field · NIPS 2022
Learning Neural Acoustic Fields · NIPS 2022
DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools · ICLR 2022
Prompting Decision Transformer for Few-Shot Policy Generalization · ICML 2022
Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following · CORL 2022
Planning with Spatial-Temporal Abstraction from Point Clouds for Deformable Object Manipulation · CORL 2022
Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction · CVPR 2022
Finding Fallen Objects via Asynchronous Audio-Visual Integration · CVPR 2022
AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation · CVPR 2022
Prototype-Guided Continual Adaptation for Class-Incremental Unsupervised Domain Adaptation · ECCV 2022
Weakly Supervised Grounding for VQA in Vision-Language Transformers · ECCV 2022
FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations · ICLR 2022
RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation · ICLR 2022
Revisiting the Roles of “Text” in Text Games · EMNLP 2022
Linking Emergent and Natural Languages via Corpus Transfer · ICLR 2022
Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics · ICLR 2022
Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation · NIPS 2022
Learning Active Camera for Multi-Object Navigation · NIPS 2022
Learning Physical Dynamics with Subequivariant Graph Neural Networks · NIPS 2022
On-Device Training Under 256KB Memory · NIPS 2022
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning · AAAI 2021
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language · NIPS 2021
Memory-efficient Patch-based Inference for Tiny Deep Learning · NIPS 2021
PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning · NIPS 2021
When does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning? · NIPS 2021
MVFNet: Multi-View Fusion Network for Efficient Video Recognition · AAAI 2021
Augmenting Policy Learning with Routines Discovered from a Single Demonstration · AAAI 2021
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules · CVPR 2021
Curious Representation Learning for Embodied Intelligence · ICCV 2021
Learning Task Decomposition with Ordered Memory Policy Network · ICLR 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning · ICLR 2021
On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning · ICLR 2021
PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics · ICLR 2021
Adversarial Option-Aware Hierarchical Imitation Learning · ICML 2021
Global Prosody Style Transfer Without Text Transcriptions · ICML 2021
AGENT: A Benchmark for Core Psychological Reasoning · ICML 2021
Temporal and Object Quantification Networks · IJCAI 2021
Location-Aware Graph Convolutional Networks for Video Question Answering · AAAI 2020
Dense Regression Network for Video Grounding · CVPR 2020
Music Gesture for Visual Sound Separation · CVPR 2020
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing · ACL 2020
Deep Audio Priors Emerge From Harmonic Convolutional Networks · ICLR 2020
Once-for-All: Train One Network and Specialize it for Efficient Deployment · ICLR 2020
TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning · NIPS 2020
Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning · EMNLP 2020
Foley Music: Learning to Generate Music from Videos · ECCV 2020
DataMix: Efficient Privacy-Preserving Edge-Cloud Inference · ECCV 2020
MCUNet: Tiny Deep Learning on IoT Devices · NIPS 2020
Defensive Quantization: When Efficiency Meets Robustness · ICLR 2019
TSM: Temporal Shift Module for Efficient Video Understanding · ICCV 2019
Self-Supervised Moving Vehicle Tracking With Stereo Sound · ICCV 2019
The Sound of Motions · ICCV 2019
Cross-channel Communication Networks · NIPS 2019
Visual Concept-Metaconcept Learning · NIPS 2019
Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering · AAAI 2019
StNet: Local and Global Spatial-Temporal Modeling for Action Recognition · AAAI 2019
Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation · AAAI 2019
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement · NIPS 2019
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision · ICLR 2019
Graph Convolutional Networks for Temporal Action Localization · ICCV 2019
End-to-End Learning of Motion Representation for Video Understanding · CVPR 2018
Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning · CVPR 2018
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding · NIPS 2018
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification · CVPR 2018
Weakly Supervised Dense Event Captioning in Videos · NIPS 2018
Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency · ECCV 2018
Sparse, Smart Contours to Represent and Edit Images · CVPR 2018
The Sound of Pixels · ECCV 2018
StyleNet: Generating Attractive Visual Captions With Styles · CVPR 2017
Recurrent Topic-Transition GAN for Visual Paragraph Generation · ICCV 2017
Semantic Compositional Networks for Visual Captioning · CVPR 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation · ICCV 2017
You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images · CVPR 2016
Learning Attributes Equals Multi-Source Domain Generalization · CVPR 2016
DevNet: A Deep Event Network for Multimedia Event Detection and Evidence Recounting · CVPR 2015
Automatic Concept Discovery From Parallel Text and Visual Corpora · ICCV 2015