Chuang Gan
152 papers · 2015–2026 · 12 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (11)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Conference Polyglot (12)
Academic Marathon (10)
Conference Loyalist (30)
Keyword Trendsetter Combo (3)
Grand Slam
Triple Crown
Dynamic Duo (29)
Topic Pioneer
Deep Specialist (26)
Topic Evolution
Keyword Champion (2)
Prolific Year (27)
Keyword Collector (463)
The Questioner (2)
Century Club (151)
Trend Setter
Unstoppable (11)
Conference Pioneer
Conferences
ICLR (33)
NIPS (30)
CVPR (28)
ICML (17)
ICCV (14)
AAAI (9)
ECCV (7)
ACL (5)
CORL (4)
EMNLP (3)
IJCAI (1)
RSS (1)
Research topics
Keywords
video understanding (10)
action recognition (9)
large language model (9)
reinforcement learning (8)
vision-language model (7)
self-supervised learning (7)
multimodal learning (6)
visual question answering (6)
convolutional neural network (6)
multi-modal learning (6)
neural network (5)
question answering (4)
unsupervised learning (4)
weakly supervised learning (4)
representation learning (4)
transfer learning (4)
diffusion model (4)
visual reasoning (4)
semantic segmentation (4)
embodied ai (4)
Papers
Tailored Primitive Initialization is the Secret Key to Reinforcement Learning
ACL 2026
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
ICLR 2025
TopoGaussian: Inferring Internal Topology Structures from Visual Clues
ICLR 2025
Learning 4D Embodied World Models
ICCV 2025
VCA: Video Curious Agent for Long Video Understanding
ICCV 2025
Articulate AnyMesh: Open-vocabulary 3D Articulated Objects Modeling
CORL 2025
RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text
ICCV 2025
DELTA: Dense Efficient Long-Range 3D Tracking for Any Video
ICLR 2025
SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
ICLR 2025
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
ICML 2025
3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning
CVPR 2025
CommVQ: Commutative Vector Quantization for KV Cache Compression
ICML 2025
AdaWorld: Learning Adaptable World Models with Latent Actions
ICML 2025
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
CVPR 2025
UniMuMo: Unified Text, Music, and Motion Generation
AAAI 2025
Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training
ACL 2025
COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
ICLR 2025
ABNet: Adaptive explicit-Barrier Net for Safe and Scalable Robot Learning
ICML 2025
HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments
ICLR 2024
UBSoft: A Simulation Platform for Robotic Skill Learning in Unbounded Soft Environments
CORL 2024
RoboDreamer: Learning Compositional World Models for Robot Imagination
ICML 2024
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
ICML 2024
3D-VLA: A 3D Vision-Language-Action Generative World Model
ICML 2024
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
ICML 2024
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
ICML 2024
Visual Chain-of-Thought Prompting for Knowledge-Based Visual Reasoning
AAAI 2024
Speech Self-Supervised Learning Using Diffusion Model Synthetic Data
ICML 2024
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
NIPS 2024
Building Cooperative Embodied Agents Modularly with Large Language Models
ICLR 2024
DiffTactile: A Physics-based Differentiable Tactile Simulator for Contact-rich Robotic Manipulation
ICLR 2024
SALMON: Self-Alignment with Instructable Reward Models
ICLR 2024
Thin-Shell Object Manipulations With Differentiable Physics Simulations
ICLR 2024
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
CVPR 2024
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
CVPR 2024
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
CVPR 2024
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation
CVPR 2024
GENOME: Generative Neuro-Symbolic Visual Reasoning by Growing and Reusing Modules
ICLR 2024
FlexAttention for Efficient High-Resolution Vision-Language Models
ECCV 2024
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
ICLR 2024
SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization
NIPS 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
NIPS 2024
Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge
NIPS 2024
Aligning Large Multimodal Models with Factually Augmented RLHF
ACL 2024
Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting
NIPS 2024
Physically Compatible 3D Object Modeling from a Single Image
NIPS 2024
EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction
ICCV 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
NIPS 2023
3D-LLM: Injecting the 3D World into Large Language Models
NIPS 2023
DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics
NIPS 2023
Adaptive Online Replanning with Diffusion Models
NIPS 2023
DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models
NIPS 2023
Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties
NIPS 2023
JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions
ACL 2023
On the Forward Invariance of Neural ODEs
ICML 2023
Learning Neural Constitutive Laws from Motion Observations for Generalizable PDE Dynamics
ICML 2023
Reparameterized Policy Learning for Multimodal Trajectory Optimization
ICML 2023
Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners
CVPR 2023
Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention
CVPR 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis From Videos
CVPR 2023
3D Concept Learning and Reasoning From Multi-View Images
CVPR 2023
EC2: Emergent Communication for Embodied Control
CVPR 2023
Masked Motion Encoding for Self-Supervised Video Representation Learning
CVPR 2023
Learning Situation Hyper-Graphs for Video Question Answering
CVPR 2023
Sparse Universal Transformer
EMNLP 2023
PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification
ICLR 2023
FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation
ICLR 2023
SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments
ICLR 2023
Planning with Large Language Models for Code Generation
ICLR 2023
DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics
ICLR 2023
Hyper-Decision Transformer for Efficient Online Policy Adaptation
ICLR 2023
RoboNinja: Learning an Adaptive Cutting Policy for Multi-Material Objects
RSS 2023
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
ICCV 2023
Learning Vision-and-Language Navigation from YouTube Videos
ICCV 2023
3D Concept Grounding on Neural Fields
NIPS 2022
ComPhy: Compositional Physical Reasoning of Objects and Events from Videos
ICLR 2022
Network Augmentation for Tiny Deep Learning
ICLR 2022
SNAKE: Shape-aware Neural 3D Keypoint Field
NIPS 2022
Learning Neural Acoustic Fields
NIPS 2022
DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools
ICLR 2022
Prompting Decision Transformer for Few-Shot Policy Generalization
ICML 2022
Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following
CORL 2022
Planning with Spatial-Temporal Abstraction from Point Clouds for Deformable Object Manipulation
CORL 2022
Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction
CVPR 2022
Finding Fallen Objects via Asynchronous Audio-Visual Integration
CVPR 2022
AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation
CVPR 2022
Prototype-Guided Continual Adaptation for Class-Incremental Unsupervised Domain Adaptation
ECCV 2022
Weakly Supervised Grounding for VQA in Vision-Language Transformers
ECCV 2022
FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations
ICLR 2022
RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation
ICLR 2022
Revisiting the Roles of “Text” in Text Games
EMNLP 2022
Linking Emergent and Natural Languages via Corpus Transfer
ICLR 2022
Contact Points Discovery for Soft-Body Manipulations with Differentiable Physics
ICLR 2022
Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation
NIPS 2022
Learning Active Camera for Multi-Object Navigation
NIPS 2022
Learning Physical Dynamics with Subequivariant Graph Neural Networks
NIPS 2022
On-Device Training Under 256KB Memory
NIPS 2022
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
AAAI 2021
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
NIPS 2021
Memory-efficient Patch-based Inference for Tiny Deep Learning
NIPS 2021
PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
NIPS 2021
When does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning?
NIPS 2021
MVFNet: Multi-View Fusion Network for Efficient Video Recognition
AAAI 2021
Augmenting Policy Learning with Routines Discovered from a Single Demonstration
AAAI 2021
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
CVPR 2021
Curious Representation Learning for Embodied Intelligence
ICCV 2021
Learning Task Decomposition with Ordered Memory Policy Network
ICLR 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
ICLR 2021
On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning
ICLR 2021
PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics
ICLR 2021
Adversarial Option-Aware Hierarchical Imitation Learning
ICML 2021
Global Prosody Style Transfer Without Text Transcriptions
ICML 2021
AGENT: A Benchmark for Core Psychological Reasoning
ICML 2021
Temporal and Object Quantification Networks
IJCAI 2021
Location-Aware Graph Convolutional Networks for Video Question Answering
AAAI 2020
Dense Regression Network for Video Grounding
CVPR 2020
Music Gesture for Visual Sound Separation
CVPR 2020
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
ACL 2020
Deep Audio Priors Emerge From Harmonic Convolutional Networks
ICLR 2020
Once-for-All: Train One Network and Specialize it for Efficient Deployment
ICLR 2020
TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning
NIPS 2020
Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning
EMNLP 2020
Foley Music: Learning to Generate Music from Videos
ECCV 2020
DataMix: Efficient Privacy-Preserving Edge-Cloud Inference
ECCV 2020
MCUNet: Tiny Deep Learning on IoT Devices
NIPS 2020
Defensive Quantization: When Efficiency Meets Robustness
ICLR 2019
TSM: Temporal Shift Module for Efficient Video Understanding
ICCV 2019
Self-Supervised Moving Vehicle Tracking With Stereo Sound
ICCV 2019
The Sound of Motions
ICCV 2019
Cross-channel Communication Networks
NIPS 2019
Visual Concept-Metaconcept Learning
NIPS 2019
Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering
AAAI 2019
StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
AAAI 2019
Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation
AAAI 2019
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement
NIPS 2019
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
ICLR 2019
Graph Convolutional Networks for Temporal Action Localization
ICCV 2019
End-to-End Learning of Motion Representation for Video Understanding
CVPR 2018
Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning
CVPR 2018
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
NIPS 2018
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification
CVPR 2018
Weakly Supervised Dense Event Captioning in Videos
NIPS 2018
Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency
ECCV 2018
Sparse, Smart Contours to Represent and Edit Images
CVPR 2018
The Sound of Pixels
ECCV 2018
StyleNet: Generating Attractive Visual Captions With Styles
CVPR 2017
Recurrent Topic-Transition GAN for Visual Paragraph Generation
ICCV 2017
Semantic Compositional Networks for Visual Captioning
CVPR 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
ICCV 2017
You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images
CVPR 2016
Learning Attributes Equals Multi-Source Domain Generalization
CVPR 2016
DevNet: A Deep Event Network for Multimedia Event Detection and Evidence Recounting
CVPR 2015
Automatic Concept Discovery From Parallel Text and Visual Corpora
ICCV 2015