Jiajun Wu
201 papers · 2013–2026 · 16 conferences · across top CS/AI conferences
Achievements
Academic Marathon (12)
Conference Polyglot (15)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (9)
Renaissance Researcher (12)
Conference Loyalist (40)
Keyword Trendsetter Combo (8)
Dynamic Duo (30)
Triple Crown
Grand Slam
Mega-Team (98)
Topic Pioneer
Deep Specialist (40)
Topic Evolution
Keyword Champion (18)
Trend Setter
Prolific Year (13)
Conference Pioneer
Unstoppable (13)
The Questioner (5)
Century Club (196)
Keyword Collector (671)
Conferences
CVPR (50)
NIPS (40)
ICLR (27)
CORL (26)
ICCV (14)
ECCV (11)
AAAI (9)
ICML (7)
RSS (5)
IJCAI (3)
ACL (2)
L4DC (2)
WACV (2)
IJCNLP (1)
MIDL (1)
UAI (1)
Keywords
3d reconstruction (20)
scene understanding (18)
multimodal learning (13)
3d vision (11)
diffusion model (10)
generative model (9)
vision-language model (9)
self-supervised learning (8)
video understanding (8)
pose estimation (7)
visual reasoning (7)
representation learning (6)
embodied ai (6)
neural network (6)
graph neural network (6)
zero-shot learning (5)
computer vision (5)
reinforcement learning (5)
point cloud (5)
neural rendering (5)
Papers
Discovering Hybrid World Representations with Co-Evolving Foundation Models
AAAI 2026
OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Models
ACL 2026
A Tool Bottleneck Framework for Clinically-Informed and Interpretable Medical Image Understanding
MIDL 2026
10 Open Challenges Steering the Future of Vision-Language-Action Models
AAAI 2026
LLMC+: Benchmarking Vision-Language Model Compression with a plug-and-play Toolkit
AAAI 2026
LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models
CVPR 2025
Birth and Death of a Rose
CVPR 2025
Diffusion Self-Distillation for Zero-Shot Customized Image Generation
CVPR 2025
Understanding Complexity in VideoQA via Visual Program Generation
ICML 2025
Lifting Motion to the 3D World via 2D Diffusion
CVPR 2025
FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video
CVPR 2025
The Scene Language: Representing Scenes with Programs, Words, and Embeddings
CVPR 2025
Category-Agnostic Neural Object Rigging
CVPR 2025
WorldScore: A Unified Evaluation Benchmark for World Generation
ICCV 2025
HVAdam: A Full-Dimension Adaptive Optimizer
AAAI 2025
WonderWorld: Interactive 3D Scene Generation from a Single Image
CVPR 2025
Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset
CVPR 2025
Range, not Independence, Drives Modularity in Biologically Inspired Representations
ICLR 2025
Predicate Hierarchies Improve Few-Shot State Classification
ICLR 2025
What Makes a Maze Look Like a Maze?
ICLR 2025
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
ICML 2025
Re-thinking Temporal Search for Long-Form Video Understanding
CVPR 2025
PGC: Physics-Based Gaussian Cloth from a Single Pose
CVPR 2025
CRAFT: Designing Creative and Functional 3D Objects
WACV 2025
X-Capture: An Open-Source Portable Device for Multi-Sensory Learning
ICCV 2025
Weakly-Supervised Learning of Dense Functional Correspondences
ICCV 2025
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
ICCV 2025
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
ICCV 2025
DexSkin: High-Coverage Conformable Robotic Skin for Learning Contact-Rich Manipulation
CORL 2025
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities
CORL 2025
TWIST: Teleoperated Whole-Body Imitation System
CORL 2025
Learning Planning Abstractions from Language
ICLR 2024
Efficient imitation learning with conservative world models
L4DC 2024
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
RSS 2024
HourVideo: 1-Hour Video-Language Understanding
NIPS 2024
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos
NIPS 2024
FactorSim: Generative Simulation via Factorized Representation
NIPS 2024
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
NIPS 2024
Streaming Detection of Queried Event Start
NIPS 2024
MARPLE: A Benchmark for Long-Horizon Inference
NIPS 2024
CityPulse: Fine-Grained Assessment of Urban Change with Street View Time Series
AAAI 2024
Controllable Human-Object Interaction Synthesis
ECCV 2024
Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians
ECCV 2024
Physics-Based Interaction with 3D Objects via Video Generation
ECCV 2024
3D Congealing: 3D-Aware Image Alignment in the Wild
ECCV 2024
Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos
ECCV 2024
Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning
ICML 2024
Evaluating Real-World Robot Manipulation Policies in Simulation
CORL 2024
Automated Creation of Digital Cousins for Robust Policy Learning
CORL 2024
D$^3$Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Rearrangement
CORL 2024
RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing
RSS 2024
View-Invariant Policy Learning via Zero-Shot Novel View Synthesis
CORL 2024
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners
CVPR 2024
Hearing Anything Anywhere
CVPR 2024
Learning the 3D Fauna of the Web
CVPR 2024
Holodeck: Language Guided Generation of 3D Embodied AI Environments
CVPR 2024
WonderJourney: Going from Anywhere to Everywhere
CVPR 2024
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding
CVPR 2024
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
CVPR 2024
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
CVPR 2024
Patched Denoising Diffusion Models For High-Resolution Image Synthesis
ICLR 2024
Neural Polynomial Gabor Fields for Macro Motion Analysis
ICLR 2024
Language-Informed Visual Concept Learning
ICLR 2024
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
AAAI 2024
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction
CORL 2024
Learning Compositional Behaviors from Demonstration and Language
CORL 2024
Rendering Humans from Object-Occluded Monocular Videos
ICCV 2023
Learning Vortex Dynamics for Fluid Inference and Prediction
ICLR 2023
An Extensible Multi-modal Multi-task Object Dataset with Materials
ICLR 2023
MaskViT: Masked Visual Pre-Training for Video Prediction
ICLR 2023
Programmatically Grounded, Compositionally Generalizable Robotic Manipulation
ICLR 2023
Model-Based Control with Sparse Neural Dynamics
NIPS 2023
3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection
NIPS 2023
What's Left? Concept Grounding with Logic-Enhanced Foundation Models
NIPS 2023
Siamese Masked Autoencoders
NIPS 2023
Are These the Same Apple? Comparing Images Based on Object Intrinsics
NIPS 2023
Disentanglement via Latent Quantization
NIPS 2023
Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark
NIPS 2023
SoundCam: A Dataset for Finding Humans Using Room Acoustics
NIPS 2023
Inferring Hybrid Neural Fluid Fields from Videos
NIPS 2023
Holistic Evaluation of Text-to-Image Models
NIPS 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
CORL 2023
RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools
CORL 2023
Learning to Design and Use Tools for Robotic Manipulation
CORL 2023
Learning Sequential Acquisition Policies for Robot-Assisted Feeding
CORL 2023
Composable Part-Based Manipulation
CORL 2023
NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities
CORL 2023
Compositional Diffusion-Based Continuous Constraint Solvers
CORL 2023
Learning Rational Subgoals from Demonstrations and Instructions
AAAI 2023
Learning to See the Physical World
AAAI 2023
Benchmarking Rigid Body Contact Models
L4DC 2023
Ego-Body Pose Estimation via Ego-Head Pose Estimation
CVPR 2023
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
CVPR 2023
Multi-Object Manipulation via Object-Centric Neural Scattering Functions
CVPR 2023
Seeing a Rose in Five Thousand Ways
CVPR 2023
Putting People in Their Place: Affordance-Aware Human Insertion Into Scenes
CVPR 2023
3D Neural Field Generation Using Triplane Diffusion
CVPR 2023
RealImpact: A Dataset of Impact Sound Fields for Real Objects
CVPR 2023
Accidental Light Probes
CVPR 2023
ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
CVPR 2023
The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects
CVPR 2023
CIRCLE: Capture in Rich Contextual Environments
CVPR 2023
PyPose: A Library for Robot Learning With Physics-Based Optimization
CVPR 2023
Modeling Dynamic Environments with Scene Graph Memory
ICML 2023
Motion Question Answering via Modular Motion Programs
ICML 2023
VQ3D: Learning a 3D-Aware Generative Model on ImageNet
ICCV 2023
Tree-Structured Shading Decomposition
ICCV 2023
A Control-Centric Benchmark for Video Prediction
ICLR 2023
Physically Plausible Animation of Human Upper Body From a Single Image
WACV 2023
Dynamic-Resolution Model Learning for Object Pile Manipulation
RSS 2023
Programmatic Concept Learning for Human Motion Description and Synthesis
CVPR 2022
Vision-Based Manipulators Need to Also See from Their Hands
ICLR 2022
MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing
NIPS 2022
CLEVRER-Humans: Describing Physical and Causal Events the Human Way
NIPS 2022
E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance
NIPS 2022
Geoclidean: Few-Shot Generalization in Euclidean Geometry
NIPS 2022
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
CORL 2022
A Dual Representation Framework for Robot Learning with Human Guidance
CORL 2022
BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation
CORL 2022
Rotationally Equivariant 3D Object Detection
CVPR 2022
Unsupervised Discovery of Object Radiance Fields
ICLR 2022
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
ICLR 2022
Unsupervised Learning of Shape Programs with Repeatable Implicit Parts
NIPS 2022
Revisiting the "Video" in Video-Language Understanding
CVPR 2022
IKEA-Manual: Seeing Shape Assembly Step by Step
NIPS 2022
Interaction Modeling with Multiplex Attention
NIPS 2022
Video Extrapolation in Space and Time
ECCV 2022
Unsupervised Segmentation in Real-World Images via Spelke Object Inference
ECCV 2022
Translating a Visual LEGO Manual to a Machine-Executable Plan
ECCV 2022
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
CVPR 2022
RoboCraft: Learning to See, Simulate, and Shape Elasto-Plastic Objects with Graph Networks
RSS 2022
Grammar-Based Grounded Lexicon Learning
NIPS 2021
Hierarchical Motion Understanding via Motion Programs
CVPR 2021
Temporal and Object Quantification Networks
IJCAI 2021
Language-Mediated, Object-Centric Representation Learning
IJCNLP 2021
Language-Mediated, Object-Centric Representation Learning
ACL 2021
Augmenting Policy Learning with Routines Discovered from a Single Demonstration
AAAI 2021
DiffImpact: Differentiable Rendering and Identification of Impact Sounds
CORL 2021
Single-Shot Scene Reconstruction
CORL 2021
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments
CORL 2021
ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations
CORL 2021
iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks
CORL 2021
Neural Radiance Flow for 4D View Synthesis and Video Processing
ICCV 2021
3D Shape Generation and Completion Through Point-Voxel Diffusion
ICCV 2021
Learning Temporal Dynamics From Cycles in Narrated Video
ICCV 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
ICLR 2021
Unsupervised Discovery of 3D Physical Objects from Video
ICLR 2021
Repopulating Street Scenes
CVPR 2021
When is particle filtering efficient for planning in partially observed linear dynamical systems?
UAI 2021
De-Rendering the World's Revolutionary Artefacts
CVPR 2021
Pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis
CVPR 2021
KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control
CVPR 2021
Probabilistic Video Prediction From Noisy Data With a Posterior Confidence
CVPR 2020
Multi-Plane Program Induction with 3D Box Priors
NIPS 2020
Learning Physical Graph Representations from Visual Scenes
NIPS 2020
DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs
IJCAI 2020
Deep Audio Priors Emerge From Harmonic Convolutional Networks
ICLR 2020
Perspective Plane Program Induction From a Single Image
CVPR 2020
Visual Grounding of Learned Physical Models
ICML 2020
Learning Compositional Koopman Operators for Model-Based Control
ICLR 2020
Learning 3D Dynamic Scene Representations for Robot Manipulation
CORL 2020
End-to-End Optimization of Scene Layout
CVPR 2020
CLEVRER: Collision Events for Video Representation and Reasoning
ICLR 2020
Neurally-Guided Structure Inference
ICML 2019
Visual Concept-Metaconcept Learning
NIPS 2019
Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations
NIPS 2019
Entity Abstraction in Visual Model-Based Reinforcement Learning
CORL 2019
Program-Guided Image Manipulators
ICCV 2019
Stochastic Prediction of Multi-Agent Interactions from Partial Observations
ICLR 2019
Learning to Infer and Execute 3D Shape Programs
ICLR 2019
Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids
ICLR 2019
Reasoning About Physical Interactions with Object-Oriented Prediction and Planning
ICLR 2019
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
ICLR 2019
Learning to Describe Scenes with Programs
ICLR 2019
Unsupervised Discovery of Parts, Structure, and Dynamics
ICLR 2019
DensePhysNet: Learning Dense Physical Object Representations Via Multi-Step Dynamic Interactions
RSS 2019
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification
CVPR 2018
Learning Shape Priors for Single-View 3D Completion and Reconstruction
ECCV 2018
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
CVPR 2018
Seeing Tree Structure from Vibration
ECCV 2018
Visual Object Networks: Image Generation with Disentangled 3D Representations
NIPS 2018
3D-Aware Scene Manipulation via Inverse Graphics
NIPS 2018
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
NIPS 2018
Learning to Exploit Stability for 3D Scene Parsing
NIPS 2018
Learning to Reconstruct Shapes from Unseen Classes
NIPS 2018
Physical Primitive Decomposition
ECCV 2018
Neural Scene De-Rendering
CVPR 2017
Generative Modeling of Audible Shapes for Object Perception
ICCV 2017
Cake Cutting: Envy and Truth
IJCAI 2017
Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes With Deep Generative Networks
CVPR 2017
Shape and Material from Sound
NIPS 2017
Learning to See Physics via Visual De-animation
NIPS 2017
MarrNet: 3D Shape Reconstruction via 2.5D Sketches
NIPS 2017
Raster-To-Vector: Revisiting Floorplan Transformation
ICCV 2017
Self-Supervised Intrinsic Image Decomposition
NIPS 2017
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
NIPS 2016
Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling
NIPS 2016
Deep Multiple Instance Learning for Image Classification and Auto-Annotation
CVPR 2015
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning
NIPS 2015
MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation
CVPR 2014
Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
CVPR 2013