Jiajun Wu

201 papers · 2013–2026 · 16 top CS/AI conferences

Achievements


🏃 Academic Marathon (12) 🌍 Conference Polyglot (15) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (9)

🌈 Renaissance Researcher (12) 🏠 Conference Loyalist (40) 🌟 Keyword Trendsetter Combo (8) 🤝 Dynamic Duo (30) 👑 Triple Crown 🏆 Grand Slam 👥 Mega-Team (98) 🌱 Topic Pioneer 🔬 Deep Specialist (40) 🧬 Topic Evolution 🏆 Keyword Champion (18) 📈 Trend Setter ⚡ Prolific Year (13) 🚀 Conference Pioneer 🔥 Unstoppable (13) ❓ The Questioner (5) 💎 Century Club (196) 🗃️ Keyword Collector (671)

Conferences

CVPR (50) NIPS (40) ICLR (27) CORL (26) ICCV (14) ECCV (11) AAAI (9) ICML (7) RSS (5) IJCAI (3) ACL (2) L4DC (2) WACV (2) IJCNLP (1) MIDL (1) UAI (1)

Top co-authors

Joshua B. Tenenbaum (30) Li Fei-Fei (27) Jiayuan Mao (27) Hong-Xing Yu (23) Yunzhi Zhang (18) Josh Tenenbaum (17) Yunzhu Li (16) William T. Freeman (15) Ruohan Zhang (14) Joy Hsu (12)

Keywords

3d reconstruction (20) scene understanding (18) multimodal learning (13) 3d vision (11) diffusion model (10) generative model (9) vision-language model (9) self-supervised learning (8) video understanding (8) pose estimation (7) visual reasoning (7) representation learning (6) embodied ai (6) neural network (6) graph neural network (6) zero-shot learning (5) computer vision (5) reinforcement learning (5) point cloud (5) neural rendering (5)

Papers

Discovering Hybrid World Representations with Co-Evolving Foundation Models · AAAI 2026
OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Models · ACL 2026
A Tool Bottleneck Framework for Clinically-Informed and Interpretable Medical Image Understanding · MIDL 2026
10 Open Challenges Steering the Future of Vision-Language-Action Models · AAAI 2026
LLMC+: Benchmarking Vision-Language Model Compression with a plug-and-play Toolkit · AAAI 2026
LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models · CVPR 2025
Birth and Death of a Rose · CVPR 2025
Diffusion Self-Distillation for Zero-Shot Customized Image Generation · CVPR 2025
Understanding Complexity in VideoQA via Visual Program Generation · ICML 2025
Lifting Motion to the 3D World via 2D Diffusion · CVPR 2025
FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video · CVPR 2025
The Scene Language: Representing Scenes with Programs, Words, and Embeddings · CVPR 2025
Category-Agnostic Neural Object Rigging · CVPR 2025
WorldScore: A Unified Evaluation Benchmark for World Generation · ICCV 2025
HVAdam: A Full-Dimension Adaptive Optimizer · AAAI 2025
WonderWorld: Interactive 3D Scene Generation from a Single Image · CVPR 2025
Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset · CVPR 2025
Range, not Independence, Drives Modularity in Biologically Inspired Representations · ICLR 2025
Predicate Hierarchies Improve Few-Shot State Classification · ICLR 2025
What Makes a Maze Look Like a Maze? · ICLR 2025
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas · ICML 2025
Re-thinking Temporal Search for Long-Form Video Understanding · CVPR 2025
PGC: Physics-Based Gaussian Cloth from a Single Pose · CVPR 2025
CRAFT: Designing Creative and Functional 3D Objects · WACV 2025
X-Capture: An Open-Source Portable Device for Multi-Sensory Learning · ICCV 2025
Weakly-Supervised Learning of Dense Functional Correspondences · ICCV 2025
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions · ICCV 2025
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization · ICCV 2025
DexSkin: High-Coverage Conformable Robotic Skin for Learning Contact-Rich Manipulation · CORL 2025
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities · CORL 2025
TWIST: Teleoperated Whole-Body Imitation System · CORL 2025
Learning Planning Abstractions from Language · ICLR 2024
Efficient imitation learning with conservative world models · L4DC 2024
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset · RSS 2024
HourVideo: 1-Hour Video-Language Understanding · NIPS 2024
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos · NIPS 2024
FactorSim: Generative Simulation via Factorized Representation · NIPS 2024
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making · NIPS 2024
Streaming Detection of Queried Event Start · NIPS 2024
MARPLE: A Benchmark for Long-Horizon Inference · NIPS 2024
CityPulse: Fine-Grained Assessment of Urban Change with Street View Time Series · AAAI 2024
Controllable Human-Object Interaction Synthesis · ECCV 2024
Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians · ECCV 2024
Physics-Based Interaction with 3D Objects via Video Generation · ECCV 2024
3D Congealing: 3D-Aware Image Alignment in the Wild · ECCV 2024
Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos · ECCV 2024
Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning · ICML 2024
Evaluating Real-World Robot Manipulation Policies in Simulation · CORL 2024
Automated Creation of Digital Cousins for Robust Policy Learning · CORL 2024
D³Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Rearrangement · CORL 2024
RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing · RSS 2024
View-Invariant Policy Learning via Zero-Shot Novel View Synthesis · CORL 2024
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners · CVPR 2024
Hearing Anything Anywhere · CVPR 2024
Learning the 3D Fauna of the Web · CVPR 2024
Holodeck: Language Guided Generation of 3D Embodied AI Environments · CVPR 2024
WonderJourney: Going from Anywhere to Everywhere · CVPR 2024
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding · CVPR 2024
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image · CVPR 2024
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation · CVPR 2024
Patched Denoising Diffusion Models For High-Resolution Image Synthesis · ICLR 2024
Neural Polynomial Gabor Fields for Macro Motion Analysis · ICLR 2024
Language-Informed Visual Concept Learning · ICLR 2024
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing · AAAI 2024
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction · CORL 2024
Learning Compositional Behaviors from Demonstration and Language · CORL 2024
Rendering Humans from Object-Occluded Monocular Videos · ICCV 2023
Learning Vortex Dynamics for Fluid Inference and Prediction · ICLR 2023
An Extensible Multi-modal Multi-task Object Dataset with Materials · ICLR 2023
MaskViT: Masked Visual Pre-Training for Video Prediction · ICLR 2023
Programmatically Grounded, Compositionally Generalizable Robotic Manipulation · ICLR 2023
Model-Based Control with Sparse Neural Dynamics · NIPS 2023
3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection · NIPS 2023
What’s Left? Concept Grounding with Logic-Enhanced Foundation Models · NIPS 2023
Siamese Masked Autoencoders · NIPS 2023
Are These the Same Apple? Comparing Images Based on Object Intrinsics · NIPS 2023
Disentanglement via Latent Quantization · NIPS 2023
Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark · NIPS 2023
SoundCam: A Dataset for Finding Humans Using Room Acoustics · NIPS 2023
Inferring Hybrid Neural Fluid Fields from Videos · NIPS 2023
Holistic Evaluation of Text-to-Image Models · NIPS 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models · CORL 2023
RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools · CORL 2023
Learning to Design and Use Tools for Robotic Manipulation · CORL 2023
Learning Sequential Acquisition Policies for Robot-Assisted Feeding · CORL 2023
Composable Part-Based Manipulation · CORL 2023
NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities · CORL 2023
Compositional Diffusion-Based Continuous Constraint Solvers · CORL 2023
Learning Rational Subgoals from Demonstrations and Instructions · AAAI 2023
Learning to See the Physical World · AAAI 2023
Benchmarking Rigid Body Contact Models · L4DC 2023
Ego-Body Pose Estimation via Ego-Head Pose Estimation · CVPR 2023
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations · CVPR 2023
Multi-Object Manipulation via Object-Centric Neural Scattering Functions · CVPR 2023
Seeing a Rose in Five Thousand Ways · CVPR 2023
Putting People in Their Place: Affordance-Aware Human Insertion Into Scenes · CVPR 2023
3D Neural Field Generation Using Triplane Diffusion · CVPR 2023
RealImpact: A Dataset of Impact Sound Fields for Real Objects · CVPR 2023
Accidental Light Probes · CVPR 2023
ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding · CVPR 2023
The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects · CVPR 2023
CIRCLE: Capture in Rich Contextual Environments · CVPR 2023
PyPose: A Library for Robot Learning With Physics-Based Optimization · CVPR 2023
Modeling Dynamic Environments with Scene Graph Memory · ICML 2023
Motion Question Answering via Modular Motion Programs · ICML 2023
VQ3D: Learning a 3D-Aware Generative Model on ImageNet · ICCV 2023
Tree-Structured Shading Decomposition · ICCV 2023
A Control-Centric Benchmark for Video Prediction · ICLR 2023
Physically Plausible Animation of Human Upper Body From a Single Image · WACV 2023
Dynamic-Resolution Model Learning for Object Pile Manipulation · RSS 2023
Programmatic Concept Learning for Human Motion Description and Synthesis · CVPR 2022
Vision-Based Manipulators Need to Also See from Their Hands · ICLR 2022
MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing · NIPS 2022
CLEVRER-Humans: Describing Physical and Causal Events the Human Way · NIPS 2022
E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance · NIPS 2022
Geoclidean: Few-Shot Generalization in Euclidean Geometry · NIPS 2022
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation · CORL 2022
A Dual Representation Framework for Robot Learning with Human Guidance · CORL 2022
BEHAVIOR-1K: A Benchmark for Embodied AI with 1,000 Everyday Activities and Realistic Simulation · CORL 2022
Rotationally Equivariant 3D Object Detection · CVPR 2022
Unsupervised Discovery of Object Radiance Fields · ICLR 2022
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations · ICLR 2022
Unsupervised Learning of Shape Programs with Repeatable Implicit Parts · NIPS 2022
Revisiting the "Video" in Video-Language Understanding · CVPR 2022
IKEA-Manual: Seeing Shape Assembly Step by Step · NIPS 2022
Interaction Modeling with Multiplex Attention · NIPS 2022
Video Extrapolation in Space and Time · ECCV 2022
Unsupervised Segmentation in Real-World Images via Spelke Object Inference · ECCV 2022
Translating a Visual LEGO Manual to a Machine-Executable Plan · ECCV 2022
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer · CVPR 2022
RoboCraft: Learning to See, Simulate, and Shape Elasto-Plastic Objects with Graph Networks · RSS 2022
Grammar-Based Grounded Lexicon Learning · NIPS 2021
Hierarchical Motion Understanding via Motion Programs · CVPR 2021
Temporal and Object Quantification Networks · IJCAI 2021
Language-Mediated, Object-Centric Representation Learning · IJCNLP 2021
Language-Mediated, Object-Centric Representation Learning · ACL 2021
Augmenting Policy Learning with Routines Discovered from a Single Demonstration · AAAI 2021
DiffImpact: Differentiable Rendering and Identification of Impact Sounds · CORL 2021
Single-Shot Scene Reconstruction · CORL 2021
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments · CORL 2021
ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations · CORL 2021
iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks · CORL 2021
Neural Radiance Flow for 4D View Synthesis and Video Processing · ICCV 2021
3D Shape Generation and Completion Through Point-Voxel Diffusion · ICCV 2021
Learning Temporal Dynamics From Cycles in Narrated Video · ICCV 2021
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning · ICLR 2021
Unsupervised Discovery of 3D Physical Objects from Video · ICLR 2021
Repopulating Street Scenes · CVPR 2021
When is particle filtering efficient for planning in partially observed linear dynamical systems? · UAI 2021
De-Rendering the World's Revolutionary Artefacts · CVPR 2021
Pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis · CVPR 2021
KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control · CVPR 2021
Probabilistic Video Prediction From Noisy Data With a Posterior Confidence · CVPR 2020
Multi-Plane Program Induction with 3D Box Priors · NIPS 2020
Learning Physical Graph Representations from Visual Scenes · NIPS 2020
DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs · IJCAI 2020
Deep Audio Priors Emerge From Harmonic Convolutional Networks · ICLR 2020
Perspective Plane Program Induction From a Single Image · CVPR 2020
Visual Grounding of Learned Physical Models · ICML 2020
Learning Compositional Koopman Operators for Model-Based Control · ICLR 2020
Learning 3D Dynamic Scene Representations for Robot Manipulation · CORL 2020
End-to-End Optimization of Scene Layout · CVPR 2020
CLEVRER: Collision Events for Video Representation and Reasoning · ICLR 2020
Neurally-Guided Structure Inference · ICML 2019
Visual Concept-Metaconcept Learning · NIPS 2019
Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations · NIPS 2019
Entity Abstraction in Visual Model-Based Reinforcement Learning · CORL 2019
Program-Guided Image Manipulators · ICCV 2019
Stochastic Prediction of Multi-Agent Interactions from Partial Observations · ICLR 2019
Learning to Infer and Execute 3D Shape Programs · ICLR 2019
Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids · ICLR 2019
Reasoning About Physical Interactions with Object-Oriented Prediction and Planning · ICLR 2019
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision · ICLR 2019
Learning to Describe Scenes with Programs · ICLR 2019
Unsupervised Discovery of Parts, Structure, and Dynamics · ICLR 2019
DensePhysNet: Learning Dense Physical Object Representations Via Multi-Step Dynamic Interactions · RSS 2019
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification · CVPR 2018
Learning Shape Priors for Single-View 3D Completion and Reconstruction · ECCV 2018
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling · CVPR 2018
Seeing Tree Structure from Vibration · ECCV 2018
Visual Object Networks: Image Generation with Disentangled 3D Representations · NIPS 2018
3D-Aware Scene Manipulation via Inverse Graphics · NIPS 2018
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding · NIPS 2018
Learning to Exploit Stability for 3D Scene Parsing · NIPS 2018
Learning to Reconstruct Shapes from Unseen Classes · NIPS 2018
Physical Primitive Decomposition · ECCV 2018
Neural Scene De-Rendering · CVPR 2017
Generative Modeling of Audible Shapes for Object Perception · ICCV 2017
Cake Cutting: Envy and Truth · IJCAI 2017
Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes With Deep Generative Networks · CVPR 2017
Shape and Material from Sound · NIPS 2017
Learning to See Physics via Visual De-animation · NIPS 2017
MarrNet: 3D Shape Reconstruction via 2.5D Sketches · NIPS 2017
Raster-To-Vector: Revisiting Floorplan Transformation · ICCV 2017
Self-Supervised Intrinsic Image Decomposition · NIPS 2017
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks · NIPS 2016
Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling · NIPS 2016
Deep Multiple Instance Learning for Image Classification and Auto-Annotation · CVPR 2015
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning · NIPS 2015
MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation · CVPR 2014
Harvesting Mid-level Visual Concepts from Large-Scale Internet Images · CVPR 2013