Baoxiong Jia

31 papers · 2018–2025 · 8 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (7) 🌍 Conference Polyglot (8) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (13)

🌈 Renaissance Researcher (9) 🗺️ Taxonomy Completionist (42) 👥 Mega-Team (34) 🤝 Dynamic Duo (22) 🚀 Conference Pioneer 🔥 Unstoppable (8) 🗃️ Keyword Collector (114) 💎 Century Club (31) 📈 Trend Setter ⚡ Prolific Year (7)

Conferences

CVPR (10) ECCV (6) ICCV (4) NIPS (4) ICML (3) ICLR (2) CORL (1) RSS (1)

Top co-authors

Siyuan Huang (22) Song-Chun Zhu (17) Yixin Zhu (11) Yixin Chen (9) Chi Zhang (6) Qing Li (6) Xiaojian Ma (5) Puhao Li (5) Yu Liu (4) Tengyu Liu (4)

Keywords

visual reasoning (4) sim-to-real transfer (3) embodied ai (3) question answering (3) scene understanding (3) action recognition (2) contrastive learning (2) scene synthesis (2) diffusion model (2) activity recognition (2) vision-language model (2) imitation learning (2) 3d scene understanding (2) spatial-temporal reasoning (2) visual grounding (2) video understanding (2) 3d scene generation (2) probabilistic inference (1) object recognition (1) variational inference (1)

Papers

Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting · ICLR 2025
Learning a Unified Policy for Position and Force Control in Legged Loco-Manipulation · CORL 2025
METASCENES: Towards Automated Replica Creation for Real-world 3D Scans · CVPR 2025
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding · CVPR 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis · CVPR 2025
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes · CVPR 2025
RoboVerse: A Unified Platform, Benchmark and Dataset for Scalable and Generalizable Robot Learning · RSS 2025
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation · ICCV 2025
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation · ICCV 2025
PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI · CVPR 2024
Move as You Say Interact as You Can: Language-guided Human Motion Generation with Scene Affordance · CVPR 2024
Unifying 3D Vision-Language Understanding via Promptable Queries · ECCV 2024
An Embodied Generalist Agent in 3D World · ICML 2024
SlotLifter: Slot-guided Feature Lifting for Learning Object-Centric Radiance Fields · ECCV 2024
Multi-modal Situated Reasoning in 3D Scenes · NIPS 2024
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding · ECCV 2024
X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events · ICCV 2023
Improving Object-centric Learning with Query Optimization · ICLR 2023
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab · NIPS 2023
Diffusion-Based Generation, Optimization, and Planning in 3D Scenes · CVPR 2023
ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic 3D Scenes · ICCV 2023
Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning · ECCV 2022
Latent Diffusion Energy-Based Model for Interpretable Text Modelling · ICML 2022
EgoTaskQA: Understanding Human Tasks in Egocentric Videos · NIPS 2022
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution · CVPR 2021
ACRE: Abstract Causal REasoning Beyond Covariation · CVPR 2021
LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities · ECCV 2020
Learning Perceptual Inference by Contrasting · NIPS 2019
RAVEN: A Dataset for Relational and Analogical Visual REasoNing · CVPR 2019
Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data for Future Prediction · ICML 2018
Learning Human-Object Interactions by Graph Parsing Neural Networks · ECCV 2018