Tai WANG

32 papers · 2020–2026 · 10 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (9) 🧭 Keyword Pioneer 🏃 Academic Marathon (5) 🌍 Conference Polyglot (8) 🌈 Renaissance Researcher (9)

🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (43) 🏆 Keyword Champion 🔬 Deep Specialist (11) 🧬 Topic Evolution 🤝 Dynamic Duo (19) 🗃️ Keyword Collector (132) ⚡ Prolific Year (12) 🔥 Unstoppable (6) 💎 Century Club (30)

Conferences

ICCV (7) CVPR (6) ECCV (5) NIPS (5) CORL (4) AAAI (1) ACL (1) EMNLP (1) ICLR (1) WACV (1)

Top co-authors

Jiangmiao Pang (20) Dahua Lin (16) Yilun Chen (9) Runsen Xu (7) Xinge ZHU (6) Wenwei Zhang (5) Chenming Zhu (4) Hanqing Wang (4) Yuexin Ma (4) Haifeng Huang (4)

Keywords

scene understanding (5) 3d object detection (4) autonomous driving (3) question answering (3) large language model (3) visual grounding (3) 3d scene understanding (3) depth estimation (3) robot manipulation (2) computer vision (2) 3d reconstruction (2) semantic segmentation (2) humanoid robot (2) object detection (2) embodied ai (2) robotic manipulation (2) 3d vision (2) point cloud (2) multi-modal learning (2) language grounding (2)

Papers

Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning ACL 2026
Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling AAAI 2026
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation CVPR 2025
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors CVPR 2025
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities ICCV 2025
Language-to-Space Programming for Training-Free 3D Visual Grounding EMNLP 2025
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scene ICCV 2025
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities ICCV 2025
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization ICCV 2025
Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation ECCV 2024
Learning to Adapt SAM for Segmenting Cross-domain Point Clouds ECCV 2024
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities ECCV 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations NIPS 2024
OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries NIPS 2024
CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics NIPS 2024
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers NIPS 2024
VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding CORL 2024
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI CVPR 2024
GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction CVPR 2024
PointLLM: Empowering Large Language Models to Understand Point Clouds ECCV 2024
Unified Human-Scene Interaction via Prompted Chain-of-Contacts ICLR 2024
Scene as Occupancy ICCV 2023
DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking CORL 2023
GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding ICCV 2023
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection ICCV 2023
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training CVPR 2023
Monocular 3D Object Detection with Depth from Motion ECCV 2022
SIDE: Center-Based Stereo 3D Detector With Structure-Aware Instance Depth Estimation WACV 2022
Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion NIPS 2021
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation CVPR 2021
Probabilistic and Geometric Depth: Detecting Objects in Perspective CORL 2021
Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds CORL 2020