Hang Zhao

74 papers · 2017–2025 · 13 top CS/AI conferences

Achievements


🐣 Hot Topic Early Bird 🌍 Conference Polyglot (13) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (8)

🤝 Dynamic Duo (11) 👑 Triple Crown 🏆 Grand Slam 👥 Mega-Team (23) 🔬 Deep Specialist (17) 🏆 Keyword Champion (2) ❓ The Questioner 🚀 Conference Pioneer ⚡ Prolific Year (17) 🗃️ Keyword Collector (273) 📈 Trend Setter 💎 Century Club (74) 🔥 Unstoppable (9)

Conferences

CVPR (16) CORL (13) ICCV (13) ECCV (6) ICLR (6) NIPS (5) ICML (3) INTERSPEECH (3) AAAI (2) IJCAI (2) RSS (2) WACV (2) EMNLP (1)

Top co-authors

Yue Wang (11) Antonio Torralba (9) Yicheng Liu (6) Yilun Wang (6) Tianyuan Yuan (6) Tingle Li (5) Dragomir Anguelov (5) Sucheng Ren (5) Zihui Xue (5) Junru Gu (4)

Keywords

autonomous driving (16) 3d object detection (6) trajectory prediction (6) self-supervised learning (6) motion forecasting (5) contrastive learning (4) multimodal learning (4) scene understanding (4) multi-modal learning (4) graph neural network (4) audio-visual learning (3) hd map (3) semantic segmentation (3) representation learning (3) human pose estimation (3) end-to-end learning (3) 3d vision (2) depth estimation (2) vision transformer (2) model compression (2)

Papers

LONG3R: Long Sequence Streaming 3D Reconstruction ICCV 2025
Morpheus: A Neural-driven Animatronic Face with Hybrid Actuation and Diverse Emotion Control RSS 2025
Supervising Sound Localization by In-the-wild Egomotion CVPR 2025
GS-Occ3D: Scaling Vision-only Occupancy Reconstruction with Gaussian Splatting ICCV 2025
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation RSS 2025
Embrace Contacts: humanoid shadowing with full body ground contacts CORL 2025
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation ECCV 2024
Deep Demonstration Tracing: Learning Generalizable Imitator Policy for Runtime Imitation from a Single Demonstration ICML 2024
Uncertainty-Aware Decision Transformer for Stochastic Driving Environments CORL 2024
Humanoid Parkour Learning CORL 2024
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models CORL 2024
StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction WACV 2024
PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors ECCV 2024
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning INTERSPEECH 2024
ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information EMNLP 2024
CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction ECCV 2024
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer CVPR 2023
What Happened 3 Seconds Ago? Inferring the Past With Thermal Imaging CVPR 2023
ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries CVPR 2023
VectorMapNet: End-to-end Vectorized HD Map Learning ICML 2023
On Uni-Modal Feature Learning in Supervised Multi-Modal Learning ICML 2023
Programmatically Grounded, Compositionally Generalizable Robotic Manipulation ICLR 2023
Robot Parkour Learning CORL 2023
Cross-Dataset Sensor Alignment: Making Visual 3D Object Detector Generalizable CORL 2023
A Universal Semantic-Geometric Representation for Robotic Manipulation CORL 2023
The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation ICLR 2023
Self-supervision through Random Segments with Autoregressive Coding (RandSAC) ICLR 2023
INT2: Interactive Trajectory Prediction at Intersections ICCV 2023
PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework ICCV 2023
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models NIPS 2023
Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving NIPS 2023
GeoMAE: Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training CVPR 2023
Neural Map Prior for Autonomous Driving CVPR 2023
Co-Advise: Cross Inductive Bias Distillation CVPR 2022
SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-training for Spatial-Aware Visual Representations AAAI 2022
Embracing Single Stride 3D Object Detector With Sparse Transformer CVPR 2022
Egocentric Prediction of Action Target in 3D CVPR 2022
M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction CVPR 2022
CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation ECCV 2022
Learning Visual Styles from Audio-Visual Associations ECCV 2022
Learning Efficient Online 3D Bin Packing on Packing Configuration Trees ICLR 2022
R4D: Utilizing Reference Objects for Long-Range Distance Estimation ICLR 2022
IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes ICLR 2022
AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection IJCAI 2022
Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation IJCAI 2022
Radio2Speech: High Quality Speech Recovery from Radio Frequency Signals INTERSPEECH 2022
Neural Dubber: Dubbing for Videos According to Scripts NIPS 2021
CVC: Contrastive Learning for Non-Parallel Voice Conversion INTERSPEECH 2021
On Feature Decorrelation in Self-Supervised Learning ICCV 2021
DenseTNT: End-to-End Trajectory Prediction From Dense Goal Sets ICCV 2021
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset ICCV 2021
Multimodal Knowledge Expansion ICCV 2021
HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps CVPR 2021
What Makes Multi-Modal Learning Better than Single (Provably) NIPS 2021
Online 3D Bin Packing with Constrained Deep Reinforcement Learning AAAI 2021
Multi-Agent Trajectory Prediction by Combining Egocentric and Allocentric Views CORL 2021
Adversarially Robust Imitation Learning CORL 2021
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries CORL 2021
CLOUD: Contrastive Learning of Unsupervised Dynamics CORL 2020
UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging NIPS 2020
VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation CVPR 2020
Music Gesture for Visual Sound Separation CVPR 2020
Scalability in Perception for Autonomous Driving: Waymo Open Dataset CVPR 2020
Unsupervised Monocular Depth Learning in Dynamic Scenes CORL 2020
AlignNet: A Unifying Approach to Audio-Visual Alignment WACV 2020
TNT: Target-driven Trajectory Prediction CORL 2020
Self-Supervised Moving Vehicle Tracking With Stereo Sound ICCV 2019
Through-Wall Human Mesh Recovery Using Radio Signals ICCV 2019
The Sound of Motions ICCV 2019
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization ICCV 2019
Through-Wall Human Pose Estimation Using Radio Signals CVPR 2018
The Sound of Pixels ECCV 2018
Open Vocabulary Scene Parsing ICCV 2017
Scene Parsing Through ADE20K Dataset CVPR 2017