Hang Zhao
74 papers · 2017–2025 · 13 top CS/AI conferences
Achievements
🐣 Hot Topic Early Bird
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🏃 Academic Marathon (8)
🌍 Conference Polyglot (13)
🤝 Dynamic Duo (11)
👑 Triple Crown
🏆 Grand Slam
👥 Mega-Team (23)
🔬 Deep Specialist (17)
🏆 Keyword Champion (2)
❓ The Questioner
🚀 Conference Pioneer
⚡ Prolific Year (17)
🗃️ Keyword Collector (273)
📈 Trend Setter
💎 Century Club (74)
🔥 Unstoppable (9)
Conferences
CVPR (16)
CORL (13)
ICCV (13)
ECCV (6)
ICLR (6)
NIPS (5)
ICML (3)
INTERSPEECH (3)
AAAI (2)
IJCAI (2)
RSS (2)
WACV (2)
EMNLP (1)
Keywords
autonomous driving (16)
3d object detection (6)
trajectory prediction (6)
self-supervised learning (6)
motion forecasting (5)
contrastive learning (4)
multimodal learning (4)
scene understanding (4)
multi-modal learning (4)
graph neural network (4)
audio-visual learning (3)
hd map (3)
semantic segmentation (3)
representation learning (3)
human pose estimation (3)
end-to-end learning (3)
3d vision (2)
depth estimation (2)
vision transformer (2)
model compression (2)
Papers
LONG3R: Long Sequence Streaming 3D Reconstruction
ICCV 2025
Morpheus: A Neural-driven Animatronic Face with Hybrid Actuation and Diverse Emotion Control
RSS 2025
Supervising Sound Localization by In-the-wild Egomotion
CVPR 2025
GS-Occ3D: Scaling Vision-only Occupancy Reconstruction with Gaussian Splatting
ICCV 2025
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation
RSS 2025
Embrace Contacts: humanoid shadowing with full body ground contacts
CORL 2025
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
ECCV 2024
Deep Demonstration Tracing: Learning Generalizable Imitator Policy for Runtime Imitation from a Single Demonstration
ICML 2024
Uncertainty-Aware Decision Transformer for Stochastic Driving Environments
CORL 2024
Humanoid Parkour Learning
CORL 2024
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
CORL 2024
StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction
WACV 2024
PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors
ECCV 2024
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning
INTERSPEECH 2024
ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information
EMNLP 2024
CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction
ECCV 2024
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
CVPR 2023
What Happened 3 Seconds Ago? Inferring the Past With Thermal Imaging
CVPR 2023
ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries
CVPR 2023
VectorMapNet: End-to-end Vectorized HD Map Learning
ICML 2023
On Uni-Modal Feature Learning in Supervised Multi-Modal Learning
ICML 2023
Programmatically Grounded, Compositionally Generalizable Robotic Manipulation
ICLR 2023
Robot Parkour Learning
CORL 2023
Cross-Dataset Sensor Alignment: Making Visual 3D Object Detector Generalizable
CORL 2023
A Universal Semantic-Geometric Representation for Robotic Manipulation
CORL 2023
The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation
ICLR 2023
Self-supervision through Random Segments with Autoregressive Coding (RandSAC)
ICLR 2023
INT2: Interactive Trajectory Prediction at Intersections
ICCV 2023
PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework
ICCV 2023
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
NIPS 2023
Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving
NIPS 2023
GeoMAE: Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training
CVPR 2023
Neural Map Prior for Autonomous Driving
CVPR 2023
Co-Advise: Cross Inductive Bias Distillation
CVPR 2022
SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-training for Spatial-Aware Visual Representations
AAAI 2022
Embracing Single Stride 3D Object Detector With Sparse Transformer
CVPR 2022
Egocentric Prediction of Action Target in 3D
CVPR 2022
M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction
CVPR 2022
CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation
ECCV 2022
Learning Visual Styles from Audio-Visual Associations
ECCV 2022
Learning Efficient Online 3D Bin Packing on Packing Configuration Trees
ICLR 2022
R4D: Utilizing Reference Objects for Long-Range Distance Estimation
ICLR 2022
IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes
ICLR 2022
AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection
IJCAI 2022
Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation
IJCAI 2022
Radio2Speech: High Quality Speech Recovery from Radio Frequency Signals
INTERSPEECH 2022
Neural Dubber: Dubbing for Videos According to Scripts
NIPS 2021
CVC: Contrastive Learning for Non-Parallel Voice Conversion
INTERSPEECH 2021
On Feature Decorrelation in Self-Supervised Learning
ICCV 2021
DenseTNT: End-to-End Trajectory Prediction From Dense Goal Sets
ICCV 2021
Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset
ICCV 2021
Multimodal Knowledge Expansion
ICCV 2021
HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps
CVPR 2021
What Makes Multi-Modal Learning Better than Single (Provably)
NIPS 2021
Online 3D Bin Packing with Constrained Deep Reinforcement Learning
AAAI 2021
Multi-Agent Trajectory Prediction by Combining Egocentric and Allocentric Views
CORL 2021
Adversarially Robust Imitation Learning
CORL 2021
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries
CORL 2021
CLOUD: Contrastive Learning of Unsupervised Dynamics
CORL 2020
UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging
NIPS 2020
VectorNet: Encoding HD Maps and Agent Dynamics From Vectorized Representation
CVPR 2020
Music Gesture for Visual Sound Separation
CVPR 2020
Scalability in Perception for Autonomous Driving: Waymo Open Dataset
CVPR 2020
Unsupervised Monocular Depth Learning in Dynamic Scenes
CORL 2020
AlignNet: A Unifying Approach to Audio-Visual Alignment
WACV 2020
TNT: Target-driven Trajectory Prediction
CORL 2020
Self-Supervised Moving Vehicle Tracking With Stereo Sound
ICCV 2019
Through-Wall Human Mesh Recovery Using Radio Signals
ICCV 2019
The Sound of Motions
ICCV 2019
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
ICCV 2019
Through-Wall Human Pose Estimation Using Radio Signals
CVPR 2018
The Sound of Pixels
ECCV 2018
Open Vocabulary Scene Parsing
ICCV 2017
Scene Parsing Through ADE20K Dataset
CVPR 2017