Hengshuang Zhao
88 papers · 2017–2026 · 9 top CS/AI conferences
Achievements
Academic Marathon (8)
Keyword Pioneer
Conference Polyglot (9)
Interdisciplinary Bridge
Cross-Pollinator (14)
Renaissance Researcher (7)
Taxonomy Completionist (95)
Conference Loyalist (38)
Grand Slam
Deep Specialist (22)
Mega-Team (30)
Keyword Champion (2)
Dynamic Duo (24)
Century Club (87)
Keyword Collector (316)
Unstoppable (9)
The Questioner
Prolific Year (25)
Conference Pioneer
Conferences
CVPR (38)
ECCV (14)
NIPS (12)
ICCV (11)
ICML (6)
AAAI (2)
ICLR (2)
IJCAI (2)
EMNLP (1)
Keywords
semantic segmentation (19)
point cloud (16)
representation learning (7)
3d vision (7)
self-supervised learning (7)
point cloud processing (6)
scene understanding (6)
3d object detection (6)
data augmentation (5)
depth estimation (5)
domain adaptation (5)
diffusion model (4)
zero-shot learning (4)
3d point cloud (4)
object detection (4)
autonomous driving (4)
vision-language model (4)
large language model (4)
semi-supervised learning (3)
image generation (3)
Papers
Game Ground Bench: Probing the Limits of LVLMs in Complex Semantic Grounding Across Game Universes
AAAI 2026
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
CVPR 2025
Sonata: Self-Supervised Learning of Reliable Point Representations
CVPR 2025
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
CVPR 2025
Empowering Large Language Models with 3D Situation Awareness
CVPR 2025
DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving
CVPR 2025
TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
ICML 2025
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
ICML 2025
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
ICML 2025
BOOD: Boundary-based Out-Of-Distribution Data Generation
ICML 2025
LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
ICML 2025
VIP: Vision Instructed Pre-training for Robotic Manipulation
ICML 2025
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
ICLR 2025
ViLLa: Video Reasoning Segmentation with Large Language Model
ICCV 2025
DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs
ICCV 2025
StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth
ICCV 2025
DiffDoctor: Diagnosing Image Diffusion Models Before Treating
ICCV 2025
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
ICCV 2025
Enhancing LLM Knowledge Learning through Generalization
EMNLP 2025
PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation
CVPR 2025
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language
CVPR 2025
UniMODE: Unified Monocular 3D Object Detection
CVPR 2024
LION: Linear Group RNN for 3D Object Detection in Point Clouds
NIPS 2024
Depth Anything V2
NIPS 2024
SyncVIS: Synchronized Video Instance Segmentation
NIPS 2024
One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection
NIPS 2024
Zero-shot Image Editing with Reference Imitation
NIPS 2024
LiT: Unifying LiDAR "Languages" with LiDAR Translator
NIPS 2024
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
CVPR 2024
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
CVPR 2024
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
CVPR 2024
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training
CVPR 2024
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
CVPR 2024
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
CVPR 2024
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
CVPR 2024
Point Transformer V3: Simpler Faster Stronger
CVPR 2024
GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
CVPR 2024
AnyDoor: Zero-shot Object-level Image Customization
CVPR 2024
LivePhoto: Real Image Animation with Text-guided Motion Control
ECCV 2024
Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
ECCV 2024
InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping
ECCV 2024
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
ECCV 2024
OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
ECCV 2024
LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
ECCV 2024
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
ECCV 2024
Influencer Backdoor Attack on Semantic Segmentation
ICLR 2024
CorresNeRF: Image Correspondence Priors for Neural Radiance Fields
NIPS 2023
Uni3DETR: Unified 3D Detection Transformer
NIPS 2023
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation
NIPS 2023
FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models
NIPS 2023
Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners
CVPR 2023
Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
CVPR 2023
Open-vocabulary Panoptic Segmentation with Embedding Modulation
ICCV 2023
Universal Adaptive Data Augmentation
IJCAI 2023
BT^2: Backward-compatible Training with Basis Transformation
ICCV 2023
Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning
ICCV 2023
Detecting Everything in the Open World: Towards Universal Object Detection
CVPR 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
AAAI 2023
FocalClick: Towards Practical Interactive Image Segmentation
CVPR 2022
MTFormer: Multi-task Learning via Transformer and Cross-Task Reasoning
ECCV 2022
SegPGD: An Effective and Efficient Adversarial Attack for Evaluating and Boosting Segmentation Robustness
ECCV 2022
DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation
ECCV 2022
Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
NIPS 2022
Stratified Transformer for 3D Point Cloud Segmentation
CVPR 2022
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
CVPR 2022
PhysFormer: Facial Video-Based Physiological Measurement With Temporal Difference Transformer
CVPR 2022
Generalized Few-Shot Semantic Segmentation
CVPR 2022
Point Transformer
ICCV 2021
Fully Convolutional Networks for Panoptic Segmentation
CVPR 2021
Bidirectional Projection Network for Cross Dimension Scene Understanding
CVPR 2021
Do Different Tracking Tasks Require Different Appearance Models?
NIPS 2021
Rethinking Semantic Segmentation From a Sequence-to-Sequence Perspective With Transformers
CVPR 2021
Semi-Supervised Semantic Segmentation With Directional Context-Aware Consistency
CVPR 2021
Distilling Knowledge via Knowledge Review
CVPR 2021
Dual-Cross Central Difference Network for Face Anti-Spoofing
IJCAI 2021
PAConv: Position Adaptive Convolution With Dynamic Kernel Assembling on Point Clouds
CVPR 2021
Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation
ICCV 2021
PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation
CVPR 2020
Exploring Self-Attention for Image Recognition
CVPR 2020
UPSNet: A Unified Panoptic Segmentation Network
CVPR 2019
PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing
CVPR 2019
Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation
ICCV 2019
SegStereo: Exploiting Semantic Information for Disparity Estimation
ECCV 2018
ICNet for Real-Time Semantic Segmentation on High-Resolution Images
ECCV 2018
Compositing-aware Image Search
ECCV 2018
PSANet: Point-wise Spatial Attention Network for Scene Parsing
ECCV 2018
Pyramid Scene Parsing Network
CVPR 2017