Zheng Zhu

49 papers · 2018–2025 · 8 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🏃 Academic Marathon (7) 🌍 Conference Polyglot (8) 🌈 Renaissance Researcher (7) 🗺️ Taxonomy Completionist (73)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (21) 🔬 Deep Specialist (11) 🤝 Dynamic Duo (22) 🏆 Keyword Champion (2) 💎 Century Club (49) 🔥 Unstoppable (8) 🗃️ Keyword Collector (193) ❓ The Questioner ⚡ Prolific Year (5) 🚀 Conference Pioneer

Conferences

CVPR (21) ICCV (10) AAAI (7) ECCV (6) NIPS (2) CORL (1) ICLR (1) IJCAI (1)

Top co-authors

Guan Huang (22) Jiwen Lu (18) Jie Zhou (16) Xiaofeng Wang (11) Xingang Wang (11) Dalong Du (7) Wenzhao Zheng (6) Yun Ye (6) Guosheng Zhao (6) Xinze Chen (5)

Keywords

autonomous driving (9) semantic segmentation (5) vision transformer (5) 3d vision (4) neural network (4) video generation (4) convolutional neural network (4) world model (4) self-supervised learning (4) diffusion model (4) 3d object detection (4) attention mechanism (4) object tracking (3) knowledge distillation (3) contrastive learning (3) object detection (2) scene reconstruction (2) metric learning (2) human pose estimation (2) scene representation (2)

Papers

JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems · CVPR 2025
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation · CVPR 2025
ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation · ICCV 2025
WonderTurbo: Generating Interactive 3D World in 0.72 Seconds · ICCV 2025
DetRF: Detachable Novel Views Synthesis of Dynamic Scenes Using Backdrop-Driven Neural Radiance Fields · AAAI 2025
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation · AAAI 2025
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation · CVPR 2025
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration · CVPR 2025
DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation · ECCV 2024
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving · CVPR 2024
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models · ECCV 2024
Unified Single-Stage Transformer Network for Efficient RGB-T Tracking · IJCAI 2024
DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving · ECCV 2024
One at a Time: Progressive Multi-Step Volumetric Probability Learning for Reliable 3D Scene Perception · AAAI 2024
DiffBEV: Conditional Diffusion Model for Bird’s Eye View Perception · AAAI 2024
CompletionFormer: Depth Completion With Convolutions and Vision Transformers · CVPR 2023
Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning · AAAI 2023
A Simple Baseline for Multi-Camera 3D Object Detection · AAAI 2023
Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark · CVPR 2023
DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation · CVPR 2023
OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions · ICCV 2023
OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction · ICCV 2023
Token-Label Alignment for Vision Transformers · ICCV 2023
DREAM: Efficient Dataset Distillation by Representative Matching · ICCV 2023
DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition · ICCV 2023
OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception · ICCV 2023
SurroundOcc: Multi-camera 3D Occupancy Prediction for Autonomous Driving · ICCV 2023
Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors · ICLR 2023
Decoupled Multi-Task Learning With Cyclical Self-Regulation for Face Parsing · CVPR 2022
Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis · ECCV 2022
MVSTER: Epipolar Transformer for Efficient Multi-View Stereo · ECCV 2022
OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression · NIPS 2022
Dimension Embeddings for Monocular 3D Object Detection · CVPR 2022
Crafting Better Contrastive Views for Siamese Representation Learning · CVPR 2022
DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting · CVPR 2022
An Efficient Training Approach for Very Large Scale Face Recognition · CVPR 2022
Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search · CVPR 2022
SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation · CORL 2022
CAFE: Learning To Condense Dataset by Aligning Features · CVPR 2022
Structure-Aware Face Clustering on a Large-Scale Graph With 10^7 Nodes · CVPR 2021
Global Filter Networks for Image Classification · NIPS 2021
WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition · CVPR 2021
Gait Recognition in the Wild: A Benchmark · ICCV 2021
SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation · AAAI 2021
The Devil Is in the Details: Delving Into Unbiased Data Processing for Human Pose Estimation · CVPR 2020
Attention-Guided Unified Network for Panoptic Segmentation · CVPR 2019
Distractor-aware Siamese Networks for Visual Object Tracking · ECCV 2018
High Performance Visual Tracking With Siamese Region Proposal Network · CVPR 2018
End-to-End Flow Correlation Tracking With Spatial-Temporal Attention · CVPR 2018