Zhaoxiang Zhang

123 papers · 2016–2026 · 12 conferences · across top CS/AI conferences

Achievements


🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (19) 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird

🏠 Conference Loyalist (41) 🤝 Dynamic Duo (24) 🏆 Grand Slam 👥 Mega-Team (26) 🔬 Deep Specialist (18) 🧬 Topic Evolution 🏆 Keyword Champion (4) 📈 Trend Setter ❓ The Questioner 💎 Century Club (121) 🗃️ Keyword Collector (520) 🚀 Conference Pioneer ⚡ Prolific Year (8) 🔥 Unstoppable (10)

Conferences

CVPR (41) ICCV (22) ECCV (16) AAAI (12) ACL (8) ICLR (8) IJCAI (7) NIPS (5) EMNLP (1) ICML (1) JMLR (1) WACV (1)

Top co-authors

Yuntao Chen (24) Lue Fan (16) Zhen Lei (14) Junran Peng (13) Junsong Fan (12) Tieniu Tan (11) Naiyan Wang (10) Yuxi Wang (9) Jiaheng Liu (9) Yuqi Wang (9)

Research topics

Reinforcement Learning (1)

Keywords

semantic segmentation (15) autonomous driving (14) object detection (14) domain adaptation (9) instance segmentation (8) convolutional neural network (8) 3d object detection (7) unsupervised learning (6) weakly supervised learning (6) vision-language model (5) neural network (5) person re-identification (5) bird's eye view (4) pseudo label (4) world model (4) large language model (4) point cloud (4) transfer learning (4) depth estimation (4) feature learning (4)

Papers

AdaField: Generalizable Surface Pressure Modeling with Physics-Informed Pre-training and Flow-Conditioned Adaptation AAAI 2026
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization ACL 2026
DexVLG: Dexterous Vision-Language-Grasp Model at Scale ICCV 2025
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ICCV 2025
LayerAnimate: Layer-level Control for Animation ICCV 2025
MIO: A Foundation Model on Multimodal Tokens EMNLP 2025
FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering CVPR 2025
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes CVPR 2025
Better to Teach than to Give: Domain Generalized Semantic Segmentation via Agent Queries with Diffusion Model Guidance ICML 2025
Reconstructive Visual Instruction Tuning ICLR 2025
Enhancing End-to-End Autonomous Driving with Latent World Model ICLR 2025
FreeVS: Generative View Synthesis on Free Driving Trajectory ICLR 2025
FIRM: Flexible Interactive Reflection ReMoval AAAI 2025
SceneX: Procedural Controllable Large-Scale Scene Generation AAAI 2025
Top-Down Guidance for Learning Object-Centric Representations IJCAI 2025
AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs ACL 2025
Activation Steering Decoding: Mitigating Hallucination in Large Vision-Language Models through Bidirectional Hidden State Intervention ACL 2025
M2RC-EVAL: Massively Multilingual Repository-level Code Completion Evaluation ACL 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? ACL 2025
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models ACL 2025
C2KD: Cross-layer and Cross-head Knowledge Distillation for Small Language Model-based Recommendation ACL 2025
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes ICLR 2025
McEval: Massively Multilingual Code Evaluation ICLR 2025
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ICLR 2025
Images as Noisy Labels: Unleashing the Potential of the Diffusion Model for Open-Vocabulary Semantic Segmentation ICCV 2025
MCOP: Multi-UAV Collaborative Occupancy Prediction ICCV 2025
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers ICCV 2025
End-to-End Driving with Online Trajectory Evaluation via BEV World Model ICCV 2025
UIPro: Unleashing Superior Interaction Capability For GUI Agents ICCV 2025
CSOT: Cross-Scan Object Transfer for Semi-Supervised LiDAR Object Detection ECCV 2024
RoleAgent: Building, Interacting, and Benchmarking High-quality Role-Playing Agents from Scripts NIPS 2024
OpenSatMap: A Fine-grained High-resolution Satellite Dataset for Large-scale Map Construction NIPS 2024
VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization NIPS 2024
Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection NIPS 2024
Fully Data-Driven Pseudo Label Estimation for Pointly-Supervised Panoptic Segmentation AAAI 2024
Compositional Inversion for Stable Diffusion Models AAAI 2024
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models ACL 2024
Enhancing Visual Continual Learning with Language-Guided Supervision CVPR 2024
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving CVPR 2024
RCL: Reliable Continual Learning for Unified Failure Detection CVPR 2024
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation CVPR 2024
Continual Forgetting for Pre-trained Vision Models CVPR 2024
MemoNav: Working Memory Model for Visual Navigation CVPR 2024
Robust Depth Enhancement via Polarization Prompt Fusion Tuning CVPR 2024
HardMo: A Large-Scale Hardcase Dataset for Motion Capture CVPR 2024
OneTrack: Demystifying the Conflict Between Detection and Tracking in End-to-End 3D Trackers ECCV 2024
Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation ECCV 2024
Point-supervised Panoptic Segmentation via Estimating Pseudo Labels from Learnable Distance ECCV 2024
CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians ECCV 2024
DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model NIPS 2024
Monocular Occupancy Prediction for Scalable Indoor Scenes ECCV 2024
General Geometry-aware Weakly Supervised 3D Object Detection ECCV 2024
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention ECCV 2024
MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D Object Detection ICLR 2024
3D Video Object Detection With Learnable Object-Centric Global Optimization CVPR 2023
LMR: A Large-Scale Multi-Reference Dataset for Reference-Based Super-Resolution ICCV 2023
FPR: False Positive Rectification for Weakly Supervised Semantic Segmentation ICCV 2023
DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization ICCV 2023
Sharpness-Aware Gradient Matching for Domain Generalization CVPR 2023
Once Detected, Never Lost: Surpassing Human Performance in Offline LiDAR based 3D Object Detection ICCV 2023
Bi-Directional Frame Interpolation for Unsupervised Video Anomaly Detection WACV 2023
Hard Patches Mining for Masked Image Modeling CVPR 2023
Robust Feature Rectification of Pretrained Vision Models for Object Recognition AAAI 2023
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision CVPR 2023
FrustumFormer: Adaptive Instance-Aware Resampling for Multi-View 3D Detection CVPR 2023
Intrinsic Physical Concepts Discovery With Object-Centric Predictive Models CVPR 2023
Graphics Capsule: Learning Hierarchical 3D Face Representations From 2D Images CVPR 2023
SSF: Accelerating Training of Spiking Neural Networks with Stabilized Spiking Flow ICCV 2023
Blind Video Deflickering by Neural Filtering With a Flawed Atlas CVPR 2023
BAEFormer: Bi-Directional and Early Interaction Transformers for Bird's Eye View Semantic Segmentation CVPR 2023
Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation ICCV 2023
Stereo Depth Estimation with Echoes ECCV 2022
Deconfounding Physical Dynamics with Global Causal Relation and Confounder Transmission for Counterfactual Prediction AAAI 2022
DATA: Domain-Aware and Task-Aware Self-Supervised Learning CVPR 2022
Densely Constrained Depth Estimator for Monocular 3D Object Detection ECCV 2022
RRSR: Reciprocal Reference-Based Image Super-Resolution with Progressive Feature Alignment and Selection ECCV 2022
Pointly-Supervised Panoptic Segmentation ECCV 2022
Self-Guided Hard Negative Generation for Unsupervised Person Re-Identification IJCAI 2022
Object Dynamics Distillation for Scene Decomposition and Representation ICLR 2022
Sparse Instance Activation for Real-Time Instance Segmentation CVPR 2022
Embracing Single Stride 3D Object Detector With Sparse Transformer CVPR 2022
HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network CVPR 2022
Implicit Sample Extension for Unsupervised Person Re-Identification CVPR 2022
Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer CVPR 2022
Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture CVPR 2022
The Devil Is in the Details: Window-Based Attention for Image Compression CVPR 2022
Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation CVPR 2022
Uncertainty-Aware Pseudo Label Refinery for Domain Adaptive Semantic Segmentation ICCV 2021
RangeDet: In Defense of Range View for LiDAR-Based 3D Object Detection ICCV 2021
GAIA: A Transfer Learning System of Object Detection That Fits Your Needs CVPR 2021
RefineMask: Towards High-Quality Instance Segmentation With Fine-Grained Features CVPR 2021
Look Closer To Segment Better: Boundary Patch Refinement for Instance Segmentation CVPR 2021
Unsupervised Object Detection With LIDAR Clues CVPR 2021
Bottom-Up Human Pose Estimation via Disentangled Keypoint Regression CVPR 2021
Learnable Graph Matching: Incorporating Graph Partitioning With Deep Feature Learning for Multiple Object Tracking CVPR 2021
Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation AAAI 2021
Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT Philosophy CVPR 2021
Clothing Status Awareness for Long-Term Person Re-Identification ICCV 2021
Instance Guided Proposal Network for Person Search CVPR 2020
Bi-Directional Interaction Network for Person Search CVPR 2020
CIAN: Cross-Image Affinity Net for Weakly Supervised Semantic Segmentation AAAI 2020
Cascading Convolutional Color Constancy AAAI 2020
Context-Aware Attention Network for Image-Text Retrieval CVPR 2020
Boosting Decision-based Black-box Adversarial Attacks with Random Sign Flip ECCV 2020
Large-Scale Object Detection in the Wild From Imbalanced Multi-Labels CVPR 2020
Learning Integral Objects With Intra-Class Discriminator for Weakly-Supervised Semantic Segmentation CVPR 2020
Generalizing Person Re-Identification by Camera-Aware Invariance Learning and Cross-Domain Mixup ECCV 2020
Employing Multi-Estimations for Weakly-Supervised Semantic Segmentation ECCV 2020
Spectral Feature Transformation for Person Re-Identification ICCV 2019
Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization ICCV 2019
Human-Like Delicate Region Erasing Strategy for Weakly Supervised Detection AAAI 2019
Scale-Aware Trident Networks for Object Detection ICCV 2019
Sequence Level Semantics Aggregation for Video Object Detection ICCV 2019
SimpleDet: A Simple and Versatile Distributed Framework for Object Detection and Instance Recognition JMLR 2019
Attention-Aware Sampling via Deep Reinforcement Learning for Action Recognition AAAI 2019
POD: Practical Object Detection With Scale-Sensitive Network ICCV 2019
Hard-Aware Point-to-Set Deep Metric for Person Re-identification ECCV 2018
Multi-task Layout Analysis for Historical Handwritten Documents Using Fully Convolutional Networks IJCAI 2018
Deep Convolutional Neural Networks with Merge-and-Run Mappings IJCAI 2018
Dynamic Multi-Task Learning with Convolutional Neural Network IJCAI 2017
Diverse Neuron Type Selection for Convolutional Neural Networks IJCAI 2017
Random Shifting for CNN: a Solution to Reduce Information Loss in Down-Sampling Layers IJCAI 2017
GIFT: A Real-Time and Scalable 3D Shape Search Engine CVPR 2016