Shanghang Zhang

96 papers · 2017–2026 · 14 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (13) 🌍 Conference Polyglot (14)

🐝 Cross-Pollinator (6) 🏠 Conference Loyalist (26) 🤝 Dynamic Duo (23) 👑 Triple Crown 🏆 Grand Slam 👥 Mega-Team (37) 🔬 Deep Specialist (15) 🧬 Topic Evolution 🏆 Keyword Champion (2) 🔥 Unstoppable (9) 🗃️ Keyword Collector (357) 💎 Century Club (92) ⚡ Prolific Year (27)

Conferences

CVPR (26) AAAI (12) NIPS (11) ICCV (10) ICML (10) ICLR (7) ECCV (6) ACL (3) IJCAI (3) EMNLP (2) RSS (2) WACV (2) AISTATS (1) CORL (1)

Top co-authors

Jiaming Liu (23) Renrui Zhang (15) Kurt Keutzer (15) Ming Lu (11) Yuan Zhang (8) Yandong Guo (8) Qizhe Zhang (8) Zhen Dong (7) Xiaoqi Li (7) Jianxin Li (6)

Keywords

domain adaptation (8) 3d object detection (6) knowledge distillation (6) diffusion model (6) self-supervised learning (5) model compression (5) vision-language model (5) point cloud (5) distribution shift (5) semantic segmentation (4) domain generalization (4) robotic manipulation (4) multimodal learning (4) autonomous driving (4) image generation (4) object detection (4) representation learning (3) continual learning (3) post-training quantization (3) adversarial learning (3)

Papers

FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning · AAAI 2026
NavA3: Understanding Any Instruction, Navigating Anywhere, Finding Anything · ACL 2026
MMG-Vid: Maximizing Marginal Gains at Segment-level and Token-level for Efficient Video LLMs · AAAI 2026
MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation · AAAI 2026
FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers · IJCAI 2025
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs · ICCV 2025
3DS-VLA: A 3D Spatial-Aware Vision Language Action Model for Robust Multi-Task Manipulation · CORL 2025
Authentic 4D Driving Simulation with a Video Generation Model · ICCV 2025
EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting · ICCV 2025
DesignEdit: Unify Spatial-Aware Image Editing via Training-free Inpainting with a Multi-Layered Latent Diffusion Framework · AAAI 2025
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding · AAAI 2025
Subgraph Aggregation for Out-of-Distribution Generalization on Graphs · AAAI 2025
MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation · ACL 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete · CVPR 2025
Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression · CVPR 2025
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders · CVPR 2025
Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation · CVPR 2025
Segment Any Motion in Videos · CVPR 2025
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation · CVPR 2025
LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information · ACL 2025
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want · ICLR 2025
Co³Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion · ICLR 2025
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine · ICLR 2025
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference · ICML 2025
PINNsAgent: Automated PDE Surrogation with Large Language Models · ICML 2025
SAN: Hypothesizing Long-Term Synaptic Development and Neural Engram Mechanism in Scalable Model’s Parameter-Efficient Fine-Tuning · ICML 2025
Empowering World Models with Reflection for Embodied Video Prediction · ICML 2025
OmniArch: Building Foundation Model for Scientific Computing · ICML 2025
CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World · RSS 2025
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation · RSS 2025
4D Visual Pre-training for Robot Learning · ICCV 2025
I-MedSAM: Implicit Medical Image Segmentation with Segment Anything · ECCV 2024
Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training · EMNLP 2024
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models · EMNLP 2024
ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation · ICLR 2024
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate · ICLR 2024
Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention · NIPS 2024
Unveiling the Tapestry of Consistency in Large Vision-Language Models · NIPS 2024
Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection · AAAI 2024
Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction · AAAI 2024
FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection · AAAI 2024
Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation · AAAI 2024
Compositional Few-Shot Class-Incremental Learning · ICML 2024
TCP: Triplet Contrastive-Relationship Preserving for Class-Incremental Learning · WACV 2024
RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation · NIPS 2024
VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model · ICML 2024
Gradient-based Parameter Selection for Efficient Fine-Tuning · CVPR 2024
PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought · CVPR 2024
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation · CVPR 2024
FreeKD: Knowledge Distillation via Semantic Frequency Prompt · CVPR 2024
NTO3D: Neural Target Object 3D Reconstruction with Segment Anything · CVPR 2024
Cloud-Device Collaborative Learning for Multimodal Large Language Models · CVPR 2024
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation · CVPR 2024
Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting · ICML 2024
LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model · ECCV 2024
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection · CVPR 2023
PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection · NIPS 2023
Annealing-Based Label-Transfer Learning for Open World Object Detection · CVPR 2023
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers · CVPR 2023
MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID · CVPR 2023
Improving Generalization of Meta-Learning With Inverted Regularization at Inner-Level · CVPR 2023
Open-Vocabulary Point-Cloud Object Detection Without 3D Annotation · CVPR 2023
BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks · CVPR 2023
Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-World · CVPR 2023
Q-Diffusion: Quantizing Diffusion Models · ICCV 2023
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning · ICCV 2023
QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection · ICCV 2023
Wasserstein Barycenter Matching for Graph Size Generalization of Message Passing Neural Networks · ICML 2023
Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts · CVPR 2022
Temporal Efficient Training of Spiking Neural Network via Gradient Re-weighting · ICLR 2022
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models · NIPS 2022
Online Continual Adaptation with Active Self-Training · AISTATS 2022
Efficient Meta-Tuning for Content-Aware Neural Video Delivery · ECCV 2022
MTTrans: Cross-Domain Object Detection with Mean Teacher Transformer · ECCV 2022
Jump Self-attention: Capturing High-order Statistics in Transformers · NIPS 2022
DNA: Domain Generalization with Diversified Neural Averaging · ICML 2022
Self-Supervised Pretraining Improves Self-Supervised Pretraining · WACV 2022
Margin-Based Few-Shot Class-Incremental Learning with Class-Level Overfitting Mitigation · NIPS 2022
Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data · IJCAI 2022
Decoupling Global and Local Representations via Invertible Generative Flows · ICLR 2021
Unsupervised Domain Adaptive 3D Detection With Multi-Level Consistency · ICCV 2021
Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation · CVPR 2021
Prototypical Cross-Domain Self-Supervised Learning for Few-Shot Unsupervised Domain Adaptation · CVPR 2021
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting · AAAI 2021
Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks · NIPS 2021
Contrastive Multimodal Fusion With TupleInfoNCE · ICCV 2021
Generalized Zero-Shot Text Classification for ICD Coding · IJCAI 2020
Multi-Source Distilling Domain Adaptation · AAAI 2020
TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning · ECCV 2020
Instance Adaptive Self-Training for Unsupervised Domain Adaptation · ECCV 2020
MaCow: Masked Convolutional Generative Flow · NIPS 2019
Dual Adversarial Semantics-Consistent Network for Generalized Zero-Shot Learning · NIPS 2019
Adversarial Multiple Source Domain Adaptation · NIPS 2018
Learning to Understand Image Blur · CVPR 2018
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras · ICCV 2017
Understanding Traffic Density From Large-Scale Web Camera Data · CVPR 2017