Shanghang Zhang
96 papers · 2017–2026 · 14 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer
Hot Topic Early Bird
Interdisciplinary Bridge
Taxonomy Completionist (13)
Conference Polyglot (14)
Cross-Pollinator (6)
Conference Loyalist (26)
Dynamic Duo (23)
Triple Crown
Grand Slam
Mega-Team (37)
Deep Specialist (15)
Topic Evolution
Keyword Champion (2)
Unstoppable (9)
Keyword Collector (357)
Century Club (92)
Prolific Year (27)
Conferences
CVPR (26)
AAAI (12)
NIPS (11)
ICCV (10)
ICML (10)
ICLR (7)
ECCV (6)
ACL (3)
IJCAI (3)
EMNLP (2)
RSS (2)
WACV (2)
AISTATS (1)
CORL (1)
Keywords
domain adaptation (8)
3d object detection (6)
knowledge distillation (6)
diffusion model (6)
self-supervised learning (5)
model compression (5)
vision-language model (5)
point cloud (5)
distribution shift (5)
semantic segmentation (4)
domain generalization (4)
robotic manipulation (4)
multimodal learning (4)
autonomous driving (4)
image generation (4)
object detection (4)
representation learning (3)
continual learning (3)
post-training quantization (3)
adversarial learning (3)
Papers
FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning
AAAI 2026
NavA3: Understanding Any Instruction, Navigating Anywhere, Finding Anything
ACL 2026
MMG-Vid: Maximizing Marginal Gains at Segment-level and Token-level for Efficient Video LLMs
AAAI 2026
MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation
AAAI 2026
FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers
IJCAI 2025
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
ICCV 2025
3DS-VLA: A 3D Spatial-Aware Vision Language Action Model for Robust Multi-Task Manipulation
CORL 2025
Authentic 4D Driving Simulation with a Video Generation Model
ICCV 2025
EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting
ICCV 2025
DesignEdit: Unify Spatial-Aware Image Editing via Training-free Inpainting with a Multi-Layered Latent Diffusion Framework
AAAI 2025
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
AAAI 2025
Subgraph Aggregation for Out-of-Distribution Generalization on Graphs
AAAI 2025
MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation
ACL 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
CVPR 2025
Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression
CVPR 2025
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
CVPR 2025
Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation
CVPR 2025
Segment Any Motion in Videos
CVPR 2025
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
CVPR 2025
LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information
ACL 2025
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
ICLR 2025
Co³Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
ICLR 2025
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
ICLR 2025
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
ICML 2025
PINNsAgent: Automated PDE Surrogation with Large Language Models
ICML 2025
SAN: Hypothesizing Long-Term Synaptic Development and Neural Engram Mechanism in Scalable Model's Parameter-Efficient Fine-Tuning
ICML 2025
Empowering World Models with Reflection for Embodied Video Prediction
ICML 2025
OmniArch: Building Foundation Model for Scientific Computing
ICML 2025
CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World
RSS 2025
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation
RSS 2025
4D Visual Pre-training for Robot Learning
ICCV 2025
I-MedSAM: Implicit Medical Image Segmentation with Segment Anything
ECCV 2024
Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training
EMNLP 2024
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
EMNLP 2024
ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation
ICLR 2024
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
ICLR 2024
Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention
NIPS 2024
Unveiling the Tapestry of Consistency in Large Vision-Language Models
NIPS 2024
Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection
AAAI 2024
Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction
AAAI 2024
FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection
AAAI 2024
Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation
AAAI 2024
Compositional Few-Shot Class-Incremental Learning
ICML 2024
TCP: Triplet Contrastive-Relationship Preserving for Class-Incremental Learning
WACV 2024
RoboMamba: Efficient Vision-Language-Action Model for Robotic Reasoning and Manipulation
NIPS 2024
VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model
ICML 2024
Gradient-based Parameter Selection for Efficient Fine-Tuning
CVPR 2024
PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought
CVPR 2024
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
CVPR 2024
FreeKD: Knowledge Distillation via Semantic Frequency Prompt
CVPR 2024
NTO3D: Neural Target Object 3D Reconstruction with Segment Anything
CVPR 2024
Cloud-Device Collaborative Learning for Multimodal Large Language Models
CVPR 2024
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
CVPR 2024
Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting
ICML 2024
LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model
ECCV 2024
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
CVPR 2023
PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection
NIPS 2023
Annealing-Based Label-Transfer Learning for Open World Object Detection
CVPR 2023
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
CVPR 2023
MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID
CVPR 2023
Improving Generalization of Meta-Learning With Inverted Regularization at Inner-Level
CVPR 2023
Open-Vocabulary Point-Cloud Object Detection Without 3D Annotation
CVPR 2023
BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks
CVPR 2023
Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-World
CVPR 2023
Q-Diffusion: Quantizing Diffusion Models
ICCV 2023
PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning
ICCV 2023
QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection
ICCV 2023
Wasserstein Barycenter Matching for Graph Size Generalization of Message Passing Neural Networks
ICML 2023
Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts
CVPR 2022
Temporal Efficient Training of Spiking Neural Network via Gradient Re-weighting
ICLR 2022
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
NIPS 2022
Online Continual Adaptation with Active Self-Training
AISTATS 2022
Efficient Meta-Tuning for Content-Aware Neural Video Delivery
ECCV 2022
MTTrans: Cross-Domain Object Detection with Mean Teacher Transformer
ECCV 2022
Jump Self-attention: Capturing High-order Statistics in Transformers
NIPS 2022
DNA: Domain Generalization with Diversified Neural Averaging
ICML 2022
Self-Supervised Pretraining Improves Self-Supervised Pretraining
WACV 2022
Margin-Based Few-Shot Class-Incremental Learning with Class-Level Overfitting Mitigation
NIPS 2022
Domain-Adaptive Text Classification with Structured Knowledge from Unlabeled Data
IJCAI 2022
Decoupling Global and Local Representations via Invertible Generative Flows
ICLR 2021
Unsupervised Domain Adaptive 3D Detection With Multi-Level Consistency
ICCV 2021
Learning Invariant Representations and Risks for Semi-Supervised Domain Adaptation
CVPR 2021
Prototypical Cross-Domain Self-Supervised Learning for Few-Shot Unsupervised Domain Adaptation
CVPR 2021
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
AAAI 2021
Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks
NIPS 2021
Contrastive Multimodal Fusion With TupleInfoNCE
ICCV 2021
Generalized Zero-Shot Text Classification for ICD Coding
IJCAI 2020
Multi-Source Distilling Domain Adaptation
AAAI 2020
TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning
ECCV 2020
Instance Adaptive Self-Training for Unsupervised Domain Adaptation
ECCV 2020
MaCow: Masked Convolutional Generative Flow
NIPS 2019
Dual Adversarial Semantics-Consistent Network for Generalized Zero-Shot Learning
NIPS 2019
Adversarial Multiple Source Domain Adaptation
NIPS 2018
Learning to Understand Image Blur
CVPR 2018
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
ICCV 2017
Understanding Traffic Density From Large-Scale Web Camera Data
CVPR 2017