Hanwang Zhang
138 papers · 2015–2026 · 12 top CS/AI conferences
Achievements
Taxonomy Completionist (17)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Conference Polyglot (12)
Keyword Trendsetter Combo (6)
Conference Loyalist (21)
Dynamic Duo (25)
Grand Slam
Mega-Team (32)
Triple Crown
Topic Pioneer
Deep Specialist (25)
Keyword Champion (6)
Keyword Collector (484)
The Questioner (3)
Conference Pioneer
Prolific Year (18)
Unstoppable (12)
Trend Setter
Century Club (131)
Conferences
CVPR (41)
ICCV (22)
NIPS (21)
AAAI (17)
ECCV (10)
ICML (8)
ACL (6)
ICLR (5)
IJCAI (3)
IJCNLP (2)
WACV (2)
EMNLP (1)
Keywords
diffusion model (14)
causal inference (12)
vision-language model (10)
knowledge distillation (9)
image generation (9)
visual reasoning (8)
few-shot learning (8)
multimodal learning (8)
transfer learning (7)
image captioning (7)
out-of-distribution generalization (6)
zero-shot learning (6)
semantic segmentation (6)
attention mechanism (6)
weakly supervised learning (6)
visual grounding (6)
spurious correlation (6)
visual question answering (6)
scene graph (6)
representation learning (5)
Papers
Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image
AAAI 2026
Object Fusion via Diffusion Time-step for Customized Image Editing with Single Example
AAAI 2026
Learning to Animate Images from A Few Videos to Portray Delicate Human Actions
WACV 2026
DEPO: Dual-Efficiency Preference Optimization for LLM Agents
AAAI 2026
Hierarchical Semantic Alignment for Image Clustering
AAAI 2026
DragNeXt: Rethinking Drag-Based Image Editing
AAAI 2026
NeuSpring: Neural Spring Fields for Reconstruction and Simulation of Deformable Objects from Videos
AAAI 2026
Pushing Rendering Boundaries: Hard Gaussian Splatting
AAAI 2026
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
CVPR 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
CVPR 2025
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
CVPR 2025
CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction
CVPR 2025
3D Question Answering via only 2D Vision-Language Models
ICML 2025
On Path to Multimodal Generalist: General-Level and General-Bench
ICML 2025
Towards Semantic Equivalence of Tokenization in Multimodal LLM
ICLR 2025
Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models
ICCV 2025
Dynamic Multimodal Prototype Learning in Vision-Language Models
ICCV 2025
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
ICCV 2025
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation
ICCV 2025
Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning
ICCV 2025
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
ICML 2025
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
ICML 2025
SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning
AAAI 2025
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
CVPR 2025
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness
CVPR 2025
Few-shot NeRF by Adaptive Rendering Loss Regularization
ECCV 2024
Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting
NIPS 2024
MVGamba: Unify 3D Content Generation as State Space Sequence Modeling
NIPS 2024
Action Imitation in Common Action Space for Customized Action Image Synthesis
NIPS 2024
Unified Generative and Discriminative Training for Multi-modal Large Language Models
NIPS 2024
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
NIPS 2024
Decoupled Kullback-Leibler Divergence Loss
NIPS 2024
Robust Fine-tuning of Zero-shot Models via Variance Reduction
NIPS 2024
Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
NIPS 2024
Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration
NIPS 2024
Dual-Perspective Knowledge Enrichment for Semi-supervised 3D Object Detection
AAAI 2024
MGNet: Learning Correspondences via Multiple Graphs
AAAI 2024
Doubly Abductive Counterfactual Inference for Text-based Image Editing
CVPR 2024
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
CVPR 2024
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
CVPR 2024
DisCo: Disentangled Control for Realistic Human Dance Generation
CVPR 2024
Discriminative Probing and Tuning for Text-to-Image Generation
CVPR 2024
Few-shot Learner Parameterization by Diffusion Time-steps
CVPR 2024
Distributionally Generative Augmentation for Fair Facial Attribute Classification
CVPR 2024
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
CVPR 2024
Diffusion Time-step Curriculum for One Image to 3D Generation
CVPR 2024
View-Consistent 3D Editing with Gaussian Splatting
ECCV 2024
Rethinking and Improving Visual Prompt Selection for In-Context Learning Segmentation Framework
ECCV 2024
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
ECCV 2024
Exploring Diffusion Time-steps for Unsupervised Representation Learning
ICLR 2024
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
ICLR 2024
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
ICML 2024
Non-confusing Generation of Customized Concepts in Diffusion Models
ICML 2024
Auto-Encoding Morph-Tokens for Multimodal LLM
ICML 2024
Hypothetical Training for Robust Machine Reading Comprehension of Tabular Context
ACL 2023
Counterfactual Active Learning for Out-of-Distribution Generalization
ACL 2023
Learning Trajectory-Word Alignments for Video-Language Tasks
ICCV 2023
Invariant Feature Regularization for Fair Face Recognition
ICCV 2023
Semantic Scene Completion With Cleaner Self
CVPR 2023
Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
CVPR 2023
Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground
ICCV 2023
Debiased Fine-Tuning for Vision-Language Models by Prompt Regularization
AAAI 2023
Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition
ICCV 2023
Equivariant Similarity for Vision-Language Foundation Models
ICCV 2023
Tuning Multi-mode Token-level Prompt Alignment across Modalities
NIPS 2023
Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation
NIPS 2023
Prompt-aligned Gradient for Prompt Tuning
ICCV 2023
Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection
ICLR 2023
Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion
NIPS 2023
Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models
NIPS 2023
Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery
CVPR 2023
Random Boxes Are Open-world Object Detectors
ICCV 2023
Identifying Hard Noise in Long-Tailed Sample Distribution
ECCV 2022
Respecting Transfer Gap in Knowledge Distillation
NIPS 2022
Certified Robustness Against Natural Language Attacks by Causal Intervention
ICML 2022
On Non-Random Missing Labels in Semi-Supervised Learning
ICLR 2022
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
CVPR 2022
Deconfounded Visual Grounding
AAAI 2022
KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base
ACL 2022
Learning to Imagine: Integrating Counterfactual Thinking in Neural Discrete Reasoning
ACL 2022
Cross-Domain Empirical Risk Minimization for Unbiased Long-Tailed Classification
AAAI 2022
Equivariance and Invariance Inductive Bias for Learning from Insufficient Data
ECCV 2022
Invariant Feature Learning for Generalized Long-Tailed Classification
ECCV 2022
Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-of-Distribution Generalization
ECCV 2022
Self-Regulation for Semantic Segmentation
ICCV 2021
Distilling Causal Effect of Data in Class-Incremental Learning
CVPR 2021
Counterfactual VQA: A Cause-Effect Look at Language Bias
CVPR 2021
The Blessings of Unlabeled Background in Untrimmed Videos
CVPR 2021
Causal Attention for Vision-Language Tasks
CVPR 2021
Are Missing Links Predictable? An Inferential Benchmark for Knowledge Graph Completion
IJCNLP 2021
Empowering Language Understanding with Counterfactual Reasoning
IJCNLP 2021
Empowering Language Understanding with Counterfactual Reasoning
ACL 2021
Are Missing Links Predictable? An Inferential Benchmark for Knowledge Graph Completion
ACL 2021
Self-Supervised Learning Disentangled Group Representation as Feature
NIPS 2021
TransferNet: An Effective and Transparent Framework for Multi-hop Question Answering over Relation Graph
EMNLP 2021
Transporting Causal Mechanisms for Unsupervised Domain Adaptation
ICCV 2021
Counterfactual Zero-Shot and Open-Set Visual Recognition
CVPR 2021
Causal Attention for Unbiased Visual Recognition
ICCV 2021
Auto-Parsing Network for Image Captioning and Visual Question Answering
ICCV 2021
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
AAAI 2021
Introspective Distillation for Robust Question Answering
NIPS 2021
How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?
NIPS 2021
Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect
NIPS 2020
Interventional Few-Shot Learning
NIPS 2020
Counterfactual Samples Synthesizing for Robust Visual Question Answering
CVPR 2020
Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration
CVPR 2020
Feature Pyramid Transformer
ECCV 2020
More Grounded Image Captioning by Distilling Image-Text Matching Model
CVPR 2020
Stochastic Dynamics for Video Infilling
WACV 2020
Learning to Segment the Tail
CVPR 2020
General Partial Label Learning via Dual Bipartite Graph Autoencoder
AAAI 2020
Unbiased Scene Graph Generation From Biased Training
CVPR 2020
Causal Intervention for Weakly-Supervised Semantic Segmentation
NIPS 2020
Two Causal Principles for Improving Visual Dialog
CVPR 2020
Iterative Context-Aware Graph Inference for Visual Dialog
CVPR 2020
Visual Commonsense R-CNN
CVPR 2020
Learning to Collocate Neural Modules for Image Captioning
ICCV 2019
DeepChannel: Salience Estimation by Contrastive Learning for Extractive Document Summarization
AAAI 2019
Learning to Embed Sentences Using Attentive Recursive Trees
AAAI 2019
Learning to Compose Dynamic Tree Structures for Visual Contexts
CVPR 2019
Recursive Visual Attention in Visual Dialog
CVPR 2019
Explainable and Explicit Visual Reasoning Over Scene Graphs
CVPR 2019
Auto-Encoding Scene Graphs for Image Captioning
CVPR 2019
Learning to Assemble Neural Module Tree Networks for Visual Grounding
ICCV 2019
Counterfactual Critic Multi-Agent Training for Scene Graph Generation
ICCV 2019
Making History Matter: History-Advantage Sequence Training for Visual Dialog
ICCV 2019
Grounding Referring Expressions in Images by Variational Context
CVPR 2018
Low-shot Learning via Covariance-Preserving Adversarial Augmentation Networks
NIPS 2018
Multi-Level Policy and Reward Reinforcement Learning for Image Captioning
IJCAI 2018
Discrete Factorization Machines for Fast Feature-based Recommendation
IJCAI 2018
Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks
CVPR 2018
Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features
ECCV 2018
SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning
CVPR 2017
PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN
ICCV 2017
Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks
IJCAI 2017
Visual Translation Embedding Network for Visual Relation Detection
CVPR 2017
Online Collaborative Learning for Open-Vocabulary Visual Classifiers
CVPR 2016
Learning Image and User Features for Recommendation in Social Networks
ICCV 2015