Zhe Gan
112 papers · 2015–2025 · 14 top CS/AI conferences
Achievements
🐣 Hot Topic Early Bird
🌍 Conference Polyglot (14)
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🏃 Academic Marathon (10)
🌟 Keyword Trendsetter Combo (5)
🏠 Conference Loyalist (21)
🤝 Dynamic Duo (39)
👥 Mega-Team (29)
🌱 Topic Pioneer
👑 Triple Crown
🔬 Deep Specialist (19)
🧬 Topic Evolution
🏆 Keyword Champion
🏆 Grand Slam
💎 Century Club (112)
🚀 Conference Pioneer
🔥 Unstoppable (11)
❓ The Questioner
⚡ Prolific Year (11)
📈 Trend Setter
🗃️ Keyword Collector (393)
Conferences
CVPR (21)
NIPS (17)
EMNLP (14)
ICLR (13)
ACL (8)
ICML (8)
AAAI (7)
ECCV (7)
IJCNLP (6)
AISTATS (4)
ICCV (3)
NAACL (2)
MLHC (1)
WACV (1)
Keywords
multimodal learning (14)
model compression (8)
text generation (8)
generative adversarial network (7)
image captioning (7)
transfer learning (7)
domain adaptation (6)
zero-shot learning (6)
adversarial learning (6)
variational autoencoder (6)
semi-supervised learning (6)
generative model (6)
representation learning (6)
reinforcement learning (5)
knowledge distillation (5)
contrastive learning (4)
image generation (4)
variational inference (4)
visual question answering (4)
lottery ticket hypothesis (4)
Papers
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
ICLR 2025
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
ICCV 2025
Multimodal Autoregressive Pre-training of Large Vision Encoders
CVPR 2025
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
ICLR 2025
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
ICLR 2025
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
ICLR 2025
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
CVPR 2025
Contrastive Localized Language-Image Pre-Training
ICML 2025
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
ICLR 2025
Improve Vision Language Model Chain-of-thought Reasoning
ACL 2025
Guiding Instruction-based Image Editing via Multimodal Large Language Models
ICLR 2024
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
ECCV 2024
Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation
EMNLP 2024
GRiT: A Generative Region-to-text Transformer for Object Understanding
ECCV 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
ECCV 2024
VeCLIP: Improving CLIP Training via Visual-enriched Captions
ECCV 2024
Compressing LLMs: The Truth is Rarely Pure and Never Simple
ICLR 2024
Ferret: Refer and Ground Anything Anywhere at Any Granularity
ICLR 2024
LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling
CVPR 2023
Prompting GPT-3 To Be Reliable
ICLR 2023
Non-Contrastive Learning Meets Language-Image Pre-Training
CVPR 2023
ReCo: Region-Controlled Text-to-Image Generation
CVPR 2023
An Empirical Study of Multimodal Model Merging
EMNLP 2023
Generalized Decoding for Pixel, Image, and Language
CVPR 2023
An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling
CVPR 2023
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
NIPS 2022
An Empirical Study of Training End-to-End Vision-and-Language Transformers
CVPR 2022
Injecting Semantic Concepts Into End-to-End Image Captioning
CVPR 2022
Scaling Up Vision-Language Pre-Training for Image Captioning
CVPR 2022
K-LITE: Learning Transferable Visual Models with External Knowledge
NIPS 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
NIPS 2022
Playing Lottery Tickets with Vision and Language
AAAI 2022
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
AAAI 2022
Efficient Robust Training via Backward Smoothing
AAAI 2022
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
ECCV 2022
SwinBERT: End-to-End Transformers With Sparse Attention for Video Captioning
CVPR 2022
Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
CVPR 2021
Meta Module Network for Compositional Visual Reasoning
WACV 2021
Chasing Sparsity in Vision Transformers: An End-to-End Exploration
NIPS 2021
Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective
NIPS 2021
The Elastic Lottery Ticket Hypothesis
NIPS 2021
APo-VAE: Text Generation in Hyperbolic Space
NAACL 2021
Cluster-Former: Clustering-based Sparse Transformer for Question Answering
IJCNLP 2021
FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding
AAAI 2021
EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets
IJCNLP 2021
EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets
ACL 2021
Cluster-Former: Clustering-based Sparse Transformer for Question Answering
ACL 2021
InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective
ICLR 2021
Improving Zero-Shot Voice Style Transfer via Disentangled Representation Learning
ICLR 2021
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
ICCV 2021
Wasserstein Contrastive Representation Distillation
CVPR 2021
Contrastive Distillation on Intermediate Representations for Language Model Compression
EMNLP 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
NIPS 2020
Graph-Driven Generative Models for Heterogeneous Multi-Task Learning
AAAI 2020
What Makes A Good Story? Designing Composite Rewards for Visual Storytelling
AAAI 2020
Improving Adversarial Text Generation by Modeling the Distant Future
ACL 2020
Discourse-Aware Neural Extractive Text Summarization
ACL 2020
Distilling Knowledge Learned in BERT for Text Generation
ACL 2020
Nested-Wasserstein Self-Imitation Learning for Sequence Generation
AISTATS 2020
BachGAN: High-Resolution Image Synthesis From Salient Object Layout
CVPR 2020
Violin: A Large-Scale Dataset for Video-and-Language Inference
CVPR 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
ECCV 2020
UNITER: UNiversal Image-TExt Representation Learning
ECCV 2020
Cross-Thought for Sentence Encoder Pre-training
EMNLP 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
EMNLP 2020
POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training
EMNLP 2020
Hierarchical Graph Network for Multi-hop Question Answering
EMNLP 2020
Multi-Fact Correction in Abstractive Text Summarization
EMNLP 2020
Contextual Text Style Transfer
EMNLP 2020
FreeLB: Enhanced Adversarial Training for Natural Language Understanding
ICLR 2020
Graph Optimal Transport for Cross-Domain Alignment
ICML 2020
CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information
ICML 2020
Patient Knowledge Distillation for BERT Model Compression
IJCNLP 2019
StoryGAN: A Sequential Conditional GAN for Story Visualization
CVPR 2019
Relation-Aware Graph Attention Network for Visual Question Answering
ICCV 2019
Improving Sequence-to-Sequence Learning via Optimal Transport
ICLR 2019
Topic-Guided Variational Auto-Encoder for Text Generation
NAACL 2019
Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog
ACL 2019
Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation
AAAI 2019
Improving Textual Network Learning with Variational Homophilic Embeddings
NIPS 2019
TIGEr: Text-to-Image Grounding for Image Caption Evaluation
IJCNLP 2019
Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation
CVPR 2019
TIGEr: Text-to-Image Grounding for Image Caption Evaluation
EMNLP 2019
Adversarial Domain Adaptation for Machine Reading Comprehension
EMNLP 2019
Domain Adaptive Text Style Transfer
EMNLP 2019
Patient Knowledge Distillation for BERT Model Compression
EMNLP 2019
Adversarial Domain Adaptation for Machine Reading Comprehension
IJCNLP 2019
Domain Adaptive Text Style Transfer
IJCNLP 2019
AttnGAN: Fine-Grained Text to Image Generation With Attentional Generative Adversarial Networks
CVPR 2018
Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization
NIPS 2018
Adversarial Text Generation via Feature-Mover's Distance
NIPS 2018
Topic Compositional Neural Language Model
AISTATS 2018
Multi-Label Learning from Medical Plain Text with Convolutional Residual Models
MLHC 2018
JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets
ICML 2018
Adversarial Feature Matching for Text Generation
ICML 2017
Adversarial Symmetric Variational Autoencoder
NIPS 2017
Learning Generic Sentence Representations Using Convolutional Neural Networks
EMNLP 2017
Triangle Generative Adversarial Networks
NIPS 2017
Stochastic Gradient Monomial Gamma Sampler
ICML 2017
Deconvolutional Paragraph Representation Learning
NIPS 2017
VAE Learning via Stein Variational Gradient Descent
NIPS 2017
Semantic Compositional Networks for Visual Captioning
CVPR 2017
StyleNet: Generating Attractive Visual Captions With Styles
CVPR 2017
Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling
ACL 2017
Variational Autoencoder for Deep Learning of Images, Labels and Captions
NIPS 2016
Factored Temporal Sigmoid Belief Networks for Sequence Learning
ICML 2016
Learning Weight Uncertainty With Stochastic Gradient MCMC for Shape Classification
CVPR 2016
Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization
AISTATS 2016
Deep Temporal Sigmoid Belief Networks for Sequence Modeling
NIPS 2015
Deep Poisson Factor Modeling
NIPS 2015
Learning Deep Sigmoid Belief Networks with Data Augmentation
AISTATS 2015
Scalable Deep Poisson Factor Analysis for Topic Modeling
ICML 2015