Chunyuan Li

85 papers · 2014–2025 · 13 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (21) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird

🏃 Academic Marathon (11) 🌟 Keyword Trendsetter Combo (5) 🏆 Grand Slam 👑 Triple Crown 🤝 Dynamic Duo (33) 👥 Mega-Team (71) 🌱 Topic Pioneer 🔬 Deep Specialist (18) 🧬 Topic Evolution 🏆 Keyword Champion (2) 🗃️ Keyword Collector (314) 💎 Century Club (85) 📈 Trend Setter 🔥 Unstoppable (12) ⚡ Prolific Year (9) 🚀 Conference Pioneer

Conferences

NIPS (16) CVPR (14) EMNLP (11) ICLR (8) ACL (6) ECCV (6) AAAI (5) AISTATS (5) ICML (5) IJCNLP (3) NAACL (3) ICCV (2) IJCAI (1)

Top co-authors

Jianfeng Gao (33) Lawrence Carin (24) Changyou Chen (19) Jianwei Yang (19) Zhe Gan (12) Ricardo Henao (11) Pengchuan Zhang (9) Yizhe Zhang (9) Yunchen Pu (8) Lei Zhang (8)

Keywords

zero-shot learning (10) few-shot learning (9) variational autoencoder (8) transfer learning (8) multimodal learning (8) vision-language model (7) object detection (7) generative adversarial network (5) pre-trained language model (5) semantic segmentation (5) adversarial learning (4) text generation (4) instruction following (4) vision transformer (4) semi-supervised learning (4) image classification (4) language modeling (4) image generation (4) convolutional neural network (4) large multimodal model (4)

Papers

LLaVA-Critic: Learning to Evaluate Multimodal Models CVPR 2025
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward NAACL 2025
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines ICLR 2025
LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models ICLR 2025
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning ICLR 2025
Graphic Design with Large Multimodal Model AAAI 2025
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding ICLR 2025
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models NAACL 2025
Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment NIPS 2024
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents ECCV 2024
Segment and Recognize Anything at Any Granularity ECCV 2024
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts ICLR 2024
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection ECCV 2024
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models ECCV 2024
Improved Baselines with Visual Instruction Tuning CVPR 2024
Visual In-Context Prompting CVPR 2024
Aligning Large Multimodal Models with Factually Augmented RLHF ACL 2024
Position: TrustLLM: Trustworthiness in Large Language Models ICML 2024
Visual Instruction Tuning NIPS 2023
GLIGEN: Open-Set Grounded Text-to-Image Generation CVPR 2023
Learning Customized Visual Models With Retrieval-Augmented Knowledge CVPR 2023
A Simple Framework for Open-Vocabulary Segmentation and Detection ICCV 2023
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day NIPS 2023
Generalized Decoding for Pixel, Image, and Language CVPR 2023
Large Language Models are Visual Reasoning Coordinators NIPS 2023
Scaling Vision-Language Models with Sparse Mixture of Experts EMNLP 2023
Parameter-Efficient Model Adaptation for Vision Transformers AAAI 2023
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models NIPS 2022
Focal Modulation Networks NIPS 2022
K-LITE: Learning Transferable Visual Models with External Knowledge NIPS 2022
Grounded Language-Image Pre-Training CVPR 2022
RegionCLIP: Region-Based Language-Image Pretraining CVPR 2022
Towards Language-Free Training for Text-to-Image Generation CVPR 2022
Unified Contrastive Learning in Image-Text-Label Space CVPR 2022
Efficient Self-supervised Vision Transformers for Representation Learning ICLR 2022
RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems ACL 2021
Focal Attention for Long-Range Interactions in Vision Transformers NIPS 2021
RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems IJCNLP 2021
Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation ICCV 2021
Rethinking Sentiment Style Transfer EMNLP 2021
Few-Shot Named Entity Recognition: An Empirical Baseline Study EMNLP 2021
Hierarchical Graph Capsule Network AAAI 2021
Partition-Guided GANs CVPR 2021
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks ECCV 2020
Structure-Aware Human-Action Generation ECCV 2020
Complementary Auxiliary Classifiers for Label-Conditional Text Generation AAAI 2020
Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference EMNLP 2020
Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space EMNLP 2020
POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training EMNLP 2020
Improving Text Generation with Student-Forcing Optimal Transport EMNLP 2020
Few-shot Natural Language Generation for Task-Oriented Dialog EMNLP 2020
Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning ICLR 2020
Feature Quantization Improves GAN Training ICML 2020
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training CVPR 2020
Robust Navigation with Language Pretraining and Stochastic Sampling EMNLP 2019
Robust Navigation with Language Pretraining and Stochastic Sampling IJCNLP 2019
Adversarial Learning of a Sampler Based on an Unnormalized Distribution AISTATS 2019
Communication-Efficient Stochastic Gradient MCMC for Neural Networks AAAI 2019
Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing NAACL 2019
Implicit Deep Latent Variable Models for Text Generation IJCNLP 2019
Twin Auxilary Classifiers GAN NIPS 2019
DoubleTransfer at MEDIQA 2019: Multi-Source Transfer Learning for Natural Language Understanding in the Medical Domain ACL 2019
Implicit Deep Latent Variable Models for Text Generation EMNLP 2019
Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms ACL 2018
Measuring the Intrinsic Dimension of Objective Landscapes ICLR 2018
Adversarial Time-to-Event Modeling ICML 2018
Continuous-Time Flows for Efficient Inference and Density Estimation ICML 2018
Joint Embedding of Words and Labels for Text Classification ACL 2018
Policy Optimization as Wasserstein Gradient Flows ICML 2018
Learning Structural Weight Uncertainty for Sequential Decision-Making AISTATS 2018
Symmetric Variational Autoencoder and Connections to Adversarial Learning AISTATS 2018
ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching NIPS 2017
Adversarial Symmetric Variational Autoencoder NIPS 2017
Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling ACL 2017
Learning Generic Sentence Representations Using Convolutional Neural Networks EMNLP 2017
VAE Learning via Stein Variational Gradient Descent NIPS 2017
Triangle Generative Adversarial Networks NIPS 2017
A Deep Generative Deconvolutional Image Model AISTATS 2016
Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization AISTATS 2016
Learning Weight Uncertainty With Stochastic Gradient MCMC for Shape Classification CVPR 2016
Bayesian Dictionary Learning with Gaussian Processes and Sigmoid Belief Networks IJCAI 2016
Stochastic Gradient MCMC with Stale Gradients NIPS 2016
Variational Autoencoder for Deep Learning of Images, Labels and Captions NIPS 2016
Deep Temporal Sigmoid Belief Networks for Sequence Modeling NIPS 2015
Persistence-based Structural Recognition CVPR 2014