Yinfei Yang

69 papers · 2012–2025 · 15 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (13) 🌍 Conference Polyglot (15) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (6)

🌈 Renaissance Researcher (10) 🗺️ Taxonomy Completionist (92) 🌱 Topic Pioneer 🔬 Deep Specialist (11) 👥 Mega-Team (29) 🏆 Keyword Champion (6) 🧬 Topic Evolution 🤝 Dynamic Duo (16) 🚀 Conference Pioneer 🗃️ Keyword Collector (230) ⚡ Prolific Year (12) 🔥 Unstoppable (9) 📈 Trend Setter 💎 Century Club (69)

Conferences

EMNLP (14) ACL (11) ICLR (9) EACL (6) ICCV (5) NAACL (5) CVPR (4) ECCV (3) ICML (3) IJCNLP (3) WACV (2) AAAI (1) AACL (1) CONLL (1) IJCAI (1)

Top co-authors

Daniel Cer (16) Zhe Gan (15) Bowen Zhang (13) Haotian Zhang (11) Xianzhi Du (9) Jason Baldridge (9) Mandy Guo (9) Noah Constant (8) Yun-Hsuan Sung (7) Forrest Bao (7)

Keywords

transfer learning (9) contrastive learning (7) zero-shot learning (6) dual encoder (6) sentence embedding (6) neural retrieval (5) semantic similarity (5) multimodal learning (5) question answering (5) domain adaptation (4) information retrieval (4) cross-lingual transfer (4) cross-lingual retrieval (4) representation learning (3) neural machine translation (3) information extraction (3) text-to-image generation (3) data augmentation (3) machine translation (3) vision-language model (3)

Papers

Improve Vision Language Model Chain-of-thought Reasoning ACL 2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs ICCV 2025
STIV: Scalable Text and Image Conditioned Video Generation ICCV 2025
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning ICLR 2025
Contrastive Localized Language-Image Pre-Training ICML 2025
Multimodal Autoregressive Pre-training of Large Vision Encoders CVPR 2025
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing ICCV 2025
CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling EMNLP 2025
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA ICLR 2025
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs ICLR 2025
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models ICLR 2025
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms ICLR 2025
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs ECCV 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training ECCV 2024
On the Intractability to Synthesize Factual Inconsistencies in Summarization EACL 2024
Guiding Instruction-based Image Editing via Multimodal Large Language Models ICLR 2024
Ferret: Refer and Ground Anything Anywhere at Any Granularity ICLR 2024
Compressing LLMs: The Truth is Rarely Pure and Never Simple ICLR 2024
MOFI: Learning Image Representations from Noisy Entity Annotated Images ICLR 2024
Empowering Unsupervised Domain Adaptation With Large-Scale Pre-Trained Vision-Language Models WACV 2024
VeCLIP: Improving CLIP Training via Visual-enriched Captions ECCV 2024
Perceptual Grouping in Contrastive Vision-Language Models ICCV 2023
Masked Autoencoding Does Not Help Natural Language Supervision at Scale CVPR 2023
A New Path: Scaling Vision-and-Language Navigation With Synthetic Instructions and Imitation Learning CVPR 2023
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens EMNLP 2023
DocAsRef: An Empirical Study on Repurposing Reference-based Summary Quality Metrics as Reference-free Metrics EMNLP 2023
Simple and Effective Synthesis of Indoor 3D Scenes AAAI 2023
Robustness in Multimodal Learning under Train-Test Modality Mismatch ICML 2023
Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models ACL 2022
Language-agnostic BERT Sentence Embedding ACL 2022
Large Dual Encoders Are Generalizable Retrievers EMNLP 2022
SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling NAACL 2022
LongT5: Efficient Text-To-Text Transformer for Long Sequences NAACL 2022
A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations EMNLP 2021
Multi-stage Training with Improved Negative Contrast for Neural Passage Retrieval EMNLP 2021
Universal Sentence Representation Learning with Conditional Masked Language Model EMNLP 2021
MURAL: Multimodal, Multitask Representations Across Languages EMNLP 2021
Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO EACL 2021
Cross-Modal Contrastive Learning for Text-to-Image Generation CVPR 2021
MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models EACL 2021
Zero-shot Neural Passage Retrieval via Domain-targeted Synthetic Question Generation EACL 2021
Pathdreamer: A World Model for Indoor Navigation ICCV 2021
Text-to-Image Generation Grounded by Fine-Grained User Attention WACV 2021
Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation IJCNLP 2021
Neural Retrieval for Question Answering with Cross-Attention Supervised Data Augmentation ACL 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision ICML 2021
Self-Supervised Learning for Pairwise Data Refinement AACL 2020
LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool EMNLP 2020
Multilingual Universal Sentence Encoder for Semantic Retrieval ACL 2020
Learning a Multi-Domain Curriculum for Neural Machine Translation ACL 2020
Predicting Annotation Difficulty to Improve Task Routing and Model Performance for Biomedical Information Extraction NAACL 2019
Hierarchical Document Encoder for Parallel Corpus Mining ACL 2019
Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model ACL 2019
PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification EMNLP 2019
ReQA: An Evaluation for End-to-End Answer Retrieval Models EMNLP 2019
PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification IJCNLP 2019
Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax IJCAI 2019
Cross-Domain Review Helpfulness Prediction Based on Convolutional Neural Networks with Auxiliary Domain Discriminators NAACL 2018
Effective Parallel Corpus Mining using Bilingual Sentence Embeddings EMNLP 2018
Learning Semantic Textual Similarity from Conversations ACL 2018
A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature ACL 2018
Syntactic Patterns Improve Information Extraction for Medical Search NAACL 2018
Universal Sentence Encoder for English EMNLP 2018
Aspect Extraction from Product Reviews Using Category Hierarchy Information EACL 2017
Detecting (Un)Important Content for Single-Document News Summarization EACL 2017
Semantic Analysis and Helpfulness Prediction of Text for Online Product Reviews ACL 2015
Semantic Analysis and Helpfulness Prediction of Text for Online Product Reviews IJCNLP 2015
Linking Named Entities to Any Database EMNLP 2012
Linking Named Entities to Any Database CONLL 2012