Jimmy Lin
147 papers · 2000–2026 · 12 top CS/AI conferences
Achievements
🐣 Hot Topic Early Bird
🧭 Keyword Pioneer
🗺️ Taxonomy Completionist (11)
🌉 Interdisciplinary Bridge
🌍 Conference Polyglot (11)
🌟 Keyword Trendsetter Combo (6)
🏠 Conference Loyalist (53)
🐺 Lone Wolf (7)
🤝 Dynamic Duo (23)
🏆 Keyword Champion (4)
🔬 Deep Specialist (40)
🏆 Grand Slam
💎 Century Club (144)
📈 Trend Setter
🗃️ Keyword Collector (424)
❓ The Questioner (12)
🚀 Conference Pioneer
🔥 Unstoppable (11)
⚡ Prolific Year (15)
Conferences
EMNLP (53)
ACL (33)
NAACL (29)
IJCNLP (13)
COLING (6)
AAAI (3)
EACL (3)
NIPS (3)
AACL (1)
ICLR (1)
ICML (1)
SEMEVAL (1)
Keywords
information retrieval (28)
dense retrieval (19)
large language model (13)
question answering (10)
retrieval-augmented generation (9)
document retrieval (9)
knowledge distillation (9)
model compression (9)
zero-shot learning (8)
document ranking (8)
neural network (7)
transfer learning (7)
low-resource language (6)
knowledge graph (6)
sparse retrieval (6)
cross-lingual transfer (6)
passage retrieval (6)
text classification (5)
neural ranking (5)
search engine (5)
Papers
rosaOS: Agentic Operating System for Embodied LLMs
ACL 2026
BrowseComp-Plus: A Fair and Disentangled Evaluation Benchmark for Deep Search Agents
ACL 2026
QuackIR: Retrieval in DuckDB and Other Relational Database Management Systems
EMNLP 2025
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards
EMNLP 2025
Hard Negatives, Hard Lessons: Revisiting Training Data Quality for Robust Information Retrieval with LLMs
EMNLP 2025
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
ICLR 2025
AfroBench: How Good are Large Language Models on African Languages?
ACL 2025
Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes?
ACL 2025
Tomato, Tomahto, Tomate: Do Multilingual Language Models Understand Based on Subword-Level Semantic Concepts?
NAACL 2025
Zero-Shot ATC Coding with Large Language Models for Clinical Assessments
NAACL 2025
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
NAACL 2025
Can’t Hide Behind the API: Stealing Black-Box Commercial Embedding Models
NAACL 2025
UniRAG: Universal Retrieval Augmentation for Large Vision Language Models
NAACL 2025
Illusions of Relevance: Arbitrary Content Injection Attacks Deceive Retrievers, Rerankers, and LLM Judges
AACL 2025
VISA: Retrieval Augmented Generation with Visual Source Attribution
ACL 2025
DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers
ACL 2025
Illusions of Relevance: Arbitrary Content Injection Attacks Deceive Retrievers, Rerankers, and LLM Judges
IJCNLP 2025
“Knowing When You Don’t Know”: A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation
EMNLP 2024
CELI: Simple yet Effective Approach to Enhance Out-of-Domain Generalization of Cross-Encoders
NAACL 2024
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
NAACL 2024
Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models
NAACL 2024
FLAME: Factuality-Aware Alignment for Large Language Models
NIPS 2024
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
NIPS 2024
Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification
AAAI 2024
EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems
ACL 2024
Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages
ACL 2024
Retrieval Evaluation for Long-Form and Knowledge-Intensive Image–Text Article Composition
EMNLP 2024
Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogisms
EMNLP 2024
ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA Datasets with Large Language Models
EMNLP 2024
Unifying Multimodal Retrieval via Document Screenshot Embedding
EMNLP 2024
Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation
EMNLP 2024
PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval
EMNLP 2024
mAggretriever: A Simple yet Effective Approach to Zero-Shot Multilingual Dense Retrieval
EMNLP 2023
Better Quality Pre-training Data and T5 Models for African Languages
EMNLP 2023
How Does Generative Retrieval Scale to Millions of Passages?
EMNLP 2023
Precise Zero-Shot Dense Retrieval without Relevance Labels
ACL 2023
What the DAAM: Interpreting Stable Diffusion Using Cross Attention
ACL 2023
CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval
ACL 2023
GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration
ACL 2023
Evaluating Embedding APIs for Information Retrieval
ACL 2023
Operator Selection and Ordering in a Pipeline Approach to Efficiency Optimizations for Transformers
ACL 2023
“Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
ACL 2023
Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face
EMNLP 2023
How to Train Your Dragon: Diverse Augmentation Towards Generalizable Dense Retrieval
EMNLP 2023
AfriTeVA: Extending “Small Data” Pretraining Approaches to Sequence-to-Sequence Models
NAACL 2022
An Encoder Attribution Analysis for Dense Passage Retriever in Open-Domain Question Answering
NAACL 2022
Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking
EMNLP 2022
AfriCLIRMatrix: Enabling Cross-Lingual Information Retrieval for African Languages
EMNLP 2022
Few-Shot Non-Parametric Learning with Deep Latent Variable Model
NIPS 2022
SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale
EMNLP 2022
Improving Precancerous Case Characterization via Transformer-based Ensemble Learning
EMNLP 2022
Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval
EMNLP 2022
XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing
EMNLP 2022
Cross-lingual Text-to-SQL Semantic Parsing with Representation Mixup
EMNLP 2022
Semantics of the Unwritten: The Effect of End of Paragraph and Sequence Tokens on Text Generation with GPT2
IJCNLP 2021
Segatron: Segment-Aware Transformer for Language Modeling and Understanding
AAAI 2021
The Art of Abstention: Selective Prediction and Error Regularization for Natural Language Processing
ACL 2021
Exploring Listwise Evidence Reasoning with T5 for Fact Verification
ACL 2021
Semantics of the Unwritten: The Effect of End of Paragraph and Sequence Tokens on Text Generation with GPT2
ACL 2021
Bag-of-Words Baselines for Semantic Code Search
ACL 2021
In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval
ACL 2021
BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression
EACL 2021
Don’t Change Me! User-Controllable Selective Paraphrase Generation
EACL 2021
Scientific Claim Verification with VerT5erini
EACL 2021
Voice Query Auto Completion
EMNLP 2021
Contextualized Query Embeddings for Conversational Search
EMNLP 2021
Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval
EMNLP 2021
Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering
EMNLP 2021
Unsupervised Chunking as Syntactic Structure Induction with a Knowledge-Transfer Approach
EMNLP 2021
How Does BERT Rerank Passages? An Attribution Analysis with Information Bottlenecks
EMNLP 2021
Small Data? No Problem! Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages
EMNLP 2021
Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval
EMNLP 2021
Cross-Lingual Training of Dense Retrievers for Document Retrieval
EMNLP 2021
Learning to Rank in the Age of Muppets: Effectiveness–Efficiency Tradeoffs in Multi-Stage Ranking
EMNLP 2021
The Art of Abstention: Selective Prediction and Error Regularization for Natural Language Processing
IJCNLP 2021
Exploring Listwise Evidence Reasoning with T5 for Fact Verification
IJCNLP 2021
Bag-of-Words Baselines for Semantic Code Search
IJCNLP 2021
In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval
IJCNLP 2021
Pretrained Transformers for Text Ranking: BERT and Beyond
NAACL 2021
Inserting Information Bottlenecks for Attribution in Transformers
EMNLP 2020
Cross-Lingual Training of Neural Models for Document Ranking
EMNLP 2020
Document Ranking with a Pretrained Sequence-to-Sequence Model
EMNLP 2020
Designing Templates for Eliciting Commonsense Knowledge from Pretrained Sequence-to-Sequence Models
COLING 2020
Generalized and Scalable Optimal Sparse Decision Trees
ICML 2020
Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT
ACL 2020
Cydex: Neural Search Infrastructure for the Scholarly Literature
EMNLP 2020
Early Exiting BERT for Efficient Document Ranking
EMNLP 2020
A Little Bit Is Worse Than None: Ranking with Limited Training Data
EMNLP 2020
Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset
ACL 2020
Two Birds, One Stone: A Simple, Unified Model for Text Generation from Structured and Unstructured Data
ACL 2020
Showing Your Work Doesn’t Always Work
ACL 2020
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
ACL 2020
Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset
EMNLP 2020
Howl: A Deployed, Open-Source Wake Word Detection System
EMNLP 2020
Simple Attention-Based Representation Learning for Ranking Short Social Media Posts
NAACL 2019
Rethinking Complex Neural Network Architectures for Document Classification
NAACL 2019
Applying BERT to Document Retrieval with Birch
IJCNLP 2019
Honkling: In-Browser Personalization for Ubiquitous Keyword Spotting
IJCNLP 2019
Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval
EMNLP 2019
Aligning Cross-Lingual Entities with Multi-Aspect Information
EMNLP 2019
Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling
EMNLP 2019
What Part of the Neural Network Does This? Understanding LSTMs by Measuring and Dissecting Neurons
EMNLP 2019
Applying BERT to Document Retrieval with Birch
EMNLP 2019
Honkling: In-Browser Personalization for Ubiquitous Keyword Spotting
EMNLP 2019
Natural Language Generation for Effective Knowledge Distillation
EMNLP 2019
Scalable Knowledge Graph Construction from Text Collections
EMNLP 2019
End-to-End Open-Domain Question Answering with BERTserini
NAACL 2019
Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search
AAAI 2019
Detecting Customer Complaint Escalation with Recurrent Neural Networks and Manually-Engineered Features
NAACL 2019
Incorporating Contextual and Syntactic Structures Improves Semantic Similarity Modeling
EMNLP 2019
Incorporating Contextual and Syntactic Structures Improves Semantic Similarity Modeling
IJCNLP 2019
Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval
IJCNLP 2019
Aligning Cross-Lingual Entities with Multi-Aspect Information
IJCNLP 2019
Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling
IJCNLP 2019
What Part of the Neural Network Does This? Understanding LSTMs by Measuring and Dissecting Neurons
IJCNLP 2019
Farewell Freebase: Migrating the SimpleQuestions Dataset to DBpedia
COLING 2018
Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks
NAACL 2018
Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures
NAACL 2018
CNNs for NLP in the Browser: Client-Side Deployment and Visualization Opportunities
NAACL 2018
An Insight Extraction System on BioMedical Literature with Deep Neural Networks
EMNLP 2017
Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement
NAACL 2016
UMD-TTIC-UW at SemEval-2016 Task 1: Attention-Based Multi-Perspective Convolutional Neural Networks for Textual Similarity Measurement
SEMEVAL 2016
Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks
EMNLP 2015
Mr. MIRA: Open-Source Large-Margin Structured Learning on MapReduce
ACL 2013
Massively Parallel Suffix Array Queries and On-Demand Phrase Extraction for Statistical Machine Translation Using GPUs
NAACL 2013
NAACL HLT 2013 Tutorial Abstracts
NAACL 2013
Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling
NAACL 2012
Combining Statistical Translation Techniques for Cross-Language Information Retrieval
COLING 2012
Data-Intensive Text Processing with MapReduce
NAACL 2010
Putting the User in the Loop: Interactive Maximal Marginal Relevance for Query-Focused Summarization
NAACL 2010
Data Intensive Text Processing with MapReduce
NAACL 2009
Scalable Language Processing Algorithms for the Masses: A Case Study in Computing Word Co-occurrence Matrices with MapReduce
EMNLP 2008
Pairwise Document Similarity in Large Collections with MapReduce
ACL 2008
Proceedings of the ACL-08: HLT Demo Session
ACL 2008
Different Structures for Evaluating Answers to Complex Questions: Pyramids Won’t Topple, and Neither Will Human Assessors
ACL 2007
Is Question Answering Better than Information Retrieval? Towards a Task-Based Evaluation Framework for Question Series
NAACL 2007
The Role of Information Retrieval in Answering Complex Questions
COLING 2006
The Role of Information Retrieval in Answering Complex Questions
ACL 2006
Will Pyramids Built of Nuggets Topple Over?
NAACL 2006
Answer Extraction, Semantic Clustering, and Extractive Summarization for Clinical Question Answering
ACL 2006
Answer Extraction, Semantic Clustering, and Extractive Summarization for Clinical Question Answering
COLING 2006
Leveraging Reusability: Cost-Effective Lexical Acquisition for Large-Scale Ontology Translation
ACL 2006
Leveraging Reusability: Cost-Effective Lexical Acquisition for Large-Scale Ontology Translation
COLING 2006
Automatically Evaluating Answers to Definition Questions
EMNLP 2005
A Computational Framework for Non-Lexicalist Semantics
NAACL 2004
Answering Definition Questions with Multiple Knowledge Sources
NAACL 2004
REXTOR: A System for Generating Relations from Natural Language
ACL 2000