conftrace_

Jimmy Lin

147 papers · 2000–2026 · 12 top CS/AI conferences

Achievements

🐣 Hot Topic Early Bird 🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (11) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (11)

🗺️ Taxonomy Completionist (11) 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🌟 Keyword Trendsetter Combo (6) 🏠 Conference Loyalist (53) 🐺 Lone Wolf (7) 🤝 Dynamic Duo (23) 🏆 Keyword Champion (4) 🔬 Deep Specialist (40) 🏆 Grand Slam 💎 Century Club (144) 📈 Trend Setter 🗃️ Keyword Collector (424) ❓ The Questioner (12) 🚀 Conference Pioneer 🔥 Unstoppable (11) ⚡ Prolific Year (15)

Conferences

EMNLP (53) ACL (33) NAACL (29) IJCNLP (13) COLING (6) AAAI (3) EACL (3) NIPS (3) AACL (1) ICLR (1) ICML (1) SEMEVAL (1)

Top co-authors

Raphael Tang (23) Ji Xin (17) Peng Shi (15) Xueguang Ma (12) Wei Yang (11) Ming Li (10) Sheng-Chieh Lin (10) Minghan Li (9) Yaoliang Yu (9) Xinyu Zhang (9)

Keywords

information retrieval (28) dense retrieval (19) large language model (13) question answering (10) retrieval-augmented generation (9) document retrieval (9) knowledge distillation (9) model compression (9) zero-shot learning (8) document ranking (8) neural network (7) transfer learning (7) low-resource language (6) knowledge graph (6) sparse retrieval (6) cross-lingual transfer (6) passage retrieval (6) text classification (5) neural ranking (5) search engine (5)

Papers

rosaOS: Agentic Operating System for Embodied LLMs · ACL 2026
BrowseComp-Plus: A Fair and Disentangled Evaluation Benchmark for Deep Search Agents · ACL 2026
QuackIR: Retrieval in DuckDB and Other Relational Database Management Systems · EMNLP 2025
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards · EMNLP 2025
Hard Negatives, Hard Lessons: Revisiting Training Data Quality for Robust Information Retrieval with LLMs · EMNLP 2025
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs · ICLR 2025
AfroBench: How Good are Large Language Models on African Languages? · ACL 2025
Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes? · ACL 2025
Tomato, Tomahto, Tomate: Do Multilingual Language Models Understand Based on Subword-Level Semantic Concepts? · NAACL 2025
Zero-Shot ATC Coding with Large Language Models for Clinical Assessments · NAACL 2025
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems · NAACL 2025
Can’t Hide Behind the API: Stealing Black-Box Commercial Embedding Models · NAACL 2025
UniRAG: Universal Retrieval Augmentation for Large Vision Language Models · NAACL 2025
Illusions of Relevance: Arbitrary Content Injection Attacks Deceive Retrievers, Rerankers, and LLM Judges · AACL 2025
VISA: Retrieval Augmented Generation with Visual Source Attribution · ACL 2025
DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers · ACL 2025
Illusions of Relevance: Arbitrary Content Injection Attacks Deceive Retrievers, Rerankers, and LLM Judges · IJCNLP 2025
“Knowing When You Don’t Know”: A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation · EMNLP 2024
CELI: Simple yet Effective Approach to Enhance Out-of-Domain Generalization of Cross-Encoders · NAACL 2024
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval · NAACL 2024
Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models · NAACL 2024
FLAME: Factuality-Aware Alignment for Large Language Models · NIPS 2024
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution · NIPS 2024
Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification · AAAI 2024
EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems · ACL 2024
Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages · ACL 2024
Retrieval Evaluation for Long-Form and Knowledge-Intensive Image–Text Article Composition · EMNLP 2024
Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogisms · EMNLP 2024
ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA Datasets with Large Language Models · EMNLP 2024
Unifying Multimodal Retrieval via Document Screenshot Embedding · EMNLP 2024
Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation · EMNLP 2024
PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval · EMNLP 2024
mAggretriever: A Simple yet Effective Approach to Zero-Shot Multilingual Dense Retrieval · EMNLP 2023
Better Quality Pre-training Data and T5 Models for African Languages · EMNLP 2023
How Does Generative Retrieval Scale to Millions of Passages? · EMNLP 2023
Precise Zero-Shot Dense Retrieval without Relevance Labels · ACL 2023
What the DAAM: Interpreting Stable Diffusion Using Cross Attention · ACL 2023
CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval · ACL 2023
GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration · ACL 2023
Evaluating Embedding APIs for Information Retrieval · ACL 2023
Operator Selection and Ordering in a Pipeline Approach to Efficiency Optimizations for Transformers · ACL 2023
“Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors · ACL 2023
Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face · EMNLP 2023
How to Train Your Dragon: Diverse Augmentation Towards Generalizable Dense Retrieval · EMNLP 2023
AfriTeVA: Extending “Small Data” Pretraining Approaches to Sequence-to-Sequence Models · NAACL 2022
An Encoder Attribution Analysis for Dense Passage Retriever in Open-Domain Question Answering · NAACL 2022
Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking · EMNLP 2022
AfriCLIRMatrix: Enabling Cross-Lingual Information Retrieval for African Languages · EMNLP 2022
Few-Shot Non-Parametric Learning with Deep Latent Variable Model · NIPS 2022
SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale · EMNLP 2022
Improving Precancerous Case Characterization via Transformer-based Ensemble Learning · EMNLP 2022
Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval · EMNLP 2022
XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing · EMNLP 2022
Cross-lingual Text-to-SQL Semantic Parsing with Representation Mixup · EMNLP 2022
Semantics of the Unwritten: The Effect of End of Paragraph and Sequence Tokens on Text Generation with GPT2 · IJCNLP 2021
Segatron: Segment-Aware Transformer for Language Modeling and Understanding · AAAI 2021
The Art of Abstention: Selective Prediction and Error Regularization for Natural Language Processing · ACL 2021
Exploring Listwise Evidence Reasoning with T5 for Fact Verification · ACL 2021
Semantics of the Unwritten: The Effect of End of Paragraph and Sequence Tokens on Text Generation with GPT2 · ACL 2021
Bag-of-Words Baselines for Semantic Code Search · ACL 2021
In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval · ACL 2021
BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression · EACL 2021
Don’t Change Me! User-Controllable Selective Paraphrase Generation · EACL 2021
Scientific Claim Verification with VerT5erini · EACL 2021
Voice Query Auto Completion · EMNLP 2021
Contextualized Query Embeddings for Conversational Search · EMNLP 2021
Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval · EMNLP 2021
Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering · EMNLP 2021
Unsupervised Chunking as Syntactic Structure Induction with a Knowledge-Transfer Approach · EMNLP 2021
How Does BERT Rerank Passages? An Attribution Analysis with Information Bottlenecks · EMNLP 2021
Small Data? No Problem! Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages · EMNLP 2021
Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval · EMNLP 2021
Cross-Lingual Training of Dense Retrievers for Document Retrieval · EMNLP 2021
Learning to Rank in the Age of Muppets: Effectiveness–Efficiency Tradeoffs in Multi-Stage Ranking · EMNLP 2021
The Art of Abstention: Selective Prediction and Error Regularization for Natural Language Processing · IJCNLP 2021
Exploring Listwise Evidence Reasoning with T5 for Fact Verification · IJCNLP 2021
Bag-of-Words Baselines for Semantic Code Search · IJCNLP 2021
In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval · IJCNLP 2021
Pretrained Transformers for Text Ranking: BERT and Beyond · NAACL 2021
Inserting Information Bottlenecks for Attribution in Transformers · EMNLP 2020
Cross-Lingual Training of Neural Models for Document Ranking · EMNLP 2020
Document Ranking with a Pretrained Sequence-to-Sequence Model · EMNLP 2020
Designing Templates for Eliciting Commonsense Knowledge from Pretrained Sequence-to-Sequence Models · COLING 2020
Generalized and Scalable Optimal Sparse Decision Trees · ICML 2020
Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT · ACL 2020
Cydex: Neural Search Infrastructure for the Scholarly Literature · EMNLP 2020
Early Exiting BERT for Efficient Document Ranking · EMNLP 2020
A Little Bit Is Worse Than None: Ranking with Limited Training Data · EMNLP 2020
Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset · ACL 2020
Two Birds, One Stone: A Simple, Unified Model for Text Generation from Structured and Unstructured Data · ACL 2020
Showing Your Work Doesn’t Always Work · ACL 2020
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference · ACL 2020
Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset · EMNLP 2020
Howl: A Deployed, Open-Source Wake Word Detection System · EMNLP 2020
Simple Attention-Based Representation Learning for Ranking Short Social Media Posts · NAACL 2019
Rethinking Complex Neural Network Architectures for Document Classification · NAACL 2019
Applying BERT to Document Retrieval with Birch · IJCNLP 2019
Honkling: In-Browser Personalization for Ubiquitous Keyword Spotting · IJCNLP 2019
Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval · EMNLP 2019
Aligning Cross-Lingual Entities with Multi-Aspect Information · EMNLP 2019
Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling · EMNLP 2019
What Part of the Neural Network Does This? Understanding LSTMs by Measuring and Dissecting Neurons · EMNLP 2019
Applying BERT to Document Retrieval with Birch · EMNLP 2019
Honkling: In-Browser Personalization for Ubiquitous Keyword Spotting · EMNLP 2019
Natural Language Generation for Effective Knowledge Distillation · EMNLP 2019
Scalable Knowledge Graph Construction from Text Collections · EMNLP 2019
End-to-End Open-Domain Question Answering with BERTserini · NAACL 2019
Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search · AAAI 2019
Detecting Customer Complaint Escalation with Recurrent Neural Networks and Manually-Engineered Features · NAACL 2019
Incorporating Contextual and Syntactic Structures Improves Semantic Similarity Modeling · EMNLP 2019
Incorporating Contextual and Syntactic Structures Improves Semantic Similarity Modeling · IJCNLP 2019
Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval · IJCNLP 2019
Aligning Cross-Lingual Entities with Multi-Aspect Information · IJCNLP 2019
Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling · IJCNLP 2019
What Part of the Neural Network Does This? Understanding LSTMs by Measuring and Dissecting Neurons · IJCNLP 2019
Farewell Freebase: Migrating the SimpleQuestions Dataset to DBpedia · COLING 2018
Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks · NAACL 2018
Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures · NAACL 2018
CNNs for NLP in the Browser: Client-Side Deployment and Visualization Opportunities · NAACL 2018
An Insight Extraction System on BioMedical Literature with Deep Neural Networks · EMNLP 2017
Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement · NAACL 2016
UMD-TTIC-UW at SemEval-2016 Task 1: Attention-Based Multi-Perspective Convolutional Neural Networks for Textual Similarity Measurement · SEMEVAL 2016
Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks · EMNLP 2015
Mr. MIRA: Open-Source Large-Margin Structured Learning on MapReduce · ACL 2013
Massively Parallel Suffix Array Queries and On-Demand Phrase Extraction for Statistical Machine Translation Using GPUs · NAACL 2013
NAACL HLT 2013 Tutorial Abstracts · NAACL 2013
Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling · NAACL 2012
Combining Statistical Translation Techniques for Cross-Language Information Retrieval · COLING 2012
Data-Intensive Text Processing with MapReduce · NAACL 2010
Putting the User in the Loop: Interactive Maximal Marginal Relevance for Query-Focused Summarization · NAACL 2010
Data Intensive Text Processing with MapReduce · NAACL 2009
Scalable Language Processing Algorithms for the Masses: A Case Study in Computing Word Co-occurrence Matrices with MapReduce · EMNLP 2008
Pairwise Document Similarity in Large Collections with MapReduce · ACL 2008
Proceedings of the ACL-08: HLT Demo Session · ACL 2008
Different Structures for Evaluating Answers to Complex Questions: Pyramids Won’t Topple, and Neither Will Human Assessors · ACL 2007
Is Question Answering Better than Information Retrieval? Towards a Task-Based Evaluation Framework for Question Series · NAACL 2007
The Role of Information Retrieval in Answering Complex Questions · COLING 2006
The Role of Information Retrieval in Answering Complex Questions · ACL 2006
Will Pyramids Built of Nuggets Topple Over? · NAACL 2006
Answer Extraction, Semantic Clustering, and Extractive Summarization for Clinical Question Answering · ACL 2006
Answer Extraction, Semantic Clustering, and Extractive Summarization for Clinical Question Answering · COLING 2006
Leveraging Reusability: Cost-Effective Lexical Acquisition for Large-Scale Ontology Translation · ACL 2006
Leveraging Reusability: Cost-Effective Lexical Acquisition for Large-Scale Ontology Translation · COLING 2006
Automatically Evaluating Answers to Definition Questions · EMNLP 2005
A Computational Framework for Non-Lexicalist Semantics · NAACL 2004
Answering Definition Questions with Multiple Knowledge Sources · NAACL 2004
REXTOR: A System for Generating Relations from Natural Language · ACL 2000