Kyle Lo

53 papers · 2018–2026 · 11 top CS/AI conferences

Achievements

🌍 Conference Polyglot (11) 🏃 Academic Marathon (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (12)

🌈 Renaissance Researcher (9) 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (20) 🏆 Grand Slam 👥 Mega-Team (60) 🔬 Deep Specialist (13) 💎 Century Club (52) ❓ The Questioner (2) 📈 Trend Setter ⚡ Prolific Year (7) 🗃️ Keyword Collector (226) 🔥 Unstoppable (8)

Conferences

EMNLP (15) ACL (13) NAACL (9) NIPS (5) EACL (3) ICLR (2) IJCNLP (2) AAAI (1) COLING (1) CVPR (1) ICML (1)

Top co-authors

Arman Cohan (20) Luca Soldaini (18) Iz Beltagy (13) Lucy Lu Wang (10) Hannaneh Hajishirzi (10) David Wadden (9) Doug Downey (8) Dirk Groeneveld (7) Noah A. Smith (7) Bailey Kuehl (7)

Research topics

Digital Humanities (1)

Keywords

information retrieval (12) large language model (12) scientific document (5) question answering (5) text generation (5) language model (5) text classification (4) information extraction (4) domain adaptation (4) document summarization (4) fact checking (3) document understanding (3) data curation (3) multi-document summarization (3) text summarization (3) scientific literature (3) natural language processing (3) claim verification (3) multimodal learning (3) instruction following (3)

Papers

The olmOCR Project: Building Fully Open OCR using VLMs ACL 2026
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models CVPR 2025
Human-AI Collaboration: How AIs Augment Human Teammates ACL 2025
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature EMNLP 2025
Intent-aware Schema Generation and Refinement for Literature Review Tables EMNLP 2025
RouterRetriever: Routing over a Mixture of Expert Embedding Models AAAI 2025
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions NAACL 2025
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images NAACL 2025
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ICML 2025
OLMoE: Open Mixture-of-Experts Language Models ICLR 2025
One Thousand and One Pairs: A “novel” challenge for long-context language models EMNLP 2024
DataComp-LM: In search of the next generation of training sets for language models NIPS 2024
Paloma: A Benchmark for Evaluating Language Model Fit NIPS 2024
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification ACL 2024
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research ACL 2024
OLMo: Accelerating the Science of Language Models ACL 2024
KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions ACL 2024
When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets EACL 2024
BooookScore: A systematic exploration of book-length summarization in the era of LLMs ICLR 2024
ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models EMNLP 2024
MathFish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula EMNLP 2024
A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents EMNLP 2023
PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents EMNLP 2023
Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents ACL 2023
Decomposing Complex Queries for Tip-of-the-tongue Retrieval EMNLP 2023
Open Domain Multi-document Summarization: A Comprehensive Study of Model Brittleness under Retrieval EMNLP 2023
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization EACL 2023
Overview of the Third Workshop on Scholarly Document Processing COLING 2022
Generating Scientific Claims for Zero-Shot Scientific Fact Checking ACL 2022
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset NIPS 2022
MultiCite: Modeling realistic citations requires moving beyond the single-sentence single-label setting NAACL 2022
Multi-LexSum: Real-world Summaries of Civil Rights Lawsuits at Multiple Granularities NIPS 2022
MultiVerS: Improving scientific claim verification with weak supervision and full-document context NAACL 2022
ACCoRD: A Multi-Document Approach to Generating Diverse Descriptions of Scientific Concepts EMNLP 2022
SciFact-Open: Towards open-domain scientific claim verification EMNLP 2022
Discourse Understanding and Factual Consistency in Abstractive Summarization EACL 2021
Explaining Relationships Between Scientific Documents ACL 2021
FLEX: Unifying Evaluation for Few-Shot NLP NIPS 2021
Explaining Relationships Between Scientific Documents IJCNLP 2021
A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers NAACL 2021
Overview and Insights from the SCIVER shared task on Scientific Claim Verification NAACL 2021
Overview of the Second Workshop on Scholarly Document Processing NAACL 2021
CORD-19: The COVID-19 Open Research Dataset ACL 2020
Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions EMNLP 2020
TLDR: Extreme Summarization of Scientific Documents EMNLP 2020
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks ACL 2020
S2ORC: The Semantic Scholar Open Research Corpus ACL 2020
Fact or Fiction: Verifying Scientific Claims EMNLP 2020
SciBERT: A Pretrained Language Model for Scientific Text IJCNLP 2019
Combining Distant and Direct Supervision for Neural Relation Extraction NAACL 2019
SciBERT: A Pretrained Language Model for Scientific Text EMNLP 2019
Ontology alignment in the biomedical domain using entity definitions and context ACL 2018
Construction of the Literature Graph in Semantic Scholar NAACL 2018