Kyle Lo
53 papers · 2018–2026 · across 11 top CS/AI conferences
Achievements
Conference Polyglot (11)
Academic Marathon (7)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (12)
Renaissance Researcher (9)
Hot Topic Early Bird
Dynamic Duo (20)
Grand Slam
Mega-Team (60)
Deep Specialist (13)
Century Club (52)
The Questioner (2)
Trend Setter
Prolific Year (7)
Keyword Collector (226)
Unstoppable (8)
Conferences
EMNLP (15)
ACL (13)
NAACL (9)
NIPS (5)
EACL (3)
ICLR (2)
IJCNLP (2)
AAAI (1)
COLING (1)
CVPR (1)
ICML (1)
Research topics
Keywords
information retrieval (12)
large language model (12)
scientific document (5)
question answering (5)
text generation (5)
language model (5)
text classification (4)
information extraction (4)
domain adaptation (4)
document summarization (4)
fact checking (3)
document understanding (3)
data curation (3)
multi-document summarization (3)
text summarization (3)
scientific literature (3)
natural language processing (3)
claim verification (3)
multimodal learning (3)
instruction following (3)
Papers
The olmOCR Project: Building Fully Open OCR using VLMs
ACL 2026
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
Human-AI Collaboration: How AIs Augment Human Teammates
ACL 2025
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
EMNLP 2025
Intent-aware Schema Generation and Refinement for Literature Review Tables
EMNLP 2025
RouterRetriever: Routing over a Mixture of Expert Embedding Models
AAAI 2025
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
NAACL 2025
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images
NAACL 2025
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
ICML 2025
OLMoE: Open Mixture-of-Experts Language Models
ICLR 2025
One Thousand and One Pairs: A “novel” challenge for long-context language models
EMNLP 2024
DataComp-LM: In search of the next generation of training sets for language models
NIPS 2024
Paloma: A Benchmark for Evaluating Language Model Fit
NIPS 2024
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
ACL 2024
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
ACL 2024
OLMo: Accelerating the Science of Language Models
ACL 2024
KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions
ACL 2024
When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets
EACL 2024
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
ICLR 2024
ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models
EMNLP 2024
MathFish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula
EMNLP 2024
A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents
EMNLP 2023
PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents
EMNLP 2023
Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents
ACL 2023
Decomposing Complex Queries for Tip-of-the-tongue Retrieval
EMNLP 2023
Open Domain Multi-document Summarization: A Comprehensive Study of Model Brittleness under Retrieval
EMNLP 2023
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
EACL 2023
Overview of the Third Workshop on Scholarly Document Processing
COLING 2022
Generating Scientific Claims for Zero-Shot Scientific Fact Checking
ACL 2022
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
NIPS 2022
MultiCite: Modeling realistic citations requires moving beyond the single-sentence single-label setting
NAACL 2022
Multi-LexSum: Real-world Summaries of Civil Rights Lawsuits at Multiple Granularities
NIPS 2022
MultiVerS: Improving scientific claim verification with weak supervision and full-document context
NAACL 2022
ACCoRD: A Multi-Document Approach to Generating Diverse Descriptions of Scientific Concepts
EMNLP 2022
SciFact-Open: Towards open-domain scientific claim verification
EMNLP 2022
Discourse Understanding and Factual Consistency in Abstractive Summarization
EACL 2021
Explaining Relationships Between Scientific Documents
ACL 2021
FLEX: Unifying Evaluation for Few-Shot NLP
NIPS 2021
Explaining Relationships Between Scientific Documents
IJCNLP 2021
A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers
NAACL 2021
Overview and Insights from the SCIVER shared task on Scientific Claim Verification
NAACL 2021
Overview of the Second Workshop on Scholarly Document Processing
NAACL 2021
CORD-19: The COVID-19 Open Research Dataset
ACL 2020
Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions
EMNLP 2020
TLDR: Extreme Summarization of Scientific Documents
EMNLP 2020
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks
ACL 2020
S2ORC: The Semantic Scholar Open Research Corpus
ACL 2020
Fact or Fiction: Verifying Scientific Claims
EMNLP 2020
SciBERT: A Pretrained Language Model for Scientific Text
IJCNLP 2019
Combining Distant and Direct Supervision for Neural Relation Extraction
NAACL 2019
SciBERT: A Pretrained Language Model for Scientific Text
EMNLP 2019
Ontology alignment in the biomedical domain using entity definitions and context
ACL 2018
Construction of the Literature Graph in Semantic Scholar
NAACL 2018