Jesse Dodge
37 papers · 2012–2025 · 9 top CS/AI conferences
Achievements
Conference Polyglot (9)
Keyword Pioneer
Interdisciplinary Bridge
Taxonomy Completionist (12)
Academic Marathon (13)
Cross-Pollinator (12)
Renaissance Researcher (7)
Mega-Team (43)
Keyword Champion (2)
Dynamic Duo (16)
Unstoppable (7)
Trend Setter
Keyword Collector (163)
Century Club (37)
The Questioner (2)
Prolific Year (5)
Conferences
ACL (11)
EMNLP (9)
NAACL (5)
NIPS (3)
EACL (2)
ICLR (2)
ICML (2)
IJCNLP (2)
SEMEVAL (1)
Top co-authors
Keywords
language model (7)
large language model (4)
domain adaptation (3)
training data (3)
validation performance (3)
text classification (3)
experimental methodology (3)
pretrained language model (2)
parameter efficiency (2)
human evaluation (2)
natural language processing (2)
data quality (2)
hyperparameter optimization (2)
model comparison (2)
parameter-efficient fine-tuning (2)
natural language inference (2)
group lasso (2)
web corpus (2)
corpus construction (2)
model merging (2)
Papers
Holistically Evaluating the Environmental Impact of Creating Language Models
ICLR 2025
DataDecide: How to Predict Best Pretraining Data with Small Experiments
ICML 2025
OLMES: A Standard for Language Model Evaluations
NAACL 2025
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
ACL 2025
OLMo: Accelerating the Science of Language Models
ACL 2024
Paloma: A Benchmark for Evaluating Language Model Fit
NIPS 2024
Language Models Hallucinate, but May Excel at Fact Verification
NAACL 2024
Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging
EMNLP 2024
Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
EMNLP 2024
What's In My Big Data?
ICLR 2024
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters
ACL 2024
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
ACL 2024
AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
EACL 2023
Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications
ACL 2023
Stubborn Lexical Bias in Data and Models
ACL 2023
Reproducibility in NLP: What Have We Learned from the Checklist?
ACL 2023
Detecting Personal Information in Training Corpora: an Analysis
ACL 2023
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
NIPS 2023
ACL Tutorial Proposal: Towards Reproducible Machine Learning Research in Natural Language Processing
ACL 2022
Findings of the WMT’22 Shared Task on Large-Scale Machine Translation Evaluation for African Languages
EMNLP 2022
Efficient Hierarchical Domain Adaptation for Pretrained Language Models
NAACL 2022
Staged Training for Transformer Language Models
ICML 2022
Modeling the Machine Learning Multiverse
NIPS 2022
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
EMNLP 2021
Competency Problems: On Finding and Removing Artifacts in Language Data
EMNLP 2021
Expected Validation Performance and Estimation of a Random Variable’s Maximum
EMNLP 2021
The Right Tool for the Job: Matching Model and Instance Complexities
ACL 2020
Show Your Work: Improved Reporting of Experimental Results
EMNLP 2019
RNN Architecture Learning with Sparse Regularization
EMNLP 2019
RNN Architecture Learning with Sparse Regularization
IJCNLP 2019
Show Your Work: Improved Reporting of Experimental Results
IJCNLP 2019
Key-Value Memory Networks for Directly Reading Documents
EMNLP 2016
Retrofitting Word Vectors to Semantic Lexicons
NAACL 2015
Context-dependent Semantic Parsing for Time Expressions
ACL 2014
CMU: Arc-Factored, Discriminative Semantic Dependency Parsing
SEMEVAL 2014
Detecting Visual Text
NAACL 2012
Midge: Generating Image Descriptions From Computer Vision Detections
EACL 2012