Suchin Gururangan
24 papers · 2018–2025 · 7 top CS/AI conferences
Achievements
Conference Polyglot (7)
Academic Marathon (7)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (13)
Renaissance Researcher (5)
Taxonomy Completionist (58)
Mega-Team (60)
Triple Crown
Dynamic Duo (14)
Keyword Collector (120)
Century Club (24)
Unstoppable (5)
The Questioner
Prolific Year (6)
Conferences
EMNLP (8)
ACL (5)
NAACL (5)
ICLR (2)
IJCNLP (2)
ICML (1)
NeurIPS (1)
Keywords
language model (8)
transfer learning (4)
domain adaptation (4)
data selection (3)
validation performance (3)
web text (2)
machine-generated text (2)
large language model (2)
human evaluation (2)
text generation (2)
text classification (2)
data filtering (2)
model robustness (2)
experimental methodology (2)
hyperparameter search (2)
natural language generation (2)
toxicity detection (2)
language modeling (2)
parameter efficiency (2)
data curation (2)
Papers
BTS: Harmonizing Specialized Experts into a Generalist LLM
EMNLP 2025
Self-Generated Critiques Boost Reward Modeling for Language Models
NAACL 2025
Language models scale reliably with over-training and on downstream tasks
ICLR 2025
Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models
EMNLP 2024
Time is Encoded in the Weights of Finetuned Language Models
ACL 2024
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters
ACL 2024
DataComp-LM: In search of the next generation of training sets for language models
NeurIPS 2024
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
ICLR 2024
LESS: Selecting Influential Data for Targeted Instruction Tuning
ICML 2024
Time Waits for No One! Analysis and Challenges of Temporal Misalignment
NAACL 2022
M2D2: A Massively Multi-Domain Language Modeling Dataset
EMNLP 2022
Nearest Neighbor Zero-Shot Inference
EMNLP 2022
DEMix Layers: Disentangling Domains for Modular Language Modeling
NAACL 2022
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
EMNLP 2022
Expected Validation Performance and Estimation of a Random Variable’s Maximum
EMNLP 2021
All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text
ACL 2021
All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text
IJCNLP 2021
Detoxifying Language Models Risks Marginalizing Minority Voices
NAACL 2021
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
EMNLP 2020
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks
ACL 2020
Show Your Work: Improved Reporting of Experimental Results
EMNLP 2019
Variational Pretraining for Semi-supervised Text Classification
ACL 2019
Show Your Work: Improved Reporting of Experimental Results
IJCNLP 2019
Annotation Artifacts in Natural Language Inference Data
NAACL 2018