Suchin Gururangan

24 papers · 2018–2025 · 7 top CS/AI conferences

Achievements

🌍 Conference Polyglot (7) 🏃 Academic Marathon (7) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (13)

🌈 Renaissance Researcher (5) 🗺️ Taxonomy Completionist (58) 👥 Mega-Team (60) 👑 Triple Crown 🤝 Dynamic Duo (14) 🗃️ Keyword Collector (120) 💎 Century Club (24) 🔥 Unstoppable (5) ❓ The Questioner ⚡ Prolific Year (6)

Conferences

EMNLP (8) ACL (5) NAACL (5) ICLR (2) IJCNLP (2) ICML (1) NIPS (1)

Top co-authors

Noah A. Smith (14) Luke Zettlemoyer (7) Dallas Card (5) Roy Schwartz (4) Jesse Dodge (4) Luca Soldaini (3) Sedrick Keh (2) Jenia Jitsev (2) Achal Dave (2) Nikita Haduong (2)

Keywords

language model (8) transfer learning (4) domain adaptation (4) data selection (3) validation performance (3) web text (2) machine-generated text (2) large language model (2) human evaluation (2) text generation (2) text classification (2) data filtering (2) model robustness (2) experimental methodology (2) hyperparameter search (2) natural language generation (2) toxicity detection (2) language modeling (2) parameter efficiency (2) data curation (2)

Papers

BTS: Harmonizing Specialized Experts into a Generalist LLM · EMNLP 2025
Self-Generated Critiques Boost Reward Modeling for Language Models · NAACL 2025
Language models scale reliably with over-training and on downstream tasks · ICLR 2025
Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models · EMNLP 2024
Time is Encoded in the Weights of Finetuned Language Models · ACL 2024
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters · ACL 2024
DataComp-LM: In search of the next generation of training sets for language models · NIPS 2024
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore · ICLR 2024
LESS: Selecting Influential Data for Targeted Instruction Tuning · ICML 2024
Time Waits for No One! Analysis and Challenges of Temporal Misalignment · NAACL 2022
M2D2: A Massively Multi-Domain Language Modeling Dataset · EMNLP 2022
Nearest Neighbor Zero-Shot Inference · EMNLP 2022
DEMix Layers: Disentangling Domains for Modular Language Modeling · NAACL 2022
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection · EMNLP 2022
Expected Validation Performance and Estimation of a Random Variable’s Maximum · EMNLP 2021
All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text · ACL 2021
All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text · IJCNLP 2021
Detoxifying Language Models Risks Marginalizing Minority Voices · NAACL 2021
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models · EMNLP 2020
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks · ACL 2020
Show Your Work: Improved Reporting of Experimental Results · EMNLP 2019
Variational Pretraining for Semi-supervised Text Classification · ACL 2019
Show Your Work: Improved Reporting of Experimental Results · IJCNLP 2019
Annotation Artifacts in Natural Language Inference Data · NAACL 2018