Adina Williams

55 papers · 2018–2026 · 9 top CS/AI conferences

Achievements

🌍 Conference Polyglot (9) 🏃 Academic Marathon (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (11)

🌈 Renaissance Researcher (9) 🐝 Cross-Pollinator (11) 🌍 Conference Polyglot (9) 🏠 Conference Loyalist (21) 🤝 Dynamic Duo (17) 👥 Mega-Team (47) 🧬 Topic Evolution 🏆 Keyword Champion (3) 📈 Trend Setter 🗃️ Keyword Collector (248) ⚡ Prolific Year (5) ❓ The Questioner (4) 🔥 Unstoppable (8) 💎 Century Club (54)

Conferences

EMNLP (21) ACL (15) NAACL (6) CONLL (4) NIPS (4) IJCNLP (2) AAAI (1) COLING (1) CVPR (1)

Top co-authors

Ryan Cotterell (17) Douwe Kiela (12) Koustuv Sinha (7) Dieuwke Hupkes (6) Tristan Thrush (5) Max Bartolo (5) Josef Valvoda (5) Robin Jia (5) Alex Warstadt (5) Candace Ross (5)

Keywords

natural language inference (8) language model (7) benchmark evaluation (6) natural language processing (4) grammatical gender (4) large language model (4) model evaluation (4) language modeling (3) dependency parsing (3) bias detection (3) information theory (3) model ranking (3) mutual information (3) representation learning (3) data augmentation (3) text generation (2) neural machine translation (2) zero-shot learning (2) shortcut learning (2) in-context learning (2)

Papers

Calibrating LLM Judges: Linear Probes for Fast and Reliable Uncertainty Estimation ACL 2026
Domain Regeneration: How well do LLMs match syntactic properties of text domains? ACL 2025
On the Role of Speech Data in Reducing Toxicity Detection Bias NAACL 2025
Arbiters of Ambivalence: Challenges of using LLMs in No-Consensus tasks ACL 2025
Findings of the Third BabyLM Challenge: Accelerating Language Modeling Research with Cognitively Plausible Data EMNLP 2025
Improving Model Evaluation using SMART Filtering of Benchmark Datasets NAACL 2025
EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models EMNLP 2024
Are Female Carpenters like Blue Bananas? A Corpus Investigation of Occupation Gender Typicality ACL 2024
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora CONLL 2024
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models NIPS 2024
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More NIPS 2024
ROBBIE: Robust Bias Evaluation of Large Generative Language Models EMNLP 2023
A Latent-Variable Model for Intrinsic Probing AAAI 2023
DataPerf: Benchmarks for Data-Centric AI Development NIPS 2023
Language model acceptability judgements are not always robust to context ACL 2023
The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks CONLL 2023
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora CONLL 2023
The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages EMNLP 2023
The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks EMNLP 2023
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora EMNLP 2023
Robustness of Named-Entity Replacements for In-Context Learning EMNLP 2023
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality CVPR 2022
The Curious Case of Absolute Position Embeddings EMNLP 2022
Analyzing Dynamic Adversarial Training Data in the Limit ACL 2022
Perturbation Augmentation for Fairer NLP EMNLP 2022
Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks ACL 2022
Investigating Failures of Automatic Translation in the Case of Unambiguous Gender ACL 2022
On the Machine Learning of Ethical Judgments from Natural Language NAACL 2022
Benchmarking Compositionality with Formal Languages COLING 2022
“I’m sorry to hear that”: Finding New Biases in Language Models with a Holistic Descriptor Dataset EMNLP 2022
Sometimes We Want Ungrammatical Translations EMNLP 2021
UnNatural Language Inference ACL 2021
Generalising to German Plural Noun Classes, from the Perspective of a Recurrent Neural Network CONLL 2021
Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little EMNLP 2021
Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking NIPS 2021
To what extent do human explanations of model behavior align with actual model behavior? EMNLP 2021
Generalising to German Plural Noun Classes, from the Perspective of a Recurrent Neural Network EMNLP 2021
UnNatural Language Inference IJCNLP 2021
Dynabench: Rethinking Benchmarking in NLP NAACL 2021
Measuring the Similarity of Grammatical Gender Systems by Comparing Partitions EMNLP 2020
Multi-Dimensional Gender Bias Classification EMNLP 2020
Intrinsic Probing through Dimension Selection EMNLP 2020
Information-Theoretic Probing for Linguistic Structure ACL 2020
A Tale of a Probe and a Parser ACL 2020
Predicting Declension Class from Form and Meaning ACL 2020
Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation EMNLP 2020
Pareto Probing: Trading Off Accuracy for Complexity EMNLP 2020
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection ACL 2020
Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition ACL 2020
Adversarial NLI: A New Benchmark for Natural Language Understanding ACL 2020
Quantifying the Semantic Core of Gender Systems EMNLP 2019
On the Idiosyncrasies of the Mandarin Chinese Classifier System NAACL 2019
Quantifying the Semantic Core of Gender Systems IJCNLP 2019
XNLI: Evaluating Cross-lingual Sentence Representations EMNLP 2018
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference NAACL 2018