Adina Williams
55 papers · 2018–2026 · 9 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (9)
Academic Marathon (7)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (11)
Renaissance Researcher (9)
Conference Loyalist (21)
Dynamic Duo (17)
Mega-Team (47)
Topic Evolution
Keyword Champion (3)
Trend Setter
Keyword Collector (248)
Prolific Year (5)
The Questioner (4)
Unstoppable (8)
Century Club (54)
Conferences
EMNLP (21)
ACL (15)
NAACL (6)
CONLL (4)
NIPS (4)
IJCNLP (2)
AAAI (1)
COLING (1)
CVPR (1)
Keywords
natural language inference (8)
language model (7)
benchmark evaluation (6)
natural language processing (4)
grammatical gender (4)
large language model (4)
model evaluation (4)
language modeling (3)
dependency parsing (3)
bias detection (3)
information theory (3)
model ranking (3)
mutual information (3)
representation learning (3)
data augmentation (3)
text generation (2)
neural machine translation (2)
zero-shot learning (2)
shortcut learning (2)
in-context learning (2)
Papers
Calibrating LLM Judges: Linear Probes for Fast and Reliable Uncertainty Estimation
ACL 2026
Domain Regeneration: How well do LLMs match syntactic properties of text domains?
ACL 2025
On the Role of Speech Data in Reducing Toxicity Detection Bias
NAACL 2025
Arbiters of Ambivalence: Challenges of using LLMs in No-Consensus tasks
ACL 2025
Findings of the Third BabyLM Challenge: Accelerating Language Modeling Research with Cognitively Plausible Data
EMNLP 2025
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
NAACL 2025
EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models
EMNLP 2024
Are Female Carpenters like Blue Bananas? A Corpus Investigation of Occupation Gender Typicality
ACL 2024
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
CONLL 2024
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
NIPS 2024
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
NIPS 2024
ROBBIE: Robust Bias Evaluation of Large Generative Language Models
EMNLP 2023
A Latent-Variable Model for Intrinsic Probing
AAAI 2023
DataPerf: Benchmarks for Data-Centric AI Development
NIPS 2023
Language model acceptability judgements are not always robust to context
ACL 2023
The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks
CONLL 2023
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
CONLL 2023
The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages
EMNLP 2023
The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks
EMNLP 2023
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
EMNLP 2023
Robustness of Named-Entity Replacements for In-Context Learning
EMNLP 2023
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
CVPR 2022
The Curious Case of Absolute Position Embeddings
EMNLP 2022
Analyzing Dynamic Adversarial Training Data in the Limit
ACL 2022
Perturbation Augmentation for Fairer NLP
EMNLP 2022
Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks
ACL 2022
Investigating Failures of Automatic Translation in the Case of Unambiguous Gender
ACL 2022
On the Machine Learning of Ethical Judgments from Natural Language
NAACL 2022
Benchmarking Compositionality with Formal Languages
COLING 2022
“I’m sorry to hear that”: Finding New Biases in Language Models with a Holistic Descriptor Dataset
EMNLP 2022
Sometimes We Want Ungrammatical Translations
EMNLP 2021
UnNatural Language Inference
ACL 2021
Generalising to German Plural Noun Classes, from the Perspective of a Recurrent Neural Network
CONLL 2021
Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little
EMNLP 2021
Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking
NIPS 2021
To what extent do human explanations of model behavior align with actual model behavior?
EMNLP 2021
Generalising to German Plural Noun Classes, from the Perspective of a Recurrent Neural Network
EMNLP 2021
UnNatural Language Inference
IJCNLP 2021
Dynabench: Rethinking Benchmarking in NLP
NAACL 2021
Measuring the Similarity of Grammatical Gender Systems by Comparing Partitions
EMNLP 2020
Multi-Dimensional Gender Bias Classification
EMNLP 2020
Intrinsic Probing through Dimension Selection
EMNLP 2020
Information-Theoretic Probing for Linguistic Structure
ACL 2020
A Tale of a Probe and a Parser
ACL 2020
Predicting Declension Class from Form and Meaning
ACL 2020
Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation
EMNLP 2020
Pareto Probing: Trading Off Accuracy for Complexity
EMNLP 2020
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection
ACL 2020
Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition
ACL 2020
Adversarial NLI: A New Benchmark for Natural Language Understanding
ACL 2020
Quantifying the Semantic Core of Gender Systems
EMNLP 2019
On the Idiosyncrasies of the Mandarin Chinese Classifier System
NAACL 2019
Quantifying the Semantic Core of Gender Systems
IJCNLP 2019
XNLI: Evaluating Cross-lingual Sentence Representations
EMNLP 2018
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
NAACL 2018