Linguistically-Informed Evaluation of LLMs on Acceptability Judgments in a Forced-Choice Paradigm

Ziyue Liu; Nils Reiter

ACL 2026

Abstract

Evaluating the grammatical abilities of large language models (LLMs) is important for both NLP and linguistic theory. We investigate how well LLMs perform acceptability judgments in a forced-choice paradigm. We evaluate a selection of LLMs on 150 minimal sentence pairs sampled from Linguistic Inquiry and categorized according to BLiMP linguistic phenomena. Our results show that while LLMs approximate human judgments, performance varies across models and phenomenon types, with stronger alignment on morphosyntactic phenomena than on linguistically and semantically demanding phenomena. Prompting strategies have minimal impact.
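As an illustration of the forced-choice setup described above, the following is a minimal sketch of how per-phenomenon accuracy could be computed over minimal pairs. It is not the authors' implementation: the function names `forced_choice_accuracy` and `toy_choose`, the example sentences, and the length-based chooser are all assumptions made for illustration, and a real evaluation would replace `toy_choose` with a call to an LLM.

```python
from collections import defaultdict


def forced_choice_accuracy(items, choose):
    """Compute accuracy per phenomenon in a forced-choice setting.

    items:  iterable of (phenomenon, acceptable, unacceptable) triples.
    choose: callable that takes two sentences and returns the one it
            judges acceptable (hypothetical stand-in for an LLM).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for phenomenon, good, bad in items:
        total[phenomenon] += 1
        # The judgment counts as correct when the acceptable member
        # of the minimal pair is selected.
        if choose(good, bad) == good:
            correct[phenomenon] += 1
    return {p: correct[p] / total[p] for p in total}


def toy_choose(sentence_a, sentence_b):
    # Placeholder heuristic (NOT a model): prefer the shorter sentence.
    return min(sentence_a, sentence_b, key=len)


# Invented toy pairs; the phenomenon labels follow BLiMP-style naming.
toy_items = [
    ("subject_verb_agreement", "The dogs bark.", "The dogs barks."),
    ("subject_verb_agreement", "The cat sleeps.", "The cat sleep."),
    ("npi_licensing", "No student has ever left.", "Every student has ever left."),
]

result = forced_choice_accuracy(toy_items, toy_choose)
print(result)
```

Grouping accuracy by phenomenon, rather than reporting a single overall score, is what makes it possible to compare alignment across phenomenon types as the abstract does.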

Topics

Interdisciplinary > Linguistics > Computational Linguistics
Artificial Intelligence > Core AI > Natural Language Processing
Natural Language Processing > Applications > Evaluation

Keywords

large language model; acceptability judgment; grammatical ability; forced-choice paradigm; morphosyntactic phenomenon
