Catherine Arnett
8 papers · 2023–2026 · 4 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (4)
Interdisciplinary Bridge
Taxonomy Completionist (16)
Keyword Pioneer
Cross-Pollinator (15)
The Questioner (2)
Conferences
EMNLP (3)
ACL (2)
COLING (2)
NAACL (1)
Keywords
grammatical representation (2)
language modeling (2)
structural priming (2)
multilingual language model (2)
linear regression (1)
low-resource language (1)
subword tokenization (1)
human annotation (1)
multilingual model (1)
syntactic structure (1)
morphological analysis (1)
high-resource language (1)
text compression (1)
multilingual dataset (1)
byte-pair encoding (1)
agglutinative language (1)
model capacity (1)
multilingual capability (1)
multilingual corpus (1)
bilingual language model (1)
Papers
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
ACL 2026
Why do language models perform worse for morphologically complex languages?
COLING 2025
On the Acquisition of Shared Grammatical Representations in Bilingual Language Models
ACL 2025
When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages
EMNLP 2024
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training
EMNLP 2024
Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement
NAACL 2024
A Bit of a Problem: Measurement Disparities in Dataset Sizes across Languages
COLING 2024
Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models
EMNLP 2023