Aaron Mueller
42 papers · 2019–2026 · 8 top CS/AI conferences
Achievements
Academic Marathon (6)
Conference Polyglot (7)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (9)
Renaissance Researcher (5)
Dynamic Duo (14)
Mega-Team (23)
Deep Specialist (13)
Topic Evolution
Keyword Collector (139)
The Questioner
Prolific Year (5)
Unstoppable (7)
Century Club (36)
Conferences
ACL (14)
EMNLP (8)
NAACL (5)
CONLL (4)
ICLR (4)
IJCNLP (4)
EACL (2)
ICML (1)
Keywords
sparse autoencoder (6)
language model (6)
large language model (4)
multilingual language model (4)
mechanistic interpretability (3)
inductive bias (3)
neural network (3)
subject-verb agreement (3)
syntactic agreement (3)
language modeling (3)
narrative generation (2)
text generation (2)
pretrained language model (2)
masked language model (2)
cross-lingual transfer (2)
in-context learning (2)
few-shot learning (2)
model interpretability (2)
neural network interpretability (2)
computational linguistics (2)
Papers
Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?
EACL 2026
From Isolation to Entanglement: When Do Interpretability Methods Identify and Disentangle Known Concepts?
ACL 2026
Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate
ACL 2026
CRISP: Persistent Concept Unlearning via Sparse Autoencoders
ACL 2026
Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining
ACL 2026
Improving the OOD Performance of Closed-Source LLMs on NLI Through Strategic Data Selection
EACL 2026
MIB: A Mechanistic Interpretability Benchmark
ICML 2025
Position-aware Automatic Circuit Discovery
ACL 2025
SAEs Are Good for Steering – If You Select the Right Features
EMNLP 2025
Findings of the Third BabyLM Challenge: Accelerating Language Modeling Research with Cognitively Plausible Data
EMNLP 2025
Findings of the BlackboxNLP 2025 Shared Task: Localizing Circuits and Causal Variables in Language Models
EMNLP 2025
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals
ICLR 2025
Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics
ICLR 2025
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
ICLR 2025
Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models
NAACL 2025
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages
NAACL 2025
Characterizing the Role of Similarity in the Property Inferences of Language Models
NAACL 2025
In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax
NAACL 2024
Developmentally Plausible Multimodal Language Models Are Highly Modular
CONLL 2024
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
CONLL 2024
Function Vectors in Large Language Models
ICLR 2024
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
EMNLP 2023
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
CONLL 2023
How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases
ACL 2023
Meta-training with Demonstration Retrieval for Efficient Few-shot Learning
ACL 2023
What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
ACL 2023
Language model acceptability judgements are not always robust to context
ACL 2023
Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models
EMNLP 2022
Label Semantic Aware Pre-training for Few-shot Text Classification
ACL 2022
Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models
ACL 2022
Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models
CONLL 2022
Bernice: A Multilingual Pre-trained Encoder for Twitter
EMNLP 2022
Fine-tuning Encoders for Improved Monolingual and Zero-shot Polylingual Neural Topic Modeling
NAACL 2021
Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models
ACL 2021
Decoding Methods for Neural Narrative Generation
ACL 2021
Decoding Methods for Neural Narrative Generation
IJCNLP 2021
Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models
IJCNLP 2021
Cross-Linguistic Syntactic Evaluation of Word Prediction Models
ACL 2020
Modeling Color Terminology Across Thousands of Languages
EMNLP 2019
Quantity doesn't buy quality syntax with neural language models
IJCNLP 2019
Modeling Color Terminology Across Thousands of Languages
IJCNLP 2019
Quantity doesn't buy quality syntax with neural language models
EMNLP 2019