Aaron Mueller

42 papers · 2019–2026 · 8 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (6) 🌍 Conference Polyglot (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (9)

🌈 Renaissance Researcher (5) 🤝 Dynamic Duo (14) 👥 Mega-Team (23) 🔬 Deep Specialist (13) 🧬 Topic Evolution 🗃️ Keyword Collector (139) ❓ The Questioner ⚡ Prolific Year (5) 🔥 Unstoppable (7) 💎 Century Club (36)

Conferences

ACL (14) EMNLP (8) NAACL (5) CONLL (4) ICLR (4) IJCNLP (4) EACL (2) ICML (1)

Top co-authors

Tal Linzen (14) Yonatan Belinkov (9) Adina Williams (5) David Bau (5) Ryan Cotterell (4) Dana Arad (4) Alex Warstadt (4) Leshem Choshen (4) Martin Tutek (3) Alexandra DeLucia (3)

Keywords

sparse autoencoder (6) language model (6) large language model (4) multilingual language model (4) mechanistic interpretability (3) inductive bias (3) neural network (3) subject-verb agreement (3) syntactic agreement (3) language modeling (3) narrative generation (2) text generation (2) pretrained language model (2) masked language model (2) cross-lingual transfer (2) in-context learning (2) few-shot learning (2) model interpretability (2) neural network interpretability (2) computational linguistics (2)

Papers

Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics? EACL 2026
From Isolation to Entanglement: When Do Interpretability Methods Identify and Disentangle Known Concepts? ACL 2026
Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate ACL 2026
CRISP: Persistent Concept Unlearning via Sparse Autoencoders ACL 2026
Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining ACL 2026
Improving the OOD Performance of Closed-Source LLMs on NLI Through Strategic Data Selection EACL 2026
MIB: A Mechanistic Interpretability Benchmark ICML 2025
Position-aware Automatic Circuit Discovery ACL 2025
SAEs Are Good for Steering – If You Select the Right Features EMNLP 2025
Findings of the Third BabyLM Challenge: Accelerating Language Modeling Research with Cognitively Plausible Data EMNLP 2025
Findings of the BlackboxNLP 2025 Shared Task: Localizing Circuits and Causal Variables in Language Models EMNLP 2025
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals ICLR 2025
Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics ICLR 2025
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models ICLR 2025
Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models NAACL 2025
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages NAACL 2025
Characterizing the Role of Similarity in the Property Inferences of Language Models NAACL 2025
In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax NAACL 2024
Developmentally Plausible Multimodal Language Models Are Highly Modular CONLL 2024
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora CONLL 2024
Function Vectors in Large Language Models ICLR 2024
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora EMNLP 2023
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora CONLL 2023
How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases ACL 2023
Meta-training with Demonstration Retrieval for Efficient Few-shot Learning ACL 2023
What Do NLP Researchers Believe? Results of the NLP Community Metasurvey ACL 2023
Language model acceptability judgements are not always robust to context ACL 2023
Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models EMNLP 2022
Label Semantic Aware Pre-training for Few-shot Text Classification ACL 2022
Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models ACL 2022
Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models CONLL 2022
Bernice: A Multilingual Pre-trained Encoder for Twitter EMNLP 2022
Fine-tuning Encoders for Improved Monolingual and Zero-shot Polylingual Neural Topic Modeling NAACL 2021
Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models ACL 2021
Decoding Methods for Neural Narrative Generation ACL 2021
Decoding Methods for Neural Narrative Generation IJCNLP 2021
Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models IJCNLP 2021
Cross-Linguistic Syntactic Evaluation of Word Prediction Models ACL 2020
Modeling Color Terminology Across Thousands of Languages EMNLP 2019
Quantity doesn’t buy quality syntax with neural language models IJCNLP 2019
Modeling Color Terminology Across Thousands of Languages IJCNLP 2019
Quantity doesn’t buy quality syntax with neural language models EMNLP 2019