Atticus Geiger

32 papers · 2019–2026 · 9 venues across top CS/AI conferences and journals

Achievements

🏃 Academic Marathon (6) 🌍 Conference Polyglot (9) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (13) 🌈 Renaissance Researcher (6) 👥 Mega-Team (23) 🤝 Dynamic Duo (24) 🔬 Deep Specialist (15) 🏆 Keyword Champion (8) 🗃️ Keyword Collector (102) 📈 Trend Setter ⚡ Prolific Year (7) ❓ The Questioner (3) 🔥 Unstoppable (7) 💎 Century Club (31)

Venues

EMNLP (6) ACL (5) ICML (5) NAACL (4) NeurIPS (4) CLeaR (3) ICLR (2) IJCNLP (2) JMLR (1)

Top co-authors

Christopher Potts (24) Zhengxuan Wu (15) Jing Huang (7) Thomas Icard (7) Noah Goodman (6) Amir Zur (4) Aryaman Arora (4) Ignacio Cases (3) Zheng Wang (3) Karel D’Oosterlinck (3)

Keywords

causal abstraction (8) natural language inference (6) causal inference (5) neural network (4) sentiment analysis (4) language model (4) natural language processing (3) ternary classification (2) large language model (2) mechanistic interpretability (2) neural network interpretability (2) representation learning (2) distributed representation (2) benchmark dataset (2) adversarial testing (2) neural model (2) concept-based explanation (2) in-context learning (1) few-shot learning (1) knowledge distillation (1)

Papers

Constructing Interpretable Features from Compositional Neuron Groups · ACL 2026
Combining Causal Models for More Accurate Abstractions of Neural Networks · CLeaR 2025
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks · ICLR 2025
How Do Transformers Learn Variable Binding in Symbolic Programs? · ICML 2025
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders · ICML 2025
Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability · JMLR 2025
MIB: A Mechanistic Interpretability Benchmark · ICML 2025
Enhancing Automated Interpretability with Output-Centric Feature Descriptions · ACL 2025
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations · EMNLP 2024
ReFT: Representation Finetuning for Language Models · NeurIPS 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations · ACL 2024
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations · CLeaR 2024
Updating CLIP to Prefer Descriptions Over Captions · EMNLP 2024
Language Models Linearly Represent Sentiment · EMNLP 2024
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching · ICLR 2024
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions · NAACL 2024
Rigorously Assessing Natural Language Explanations of Neurons · EMNLP 2023
Causal Proxy Models for Concept-based Model Explanations · ICML 2023
Causal Abstraction with Soft Interventions · CLeaR 2023
ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning · ACL 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca · NeurIPS 2023
Causal Distillation for Language Models · NAACL 2022
CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior · NeurIPS 2022
Inducing Causal Structure for Interpretable Neural Networks · ICML 2022
Causal Abstractions of Neural Networks · NeurIPS 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis · ACL 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis · IJCNLP 2021
Dynabench: Rethinking Benchmarking in NLP · NAACL 2021
Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation · EMNLP 2020
Posing Fair Generalization Tasks for Natural Language Inference · IJCNLP 2019
Posing Fair Generalization Tasks for Natural Language Inference · EMNLP 2019
Recursive Routing Networks: Learning to Compose Modules for Language Understanding · NAACL 2019