Atticus Geiger
32 papers · 2019–2026 · 9 conferences · across top CS/AI conferences
Achievements
Academic Marathon (6)
Conference Polyglot (9)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (13)
Renaissance Researcher (6)
Mega-Team (23)
Dynamic Duo (24)
Deep Specialist (15)
Keyword Champion (8)
Keyword Collector (102)
Trend Setter
Prolific Year (7)
The Questioner (3)
Unstoppable (7)
Century Club (31)
Conferences
EMNLP (6)
ACL (5)
ICML (5)
NAACL (4)
NeurIPS (4)
CLEAR (3)
ICLR (2)
IJCNLP (2)
JMLR (1)
Keywords
causal abstraction (8)
natural language inference (6)
causal inference (5)
neural network (4)
sentiment analysis (4)
language model (4)
natural language processing (3)
ternary classification (2)
large language model (2)
mechanistic interpretability (2)
neural network interpretability (2)
representation learning (2)
distributed representation (2)
benchmark dataset (2)
adversarial testing (2)
neural model (2)
concept-based explanation (2)
in-context learning (1)
few-shot learning (1)
knowledge distillation (1)
Papers
Constructing Interpretable Features from Compositional Neuron Groups
ACL 2026
Combining Causal Models for More Accurate Abstractions of Neural Networks
CLEAR 2025
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
ICLR 2025
How Do Transformers Learn Variable Binding in Symbolic Programs?
ICML 2025
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
ICML 2025
Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability
JMLR 2025
MIB: A Mechanistic Interpretability Benchmark
ICML 2025
Enhancing Automated Interpretability with Output-Centric Feature Descriptions
ACL 2025
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
EMNLP 2024
ReFT: Representation Finetuning for Language Models
NeurIPS 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
ACL 2024
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
CLEAR 2024
Updating CLIP to Prefer Descriptions Over Captions
EMNLP 2024
Language Models Linearly Represent Sentiment
EMNLP 2024
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
ICLR 2024
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
NAACL 2024
Rigorously Assessing Natural Language Explanations of Neurons
EMNLP 2023
Causal Proxy Models for Concept-based Model Explanations
ICML 2023
Causal Abstraction with Soft Interventions
CLEAR 2023
ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning
ACL 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
NeurIPS 2023
Causal Distillation for Language Models
NAACL 2022
CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior
NeurIPS 2022
Inducing Causal Structure for Interpretable Neural Networks
ICML 2022
Causal Abstractions of Neural Networks
NeurIPS 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis
ACL 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis
IJCNLP 2021
Dynabench: Rethinking Benchmarking in NLP
NAACL 2021
Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation
EMNLP 2020
Posing Fair Generalization Tasks for Natural Language Inference
IJCNLP 2019
Posing Fair Generalization Tasks for Natural Language Inference
EMNLP 2019
Recursive Routing Networks: Learning to Compose Modules for Language Understanding
NAACL 2019