Arthur Conmy
10 papers · 2023–2025 · 4 top CS/AI conferences
Achievements
Cross-Pollinator (14)
Conference Polyglot (4)
Interdisciplinary Bridge
Keyword Pioneer
Hot Topic Early Bird
Keyword Champion (2)
Triple Crown
Prolific Year (6)
Century Club (10)
Conferences
EMNLP (3)
ICML (3)
ICLR (2)
NeurIPS (2)
Keywords
activation patching (2)
mechanistic interpretability (2)
circuit discovery (2)
sparse autoencoder (2)
model analysis (1)
neural network analysis (1)
neural network optimization (1)
latent representation (1)
language model (1)
feature decomposition (1)
attention head (1)
interpretable feature (1)
transformer model (1)
activation decomposition (1)
feature learning (1)
copy suppression (1)
model calibration (1)
model architecture (1)
neural network interpretability (1)
Papers
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
ICML 2025
Scaling Sparse Feature Circuits For Studying In-Context Learning
ICML 2025
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
EMNLP 2024
Copy Suppression: Comprehensively Understanding a Motif in Language Model Attention Heads
EMNLP 2024
Attribution Patching Outperforms Automated Circuit Discovery
EMNLP 2024
Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders
NeurIPS 2024
Successor Heads: Recurring, Interpretable Attention Heads In The Wild
ICLR 2024
Stealing part of a production language model
ICML 2024
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small
ICLR 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
NeurIPS 2023