conftrace_

Arthur Conmy

10 papers · 2023–2025 · 4 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (14) 🌍 Conference Polyglot (4) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird

🏆 Keyword Champion (2) 👑 Triple Crown ⚡ Prolific Year (6) 💎 Century Club (10)

Conferences

EMNLP (3) · ICML (3) · ICLR (2) · NIPS (2)

Top co-authors

Neel Nanda (5) · Vikrant Varma (2) · Can Rager (2) · János Kramár (2) · Tom Lieberum (2) · Lewis Smith (2) · Callum Stuart McDougall (2) · Rohin Shah (2) · Senthooran Rajamanoharan (2) · Curt Tigges (1)

Keywords

activation patching (2) · mechanistic interpretability (2) · circuit discovery (2) · sparse autoencoder (2) · model analysis (1) · neural network analysis (1) · neural network optimization (1) · latent representation (1) · language model (1) · feature decomposition (1) · attention head (1) · interpretable feature (1) · transformer model (1) · activation decomposition (1) · feature learning (1) · copy suppression (1) · model calibration (1) · model architecture (1) · neural network interpretability (1)

Papers

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability · ICML 2025
Scaling Sparse Feature Circuits For Studying In-Context Learning · ICML 2025
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 · EMNLP 2024
Copy Suppression: Comprehensively Understanding a Motif in Language Model Attention Heads · EMNLP 2024
Attribution Patching Outperforms Automated Circuit Discovery · EMNLP 2024
Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders · NIPS 2024
Successor Heads: Recurring, Interpretable Attention Heads In The Wild · ICLR 2024
Stealing part of a production language model · ICML 2024
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small · ICLR 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability · NIPS 2023