conftrace_

Can Rager

6 papers · 2024–2025 · 4 top CS/AI conferences

Achievements

🐣 Hot Topic Early Bird 🧭 Keyword Pioneer 🌍 Conference Polyglot (4) 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (8)

🗺️ Taxonomy Completionist (10) 👥 Mega-Team (20)

Conferences

EMNLP (2) ICLR (2) ICML (1) NIPS (1)

Top co-authors

Samuel Marks (4) David Bau (3) Yeu-Tong Lau (2) Adam Karvonen (2) Arthur Conmy (2) Jannik Brinkmann (2) Aaron Mueller (2) Neel Nanda (1) Curt Tigges (1) Eoin Farrell (1)

Keywords

dictionary learning (1) feature extraction (1) model architecture (1) feature disentanglement (1) neural network optimization (1) language model (1) sparse autoencoder (1) attention head (1) language model interpretability (1) ground truth metric (1) circuit discovery (1) residual stream (1) memory management (1) activation patching (1) board game (1) direct logit attribution (1)

Papers

NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals · ICLR 2025
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models · ICLR 2025
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability · ICML 2025
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models · NIPS 2024
An Adversarial Example for Direct Logit Attribution: Memory Management in GELU-4L · EMNLP 2024
Attribution Patching Outperforms Automated Circuit Discovery · EMNLP 2024