Can Rager
6 papers · 2024–2025 · 4 top CS/AI conferences
Achievements
Hot Topic Early Bird
Keyword Pioneer
Conference Polyglot (4)
Interdisciplinary Bridge
Cross-Pollinator (8)
Taxonomy Completionist (10)
Mega-Team (20)
Conferences
EMNLP (2)
ICLR (2)
ICML (1)
NeurIPS (1)
Keywords
dictionary learning (1)
feature extraction (1)
model architecture (1)
feature disentanglement (1)
neural network optimization (1)
language model (1)
sparse autoencoder (1)
attention head (1)
language model interpretability (1)
ground truth metric (1)
circuit discovery (1)
residual stream (1)
memory management (1)
activation patching (1)
board game (1)
direct logit attribution (1)
Papers
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals
ICLR 2025
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
ICLR 2025
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
ICML 2025
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
NeurIPS 2024
An Adversarial Example for Direct Logit Attribution: Memory Management in GELU-4L
EMNLP 2024
Attribution Patching Outperforms Automated Circuit Discovery
EMNLP 2024