Martin Wattenberg
11 papers · 2018–2025 · 4 top CS/AI conferences
Achievements
Cross-Pollinator (15)
Conference Polyglot (4)
Academic Marathon (7)
Interdisciplinary Bridge
Taxonomy Completionist (10)
Keyword Pioneer
Hot Topic Early Bird
Century Club (11)
Conferences
ICML (5)
ICLR (3)
NIPS (2)
EMNLP (1)
Keywords
transformer architecture (1)
image classification (1)
self-supervised learning (1)
bert model (1)
model interpretability (1)
world model (1)
attention head (1)
semantic subspace (1)
linear representation (1)
word embedding (1)
linguistic feature (1)
syntactic representation (1)
activation steering (1)
inference-time intervention (1)
attention matrix (1)
large language model (1)
neural network (1)
concept activation vector (1)
testing with cav (1)
syntactic subspace (1)
Papers
When Bad Data Leads to Good Models
ICML 2025
ICLR: In-Context Learning of Representations
ICLR 2025
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
ICML 2025
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
ICML 2024
Linearity of Relation Decoding in Transformer Language Models
ICLR 2024
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
ICML 2024
Emergent Linear Representations in World Models of Self-Supervised Sequence Models
EMNLP 2023
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
NIPS 2023
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
ICLR 2023
Visualizing and Measuring the Geometry of BERT
NIPS 2019
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
ICML 2018