Martin Wattenberg

11 papers · 2018–2025 · 4 conferences · across top CS/AI conferences

Achievements

🐝 Cross-Pollinator (15) 🌍 Conference Polyglot (4) 🏃 Academic Marathon (7) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (10)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 💎 Century Club (11)

Conferences

ICML (5) ICLR (3) NIPS (2) EMNLP (1)

Top co-authors

Kenneth Li (4) Fernanda Viégas (4) Andrew Lee (3) Been Kim (2) David Bau (2) Ekdeep Singh Lubana (2) Hanspeter Pfister (2) Rada Mihalcea (1) Yida Chen (1) Hugh Zhang (1)

Keywords

transformer architecture (1) image classification (1) self-supervised learning (1) bert model (1) model interpretability (1) world model (1) attention head (1) semantic subspace (1) linear representation (1) word embedding (1) linguistic feature (1) syntactic representation (1) activation steering (1) inference-time intervention (1) attention matrix (1) large language model (1) neural network (1) concept activation vector (1) testing with cav (1) syntactic subspace (1)

Papers

When Bad Data Leads to Good Models · ICML 2025
ICLR: In-Context Learning of Representations · ICLR 2025
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models · ICML 2025
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity · ICML 2024
Linearity of Relation Decoding in Transformer Language Models · ICLR 2024
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models · ICML 2024
Emergent Linear Representations in World Models of Self-Supervised Sequence Models · EMNLP 2023
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model · NIPS 2023
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task · ICLR 2023
Visualizing and Measuring the Geometry of BERT · NIPS 2019
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) · ICML 2018