David Krueger
27 papers · 2017–2025 · 5 top CS/AI conferences
Achievements
Keyword Pioneer
Renaissance Researcher (5)
Interdisciplinary Bridge
Taxonomy Completionist (12)
Conference Polyglot (5)
Academic Marathon (8)
Cross-Pollinator (7)
Triple Crown
Topic Evolution
Unstoppable (5)
Prolific Year (10)
Trend Setter
Century Club (27)
Keyword Collector (79)
Conferences
ICML (10)
ICLR (9)
NIPS (6)
ACML (1)
NAACL (1)
Keywords
reinforcement learning (3)
world model (2)
large language model (2)
out-of-distribution generalization (2)
causal inference (2)
domain generalization (1)
group theory (1)
adversarial robustness (1)
neural network interpretability (1)
model safety (1)
feature learning (1)
loss landscape (1)
model evaluation (1)
reinforcement learning from human feedback (1)
probability distribution (1)
covariate shift (1)
model-based reinforcement learning (1)
reward function (1)
action prediction (1)
data augmentation (1)
Papers
Analyzing (In)Abilities of SAEs via Formal Languages
NAACL 2025
Interpreting Emergent Planning in Model-Free Reinforcement Learning
ICLR 2025
Protecting against simultaneous data poisoning attacks
ICLR 2025
Towards Interpreting Visual Information Processing in Vision-Language Models
ICLR 2025
Input Space Mode Connectivity in Deep Neural Networks
ICLR 2025
Influence Functions for Scalable Data Attribution in Diffusion Models
ICLR 2025
The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret
ICML 2025
PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data
ICML 2025
Position: Humanity Faces Existential Risk from Gradual Disempowerment
ICML 2025
Position: Probabilistic Modelling is Sufficient for Causal Inference
ICML 2025
Reward Model Ensembles Help Mitigate Overoptimization
ICLR 2024
Implicit meta-learning may lead language models to trust more reliable sources
ICML 2024
Interpreting Learned Feedback Patterns in Large Language Models
NIPS 2024
Predicting Future Actions of Reinforcement Learning Agents
NIPS 2024
Stress-Testing Capability Elicitation With Password-Locked Models
NIPS 2024
A Generative Model of Symmetry Transformations
NIPS 2024
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
ICLR 2024
Mechanistic Mode Connectivity
ICML 2023
Thinker: Learning to Plan and Act
NIPS 2023
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics
ICLR 2023
Broken Neural Scaling Laws
ICLR 2023
Defining and Characterizing Reward Gaming
NIPS 2022
Goal Misgeneralization in Deep Reinforcement Learning
ICML 2022
Out-of-Distribution Generalization via Risk Extrapolation (REx)
ICML 2021
Neural Autoregressive Flows
ICML 2018
Nested LSTMs
ACML 2017
A Closer Look at Memorization in Deep Networks
ICML 2017