Dylan Hadfield-Menell
16 papers · 2016–2025 · 7 top CS/AI conferences
Achievements
Academic Marathon (9)
Keyword Pioneer
Interdisciplinary Bridge
Conference Polyglot (7)
Cross-Pollinator (6)
Taxonomy Completionist (25)
Renaissance Researcher (8)
Keyword Trendsetter Combo (3)
Keyword Champion (2)
Mega-Team (27)
Unstoppable (6)
The Questioner (2)
Keyword Collector (72)
Century Club (16)
Trend Setter
Conferences
NIPS (7)
ICLR (2)
ICML (2)
IJCAI (2)
CORL (1)
EMNLP (1)
RSS (1)
Keywords
reward function (5)
inverse reinforcement learning (3)
value alignment (3)
model debugging (2)
reinforcement learning (2)
human-robot interaction (2)
game theory (1)
robotic manipulation (1)
multi-agent reinforcement learning (1)
robot planning (1)
utility optimization (1)
imitation learning (1)
partially observable markov decision process (1)
model misspecification (1)
reward design (1)
machine learning (1)
preference inference (1)
reward learning (1)
computer vision (1)
representation learning (1)
Papers
Diverse Preference Learning for Capabilities and Alignment
ICLR 2025
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
ICLR 2024
Melting Pot Contest: Charting the Future of Generalized Cooperative Intelligence
NIPS 2024
Red Teaming Deep Neural Networks with Feature Synthesis Tools
NIPS 2023
Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?
EMNLP 2023
Robust Feature-Level Adversaries are Interpretability Tools
NIPS 2022
Estimating and Penalizing Induced Preference Shifts in Recommender Systems
ICML 2022
How to talk so AI will learn: Instructions, descriptions, and autonomy
NIPS 2022
Guided Imitation of Task and Motion Planning
CORL 2021
Consequences of Misaligned AI
NIPS 2020
Simplifying Reward Design through Divide-and-Conquer
RSS 2018
An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning
ICML 2018
Inverse Reward Design
NIPS 2017
The Off-Switch Game
IJCAI 2017
Should Robots be Obedient?
IJCAI 2017
Cooperative Inverse Reinforcement Learning
NIPS 2016