Dylan Hadfield-Menell

16 papers · 2016–2025 · 7 top CS/AI conferences

Achievements


🏃 Academic Marathon (9) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (7) 🐝 Cross-Pollinator (6)

🗺️ Taxonomy Completionist (25) 🌈 Renaissance Researcher (8) 🌟 Keyword Trendsetter Combo (3) 🏆 Keyword Champion (2) 👥 Mega-Team (27) 🔥 Unstoppable (6) ❓ The Questioner (2) 🗃️ Keyword Collector (72) 💎 Century Club (16) 📈 Trend Setter

Conferences

NIPS (7) ICLR (2) ICML (2) IJCAI (2) CORL (1) EMNLP (1) RSS (1)

Top co-authors

Anca Dragan (7) Stuart Russell (6) Pieter Abbeel (3) Stephen Casper (3) Smitha Milli (2) Kevin Zhang (1) Jaime Fisac (1) Dipam Chakraborty (1) Tom Griffiths (1) Kaivalya Hariharan (1)

Keywords

reward function (5) inverse reinforcement learning (3) value alignment (3) model debugging (2) reinforcement learning (2) human-robot interaction (2) game theory (1) robotic manipulation (1) multi-agent reinforcement learning (1) robot planning (1) utility optimization (1) imitation learning (1) partially observable Markov decision process (1) model misspecification (1) reward design (1) machine learning (1) preference inference (1) reward learning (1) computer vision (1) representation learning (1)

Papers

Diverse Preference Learning for Capabilities and Alignment · ICLR 2025
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF · ICLR 2024
Melting Pot Contest: Charting the Future of Generalized Cooperative Intelligence · NIPS 2024
Red Teaming Deep Neural Networks with Feature Synthesis Tools · NIPS 2023
Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness? · EMNLP 2023
Robust Feature-Level Adversaries are Interpretability Tools · NIPS 2022
Estimating and Penalizing Induced Preference Shifts in Recommender Systems · ICML 2022
How to talk so AI will learn: Instructions, descriptions, and autonomy · NIPS 2022
Guided Imitation of Task and Motion Planning · CORL 2021
Consequences of Misaligned AI · NIPS 2020
Simplifying Reward Design through Divide-and-Conquer · RSS 2018
An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning · ICML 2018
Inverse Reward Design · NIPS 2017
The Off-Switch Game · IJCAI 2017
Should Robots be Obedient? · IJCAI 2017
Cooperative Inverse Reinforcement Learning · NIPS 2016