Jan Leike

14 papers · 2015–2025 · 6 top CS/AI conferences

Achievements


🏃 Academic Marathon (10) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (6) 🐝 Cross-Pollinator (11)

🗺️ Taxonomy Completionist (25) 🌈 Renaissance Researcher (6) 🌟 Keyword Trendsetter Combo (3) 📛 The Namer 🧬 Topic Evolution 👥 Mega-Team (20) 💎 Century Club (14) 🔥 Unstoppable (8) 📈 Trend Setter 🗃️ Keyword Collector (53)

Conferences

ICLR (4) IJCAI (3) NIPS (3) ICML (2) AISTATS (1) COLT (1)

Top co-authors

Shane Legg (5) Marcus Hutter (4) Ilya Sutskever (3) Jeffrey Wu (3) John Schulman (2) Leo Gao (2) Paul F Christiano (2) Bowen Baker (2) Dario Amodei (2) Laurent Orseau (2)

Keywords

preference learning (3) reward function (3) inverse reinforcement learning (2) reinforcement learning (2) reward model (2) deep reinforcement learning (2) reward learning (2) human preference (2) sequential decision making (1) reinforcement learning from human feedback (1) bayesian inference (1) ai safety (1) model alignment (1) trajectory optimization (1) language model alignment (1) imitation learning (1) value of information (1) demonstration learning (1) sequence prediction (1) kolmogorov complexity (1)

Papers

Scaling and evaluating sparse autoencoders · ICLR 2025
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision · ICML 2024
Let's Verify Step by Step · ICLR 2024
Training language models to follow instructions with human feedback · NIPS 2022
Quantifying Differences in Reward Functions · ICLR 2021
Pitfalls of Learning a Reward Function Online · IJCAI 2020
Learning Human Objectives by Evaluating Hypothetical Behavior · ICML 2020
Learning to Understand Goal Specifications by Modelling Reward · ICLR 2019
Reward learning from human preferences and demonstrations in Atari · NIPS 2018
Deep Reinforcement Learning from Human Preferences · NIPS 2017
Universal Reinforcement Learning Algorithms: Survey and Experiments · IJCAI 2017
On Thompson Sampling and Asymptotic Optimality · IJCAI 2017
Loss Bounds and Time Complexity for Speed Priors · AISTATS 2016
Bad Universal Priors and Notions of Optimality · COLT 2015