Bilal Piot

30 papers · 2012–2025 · 5 top CS/AI conferences

Achievements


🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🗺️ Taxonomy Completionist (11) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (5)

🌈 Renaissance Researcher (6) 🤝 Dynamic Duo (12) 👑 Triple Crown 🧬 Topic Evolution 🏆 Keyword Champion 📈 Trend Setter 🚀 Conference Pioneer 🔥 Unstoppable (7) ⚡ Prolific Year (6) ❓ The Questioner 🗃️ Keyword Collector (93) 💎 Century Club (30)

Conferences

ICML (9) ICLR (8) NIPS (7) AISTATS (4) IJCAI (2)

Top co-authors

Rémi Munos (12) Olivier Pietquin (11) Daniele Calandriello (9) Mohammad Gheshlaghi Azar (9) Michal Valko (8) Zhaohan Daniel Guo (7) Yunhao Tang (6) Bernardo Avila Pires (6) Matthieu Geist (6) Florian Strub (5)

Keywords

reinforcement learning (7) policy optimization (4) representation learning (4) markov game (4) self-supervised learning (4) game theory (3) bellman residual (3) optimal bellman residual (2) self-predictive learning (2) two-player zero-sum game (2) latent representation (2) policy iteration (2) deep reinforcement learning (2) multi-agent system (2) markov decision process (2) approximate dynamic programming (2) nash equilibrium (2) value function (2) zero-sum game (2) preference learning (1)

Papers

RRM: Robust Reward Model Training Mitigates Reward Hacking · ICLR 2025
Building Math Agents with Multi-Turn Iterative Preference Learning · ICLR 2025
Learning from negative feedback, or positive feedback or both · ICLR 2025
Multi-turn Reinforcement Learning with Preference Human Feedback · NIPS 2024
Nash Learning from Human Feedback · ICML 2024
Unlocking the Power of Representations in Long-term Novelty-based Exploration · ICLR 2024
Generalized Preference Optimization: A Unified Approach to Offline Alignment · ICML 2024
A General Theoretical Paradigm to Understand Learning from Human Preferences · AISTATS 2024
Human Alignment of Large Language Models through Online Preference Optimisation · ICML 2024
Understanding Self-Predictive Learning for Reinforcement Learning · ICML 2023
The Edge of Orthogonality: A Simple View of What Makes BYOL Tick · ICML 2023
BYOL-Explore: Exploration by Bootstrapped Prediction · NIPS 2022
Emergent Communication at Scale · ICLR 2022
Agent57: Outperforming the Atari Human Benchmark · ICML 2020
Never Give Up: Learning Directed Exploration Strategies · ICLR 2020
Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning · NIPS 2020
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning · ICML 2020
Hindsight Credit Assignment · NIPS 2019
The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning · ICLR 2018
Actor-Critic Fictitious Play in Simultaneous Move Multistage Games · AISTATS 2018
Noisy Networks For Exploration · ICLR 2018
End-to-end optimization of goal-driven and visually grounded dialogue systems · IJCAI 2017
Is the Bellman residual a bad proxy? · NIPS 2017
Learning Nash Equilibrium for General-Sum Markov Games from Batch Data · AISTATS 2017
Softened Approximate Policy Iteration for Markov Games · ICML 2016
On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games · AISTATS 2016
Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games · ICML 2015
Inverse Reinforcement Learning in Relational Domains · IJCAI 2015
Difference of Convex Functions Programming for Reinforcement Learning · NIPS 2014
Inverse Reinforcement Learning through Structured Classification · NIPS 2012