Bilal Piot
30 papers · 2012–2025 · 5 top CS/AI conferences
Achievements
Keyword Pioneer
Hot Topic Early Bird
Taxonomy Completionist (11)
Interdisciplinary Bridge
Conference Polyglot (5)
Renaissance Researcher (6)
Dynamic Duo (12)
Triple Crown
Topic Evolution
Keyword Champion
Trend Setter
Conference Pioneer
Unstoppable (7)
Prolific Year (6)
The Questioner
Keyword Collector (93)
Century Club (30)
Conferences
ICML (9)
ICLR (8)
NIPS (7)
AISTATS (4)
IJCAI (2)
Keywords
reinforcement learning (7)
policy optimization (4)
representation learning (4)
markov game (4)
self-supervised learning (4)
game theory (3)
bellman residual (3)
optimal bellman residual (2)
self-predictive learning (2)
two-player zero-sum game (2)
latent representation (2)
policy iteration (2)
deep reinforcement learning (2)
multi-agent system (2)
markov decision process (2)
approximate dynamic programming (2)
nash equilibrium (2)
value function (2)
zero-sum game (2)
preference learning (1)
Papers
RRM: Robust Reward Model Training Mitigates Reward Hacking
ICLR 2025
Building Math Agents with Multi-Turn Iterative Preference Learning
ICLR 2025
Learning from negative feedback, or positive feedback or both
ICLR 2025
Multi-turn Reinforcement Learning with Preference Human Feedback
NIPS 2024
Nash Learning from Human Feedback
ICML 2024
Unlocking the Power of Representations in Long-term Novelty-based Exploration
ICLR 2024
Generalized Preference Optimization: A Unified Approach to Offline Alignment
ICML 2024
A General Theoretical Paradigm to Understand Learning from Human Preferences
AISTATS 2024
Human Alignment of Large Language Models through Online Preference Optimisation
ICML 2024
Understanding Self-Predictive Learning for Reinforcement Learning
ICML 2023
The Edge of Orthogonality: A Simple View of What Makes BYOL Tick
ICML 2023
BYOL-Explore: Exploration by Bootstrapped Prediction
NIPS 2022
Emergent Communication at Scale
ICLR 2022
Agent57: Outperforming the Atari Human Benchmark
ICML 2020
Never Give Up: Learning Directed Exploration Strategies
ICLR 2020
Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning
NIPS 2020
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
ICML 2020
Hindsight Credit Assignment
NIPS 2019
The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
ICLR 2018
Actor-Critic Fictitious Play in Simultaneous Move Multistage Games
AISTATS 2018
Noisy Networks For Exploration
ICLR 2018
End-to-end optimization of goal-driven and visually grounded dialogue systems
IJCAI 2017
Is the Bellman residual a bad proxy?
NIPS 2017
Learning Nash Equilibrium for General-Sum Markov Games from Batch Data
AISTATS 2017
Softened Approximate Policy Iteration for Markov Games
ICML 2016
On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games
AISTATS 2016
Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
ICML 2015
Inverse Reinforcement Learning in Relational Domains
IJCAI 2015
Difference of Convex Functions Programming for Reinforcement Learning
NIPS 2014
Inverse Reinforcement Learning through Structured Classification
NIPS 2012