Rémi Munos
117 papers · 2006–2025 · 7 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer
Hot Topic Early Bird
Interdisciplinary Bridge
Taxonomy Completionist (43)
Conference Polyglot (7)
Academic Marathon (19)
Cross-Pollinator (13)
Keyword Trendsetter Combo (6)
Conference Loyalist (46)
Lone Wolf (3)
Dynamic Duo (35)
Triple Crown
Topic Pioneer
Deep Specialist (15)
Keyword Champion
Century Club (117)
Unstoppable (20)
Keyword Collector (212)
Trend Setter
Conference Pioneer
Prolific Year (5)
Conferences
NIPS (46)
ICML (41)
JMLR (11)
AISTATS (10)
ICLR (7)
ACML (1)
COLT (1)
Research topics
Keywords
reinforcement learning (23)
multi-armed bandit (17)
regret bound (17)
markov decision process (12)
value function (11)
stochastic optimization (10)
variance reduction (9)
deep reinforcement learning (9)
distributional reinforcement learning (9)
sample complexity (9)
value iteration (7)
policy gradient (7)
off-policy learning (7)
online algorithm (7)
policy optimization (7)
representation learning (6)
online learning (6)
stratified sampling (5)
game theory (5)
nash equilibrium (5)
Papers
Optimizing Return Distributions with Distributional Dynamic Programming
JMLR 2025
Temporal Difference Flows
ICML 2025
Optimizing Language Models for Inference Time Objectives using Reinforcement Learning
ICML 2025
Human Alignment of Large Language Models through Online Preference Optimisation
ICML 2024
An Analysis of Quantile Temporal-Difference Learning
JMLR 2024
A General Theoretical Paradigm to Understand Learning from Human Preferences
AISTATS 2024
Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model
NIPS 2024
Multi-turn Reinforcement Learning with Preference Human Feedback
NIPS 2024
Local and Adaptive Mirror Descents in Extensive-Form Games
NIPS 2024
Generalized Preference Optimization: A Unified Approach to Offline Alignment
ICML 2024
Nash Learning from Human Feedback
ICML 2024
Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition
ICML 2023
Towards a better understanding of representation dynamics under TD-learning
ICML 2023
Fast Rates for Maximum Entropy Exploration
ICML 2023
VA-learning as a more efficient alternative to Q-learning
ICML 2023
Model-free Posterior Sampling via Learning Rate Randomization
NIPS 2023
DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm
ICML 2023
Understanding Self-Predictive Learning for Reinforcement Learning
ICML 2023
The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
ICML 2023
Quantile Credit Assignment
ICML 2023
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
ICML 2023
Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments
ICML 2023
Adapting to game trees in zero-sum imperfect information games
ICML 2023
Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees
NIPS 2022
Marginalized Operators for Off-policy Reinforcement Learning
AISTATS 2022
Generalised Policy Improvement with Geometric Policy Composition
ICML 2022
The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning
NIPS 2022
BYOL-Explore: Exploration by Bootstrapped Prediction
NIPS 2022
Large-Scale Representation Learning on Graphs via Bootstrapping
ICLR 2022
Taylor Expansion of Discount Factors
ICML 2021
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
NIPS 2021
Learning in two-player zero-sum partially observable Markov games with perfect recall
NIPS 2021
Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning
ICML 2021
Counterfactual Credit Assignment in Model-Free Reinforcement Learning
ICML 2021
From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization
ICML 2021
Monte-Carlo Tree Search as Regularized Policy Optimization
ICML 2020
Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning
NIPS 2020
Taylor Expansion Policy Optimization
ICML 2020
Adaptive Trade-Offs in Off-Policy Learning
AISTATS 2020
Conditional Importance Sampling for Off-Policy Learning
AISTATS 2020
Spectral bandits
JMLR 2020
A Generalized Training Approach for Multiagent Learning
ICLR 2020
Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning
NIPS 2020
Fast computation of Nash Equilibria in Imperfect Information Games
ICML 2020
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
ICML 2020
Recurrent Experience Replay in Distributed Reinforcement Learning
ICLR 2019
Hindsight Credit Assignment
NIPS 2019
Planning in entropy-regularized Markov decision processes and games
NIPS 2019
Multiagent Evaluation under Incomplete Information
NIPS 2019
The Termination Critic
AISTATS 2019
Statistics and Samples in Distributional Reinforcement Learning
ICML 2019
Universal Successor Features Approximators
ICLR 2019
Maximum a Posteriori Policy Optimisation
ICLR 2018
Optimistic optimization of a Brownian
NIPS 2018
Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
NIPS 2018
An Analysis of Categorical Distributional Reinforcement Learning
AISTATS 2018
The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
ICLR 2018
Noisy Networks For Exploration
ICLR 2018
Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement
ICML 2018
Implicit Quantile Networks for Distributional Reinforcement Learning
ICML 2018
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
ICML 2018
Learning to search with MCTSnets
ICML 2018
The Uncertainty Bellman Equation and Exploration
ICML 2018
Autoregressive Quantile Networks for Generative Modeling
ICML 2018
A Distributional Perspective on Reinforcement Learning
ICML 2017
Minimax Regret Bounds for Reinforcement Learning
ICML 2017
Successor Features for Transfer in Reinforcement Learning
NIPS 2017
Count-Based Exploration with Neural Density Models
ICML 2017
Automated Curriculum Learning for Neural Networks
ICML 2017
Memory-Efficient Backpropagation Through Time
NIPS 2016
Safe and Efficient Off-Policy Reinforcement Learning
NIPS 2016
Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning
NIPS 2016
Analysis of Classification-based Policy Iteration Algorithms
JMLR 2016
Unifying Count-Based Exploration and Intrinsic Motivation
NIPS 2016
Adaptive Strategy for Stratified Monte Carlo Sampling
JMLR 2015
Cheap Bandits
ICML 2015
Black-box optimization of noisy functions with unknown smoothness
NIPS 2015
Toward Minimax Off-policy Value Estimation
AISTATS 2015
Active Regression by Stratification
NIPS 2014
Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem
ICML 2014
Spectral Bandits for Smooth Graph Functions
ICML 2014
Best-Arm Identification in Linear Bandits
NIPS 2014
Optimistic Planning in Markov Decision Processes Using a Generative Model
NIPS 2014
Bounded Regret for Finite-Armed Structured Bandits
NIPS 2014
Efficient learning by implicit exploration in bandit problems with side observations
NIPS 2014
Thompson Sampling for 1-Dimensional Exponential Family Bandits
NIPS 2013
Toward Optimal Stratification for Stratified Monte-Carlo Integration
ICML 2013
Stochastic Simultaneous Optimistic Optimization
ICML 2013
Aggregating Optimistic Planning Trees for Solving Markov Decision Processes
NIPS 2013
Risk-Aversion in Multi-armed Bandits
NIPS 2012
Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button
NIPS 2012
Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions
NIPS 2012
Finite-Sample Analysis of Least-Squares Policy Iteration
JMLR 2012
Linear Regression With Random Projections
JMLR 2012
Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit
AISTATS 2012
Optimistic planning for Markov decision processes
AISTATS 2012
Speedy Q-Learning
NIPS 2011
Selecting the State-Representation in Reinforcement Learning
NIPS 2011
Finite Time Analysis of Stratified Sampling for Monte Carlo
NIPS 2011
Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
NIPS 2011
Sparse Recovery with Brownian Sensing
NIPS 2011
A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences
COLT 2011
X-Armed Bandits
JMLR 2011
Adaptive Bandits: Towards the best history-dependent strategy
AISTATS 2011
LSTD with Random Projections
NIPS 2010
Finite-sample Analysis of Bellman Residual Minimization
ACML 2010
Scrambled Objects for Least-Squares Regression
NIPS 2010
Error Propagation for Approximate Policy and Value Iteration
NIPS 2010
Sensitivity analysis in HMMs with application to likelihood maximization
NIPS 2009
Compressed Least-Squares Regression
NIPS 2009
Particle Filter-based Policy Gradient in POMDPs
NIPS 2008
Online Optimization in X-Armed Bandits
NIPS 2008
Algorithms for Infinitely Many-Armed Bandits
NIPS 2008
Finite-Time Bounds for Fitted Value Iteration
JMLR 2008
Fitted Q-iteration in continuous action-space MDPs
NIPS 2007
Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation
JMLR 2006
Policy Gradient in Continuous Time
JMLR 2006