conftrace_

Rémi Munos

117 papers · 2006–2025 · 7 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (43) 🌍 Conference Polyglot (7)

🏃 Academic Marathon (19) 🐝 Cross-Pollinator (13) 🌟 Keyword Trendsetter Combo (6) 🏠 Conference Loyalist (46) 🐺 Lone Wolf (3) 🤝 Dynamic Duo (35) 👑 Triple Crown 🌱 Topic Pioneer 🔬 Deep Specialist (15) 🏆 Keyword Champion 💎 Century Club (117) 🔥 Unstoppable (20) 🗃️ Keyword Collector (212) 📈 Trend Setter 🚀 Conference Pioneer ⚡ Prolific Year (5)

Conferences

NIPS (46) ICML (41) JMLR (11) AISTATS (10) ICLR (7) ACML (1) COLT (1)

Top co-authors

Michal Valko (35) Mark Rowland (27) Will Dabney (23) Yunhao Tang (23) Mohammad Gheshlaghi Azar (14) Bilal Piot (12) Daniele Calandriello (10) Bernardo Avila Pires (9) Zhaohan Daniel Guo (8) Pierre Menard (8)

Research topics

Applications (2) Statistics (1)

Keywords

reinforcement learning (23) multi-armed bandit (17) regret bound (17) markov decision process (12) value function (11) stochastic optimization (10) variance reduction (9) deep reinforcement learning (9) distributional reinforcement learning (9) sample complexity (9) value iteration (7) policy gradient (7) off-policy learning (7) online algorithm (7) policy optimization (7) representation learning (6) online learning (6) stratified sampling (5) game theory (5) nash equilibrium (5)

Papers

Optimizing Return Distributions with Distributional Dynamic Programming · JMLR 2025
Temporal Difference Flows · ICML 2025
Optimizing Language Models for Inference Time Objectives using Reinforcement Learning · ICML 2025
Human Alignment of Large Language Models through Online Preference Optimisation · ICML 2024
An Analysis of Quantile Temporal-Difference Learning · JMLR 2024
A General Theoretical Paradigm to Understand Learning from Human Preferences · AISTATS 2024
Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model · NIPS 2024
Multi-turn Reinforcement Learning with Preference Human Feedback · NIPS 2024
Local and Adaptive Mirror Descents in Extensive-Form Games · NIPS 2024
Generalized Preference Optimization: A Unified Approach to Offline Alignment · ICML 2024
Nash Learning from Human Feedback · ICML 2024
Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition · ICML 2023
Towards a better understanding of representation dynamics under TD-learning · ICML 2023
Fast Rates for Maximum Entropy Exploration · ICML 2023
VA-learning as a more efficient alternative to Q-learning · ICML 2023
Model-free Posterior Sampling via Learning Rate Randomization · NIPS 2023
DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm · ICML 2023
Understanding Self-Predictive Learning for Reinforcement Learning · ICML 2023
The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation · ICML 2023
Quantile Credit Assignment · ICML 2023
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice · ICML 2023
Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments · ICML 2023
Adapting to game trees in zero-sum imperfect information games · ICML 2023
Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees · NIPS 2022
Marginalized Operators for Off-policy Reinforcement Learning · AISTATS 2022
Generalised Policy Improvement with Geometric Policy Composition · ICML 2022
The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning · NIPS 2022
BYOL-Explore: Exploration by Bootstrapped Prediction · NIPS 2022
Large-Scale Representation Learning on Graphs via Bootstrapping · ICLR 2022
Taylor Expansion of Discount Factors · ICML 2021
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation · NIPS 2021
Learning in two-player zero-sum partially observable Markov games with perfect recall · NIPS 2021
Revisiting Peng’s Q($λ$) for Modern Reinforcement Learning · ICML 2021
Counterfactual Credit Assignment in Model-Free Reinforcement Learning · ICML 2021
From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization · ICML 2021
Monte-Carlo Tree Search as Regularized Policy Optimization · ICML 2020
Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning · NIPS 2020
Taylor Expansion Policy Optimization · ICML 2020
Adaptive Trade-Offs in Off-Policy Learning · AISTATS 2020
Conditional Importance Sampling for Off-Policy Learning · AISTATS 2020
Spectral bandits · JMLR 2020
A Generalized Training Approach for Multiagent Learning · ICLR 2020
Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning · NIPS 2020
Fast computation of Nash Equilibria in Imperfect Information Games · ICML 2020
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning · ICML 2020
Recurrent Experience Replay in Distributed Reinforcement Learning · ICLR 2019
Hindsight Credit Assignment · NIPS 2019
Planning in entropy-regularized Markov decision processes and games · NIPS 2019
Multiagent Evaluation under Incomplete Information · NIPS 2019
The Termination Critic · AISTATS 2019
Statistics and Samples in Distributional Reinforcement Learning · ICML 2019
Universal Successor Features Approximators · ICLR 2019
Maximum a Posteriori Policy Optimisation · ICLR 2018
Optimistic optimization of a Brownian · NIPS 2018
Actor-Critic Policy Optimization in Partially Observable Multiagent Environments · NIPS 2018
An Analysis of Categorical Distributional Reinforcement Learning · AISTATS 2018
The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning · ICLR 2018
Noisy Networks For Exploration · ICLR 2018
Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement · ICML 2018
Implicit Quantile Networks for Distributional Reinforcement Learning · ICML 2018
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures · ICML 2018
Learning to search with MCTSnets · ICML 2018
The Uncertainty Bellman Equation and Exploration · ICML 2018
Autoregressive Quantile Networks for Generative Modeling · ICML 2018
A Distributional Perspective on Reinforcement Learning · ICML 2017
Minimax Regret Bounds for Reinforcement Learning · ICML 2017
Successor Features for Transfer in Reinforcement Learning · NIPS 2017
Count-Based Exploration with Neural Density Models · ICML 2017
Automated Curriculum Learning for Neural Networks · ICML 2017
Memory-Efficient Backpropagation Through Time · NIPS 2016
Safe and Efficient Off-Policy Reinforcement Learning · NIPS 2016
Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning · NIPS 2016
Analysis of Classification-based Policy Iteration Algorithms · JMLR 2016
Unifying Count-Based Exploration and Intrinsic Motivation · NIPS 2016
Adaptive Strategy for Stratified Monte Carlo Sampling · JMLR 2015
Cheap Bandits · ICML 2015
Black-box optimization of noisy functions with unknown smoothness · NIPS 2015
Toward Minimax Off-policy Value Estimation · AISTATS 2015
Active Regression by Stratification · NIPS 2014
Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem · ICML 2014
Spectral Bandits for Smooth Graph Functions · ICML 2014
Best-Arm Identification in Linear Bandits · NIPS 2014
Optimistic Planning in Markov Decision Processes Using a Generative Model · NIPS 2014
Bounded Regret for Finite-Armed Structured Bandits · NIPS 2014
Efficient learning by implicit exploration in bandit problems with side observations · NIPS 2014
Thompson Sampling for 1-Dimensional Exponential Family Bandits · NIPS 2013
Toward Optimal Stratification for Stratified Monte-Carlo Integration · ICML 2013
Stochastic Simultaneous Optimistic Optimization · ICML 2013
Aggregating Optimistic Planning Trees for Solving Markov Decision Processes · NIPS 2013
Risk-Aversion in Multi-armed Bandits · NIPS 2012
Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button · NIPS 2012
Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions · NIPS 2012
Finite-Sample Analysis of Least-Squares Policy Iteration · JMLR 2012
Linear Regression With Random Projections · JMLR 2012
Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit · AISTATS 2012
Optimistic planning for Markov decision processes · AISTATS 2012
Speedy Q-Learning · NIPS 2011
Selecting the State-Representation in Reinforcement Learning · NIPS 2011
Finite Time Analysis of Stratified Sampling for Monte Carlo · NIPS 2011
Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness · NIPS 2011
Sparse Recovery with Brownian Sensing · NIPS 2011
A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences · COLT 2011
X-Armed Bandits · JMLR 2011
Adaptive Bandits: Towards the best history-dependent strategy · AISTATS 2011
LSTD with Random Projections · NIPS 2010
Finite-sample Analysis of Bellman Residual Minimization · ACML 2010
Scrambled Objects for Least-Squares Regression · NIPS 2010
Error Propagation for Approximate Policy and Value Iteration · NIPS 2010
Sensitivity analysis in HMMs with application to likelihood maximization · NIPS 2009
Compressed Least-Squares Regression · NIPS 2009
Particle Filter-based Policy Gradient in POMDPs · NIPS 2008
Online Optimization in X-Armed Bandits · NIPS 2008
Algorithms for Infinitely Many-Armed Bandits · NIPS 2008
Finite-Time Bounds for Fitted Value Iteration · JMLR 2008
Fitted Q-iteration in continuous action-space MDPs · NIPS 2007
Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation · JMLR 2006
Policy Gradient in Continuous Time · JMLR 2006