Matteo Papini
25 papers · 2017–2026 · 8 top CS/AI conferences
Achievements
🌍 Conference Polyglot (8)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐣 Hot Topic Early Bird
🏃 Academic Marathon (8)
🌈 Renaissance Researcher (6)
🤝 Dynamic Duo (14)
🔬 Deep Specialist (12)
💎 Century Club (24)
🗃️ Keyword Collector (87)
⚡ Prolific Year (9)
🔥 Unstoppable (9)
Conferences
NIPS (7)
ICML (6)
AAAI (3)
AISTATS (2)
ALT (2)
COLT (2)
IJCAI (2)
JMLR (1)
Keywords
reinforcement learning (7)
policy gradient (6)
regret bound (5)
contextual bandit (4)
online learning (4)
policy optimization (4)
importance sampling (4)
continuous control (4)
markov decision process (4)
function approximation (4)
representation learning (3)
linear mdp (2)
sample complexity (2)
online algorithm (2)
constant regret (2)
policy search (2)
off-policy learning (2)
regret minimization (2)
variance reduction (2)
pessimistic estimator (2)
Papers
Do It for HER: First-Order Temporal Logic Reward Specification in Reinforcement Learning
AAAI 2026
Convergence Analysis of Policy Gradient Methods with Dynamic Stochasticity
ICML 2025
Importance-Weighted Offline Learning Done Right
ALT 2024
Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs
NIPS 2024
Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning
NIPS 2024
Offline Primal-Dual Reinforcement Learning for Linear MDPs
AISTATS 2024
Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs
COLT 2024
Optimistic Information Directed Sampling
COLT 2024
No-Regret Reinforcement Learning in Smooth MDPs
ICML 2024
Learning Optimal Deterministic Policies with Stochastic Policy Gradients
ICML 2024
Online Learning with Off-Policy Feedback in Adversarial MDPs
IJCAI 2024
Online Learning with Off-Policy Feedback
ALT 2023
Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits
NIPS 2022
Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
NIPS 2022
Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
NIPS 2021
Leveraging Good Representations in Linear Contextual Bandits
ICML 2021
Policy Optimization as Online Learning with Mediator Feedback
AAAI 2021
Importance Sampling Techniques for Policy Optimization
JMLR 2020
Balancing Learning Speed and Stability in Policy Gradient via Adaptive Exploration
AISTATS 2020
Gradient-Aware Model-Based Policy Search
AAAI 2020
Risk-Averse Trust Region Optimization for Reward-Volatility Reduction
IJCAI 2020
Optimistic Policy Optimization via Multiple Importance Sampling
ICML 2019
Stochastic Variance-Reduced Policy Gradient
ICML 2018
Policy Optimization via Importance Sampling
NIPS 2018
Adaptive Batch Size for Safe Policy Gradients
NIPS 2017