Csaba Szepesvári
158 papers · 2007–2025 · 11 conferences · across top CS/AI conferences
Achievements
🗺️ Taxonomy Completionist (48)
🧭 Keyword Pioneer
🐣 Hot Topic Early Bird
🌈 Renaissance Researcher (8)
🌉 Interdisciplinary Bridge
🏠 Conference Loyalist (56)
🌟 Keyword Trendsetter Combo (3)
🌱 Topic Pioneer
👑 Triple Crown
🔬 Deep Specialist (11)
🏆 Keyword Champion (3)
🧬 Topic Evolution
🏆 Grand Slam
🤝 Dynamic Duo (33)
❓ The Questioner (3)
📈 Trend Setter
🚀 Conference Pioneer
🔥 Unstoppable (19)
⚡ Prolific Year (12)
💎 Century Club (158)
🗃️ Keyword Collector (201)
Conferences
NIPS (56)
ICML (37)
AISTATS (26)
COLT (14)
ALT (7)
JMLR (6)
UAI (4)
ICLR (3)
IJCAI (3)
AAAI (1)
L4DC (1)
Keywords
regret bound (49)
online learning (31)
multi-armed bandit (25)
markov decision process (20)
stochastic optimization (20)
reinforcement learning (16)
sample complexity (13)
linear function approximation (12)
policy iteration (9)
regret analysis (8)
partial monitoring (8)
function approximation (8)
regret minimization (7)
online algorithm (7)
thompson sampling (7)
value function (7)
contextual bandit (7)
learning to rank (6)
policy optimization (6)
stochastic bandit (6)
Papers
Thompson Sampling for Bandit Convex Optimisation
COLT 2025
Almost Free: Self-concordance in Natural Exponential Families and an Application to Bandits
NIPS 2024
To Believe or Not to Believe Your LLM: Iterative Prompting for Estimating Epistemic Uncertainty
NIPS 2024
Ensemble sampling for linear bandits: small ensembles suffice
NIPS 2024
Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates
NIPS 2024
Switching the Loss Reduces the Cost in Batch Reinforcement Learning
ICML 2024
Exploration via linearly perturbed loss minimisation
AISTATS 2024
Stochastic Gradient Descent for Gaussian Processes Done Right
ICLR 2024
Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear $q^\pi$-Realizability and Concentrability
NIPS 2024
Confident Natural Policy Gradient for Local Planning in $q_\pi$-realizable Constrained MDPs
NIPS 2024
Context-lumpable stochastic bandits
NIPS 2023
Exponential Hardness of Reinforcement Learning with Linear Function Approximation
COLT 2023
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
ICML 2023
Revisiting Simple Regret: Fast Rates for Returning a Good Arm
ICML 2023
Stochastic Gradient Succeeds for Bandits
ICML 2023
The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation
ICML 2023
Efficient Planning in Combinatorial Action Spaces with Applications to Cooperative Multi-Agent Reinforcement Learning
AISTATS 2023
Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore
NIPS 2023
Regret Minimization via Saddle Point Optimization
NIPS 2023
Ordering-based Conditions for Global Convergence of Policy Gradient Methods
NIPS 2023
Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL
NIPS 2023
Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics
ICLR 2023
The Role of Baselines in Policy Gradient Optimization
NIPS 2022
A free lunch from the noise: Provable and practical exploration for representation learning
UAI 2022
Towards painless policy optimization for constrained MDPs
UAI 2022
When Is Partially Observable Reinforcement Learning Not Scary?
COLT 2022
Efficient local planning with linear function approximation
ALT 2022
TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions
ALT 2022
The Curse of Passive Data Collection in Batch Reinforcement Learning
AISTATS 2022
Faster Rates, Adaptive Algorithms, and Finite-Time Bounds for Linear Composition Optimization and Gradient TD Learning
AISTATS 2022
Confident Least Square Value Iteration with Local Access to a Simulator
AISTATS 2022
Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization
NIPS 2022
Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs
NIPS 2022
Sample-Efficient Reinforcement Learning of Partially Observable Markov Games
NIPS 2022
Near-Optimal Sample Complexity Bounds for Constrained MDPs
NIPS 2022
On the Optimality of Batch Policy Optimization Algorithms
ICML 2021
On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
NIPS 2021
Understanding the Effect of Stochasticity in Policy Optimization
NIPS 2021
No Regrets for Learning the Prior in Bandits
NIPS 2021
On the Role of Optimization in Double Descent: A Least Squares Study
NIPS 2021
Online Sparse Reinforcement Learning
AISTATS 2021
Adaptive Approximate Policy Iteration
AISTATS 2021
Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting
AISTATS 2021
Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions
ALT 2021
Asymptotically Optimal Information-Directed Sampling
COLT 2021
**Paper retracted by author request (see pdf for retraction notice from the authors)** Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping
COLT 2021
On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function
COLT 2021
Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes
COLT 2021
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
ICML 2021
Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
ICML 2021
A Distribution-dependent Analysis of Meta Learning
ICML 2021
Meta-Thompson Sampling
ICML 2021
Improved Regret Bound and Experience Replay in Regularized Policy Iteration
ICML 2021
Leveraging Non-uniformity in First-order Non-convex Optimization
ICML 2021
Tighter Risk Certificates for Neural Networks
JMLR 2021
A simpler approach to accelerated optimization: iterative averaging meets optimism
ICML 2020
Online Algorithm for Unsupervised Sequential Selection with Contextual Information
NIPS 2020
Differentiable Meta-Learning of Bandit Policies
NIPS 2020
Variational Policy Gradient Method for Reinforcement Learning with General Utilities
NIPS 2020
CoinDICE: Off-Policy Confidence Interval Estimation
NIPS 2020
Randomized Exploration in Generalized Linear Bandits
AISTATS 2020
Adaptive Exploration in Linear Contextual Bandit
AISTATS 2020
Model Selection in Contextual Stochastic Bandit Problems
NIPS 2020
PAC-Bayes Analysis Beyond the Usual Bounds
NIPS 2020
ImpatientCapsAndRuns: Approximately Optimal Algorithm Configuration from an Infinite Pool
NIPS 2020
Efficient Planning in Large MDPs with Weak Linear Function Approximation
NIPS 2020
Escaping the Gravitational Pull of Softmax
NIPS 2020
Model-Based Reinforcement Learning with Value-Targeted Regression
L4DC 2020
Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers
JMLR 2020
On the Global Convergence Rates of Softmax Policy Gradient Methods
ICML 2020
Learning with Good Feature Representations in Bandits and in RL with a Generative Model
ICML 2020
Exploration by Optimisation in Partial Monitoring
COLT 2020
Model-Based Reinforcement Learning with Value-Targeted Regression
ICML 2020
Behaviour Suite for Reinforcement Learning
ICLR 2020
Online Algorithm for Unsupervised Sensor Selection
AISTATS 2019
Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging
NIPS 2019
Detecting Overfitting via Adversarial Examples
NIPS 2019
Perturbed-History Exploration in Stochastic Multi-Armed Bandits
IJCAI 2019
An Information-Theoretic Approach to Minimax Regret in Partial Monitoring
COLT 2019
POLITEX: Regret Bounds for Policy Iteration using Expert Prediction
ICML 2019
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
ICML 2019
Perturbed-History Exploration in Stochastic Linear Bandits
UAI 2019
BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback
UAI 2019
Model-Free Linear Quadratic Control via Reduction to Expert Prediction
AISTATS 2019
An Exponential Tail Bound for the Deleted Estimate
AAAI 2019
An Exponential Efron-Stein Inequality for $L_q$ Stable Learning Rules
ALT 2019
Cleaning up the neighborhood: A full classification for adversarial partial monitoring
ALT 2019
Online Learning to Rank with Features
ICML 2019
Distribution-Dependent Analysis of Gibbs-ERM Principle
COLT 2019
CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration
ICML 2019
LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration
ICML 2018
Bandits with Delayed, Aggregated Anonymous Feedback
ICML 2018
Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers
ICML 2018
TopRank: A practical algorithm for online stochastic ranking
NIPS 2018
PAC-Bayes bounds for stable algorithms with instance-dependent priors
NIPS 2018
Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go?
AISTATS 2018
Following the Leader and Fast Rates in Online Linear Prediction: Curved Constraint Sets and Other Regularities
JMLR 2017
A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds
ALT 2017
Stochastic Rank-1 Bandits
AISTATS 2017
The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits
AISTATS 2017
Unsupervised Sequential Sensor Acquisition
AISTATS 2017
Multi-view Matrix Factorization for Linear Dynamical System Estimation
NIPS 2017
Structured Best Arm Identification with Fixed Confidence
ALT 2017
Online Learning to Rank in Stochastic Click Models
ICML 2017
Bernoulli Rank-1 Bandits for Click Feedback
IJCAI 2017
Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities
NIPS 2016
(Bandit) Convex Optimization with Biased Noisy Gradient Oracles
AISTATS 2016
Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models
COLT 2016
Regularized Policy Iteration with Nonparametric Function Spaces
JMLR 2016
DCM Bandits: Learning to Rank with Multiple Clicks
ICML 2016
Conservative Bandits
ICML 2016
Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control
ICML 2016
Shifting Regret, Mirror Descent, and Matrices
ICML 2016
SDP Relaxation with Randomized Rounding for Energy Disaggregation
NIPS 2016
Exploiting Symmetries to Construct Efficient MCMC Algorithms With an Application to SLAM
AISTATS 2015
Fast Cross-Validation for Incremental Learning
IJCAI 2015
Online Learning with Gaussian Payoffs and Side Observations
NIPS 2015
Linear Multi-Resource Allocation with Semi-Bandit Feedback
NIPS 2015
Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path
NIPS 2015
Deterministic Independent Component Analysis
ICML 2015
Cascading Bandits: Learning to Rank in the Cascade Model
ICML 2015
On Identifying Good Options under Combinatorially Structured Feedback in Finite Noisy Environments
ICML 2015
Near-optimal max-affine estimators for convex regression
AISTATS 2015
Combinatorial Cascading Bandits
NIPS 2015
Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
AISTATS 2015
Toward Minimax Off-policy Value Estimation
AISTATS 2015
Universal Option Models
NIPS 2014
Online Learning in Markov Decision Processes with Changing Cost Sequences
ICML 2014
A Finite-Sample Generalization Bound for Semiparametric Regression: Partially Linear Models
AISTATS 2014
Adaptive Monte Carlo via Bandit Allocation
ICML 2014
Online Learning with Costly Features and Labels
NIPS 2013
Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
NIPS 2013
A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning
ICML 2013
Online Learning under Delayed Feedback
ICML 2013
Cost-sensitive Multiclass Classification Risk Bounds
ICML 2013
Characterizing the Representer Theorem
ICML 2013
Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits
AISTATS 2012
The adversarial stochastic shortest path problem with unknown transition probabilities
AISTATS 2012
Deep Representations and Codes for Image Auto-Annotation
NIPS 2012
Regret Bounds for the Adaptive Control of Linear Quadratic Systems
COLT 2011
Improved Algorithms for Linear Stochastic Bandits
NIPS 2011
X-Armed Bandits
JMLR 2011
Agnostic KWIK learning and efficient approximate reinforcement learning
COLT 2011
Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments
COLT 2011
Error Propagation for Approximate Policy and Value Iteration
NIPS 2010
Online Markov Decision Processes under Bandit Feedback
NIPS 2010
Parametric Bandits: The Generalized Linear Case
NIPS 2010
A Markov-Chain Monte Carlo Approach to Simultaneous Localization and Mapping
AISTATS 2010
Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs
NIPS 2010
REGO: Rank-based Estimation of Renyi Information using Euclidean Graph Optimization
AISTATS 2010
Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation
NIPS 2009
A General Projection Property for Distribution Families
NIPS 2009
Multi-Step Dyna Planning for Policy Evaluation and Control
NIPS 2009
Finite-Time Bounds for Fitted Value Iteration
JMLR 2008
Regularized Policy Iteration
NIPS 2008
A Convergent $O(n)$ Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation
NIPS 2008
Online Optimization in X-Armed Bandits
NIPS 2008
Fitted Q-iteration in continuous action-space MDPs
NIPS 2007