conftrace_

Csaba Szepesvári

158 papers · 2007–2025 · 11 conferences across top CS/AI venues

Achievements

🗺️ Taxonomy Completionist (48) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (8) 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (56) 🌟 Keyword Trendsetter Combo (3) 🌱 Topic Pioneer 👑 Triple Crown 🔬 Deep Specialist (11) 🏆 Keyword Champion (3) 🧬 Topic Evolution 🏆 Grand Slam 🤝 Dynamic Duo (33) ❓ The Questioner (3) 📈 Trend Setter 🚀 Conference Pioneer 🔥 Unstoppable (19) ⚡ Prolific Year (12) 💎 Century Club (158) 🗃️ Keyword Collector (201)

Conferences

NIPS (56) ICML (37) AISTATS (26) COLT (14) ALT (7) JMLR (6) UAI (4) ICLR (3) IJCAI (3) AAAI (1) L4DC (1)

Top co-authors

András György (33) Tor Lattimore (22) Branislav Kveton (16) Dale Schuurmans (15) Gellert Weisz (13) Bo Dai (11) Jincheng Mei (10) Yasin Abbasi-Yadkori (10) Mohammad Ghavamzadeh (9) Mengdi Wang (8)

Keywords

regret bound (49) online learning (31) multi-armed bandit (25) markov decision process (20) stochastic optimization (20) reinforcement learning (16) sample complexity (13) linear function approximation (12) policy iteration (9) regret analysis (8) partial monitoring (8) function approximation (8) regret minimization (7) online algorithm (7) thompson sampling (7) value function (7) contextual bandit (7) learning to rank (6) policy optimization (6) stochastic bandit (6)

Papers

Thompson Sampling for Bandit Convex Optimisation · COLT 2025
Almost Free: Self-concordance in Natural Exponential Families and an Application to Bandits · NIPS 2024
To Believe or Not to Believe Your LLM: Iterative Prompting for Estimating Epistemic Uncertainty · NIPS 2024
Ensemble sampling for linear bandits: small ensembles suffice · NIPS 2024
Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates · NIPS 2024
Switching the Loss Reduces the Cost in Batch Reinforcement Learning · ICML 2024
Exploration via linearly perturbed loss minimisation · AISTATS 2024
Stochastic Gradient Descent for Gaussian Processes Done Right · ICLR 2024
Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear $q^\pi$-Realizability and Concentrability · NIPS 2024
Confident Natural Policy Gradient for Local Planning in $q_\pi$-realizable Constrained MDPs · NIPS 2024
Context-lumpable stochastic bandits · NIPS 2023
Exponential Hardness of Reinforcement Learning with Linear Function Approximation · COLT 2023
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice · ICML 2023
Revisiting Simple Regret: Fast Rates for Returning a Good Arm · ICML 2023
Stochastic Gradient Succeeds for Bandits · ICML 2023
The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation · ICML 2023
Efficient Planning in Combinatorial Action Spaces with Applications to Cooperative Multi-Agent Reinforcement Learning · AISTATS 2023
Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore · NIPS 2023
Regret Minimization via Saddle Point Optimization · NIPS 2023
Ordering-based Conditions for Global Convergence of Policy Gradient Methods · NIPS 2023
Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL · NIPS 2023
Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics · ICLR 2023
The Role of Baselines in Policy Gradient Optimization · NIPS 2022
A free lunch from the noise: Provable and practical exploration for representation learning · UAI 2022
Towards painless policy optimization for constrained MDPs · UAI 2022
When Is Partially Observable Reinforcement Learning Not Scary? · COLT 2022
Efficient local planning with linear function approximation · ALT 2022
TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions · ALT 2022
The Curse of Passive Data Collection in Batch Reinforcement Learning · AISTATS 2022
Faster Rates, Adaptive Algorithms, and Finite-Time Bounds for Linear Composition Optimization and Gradient TD Learning · AISTATS 2022
Confident Least Square Value Iteration with Local Access to a Simulator · AISTATS 2022
Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization · NIPS 2022
Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs · NIPS 2022
Sample-Efficient Reinforcement Learning of Partially Observable Markov Games · NIPS 2022
Near-Optimal Sample Complexity Bounds for Constrained MDPs · NIPS 2022
On the Optimality of Batch Policy Optimization Algorithms · ICML 2021
On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method · NIPS 2021
Understanding the Effect of Stochasticity in Policy Optimization · NIPS 2021
No Regrets for Learning the Prior in Bandits · NIPS 2021
On the Role of Optimization in Double Descent: A Least Squares Study · NIPS 2021
Online Sparse Reinforcement Learning · AISTATS 2021
Adaptive Approximate Policy Iteration · AISTATS 2021
Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting · AISTATS 2021
Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions · ALT 2021
Asymptotically Optimal Information-Directed Sampling · COLT 2021
**Paper retracted by author request (see pdf for retraction notice from the authors)** Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping · COLT 2021
On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function · COLT 2021
Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes · COLT 2021
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient · ICML 2021
Bootstrapping Fitted Q-Evaluation for Off-Policy Inference · ICML 2021
A Distribution-dependent Analysis of Meta Learning · ICML 2021
Meta-Thompson Sampling · ICML 2021
Improved Regret Bound and Experience Replay in Regularized Policy Iteration · ICML 2021
Leveraging Non-uniformity in First-order Non-convex Optimization · ICML 2021
Tighter Risk Certificates for Neural Networks · JMLR 2021
A simpler approach to accelerated optimization: iterative averaging meets optimism · ICML 2020
Online Algorithm for Unsupervised Sequential Selection with Contextual Information · NIPS 2020
Differentiable Meta-Learning of Bandit Policies · NIPS 2020
Variational Policy Gradient Method for Reinforcement Learning with General Utilities · NIPS 2020
CoinDICE: Off-Policy Confidence Interval Estimation · NIPS 2020
Randomized Exploration in Generalized Linear Bandits · AISTATS 2020
Adaptive Exploration in Linear Contextual Bandit · AISTATS 2020
Model Selection in Contextual Stochastic Bandit Problems · NIPS 2020
PAC-Bayes Analysis Beyond the Usual Bounds · NIPS 2020
ImpatientCapsAndRuns: Approximately Optimal Algorithm Configuration from an Infinite Pool · NIPS 2020
Efficient Planning in Large MDPs with Weak Linear Function Approximation · NIPS 2020
Escaping the Gravitational Pull of Softmax · NIPS 2020
Model-Based Reinforcement Learning with Value-Targeted Regression · L4DC 2020
Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers · JMLR 2020
On the Global Convergence Rates of Softmax Policy Gradient Methods · ICML 2020
Learning with Good Feature Representations in Bandits and in RL with a Generative Model · ICML 2020
Exploration by Optimisation in Partial Monitoring · COLT 2020
Model-Based Reinforcement Learning with Value-Targeted Regression · ICML 2020
Behaviour Suite for Reinforcement Learning · ICLR 2020
Online Algorithm for Unsupervised Sensor Selection · AISTATS 2019
Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging · NIPS 2019
Detecting Overfitting via Adversarial Examples · NIPS 2019
Perturbed-History Exploration in Stochastic Multi-Armed Bandits · IJCAI 2019
An Information-Theoretic Approach to Minimax Regret in Partial Monitoring · COLT 2019
POLITEX: Regret Bounds for Policy Iteration using Expert Prediction · ICML 2019
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits · ICML 2019
Perturbed-History Exploration in Stochastic Linear Bandits · UAI 2019
BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback · UAI 2019
Model-Free Linear Quadratic Control via Reduction to Expert Prediction · AISTATS 2019
An Exponential Tail Bound for the Deleted Estimate · AAAI 2019
An Exponential Efron-Stein Inequality for $L_q$ Stable Learning Rules · ALT 2019
Cleaning up the neighborhood: A full classification for adversarial partial monitoring · ALT 2019
Online Learning to Rank with Features · ICML 2019
Distribution-Dependent Analysis of Gibbs-ERM Principle · COLT 2019
CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration · ICML 2019
LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration · ICML 2018
Bandits with Delayed, Aggregated Anonymous Feedback · ICML 2018
Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers · ICML 2018
TopRank: A practical algorithm for online stochastic ranking · NIPS 2018
PAC-Bayes bounds for stable algorithms with instance-dependent priors · NIPS 2018
Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go? · AISTATS 2018
Following the Leader and Fast Rates in Online Linear Prediction: Curved Constraint Sets and Other Regularities · JMLR 2017
A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds · ALT 2017
Stochastic Rank-1 Bandits · AISTATS 2017
The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits · AISTATS 2017
Unsupervised Sequential Sensor Acquisition · AISTATS 2017
Multi-view Matrix Factorization for Linear Dynamical System Estimation · NIPS 2017
Structured Best Arm Identification with Fixed Confidence · ALT 2017
Online Learning to Rank in Stochastic Click Models · ICML 2017
Bernoulli Rank-1 Bandits for Click Feedback · IJCAI 2017
Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities · NIPS 2016
(Bandit) Convex Optimization with Biased Noisy Gradient Oracles · AISTATS 2016
Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models · COLT 2016
Regularized Policy Iteration with Nonparametric Function Spaces · JMLR 2016
DCM Bandits: Learning to Rank with Multiple Clicks · ICML 2016
Conservative Bandits · ICML 2016
Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control · ICML 2016
Shifting Regret, Mirror Descent, and Matrices · ICML 2016
SDP Relaxation with Randomized Rounding for Energy Disaggregation · NIPS 2016
Exploiting Symmetries to Construct Efficient MCMC Algorithms With an Application to SLAM · AISTATS 2015
Fast Cross-Validation for Incremental Learning · IJCAI 2015
Online Learning with Gaussian Payoffs and Side Observations · NIPS 2015
Linear Multi-Resource Allocation with Semi-Bandit Feedback · NIPS 2015
Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path · NIPS 2015
Deterministic Independent Component Analysis · ICML 2015
Cascading Bandits: Learning to Rank in the Cascade Model · ICML 2015
On Identifying Good Options under Combinatorially Structured Feedback in Finite Noisy Environments · ICML 2015
Near-optimal max-affine estimators for convex regression · AISTATS 2015
Combinatorial Cascading Bandits · NIPS 2015
Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits · AISTATS 2015
Toward Minimax Off-policy Value Estimation · AISTATS 2015
Universal Option Models · NIPS 2014
Online Learning in Markov Decision Processes with Changing Cost Sequences · ICML 2014
A Finite-Sample Generalization Bound for Semiparametric Regression: Partially Linear Models · AISTATS 2014
Adaptive Monte Carlo via Bandit Allocation · ICML 2014
Online Learning with Costly Features and Labels · NIPS 2013
Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions · NIPS 2013
A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning · ICML 2013
Online Learning under Delayed Feedback · ICML 2013
Cost-sensitive Multiclass Classification Risk Bounds · ICML 2013
Characterizing the Representer Theorem · ICML 2013
Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits · AISTATS 2012
The adversarial stochastic shortest path problem with unknown transition probabilities · AISTATS 2012
Deep Representations and Codes for Image Auto-Annotation · NIPS 2012
Regret Bounds for the Adaptive Control of Linear Quadratic Systems · COLT 2011
Improved Algorithms for Linear Stochastic Bandits · NIPS 2011
X-Armed Bandits · JMLR 2011
Agnostic KWIK learning and efficient approximate reinforcement learning · COLT 2011
Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments · COLT 2011
Error Propagation for Approximate Policy and Value Iteration · NIPS 2010
Online Markov Decision Processes under Bandit Feedback · NIPS 2010
Parametric Bandits: The Generalized Linear Case · NIPS 2010
A Markov-Chain Monte Carlo Approach to Simultaneous Localization and Mapping · AISTATS 2010
Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs · NIPS 2010
REGO: Rank-based Estimation of Renyi Information using Euclidean Graph Optimization · AISTATS 2010
Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation · NIPS 2009
A General Projection Property for Distribution Families · NIPS 2009
Multi-Step Dyna Planning for Policy Evaluation and Control · NIPS 2009
Finite-Time Bounds for Fitted Value Iteration · JMLR 2008
Regularized Policy Iteration · NIPS 2008
A Convergent $O(n)$ Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation · NIPS 2008
Online Optimization in X-Armed Bandits · NIPS 2008
Fitted Q-iteration in continuous action-space MDPs · NIPS 2007