Research Explorer
Multi-Armed Bandits
1044 directly classified papers
Papers per year
2002: 1
2006: 2
2007: 3
2008: 5
2009: 3
2010: 5
2011: 23
2012: 16
2013: 32
2014: 42
2015: 27
2016: 33
2017: 46
2018: 55
2019: 80
2020: 87
2021: 124
2022: 160
2023: 136
2024: 126
2025: 38
Papers
Minimax Optimal Algorithms for Fixed-Budget Best Arm Identification
NIPS 2022
Smoothed Adversarial Linear Contextual Bandits with Knapsacks
ICML 2022
Trading Off Resource Budgets For Improved Regret Bounds
NIPS 2022
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
ICML 2022
Breaking the $\sqrt{T}$ Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits
ICML 2022
Best Heuristic Identification for Constraint Satisfaction
IJCAI 2022
Factored DRO: Factored Distributionally Robust Policies for Contextual Bandits
NIPS 2022
Deep Hierarchy in Bandits
ICML 2022
A Reduction from Linear Contextual Bandits Lower Bounds to Estimations Lower Bounds
ICML 2022
Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees
NIPS 2022
Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits
ICML 2022
Regret Minimization with Performative Feedback
ICML 2022
Inverse Contextual Bandits: Learning How Behavior Evolves over Time
ICML 2022
IMED-RL: Regret optimal learning of ergodic Markov decision processes
NIPS 2022
Towards Off-Policy Learning for Ranking Policies with Logged Feedback
AAAI 2022
Differentially Private Regret Minimization in Episodic Markov Decision Processes
AAAI 2022
Fixed-Budget Best-Arm Identification in Structured Bandits
IJCAI 2022
Effective Dimension in Bandit Problems under Censorship
NIPS 2022
Instance Dependent Regret Analysis of Kernelized Bandits
ICML 2022
Multi-slots Online Matching with High Entropy
ICML 2022
No-regret learning in games with noisy feedback: Faster rates and adaptivity via learning rate separation
NIPS 2022
A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits
AAAI 2022
Learning in Congestion Games with Bandit Feedback
NIPS 2022
When Combinatorial Thompson Sampling meets Approximation Regret
NIPS 2022
An $\alpha$-No-Regret Algorithm For Graphical Bilinear Bandits
NIPS 2022