Research Explorer
Reinforcement Learning › Methods › Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
Global Convergence of Two-Timescale Actor-Critic for Solving Linear Quadratic Regulator
AAAI 2023
For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal
ICML 2023
Why Target Networks Stabilise Temporal Difference Methods
ICML 2023
Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies
ICML 2023
Optimizing DDPM Sampling with Shortcut Fine-Tuning
ICML 2023
A Connection between One-Step RL and Critic Regularization in Reinforcement Learning
ICML 2023
Does Sparsity Help in Learning Misspecified Linear Bandits?
ICML 2023
Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path
ICML 2023
Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes
NeurIPS 2023
Best of Both Worlds Policy Optimization
ICML 2023
Inverse Reinforcement Learning for Text Summarization
EMNLP 2023
Reinforcement Learning Can Be More Efficient with Multiple Rewards
ICML 2023
Modified Policy Iteration for Exponential Cost Risk Sensitive MDPs
L4DC 2023
Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning
ICML 2023
Practical Critic Gradient based Actor Critic for On-Policy Reinforcement Learning
L4DC 2023
Refined Regret for Adversarial MDPs with Linear Function Approximation
ICML 2023
Lower Bounds for Learning in Revealing POMDPs
ICML 2023
Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm
AAAI 2023
Correcting discount-factor mismatch in on-policy policy gradient methods
ICML 2023
Identification of Blackwell Optimal Policies for Deterministic MDPs
AISTATS 2023
Scalable Safe Policy Improvement via Monte Carlo Tree Search
ICML 2023
Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation
NeurIPS 2023
Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games
JMLR 2023
Learning Shared Safety Constraints from Multi-task Demonstrations
NeurIPS 2023
Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
NeurIPS 2023