Research Explorer
Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
Off-Policy Imitation Learning from Observations
NIPS 2020
f-GAIL: Learning f-Divergence for Generative Adversarial Imitation Learning
NIPS 2020
PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning
NIPS 2020
Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting
NIPS 2020
Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement
NIPS 2020
Meta-Gradient Reinforcement Learning with an Objective Discovered Online
NIPS 2020
Error Bounds of Imitating Policies and Environments
NIPS 2020
Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping
NIPS 2020
A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods
NIPS 2020
Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
NIPS 2020
On Reward-Free Reinforcement Learning with Linear Function Approximation
NIPS 2020
Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
NIPS 2020
Actor-Double-Critic: Incorporating Model-Based Critic for Task-Oriented Dialogue Systems
EMNLP 2020
ERLP: Ensembles of Reinforcement Learning Policies (Student Abstract)
AAAI 2020
Third-Person Imitation Learning via Image Difference and Variational Discriminator Bottleneck (Student Abstract)
AAAI 2020
Hierarchical Average Reward Policy Gradient Algorithms (Student Abstract)
AAAI 2020
Tree-Structured Policy Based Progressive Reinforcement Learning for Temporally Language Grounding in Video
AAAI 2020
POST: POlicy-Based Switch Tracking
AAAI 2020
Learning from Interventions Using Hierarchical Policies for Safe Learning
AAAI 2020
The Choice Function Framework for Online Policy Improvement
AAAI 2020
Scalable Methods for Computing State Similarity in Deterministic Markov Decision Processes
AAAI 2020
Learning Calibratable Policies using Programmatic Style-Consistency
ICML 2020
Generative Adversarial Imitation Learning with Neural Network Parameterization: Global Optimality and Convergence Rate
ICML 2020
Adapting to Misspecification in Contextual Bandits
NIPS 2020
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
AAAI 2020