Policy Learning

2068 directly classified papers

Papers

Receding Horizon Inverse Reinforcement Learning NIPS 2022

A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines ICML 2022

Actor-Critic based Improper Reinforcement Learning ICML 2022

Exponential Family Model-Based Reinforcement Learning via Score Matching NIPS 2022

Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning ICML 2022

Robust Policy Learning over Multiple Uncertainty Sets ICML 2022

Nearly Optimal Policy Optimization with Stable at Any Time Guarantee ICML 2022

Understanding Policy Gradient Algorithms: A Sensitivity-Based Approach ICML 2022

Robust Anytime Learning of Markov Decision Processes NIPS 2022

Policy Gradient Method For Robust Reinforcement Learning ICML 2022

The Geometry of Robust Value Functions ICML 2022

Safe Exploration for Efficient Policy Evaluation and Comparison ICML 2022

Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes ICML 2022

A Temporal-Difference Approach to Policy Gradient Estimation ICML 2022

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses ICML 2022

Generalised Policy Improvement with Geometric Policy Composition ICML 2022

Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments ICML 2022

A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation NIPS 2022

Do Differentiable Simulators Give Better Policy Gradients? ICML 2022

Divergence-Regularized Multi-Agent Actor-Critic ICML 2022

Sauté RL: Almost Surely Safe Reinforcement Learning Using State Augmentation ICML 2022

Disentangling Sources of Risk for Distributional Multi-Agent Reinforcement Learning ICML 2022

Communicating via Markov Decision Processes ICML 2022

Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning ICML 2022

Optimal Estimation of Policy Gradient via Double Fitted Iteration ICML 2022