Research Explorer
Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
Low-rank MDPs with Continuous Action Spaces
AISTATS 2024
Meta-learning linear quadratic regulators: a policy gradient MAML approach for model-free LQR
L4DC 2024
Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs
NIPS 2024
Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
JMLR 2024
NeoRL: Efficient Exploration for Nonepisodic RL
NIPS 2024
On the uniqueness of solution for the Bellman equation of LTL objectives
L4DC 2024
Data-Efficient Policy Evaluation Through Behavior Policy Search
JMLR 2024
Learning Optimal Advantage from Preferences and Mistaking It for Reward
AAAI 2024
F2RL: Factuality and Faithfulness Reinforcement Learning Framework for Claim-Guided Evidence-Supported Counterspeech Generation
EMNLP 2024
ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
EMNLP 2024
A theoretical case-study of Scalable Oversight in Hierarchical Reinforcement Learning
NIPS 2024
Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence beyond the Minty Property
AAAI 2024
Hierarchical Planning and Learning for Robots in Stochastic Settings Using Zero-Shot Option Invention
AAAI 2024
Dynamic Multi-Reward Weighting for Multi-Style Controllable Generation
EMNLP 2024
Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning
NIPS 2024
Risk-sensitive control as inference with Rényi divergence
NIPS 2024
Rethinking Inverse Reinforcement Learning: from Data Alignment to Task Alignment
NIPS 2024
GO-DICE: Goal-Conditioned Option-Aware Offline Imitation Learning via Stationary Distribution Correction Estimation
AAAI 2024
A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective
NIPS 2024
Meta-Inverse Reinforcement Learning for Mean Field Games via Probabilistic Context Variables
AAAI 2024
DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization
AAAI 2024
Mitigating Open-Vocabulary Caption Hallucinations
EMNLP 2024
C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory
NIPS 2024
Beyond Expected Return: Accounting for Policy Reproducibility When Evaluating Reinforcement Learning Algorithms
AAAI 2024
Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation
COLING 2024