Research Explorer
Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
The Accuracy Paradox in RLHF: When Better Reward Models Don’t Yield Better Language Models
EMNLP 2024
When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback
NeurIPS 2024
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use
EMNLP 2024
Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning
JMLR 2024
Resilient Constrained Reinforcement Learning
AISTATS 2024
Occupancy-based Policy Gradient: Estimation, Convergence, and Optimality
NeurIPS 2024
Timing as an Action: Learning When to Observe and Act
AISTATS 2024
A Grounded Preference Model for LLM Alignment
ACL 2024
Reward Engineering for Generating Semi-structured Explanation
EACL 2024
Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics
RSS 2024
Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees
NeurIPS 2024
POLICEd RL: Learning Closed-Loop Robot Control Policies with Provable Satisfaction of Hard Constraints
RSS 2024
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
EMNLP 2024
Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator
NeurIPS 2024
Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning
NeurIPS 2024
ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
EMNLP 2024
Taking Action Towards Graceful Interaction: The Effects of Performing Actions on Modelling Policies for Instruction Clarification Requests
EACL 2024
Minimax-optimal reward-agnostic exploration in reinforcement learning
COLT 2024
A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning
AISTATS 2024
ERL-TD: Evolutionary Reinforcement Learning Enhanced with Truncated Variance and Distillation Mutation
AAAI 2024
Imitating Language via Scalable Inverse Reinforcement Learning
NeurIPS 2024
OCEAN-MBRL: Offline Conservative Exploration for Model-Based Offline Reinforcement Learning
AAAI 2024
Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes
AISTATS 2024
Relative Policy-Transition Optimization for Fast Policy Transfer
AAAI 2024
Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability
COLT 2024