Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards
EMNLP 2024
Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics
RSS 2024
Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
EMNLP 2024
Mitigating Open-Vocabulary Caption Hallucinations
EMNLP 2024
Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code
EMNLP 2024
A Fairness-Driven Method for Learning Human-Compatible Negotiation Strategies
EMNLP 2024
Reward Modeling Requires Automatic Adjustment Based on Data Quality
EMNLP 2024
E2CL: Exploration-based Error Correction Learning for Embodied Agents
EMNLP 2024
Exploiting Careful Design of SVM Solution for Aspect-term Sentiment Analysis
EMNLP 2024
POLICEd RL: Learning Closed-Loop Robot Control Policies with Provable Satisfaction of Hard Constraints
RSS 2024
Rating-Based Reinforcement Learning
AAAI 2024
Fast two-time-scale stochastic gradient method with applications in reinforcement learning
COLT 2024
TRIP NEGOTIATOR: A Travel Persona-aware Reinforced Dialogue Generation Model for Personalized Integrative Negotiation in Tourism
EMNLP 2024
Towards Achieving Sub-linear Regret and Hard Constraint Violation in Model-free RL
AISTATS 2024
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization
EMNLP 2024
Positivity-free Policy Learning with Observational Data
AISTATS 2024
Enhancing Alignment using Curriculum Learning & Ranked Preferences
EMNLP 2024
Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability
COLT 2024
Any-point Trajectory Modeling for Policy Learning
RSS 2024
Do no harm: A counterfactual approach to safe reinforcement learning
L4DC 2024
Solving Long-run Average Reward Robust MDPs via Stochastic Games
IJCAI 2024
Meta-learning linear quadratic regulators: a policy gradient MAML approach for model-free LQR
L4DC 2024
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
EMNLP 2024
Dynamic Multi-Reward Weighting for Multi-Style Controllable Generation
EMNLP 2024
Minimax-optimal reward-agnostic exploration in reinforcement learning
COLT 2024