Research Explorer
Reinforcement Learning › Methods › Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
EMNLP 2024
OCEAN-MBRL: Offline Conservative Exploration for Model-Based Offline Reinforcement Learning
AAAI 2024
Model-Based Transfer Learning for Contextual Reinforcement Learning
NIPS 2024
Relative Policy-Transition Optimization for Fast Policy Transfer
AAAI 2024
AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers
NIPS 2024
A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
AISTATS 2024
Sub-optimal Experts mitigate Ambiguity in Inverse Reinforcement Learning
NIPS 2024
Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue
EMNLP 2024
Minimax-optimal reward-agnostic exploration in reinforcement learning
COLT 2024
Exploiting the Replay Memory Before Exploring the Environment: Enhancing Reinforcement Learning Through Empirical MDP Iteration
NIPS 2024
Filtered Direct Preference Optimization
EMNLP 2024
An Analytical Study of Utility Functions in Multi-Objective Reinforcement Learning
NIPS 2024
Reward Modeling Requires Automatic Adjustment Based on Data Quality
EMNLP 2024
A Fairness-Driven Method for Learning Human-Compatible Negotiation Strategies
EMNLP 2024
EPO: Hierarchical LLM Agents with Environment Preference Optimization
EMNLP 2024
ABLE: Personalized Disability Support with Politeness and Empathy Integration
EMNLP 2024
Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation
NIPS 2024
Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code
EMNLP 2024
ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback
EMNLP 2024
Measuring Mutual Policy Divergence for Multi-Agent Sequential Exploration
NIPS 2024
Rethinking the Role of Proxy Rewards in Language Model Alignment
EMNLP 2024
Mitigating Open-Vocabulary Caption Hallucinations
EMNLP 2024
Exploiting Careful Design of SVM Solution for Aspect-term Sentiment Analysis
EMNLP 2024
Dynamic Multi-Reward Weighting for Multi-Style Controllable Generation
EMNLP 2024
Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs
NIPS 2024