Research Explorer
Reinforcement Learning › Methods › Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
MTRL-CG: Multi-Task Reinforcement Learning Method with Spectral Clustering-Based Task Grouping
AAAI 2026
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
EACL 2026
LPPG-RL: Lexicographically Projected Policy Gradient Reinforcement Learning with Subproblem Exploration
AAAI 2026
RL-Studio: A System for Multi-Phase Reinforcement Learning Experimentation
AAAI 2026
Active Perception Meets Rule-Guided RL: A Two-Phase Approach for Precise Object Navigation in Complex Environments
ICCV 2025
Simple Policy Optimization
ICML 2025
SciCompanion: Graph-Grounded Reasoning for Structured Evaluation of Scientific Arguments
EMNLP 2025
RAVEN++: Pinpointing Fine-Grained Violations in Advertisement Videos with Active Reinforcement Reasoning
EMNLP 2025
ViUniT: Visual Unit Tests for More Robust Visual Programming
CVPR 2025
REARANK: Reasoning Re-ranking Agent via Reinforcement Learning
EMNLP 2025
SkillMimic: Learning Basketball Interaction Skills from Demonstrations
CVPR 2025
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
ICCV 2025
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
ICCV 2025
One Planner To Guide Them All ! Learning Adaptive Conversational Planners for Goal-oriented Dialogues
EMNLP 2025
Reward Mixology: Crafting Hybrid Signals for Reinforcement Learning Driven In-Context Learning
EMNLP 2025
RLHF Algorithms Ranked: An Extensive Evaluation Across Diverse Tasks, Rewards, and Hyperparameters
EMNLP 2025
Auto-Weighted Group Relative Preference Optimization for Multi-Objective Text Generation Tasks
EMNLP 2025
VLP: Vision-Language Preference Learning for Embodied Manipulation
EMNLP 2025
Token-level Proximal Policy Optimization for Query Generation
EMNLP 2025
RAG-Zeval: Enhancing RAG Responses Evaluator through End-to-End Reasoning and Ranking-Based Reinforcement Learning
EMNLP 2025
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
EMNLP 2025
Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning
EMNLP 2025
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
EMNLP 2025
Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization
EMNLP 2025
SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin
EMNLP 2025