Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Formally Verified Approximate Policy Iteration
AAAI 2025
DisCo-DSO: Coupling Discrete and Continuous Optimization for Efficient Generative Design in Hybrid Spaces
AAAI 2025
Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models
AAAI 2025
Optimizing Heat Alert Issuance with Reinforcement Learning
AAAI 2025
PaSa: An LLM Agent for Comprehensive Academic Paper Search
ACL 2025
Optimizing Decomposition for Optimal Claim Verification
ACL 2025
Dynamic Scaling of Unit Tests for Code Reward Modeling
ACL 2025
ACECODER: Acing Coder RL via Automated Test-Case Synthesis
ACL 2025
YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering
ACL 2025
Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction
IJCNLP 2025
MasRouter: Learning to Route LLMs for Multi-Agent Systems
ACL 2025
Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL
ACL 2025
Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models
ACL 2025
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
IJCNLP 2025
Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively
ACL 2025
One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL
ACL 2025
Reinforcement Learning for Adversarial Query Generation to Enhance Relevance in Cold-Start Product Search
ACL 2025
Structured Document Translation via Format Reinforcement Learning
IJCNLP 2025
Boosting Policy and Process Reward Models with Monte Carlo Tree Search in Open-Domain QA
ACL 2025
CodePRM: Execution Feedback-enhanced Process Reward Model for Code Generation
ACL 2025
RaSS: Improving Denoising Diffusion Samplers with Reinforced Active Sampling Scheduler
CVPR 2025
Teaching Your Models to Understand Code via Focal Preference Alignment
EMNLP 2025
LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models
EMNLP 2025
LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization
EMNLP 2025
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
JMLR 2025