Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
ACL 2025
Steering LLM Reasoning Through Bias-Only Adaptation
EMNLP 2025
Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models
EMNLP 2025
Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework
EMNLP 2025
From Outcomes to Processes: Guiding PRM Learning from ORM for Inference-Time Alignment
ACL 2025
Structural Reward Model: Enhancing Interpretability, Efficiency, and Scalability in Reward Modeling
EMNLP 2025
Mapping Smarter, Not Harder: A Test-Time Reinforcement Learning Agent That Improves Without Labels or Model Updates
EMNLP 2025
T-REG: Preference Optimization with Token-Level Reward Regularization
ACL 2025
AutoDSPy: Automating Modular Prompt Design with Reinforcement Learning for Small and Large Language Models
EMNLP 2025
STACKFEED: Structured Textual Actor-Critic Knowledge base editing with FEEDback
EMNLP 2025
Don’t Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls
ACL 2025
Learning to Translate Ambiguous Terminology by Preference Optimization on Post-Edits
EMNLP 2025
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
ACL 2025
Removing Prompt-template Bias in Reinforcement Learning from Human Feedback
ACL 2025
Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving
ACL 2025
CheXalign: Preference fine-tuning in chest X-ray interpretation models without human feedback
ACL 2025
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
ACL 2025
FastMCTS: A Simple Sampling Strategy for Data Synthesis
ACL 2025
Uncertainty-Aware Iterative Preference Optimization for Enhanced LLM Reasoning
ACL 2025
InspireDebate: Multi-Dimensional Subjective-Objective Evaluation-Guided Reasoning and Optimization for Debating
ACL 2025
HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks
ACL 2025
Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences
ACL 2025
Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning
ACL 2025
PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment
ACL 2025
ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
EMNLP 2025