Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Don’t Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls
ACL 2025
Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback
ACL 2025
bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning
ACL 2025
Understanding Reference Policies in Direct Preference Optimization
NAACL 2025
A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy
EMNLP 2025
LookAlike: Consistent Distractor Generation in Math MCQs
ACL 2025
Henry at BEA 2025 Shared Task: Improving AI Tutor’s Guidance Evaluation Through Context-Aware Distillation
ACL 2025
Positive Experience Reflection for Agents in Interactive Text Environments
ACL 2025
Sparse Rewards Can Self-Train Dialogue Agents
ACL 2025
Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
EMNLP 2025
Search-in-Context: Efficient Multi-Hop QA over Long Contexts via Monte Carlo Tree Search with Dynamic KV Retrieval
ACL 2025
Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning
ACL 2025
Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty
EMNLP 2025
Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
ACL 2025
Training Language Model to Critique for Better Refinement
ACL 2025
Adversarial Preference Learning for Robust LLM Alignment
ACL 2025
A Reinforcement Learning Framework for Cross-Lingual Stance Detection Using Chain-of-Thought Alignment
ACL 2025
Understand the Implication: Learning to Think for Pragmatic Understanding
ACL 2025
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
EMNLP 2025
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
EMNLP 2025
ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning
EMNLP 2025
Removing Prompt-template Bias in Reinforcement Learning from Human Feedback
ACL 2025
DynaQuest: A Dynamic Question Answering Dataset Reflecting Real-World Knowledge Updates
ACL 2025
The Power of Simplicity in LLM-Based Event Forecasting
ACL 2025
Mutual-Taught for Co-adapting Policy and Reward Models
ACL 2025