Reinforcement Learning

2932 directly classified papers

Papers

Don’t Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls ACL 2025

Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback ACL 2025

bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning ACL 2025

Understanding Reference Policies in Direct Preference Optimization NAACL 2025

A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy EMNLP 2025

LookAlike: Consistent Distractor Generation in Math MCQs ACL 2025

Henry at BEA 2025 Shared Task: Improving AI Tutor’s Guidance Evaluation Through Context-Aware Distillation ACL 2025

Positive Experience Reflection for Agents in Interactive Text Environments ACL 2025

Sparse Rewards Can Self-Train Dialogue Agents ACL 2025

Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation EMNLP 2025

Search-in-Context: Efficient Multi-Hop QA over Long Contexts via Monte Carlo Tree Search with Dynamic KV Retrieval ACL 2025

Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning ACL 2025

Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty EMNLP 2025

Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL ACL 2025

Training Language Model to Critique for Better Refinement ACL 2025

Adversarial Preference Learning for Robust LLM Alignment ACL 2025

A Reinforcement Learning Framework for Cross-Lingual Stance Detection Using Chain-of-Thought Alignment ACL 2025

Understand the Implication: Learning to Think for Pragmatic Understanding ACL 2025

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning EMNLP 2025

WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning EMNLP 2025

ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning EMNLP 2025

Removing Prompt-template Bias in Reinforcement Learning from Human Feedback ACL 2025

DynaQuest: A Dynamic Question Answering Dataset Reflecting Real-World Knowledge Updates ACL 2025

The Power of Simplicity in LLM-Based Event Forecasting ACL 2025

Mutual-Taught for Co-adapting Policy and Reward Models ACL 2025