Reinforcement Learning

2932 directly classified papers

Papers

Formally Verified Approximate Policy Iteration AAAI 2025

DisCo-DSO: Coupling Discrete and Continuous Optimization for Efficient Generative Design in Hybrid Spaces AAAI 2025

Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models AAAI 2025

Optimizing Heat Alert Issuance with Reinforcement Learning AAAI 2025

PaSa: An LLM Agent for Comprehensive Academic Paper Search ACL 2025

Optimizing Decomposition for Optimal Claim Verification ACL 2025

Dynamic Scaling of Unit Tests for Code Reward Modeling ACL 2025

ACECODER: Acing Coder RL via Automated Test-Case Synthesis ACL 2025

YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering ACL 2025

Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction IJCNLP 2025

MasRouter: Learning to Route LLMs for Multi-Agent Systems ACL 2025

Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL ACL 2025

Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models ACL 2025

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs IJCNLP 2025

Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively ACL 2025

One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL ACL 2025

Reinforcement Learning for Adversarial Query Generation to Enhance Relevance in Cold-Start Product Search ACL 2025

Structured Document Translation via Format Reinforcement Learning IJCNLP 2025

Boosting Policy and Process Reward Models with Monte Carlo Tree Search in Open-Domain QA ACL 2025

CodePRM: Execution Feedback-enhanced Process Reward Model for Code Generation ACL 2025

RaSS: Improving Denoising Diffusion Samplers with Reinforced Active Sampling Scheduler CVPR 2025

Teaching Your Models to Understand Code via Focal Preference Alignment EMNLP 2025

LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models EMNLP 2025

LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization EMNLP 2025

Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF JMLR 2025