Reinforcement Learning

2932 directly classified papers

Papers

Alignment with Fill-In-the-Middle for Enhancing Code Generation EMNLP 2025

Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization EMNLP 2025

Predicate-Guided Generation for Mathematical Reasoning EMNLP 2025

A Comprehensive Survey on Learning from Rewards for Large Language Models: Reward Models and Learning Strategies EMNLP 2025

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning ICCV 2025

START: Self-taught Reasoner with Tools EMNLP 2025

Case-Based Decision-Theoretic Decoding with Quality Memories EMNLP 2025

Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG EMNLP 2025

SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin EMNLP 2025

Mitigating Object Hallucinations via Sentence-Level Early Intervention ICCV 2025

NeLLCom-Lex: A Neural-agent Framework to Study the Interplay between Lexical Systems and Language Use EMNLP 2025

ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning EMNLP 2025

DynaQuest: A Dynamic Question Answering Dataset Reflecting Real-World Knowledge Updates ACL 2025

Northeastern Uni at Multilingual Counterspeech Generation: Enhancing Counter Speech Generation with LLM Alignment through Direct Preference Optimization COLING 2025

RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation EMNLP 2025

Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards EMNLP 2025

Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method CVPR 2025

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards CVPR 2025

Exploration-Driven Generative Interactive Environments CVPR 2025

Decision SpikeFormer: Spike-Driven Transformer for Decision Making CVPR 2025

Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning CVPR 2025

STACKFEED: Structured Textual Actor-Critic Knowledge base editing with FEEDback EMNLP 2025

Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting NAACL 2025

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation CVPR 2025

ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation NAACL 2025