reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

PUER: Boosting Few-shot Positive-Unlabeled Entity Resolution with Reinforcement Learning EMNLP 2025

Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series EMNLP 2025

DRBO: Mitigating the Bottleneck Effect via Dynamic Reward Balancing in Multi-reward LLM Optimization EMNLP 2025

VerIF: Verification Engineering for Reinforcement Learning in Instruction Following EMNLP 2025

INREACT: An Inspire-Then-Reinforce Training Framework For Multimodal GUI Agent EMNLP 2025

GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration EMNLP 2025

Teaching LLMs to Plan, Not Just Solve: Plan Learning Boosts LLMs Generalization in Reasoning Tasks EMNLP 2025

Training Medical QA Models Based on Mixed Rewards from Multiple-Choice and Open-Ended Questions EMNLP 2025

MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning EMNLP 2025

Teaching Language Models To Gather Information Proactively EMNLP 2025

Reinforcement Learning with Supervised Alignment EMNLP 2025

Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment EMNLP 2025

Teaching Models to Improve on Tape AAAI 2025

All-Optical Nonlinear Diffractive Deep Network for Ultrafast Image Denoising CVPR 2025

Scale Down to Speed Up: Dynamic Data Selection for Reinforcement Learning EMNLP 2025

Fast Quiet-STaR: Thinking Without Thought Tokens EMNLP 2025

Encouraging Good Processes Without the Need for Good Answers: Reinforcement Learning for LLM Agent Planning EMNLP 2025

Speaking at the Right Level: Literacy-Controlled Counterspeech Generation with RAG-RL EMNLP 2025

Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation CVPR 2025

Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning CVPR 2025

ReAL: How Can LLMs Simulate the Real Teacher? Retrieval-enhanced Agent for Adaptive Learning EMNLP 2025

GTA: Supervised-Guided Reinforcement Learning for Text Classification with Large Language Models EMNLP 2025

Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG EMNLP 2025

BackMATH: Towards Backward Reasoning for Solving Math Problems Step by Step COLING 2025

TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use EMNLP 2025