reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

Procedural Environment Generation for Tool-Use Agents EMNLP 2025

Flexible Thinking for Multimodal Emotional Support Conversation via Reinforcement Learning EMNLP 2025

Training Language Models to Critique With Multi-agent Feedback EMNLP 2025

LegalSim: Multi-Agent Simulation of Legal Systems for Discovering Procedural Exploits EMNLP 2025

AdaptThink: Reasoning Models Can Learn When to Think EMNLP 2025

Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series EMNLP 2025

PRED: Performance-oriented Random Early Detection for Consistently Stable Performance in Datacenters NSDI 2025

Evolutionary Large Language Model for Automated Feature Transformation AAAI 2025

GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning EMNLP 2025

Encouraging Good Processes Without the Need for Good Answers: Reinforcement Learning for LLM Agent Planning EMNLP 2025

Legal Mathematical Reasoning with LLMs: Procedural Alignment through Two-Stage Reinforcement Learning EMNLP 2025

LSRL: Process-Supervised GRPO on Latent Recurrent States Improves Mathematical Reasoning EMNLP 2025

Predicate-Guided Generation for Mathematical Reasoning EMNLP 2025

Learning a Continue-Thinking Token for Enhanced Test-Time Scaling IJCNLP 2025

ControlMed: Adding Reasoning Control to Medical Language Model IJCNLP 2025

INREACT: An Inspire-Then-Reinforce Training Framework For Multimodal GUI Agent EMNLP 2025

Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization IJCNLP 2025

KERLQA: Knowledge-Enhanced Reinforcement Learning for Question Answering in Low-resource Languages IJCNLP 2025

DRBO: Mitigating the Bottleneck Effect via Dynamic Reward Balancing in Multi-reward LLM Optimization EMNLP 2025

MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning EMNLP 2025

Structured Document Translation via Format Reinforcement Learning IJCNLP 2025

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs IJCNLP 2025

LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization EMNLP 2025

Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction IJCNLP 2025

Marco Large Translation Model at WMT2025: Transforming Translation Capability in LLMs via Quality-Aware Training and Decoding EMNLP 2025