reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

Continuous-Time Reward Machines IJCAI 2025

Token-Level Accept or Reject: A Micro Alignment Approach for Large Language Models IJCAI 2025

WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning EMNLP 2025

Robustness to Spurious Correlations via Dynamic Knowledge Transfer IJCAI 2025

Cold Starts and Hard Cases: A Two-Stage SFT-RLVR Approach for Legal Machine Translation (Just-NLP L-MT shared task) IJCNLP 2025

EvolveSearch: An Iterative Self-Evolving Search Agent EMNLP 2025

Knowledge-Aware Co-Reasoning for Multidisciplinary Collaboration EMNLP 2025

InnateCoder: Learning Programmatic Options with Foundation Models IJCAI 2025

Process-Supervised Reinforcement Learning for Code Generation EMNLP 2025

Think Wider, Detect Sharper: Reinforced Reference Coverage for Document-Level Self-Contradiction Detection EMNLP 2025

Governance in Motion: Co-evolution of Constitutions and AI models for Scalable Safety EMNLP 2025

CARE: Multilingual Human Preference Learning for Cultural Awareness EMNLP 2025

Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning EMNLP 2025

Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty EMNLP 2025

Reinforced Query Reasoners for Reasoning-intensive Retrieval Tasks EMNLP 2025

Speaking at the Right Level: Literacy-Controlled Counterspeech Generation with RAG-RL EMNLP 2025

s3: You Don’t Need That Much Data to Train a Search Agent via RL EMNLP 2025

Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment EMNLP 2025

Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning EMNLP 2025

Removing Prompt-template Bias in Reinforcement Learning from Human Feedback ACL 2025

Towards Automatic Sampling of User Behaviors for Sequential Recommender Systems IJCAI 2025

Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future Directions EMNLP 2025

VerIF: Verification Engineering for Reinforcement Learning in Instruction Following EMNLP 2025

From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning EMNLP 2025

LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback ACL 2025