reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

Token-Level Accept or Reject: A Micro Alignment Approach for Large Language Models IJCAI 2025

AdsQA: Towards Advertisement Video Understanding ICCV 2025

StoryLLaVA: Enhancing Visual Storytelling with Multi-Modal Large Language Models COLING 2025

Procedural Environment Generation for Tool-Use Agents EMNLP 2025

Steering LLM Reasoning Through Bias-Only Adaptation EMNLP 2025

R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory ACL 2025

Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization ACL 2025

Domain Randomization is Sample Efficient for Linear Quadratic Control L4DC 2025

Group-Aware Reinforcement Learning for Output Diversity in Large Language Models EMNLP 2025

DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding ICCV 2025

SORREL: Suboptimal-Demonstration-Guided Reinforcement Learning for Learning to Branch AAAI 2025

Tag-Instruct: Controlled Instruction Complexity Enhancement through Structure-based Augmentation ACL 2025

FRACTAL: Fine-Grained Scoring from Aggregate Text Labels ACL 2025

Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond ACL 2025

The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning IJCAI 2025

RL + Transformer = A General-Purpose Problem Solver ACL 2025

Team XSZ at BioLaySumm2025: Section-Wise Summarization, Retrieval-Augmented LLM, and Reinforcement Learning Fine-Tuning for Lay Summaries ACL 2025

Efficient and Robust Reinforcement Learning from Human Feedback AAAI 2025

Grounding Open-Domain Knowledge from LLMs to Real-World Reinforcement Learning Tasks: A Survey IJCAI 2025

Overview of the BioLaySumm 2025 Shared Task on Lay Summarization of Biomedical Research Articles and Radiology Reports ACL 2025

bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning ACL 2025

EFormer: An Effective Edge-based Transformer for Vehicle Routing Problems IJCAI 2025

Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback ACL 2025

Text2World: Benchmarking Large Language Models for Symbolic World Model Generation ACL 2025

PRED: Performance-oriented Random Early Detection for Consistently Stable Performance in Datacenters NSDI 2025