reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

STACKFEED: Structured Textual Actor-Critic Knowledge base editing with FEEDback EMNLP 2025

Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers EMNLP 2025

Unfamiliar Finetuning Examples Control How Language Models Hallucinate NAACL 2025

Representation-driven Option Discovery in Reinforcement Learning AAAI 2025

A Practical Analysis of Human Alignment with *PO NAACL 2025

Co-Learning of Strategy and Structure Achieves Full Cooperation in Complex Networks with Dynamical Linking IJCAI 2025

Adapting LLM Agents with Universal Communication Feedback NAACL 2025

Revisiting Early Detection of Sexual Predators via Turn-level Optimization NAACL 2025

NAT: Enhancing Agent Tuning with Negative Samples NAACL 2025

Optimal Viewpoint Selection for Autonomous Photography Using Reinforcement Learning AAAI 2025

ThinkTuning: Instilling Cognitive Reflections without Distillation EMNLP 2025

Semi-Markovian Planning to Coordinate Aerial and Maritime Medical Evacuation Platforms AAAI 2025

Pretrained Image-Text Models are Secretly Video Captioners NAACL 2025

MSMAR-RL: Multi-Step Masked-Attention Recovery Reinforcement Learning for Safe Maneuver Decision in High-Speed Pursuit-Evasion Game IJCAI 2025

StoryLLaVA: Enhancing Visual Storytelling with Multi-Modal Large Language Models COLING 2025

One fish, two fish, but not the whole sea: Alignment reduces language models’ conceptual diversity NAACL 2025

Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs NAACL 2025

Atoxia: Red-teaming Large Language Models with Target Toxic Answers NAACL 2025

ITERATE: Image-Text Enhancement, Retrieval, and Alignment for Transmodal Evolution with LLMs COLING 2025

Can Large Language Models Invent Algorithms to Improve Themselves?: Algorithm Discovery for Recursive Self-Improvement through Reinforcement Learning NAACL 2025

Aligning Sentence Simplification with ESL Learner’s Proficiency for Language Acquisition NAACL 2025

Improving Reward Models with Synthetic Critiques NAACL 2025

BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in Alignment NAACL 2025

Kill two birds with one stone: generalized and robust AI-generated text detection via dynamic perturbations NAACL 2025

Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition AAAI 2025