reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

EvolveSearch: An Iterative Self-Evolving Search Agent EMNLP 2025

Process-Supervised Reinforcement Learning for Code Generation EMNLP 2025

The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning IJCAI 2025

A Case for Validation Buffer in Pessimistic Actor-Critic IJCAI 2025

DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding ICCV 2025

LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization EMNLP 2025

From General Reward to Targeted Reward: Improving Open-ended Long-context Generation Models EMNLP 2025

InnateCoder: Learning Programmatic Options with Foundation Models IJCAI 2025

Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction IJCNLP 2025

CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning EMNLP 2025

AlphaGAT: A Two-Stage Learning Approach for Adaptive Portfolio Selection IJCAI 2025

Simulate, Refine and Integrate: Strategy Synthesis for Efficient SMT Solving IJCAI 2025

WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning EMNLP 2025

Continuous-Time Reward Machines IJCAI 2025

ADPFedGNN: Adaptive Decoupling Personalized Federated Graph Neural Network IJCAI 2025

Rewarding Explainability in Drug Repurposing with Knowledge Graphs IJCAI 2025

Causal-aware Large Language Models: Enhancing Decision-Making Through Learning, Adapting and Acting IJCAI 2025

Counterfactual Explanations for Continuous Action Reinforcement Learning IJCAI 2025

Robustness to Spurious Correlations via Dynamic Knowledge Transfer IJCAI 2025

EFormer: An Effective Edge-based Transformer for Vehicle Routing Problems IJCAI 2025

ERFSL: An Efficient Reward Function Searcher via Large Language Models for Custom-Environment Multi-Objective Reinforcement Learning (Student Abstract) AAAI 2025

Neurosymbolic Reinforcement Learning: Playing MiniHack with Probabilistic Logic Shields AAAI 2025

Temporal Consistency Constrained Transferable Adversarial Attacks with Background Mixup for Action Recognition IJCAI 2025

GARLIC: GPT-Augmented Reinforcement Learning with Intelligent Control for Vehicle Dispatching AAAI 2025

Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains EMNLP 2025