conftrace_

reinforcement learning

4352 papers

Also known as

RL, REINFORCE

Co-occurring keywords

large language model (13587), policy learning (702), markov decision process (790), policy optimization (657), policy gradient (520), deep reinforcement learning (903), multi-agent system (1819), imitation learning (744), regret bound (1926), language model (4599)

Papers

Convert Language Model into a Value-based Strategic Planner ACL 2025

iManip: Skill-Incremental Learning for Robotic Manipulation ICCV 2025

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning ICCV 2025

Reinforcement Learning for Adversarial Query Generation to Enhance Relevance in Cold-Start Product Search ACL 2025

ASTRO: Automatic Strategy Optimization For Non-Cooperative Dialogues ACL 2025

Reinforcement Learning-Guided Data Selection via Redundancy Assessment ICCV 2025

Boosting MLLM Reasoning with Text-Debiased Hint-GRPO ICCV 2025

Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models EMNLP 2025

PROGRESSOR: A Perceptually Guided Reward Estimator with Self-Supervised Online Refinement ICCV 2025

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training ICCV 2025

CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards EMNLP 2025

Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling ACL 2025

RTADev: Intention Aligned Multi-Agent Framework for Software Development ACL 2025

MOERL: When Mixture-of-Experts Meet Reinforcement Learning for Adverse Weather Image Restoration ICCV 2025

Visual-RFT: Visual Reinforcement Fine-Tuning ICCV 2025

Search-o1: Agentic Search-Enhanced Large Reasoning Models EMNLP 2025

Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation EMNLP 2025

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill CVPR 2025

DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics ACL 2025

CEAES: Bidirectional Reinforcement Learning Optimization for Consistent and Explainable Essay Assessment ACL 2025

Can GRPO Boost Complex Multimodal Table Understanding? EMNLP 2025

MuTIS: Enhancing Reasoning Efficiency through Multi Turn Intervention Sampling in Reinforcement Learning EMNLP 2025

EditGRPO: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation AACL 2025

Evolutionary Large Language Model for Automated Feature Transformation AAAI 2025

Steering LLM Reasoning Through Bias-Only Adaptation EMNLP 2025