reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755); policy learning (699); markov decision process (788); policy gradient (518); policy optimization (630); deep reinforcement learning (903); multi-agent system (1743); imitation learning (741); regret bound (1918); language model (4573)

Papers

Imitation Learning Backoff: Reinforcement Learning-based Channel Access for Guaranteeing Fairness (Student Abstract) AAAI 2025

Towards Building Human-like Smart Agents in Modern 3D Video Games (Student Abstract) AAAI 2025

UACOF: A USV-AUV Collaboration Framework for Underwater Tasks Under Extreme Sea Conditions (Student Abstract) AAAI 2025

AI-Driven Multicultural Identity Preservation AAAI 2025

Optimal Viewpoint Selection for Autonomous Photography Using Reinforcement Learning AAAI 2025

Assess and Prompt: A Generative RL Framework for Improving Engagement in Online Mental Health Communities EMNLP 2025

Following Length Constraints in Instructions EMNLP 2025

Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers EMNLP 2025

ThinkTuning: Instilling Cognitive Reflections without Distillation EMNLP 2025

Enhancing Goal-oriented Proactive Dialogue Systems via Dynamic Multi-dimensional Consistency Optimization EMNLP 2025

Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning) EMNLP 2025

EQA-RM: A Generative Embodied Reward Model with Test-time Scaling EMNLP 2025

PATeam at SemEval-2025 Task 10: Two-stage News Analytical Framework: Target-oriented Semantic Segmentation and Sequence Generation LLMs for Cross-Lingual Entity and Narrative Analysis SEMEVAL 2025

Cold Starts and Hard Cases: A Two-Stage SFT-RLVR Approach for Legal Machine Translation (Just-NLP L-MT shared task) IJCNLP 2025

A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy EMNLP 2025

RLHF Algorithms Ranked: An Extensive Evaluation Across Diverse Tasks, Rewards, and Hyperparameters EMNLP 2025

When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning EMNLP 2025

Terminology-Constrained Translation from Monolingual Data Using GRPO EMNLP 2025

Sketch-to-Skill: Bootstrapping Robot Learning with Human Drawn Trajectory Sketches RSS 2025

Demonstrating GPU Parallelized Robot Simulation and Rendering for Generalizable Embodied AI with ManiSkill3 RSS 2025

Building Helpful-Only Large Language Models: A Complete Approach from Motivation to Evaluation IJCNLP 2025

SLM-SQL: An Exploration of Small Language Models for Text-to-SQL IJCNLP 2025

Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation CVPR 2025

Deep Reinforcement Learning with Time-Scale Invariant Memory AAAI 2025

ViUniT: Visual Unit Tests for More Robust Visual Programming CVPR 2025