reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

An Approach towards Unsupervised Text Simplification on Paragraph-Level for German Texts (COLING 2024)

Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation (NAACL 2024)

Safe & Accurate at Speed with Tendons: A Robot Arm for Exploring Dynamic Motion (RSS 2024)

Keep it Private: Unsupervised Privatization of Online Text (NAACL 2024)

AGILE: A Novel Reinforcement Learning Framework of LLM Agents (NIPS 2024)

QueST: Self-Supervised Skill Abstractions for Learning Continuous Control (NIPS 2024)

RePair: Automated Program Repair with Process-based Feedback (ACL 2024)

Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents (EMNLP 2024)

Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training (EMNLP 2024)

Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient (NIPS 2024)

Adversarial Environment Design via Regret-Guided Diffusion Models (NIPS 2024)

Fast two-time-scale stochastic gradient method with applications in reinforcement learning (COLT 2024)

PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference (NIPS 2024)

Reinforcement Learning for Athletic Intelligence: Lessons from the 1st “AI Olympics with RealAIGym” Competition (IJCAI 2024)

TRIP NEGOTIATOR: A Travel Persona-aware Reinforced Dialogue Generation Model for Personalized Integrative Negotiation in Tourism (EMNLP 2024)

Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue (EMNLP 2024)

Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation (NIPS 2024)

RLOP: A Framework for Reinforcement Learning, Optimization and Planning Algorithms (IJCAI 2024)

MACAROON: Training Vision-Language Models To Be Your Engaged Partners (EMNLP 2024)

E2CL: Exploration-based Error Correction Learning for Embodied Agents (EMNLP 2024)

Occupancy-based Policy Gradient: Estimation, Convergence, and Optimality (NIPS 2024)

Near-Optimal Distributionally Robust Reinforcement Learning with General $L_p$ Norms (NIPS 2024)

Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs (COLT 2024)

Exploiting Careful Design of SVM Solution for Aspect-term Sentiment Analysis (EMNLP 2024)

A Fairness-Driven Method for Learning Human-Compatible Negotiation Strategies (EMNLP 2024)