reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), Markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion EMNLP 2024

Towards Achieving Sub-linear Regret and Hard Constraint Violation in Model-free RL AISTATS 2024

Rethinking the Role of Proxy Rewards in Language Model Alignment EMNLP 2024

ToolPlanner: A Tool Augmented LLM for Multi Granularity Instructions with Path Planning and Feedback EMNLP 2024

Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards EMNLP 2024

Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems EMNLP 2024

CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework NIPS 2024

If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions EMNLP 2024

Learning to Retrieve Iteratively for In-Context Learning EMNLP 2024

CoGen: Learning from Feedback with Coupled Comprehension and Generation EMNLP 2024

F2RL: Factuality and Faithfulness Reinforcement Learning Framework for Claim-Guided Evidence-Supported Counterspeech Generation EMNLP 2024

Bootstrapped Policy Learning for Task-oriented Dialogue through Goal Shaping EMNLP 2024

BPO: Staying Close to the Behavior LLM Creates Better Online LLM Alignment EMNLP 2024

Experience as Source for Anticipation and Planning: Experiential Policy Learning for Target-driven Recommendation Dialogues EMNLP 2024

Improving Multi-party Dialogue Generation via Topic and Rhetorical Coherence EMNLP 2024

Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use EMNLP 2024

LLM-AutoDA: Large Language Model-Driven Automatic Data Augmentation for Long-tailed Problems NIPS 2024

ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback EMNLP 2024

Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths EMNLP 2024

Grounded Language Agent for Product Search via Intelligent Web Interactions EMNLP 2024

Learning Autonomous Driving Tasks via Human Feedbacks with Large Language Models EMNLP 2024

MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension EMNLP 2024

A Critical Evaluation of AI Feedback for Aligning Large Language Models NIPS 2024

Perplexity-aware Correction for Robust Alignment with Noisy Preferences NIPS 2024

SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World CVPR 2024