reward modeling

159 papers

Also known as

RLHF RM

Co-occurring keywords

large language model (12755), reinforcement learning (4122), reinforcement learning from human feedback (261), reward model (251), preference learning (411), language model alignment (142), human feedback (161), direct preference optimization (317), policy optimization (630), language model (4573)

Papers

A Multi-Task Embedder For Retrieval Augmented LLMs ACL 2024

HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM NAACL 2024

BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling NIPS 2024

Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations AAAI 2024

e-Health CSIRO at RRG24: Entropy-Augmented Self-Critical Sequence Training for Radiology Report Generation ACL 2024

Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ACL 2024

Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding ACL 2024

PRewrite: Prompt Rewriting with Reinforcement Learning ACL 2024

Aligning Large Language Models via Fine-grained Supervision ACL 2024

Direct Preference Optimization with an Offset ACL 2024

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ACL 2024

RePALM: Popular Quote Tweet Generation via Auto-Response Augmentation ACL 2024

Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning NIPS 2024

Preferential Normalizing Flows NIPS 2024

InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling NIPS 2024

Stepwise Alignment for Constrained Language Model Policy Optimization NIPS 2024

Enhancing Reinforcement Learning with Dense Rewards from Language Model Critic EMNLP 2024

Learning Goal-Conditioned Representations for Language Reward Models NIPS 2024

DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging EMNLP 2024

Towards Aligning Language Models with Textual Feedback EMNLP 2024

Semi-Supervised Reward Modeling via Iterative Self-Training EMNLP 2024

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts EMNLP 2024

DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling ACL 2024

Fine-Tuning Large Language Model Based Explainable Recommendation with Explainable Quality Reward AAAI 2024

Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation NIPS 2024