reward modeling

159 papers

Also known as

RLHF RM

Co-occurring keywords

large language model (12755)
reinforcement learning (4122)
reinforcement learning from human feedback (261)
reward model (251)
preference learning (411)
language model alignment (142)
human feedback (161)
direct preference optimization (317)
policy optimization (630)
language model (4573)

Papers

Thank you BART! Rewarding Pre-Trained Models Improves Formality Style Transfer (ACL 2021)

Learning Feature Weights using Reward Modeling for Denoising Parallel Corpora (EMNLP 2021)

Learning Human Objectives by Evaluating Hypothetical Behavior (ICML 2020)

Positive-Unlabeled Reward Learning (CORL 2020)

Adapting to Misspecification in Contextual Bandits (NIPS 2020)

Learning to summarize with human feedback (NIPS 2020)

Bayesian Execution Skill Estimation (AAAI 2019)

Evaluating Rewards for Question Generation Models (NAACL 2019)

Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy Learning (INTERSPEECH 2017)