preference learning

411 papers

Also known as

DPO, PL

Co-occurring keywords

large language model (12755), reinforcement learning (4122), direct preference optimization (317), reinforcement learning from human feedback (261), language model alignment (142), reward model (251), human feedback (161), reward modeling (159), model alignment (219), human preference (120)

Papers

Collaborative Gaussian Processes for Preference Learning NIPS 2012

Iterative ranking from pair-wise comparisons NIPS 2012

Evaluating the inverse decision-making approach to preference learning NIPS 2011

Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity NIPS 2011

Clustering Algorithms for Chains JMLR 2011

An Exponential Model for Infinite Rankings JMLR 2010

A Data-Driven Approach to Modeling Choice NIPS 2009

A rational model of preference learning and choice prediction by children NIPS 2008

Active Preference Learning with Discrete Choice Data NIPS 2007

A General Boosting Method and its Application to Learning Ranking Functions for Web Search NIPS 2007

An Efficient Boosting Algorithm for Combining Preferences JMLR 2003