Papers
261 papers found
Self-Steering Optimization: Autonomous Preference Optimization for Large Language Models
Hao Xiang, Bowen Yu, Hongyu Lin et al.
Geometric-Averaged Preference Optimization for Soft Preference Labels
Hiroki Furuta, Kuang-Huei Lee, Shixiang Shane Gu et al.
Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences
Mohammad Saqib Hasan, Saikat Chakraborty, Santu Karmaker et al.
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
Fu-Yun Wang, Yunhao Shui, Jingtan Piao et al.
UCPO: A Universal Constrained Combinatorial Optimization Method via Preference Optimization
Zhanhong Fang, Debing Wang, Jinbiao Chen et al.
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
Lanyun Zhu, Tianrun Chen, Qianxiong Xu et al.
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma, Zhongxiang Dai, Xiaoqiang Lin et al.
Sequential Preference Optimization: Multi-Dimensional Preference Alignment with Implicit Reward Modeling
Xingzhou Lou, Junge Zhang, Jian Xie et al.
CAPO: Confidence Aware Preference Optimization Learning for Multilingual Preferences
Rhitabrat Pokharel, Yufei Tao, Ameeta Agrawal
Direct Preference-based Policy Optimization without Reward Modeling
Gaon An, Junhyeok Lee, Xingdong Zuo et al.
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
Hyungkyu Kang, Min-hwan Oh
Beyond Reward: Offline Preference-guided Policy Optimization
Yachen Kang, Diyuan Shi, Jinxin Liu et al.
Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization
Yao Xiao, Hai Ye, Linyao Chen et al.
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
Jian Li, Haojing Huang, Yujia Zhang et al.
Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization
Jian Li, Shenglin Yin, Yujia Zhang et al.
No Preference Left Behind: Group Distributional Preference Optimization
Binwei Yao, Zefan Cai, Yun-Shiuan Chuang et al.
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
Zhanhao Liang, Yuhui Yuan, Shuyang Gu et al.
Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandits
Tian Huang, Shengbo Wang, Ke Li
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
Zhanhui Zhou, Jie Liu, Jing Shao et al.
Preference-Aware Constrained Multi-Objective Bayesian Optimization (Student Abstract)
Alaleh Ahmadianshalchi, Syrine Belakaria, Janardhan Rao Doppa
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, Eric Mitchell et al.
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
Xuan Zhang, Chao Du, Tianyu Pang et al.