Papers
261 papers found
Perspective-driven Preference Optimization with Entropy Maximization for Diverse Argument Generation
Yilin Cao, Ruike Zhang, Penghui Wei et al.
Instruction-Tuned English to Bhojpuri Neural Machine Translation Using Contrastive Preference Optimization
Kshetrimayum Boynao Singh, Deepak Kumar, Asif Ekbal
MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
Hengjia Li, Lifan Jiang, Xi Xiao et al.
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao, Beier Zhu, Qianru Sun et al.
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata et al.
Group Preference Optimization: Few-Shot Alignment of Large Language Models
Siyan Zhao, John Dang, Aditya Grover
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
Chaoqi Wang, Yibo Jiang, Chenghao Yang et al.
Statistical Rejection Sampling Improves Preference Optimization
Tianqi Liu, Yao Zhao, Rishabh Joshi et al.
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen, Jincheng Mei, Katayoon Goshvadi et al.
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
Audrey Huang, Wenhao Zhan, Tengyang Xie et al.
Aligning Visual Contrastive learning models via Preference Optimization
Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh et al.
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
Junkang Wu, Yuexiang Xie, Zhengyi Yang et al.
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
Mingzhi Wang, Chengdong Ma, Qizhi Chen et al.
Multi-objective antibody design with constrained preference optimization
Milong Ren, ZaiKai He, Haicang Zhang
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.
Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision
Yaowen Ye, Cassidy Laidlaw, Jacob Steinhardt
Self-Improving Robust Preference Optimization
Eugene Choi, Arash Ahmadian, Matthieu Geist et al.
Self-Play Preference Optimization for Language Model Alignment
Yue Wu, Zhiqing Sun, Huizhuo Yuan et al.
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Tengyang Xie, Dylan J Foster, Akshay Krishnamurthy et al.
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Noam Razin, Sadhika Malladi, Adithya Bhaskar et al.
DSPO: Direct Score Preference Optimization for Diffusion Model Alignment
Huaisheng Zhu, Teng Xiao, Vasant G Honavar
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs
Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
Yuxin Jiang, Bo Huang, Yufei Wang et al.
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective
Ruichen Shao, Bei Li, Gangao Liu et al.
Weighted-Reward Preference Optimization for Implicit Model Fusion
Ziyi Yang, Fanqi Wan, Longguang Zhong et al.