Papers
261 papers found
Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback
Kyuyoung Kim, Ah Jeong Seo, Hao Liu et al.
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization
Yong Lin, Skyler Seto, Maartje Ter Hoeve et al.
Direct Judgement Preference Optimization
PeiFeng Wang, Austin Xu, Yilun Zhou et al.
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing, Peiran Li, Yuping Wang et al.
Co-Evolving LLMs and Embedding Models via Density-Guided Preference Optimization for Text Clustering
Zetong Li, Qinliang Su, Minhua Huang et al.
Selective Preference Optimization via Token-Level Reward Function Estimation
Kailai Yang, Zhiwei Liu, Qianqian Xie et al.
TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making
Kechen Jiao, Zhirui Fang, Jiahao Liu et al.
Structured Preference Optimization for Vision-Language Long-Horizon Task Planning
Xiwen Liang, Min Lin, Weiqi Ruan et al.
Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization
Jiulong Wu, Zhengliang Shi, Shuaiqiang Wang et al.
Weights-Rotated Preference Optimization for Large Language Models
Chenxu Yang, Ruipeng Jia, Mingyu Zheng et al.
Refining Text Generation for Realistic Conversational Recommendation via Direct Preference Optimization
Manato Tajiri, Michimasa Inaba
Image Difference Captioning via Adversarial Preference Optimization
Zihan Huang, Junda Wu, Rohan Surana et al.
Learning to Translate Ambiguous Terminology by Preference Optimization on Post-Edits
Nathaniel Berger, Johannes Eschbach-Dymanus, Miriam Exel et al.
Auto-Weighted Group Relative Preference Optimization for Multi-Objective Text Generation Tasks
Yuki Ichihara, Yuu Jinnai
DCRM: A Heuristic to Measure Response Pair Quality in Preference Optimization
Chengyu Huang, Tanya Goyal
SPO: Self Preference Optimization with Self Regularization
Yuhao Sun, Yifan Zhang, Quandong Wang et al.
Creative Preference Optimization
Mete Ismayilzada, Antonio Laverghetta Jr., Simone A. Luchini et al.
ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization
Zhensheng Jin, Xinze Li, Yifan Ji et al.
Captioning for Text-Video Retrieval via Dual-Group Direct Preference Optimization
Ji Soo Lee, Byungoh Ko, Jaewon Cho et al.
SeaPO: Strategic Error Amplification for Robust Preference Optimization of Large Language Models
Jun Rao, Yunjie Liao, Xuebo Liu et al.
MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework
Yupeng Qi, Ziyu Lyu, Min Yang et al.
Adaptive Preference Optimization with Uncertainty-aware Utility Anchor
Xiaobo Wang, Zixia Jia, Jiaqi Li et al.
Token Preference Optimization with Self-Calibrated Visual-Anchored Rewards for Hallucination Mitigation
Jihao Gu, Yingyao Wang, Meng Cao et al.
CoTD-PO: Chain-of-Thought Distillation with Preference Optimization
Lujie Niu, Haochen Sun, Fangkun Zhao et al.
DecoupledESC: Enhancing Emotional Support Generation via Strategy-Response Decoupled Preference Optimization
Chao Zhang, Xin Shi, Xueqiao Zhang et al.