Papers
261 papers found
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Hanyang Zhao, Genta Indra Winata, Anirban Das et al.
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.
Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.
TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights
Aiwei Liu, Haoping Bai, Zhiyun Lu et al.
Data Distillation for extrapolative protein design through exact preference optimization
Mostafa Karimi, Sharmi Banerjee, Tommi Jaakkola et al.
The Crucial Role of Samplers in Online Direct Preference Optimization
Ruizhe Shi, Runlong Zhou, Simon Shaolei Du
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
Yougang Lyu, Lingyong Yan, Zihan Wang et al.
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
Guanzheng Chen, Xin Li, Michael Shieh et al.
Modality-Fair Preference Optimization for Trustworthy MLLM Alignment
Songtao Jiang, Yan Zhang, Ruizhe Chen et al.
Indirect Online Preference Optimization via Reinforcement Learning
En Wang, Xingyu Lin, Du Su et al.
Atomic Consistency Preference Optimization for Long-Form Question Answering
Jingfeng Chen, Raghuveer Thirukovalluru, Junlin Wang et al.
NHK Submission to WAT 2025: Leveraging Preference Optimization for Article-level Japanese–English News Translation
Hideya Mino, Rei Endo, Yoshihiko Kawai
Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding
Guangyu Yang, Jinghong Chen, Weizhe Lin et al.
RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models
Saeed Khaki, JinJin Li, Lan Ma et al.
Improving Socratic Question Generation using Data Augmentation and Preference Optimization
Nischal Ashok Kumar, Andrew Lan
Team NP_PROBLEM at SemEval-2024 Task 7: Numerical Reasoning in Headline Generation with Preference Optimization
Pawan Rajpoot, Nut Chukamphaeng
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang, Liangke Gui, Zhiqing Sun et al.
LiPO: Listwise Preference Optimization through Learning-to-Rank
Tianqi Liu, Zhen Qin, Junru Wu et al.
Style Transfer with Multi-iteration Preference Optimization
Shuai Liu, Jonathan May
Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization
Zilu Tang, Rajen Chatterjee, Sarthak Garg
BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in Alignment
Sizhe Wang, Yongqi Tong, Hengyuan Zhang et al.
PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization
Jiayi Wu, Hengyi Cai, Lingyong Yan et al.
PORT: Preference Optimization on Reasoning Traces
Salem Lahlou, Abdalgader Abubaker, Hakim Hacid
Sequence-level Large Language Model Training with Contrastive Preference Optimization
Zhili Feng, Dhananjay Ram, Cole Hawkins et al.