Research Explorer
direct preference optimization
317 papers
Also known as: DPO
Co-occurring keywords
large language model (12755)
preference optimization (273)
reinforcement learning from human feedback (261)
preference learning (411)
language model alignment (142)
supervised fine-tuning (310)
model alignment (219)
preference alignment (142)
reinforcement learning (4122)
reward model (251)
Papers
Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs
NAACL 2025
Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling
NAACL 2025
CALM: Unleashing the Cross-Lingual Self-Aligning Ability of Language Model Question Answering
NAACL 2025
DPL: Diverse Preference Learning Without A Reference Model
NAACL 2025
MM-IFEngine: Towards Multimodal Instruction Following
ICCV 2025
Multimodal Large Language Model-Guided ISP Hyperparameter Optimization with Dynamic Preference Learning
ICCV 2025
Generating Diverse Training Samples for Relation Extraction with Large Language Models
ACL 2025
Model Extrapolation Expedites Alignment
ACL 2025
Aligning Large Language Models with Implicit Preferences from User-Generated Content
ACL 2025
Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors
ACL 2025
ASPO: Adaptive Sentence-Level Preference Optimization for Fine-Grained Multimodal Reasoning
ACL 2025
Robust Preference Optimization via Dynamic Target Margins
ACL 2025
Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
ACL 2025
Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks
ACL 2025
Context-DPO: Aligning Language Models for Context-Faithfulness
ACL 2025
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System
ACL 2025
SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment
ACL 2025
daDPO: Distribution-Aware DPO for Distilling Conversational Abilities
ACL 2025
Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization
ACL 2025
Optimizing Reasoning for Text-to-SQL with Execution Feedback
ACL 2025
Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models
ACL 2025
InImageTrans: Multimodal LLM-based Text Image Machine Translation
ACL 2025
Reverse Preference Optimization for Complex Instruction Following
ACL 2025
Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment
ACL 2025
YinYang-Align: A new Benchmark for Competing Objectives and Introducing Multi-Objective Preference based Text-to-Image Alignment
ACL 2025