Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
preference optimization
273 papers
Also known as
ACPO
RPO
MMPO
ORPO
KTO
DPO
SPO
PO
Co-occurring keywords
large language model (12755)
direct preference optimization (317)
reinforcement learning (4122)
reinforcement learning from human feedback (261)
model alignment (219)
language model alignment (142)
language model (4573)
supervised fine-tuning (310)
reward model (251)
preference learning (411)
Papers
WPO: Enhancing RLHF with Weighted Preference Optimization
EMNLP 2024
The Importance of Online Data: Understanding Preference Fine-tuning via Coverage
NIPS 2024
Group Robust Preference Optimization in Reward-free RLHF
NIPS 2024
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
NIPS 2024
Mercury: A Code Efficiency Benchmark for Code Large Language Models
NIPS 2024
EPO: Hierarchical LLM Agents with Environment Preference Optimization
EMNLP 2024
ORPO: Monolithic Preference Optimization without Reference Model
EMNLP 2024
BPO: Staying Close to the Behavior LLM Creates Better Online LLM Alignment
EMNLP 2024
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
EMNLP 2024
AlignCap: Aligning Speech Emotion Captioning to Human Preferences
EMNLP 2024
Evaluating Psychological Safety of Large Language Models
EMNLP 2024
Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
ACL 2024
Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models
ACL 2024
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents
ACL 2024
Discovering Preference Optimization Algorithms with and for Large Language Models
NIPS 2024
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
NIPS 2024
Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads
NIPS 2024
Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization
NIPS 2024
Calibrated Self-Rewarding Vision Language Models
NIPS 2024
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
ACL 2024
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
NIPS 2024
LACIE: Listener-Aware Finetuning for Calibration in Large Language Models
NIPS 2024
A Critical Evaluation of AI Feedback for Aligning Large Language Models
NIPS 2024