preference alignment

142 papers

Also known as

DPO

Co-occurring keywords

large language model (12755); direct preference optimization (317); reinforcement learning from human feedback (261); reinforcement learning (4122); reward model (251); reward modeling (159); preference optimization (273); language model alignment (142); instruction tuning (810); language model (4573)

Papers

A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses AAAI 2026

LifeAlign: Lifelong Alignment for Large Language Models with Memory-Augmented Focalized Preference Optimization AAAI 2026

Suit the Remedy to the Retriever: Interpretable Query Optimization with Retriever Preference Alignment for Vision-Language Retrieval AAAI 2026

Safety Alignment of Large Language Models via Contrasting Safe and Harmful Distributions AAAI 2026

Reward Model Evaluation via Automatically-Ranked Policy Alignment AAAI 2026

RMO: Towards Better LLM Alignment via Reshaping Reward Margin Distributions AAAI 2026

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation AAAI 2026

Aligning Generative Music AI with Human Preferences: Methods and Challenges AAAI 2026

CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation AAAI 2026

SDA: Steering-Driven Distribution Alignment for Open LLMs Without Fine-Tuning AAAI 2026

PC-Flow: Preference Alignment in Flow Matching via Classifier AAAI 2026

Multi-Metric Preference Alignment for Generative Speech Restoration AAAI 2026

Efficient Preference Alignment via Pareto Exploration (Student Abstract) AAAI 2026

AlignSurvey: A Comprehensive Benchmark for Human Preferences Alignment in Social Surveys AAAI 2026

CONGRAD: Conflicting Gradient Filtering for Multilingual Preference Alignment EACL 2026

CrisiText: A dataset of warning messages for LLM training in emergency communication EACL 2026

Enhancing Stability and Fidelity for Zero-Shot TTS with a Multi-Level Evaluator AAAI 2026

The Alignment Game: A Theory of Long-Horizon Alignment Through Recursive Curation AAAI 2026

OmniDPO: A Preference Optimization Framework to Address Omni-Modal Hallucination AAAI 2026

SCIR: A Self-Correcting Iterative Refinement Framework for Enhanced Information Extraction Based on Schema AAAI 2026

Personalize Your LLM: Fake it then Align it NAACL 2025

MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time NAACL 2025

VideoDPO: Omni-Preference Alignment for Video Diffusion Generation CVPR 2025

Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance EMNLP 2025

Governance in Motion: Co-evolution of Constitutions and AI models for Scalable Safety EMNLP 2025