preference learning

411 papers

Also known as

DPO, PL

Co-occurring keywords

large language model (12755), reinforcement learning (4122), direct preference optimization (317), reinforcement learning from human feedback (261), language model alignment (142), reward model (251), human feedback (161), reward modeling (159), model alignment (219), human preference (120)

Papers

DiMA: Distinguishing Resident and Tourist Preferences via Multi-Modal LLM Alignment for Out-of-Town Cross-Domain Recommendation AAAI 2026

Provably Efficient Multi-Objective Bandit Algorithms Under Preference-Centric Customization AAAI 2026

Conformal Feedback Alignment: Quantifying Answer-Level Reliability for Robust LLM Alignment EACL 2026

VaccineRAG: Boosting Multimodal Large Language Models’ Immunity to Harmful RAG Samples AAAI 2026

Knowledge-Based Stable Roommates Problems AAAI 2026

How Hard Is It to Explain Preferences Using Few Boolean Attributes? AAAI 2026

Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation WACV 2026

Energy Matching based Preference Learning for Diffusion Language Models EACL 2026

Long-form RewardBench: Evaluating Reward Models for Long-form Generation AAAI 2026

TEMPLE: Incentivizing Temporal Understanding of Video Large Language Models via Progressive Pre-SFT Alignment AAAI 2026

Reducing the Scope of Language Models AAAI 2026

Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment AAAI 2026

GRAM-R²: Self-Training Generative Foundation Reward Models for Reward Reasoning AAAI 2026

Multi-Robot Learning from Human Feedback AAAI 2026

Align Video Diffusion Model with Online Video-Centric Preference Optimization WACV 2026

End-to-End Fine-Tuning of 3D Texture Generation using Differentiable Rewards WACV 2026

PIRA: Preference-Oriented Instruction-Tuned Reward Models with Dual Aggregation EACL 2026

NUS-IDS at AMIYA/VarDial 2026: Improving Arabic Dialectness in LLMs with Reinforcement Learning EACL 2026

VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning AAAI 2026

RankList – a Listwise Preference Learning Framework for Predicting Subjective Preferences AAAI 2026

CTPD: Cross Tokenizer Preference Distillation AAAI 2026

Rethinking Direct Preference Optimization in Diffusion Models AAAI 2026

ParetoHqD: Fast Offline Multiobjective Alignment of Large Language Models Using Pareto High-Quality Data AAAI 2026

MedS³: Towards Medical Slow Thinking with Self-Evolved Soft Dual-sided Process Supervision AAAI 2026

Bandit Learning in Housing Markets AAAI 2026