Preference Learning

36 papers

Papers per year (chart): 5, 8, 23

Papers

Bias Fitting to Mitigate Length Bias of Reward Model in RLHF ACL 2026

M2PO: Multi-Perspective Multi-Pair Preference Optimization for Machine Translation ACL 2026

LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models ACL 2026

What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context ACL 2026

Select Before Use: On the Importance of Reference Model Selection in Preference Alignment ACL 2026

What Do LLMs Learn First? Asymmetric Learning Dynamics of Input Complexity and Output Ambiguity in Preference Alignment ACL 2026

FocalOrder: Focal Preference Optimization for Reading Order Detection ACL 2026

PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling ACL 2026

IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation ACL 2026

Personalizing LLMs with Binary Feedback: A Preference-Calibrated Optimization Framework ACL 2026

Learning More from Less: Exploiting Counterfactuals for Data-Efficient Chart Understanding ACL 2026

From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment ACL 2026

Optimizing RAG Rerankers with LLM Feedback via Reinforcement Learning ACL 2026

SMARTER: A Data-efficient Framework to Improve Toxicity Detection with Explanation via Self-augmenting Large Language Models ACL 2026

ARF-RLHF: Adaptive Reward-Following for RLHF through Emotion-Driven Self-Supervision and Trace-Biased Dynamic Optimization ACL 2026

WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback ACL 2026

DARM: Distribution-Aware Reward Modeling by Alleviating Biases from Low Preference-Context Dependency Data ACL 2026

Edit-Aware Reward Modeling for Chinese Grammatical Error Correction ACL 2026

From Proof to Program: Characterizing Tool-Induced Reasoning Hallucinations in Large Language Models ACL 2026

Self-Guided Alignment: Adaptive Preference Sensing for Multi-Objective Generation ACL 2026

Pref-CTRL: Preference Driven LLM Alignment using Representation Editing ACL 2026

Data-efficient Targeted Token-level Preference Optimization for LLM-based Text-to-Speech ACL 2026

AesX: Enhance Your Images with Stunning Aesthetic Beauty ACL 2026

Aligning Large Language Models with Implicit Preferences from User-Generated Content ACL 2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment ACL 2025