conftrace
Preference Learning
36 papers
Papers per year
2024: 5
2025: 8
2026: 23
Papers
Bias Fitting to Mitigate Length Bias of Reward Model in RLHF
ACL 2026
M2PO: Multi-Perspective Multi-Pair Preference Optimization for Machine Translation
ACL 2026
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
ACL 2026
What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context
ACL 2026
Select Before Use: On the Importance of Reference Model Selection in Preference Alignment
ACL 2026
What Do LLMs Learn First? Asymmetric Learning Dynamics of Input Complexity and Output Ambiguity in Preference Alignment
ACL 2026
FocalOrder: Focal Preference Optimization for Reading Order Detection
ACL 2026
PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling
ACL 2026
IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation
ACL 2026
Personalizing LLMs with Binary Feedback: A Preference-Calibrated Optimization Framework
ACL 2026
Learning More from Less: Exploiting Counterfactuals for Data-Efficient Chart Understanding
ACL 2026
From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment
ACL 2026
Optimizing RAG Rerankers with LLM Feedback via Reinforcement Learning
ACL 2026
SMARTER: A Data-efficient Framework to Improve Toxicity Detection with Explanation via Self-augmenting Large Language Models
ACL 2026
ARF-RLHF: Adaptive Reward-Following for RLHF through Emotion-Driven Self-Supervision and Trace-Biased Dynamic Optimization
ACL 2026
WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback
ACL 2026
DARM: Distribution-Aware Reward Modeling by Alleviating Biases from Low Preference-Context Dependency Data
ACL 2026
Edit-Aware Reward Modeling for Chinese Grammatical Error Correction
ACL 2026
From Proof to Program: Characterizing Tool-Induced Reasoning Hallucinations in Large Language Models
ACL 2026
Self-Guided Alignment: Adaptive Preference Sensing for Multi-Objective Generation
ACL 2026
Pref-CTRL: Preference Driven LLM Alignment using Representation Editing
ACL 2026
Data-efficient Targeted Token-level Preference Optimization for LLM-based Text-to-Speech
ACL 2026
AesX: Enhance Your Images with Stunning Aesthetic Beauty
ACL 2026
Aligning Large Language Models with Implicit Preferences from User-Generated Content
ACL 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
ACL 2025