Reinforcement Learning from Human Feedback

143 papers

Papers per year: 1, 13, 60, 55, 14

Papers

Efficient KL Divergence Estimation via Truncated Top-K Integration for Large Language Models ACL 2026

Teach a Reward Model to Correct Itself: Reward Guided Adversarial Failure Discovery for Robust Reward Modeling ACL 2026

AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling ACL 2026

IEvoAgent: Evolving Conversational Agent based on User Implicit Feedback ACL 2026

What Do LLMs Learn First? Asymmetric Learning Dynamics of Input Complexity and Output Ambiguity in Preference Alignment ACL 2026

WildReward: Learning Reward Models from In-the-Wild Human Interactions ACL 2026

Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling ACL 2026

MERIT Feedback Elicits Better Bargaining in LLM Negotiators ACL 2026

Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers ACL 2026

ARF-RLHF: Adaptive Reward-Following for RLHF through Emotion-Driven Self-Supervision and Trace-Biased Dynamic Optimization ACL 2026

ReflectRM: Boosting Generative Reward Models via Self-Reflection within a Unified Judgment Framework ACL 2026

ProMedical: Hierarchical Fine-Grained Criteria Modeling for Medical LLM Alignment via Explicit Injection ACL 2026

ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training ACL 2026

Feeling Right vs. Being Right: How AI Sycophancy Affects Value-Laden Deliberation ACL 2026

MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences AAAI 2025

DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback AAAI 2025

Alleviating Shifted Distribution in Human Preference Alignment through Meta-Learning AAAI 2025

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback AAAI 2025

Aligning Language Models Using Follow-up Likelihood as Reward Signal AAAI 2025

LEGEND: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets AAAI 2025

Robust Multi-Objective Preference Alignment with Online DPO AAAI 2025

Model Extrapolation Expedites Alignment ACL 2025

Towards Reward Fairness in RLHF: From a Resource Allocation Perspective ACL 2025

Lost in the Context: Insufficient and Distracted Attention to Contexts in Preference Modeling ACL 2025

UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models ACL 2025