Reinforcement Learning

2932 directly classified papers

Papers

Learning to Summarize from LLM-generated Feedback NAACL 2025

HW-TSC at Multilingual Counterspeech Generation COLING 2025

Large Language Models with Reinforcement Learning from Human Feedback Approach for Enhancing Explainable Sexism Detection COLING 2025

Learning to Translate Ambiguous Terminology by Preference Optimization on Post-Edits EMNLP 2025

Towards Human Understanding of Paraphrase Types in Large Language Models COLING 2025

ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation NAACL 2025

OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework EMNLP 2025

Alpha-GPT: Human-AI Interactive Alpha Mining for Quantitative Investment EMNLP 2025

CTR-Guided Generative Query Suggestion in Conversational Search EMNLP 2025

Improving Reward Models with Synthetic Critiques NAACL 2025

Why Does ChatGPT “Delve” So Much? Exploring the Sources of Lexical Overrepresentation in Large Language Models COLING 2025

Ask Optimal Questions: Aligning Large Language Models with Retriever’s Preference in Conversation NAACL 2025

Dialogue Systems for Emotional Support via Value Reinforcement ACL 2025

Adapting LLM Agents with Universal Communication Feedback NAACL 2025

Towards Better Robot Learners: Leveraging Implicit and Explicit Human Feedback Together in Human Robot Interactions AAAI 2025

Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning NAACL 2025

Learning Structured World Models From and For Physical Interactions AAAI 2025

Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models NAACL 2025

From General Reward to Targeted Reward: Improving Open-ended Long-context Generation Models EMNLP 2025

Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning NAACL 2025

Efficient and Robust Reinforcement Learning from Human Feedback AAAI 2025

Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting NAACL 2025

Axioms for AI Alignment from Human Feedback AAAI 2025

Flaming-hot Initiation with Regular Execution Sampling for Large Language Models NAACL 2025

STACKFEED: Structured Textual Actor-Critic Knowledge base editing with FEEDback EMNLP 2025