Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Learning to Summarize from LLM-generated Feedback
NAACL 2025
HW-TSC at Multilingual Counterspeech Generation
COLING 2025
Large Language Models with Reinforcement Learning from Human Feedback Approach for Enhancing Explainable Sexism Detection
COLING 2025
Learning to Translate Ambiguous Terminology by Preference Optimization on Post-Edits
EMNLP 2025
Towards Human Understanding of Paraphrase Types in Large Language Models
COLING 2025
ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation
NAACL 2025
OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework
EMNLP 2025
Alpha-GPT: Human-AI Interactive Alpha Mining for Quantitative Investment
EMNLP 2025
CTR-Guided Generative Query Suggestion in Conversational Search
EMNLP 2025
Improving Reward Models with Synthetic Critiques
NAACL 2025
Why Does ChatGPT “Delve” So Much? Exploring the Sources of Lexical Overrepresentation in Large Language Models
COLING 2025
Ask Optimal Questions: Aligning Large Language Models with Retriever’s Preference in Conversation
NAACL 2025
Dialogue Systems for Emotional Support via Value Reinforcement
ACL 2025
Adapting LLM Agents with Universal Communication Feedback
NAACL 2025
Towards Better Robot Learners: Leveraging Implicit and Explicit Human Feedback Together in Human Robot Interactions
AAAI 2025
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
NAACL 2025
Learning Structured World Models From and For Physical Interactions
AAAI 2025
Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models
NAACL 2025
From General Reward to Targeted Reward: Improving Open-ended Long-context Generation Models
EMNLP 2025
Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning
NAACL 2025
Efficient and Robust Reinforcement Learning from Human Feedback
AAAI 2025
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting
NAACL 2025
Axioms for AI Alignment from Human Feedback
AAAI 2025
Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
NAACL 2025
STACKFEED: Structured Textual Actor-Critic Knowledge base editing with FEEDback
EMNLP 2025