Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Rejected Dialects: Biases Against African American Language in Reward Models
NAACL 2025
Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models
EMNLP 2025
Learning to Reason via Self-Iterative Process Feedback for Small Language Models
COLING 2025
StoryLLaVA: Enhancing Visual Storytelling with Multi-Modal Large Language Models
COLING 2025
Understanding Reference Policies in Direct Preference Optimization
NAACL 2025
InstructionCP: A Simple yet Effective Approach for Transferring Large Language Models to Target Languages
ACL 2025
2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
NAACL 2025
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
ACL 2025
Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction
NAACL 2025
Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
ACL 2025
Interaction-Required Suggestions for Control, Ownership, and Awareness in Human-AI Co-Writing
NAACL 2025
Enhancing Machine Translation with Self-Supervised Preference Data
ACL 2025
An Analysis of Scoring Methods for Reranking in Large Language Model Story Generation
NAACL 2025
Don’t Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls
ACL 2025
TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods
NAACL 2025
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
ACL 2025
Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences
ACL 2025
Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning
ACL 2025
Breaking the Reasoning Barrier: A Survey on LLM Complex Reasoning through the Lens of Self-Evolution
ACL 2025
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
ACL 2025
CARMO: Dynamic Criteria Generation for Context Aware Reward Modelling
ACL 2025
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
ACL 2025
Focused-DPO: Enhancing Code Generation Through Focused Preference Optimization on Error-Prone Points
ACL 2025
Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving
ACL 2025
Learning Structured World Models From and For Physical Interactions
AAAI 2025