Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Alignment with Fill-In-the-Middle for Enhancing Code Generation (EMNLP 2025)
Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization (EMNLP 2025)
Predicate-Guided Generation for Mathematical Reasoning (EMNLP 2025)
A Comprehensive Survey on Learning from Rewards for Large Language Models: Reward Models and Learning Strategies (EMNLP 2025)
Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning (ICCV 2025)
START: Self-taught Reasoner with Tools (EMNLP 2025)
Case-Based Decision-Theoretic Decoding with Quality Memories (EMNLP 2025)
Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG (EMNLP 2025)
SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin (EMNLP 2025)
Mitigating Object Hallucinations via Sentence-Level Early Intervention (ICCV 2025)
NeLLCom-Lex: A Neural-agent Framework to Study the Interplay between Lexical Systems and Language Use (EMNLP 2025)
ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning (EMNLP 2025)
DynaQuest: A Dynamic Question Answering Dataset Reflecting Real-World Knowledge Updates (ACL 2025)
Northeastern Uni at Multilingual Counterspeech Generation: Enhancing Counter Speech Generation with LLM Alignment through Direct Preference Optimization (COLING 2025)
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation (EMNLP 2025)
Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards (EMNLP 2025)
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method (CVPR 2025)
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards (CVPR 2025)
Exploration-Driven Generative Interactive Environments (CVPR 2025)
Decision SpikeFormer: Spike-Driven Transformer for Decision Making (CVPR 2025)
Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning (CVPR 2025)
STACKFEED: Structured Textual Actor-Critic Knowledge base editing with FEEDback (EMNLP 2025)
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting (NAACL 2025)
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation (CVPR 2025)
ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation (NAACL 2025)