Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Training Language Model to Critique for Better Refinement
ACL 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning
ACL 2025
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
ICCV 2025
Henry at BEA 2025 Shared Task: Improving AI Tutor’s Guidance Evaluation Through Context-Aware Distillation
ACL 2025
Trial-Oriented Visual Rearrangement
ICCV 2025
Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data?
ACL 2025
MOERL: When Mixture-of-Experts Meet Reinforcement Learning for Adverse Weather Image Restoration
ICCV 2025
The Power of Simplicity in LLM-Based Event Forecasting
ACL 2025
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
ICCV 2025
Optimising Factual Consistency in Summarisation via Preference Learning from Multiple Imperfect Metrics
EMNLP 2025
ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning
ICCV 2025
DeMAC: Enhancing Multi-Agent Coordination with Dynamic DAG and Manager-Player Feedback
EMNLP 2025
Reinforcement Learning-Guided Data Selection via Redundancy Assessment
ICCV 2025
Continual SFT Matches Multimodal RLHF with Negative Supervision
CVPR 2025
Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning
ICCV 2025
When2Call: When (not) to Call Tools
NAACL 2025
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
ICCV 2025
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
CVPR 2025
Mitigating Object Hallucinations via Sentence-Level Early Intervention
ICCV 2025
Narrative Studio: Visual narrative exploration using LLMs and Monte Carlo Tree Search
NAACL 2025
EvolvingGrasp: Evolutionary Grasp Generation via Efficient Preference Alignment
ICCV 2025
A Survey on the Feedback Mechanism of LLM-based AI Agents
IJCAI 2025
Training-free Generation of Temporally Consistent Rewards from VLMs
ICCV 2025
Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers
EMNLP 2025