Research Explorer
Deep RL
3861 directly classified papers
Papers per year
2005: 1
2006: 9
2007: 14
2008: 15
2009: 9
2010: 21
2011: 27
2012: 32
2013: 21
2014: 17
2015: 10
2016: 33
2017: 102
2018: 222
2019: 399
2020: 450
2021: 533
2022: 478
2023: 532
2024: 513
2025: 326
2026: 97
Papers
RaSS: Improving Denoising Diffusion Samplers with Reinforced Active Sampling Scheduler
CVPR 2025
Dialogue Systems for Emotional Support via Value Reinforcement
ACL 2025
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse
ACL 2025
Adversarial Preference Learning for Robust LLM Alignment
ACL 2025
FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models
EMNLP 2025
Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models
EMNLP 2025
Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability
EMNLP 2025
A Reinforcement Learning Framework for Cross-Lingual Stance Detection Using Chain-of-Thought Alignment
ACL 2025
Enhancing RLHF with Human Gaze Modeling
EMNLP 2025
Token-level Proximal Policy Optimization for Query Generation
EMNLP 2025
VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions
EMNLP 2025
VerIF: Verification Engineering for Reinforcement Learning in Instruction Following
EMNLP 2025
GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration
EMNLP 2025
A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy
EMNLP 2025
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
CVPR 2025
M^3PC: Test-time Model Predictive Control using Pretrained Masked Trajectory Model
ICLR 2025
Deep Reinforcement Learning with Time-Scale Invariant Memory
AAAI 2025
Plug-and-Play PPO: An Adaptive Point Prompt Optimizer Making SAM Greater
CVPR 2025
VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning
CVPR 2025
Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation
CVPR 2025
Exploration-Driven Generative Interactive Environments
CVPR 2025
BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds
RSS 2025
Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning
ACL 2025
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation
ACL 2025
Reward-Directed Score-Based Diffusion Models via q-Learning
JMLR 2025