Research Explorer
Deep RL (Reinforcement Learning methods)
3861 directly classified papers
Papers per year
2005: 1
2006: 9
2007: 14
2008: 15
2009: 9
2010: 21
2011: 27
2012: 32
2013: 21
2014: 17
2015: 10
2016: 33
2017: 102
2018: 222
2019: 399
2020: 450
2021: 533
2022: 478
2023: 532
2024: 513
2025: 326
2026: 97
Papers
Fixing Distribution Shifts of LLM Self-Critique via On-Policy Self-Play Training
ACL 2025
On the Effects of Fine-tuning Language Models for Text-Based Reinforcement Learning
COLING 2025
Adversarial Preference Learning for Robust LLM Alignment
ACL 2025
An Efficient Dialogue Policy Agent with Model-Based Causal Reinforcement Learning
COLING 2025
Towards Adaptive Mechanism Activation in Language Agent
COLING 2025
Removing Prompt-template Bias in Reinforcement Learning from Human Feedback
ACL 2025
MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity
COLING 2025
Improving Retrospective Language Agents via Joint Policy Gradient Optimization
NAACL 2025
A Reinforcement Learning Framework for Cross-Lingual Stance Detection Using Chain-of-Thought Alignment
ACL 2025
Enhancing multi-modal Relation Extraction with Reinforcement Learning Guided Graph Diffusion Framework
COLING 2025
Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
ACL 2025
Optimizing RLHF Training for Large Language Models with Stage Fusion
NSDI 2025
Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
IJCNLP 2025
A Graph Interaction Framework on Relevance for Multimodal Named Entity Recognition with Multiple Images
COLING 2025
NAT: Enhancing Agent Tuning with Negative Samples
NAACL 2025
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation
RSS 2025
Sketch-to-Skill: Bootstrapping Robot Learning with Human Drawn Trajectory Sketches
RSS 2025
Hierarchical and Modular Network on Non-prehensile Manipulation in General Environments
RSS 2025
Safety with Agency: Human-Centered Safety Filter with Application to AI-Assisted Motorsports
RSS 2025
HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit
RSS 2025
Resolving Conflicting Constraints in Multi-Agent Reinforcement Learning with Layered Safety
RSS 2025
Universal Post-Processing Networks for Joint Optimization of Modules in Task-Oriented Dialogue Systems
AAAI 2025
SrSv: Integrating Sequential Rollouts with Sequential Value Estimation for Multi-agent Reinforcement Learning
AAAI 2025
What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning
AAAI 2025
BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds
RSS 2025