Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Debiasing Online Preference Learning via Preference Feature Preservation
ACL 2025
Confidence Improves Self-Consistency in LLMs
ACL 2025
Adversarial Preference Learning for Robust LLM Alignment
ACL 2025
MWPO: Enhancing LLMs Performance through Multi-Weight Preference Strength and Length Optimization
ACL 2025
A Reinforcement Learning Framework for Cross-Lingual Stance Detection Using Chain-of-Thought Alignment
ACL 2025
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
ACL 2025
LookAlike: Consistent Distractor Generation in Math MCQs
ACL 2025
Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
ACL 2025
Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning
ACL 2025
Removing Prompt-template Bias in Reinforcement Learning from Human Feedback
ACL 2025
DynaQuest: A Dynamic Question Answering Dataset Reflecting Real-World Knowledge Updates
ACL 2025
Sparse Rewards Can Self-Train Dialogue Agents
ACL 2025
Training Language Model to Critique for Better Refinement
ACL 2025
Search-in-Context: Efficient Multi-Hop QA over Long Contexts via Monte Carlo Tree Search with Dynamic KV Retrieval
ACL 2025
Henry at BEA 2025 Shared Task: Improving AI Tutor’s Guidance Evaluation Through Context-Aware Distillation
ACL 2025
Optimising Spatial Teamwork Under Uncertainty
AAAI 2025
The Bandit Whisperer: Communication Learning for Restless Bandits
AAAI 2025
Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data?
ACL 2025
Universal Post-Processing Networks for Joint Optimization of Modules in Task-Oriented Dialogue Systems
AAAI 2025
LiteSearch: Efficient Tree Search with Dynamic Exploration Budget for Math Reasoning
AAAI 2025
Defending Against Sophisticated Poisoning Attacks with RL-based Aggregation in Federated Learning
AAAI 2025
Revelations: A Decidable Class of POMDPs with Omega-Regular Objectives
AAAI 2025
Counterfactual Online Learning for Open-Loop Monte-Carlo Planning
AAAI 2025
Formally Verified Approximate Policy Iteration
AAAI 2025
Aligning to What? Limits to RLHF Based Alignment
NAACL 2025