Reinforcement Learning

2932 directly classified papers

Papers

Debiasing Online Preference Learning via Preference Feature Preservation ACL 2025

Confidence Improves Self-Consistency in LLMs ACL 2025

Adversarial Preference Learning for Robust LLM Alignment ACL 2025

MWPO: Enhancing LLMs Performance through Multi-Weight Preference Strength and Length Optimization ACL 2025

A Reinforcement Learning Framework for Cross-Lingual Stance Detection Using Chain-of-Thought Alignment ACL 2025

MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification ACL 2025

LookAlike: Consistent Distractor Generation in Math MCQs ACL 2025

Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL ACL 2025

Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning ACL 2025

Removing Prompt-template Bias in Reinforcement Learning from Human Feedback ACL 2025

DynaQuest: A Dynamic Question Answering Dataset Reflecting Real-World Knowledge Updates ACL 2025

Sparse Rewards Can Self-Train Dialogue Agents ACL 2025

Training Language Model to Critique for Better Refinement ACL 2025

Search-in-Context: Efficient Multi-Hop QA over Long Contexts via Monte Carlo Tree Search with Dynamic KV Retrieval ACL 2025

Henry at BEA 2025 Shared Task: Improving AI Tutor’s Guidance Evaluation Through Context-Aware Distillation ACL 2025

Optimising Spatial Teamwork Under Uncertainty AAAI 2025

The Bandit Whisperer: Communication Learning for Restless Bandits AAAI 2025

Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data? ACL 2025

Universal Post-Processing Networks for Joint Optimization of Modules in Task-Oriented Dialogue Systems AAAI 2025

LiteSearch: Efficient Tree Search with Dynamic Exploration Budget for Math Reasoning AAAI 2025

Defending Against Sophisticated Poisoning Attacks with RL-based Aggregation in Federated Learning AAAI 2025

Revelations: A Decidable Class of POMDPs with Omega-Regular Objectives AAAI 2025

Counterfactual Online Learning for Open-Loop Monte-Carlo Planning AAAI 2025

Formally Verified Approximate Policy Iteration AAAI 2025

Aligning to What? Limits to RLHF Based Alignment NAACL 2025