reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning (EACL 2026)

Imbalanced Gradients in RL Post-Training of Multi-Task LLMs (EACL 2026)

Memory-Augmented Representation for Efficient Event-based Visuomotor Policy Learning with Adaptive Perception and Control (WACV 2026)

SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space (EACL 2026)

Tandem Training for Language Models (EACL 2026)

VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery (EACL 2026)

Reinforcement Learning-based Adaptive Control of Classifier-Free Guidance and Timestep Embeddings in Diffusion Models (WACV 2026)

Hestia: Voxel-Face-Aware Hierarchical Next-Best-View Acquisition for Efficient 3D Reconstruction (WACV 2026)

Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning (EACL 2026)

Reasoning or Knowledge: Stratified Evaluation of Biomedical LLMs (EACL 2026)

Incentivizing Strong Reasoning from Weak Supervision (EACL 2026)

A Reinforcement Learning Framework for Robust and Secure LLM Watermarking (EACL 2026)

Plasticity vs. Rigidity: The Impact of Low-Rank Adapters on Reasoning on a Micro-Budget (EACL 2026)

KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation (EACL 2026)

SAFER-AiD: Saccade-Assisted Foveal-peripheral vision Enhanced Reconstruction for Adversarial Defense (WACV 2026)

TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning (WACV 2026)

No MoCap Needed: Post-Training Motion Diffusion Models with Reinforcement Learning using Only Textual Prompts (WACV 2026)

MageBench: Bridging Large Multimodal Models to Agents (WACV 2026)

ST-Think: How Multimodal Large Language Models Reason About 4D Worlds from Ego-Centric Videos (WACV 2026)

SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models (EACL 2026)

AutoBool: Reinforcement-Learned LLM for Effective Automatic Systematic Reviews Boolean Query Generation (EACL 2026)

PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR (EACL 2026)

Pseudo-Likelihood Training for Reasoning Diffusion Language Models (EACL 2026)

Offline Preference Optimization via Maximum Marginal Likelihood Estimation (EACL 2026)

LitE-SQL: A Lightweight and Efficient Text-to-SQL Framework with Vector-based Schema Linking and Execution-Guided Self-Correction (EACL 2026)