reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

Agnostic KWIK learning and efficient approximate reinforcement learning COLT 2011

Speedy Q-Learning NIPS 2011

Improving Policy Gradient Estimates with Influence Information ACML 2011

A reinterpretation of the policy oscillation phenomenon in approximate policy iteration NIPS 2011

Selecting the State-Representation in Reinforcement Learning NIPS 2011

Robust Approximate Bilinear Programming for Value Function Approximation JMLR 2011

Generalized TD Learning JMLR 2011

Optimal Reinforcement Learning for Gaussian Systems NIPS 2011

Exploiting Best-Match Equations for Efficient Reinforcement Learning JMLR 2011

Learning to Agglomerate Superpixel Hierarchies NIPS 2011

Analysis and Improvement of Policy Gradient Estimation NIPS 2011

Convergent Fitted Value Iteration with Linear Function Approximation NIPS 2011

Policy Gradient Coagent Networks NIPS 2011

Dynamic Policy Programming with Function Approximation AISTATS 2011

Action-Gap Phenomenon in Reinforcement Learning NIPS 2011

A Convergent Online Single Time Scale Actor Critic Algorithm JMLR 2010

Effects of Synaptic Weight Diffusion on Learning in Decision Making Networks NIPS 2010

Double Q-learning NIPS 2010

Model-Free Monte Carlo-like Policy Evaluation AISTATS 2010

Predictive State Temporal Difference Learning NIPS 2010

On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient NIPS 2010

Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks NIPS 2010

A Reduction from Apprenticeship Learning to Classification NIPS 2010

Variational methods for Reinforcement Learning AISTATS 2010

LSTD with Random Projections NIPS 2010