reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

Sketch-Based Linear Value Function Approximation NIPS 2012

Non-parametric Approximate Dynamic Programming via the Kernel Method NIPS 2012

Neurally Plausible Reinforcement Learning of Working Memory Tasks NIPS 2012

Learned Prioritization for Trading Off Accuracy and Speed NIPS 2012

Weighted Likelihood Policy Search with Model Selection NIPS 2012

Hierarchical Optimistic Region Selection driven by Curiosity NIPS 2012

Timely Object Recognition NIPS 2012

Reducing Conservativeness in Safety Guarantees by Learning Disturbances Online: Iterated Guaranteed Safe Online Learning RSS 2012

Learning Partially Observable Models Using Temporally Abstract Decision Trees NIPS 2012

Tendon-Driven Variable Impedance Control Using Reinforcement Learning RSS 2012

Optimistic planning for Markov decision processes AISTATS 2012

Online Regret Bounds for Undiscounted Continuous Reinforcement Learning NIPS 2012

On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes NIPS 2012

Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds NIPS 2012

Imitation Learning by Coaching NIPS 2012

A Bayesian Approach for Policy Learning from Trajectory Preference Queries NIPS 2012

Regularized Off-Policy TD-Learning NIPS 2012

Environmental statistics and the trade-off between model-based and TD learning in humans NIPS 2011

Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery NIPS 2011

Monte Carlo Value Iteration with Macro-Actions NIPS 2011

The Fixed Points of Off-Policy TD NIPS 2011

Transfer from Multiple MDPs NIPS 2011

TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning NIPS 2011

Blending Autonomous Exploration and Apprenticeship Learning NIPS 2011

A Reinforcement Learning Theory for Homeostatic Regulation NIPS 2011