reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

Mutual Alignment Transfer Learning CORL 2017

CARLA: An Open Urban Driving Simulator CORL 2017

Exploration-Exploitation in MDPs with Options AISTATS 2017

Optimistic Planning for the Stochastic Knapsack Problem AISTATS 2017

ParlAI: A Dialog Research Software Platform EMNLP 2017

Learning Simple Algorithms from Examples ICML 2016

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning ICML 2016

Generalization and Exploration via Randomized Value Functions ICML 2016

Near Optimal Behavior via Approximate State Abstraction ICML 2016

Hierarchical Decision Making In Electricity Grid Management ICML 2016

On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games AISTATS 2016

A PAC RL Algorithm for Episodic POMDPs AISTATS 2016

Combined Optimization and Reinforcement Learning for Manipulation Skills RSS 2016

Hierarchical Relative Entropy Policy Search JMLR 2016

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning JMLR 2016

End-To-End Learning of Action Detection From Frame Glimpses in Videos CVPR 2016

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization ICML 2016

Anytime optimal algorithms in stochastic multi-armed bandits ICML 2016

Black-Box Policy Search with Probabilistic Programs AISTATS 2016

Adaptive Skills Adaptive Partitions (ASAP) NIPS 2016

Regularized Policy Iteration with Nonparametric Function Spaces JMLR 2016

Differentially Private Policy Evaluation ICML 2016

True Online Temporal-Difference Learning JMLR 2016

Strategic Attentive Writer for Learning Macro-Actions NIPS 2016

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes NIPS 2016