reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

What Can You Do with a Rock? Affordance Extraction via Word Embeddings IJCAI 2017

Deep Reinforcement Learning from Human Preferences NIPS 2017

Multi-View Decision Processes: The Helper-AI Problem NIPS 2017

Successor Features for Transfer in Reinforcement Learning NIPS 2017

Multi-Task Learning for Contextual Bandits NIPS 2017

QMDP-Net: Deep Learning for Planning under Partial Observability NIPS 2017

MUSE: Modularizing Unsupervised Sense Embeddings EMNLP 2017

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds NIPS 2017

DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning EMNLP 2017

Is the Bellman residual a bad proxy? NIPS 2017

Online Reinforcement Learning in Stochastic Games NIPS 2017

Task-Oriented Query Reformulation with Reinforcement Learning EMNLP 2017

Affordable On-line Dialogue Policy Learning EMNLP 2017

Learning how to Active Learn: A Deep Reinforcement Learning Approach EMNLP 2017

Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning EMNLP 2017

Speeding up Reinforcement Learning-based Information Extraction Training using Asynchronous Methods EMNLP 2017

Learning Combinatorial Optimization Algorithms over Graphs NIPS 2017

Learning what to read: Focused machine reading EMNLP 2017

Saliency-based Sequential Image Attention with Multiset Prediction NIPS 2017

Coarse-to-Fine Question Answering for Long Documents ACL 2017

Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access ACL 2017

Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning ACL 2017

Towards Generalization and Simplicity in Continuous Control NIPS 2017

Sentence Simplification with Deep Reinforcement Learning EMNLP 2017

Semi-Supervised QA with Generative Domain-Adaptive Nets ACL 2017