reinforcement learning

4122 papers

Also known as

RLVR HARL GRPO RL PPO REINFORCE RFT DRL LQR RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy Learning INTERSPEECH 2017

Interactive Learning of Grounded Verb Semantics towards Human-Robot Communication ACL 2017

Active Exploration for Learning Symbolic Representations NIPS 2017

Teaching Machines to Describe Images with Natural Language Feedback NIPS 2017

Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes NIPS 2017

Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols NIPS 2017

Natural Value Approximators: Learning when to Trust Past Estimates NIPS 2017

Bridging the Gap Between Value and Policy Based Reinforcement Learning NIPS 2017

Learning Unknown Markov Decision Processes: A Thompson Sampling Approach NIPS 2017

ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games NIPS 2017

Decoding with Value Networks for Neural Machine Translation NIPS 2017

A multi-agent reinforcement learning model of common-pool resource appropriation NIPS 2017

From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood ACL 2017

Deep Reinforcement Learning-Based Image Captioning With Embedding Reward CVPR 2017

Deep 360 Pilot: Learning a Deep Agent for Piloting Through 360° Sports Videos CVPR 2017

A Reinforcement Learning Approach to the View Planning Problem CVPR 2017

Self-Critical Sequence Training for Image Captioning CVPR 2017

Cold-Start Reinforcement Learning with Softmax Policy Gradient NIPS 2017

Zap Q-Learning NIPS 2017

Value Iteration Networks IJCAI 2017

On Thompson Sampling and Asymptotic Optimality IJCAI 2017

Learning Conversational Systems that Interleave Task and Non-Task Content IJCAI 2017

A Monte Carlo Tree Search approach to Active Malware Analysis IJCAI 2017

Tensor Based Knowledge Transfer Across Skill Categories for Robot Control IJCAI 2017

Sequence Prediction with Unlabeled Data by Reward Function Learning IJCAI 2017