Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
Reinforcement Learning › Methods › Policy Learning
2068 directly classified papers
Papers per year
2002: 6
2003: 1
2004: 1
2006: 11
2007: 10
2008: 14
2009: 9
2010: 23
2011: 15
2012: 25
2013: 25
2014: 24
2015: 23
2016: 27
2017: 61
2018: 107
2019: 187
2020: 216
2021: 274
2022: 259
2023: 321
2024: 247
2025: 153
2026: 29
Papers
Multi-Hop Knowledge Graph Reasoning with Reward Shaping
EMNLP 2018
Thread Popularity Prediction and Tracking with a Permutation-invariant Model
EMNLP 2018
A Study of Reinforcement Learning for Neural Machine Translation
EMNLP 2018
A Teacher-Student Framework for Maintainable Dialog Manager
EMNLP 2018
Joint Modeling for Query Expansion and Information Extraction with Reinforcement Learning
EMNLP 2018
Policy Shaping and Generalized Update Equations for Semantic Parsing from Denotations
EMNLP 2018
Macquarie University at BioASQ 6b: Deep learning and deep reinforcement learning for query-based summarisation
EMNLP 2018
CSGNet: Neural Shape Parser for Constructive Solid Geometry
CVPR 2018
A Bayesian Approach to Generative Adversarial Imitation Learning
NIPS 2018
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
NIPS 2018
Exponentially Weighted Imitation Learning for Batched Historical Data
NIPS 2018
Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning
NIPS 2018
An Off-policy Policy Gradient Theorem Using Emphatic Weightings
NIPS 2018
Constrained Cross-Entropy Method for Safe Reinforcement Learning
NIPS 2018
Learning Temporal Point Processes via Reinforcement Learning
NIPS 2018
Policy Optimization via Importance Sampling
NIPS 2018
Simple random search of static linear policies is competitive for reinforcement learning
NIPS 2018
Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes
NIPS 2018
End-to-End Reinforcement Learning for Automatic Taxonomy Induction
ACL 2018
Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning
ACL 2018
Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
ACL 2018
From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction
ACL 2018
Bayesian Control of Large MDPs with Unknown Dynamics in Data-Poor Environments
NIPS 2018
A Case Study on the Importance of Belief State Representation for Dialogue Policy Management
INTERSPEECH 2018
Evolution-Guided Policy Gradient in Reinforcement Learning
NIPS 2018
Page 69 of 83