Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
policy optimization
630 papers
Also known as
GRPO
POLO
MAPO
PO
PPO
Co-occurring keywords
reinforcement learning (4122)
markov decision process (788)
offline reinforcement learning (492)
deep reinforcement learning (903)
model-based reinforcement learning (415)
large language model (12755)
safe reinforcement learning (119)
policy learning (699)
value function (294)
regret bound (1918)
Papers
Policy Optimization as Online Learning with Mediator Feedback
AAAI 2021
Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks
AAAI 2021
Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach
AAAI 2021
UBAR: Towards Fully End-to-End Task-Oriented Dialog System with GPT-2
AAAI 2021
Learning One Representation to Optimize All Rewards
NIPS 2021
Near Optimal Policy Optimization via REPS
NIPS 2021
Iterative Amortized Policy Optimization
NIPS 2021
Clipping Loops for Sample-Efficient Dialogue Policy Optimisation
NAACL 2021
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
COLT 2021
Safe Driving via Expert Guided Policy Optimization
CORL 2021
Exploring Dynamic Selection of Branch Expansion Orders for Code Generation
IJCNLP 2021
On Effective Scheduling of Model-based Reinforcement Learning
NIPS 2021
Near-optimal Regret Bounds for Stochastic Shortest Path
ICML 2020
Learning to Score Behaviors for Guided Policy Optimization
ICML 2020
Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination
ICML 2020
Batch Reinforcement Learning with Hyperparameter Gradients
ICML 2020
Bidirectional Model-based Policy Optimization
ICML 2020
Monte-Carlo Tree Search as Regularized Policy Optimization
ICML 2020
On the Expressivity of Neural Networks for Deep Reinforcement Learning
ICML 2020
A distributional view on multi-objective policy optimization
ICML 2020
Stealthy and Efficient Adversarial Attacks against Deep Reinforcement Learning
AAAI 2020
Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
AAAI 2020
Policy Search by Target Distribution Learning for Continuous Control
AAAI 2020
Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration
NIPS 2020
Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
NIPS 2020