Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
policy optimization
630 papers
Also known as
GRPO
POLO
MAPO
PO
PPO
Co-occurring keywords
reinforcement learning (4122)
markov decision process (788)
offline reinforcement learning (492)
deep reinforcement learning (903)
model-based reinforcement learning (415)
large language model (12755)
safe reinforcement learning (119)
policy learning (699)
value function (294)
regret bound (1918)
Papers
The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms
ICML 2023
Revisiting the Minimalist Approach to Offline Reinforcement Learning
NIPS 2023
Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning
NIPS 2023
Multi-Agent First Order Constrained Optimization in Policy Space
NIPS 2023
Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions
ICML 2023
Tuning Computer Vision Models With Task Rewards
ICML 2023
Anytime-Competitive Reinforcement Learning with Policy Prior
NIPS 2023
Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization
ICML 2023
Multi-Modal Inverse Constrained Reinforcement Learning from a Mixture of Demonstrations
NIPS 2023
Learning from Active Human Involvement through Proxy Value Propagation
NIPS 2023
A Rigorous Risk-aware Linear Approach to Extended Markov Ratio Decision Processes with Embedded Learning
IJCAI 2023
Bi-Level Offline Policy Optimization with Limited Exploration
NIPS 2023
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
NIPS 2023
Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling
ICML 2023
How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization
NIPS 2023
Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms
NIPS 2023
Optimal Decision Tree Policies for Markov Decision Processes
IJCAI 2023
A Connection between One-Step RL and Critic Regularization in Reinforcement Learning
ICML 2023
Provably Efficient Algorithm for Nonstationary Low-Rank MDPs
NIPS 2023
Tempo Adaptation in Non-stationary Reinforcement Learning
NIPS 2023
VOCE: Variational Optimization with Conservative Estimation for Offline Safe Reinforcement Learning
NIPS 2023
Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback
ICML 2023
User-Oriented Robust Reinforcement Learning
AAAI 2023
Instance-Dependent Confidence and Early Stopping for Reinforcement Learning
JMLR 2023
Sequential Counterfactual Risk Minimization
ICML 2023