conftrace_

reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

Diversify Question Generation with Retrieval-Augmented Style Transfer EMNLP 2023

Efficient Model-Free Exploration in Low-Rank MDPs NeurIPS 2023

Exploiting Proximity-Aware Tasks for Embodied Social Navigation ICCV 2023

Near-optimal Conservative Exploration in Reinforcement Learning under Episode-wise Constraints ICML 2023

Reward-Mixing MDPs with Few Latent Contexts are Learnable ICML 2023

LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework ICML 2023

For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal ICML 2023

Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control AAAI 2023

Scaling Laws for Reward Model Overoptimization ICML 2023

A Coupled Flow Approach to Imitation Learning ICML 2023

Composing Efficient, Robust Tests for Policy Selection UAI 2023

Non-stationary Reinforcement Learning under General Function Approximation ICML 2023

Weighted Sampling without Replacement for Deep Top-k Classification ICML 2023

Guiding Pretraining in Reinforcement Learning with Large Language Models ICML 2023

Lower Bounds for Learning in Revealing POMDPs ICML 2023

PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient ICML 2023

Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback NeurIPS 2022

Batch size-invariance for policy optimization NeurIPS 2022

Efficient Learning for AlphaZero via Path Consistency ICML 2022

Toward Compositional Generalization in Object-Oriented World Modeling ICML 2022

Topology-Aware Network Pruning using Multi-stage Graph Embedding and Reinforcement Learning ICML 2022

Towards Applicable Reinforcement Learning: Improving the Generalization and Sample Efficiency with Policy Ensemble IJCAI 2022

Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes ICML 2022

First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach ICML 2022

Transformer-based Objective-reinforced Generative Adversarial Network to Generate Desired Molecules IJCAI 2022