conftrace_

reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation ICML 2022

AnyMorph: Learning Transferable Polices By Inferring Agent Morphology ICML 2022

A Temporal-Difference Approach to Policy Gradient Estimation ICML 2022

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses ICML 2022

Influence-Augmented Local Simulators: a Scalable Solution for Fast Deep RL in Large Networked Systems ICML 2022

Symmetric Machine Theory of Mind ICML 2022

Short-Term Plasticity Neurons Learning to Learn and Forget ICML 2022

Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning ICML 2022

Evolving Curricula with Regret-Based Environment Design ICML 2022

History Compression via Language Models in Reinforcement Learning ICML 2022

Improved Regret for Differentially Private Exploration in Linear MDP ICML 2022

The Importance of Non-Markovianity in Maximum State Entropy Exploration ICML 2022

CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer ICML 2022

Learning Stochastic Shortest Path with Linear Function Approximation ICML 2022

Distributionally Robust $Q$-Learning ICML 2022

Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks ICML 2022

Large Batch Experience Replay ICML 2022

Supervised Off-Policy Ranking ICML 2022

Action-Sufficient State Representation Learning for Control with Structural Constraints ICML 2022

Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation ICML 2022

Contextual Information-Directed Sampling ICML 2022

Temporal Difference Learning for Model Predictive Control ICML 2022

A Parametric Class of Approximate Gradient Updates for Policy Optimization ICML 2022

Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity ICML 2022

SURF: Semantic-level Unsupervised Reward Function for Machine Translation NAACL 2022