conftrace_

reinforcement learning

4122 papers

Also known as

RLVR, HARL, GRPO, RL, PPO, REINFORCE, RFT, DRL, LQR, RLHF

Co-occurring keywords

large language model (12755), policy learning (699), markov decision process (788), policy gradient (518), policy optimization (630), deep reinforcement learning (903), multi-agent system (1743), imitation learning (741), regret bound (1918), language model (4573)

Papers

The Importance of Non-Markovianity in Maximum State Entropy Exploration ICML 2022

Application of Neurosymbolic AI to Sequential Decision Making IJCAI 2022

Interactive Reinforcement Learning for Symbolic Regression from Multi-Format Human-Preference Feedbacks IJCAI 2022

VMAgent: A Practical Virtual Machine Scheduling Platform IJCAI 2022

CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer ICML 2022

Learning Stochastic Shortest Path with Linear Function Approximation ICML 2022

Distributionally Robust Q-Learning ICML 2022

Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks ICML 2022

Large Batch Experience Replay ICML 2022

Supervised Off-Policy Ranking ICML 2022

Action-Sufficient State Representation Learning for Control with Structural Constraints ICML 2022

Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation ICML 2022

Contextual Information-Directed Sampling ICML 2022

Temporal Difference Learning for Model Predictive Control ICML 2022

A Parametric Class of Approximate Gradient Updates for Policy Optimization ICML 2022

Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity ICML 2022

SURF: Semantic-level Unsupervised Reward Function for Machine Translation NAACL 2022

Lifelong Hyper-Policy Optimization with Multiple Importance Sampling Regularization AAAI 2022

Efficient Device Scheduling with Multi-Job Federated Learning AAAI 2022

Offline-to-Online Co-Evolutional User Simulator and Dialogue System EMNLP 2022

NICE: Robust Scheduling through Reinforcement Learning-Guided Integer Programming AAAI 2022

Text Editing as Imitation Game EMNLP 2022

Masked Language Models Know Which are Popular: A Simple Ranking Strategy for Commonsense Question Answering EMNLP 2022

ScienceWorld: Is your Agent Smarter than a 5th Grader? EMNLP 2022

Guiding Abstractive Dialogue Summarization with Content Planning EMNLP 2022