reinforcement learning
4122 papers
Also known as
RLVR
HARL
GRPO
RL
PPO
REINFORCE
RFT
DRL
LQR
RLHF
Papers
Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch
NIPS 2024
Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads
NIPS 2024
Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs
NIPS 2024
Policy Mirror Descent with Lookahead
NIPS 2024