Papers
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
EMNLP 2024
Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking
NeurIPS 2024