Papers
CPO: Addressing Reward Ambiguity in Role-playing Dialogue via Comparative Policy Optimization
EMNLP 2025
From General Reward to Targeted Reward: Improving Open-ended Long-context Generation Models
EMNLP 2025
In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning
AAAI 2025