Papers
BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in Alignment
NAACL 2025
Continuous-Time Reward Machines
IJCAI 2025
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
ICCV 2025
Training Medical QA Models Based on Mixed Rewards from Multiple-Choice and Open-Ended Questions
EMNLP 2025
InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions
CVPR 2025