Papers
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning
EMNLP 2025
Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning
NeurIPS 2024
OCEAN-MBRL: Offline Conservative Exploration for Model-Based Offline Reinforcement Learning
AAAI 2024
Parameterized Projected Bellman Operator
AAAI 2024