Research Explorer
on-policy sampling
2 papers
Co-occurring keywords
direct preference optimization (317)
language model alignment (142)
reward model (251)
data efficiency (124)
monte carlo estimator (15)
preference datum (28)
iterative training (38)
alignment training (15)
off-policy sampling (2)
policy evaluation (115)
Papers
Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization
ACL 2025
Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning
NeurIPS 2022