Ruiyang Sun
4 papers · 2023–2024 · 3 conferences · across top CS/AI conferences
Achievements
🌍 Conference Polyglot (3)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐣 Hot Topic Early Bird
🐝 Cross-Pollinator (15)
Conferences
NeurIPS (2)
ICLR (1)
JMLR (1)
Keywords
safe reinforcement learning (2)
constraint optimization (1)
content moderation (1)
policy learning (1)
ai safety (1)
safety alignment (1)
reinforcement learning from human feedback (1)
risk minimization (1)
constraint satisfaction (1)
safety benchmark (1)
human preference (1)
agent system (1)
agent safety (1)
safe policy optimization (1)
large language model (1)
reinforcement learning human feedback (1)
policy optimization (1)
human-preference dataset (1)
Papers
Safe RLHF: Safe Reinforcement Learning from Human Feedback
ICLR 2024
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research
JMLR 2024
Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark
NeurIPS 2023
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
NeurIPS 2023