Jiayi Zhou
6 papers · 2023–2025 · 4 conferences · across top CS/AI conferences
Achievements
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot (4)
Cross-Pollinator (7)
Renaissance Researcher (5)
Taxonomy Completionist (21)
Conferences
ACL (3)
AAAI (1)
JMLR (1)
NIPS (1)
Top co-authors
Keywords
reinforcement learning from human feedback (3)
safe reinforcement learning (2)
reward modeling (2)
language model alignment (2)
constraint optimization (1)
preference optimization (1)
policy learning (1)
ai safety (1)
safety alignment (1)
risk minimization (1)
constraint satisfaction (1)
data compression (1)
bayesian network (1)
sequence-to-sequence learning (1)
safety benchmark (1)
human preference (1)
preference datum (1)
agent system (1)
alignment fine-tuning (1)
model elasticity (1)
Papers
Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
AAAI 2025
Language Models Resist Alignment: Evidence From Data Compression
ACL 2025
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
ACL 2025
Reward Generalization in RLHF: A Topological Perspective
ACL 2025
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research
JMLR 2024
Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark
NIPS 2023