Dong Yan
16 papers · 2019–2026 · 7 venues across top CS/AI conferences and journals
Achievements
🏃 Academic Marathon (6)
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🌍 Conference Polyglot (7)
🐝 Cross-Pollinator (6)
🧬 Topic Evolution
🔥 Unstoppable (7)
💎 Century Club (15)
⚡ Prolific Year (5)
🗃️ Keyword Collector (53)
Venues
IJCAI (4)
AAAI (3)
ICLR (3)
ACL (2)
ICML (2)
EMNLP (1)
JMLR (1)
Keywords
reinforcement learning from human feedback (3)
language model alignment (2)
reward modeling (2)
policy learning (2)
multi-task learning (1)
off-policy evaluation (1)
reinforcement learning (1)
mathematical reasoning (1)
robust optimization (1)
preference learning (1)
preference alignment (1)
preference optimization (1)
policy gradient (1)
hierarchical reinforcement learning (1)
sample efficiency (1)
deep reinforcement learning (1)
nash equilibrium (1)
importance sampling (1)
continuous control (1)
model-agnostic meta-learning (1)
Papers
What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time
ACL 2026
Learning LLM-as-a-Judge for Preference Alignment
ICLR 2025
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
ICLR 2025
Sequential Preference Optimization: Multi-Dimensional Preference Alignment with Implicit Reward Modeling
AAAI 2025
Reward Generalization in RLHF: A Topological Perspective
ACL 2025
STAIR: Improving Safety Alignment with Introspective Reasoning
ICML 2025
Exploring the LLM Journey from Cognition to Expression with Linear Representations
ICML 2024
Reward Modeling Requires Automatic Adjustment Based on Data Quality
EMNLP 2024
On the Reuse Bias in Off-Policy Reinforcement Learning
IJCAI 2023
Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model
AAAI 2022
Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk
IJCAI 2022
Tianshou: A Highly Modularized Deep Reinforcement Learning Library
JMLR 2022
Learning Task-Distribution Reward Shaping with Meta-Learning
AAAI 2021
Combining Tree Search and Action Prediction for State-of-the-Art Performance in DouDiZhu
IJCAI 2021
Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information
ICLR 2020
Playing FPS Games With Environment-Aware Hierarchical Reinforcement Learning
IJCAI 2019