Dong Yan

16 papers · 2019–2026 · across 7 top CS/AI conferences

Achievements

🏃 Academic Marathon (6) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (7) 🐝 Cross-Pollinator (6) 🧬 Topic Evolution 🔥 Unstoppable (7) 💎 Century Club (15) ⚡ Prolific Year (5) 🗃️ Keyword Collector (53)

Conferences

IJCAI (4) AAAI (3) ICLR (3) ACL (2) ICML (2) EMNLP (1) JMLR (1)

Top co-authors

Jun Zhu (9) Hang Su (7) Jialian Li (4) Tongzheng Ren (3) Xinning Zhou (2) Ning Chen (2) Chengyang Ying (2) Jiayi Weng (2) Jian Xie (2) Haosheng Zou (2)

Keywords

reinforcement learning from human feedback (3) language model alignment (2) reward modeling (2) policy learning (2) multi-task learning (1) off-policy evaluation (1) reinforcement learning (1) mathematical reasoning (1) robust optimization (1) preference learning (1) preference alignment (1) preference optimization (1) policy gradient (1) hierarchical reinforcement learning (1) sample efficiency (1) deep reinforcement learning (1) nash equilibrium (1) importance sampling (1) continuous control (1) model-agnostic meta-learning (1)

Papers

What If Consensus Lies? Selective-Complementary Reinforcement Learning at Test Time · ACL 2026
Learning LLM-as-a-Judge for Preference Alignment · ICLR 2025
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward · ICLR 2025
Sequential Preference Optimization: Multi-Dimensional Preference Alignment with Implicit Reward Modeling · AAAI 2025
Reward Generalization in RLHF: A Topological Perspective · ACL 2025
STAIR: Improving Safety Alignment with Introspective Reasoning · ICML 2025
Exploring the LLM Journey from Cognition to Expression with Linear Representations · ICML 2024
Reward Modeling Requires Automatic Adjustment Based on Data Quality · EMNLP 2024
On the Reuse Bias in Off-Policy Reinforcement Learning · IJCAI 2023
Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model · AAAI 2022
Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk · IJCAI 2022
Tianshou: A Highly Modularized Deep Reinforcement Learning Library · JMLR 2022
Learning Task-Distribution Reward Shaping with Meta-Learning · AAAI 2021
Combining Tree Search and Action Prediction for State-of-the-Art Performance in DouDiZhu · IJCAI 2021
Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information · ICLR 2020
Playing FPS Games With Environment-Aware Hierarchical Reinforcement Learning · IJCAI 2019