Yuhui Wang
19 papers · 2019–2026 · 8 conferences · across top CS/AI conferences
Achievements
Academic Marathon (6)
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot (8)
Cross-Pollinator (13)
Taxonomy Completionist (27)
Topic Evolution
Grand Slam
Conference Pioneer
Prolific Year (6)
Century Club (15)
Conferences
ACL (4)
ICML (4)
AAAI (3)
EMNLP (2)
ICLR (2)
NIPS (2)
ICCV (1)
UAI (1)
Keywords
reinforcement learning (5)
large language model (4)
language model (2)
trust region (2)
proximal policy optimization (2)
policy optimization (2)
sample efficiency (2)
deep reinforcement learning (2)
chain-of-thought reasoning (1)
mathematical reasoning (1)
variational inference (1)
question answering (1)
sequential decision making (1)
video understanding (1)
robot control (1)
uncertainty quantification (1)
partially observable Markov decision process (1)
behavior cloning (1)
belief propagation (1)
instruction tuning (1)
Papers
AutoRAN: Automated Hijacking of Safety Reasoning in Large Reasoning Models
ACL 2026
VRPO: Rethinking Value Modeling for Robust RL under Noisy Supervision in LLM Post-Training
ACL 2026
LLMEval-Fair: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models
ACL 2026
MetaAct-RL: Training Language Models for Reasoning Through Meta-Action-Based Reinforcement Learning
AAAI 2026
RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction
ICLR 2025
PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts
ACL 2025
Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning
EMNLP 2025
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation
EMNLP 2025
Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning
ICML 2025
Directly Forecasting Belief for Reinforcement Learning with Delays
ICML 2025
Variational Delayed Policy Optimization
NIPS 2024
Highway Value Iteration Networks
ICML 2024
Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays
ICML 2024
DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
ICLR 2023
Learning to Identify Critical States for Reinforcement Learning from Videos
ICCV 2023
Deep Recurrent Belief Propagation Network for POMDPs
AAAI 2021
SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning
AAAI 2020
Trust Region-Guided Proximal Policy Optimization
NIPS 2019
Truly Proximal Policy Optimization
UAI 2019