Rui Zheng

43 papers · 2020–2026 · 11 top CS/AI conferences

Achievements

🌍 Conference Polyglot (11) 🏃 Academic Marathon (5) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird

🤝 Dynamic Duo (34) 👑 Triple Crown 🏆 Grand Slam 👥 Mega-Team (34) 🔬 Deep Specialist (10) 🧬 Topic Evolution 🚀 Conference Pioneer ⚡ Prolific Year (9) ❓ The Questioner 📈 Trend Setter 🔥 Unstoppable (6) 🗃️ Keyword Collector (170) 💎 Century Club (40)

Conferences

ACL (15) EMNLP (9) AAAI (4) COLING (4) ICLR (3) ICML (2) MICCAI (2) CVPR (1) IJCAI (1) IJCNLP (1) NIPS (1)

Top co-authors

Qi Zhang (36) Tao Gui (33) Xuanjing Huang (32) Zhiheng Xi (20) Yuhao Zhou (12) Shihan Dou (12) Xiao Wang (9) Wei Shen (7) Songyang Gao (7) Qin Liu (6)

Keywords

large language model (9) language model (5) reinforcement learning from human feedback (5) reinforcement learning (5) adversarial attack (4) reward model (4) text classification (4) preference alignment (3) reward modeling (3) contrastive learning (3) adversarial training (3) adversarial defense (3) code generation (3) model robustness (3) language model alignment (3) continual learning (2) transfer learning (2) distribution shift (2) out-of-distribution generalization (2) adversarial robustness (2)

Papers

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study · AAAI 2026
Time-Frequency Token Advantage Clipping for Training Efficient Large Reasoning Model · AAAI 2026
MetaAct-RL: Training Language Models for Reasoning Through Meta-Action-Based Reinforcement Learning · AAAI 2026
Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs · ICLR 2025
Alleviating Shifted Distribution in Human Preference Alignment through Meta-Learning · AAAI 2025
AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments · ACL 2025
Multi-Programming Language Sandbox for LLMs · ACL 2025
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models · CVPR 2025
Toward Optimal LLM Alignments Using Two-Player Games · EMNLP 2025
Fine-Grained Manipulation of Arithmetic Neurons · EMNLP 2025
RMB: Comprehensively benchmarking reward models in LLM alignment · ICLR 2025
LoRAMoE: Alleviating World Knowledge Forgetting in Large Language Models via MoE-Style Plugin · ACL 2024
StepCoder: Improving Code Generation with Reinforcement Learning from Compiler Feedback · ACL 2024
Uncertainty Aware Learning for Language Model Alignment · ACL 2024
Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation · ACL 2024
Reliable Source Approximation: Source-Free Unsupervised Domain Adaptation for Vestibular Schwannoma MRI Segmentation · MICCAI 2024
ORTicket: Let One Robust BERT Ticket Transfer across Different Tasks · COLING 2024
Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals · COLING 2024
RoCoSDF: Row-Column Scanned Neural Signed Distance Fields for Freehand 3D Ultrasound Imaging Shape Reconstruction · MICCAI 2024
Improving Discriminative Capability of Reward Models in RLHF Using Contrastive Learning · EMNLP 2024
Reward Modeling Requires Automatic Adjustment Based on Data Quality · EMNLP 2024
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning · ICLR 2024
DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation · NIPS 2024
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback · ICML 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning · ICML 2024
Enhancing Contrastive Learning with Noise-Guided Attack: Towards Continual Relation Extraction in the Wild · ACL 2024
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement · EMNLP 2023
CASN:Class-Aware Score Network for Textual Adversarial Detection · ACL 2023
Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization · ACL 2023
Characterizing the Impacts of Instances on Robustness · ACL 2023
Detecting Adversarial Samples through Sharpness of Loss Landscape · ACL 2023
Connectivity Patterns are Task Embeddings · ACL 2023
Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback · EMNLP 2023
RealBehavior: A Framework for Faithfully Characterizing Foundation Models’ Human-like Behavior Mechanisms · EMNLP 2023
Orthogonal Subspace Learning for Language Model Continual Learning · EMNLP 2023
PlugAT: A Plug and Play Module to Defend against Textual Adversarial Attack · COLING 2022
Efficient Adversarial Training with Robust Early-Bird Tickets · EMNLP 2022
Flooding-X: Improving BERT’s Resistance to Adversarial Attacks via Loss-Restricted Fine-Tuning · ACL 2022
Robust Lottery Tickets for Pre-trained Language Models · ACL 2022
Decorrelate Irrelevant, Purify Relevant: Overcome Textual Spurious Correlations from a Feature Perspective · COLING 2022
TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing · ACL 2021
TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing · IJCNLP 2021
GestureDet: Real-time Student Gesture Analysis with Multi-dimensional Attention-based Detector · IJCAI 2020