Xuandong Zhao

31 papers · 2021–2025 · 11 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (11) 🐝 Cross-Pollinator (10) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌈 Renaissance Researcher (8)

🗺️ Taxonomy Completionist (43) 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (15) 👑 Triple Crown 🏆 Grand Slam 👥 Mega-Team (25) 🏆 Keyword Champion (4) 💎 Century Club (31) ⚡ Prolific Year (10) 🗃️ Keyword Collector (116) 🔥 Unstoppable (5)

Conferences

ACL (6) ICML (6) EMNLP (5) ICLR (5) NAACL (2) NIPS (2) AAAI (1) AACL (1) AISTATS (1) IJCNLP (1) UAI (1)

Top co-authors

Lei Li (15) Yu-Xiang Wang (11) Dawn Song (9) Kaiwen Zhou (4) Xin Eric Wang (4) Jayanth Srinivasa (3) Chengzhi Liu (3) Gaowen Liu (3) Tianneng Shi (2) Liangming Pan (2)

Research topics

Privacy (4) Differential Privacy (1)

Keywords

large language model (7) adversarial attack (4) text watermarking (4) knowledge distillation (3) text generation (3) language model (3) large reasoning model (3) watermark detection (3) adaptive online learning (2) safety assessment (2) text classification (2) prompt injection (2) intellectual property protection (2) differential privacy (2) ai-generated text detection (2) natural language inference (1) model distillation (1) online learning (1) attention mechanism (1) image generation (1)

Papers

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning · EMNLP 2025
AGENTVIGIL: Automatic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents · EMNLP 2025
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 · AACL 2025
An Undetectable Watermark for Generative Image Models · ICLR 2025
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models · ICLR 2025
Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs · ICLR 2025
Multimodal Situational Safety · ICLR 2025
DIS-CO: Discovering Copyrighted Content in VLMs Training Data · ICML 2025
A Practical Examination of AI-Generated Text Detectors for Large Language Models · NAACL 2025
Efficiently Identifying Watermarked Segments in Mixed-Source Texts · ACL 2025
Weak-to-Strong Jailbreaking on Large Language Models · ICML 2025
Improving LLM Safety Alignment with Dual-Objective Optimization · ICML 2025
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 · IJCNLP 2025
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification · AAAI 2025
Invisible Image Watermarks Are Provably Removable Using Generative AI · NIPS 2024
Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature · NIPS 2024
GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick · ACL 2024
Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement · ACL 2024
Watermarking for Large Language Models · ACL 2024
MarkLLM: An Open-Source Toolkit for LLM Watermarking · EMNLP 2024
A Survey on Detection of LLMs-Generated Content · EMNLP 2024
Provable Robust Watermarking for AI-Generated Text · ICLR 2024
DE-COP: Detecting Copyrighted Content in Language Models Training Data · ICML 2024
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews · ICML 2024
Pre-trained Language Models Can be Fully Zero-Shot Learners · ACL 2023
Protecting Language Generation Models via Invisible Watermarking · ICML 2023
Private Prediction Strikes Back! Private Kernelized Nearest Neighbors with Individual Rényi Filter · UAI 2023
Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation · ACL 2022
Provably Confidential Language Modelling · NAACL 2022
Distillation-Resistant Watermarking for Model Protection in NLP · EMNLP 2022
An Optimal Reduction of TV-Denoising to Adaptive Online Learning · AISTATS 2021