Dawn Song

105 papers · 2009–2026 · 16 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (14) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird 🌟 Keyword Trendsetter Combo (4) 🏠 Conference Loyalist (20) 👥 Mega-Team (34) 🤝 Dynamic Duo (32) 🔬 Deep Specialist (16) 🧬 Topic Evolution 🏆 Keyword Champion 👑 Triple Crown 🏆 Grand Slam 📈 Trend Setter 💎 Century Club (102) ⚡ Prolific Year (8) 🔥 Unstoppable (8) ❓ The Questioner (2) 🗃️ Keyword Collector (289) 🚀 Conference Pioneer

Conferences

ICLR (24) NIPS (20) ICML (17) EMNLP (10) ACL (8) CVPR (8) IJCAI (4) AAAI (2) AISTATS (2) ECCV (2) ICCV (2) IJCNLP (2) AACL (1) COLING (1) OSDI (1) UAI (1)

Top co-authors

Bo Li (32) Dan Hendrycks (14) Xinyun Chen (13) Ruoxi Jia (12) Chenguang Wang (11) Xuandong Zhao (9) Chaowei Xiao (8) Jacob Steinhardt (8) Mantas Mazeika (8) Chang Liu (6)

Research topics

Differential Privacy (1) Privacy (1)

Keywords

adversarial example (8) backdoor attack (6) large language model (6) adversarial attack (6) program synthesis (6) adversarial learning (5) adversarial robustness (5) language model (5) neural network (5) federated learning (5) transfer learning (5) zero-shot learning (4) attention mechanism (4) differential privacy (4) code generation (4) adversarial training (4) relation extraction (3) diffusion model (3) instruction tuning (3) anomaly detection (3)

Papers

Can Editing LLMs Inject Harm? AAAI 2026
Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities ACL 2026
dLLM: Simple Diffusion Language Modeling ACL 2026
MLAN: Language-Based Instruction Tuning Preserves and Transfers Knowledge in Multimodal Language Models ACL 2025
Position: In-House Evaluation Is Not Enough. Towards Robust Third-Party Evaluation and Flaw Disclosure for General-Purpose AI ICML 2025
Position: Political Neutrality in AI Is Impossible — But Here Is How to Approximate It ICML 2025
Multimodal Situational Safety ICLR 2025
Tamper-Resistant Safeguards for Open-Weight LLMs ICLR 2025
AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories ICLR 2025
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models ICLR 2025
An Undetectable Watermark for Generative Image Models ICLR 2025
Data Shapley in One Training Run ICLR 2025
Capturing the Temporal Dependence of Training Data Influence ICLR 2025
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 IJCNLP 2025
Improving LLM Safety Alignment with Dual-Objective Optimization ICML 2025
GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning ICML 2025
AGENTVIGIL: Automatic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents EMNLP 2025
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification AAAI 2025
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning EMNLP 2025
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 AACL 2025
Position: Formal Mathematical Reasoning—A New Frontier in AI ICML 2025
COSMIC: Generalized Refusal Direction Identification in LLM Activations ACL 2025
Effective and Efficient Federated Tree Learning on Hybrid Data ICLR 2024
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models EMNLP 2024
Hidden Persuaders: LLMs’ Political Leaning and Their Influence on Voters EMNLP 2024
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content ICML 2024
SHINE: Shielding Backdoors in Deep Reinforcement Learning ICML 2024
Position: Evolving AI Collectives Enhance Human Diversity and Enable Self-Regulation ICML 2024
Data Free Backdoor Attacks NIPS 2024
Boosting Alignment for Post-Unlearning Text-to-Image Generative Models NIPS 2024
RedCode: Risky Code Execution and Generation Benchmark for Code Agents NIPS 2024
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases NIPS 2024
GREATS: Online Selection of High-Quality Data for LLM Training in Every Iteration NIPS 2024
Position: On the Societal Impact of Open Foundation Models ICML 2024
C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models ICML 2024
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression ICML 2024
Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning ACL 2024
Agent Instructs Large Language Models to be General Zero-Shot Reasoners ICML 2024
GRATH: Gradual Self-Truthifying for Large Language Models ICML 2024
The False Promise of Imitating Proprietary Language Models ICLR 2024
BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning NIPS 2023
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models NIPS 2023
DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification NIPS 2023
Byzantine-Robust Federated Learning with Optimal Statistical Rates AISTATS 2023
TrojDiff: Trojan Attacks on Diffusion Models With Diverse Targets CVPR 2023
DensePure: Understanding Diffusion Models for Adversarial Robustness ICLR 2023
Adversarial Collaborative Learning on Non-IID Features ICML 2023
Secure Federated Correlation Test and Entropy Estimation ICML 2023
IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models EMNLP 2022
Joint Language Semantic and Structure Embedding for Knowledge Graph Completion COLING 2022
Scaling Out-of-Distribution Detection for Real-World Settings ICML 2022
PALT: Parameter-Lite Transfer of Language Models for Knowledge Graph Completion EMNLP 2022
Benchmarking Language Models for Code Syntax Understanding EMNLP 2022
Perturbation type categorization for multiple adversarial perturbation robustness UAI 2022
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios NIPS 2022
Differentially Private Fractional Frequency Moments Estimation with Polylogarithmic Space ICLR 2022
Forecasting Future World Events With Neural Networks NIPS 2022
DeepStruct: Pretraining of Language Models for Structure Prediction ACL 2022
PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures CVPR 2022
Adversarial Examples for k-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams NIPS 2021
Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages NIPS 2021
Model-Contrastive Federated Learning CVPR 2021
Grounded Graph Decoding improves Compositional Generalization in Question Answering EMNLP 2021
Zero-Shot Information Extraction as a Unified Text-to-Triple Translation EMNLP 2021
Measuring Massive Multitask Language Understanding ICLR 2021
Aligning AI With Shared Human Values ICLR 2021
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models ICML 2021
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization ICCV 2021
Natural Adversarial Examples CVPR 2021
Scalability vs. Utility: Do We Have To Sacrifice One for the Other in Data Importance Quantification? CVPR 2021
PlotCoder: Hierarchical Decoding for Synthesizing Visualization Code in Programmatic Context ACL 2021
PlotCoder: Hierarchical Decoding for Synthesizing Visualization Code in Programmatic Context IJCNLP 2021
BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning IJCAI 2021
Practical One-Shot Federated Learning for Cross-Silo Setting IJCAI 2021
Robust anomaly detection and backdoor attack detection via differential privacy ICLR 2020
Imitation Attacks and Defenses for Black-box Machine Translation Systems EMNLP 2020
Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension ICLR 2020
The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks CVPR 2020
Pretrained Transformers Improve Out-of-Distribution Robustness ACL 2020
Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis NIPS 2020
Towards practical differentially private causal graph discovery NIPS 2020
Compositional Generalization via Neural-Symbolic Stack Machines NIPS 2020
Towards Efficient Data Valuation Based on the Shapley Value AISTATS 2019
Execution-Guided Neural Program Synthesis ICLR 2019
GamePad: A Learning Environment for Theorem Proving ICLR 2019
Characterizing Audio Adversarial Examples Using Temporal Dependency ICLR 2019
AdvIT: Adversarial Frames Identifier Based on Temporal Consistency in Videos ICCV 2019
Synthetic Datasets for Neural Program Synthesis ICLR 2019
Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty NIPS 2019
Fooling Vision and Language Models Despite Localization and Attention Mechanism CVPR 2018
Robust Physical-World Attacks on Deep Learning Visual Classification CVPR 2018
Towards Synthesizing Complex Programs From Input-Output Examples ICLR 2018
Decision Boundary Analysis of Adversarial Examples ICLR 2018
Improving Neural Program Synthesis with Inferred Execution Traces NIPS 2018
Curriculum Adversarial Training IJCAI 2018
Generating Adversarial Examples with Adversarial Networks IJCAI 2018
Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation ECCV 2018
Spatially Transformed Adversarial Examples ICLR 2018
Tree-to-tree Neural Networks for Program Translation NIPS 2018
Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality ICLR 2018
Parametrized Hierarchical Procedures for Neural Programming ICLR 2018
Practical Black-box Attacks on Deep Neural Networks using Efficient Query Mechanisms ECCV 2018
Latent Attention For If-Then Program Synthesis NIPS 2016
Code-Pointer Integrity OSDI 2014
Tracking Dynamic Sources of Malicious Activity at Internet Scale NIPS 2009