Dawn Song
105 papers · 2009–2026 · 16 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (14)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Hot Topic Early Bird
Keyword Trendsetter Combo (4)
Conference Loyalist (20)
Mega-Team (34)
Dynamic Duo (32)
Deep Specialist (16)
Topic Evolution
Keyword Champion
Triple Crown
Grand Slam
Trend Setter
Century Club (102)
Prolific Year (8)
Unstoppable (8)
The Questioner (2)
Keyword Collector (289)
Conference Pioneer
Conferences
ICLR (24)
NIPS (20)
ICML (17)
EMNLP (10)
ACL (8)
CVPR (8)
IJCAI (4)
AAAI (2)
AISTATS (2)
ECCV (2)
ICCV (2)
IJCNLP (2)
AACL (1)
COLING (1)
OSDI (1)
UAI (1)
Keywords
adversarial example (8)
backdoor attack (6)
large language model (6)
adversarial attack (6)
program synthesis (6)
adversarial learning (5)
adversarial robustness (5)
language model (5)
neural network (5)
federated learning (5)
transfer learning (5)
zero-shot learning (4)
attention mechanism (4)
differential privacy (4)
code generation (4)
adversarial training (4)
relation extraction (3)
diffusion model (3)
instruction tuning (3)
anomaly detection (3)
Papers
Can Editing LLMs Inject Harm?
AAAI 2026
Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities
ACL 2026
dLLM: Simple Diffusion Language Modeling
ACL 2026
MLAN: Language-Based Instruction Tuning Preserves and Transfers Knowledge in Multimodal Language Models
ACL 2025
Position: In-House Evaluation Is Not Enough. Towards Robust Third-Party Evaluation and Flaw Disclosure for General-Purpose AI
ICML 2025
Position: Political Neutrality in AI Is Impossible - But Here Is How to Approximate It
ICML 2025
Multimodal Situational Safety
ICLR 2025
Tamper-Resistant Safeguards for Open-Weight LLMs
ICLR 2025
AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories
ICLR 2025
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
ICLR 2025
An Undetectable Watermark for Generative Image Models
ICLR 2025
Data Shapley in One Training Run
ICLR 2025
Capturing the Temporal Dependence of Training Data Influence
ICLR 2025
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1
IJCNLP 2025
Improving LLM Safety Alignment with Dual-Objective Optimization
ICML 2025
GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning
ICML 2025
AGENTVIGIL: Automatic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents
EMNLP 2025
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
AAAI 2025
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
EMNLP 2025
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1
AACL 2025
Position: Formal Mathematical Reasoning - A New Frontier in AI
ICML 2025
COSMIC: Generalized Refusal Direction Identification in LLM Activations
ACL 2025
Effective and Efficient Federated Tree Learning on Hybrid Data
ICLR 2024
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
EMNLP 2024
Hidden Persuaders: LLMs' Political Leaning and Their Influence on Voters
EMNLP 2024
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
ICML 2024
SHINE: Shielding Backdoors in Deep Reinforcement Learning
ICML 2024
Position: Evolving AI Collectives Enhance Human Diversity and Enable Self-Regulation
ICML 2024
Data Free Backdoor Attacks
NIPS 2024
Boosting Alignment for Post-Unlearning Text-to-Image Generative Models
NIPS 2024
RedCode: Risky Code Execution and Generation Benchmark for Code Agents
NIPS 2024
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
NIPS 2024
GREATS: Online Selection of High-Quality Data for LLM Training in Every Iteration
NIPS 2024
Position: On the Societal Impact of Open Foundation Models
ICML 2024
C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models
ICML 2024
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
ICML 2024
Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
ACL 2024
Agent Instructs Large Language Models to be General Zero-Shot Reasoners
ICML 2024
GRATH: Gradual Self-Truthifying for Large Language Models
ICML 2024
The False Promise of Imitating Proprietary Language Models
ICLR 2024
BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning
NIPS 2023
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
NIPS 2023
DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification
NIPS 2023
Byzantine-Robust Federated Learning with Optimal Statistical Rates
AISTATS 2023
TrojDiff: Trojan Attacks on Diffusion Models With Diverse Targets
CVPR 2023
DensePure: Understanding Diffusion Models for Adversarial Robustness
ICLR 2023
Adversarial Collaborative Learning on Non-IID Features
ICML 2023
Secure Federated Correlation Test and Entropy Estimation
ICML 2023
IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models
EMNLP 2022
Joint Language Semantic and Structure Embedding for Knowledge Graph Completion
COLING 2022
Scaling Out-of-Distribution Detection for Real-World Settings
ICML 2022
PALT: Parameter-Lite Transfer of Language Models for Knowledge Graph Completion
EMNLP 2022
Benchmarking Language Models for Code Syntax Understanding
EMNLP 2022
Perturbation type categorization for multiple adversarial perturbation robustness
UAI 2022
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
NIPS 2022
Differentially Private Fractional Frequency Moments Estimation with Polylogarithmic Space
ICLR 2022
Forecasting Future World Events With Neural Networks
NIPS 2022
DeepStruct: Pretraining of Language Models for Structure Prediction
ACL 2022
PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures
CVPR 2022
Adversarial Examples for k-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams
NIPS 2021
Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages
NIPS 2021
Model-Contrastive Federated Learning
CVPR 2021
Grounded Graph Decoding improves Compositional Generalization in Question Answering
EMNLP 2021
Zero-Shot Information Extraction as a Unified Text-to-Triple Translation
EMNLP 2021
Measuring Massive Multitask Language Understanding
ICLR 2021
Aligning AI With Shared Human Values
ICLR 2021
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models
ICML 2021
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
ICCV 2021
Natural Adversarial Examples
CVPR 2021
Scalability vs. Utility: Do We Have To Sacrifice One for the Other in Data Importance Quantification?
CVPR 2021
PlotCoder: Hierarchical Decoding for Synthesizing Visualization Code in Programmatic Context
ACL 2021
PlotCoder: Hierarchical Decoding for Synthesizing Visualization Code in Programmatic Context
IJCNLP 2021
BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning
IJCAI 2021
Practical One-Shot Federated Learning for Cross-Silo Setting
IJCAI 2021
Robust anomaly detection and backdoor attack detection via differential privacy
ICLR 2020
Imitation Attacks and Defenses for Black-box Machine Translation Systems
EMNLP 2020
Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension
ICLR 2020
The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks
CVPR 2020
Pretrained Transformers Improve Out-of-Distribution Robustness
ACL 2020
Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis
NIPS 2020
Towards practical differentially private causal graph discovery
NIPS 2020
Compositional Generalization via Neural-Symbolic Stack Machines
NIPS 2020
Towards Efficient Data Valuation Based on the Shapley Value
AISTATS 2019
Execution-Guided Neural Program Synthesis
ICLR 2019
GamePad: A Learning Environment for Theorem Proving
ICLR 2019
Characterizing Audio Adversarial Examples Using Temporal Dependency
ICLR 2019
AdvIT: Adversarial Frames Identifier Based on Temporal Consistency in Videos
ICCV 2019
Synthetic Datasets for Neural Program Synthesis
ICLR 2019
Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
NIPS 2019
Fooling Vision and Language Models Despite Localization and Attention Mechanism
CVPR 2018
Robust Physical-World Attacks on Deep Learning Visual Classification
CVPR 2018
Towards Synthesizing Complex Programs From Input-Output Examples
ICLR 2018
Decision Boundary Analysis of Adversarial Examples
ICLR 2018
Improving Neural Program Synthesis with Inferred Execution Traces
NIPS 2018
Curriculum Adversarial Training
IJCAI 2018
Generating Adversarial Examples with Adversarial Networks
IJCAI 2018
Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation
ECCV 2018
Spatially Transformed Adversarial Examples
ICLR 2018
Tree-to-tree Neural Networks for Program Translation
NIPS 2018
Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality
ICLR 2018
Parametrized Hierarchical Procedures for Neural Programming
ICLR 2018
Practical Black-box Attacks on Deep Neural Networks using Efficient Query Mechanisms
ECCV 2018
Latent Attention For If-Then Program Synthesis
NIPS 2016
Code-Pointer Integrity
OSDI 2014
Tracking Dynamic Sources of Malicious Activity at Internet Scale
NIPS 2009