Zhun Wang
4 papers · 2024–2025 · 4 conferences · across top CS/AI conferences
Achievements
Interdisciplinary Bridge · Keyword Pioneer · Conference Polyglot (4) · Cross-Pollinator (14) · Taxonomy Completionist (11) · Mega-Team (25)
Conferences
ACL (1)
EMNLP (1)
ICLR (1)
NeurIPS (1)
Keywords
black-box optimization (1)
text-to-image generation (1)
machine unlearning (1)
model interpretability (1)
safety alignment (1)
model alignment (1)
generative model (1)
diffusion model (1)
copyright concern (1)
generative model alignment (1)
cosine similarity (1)
llm agent (1)
activation steering (1)
indirect prompt injection (1)
refusal detection (1)
knowledge purging (1)
Papers
COSMIC: Generalized Refusal Direction Identification in LLM Activations
ACL 2025
AGENTVIGIL: Automatic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents
EMNLP 2025
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
ICLR 2025
Boosting Alignment for Post-Unlearning Text-to-Image Generative Models
NeurIPS 2024