conftrace_

Zhun Wang

4 papers · 2024–2025 · 4 conferences · across top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🌍 Conference Polyglot (4)
🐝 Cross-Pollinator (14)
🗺️ Taxonomy Completionist (11)
👥 Mega-Team (25)

Conferences

ACL (1) EMNLP (1) ICLR (1) NIPS (1)

Top co-authors

Dawn Song (4) Xuandong Zhao (2) Vincent Siu (2) Chenguang Wang (2) Zihao Yu (1) Nicholas Crispino (1) Dan Hendrycks (1) Jiawei Zhang (1) Francesco Pinto (1) Myeongseob Ko (1)

Keywords

black-box optimization (1) text-to-image generation (1) machine unlearning (1) model interpretability (1) safety alignment (1) model alignment (1) generative model (1) diffusion model (1) copyright concern (1) generative model alignment (1) cosine similarity (1) llm agent (1) activation steering (1) indirect prompt injection (1) refusal detection (1) knowledge purging (1)

Papers

COSMIC: Generalized Refusal Direction Identification in LLM Activations · ACL 2025
AGENTVIGIL: Automatic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents · EMNLP 2025
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models · ICLR 2025
Boosting Alignment for Post-Unlearning Text-to-Image Generative Models · NIPS 2024