Dana Arad
8 papers · 2024–2026 · across 4 top CS/AI conferences
Achievements
🌍 Conference Polyglot (4)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (15)
👥 Mega-Team (23)
Conferences
ACL (3)
EMNLP (3)
ICML (1)
NAACL (1)
Keywords
mechanistic interpretability (3)
text encoder (2)
sparse autoencoder (2)
machine unlearning (1)
model editing (1)
model interpretability (1)
integer programming (1)
object counting (1)
diffusion model (1)
latent representation (1)
vision-language model (1)
text-to-image diffusion (1)
feature decomposition (1)
attention head (1)
parameter efficiency (1)
feature suppression (1)
model optimization (1)
circuit discovery (1)
factual association (1)
concept unlearning (1)
Papers
Mechanisms of Prompt-Induced Hallucination in Vision–Language Models
ACL 2026
CRISP: Persistent Concept Unlearning via Sparse Autoencoders
ACL 2026
BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection
EMNLP 2025
Findings of the BlackboxNLP 2025 Shared Task: Localizing Circuits and Causal Variables in Language Models
EMNLP 2025
MIB: A Mechanistic Interpretability Benchmark
ICML 2025
SAEs Are Good for Steering – If You Select the Right Features
EMNLP 2025
ReFACT: Updating Text-to-Image Models by Editing the Text Encoder
NAACL 2024
Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
ACL 2024