conftrace_

Dana Arad

8 papers · 2024–2026 · 4 top CS/AI conferences

Achievements

🌍 Conference Polyglot (4) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (15) 👥 Mega-Team (23)

Conferences

ACL (3) EMNLP (3) ICML (1) NAACL (1)

Top co-authors

Yonatan Belinkov (8) Aaron Mueller (4) Martin Tutek (3) Hadas Orgad (3) Yaniv Nikankin (2) Anja Reusch (2) Jaden Fried Fiotto-Kaufman (1) Adam Belfki (1) Sarah Wiegreffe (1) Hosein Mohebbi (1)

Keywords

mechanistic interpretability (3) text encoder (2) sparse autoencoder (2) machine unlearning (1) model editing (1) model interpretability (1) integer programming (1) object counting (1) diffusion model (1) latent representation (1) vision-language model (1) text-to-image diffusion (1) feature decomposition (1) attention head (1) parameter efficiency (1) feature suppression (1) model optimization (1) circuit discovery (1) factual association (1) concept unlearning (1)

Papers

Mechanisms of Prompt-Induced Hallucination in Vision–Language Models · ACL 2026
CRISP: Persistent Concept Unlearning via Sparse Autoencoders · ACL 2026
BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection · EMNLP 2025
Findings of the BlackboxNLP 2025 Shared Task: Localizing Circuits and Causal Variables in Language Models · EMNLP 2025
MIB: A Mechanistic Interpretability Benchmark · ICML 2025
SAEs Are Good for Steering – If You Select the Right Features · EMNLP 2025
ReFACT: Updating Text-to-Image Models by Editing the Text Encoder · NAACL 2024
Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines · ACL 2024