Been Kim

26 papers · 2014–2025 · 4 top CS/AI conferences

Achievements

🏃 Academic Marathon (11) 🌍 Conference Polyglot (4) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird 🌟 Keyword Trendsetter Combo (5) 👑 Triple Crown 🏆 Keyword Champion (3) 🔬 Deep Specialist (12) 🧬 Topic Evolution 🗃️ Keyword Collector (90) 💎 Century Club (26) ❓ The Questioner 🚀 Conference Pioneer ⚡ Prolific Year (5) 📈 Trend Setter

Conferences

NIPS (15) ICML (6) ICLR (4) AISTATS (1)

Top co-authors

Michael Muelly (3) Julius Adebayo (3) Pieter-Jan Kindermans (2) Justin Gilmer (2) Robert Geirhos (2) Asma Ghandeharioun (2) Julie A Shah (2) Rajiv Khanna (2) Martin Wattenberg (2) James Wexler (2)

Keywords

feature importance (4) model interpretability (3) concept-based explanation (3) generative model (2) image classification (2) neural network interpretability (2) out-of-distribution detection (2) feature attribution (2) saliency map (2) saliency method (2) neural network (2) prototype learning (2) knowledge editing (1) sequential decision making (1) feature extraction (1) epistemic uncertainty (1) variational inference (1) uncertainty quantification (1) feature selection (1) explainable ai (1)

Papers

How new data permeates LLM knowledge and how to dilute it ICLR 2025
Position: We Can’t Understand AI Using our Existing Vocabulary ICML 2025
Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty ICML 2025
Don’t trust your eyes: on the (un)reliability of feature visualizations ICML 2024
Gaussian Process Probes (GPP) for Uncertainty-Aware Probing NIPS 2023
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models NIPS 2023
State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding NIPS 2023
On the Relationship Between Explanation and Prediction: A Causal View ICML 2023
DISSECT: Disentangled Simultaneous Explanations via Concept Traversals ICLR 2022
Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis NIPS 2022
Post hoc Explanations may be Ineffective for Detecting Unknown Spurious Correlation ICLR 2022
Debugging Tests for Model Explanations NIPS 2020
On Completeness-aware Concept-Based Explanations in Deep Neural Networks NIPS 2020
Concept Bottleneck Models ICML 2020
Towards Automatic Concept-based Explanations NIPS 2019
Interpreting Black Box Predictions using Fisher Kernels AISTATS 2019
Visualizing and Measuring the Geometry of BERT NIPS 2019
A Benchmark for Interpretability Methods in Deep Neural Networks NIPS 2019
Learning how to explain neural networks: PatternNet and PatternAttribution ICLR 2018
Sanity Checks for Saliency Maps NIPS 2018
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) ICML 2018
Human-in-the-Loop Interpretability Prior NIPS 2018
To Trust Or Not To Trust A Classifier NIPS 2018
Examples are not enough, learn to criticize! Criticism for Interpretability NIPS 2016
Mind the Gap: A Generative Approach to Interpretable Feature Selection and Extraction NIPS 2015
The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification NIPS 2014