Suraj Srinivas

13 papers · 2018–2026 · 5 conferences · across top CS/AI conferences

Achievements


🏃 Academic Marathon (7) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (4) 🐝 Cross-Pollinator (13) 🔥 Unstoppable (5) 💎 Century Club (12) ❓ The Questioner (3) 🗃️ Keyword Collector (59)

Conferences

NIPS (7) ICML (2) UAI (2) EACL (1) ICLR (1)

Top co-authors

Himabindu Lakkaraju (8) Francois Fleuret (4) Usha Bhalla (3) Tessa Han (2) Sebastian Bordt (2) Marwa El Halabi (1) Flavio P. Calmon (1) Aaron J. Li (1) Ulrike von Luxburg (1) Alex Oesterling (1)

Keywords

adversarial robustness (2) neural network (2) model interpretability (2) representation learning (1) transfer learning (1) domain adaptation (1) model robustness (1) embedding learning (1) feature disentanglement (1) knowledge distillation (1) batch normalization (1) multimodal learning (1) model architecture (1) explainable ai (1) concept representation (1) adversarial training (1) feature attribution (1) model editing (1) function approximation (1) sparse recovery (1)

Papers

Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders · EACL 2026
How Much Can We Forget about Data Contamination? · ICML 2025
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) · NIPS 2024
Characterizing Data Point Vulnerability as Average-Case Robustness · UAI 2024
On Minimizing the Impact of Dataset Shifts on Actionable Explanations · UAI 2023
Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability · NIPS 2023
Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness · NIPS 2023
Data-Efficient Structured Pruning via Submodular Optimization · NIPS 2022
Efficient Training of Low-Curvature Neural Networks · NIPS 2022
Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations · NIPS 2022
Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability · ICLR 2021
Full-Gradient Representation for Neural Network Visualization · NIPS 2019
Knowledge Transfer with Jacobian Matching · ICML 2018