Himabindu Lakkaraju
43 papers · 2016–2026 · 9 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (7)
Academic Marathon (9)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (13)
Taxonomy Completionist (51)
Deep Specialist (20)
Keyword Champion (2)
Triple Crown
Grand Slam
Topic Evolution
Unstoppable (6)
Prolific Year (7)
Century Club (40)
The Questioner (3)
Keyword Collector (147)
Conferences
NIPS (16)
AISTATS (6)
ICML (6)
ICLR (4)
UAI (4)
NAACL (3)
ACL (2)
AAAI (1)
EACL (1)
Top co-authors
Research topics
Keywords
counterfactual explanation (5)
large language model (5)
feature attribution (4)
adversarial training (4)
model interpretability (4)
post hoc explanation (4)
adversarial robustness (4)
uncertainty quantification (3)
algorithmic recourse (3)
algorithmic fairness (3)
benchmark evaluation (2)
chain-of-thought prompting (2)
right to be forgotten (2)
post-hoc explanation (2)
data deletion (2)
generative model (2)
interpretable model (2)
causal inference (2)
in-context learning (2)
explainable ai (2)
Papers
Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders
EACL 2026
Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models
ACL 2026
How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior
ACL 2026
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
ICLR 2025
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
NAACL 2025
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
ICLR 2025
Quantifying Generalization Complexity for Large Language Models
ICLR 2025
In-Context Unlearning: Language Models as Few-Shot Unlearners
ICML 2024
MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models
NIPS 2024
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)
NIPS 2024
Quantifying Uncertainty in Natural Language Explanations of Large Language Models
AISTATS 2024
Fair Machine Unlearning: Data Removal while Mitigating Disparities
AISTATS 2024
Understanding the Effects of Iterative Prompting on Truthfulness
ICML 2024
Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications
NAACL 2024
A Study on the Calibration of In-context Learning
NAACL 2024
Characterizing Data Point Vulnerability as Average-Case Robustness
UAI 2024
Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten
ICML 2023
$\mathcal{M}^4$: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods across Metrics, Modalities and Models
NIPS 2023
On the Privacy Risks of Algorithmic Recourse
AISTATS 2023
On Minimizing the Impact of Dataset Shifts on Actionable Explanations
UAI 2023
Probabilistically Robust Recourse: Navigating the Trade-offs between Costs and Robustness in Algorithmic Recourse
ICLR 2023
Post Hoc Explanations of Language Models Can Improve Language Models
NIPS 2023
Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability
NIPS 2023
On the Impact of Algorithmic Recourse on Social Segregation
ICML 2023
Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness
NIPS 2023
Efficient Training of Low-Curvature Neural Networks
NIPS 2022
OpenXAI: Towards a Transparent Evaluation of Model Explanations
NIPS 2022
Data poisoning attacks on off-policy policy evaluation methods
UAI 2022
Probing GNN Explainers: A Rigorous Theoretical and Empirical Analysis of GNN Explanation Methods
AISTATS 2022
Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis
AISTATS 2022
Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations
NIPS 2022
Towards a unified framework for fair and stable graph representation learning
UAI 2021
Learning Models for Actionable Recourse
NIPS 2021
Towards Robust and Reliable Algorithmic Recourse
NIPS 2021
Towards the Unification and Robustness of Perturbation and Gradient Based Explanations
ICML 2021
Counterfactual Explanations Can Be Manipulated
NIPS 2021
Reliable Post hoc Explanations: Modeling Uncertainty in Explainability
NIPS 2021
Fair Influence Maximization: a Welfare Optimization Approach
AAAI 2021
Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses
NIPS 2020
Robust and Stable Black Box Explanations
ICML 2020
Incorporating Interpretable Output Constraints in Bayesian Neural Networks
NIPS 2020
Learning Cost-Effective and Interpretable Treatment Regimes
AISTATS 2017
Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making
NIPS 2016