Interpretability

7318 directly classified papers

Papers

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning EMNLP 2025

Is OpenVLA Truly Robust? A Systematic Evaluation of Positional Robustness AACL 2025

Interpretable Mnemonic Generation for Kanji Learning via Expectation-Maximization EMNLP 2025

Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts EMNLP 2025

Language Arithmetics: Towards Systematic Language Neuron Identification and Manipulation AACL 2025

An Analysis of Large Language Models for Simulating User Responses in Surveys AACL 2025

ExDDI: Explaining Drug-Drug Interaction Predictions with Natural Language AAAI 2025

Probing and Boosting Large Language Models Capabilities via Attention Heads EMNLP 2025

Language Model Meets Prototypes: Towards Interpretable Text Classification Models through Prototypical Networks AAAI 2025

Pathway to Relevance: How Cross-Encoders Implement a Semantic Variant of BM25 EMNLP 2025

Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding EMNLP 2025

Extracting Linguistic Information from Large Language Models: Syntactic Relations and Derivational Knowledge EMNLP 2025

Understanding and Controlling Repetition Neurons and Induction Heads in In-Context Learning AACL 2025

Interpreting the Effects of Quantization on LLMs AACL 2025

uir-cis at SemEval-2025 Task 3: Detection of Hallucinations in Generated Text SEMEVAL 2025

HalluCounter: Reference-free LLM Hallucination Detection in the Wild! IJCNLP 2025

SmurfCat at SHROOM-CAP: Factual but Awkward? Fluent but Wrong? Tackling Both in LLM Scientific QA IJCNLP 2025

Explaining in Diffusion: Explaining a Classifier with Diffusion Semantics CVPR 2025

Evaluate with the Inverse: Efficient Approximation of Latent Explanation Quality Distribution AAAI 2025

Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI CVPR 2025

Do Transformer Interpretability Methods Transfer to RNNs? AAAI 2025

Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow AAAI 2025

Attention IoU: Examining Biases in CelebA using Attention Maps CVPR 2025

Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms AACL 2025

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly CVPR 2025