Research Explorer
Interpretability
7318 directly classified papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
EMNLP 2025
Is OpenVLA Truly Robust? A Systematic Evaluation of Positional Robustness
AACL 2025
Interpretable Mnemonic Generation for Kanji Learning via Expectation-Maximization
EMNLP 2025
Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts
EMNLP 2025
Language Arithmetics: Towards Systematic Language Neuron Identification and Manipulation
AACL 2025
An Analysis of Large Language Models for Simulating User Responses in Surveys
AACL 2025
ExDDI: Explaining Drug-Drug Interaction Predictions with Natural Language
AAAI 2025
Probing and Boosting Large Language Models Capabilities via Attention Heads
EMNLP 2025
Language Model Meets Prototypes: Towards Interpretable Text Classification Models through Prototypical Networks
AAAI 2025
Pathway to Relevance: How Cross-Encoders Implement a Semantic Variant of BM25
EMNLP 2025
Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding
EMNLP 2025
Extracting Linguistic Information from Large Language Models: Syntactic Relations and Derivational Knowledge
EMNLP 2025
Understanding and Controlling Repetition Neurons and Induction Heads in In-Context Learning
AACL 2025
Interpreting the Effects of Quantization on LLMs
AACL 2025
uir-cis at SemEval-2025 Task 3: Detection of Hallucinations in Generated Text
SEMEVAL 2025
HalluCounter: Reference-free LLM Hallucination Detection in the Wild!
IJCNLP 2025
SmurfCat at SHROOM-CAP: Factual but Awkward? Fluent but Wrong? Tackling Both in LLM Scientific QA
IJCNLP 2025
Explaining in Diffusion: Explaining a Classifier with Diffusion Semantics
CVPR 2025
Evaluate with the Inverse: Efficient Approximation of Latent Explanation Quality Distribution
AAAI 2025
Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI
CVPR 2025
Do Transformer Interpretability Methods Transfer to RNNs?
AAAI 2025
Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow
AAAI 2025
Attention IoU: Examining Biases in CelebA using Attention Maps
CVPR 2025
Explainable Ethical Assessment on Human Behaviors by Generating Conflicting Social Norms
AACL 2025
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
CVPR 2025