Research Explorer
Interpretability
7,318 directly classified papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
Smaller Large Language Models Can Do Moral Self-Correction
NAACL 2025
Error Detection for Multimodal Classification
NAACL 2025
Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models
NAACL 2025
Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads
NAACL 2025
A Calibrated Reflection Approach for Enhancing Confidence Estimation in LLMs
NAACL 2025
Evaluating Design Choices in Verifiable Generation with Open-source Models
NAACL 2025
Disentangling Linguistic Features with Dimension-Wise Analysis of Vector Embeddings
NAACL 2025
Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression
EMNLP 2025
Evaluating Intermediate Reasoning of Code-Assisted Large Language Models for Mathematics
ACL 2025
Enhancing Automated Interpretability with Output-Centric Feature Descriptions
ACL 2025
Multi-View Collaborative Learning Network for Speech Deepfake Detection
AAAI 2025
Do Large Language Models Truly Grasp Addition? A Rule-Focused Diagnostic Using Two-Integer Arithmetic
EMNLP 2025
SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models
ICCV 2025
Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home
ACL 2025
Better Aligned with Survey Respondents or Training Data? Unveiling Political Leanings of LLMs on U.S. Supreme Court Cases
ACL 2025
When Truthful Representations Flip Under Deceptive Instructions?
EMNLP 2025
Critical Forgetting-Based Multi-Scale Disentanglement for Deepfake Detection
AAAI 2025
Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models
WACV 2025
SQUAB: Evaluating LLM robustness to Ambiguous and Unanswerable Questions in Semantic Parsing
EMNLP 2025
Looking in the Mirror: A Faithful Counterfactual Explanation Method for Interpreting Deep Image Classification Models
ICCV 2025
On the Generalization of Representation Uncertainty in Earth Observation
ICCV 2025
TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models
ICCV 2025
Region-Level Data Attribution for Text-to-Image Generative Models
ICCV 2025
Long-Form Information Alignment Evaluation Beyond Atomic Facts
EMNLP 2025
LATTE: Learning to Think with Vision Specialists
EMNLP 2025