Interpretability

7318 directly classified papers

Papers

Smaller Large Language Models Can Do Moral Self-Correction NAACL 2025

Error Detection for Multimodal Classification NAACL 2025

Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models NAACL 2025

Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads NAACL 2025

A Calibrated Reflection Approach for Enhancing Confidence Estimation in LLMs NAACL 2025

Evaluating Design Choices in Verifiable Generation with Open-source Models NAACL 2025

Disentangling Linguistic Features with Dimension-Wise Analysis of Vector Embeddings NAACL 2025

Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression EMNLP 2025

Evaluating Intermediate Reasoning of Code-Assisted Large Language Models for Mathematics ACL 2025

Enhancing Automated Interpretability with Output-Centric Feature Descriptions ACL 2025

Multi-View Collaborative Learning Network for Speech Deepfake Detection AAAI 2025

Do Large Language Models Truly Grasp Addition? A Rule-Focused Diagnostic Using Two-Integer Arithmetic EMNLP 2025

SHIFT: Smoothing Hallucinations by Information Flow Tuning for Multimodal Large Language Models ICCV 2025

Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home ACL 2025

Better Aligned with Survey Respondents or Training Data? Unveiling Political Leanings of LLMs on U.S. Supreme Court Cases ACL 2025

When Truthful Representations Flip Under Deceptive Instructions? EMNLP 2025

Critical Forgetting-Based Multi-Scale Disentanglement for Deepfake Detection AAAI 2025

Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models WACV 2025

SQUAB: Evaluating LLM robustness to Ambiguous and Unanswerable Questions in Semantic Parsing EMNLP 2025

Looking in the Mirror: A Faithful Counterfactual Explanation Method for Interpreting Deep Image Classification Models ICCV 2025

On the Generalization of Representation Uncertainty in Earth Observation ICCV 2025

TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models ICCV 2025

Region-Level Data Attribution for Text-to-Image Generative Models ICCV 2025

Long-Form Information Alignment Evaluation Beyond Atomic Facts EMNLP 2025

LATTE: Learning to Think with Vision Specialists EMNLP 2025