Interpretability

7318 directly classified papers

Papers

AdvERSEM: Adversarial Robustness Testing and Training of LLM-based Groundedness Evaluators via Semantic Structure Manipulation EMNLP 2025

Ambiguity Detection and Uncertainty Calibration for Question Answering with Large Language Models NAACL 2025

Towards Universal AI-Generated Image Detection by Variational Information Bottleneck Network CVPR 2025

Any-Resolution AI-Generated Image Detection by Spectral Learning CVPR 2025

Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification CVPR 2025

BioX-CPath: Biologically-driven Explainable Diagnostics for Multistain IHC Computational Pathology CVPR 2025

Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition CVPR 2025

Language Guided Concept Bottleneck Models for Interpretable Continual Learning CVPR 2025

Interpretable Generative Models through Post-hoc Concept Bottlenecks CVPR 2025

Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads NAACL 2025

A Calibrated Reflection Approach for Enhancing Confidence Estimation in LLMs NAACL 2025

Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models’ Uncertainty? ACL 2025

Leveraging Human Production-Interpretation Asymmetries to Test LLM Cognitive Plausibility ACL 2025

Pattern Recognition or Medical Knowledge? The Problem with Multiple-Choice Questions in Medicine ACL 2025

Disentangling Linguistic Features with Dimension-Wise Analysis of Vector Embeddings NAACL 2025

ProgCo: Program Helps Self-Correction of Large Language Models ACL 2025

PEIRCE: Unifying Material and Formal Reasoning via LLM-Driven Neuro-Symbolic Refinement ACL 2025

REVISE: A Framework for Revising OCRed text in Practical Information Systems with Data Contamination Strategy ACL 2025

ConSim: Measuring Concept-Based Explanations’ Effectiveness with Automated Simulatability ACL 2025

From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning ICCV 2025

Enhancing Automated Interpretability with Output-Centric Feature Descriptions ACL 2025

FiRC-NLP at SemEval-2025 Task 3: Exploring Prompting Approaches for Detecting Hallucinations in LLMs ACL 2025

Why AI Is WEIRD and Shouldn't Be This Way: Towards AI for Everyone, with Everyone, by Everyone AAAI 2025

Attributive Reasoning for Hallucination Diagnosis of Large Language Models AAAI 2025

LLaMAs Have Feelings Too: Unveiling Sentiment and Emotion Representations in LLaMA Models Through Probing ACL 2025