Interpretability

7318 directly classified papers

Papers

Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts EMNLP 2025

A Conformal Risk Control Framework for Granular Word Assessment and Uncertainty Calibration of CLIPScore Quality Estimates ACL 2025

LLMs Don’t Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations EMNLP 2025

Exploring Supervised Approaches to the Detection of Anthropomorphic Language in the Reporting of NLP Venues ACL 2025

From Language to Cognition: How LLMs Outgrow the Human Language Network EMNLP 2025

Unsupervised Automatic Short Answer Grading and Essay Scoring: A Weakly Supervised Explainable Approach ACL 2025

Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates EMNLP 2025

SmurfCat at SemEval-2025 Task 3: Bridging External Knowledge and Model Uncertainty for Enhanced Hallucination Detection ACL 2025

GraphProt: Certified Black-Box Shielding Against Backdoored Graph Models IJCAI 2025

HalluSearch at SemEval-2025 Task 3: A Search-Enhanced RAG Pipeline for Hallucination Detection ACL 2025

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning EMNLP 2025

CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization ACL 2025

Self-calibration Enhanced Whole Slide Pathology Image Analysis IJCAI 2025

Com²: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models ACL 2025

Unveiling the Influence of Amplifying Language-Specific Neurons IJCNLP 2025

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? ACL 2025

RPMIL: Rethinking Uncertainty-Aware Probabilistic Multiple Instance Learning for Whole Slide Pathology Diagnosis IJCAI 2025

Rubrik’s Cube: Testing a New Rubric for Evaluating Explanations on the CUBE dataset ACL 2025

Enhancing Trustworthiness of Graph Neural Networks with Rank-Based Conformal Training AAAI 2025

On the Consistency of Commonsense in Large Language Models ACL 2025

Circuit-Aware d-DNNF Compilation IJCAI 2025

STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and Beyond ACL 2025

Read Between the Lines: A Benchmark for Uncovering Political Bias in Bangla News Articles IJCNLP 2025

Understanding the Dark Side of LLMs’ Intrinsic Self-Correction ACL 2025

Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries EMNLP 2025