Interpretability

7318 directly classified papers

Papers

MatchXplain: Analyzing Preferences, Explaining Outcomes, and Simplifying Decisions IJCAI 2025

Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference AAAI 2025

uir-cis at SemEval-2025 Task 3: Detection of Hallucinations in Generated Text ACL 2025

TactfulToM: Do LLMs have the Theory of Mind ability to understand White Lies? EMNLP 2025

A Logic of General Attention Using Edge-Conditioned Event Models IJCAI 2025

SFAL: Semantic-Functional Alignment Scores for Distributional Evaluation of Auto-Interpretability in Sparse Autoencoders EMNLP 2025

Enhancing Uncertainty Modeling with Semantic Graph for Hallucination Detection AAAI 2025

Detecting Omissions in LLM-Generated Medical Summaries EMNLP 2025

Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning ICCV 2025

Calibrating Large Language Models with Sample Consistency AAAI 2025

Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection AAAI 2025

RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks EMNLP 2025

AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models ICCV 2025

Beyond Accuracy: On the Effects of Fine-Tuning Towards Vision-Language Model’s Prediction Rationality AAAI 2025

Are You Trying to Convince Me or Are You Trying to Deceive Me? Using Argumentation Types to Identify Deceptive News ACL 2025

Benchmarking and Understanding Compositional Relational Reasoning of LLMs AAAI 2025

Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation ICCV 2025

Controlling Equational Reasoning in Large Language Models with Prompt Interventions AAAI 2025

Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding EMNLP 2025

Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs EMNLP 2025

Judge and Improve: Towards a Better Reasoning of Knowledge Graphs with Large Language Models EMNLP 2025

Tree-of-Quote Prompting Improves Factuality and Attribution in Multi-Hop and Medical Reasoning EMNLP 2025

ECC: An Emotion-Cause Conversation Dataset for Empathy Response EMNLP 2025

ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing Representations EMNLP 2025

The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs EMNLP 2025