Interpretability

7318 directly classified papers

Papers

bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning ACL 2025

Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models ACL 2025

MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification ACL 2025

Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers ACL 2025

Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning NAACL 2025

Emergent Wisdom at BEA 2025 Shared Task: From Lexical Understanding to Reflective Reasoning for Pedagogical Ability Assessment ACL 2025

Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language Models EMNLP 2025

CafGa: Customizing Feature Attributions to Explain Language Models EMNLP 2025

Exploiting Contextual Knowledge in LLMs through 𝒱-usable Information based Layer Enhancement ACL 2025

𝛿-Stance: A Large-Scale Real World Dataset of Stances in Legal Argumentation ACL 2025

Emergence of symbolic abstraction heads for in-context learning in large language models COLING 2025

SocialEval: Evaluating Social Intelligence of Large Language Models ACL 2025

ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries ACL 2025

Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference NAACL 2025

LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers NAACL 2025

DeepReview: Improving LLM-based Paper Review with Human-like Deep Thinking Process ACL 2025

Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models ACL 2025

TimelyMed: AI-Driven Clinical Course Attribution and Temporal Mapping for Psychiatric Medical Records IJCAI 2025

Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling ACL 2025

Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes ACL 2025

Efficient Rectification of Neuro-Symbolic Reasoning Inconsistencies by Abductive Reflection (Extended Abstract) IJCAI 2025

Who We Are, Where We Are: Mental Health at the Intersection of Person, Situation, and Large Language Models NAACL 2025

FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks EMNLP 2025

FOCUS: Evaluating Pre-trained Vision-Language Models on Underspecification Reasoning ACL 2025

Exposing the Achilles’ Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning ACL 2025