Interpretability

7318 directly classified papers

Papers

DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic EMNLP 2025

Should I Share this Translation? Evaluating Quality Feedback for User Reliance on Machine Translation EMNLP 2025

Ask-Before-Detection: Identifying and Mitigating Conformity Bias in LLM-Powered Error Detector for Math Word Problem Solutions ACL 2025

Polysemantic Dropout: Conformal OOD Detection for Specialized LLMs EMNLP 2025

Disentangling Memory and Reasoning Ability in Large Language Models ACL 2025

A Practical Method for Generating String Counterfactuals NAACL 2025

Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models NAACL 2025

Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection NAACL 2025

Attention on Multiword Expressions: A Multilingual Study of BERT-based Models with Regard to Idiomaticity and Microsyntax NAACL 2025

From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs NAACL 2025

Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning NAACL 2025

On the Feasibility of In-Context Probing for Data Attribution NAACL 2025

Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs NAACL 2025

Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference NAACL 2025

MedThink: A Rationale-Guided Framework for Explaining Medical Visual Question Answering NAACL 2025

Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning NAACL 2025

Language Model Meets Prototypes: Towards Interpretable Text Classification Models through Prototypical Networks AAAI 2025

Explainable ICD Coding via Entity Linking NAACL 2025

Who We Are, Where We Are: Mental Health at the Intersection of Person, Situation, and Large Language Models NAACL 2025

Bridging the Faithfulness Gap in Prototypical Models NAACL 2025

Error Reflection Prompting: Can Large Language Models Successfully Understand Errors? NAACL 2025

Probing Internal Representations of Multi-Word Verbs in Large Language Models NAACL 2025

The AI Co-Ethnographer: How Far Can Automation Take Qualitative Research? NAACL 2025

VLG-BERT: Towards Better Interpretability in LLMs through Visual and Linguistic Grounding NAACL 2025

Learning About Algorithm Auditing in Five Steps: Scaffolding How High School Youth Can Systematically and Critically Evaluate Machine Learning Applications AAAI 2025