Interpretability

7318 directly classified papers

Papers

Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles EMNLP 2025

Explanation Regularisation through the Lens of Attributions COLING 2025

Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty? COLING 2025

Who’s the Author? How Explanations Impact User Reliance in AI-Assisted Authorship Attribution EMNLP 2025

Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-Making EMNLP 2025

How well do LLMs reason over tabular data, really? ACL 2025

Are Language Models Consequentialist or Deontological Moral Reasoners? EMNLP 2025

Learning Visual-Semantic Hierarchical Attribute Space for Interpretable Open-Set Recognition WACV 2025

ReX: A Framework for Incorporating Temporal Information in Model-Agnostic Local Explanation Techniques AAAI 2025

Sneaking Syntax into Transformer Language Models with Tree Regularization NAACL 2025

CAMIEval: Enhancing NLG Evaluation through Multidimensional Comparative Instruction-Following Analysis NAACL 2025

Where Confabulation Lives: Latent Feature Discovery in LLMs EMNLP 2025

Analyzing Memorization in Large Language Models through the Lens of Model Attribution NAACL 2025

One fish, two fish, but not the whole sea: Alignment reduces language models’ conceptual diversity NAACL 2025

The Stochastic Parrot on LLM’s Shoulder: A Summative Assessment of Physical Concept Understanding NAACL 2025

REFIND at SemEval-2025 Task 3: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models SEMEVAL 2025

Prototypical Human-AI Collaboration Behaviors from LLM-Assisted Writing in the Wild EMNLP 2025

UZH at SemEval-2025 Task 3: Token-Level Self-Consistency for Hallucination Detection SEMEVAL 2025

SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models NAACL 2025

NCL-UoR at SemEval-2025 Task 3: Detecting Multilingual Hallucination and Related Observable Overgeneration Text Spans with Modified RefChecker and Modified SeflCheckGPT SEMEVAL 2025

PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims EMNLP 2025

ATLANTIS at SemEval-2025 Task 3: Detecting Hallucinated Text Spans in Question Answering SEMEVAL 2025

FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs NAACL 2025

FENJI at SemEval-2025 Task 3: Retrieval-Augmented Generation and Hallucination Span Detection SEMEVAL 2025

Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models NAACL 2025