Interpretability

7318 directly classified papers

Papers

A Causal Lens for Evaluating Faithfulness Metrics EMNLP 2025

Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning EMNLP 2025

Where Confabulation Lives: Latent Feature Discovery in LLMs EMNLP 2025

Model Consistency as a Cheap yet Predictive Proxy for LLM Elo Scores EMNLP 2025

Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs EMNLP 2025

Improving Rule-based Reasoning in LLMs using Neurosymbolic Representations EMNLP 2025

Are Language Models Consequentialist or Deontological Moral Reasoners? EMNLP 2025

PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims EMNLP 2025

All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens EMNLP 2025

Is Cognition Consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding EMNLP 2025

AutoCT: Automating Interpretable Clinical Trial Prediction with LLM Agents EMNLP 2025

Beyond the Leaderboard: Understanding Performance Disparities in Large Language Models via Model Diffing EMNLP 2025

Cross-Document Cross-Lingual NLI via RST-Enhanced Graph Fusion and Interpretability Prediction EMNLP 2025

ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom EMNLP 2025

Extractive Fact Decomposition for Interpretable Natural Language Inference in one Forward Pass EMNLP 2025

All Roads Lead to Rome: Graph-Based Confidence Estimation for Large Language Model Reasoning EMNLP 2025

How Persuasive Is Your Context? EMNLP 2025

Reasoning under Uncertainty: Efficient LLM Inference via Unsupervised Confidence Dilution and Convergent Adaptive Sampling EMNLP 2025

REVIVING YOUR MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing EMNLP 2025

Do LLMs Adhere to Label Definitions? Examining Their Receptivity to External Label Definitions EMNLP 2025

Discursive Circuits: How Do Language Models Understand Discourse Relations? EMNLP 2025

Language Models Identify Ambiguities and Exploit Loopholes EMNLP 2025

AraEval: An Arabic Multi-Task Evaluation Suite for Large Language Models EMNLP 2025

Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation EMNLP 2025

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought CVPR 2025