Interpretability

7318 directly classified papers

Papers

Understanding Subword Compositionality of Large Language Models EMNLP 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? EMNLP 2025

Cross-Refine: Improving Natural Language Explanation Generation by Learning in Tandem COLING 2025

ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing Representations EMNLP 2025

Rethinking Backdoor Detection Evaluation for Language Models EMNLP 2025

Exploring Concept Depth: How Large Language Models Acquire Knowledge and Concept at Different Layers? COLING 2025

Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents EMNLP 2025

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth EMNLP 2025

Language Models Encode the Value of Numbers Linearly COLING 2025

CAST: Cross-modal Alignment Similarity Test for Vision Language Models COLING 2025

CLEV: LLM-Based Evaluation Through Lightweight Efficient Voting for Free-Form Question-Answering IJCNLP 2025

Unveiling the Influence of Amplifying Language-Specific Neurons IJCNLP 2025

Isolating Culture Neurons in Multilingual Large Language Models IJCNLP 2025

Learning from Hallucinations: Mitigating Hallucinations in LLMs via Internal Representation Intervention IJCNLP 2025

Moral Self-correction is Not An Innate Capability in Language Models IJCNLP 2025

Surprisal Dynamics for the Detection of Multi-Word Expressions in English IJCNLP 2025

Structured Outputs in Prompt Engineering: Enhancing LLM Adaptability on Counterintuitive Instructions IJCNLP 2025

Improving Explainable Fact-Checking with Claim-Evidence Correlations COLING 2025

Multilingual Political Views of Large Language Models: Identification and Steering IJCNLP 2025

Tree-of-Quote Prompting Improves Factuality and Attribution in Multi-Hop and Medical Reasoning EMNLP 2025

AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender EMNLP 2025

Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach WACV 2025

HalluCounter: Reference-free LLM Hallucination Detection in the Wild! IJCNLP 2025

Modular Arithmetic: Language Models Solve Math Digit by Digit IJCNLP 2025

To Generate or Discriminate? Methodological Considerations for Measuring Cultural Alignment in LLMs IJCNLP 2025