Interpretability

7318 directly classified papers

Papers

Neuron-Level Differentiation of Memorization and Generalization in Large Language Models EMNLP 2025

Sparse Neurons Carry Strong Signals of Question Ambiguity in LLMs EMNLP 2025

Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries EMNLP 2025

Enhancing Chain-of-Thought Reasoning via Neuron Activation Differential Analysis EMNLP 2025

When Truthful Representations Flip Under Deceptive Instructions? EMNLP 2025

Prototypical Human-AI Collaboration Behaviors from LLM-Assisted Writing in the Wild EMNLP 2025

ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models EMNLP 2025

PychoAgent: Psychology-driven LLM Agents for Explainable Panic Prediction on Social Media during Sudden Disaster Events EMNLP 2025

GuessingGame: Measuring the Informativeness of Open-Ended Questions in Large Language Models EMNLP 2025

V-SEAM: Visual Semantic Editing and Attention Modulating for Causal Interpretability of Vision-Language Models EMNLP 2025

Temporal Referential Consistency: Do LLMs Favor Sequences Over Absolute Time References? EMNLP 2025

Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLMs EMNLP 2025

BANMIME: Misogyny Detection with Metaphor Explanation on Bangla Memes EMNLP 2025

SQUAB: Evaluating LLM robustness to Ambiguous and Unanswerable Questions in Semantic Parsing EMNLP 2025

UniDebugger: Hierarchical Multi-Agent Framework for Unified Software Debugging EMNLP 2025

Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld’s Episode Theory EMNLP 2025

Towards a Unified Paradigm of Concept Editing in Large Language Models EMNLP 2025

FacLens: Transferable Probe for Foreseeing Non-Factuality in Fact-Seeking Question Answering of Large Language Models EMNLP 2025

Group-SAE: Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups EMNLP 2025

Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework EMNLP 2025

ReSURE: Regularizing Supervision Unreliability for Multi-turn Dialogue Fine-tuning EMNLP 2025

Artificial Impressions: Evaluating Large Language Model Behavior Through the Lens of Trait Impressions EMNLP 2025

Beyond the Score: Uncertainty-Calibrated LLMs for Automated Essay Assessment EMNLP 2025

A Graph-Theoretical Framework for Analyzing the Behavior of Causal Language Models EMNLP 2025

Developing a Reliable, Fast, General-Purpose Hallucination Detection and Mitigation Service NAACL 2025