Interpretability

7318 directly classified papers

Papers

ALiiCE: Evaluating Positional Fine-grained Citation Generation NAACL 2025

PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation WACV 2025

Neurons to Words: A Novel Method for Automated Neural Network Interpretability and Alignment AAAI 2025

Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps EMNLP 2025

ScamNet: Toward Explainable Large Language Model-Based Fraudulent Shopping Website Detection AAAI 2025

Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs EMNLP 2025

RAP: A Metric for Balancing Repetition and Performance in Open-Source Large Language Models NAACL 2025

What Did I Do Wrong? Quantifying LLMs’ Sensitivity and Consistency to Prompt Engineering NAACL 2025

Understanding Subword Compositionality of Large Language Models EMNLP 2025

An Interpretable and Crosslingual Method for Evaluating Second-Language Dialogues NAACL 2025

Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models NAACL 2025

From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks NAACL 2025

Linguistic and Embedding-Based Profiling of Texts Generated by Humans and Large Language Models EMNLP 2025

Token-Aware Editing of Internal Activations for Large Language Model Alignment EMNLP 2025

RATT: A Thought Structure for Coherent and Correct LLM Reasoning AAAI 2025

What’s in a prompt? Language models encode literary style in prompt embeddings EMNLP 2025

Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models NAACL 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? EMNLP 2025

Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions NAACL 2025

Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs? NAACL 2025

Probe-Free Low-Rank Activation Intervention NAACL 2025

HVGuard: Utilizing Multimodal Large Language Models for Hateful Video Detection EMNLP 2025

Large Language Models with Reinforcement Learning from Human Feedback Approach for Enhancing Explainable Sexism Detection COLING 2025

Teeth Reconstruction and Performance Capture Using a Phone Camera ICCV 2025

Content-free Logical Modification of Large Language Model by Disentangling and Modifying Logic Representation AAAI 2025