Interpretability

7318 directly classified papers

Papers

Explainability and Interpretability of Multilingual Large Language Models: A Survey EMNLP 2025

DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models EMNLP 2025

EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models EMNLP 2025

Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models EMNLP 2025

Understanding and Mitigating Overrefusal in LLMs from an Unveiling Perspective of Safety Decision Boundary EMNLP 2025

MMAG: Multimodal Learning for Mucus Anomaly Grading in Nasal Endoscopy via Semantic Attribute Prompting EMNLP 2025

“I’ve Decided to Leak”: Probing Internals Behind Prompt Leakage Intents EMNLP 2025

Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety EMNLP 2025

Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon EMNLP 2025

Unsupervised Hallucination Detection by Inspecting Reasoning Processes EMNLP 2025

Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents EMNLP 2025

Understanding Subword Compositionality of Large Language Models EMNLP 2025

Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs EMNLP 2025

Linguistic and Embedding-Based Profiling of Texts Generated by Humans and Large Language Models EMNLP 2025

WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? EMNLP 2025

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth EMNLP 2025

FLARE: Faithful Logic-Aided Reasoning and Exploration EMNLP 2025

RAcQUEt: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs EMNLP 2025

What’s in a prompt? Language models encode literary style in prompt embeddings EMNLP 2025

Identifying and Answering Questions with False Assumptions: An Interpretable Approach EMNLP 2025

LLMs Don’t Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations EMNLP 2025

From Language to Cognition: How LLMs Outgrow the Human Language Network EMNLP 2025

Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates EMNLP 2025

AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender EMNLP 2025

LADDER: Language-Driven Slice Discovery and Error Rectification in Vision Classifiers ACL 2025