Interpretability

7318 directly classified papers

Papers

Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in Inference-Time Interventions EACL 2026

Now You Hear Me: Audio Narrative Attacks Against Large Audio–Language Models EACL 2026

How Do LLMs Generate Contrastive Sentiments? A Mechanistic Perspective EACL 2026

Evidential Semantic Entropy for LLM Uncertainty Quantification EACL 2026

Tandem Training for Language Models EACL 2026

Debate, Deliberate, Decide (D3): A Cost-Aware Adversarial Framework for Reliable and Interpretable LLM Evaluation EACL 2026

Out of Distribution, Out of Luck: Process Rewards Misguide Reasoning Models EACL 2026

Funny or Persuasive, but Not Both: Evaluating Fine-Grained Multi-Concept Control in LLMs EACL 2026

CHiRPE: A Step Towards Real-World Clinical NLP with Clinician-Oriented Model Explanations EACL 2026

LLMs Know More About Numbers than They Can Say EACL 2026

Simplifying Outcomes of Language Model Component Analyses with ELIA EACL 2026

Similar, but why? A Toolkit for Explaining Text Similarity EACL 2026

RAGVUE: A Diagnostic View for Explainable and Automated Evaluation of Retrieval-Augmented Generation EACL 2026

Thesis proposal: COGNILENS: Analyzing Cognitive Decline in Language Models for Alzheimer’s Monitoring EACL 2026

From Sentences to Proof Trees: Leveraging Language Models for Structured Reasoning EACL 2026

SAGE: An Agentic Explainer Framework for Interpreting SAE Features in Language Models EACL 2026

Benchmarking and Mitigating the Impact of Noisy User Prompts in Medical VLMs via Cross-Modal Reflection EACL 2026

Cognitive Effects and Biases in Large Language Models EACL 2026

Don’t Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation EACL 2026

Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations EACL 2026

Detection of Adversarial Prompts with Model Predictive Entropy EACL 2026

Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders EACL 2026

Interpretable Graph-Language Modeling for Detecting Youth Illicit Drug Use EACL 2026

Beyond Multiple Choice: Evaluating Steering Vectors for Summarization EACL 2026

How Does Chain of Thought Think? Mechanistic Interpretability of Chain-of-Thought Reasoning with Sparse Autoencoding AAAI 2026