Interpretability

7318 directly classified papers

Papers

Deconstructing Instruction-Following: A New Benchmark for Granular Evaluation of Large Language Model Instruction Compliance Abilities EACL 2026

Say It Another Way: Auditing LLMs with a User-Grounded Automated Paraphrasing Framework EACL 2026

How Reliable are Confidence Estimators for Large Reasoning Models? A Systematic Benchmark on High-Stakes Domains EACL 2026

SearchLLM: Detecting LLM Paraphrased Text by Measuring the Similarity with Regeneration of the Candidate Source via Search Engine EACL 2026

Mind the Gap: Benchmarking LLM Uncertainty and Calibration with Specialty-Aware Clinical QA and Reasoning-Based Behavioural Features EACL 2026

Can Activation Steering Generalize Across Languages? A Study on Syllogistic Reasoning in Language Models EACL 2026

Safe-Unsafe Concept Separation Emerges from a Single Direction in Language Models Activation Space EACL 2026

When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training EACL 2026

A Unified View on Emotion Representation in Large Language Models EACL 2026

TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models EACL 2026

Spotlight Your Instructions: Instruction-following with Dynamic Attention Steering EACL 2026

FaithLM: Towards Faithful Explanations for Large Language Models EACL 2026

Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking EACL 2026

HateXScore: A Metric Suite for Evaluating Reasoning Quality in Hate Speech Explanations EACL 2026

Attribution-Guided Multi-Object Hallucination and Bias Detection in Vision-Language Models EACL 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs EACL 2026

Knowing the Facts but Choosing the Shortcut: Understanding How Large Language Models Compare Entities EACL 2026

Recursive numeral systems are highly regular and easy to process EACL 2026

MEVER: Multi-Modal and Explainable Claim Verification with Graph-based Evidence Retrieval EACL 2026

DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding EACL 2026

Beyond Math: Stories as a Testbed for Memorization-Constrained Reasoning in LLMs EACL 2026

Neural Breadcrumbs: Membership Inference Attacks on LLMs Through Hidden State and Attention Pattern Analysis EACL 2026

Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in Inference-Time Interventions EACL 2026

Now You Hear Me: Audio Narrative Attacks Against Large Audio–Language Models EACL 2026

Feature Drift: How Fine-Tuning Repurposes Representations in LLMs EACL 2026