Interpretability

7318 directly classified papers

Papers

If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models EACL 2026

Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models EACL 2026

Detecting (Un)answerability in Large Language Models with Linear Directions EACL 2026

Deconstructing Instruction-Following: A New Benchmark for Granular Evaluation of Large Language Model Instruction Compliance Abilities EACL 2026

Say It Another Way: Auditing LLMs with a User-Grounded Automated Paraphrasing Framework EACL 2026

How Reliable are Confidence Estimators for Large Reasoning Models? A Systematic Benchmark on High-Stakes Domains EACL 2026

SearchLLM: Detecting LLM Paraphrased Text by Measuring the Similarity with Regeneration of the Candidate Source via Search Engine EACL 2026

Mind the Gap: Benchmarking LLM Uncertainty and Calibration with Specialty-Aware Clinical QA and Reasoning-Based Behavioural Features EACL 2026

Can Activation Steering Generalize Across Languages? A Study on Syllogistic Reasoning in Language Models EACL 2026

Safe-Unsafe Concept Separation Emerges from a Single Direction in Language Models Activation Space EACL 2026

When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training EACL 2026

A Unified View on Emotion Representation in Large Language Models EACL 2026

TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models EACL 2026

Spotlight Your Instructions: Instruction-following with Dynamic Attention Steering EACL 2026

FaithLM: Towards Faithful Explanations for Large Language Models EACL 2026

Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking EACL 2026

HateXScore: A Metric Suite for Evaluating Reasoning Quality in Hate Speech Explanations EACL 2026

Attribution-Guided Multi-Object Hallucination and Bias Detection in Vision-Language Models EACL 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs EACL 2026

Knowing the Facts but Choosing the Shortcut: Understanding How Large Language Models Compare Entities EACL 2026

Recursive numeral systems are highly regular and easy to process EACL 2026

MEVER: Multi-Modal and Explainable Claim Verification with Graph-based Evidence Retrieval EACL 2026

DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding EACL 2026

Beyond Math: Stories as a Testbed for Memorization-Constrained Reasoning in LLMs EACL 2026

FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models AAAI 2026