Interpretability

7318 directly classified papers

Papers

Beyond Checkmate: Exploring the Creative Choke Points for AI Generated Texts EMNLP 2025

Should I Share this Translation? Evaluating Quality Feedback for User Reliance on Machine Translation EMNLP 2025

ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement EMNLP 2025

Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression EMNLP 2025

MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance EMNLP 2025

Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders EMNLP 2025

How do autoregressive transformers solve full addition? EMNLP 2025

Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation EMNLP 2025

Do Large Language Models Truly Grasp Addition? A Rule-Focused Diagnostic Using Two-Integer Arithmetic EMNLP 2025

The Impact of Negated Text on Hallucination with Large Language Models EMNLP 2025

MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media EMNLP 2025

Mind the Inclusivity Gap: Multilingual Gender-Neutral Translation Evaluation with mGeNTE EMNLP 2025

Transparent and Coherent Procedural Mistake Detection EMNLP 2025

Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models EMNLP 2025

SHARP: Steering Hallucination in LVLMs via Representation Engineering EMNLP 2025

Multi-Domain Explainability of Preferences EMNLP 2025

WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification EMNLP 2025

Calibration Across Layers: Understanding Calibration Evolution in LLMs EMNLP 2025

The discordance between embedded ethics and cultural inference in large language models EMNLP 2025

SSA: Semantic Contamination of LLM-Driven Fake News Detection EMNLP 2025

Evaluating Taxonomy Free Character Role Labeling (TF-CRL) in News Stories using Large Language Models EMNLP 2025

KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language Models EMNLP 2025

Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis EMNLP 2025

Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks EMNLP 2025

Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning NAACL 2025