Interpretability

7318 directly classified papers

Papers

Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models NAACL 2025

ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models EMNLP 2025

SafeQuant: LLM Safety Analysis via Quantized Gradient Inspection NAACL 2025

Cross-Document Cross-Lingual NLI via RST-Enhanced Graph Fusion and Interpretability Prediction EMNLP 2025

Who’s the Author? How Explanations Impact User Reliance in AI-Assisted Authorship Attribution EMNLP 2025

Neural Reasoning Networks: Efficient Interpretable Neural Networks with Automatic Textual Explanations AAAI 2025

Things Machine Learning Models Know That They Don’t Know AAAI 2025

Interpretable Sparse Features for Probing Self-Supervised Speech Models IJCNLP 2025

Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge IJCNLP 2025

Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency WACV 2025

MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps NAACL 2025

Decoding Emergent Big Five Traits in Large Language Models: Temperature-Dependent Expression and Architectural Clustering IJCNLP 2025

WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification EMNLP 2025

IndicSentEval: How Effectively do Multilingual Transformer Models encode Linguistic Properties for Indic Languages? IJCNLP 2025

CLEV: LLM-Based Evaluation Through Lightweight Efficient Voting for Free-Form Question-Answering IJCNLP 2025

Calibration Across Layers: Understanding Calibration Evolution in LLMs EMNLP 2025

To Generate or Discriminate? Methodological Considerations for Measuring Cultural Alignment in LLMs IJCNLP 2025

Read Between the Lines: A Benchmark for Uncovering Political Bias in Bangla News Articles IJCNLP 2025

The discordance between embedded ethics and cultural inference in large language models EMNLP 2025

SHARP: Steering Hallucination in LVLMs via Representation Engineering EMNLP 2025

Surprisal Dynamics for the Detection of Multi-Word Expressions in English IJCNLP 2025

Unveiling the Influence of Amplifying Language-Specific Neurons IJCNLP 2025

Multi-Domain Explainability of Preferences EMNLP 2025

Learning from Hallucinations: Mitigating Hallucinations in LLMs via Internal Representation Intervention IJCNLP 2025

Modular Arithmetic: Language Models Solve Math Digit by Digit IJCNLP 2025