Interpretability

7318 directly classified papers

Papers

CliCARE: Grounding Large Language Models in Clinical Guidelines for Decision Support over Longitudinal Cancer Electronic Health Records (AAAI 2026)

Semantic Volume: Quantifying and Detecting Both External and Internal Uncertainty in LLMs (AAAI 2026)

Mitigating Hallucinations in Large Language Models via Causal Reasoning (AAAI 2026)

Focusing on Language: Revealing and Exploiting Language Attention Heads in Multilingual Large Language Models (AAAI 2026)

OX-MABSR: A Benchmark for Open-domain Explainable Multimodal Aspect-Based Sentiment Reasoning (AAAI 2026)

MirrorShield: Towards Dynamic Adaptive Defense Against Jailbreaks via Entropy-Guided Mirror Crafting (AAAI 2026)

Positional Cognitive Specialization: Where Do LLMs Learn to Comprehend and Speak Your Language? (AAAI 2026)

Beyond Plain Demos: A Demo-Centric Anchoring Paradigm for In-Context Learning in Alzheimer’s Disease Detection (AAAI 2026)

Bridging the Language Gap: Uncovering and Aligning Shared Circuits for Multi-Hop Reasoning in Multilingual LLMs (AAAI 2026)

Efficient Transcoder Adaptation for Fine-Tuned Models: Revealing Medical Reasoning Mechanisms in Large Language Models (AAAI 2026)

CharBench: Evaluating the Role of Tokenization in Character-Level Tasks (AAAI 2026)

PRAGWORLD: A Benchmark Evaluating LLMs’ Local World Model Under Minimal Linguistic Alterations and Conversational Dynamics (AAAI 2026)

Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models (AAAI 2026)

Finding the Translation Switch: Discovering and Exploiting the Task-Initiation Features in LLMs (AAAI 2026)

GlitchMiner: Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization (AAAI 2026)

Test-time Prompt Intervention (AAAI 2026)

Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory (AAAI 2026)

Global-Local Confidence Fusion for Hallucination Detection in Mathematical Reasoning Task (AAAI 2026)

Interpretable Reward Model via Sparse Autoencoder (AAAI 2026)

The Other Mind: How Language Models Exhibit Human Temporal Cognition (AAAI 2026)

Efficiently Computing Compact Formal Explanations (AAAI 2026)

CluCERT: Certifying LLM Robustness via Clustering-Guided Denoising Smoothing (AAAI 2026)

Differentiated Directional Intervention: A Framework for Evading LLM Safety Alignment (AAAI 2026)

Driving with Regulation: Trustworthy and Interpretable Decision-Making for Autonomous Driving with Retrieval-Augmented Reasoning (AAAI 2026)

RECoRD: A Multi-Agent LLM Framework for Reverse Engineering Codebase to Relational Diagram (AAAI 2026)