Interpretability

7318 directly classified papers

Papers

Attributive Reasoning for Hallucination Diagnosis of Large Language Models AAAI 2025

SELF-[IN]CORRECT: LLMs Struggle with Discriminating Self-Generated Responses AAAI 2025

Tuning-Free Accountable Intervention for LLM Deployment – a Metacognitive Approach AAAI 2025

Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models? AAAI 2025

Cooperative or Competitive? Understanding the Interaction between Attention Heads From A Game Theory Perspective ACL 2025

MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis ACL 2025

Benchmarking and Understanding Compositional Relational Reasoning of LLMs AAAI 2025

Activation Steering Decoding: Mitigating Hallucination in Large Vision-Language Models through Bidirectional Hidden State Intervention ACL 2025

Beyond Surface Simplicity: Revealing Hidden Reasoning Attributes for Precise Commonsense Diagnosis ACL 2025

Calibrating Large Language Models with Sample Consistency AAAI 2025

Beyond Accuracy: On the Effects of Fine-Tuning Towards Vision-Language Model’s Prediction Rationality AAAI 2025

Knowledge-Augmented Multimodal Clinical Rationale Generation for Disease Diagnosis with Small Language Models ACL 2025

The Knowledge Microscope: Features as Better Analytical Lenses than Neurons ACL 2025

FLUE: Streamlined Uncertainty Estimation for Large Language Models AAAI 2025

Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models ACL 2025

CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction ACL 2025

Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments AAAI 2025

Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference AAAI 2025

Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection AAAI 2025

Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes ACL 2025

Comparing LLM-generated and human-authored news text using formal syntactic theory ACL 2025

Quality-Informed Segment-Level Error Correction Using Natural Language Explanations from xTower and Large Language Models EMNLP 2025

Circuit Stability Characterizes Language Model Generalization ACL 2025

Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practice ACL 2025

Targeted Source Text Editing for Machine Translation: Exploiting Quality Estimators and Large Language Models EMNLP 2025