Interpretability

7318 directly classified papers

Papers

Reasoning to Attend: Try to Understand How <SEG> Token Works CVPR 2025

Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers ACL 2025

Sequential Conditional Transport on Probabilistic Graphs for Interpretable Counterfactual Fairness AAAI 2025

MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification ACL 2025

Connecting Concept Layers and Rationales to Enhance Language Model Interpretability EMNLP 2025

bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning ACL 2025

Does Your AI Agent Get You? A Personalizable Framework for Approximating Human Models from Argumentation-based Dialogue Traces AAAI 2025

Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models ACL 2025

Active Fourier Auditor for Estimating Distributional Properties of ML Models AAAI 2025

Dynamic Head Selection for Neural Lexicalized Constituency Parsing ACL 2025

Unlocking the Game: Estimating Games in Möbius Representation for Explanation and High-Order Interaction Detection AAAI 2025

Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis ACL 2025

Extracting PAC Decision Trees from Black Box Binary Classifiers: The Gender Bias Study Case on BERT-based Language Models AAAI 2025

Mamba Knockout for Unraveling Factual Information Flow ACL 2025

EXCGEC: A Benchmark for Edit-Wise Explainable Chinese Grammatical Error Correction AAAI 2025

Language Models Grow Less Humanlike beyond Phase Transition ACL 2025

Uncertainty-aware Knowledge Tracing AAAI 2025

PRISM: A Framework for Producing Interpretable Political Bias Embeddings with Political-Aware Cross-Encoder ACL 2025

Do Large Language Models Know When Not to Answer in Medical QA? EMNLP 2025

IRIS: Interpretable Retrieval-Augmented Classification for Long Interspersed Document Sequences ACL 2025

An XAI Social Media Platform for Teaching K-12 Students AI-Driven Profiling, Clustering, and Engagement-Based Recommending AAAI 2025

Behavioural vs. Representational Systematicity in End-to-End Models: An Opinionated Survey ACL 2025

Certain but not Probable? Differentiating Certainty from Probability in LLM Token Outputs for Probabilistic Scenarios EMNLP 2025

Don’t Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models ACL 2025

Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models EMNLP 2025