Interpretability

7318 directly classified papers

Papers

clujteam at SemEval-2025 Task 10: Finetuning SmolLM2 with Taxonomy-based Prompting for Explaining the Dominant Narrative in Propaganda Text ACL 2025

Learning Accurate and Interpretable Decision Trees (Extended Abstract) IJCAI 2025

A Semantic Framework for Neurosymbolic Computation (Abstract Reprint) IJCAI 2025

EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations ACL 2025

Explain It as Simple as Possible, but No Simpler – Explanation via Model Simplification for Addressing Inferential Gap (Abstract Reprint) IJCAI 2025

Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs EMNLP 2025

RomanLens: The Role Of Latent Romanization In Multilinguality In LLMs ACL 2025

Beyond Context to Cognitive Appraisal: Emotion Reasoning as a Theory of Mind Benchmark for Large Language Models ACL 2025

LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation ACL 2025

Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification CoNLL 2025

Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs ACL 2025

HumT DumT: Measuring and controlling human-like language in LLMs ACL 2025

LinguaLens: Towards Interpreting Linguistic Mechanisms of Large Language Models via Sparse Auto-Encoder EMNLP 2025

MemeQA: Holistic Evaluation for Meme Understanding ACL 2025

Extended Abstract: Probing-Guided Parameter-Efficient Fine-Tuning for Balancing Linguistic Adaptation and Safety in LLM-based Social Influence Systems ACL 2025

Leveraging What’s Overfixed: Post-Correction via LLM Grammatical Error Overcorrection EMNLP 2025

Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles EMNLP 2025

RUC Team at SemEval-2025 Task 5: Fast Automated Subject Indexing: A Method Based on Similar Records Matching and Related Subject Ranking ACL 2025

Investigating Psychometric Predictive Power of Syntactic Attention ACL 2025

What Language Do Non-English-Centric Large Language Models Think in? ACL 2025

Quasi-symbolic Semantic Geometry over Transformer-based Variational AutoEncoder ACL 2025

FADE: Why Bad Descriptions Happen to Good Features ACL 2025

MMRefine: Unveiling the Obstacles to Robust Refinement in Multimodal Large Language Models ACL 2025

Improving Large Language Model Safety with Contrastive Representation Learning EMNLP 2025

Unsupervised Concept Vector Extraction for Bias Control in LLMs EMNLP 2025