Interpretability

7,318 papers

Papers

MedThink: A Rationale-Guided Framework for Explaining Medical Visual Question Answering NAACL 2025

Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning NAACL 2025

LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers NAACL 2025

Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation NAACL 2025

Explainability for NLP in Pharmacovigilance: A Study on Adverse Event Report Triage in Swedish NAACL 2025

Explainable ICD Coding via Entity Linking NAACL 2025

Who We Are, Where We Are: Mental Health at the Intersection of Person, Situation, and Large Language Models NAACL 2025

Bridging the Faithfulness Gap in Prototypical Models NAACL 2025

Error Reflection Prompting: Can Large Language Models Successfully Understand Errors? NAACL 2025

Interpretable Models for Detecting Linguistic Variation in Russian Media: Towards Unveiling Propagandistic Strategies during the Russo-Ukrainian War NAACL 2025

Probing Internal Representations of Multi-Word Verbs in Large Language Models NAACL 2025

The AI Co-Ethnographer: How Far Can Automation Take Qualitative Research? NAACL 2025

VLG-BERT: Towards Better Interpretability in LLMs through Visual and Linguistic Grounding NAACL 2025

Ambiguity Detection and Uncertainty Calibration for Question Answering with Large Language Models NAACL 2025

Smaller Large Language Models Can Do Moral Self-Correction NAACL 2025

Error Detection for Multimodal Classification NAACL 2025

Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models NAACL 2025

Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads NAACL 2025

A Calibrated Reflection Approach for Enhancing Confidence Estimation in LLMs NAACL 2025

Evaluating Design Choices in Verifiable Generation with Open-source Models NAACL 2025

Disentangling Linguistic Features with Dimension-Wise Analysis of Vector Embeddings NAACL 2025

Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation NAACL 2025

Holmes: Localizing Irregularities in LLM Training with Mega-scale GPU Clusters NSDI 2025

Learning Interpretable Features from Interventions RSS 2025

REFIND at SemEval-2025 Task 3: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models SEMEVAL 2025