Interpretability

7,318 papers

Papers

Analyzing (In)Abilities of SAEs via Formal Languages NAACL 2025

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering NAACL 2025

Verify-in-the-Graph: Entity Disambiguation Enhancement for Complex Claim Verification with Interactive Graph Representation NAACL 2025

Aggregation Artifacts in Subjective Tasks Collapse Large Language Models’ Posteriors NAACL 2025

Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages NAACL 2025

Towards Robust Knowledge Representations in Multilingual LLMs for Equivalence and Inheritance based Consistent Reasoning NAACL 2025

Sneaking Syntax into Transformer Language Models with Tree Regularization NAACL 2025

Analyzing the Inner Workings of Transformers in Compositional Generalization NAACL 2025

CAMIEval: Enhancing NLG Evaluation through Multidimensional Comparative Instruction-Following Analysis NAACL 2025

CAVE: Controllable Authorship Verification Explanations NAACL 2025

Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation NAACL 2025

Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs NAACL 2025

MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps NAACL 2025

LLM-guided Plan and Retrieval: A Strategic Alignment for Interpretable User Satisfaction Estimation in Dialogue NAACL 2025

Analyzing Memorization in Large Language Models through the Lens of Model Attribution NAACL 2025

Main Predicate and Their Arguments as Explanation Signals For Intent Classification NAACL 2025

The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units NAACL 2025

A Systematic Examination of Preference Learning through the Lens of Instruction-Following NAACL 2025

One fish, two fish, but not the whole sea: Alignment reduces language models’ conceptual diversity NAACL 2025

The Stochastic Parrot on LLM’s Shoulder: A Summative Assessment of Physical Concept Understanding NAACL 2025

What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation NAACL 2025

Are explicit belief representations necessary? A comparison between Large Language Models and Bayesian probabilistic models NAACL 2025

Characterizing the Role of Similarity in the Property Inferences of Language Models NAACL 2025

MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools NAACL 2025

SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models NAACL 2025