conftrace_

Interpretability

7,318 papers

Papers

Linguistically Grounded Analysis of Language Models using Shapley Head Values NAACL 2025

Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning NAACL 2025

Intrinsic Model Weaknesses: How Priming Attacks Unveil Vulnerabilities in Large Language Models NAACL 2025

From Argumentation to Deliberation: Perspectivized Stance Vectors for Fine-grained (Dis)agreement Analysis NAACL 2025

Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models NAACL 2025

Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation NAACL 2025

Evaluating Vision-Language Models for Emotion Recognition NAACL 2025

Open Domain Question Answering with Conflicting Contexts NAACL 2025

SAFR: Neuron Redistribution for Interpretability NAACL 2025

Attention Tracker: Detecting Prompt Injection Attacks in LLMs NAACL 2025

Neuro-symbolic Training for Reasoning over Spatial Language NAACL 2025

On Localizing and Deleting Toxic Memories in Large Language Models NAACL 2025

Classic4Children: Adapting Chinese Literary Classics for Children with Large Language Model NAACL 2025

Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step NAACL 2025

Neuroplasticity and Corruption in Model Mechanisms: A Case Study Of Indirect Object Identification NAACL 2025

A Practical Method for Generating String Counterfactuals NAACL 2025

Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models NAACL 2025

Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection NAACL 2025

Attention on Multiword Expressions: A Multilingual Study of BERT-based Models with Regard to Idiomaticity and Microsyntax NAACL 2025

Efficient Nearest Neighbor based Uncertainty Estimation for Natural Language Processing Tasks NAACL 2025

From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs NAACL 2025

Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning NAACL 2025

On the Feasibility of In-Context Probing for Data Attribution NAACL 2025

Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs NAACL 2025

Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference NAACL 2025