Exploring Attention Attractors in Large Language Models

Ziheng Wang; Zihao Yue; Wenxuan Wang; Qin Jin

ACL 2026

Abstract

This paper explores attention attractors, tokens that draw significantly high attention, in large language models. We analyze them from three perspectives: (1) Functionality: We demonstrate their role in aggregating information from preceding contexts to facilitate future predictions. (2) Distribution: Through layer-wise and token-wise analysis, we reveal that attention attractors are widely distributed across layers but predominantly originate from low-semantic words like "_the". (3) Mechanism: We demonstrate a correlation between the attention weights allocated to tokens and the values of specific dimensions of their activations. We hope these findings provide new insights into the attention mechanisms of large language models and inspire further exploration.
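The notion of a token that "draws significantly high attention" can be illustrated with a small sketch. This is not the paper's procedure: the function names, the toy score matrix, and the 0.3 threshold are all assumptions chosen for illustration. The sketch builds causal attention weights from raw scores, averages the attention each position receives from later positions, and flags positions whose average exceeds the threshold.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(scores):
    # Query i attends only to keys 0..i; later keys get weight 0.
    n = len(scores)
    weights = []
    for i in range(n):
        row = softmax(scores[i][: i + 1])
        weights.append(row + [0.0] * (n - i - 1))
    return weights

def received_attention(weights):
    # Mean attention each key position receives from strictly later queries.
    n = len(weights)
    received = []
    for j in range(n):
        later = [weights[i][j] for i in range(j + 1, n)]
        received.append(sum(later) / len(later) if later else 0.0)
    return received

def find_attractors(weights, threshold=0.3):
    # Hypothetical criterion: flag positions above a fixed threshold.
    received = received_attention(weights)
    return [j for j, r in enumerate(received) if r > threshold]

# Toy scores (made up): every query gives position 1 a much larger logit.
n = 5
scores = [[4.0 if j == 1 else 0.0 for j in range(n)] for _ in range(n)]
weights = causal_attention(scores)
attractors = find_attractors(weights)
print(attractors)
```

In a real model the weights would come from a specific layer and head rather than from hand-written scores, and a criterion relative to the other tokens in the sequence may be more appropriate than a fixed threshold.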

Topics

Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Attention
Deep Learning > Techniques > Attention Mechanism

Keywords

attention mechanism; large language model; attention attractor
