SCOPE: Boosting LLM Efficiency with Scoped Position Encoding

Qingguo Qi; Hongyang Chen; Zhao Li

ACL 2026

Abstract

Positional encodings are fundamental to Transformers, yet explicit methods like RoPE can degrade under length extrapolation and may incur extra arithmetic and memory-access overhead. In this paper, we propose Scoped Position Encoding (ScoPE), a novel framework that reimagines structured sparsity as an intrinsic position encoding mechanism. Instead of relying on explicit arithmetic signals, ScoPE assigns exponentially scaled look-back scopes to attention heads. We theoretically demonstrate that this simple topological constraint transforms multi-head attention into a hierarchical processor, yielding an order awareness horizon that grows exponentially with depth up to the sequence length. Consequently, ScoPE is parameter-free and avoids relying on fragile positional arithmetic. Empirically, it significantly enhances efficiency by masking the majority of attention computations, offering a theoretical 8x reduction in attention FLOPs at long contexts. Extensive evaluations on LLaMA-3-8B architectures reveal that ScoPE achieves superior native length extrapolation and robust retrieval fidelity compared to RoPE, all while substantially reducing training and inference latency. The code is available at https://github.com/oncemoe/ScoPE.
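The idea of per-head look-back scopes can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the scope schedule, so the rule below (head h sees the most recent base**(h+1) tokens, current token included) is an assumption, and the names `scoped_causal_masks`, `kept_fraction`, and `base` are hypothetical.

```python
def scoped_causal_masks(seq_len, num_heads, base=2):
    """Build one boolean attention mask per head.

    masks[h][i][j] is True when query position i may attend to key position j.
    ASSUMPTION (not from the paper): head h has a look-back scope of
    base ** (h + 1) tokens, counting the current token.
    """
    masks = []
    for h in range(num_heads):
        scope = base ** (h + 1)
        # Causal (j <= i) and within the head's scope (i - j < scope).
        mask = [[0 <= i - j < scope for j in range(seq_len)]
                for i in range(seq_len)]
        masks.append(mask)
    return masks


def kept_fraction(masks):
    """Fraction of causal query-key pairs left unmasked, averaged over heads."""
    seq_len = len(masks[0])
    causal_pairs = seq_len * (seq_len + 1) // 2
    kept = sum(row.count(True) for mask in masks for row in mask)
    return kept / (len(masks) * causal_pairs)


masks = scoped_causal_masks(seq_len=8, num_heads=3)
```

Under this assumed schedule, heads with small scopes attend only to near neighbours while heads with large scopes approach full causal attention, and `kept_fraction(masks)` gives a rough sense of how much attention computation remains after masking.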

Topics

Deep Learning > Architectures > Transformers
Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Efficient Computing

Keywords

structured sparsity; length extrapolation; position encoding; multi-head attention; attention computation
