SSA: Improving Performance With a Better Scoring Function

Omar Naim; Swarnadeep Bhar; Jérôme Bolte; Nicholas Asher

ACL 2026

Abstract

While transformer models exhibit strong in-context learning (ICL) abilities, they often fail to generalize under simple distribution shifts. We analyze these failures and identify Softmax, the scoring function in the attention mechanism, as a contributing factor. We propose Scaled Signed Averaging (SSA), a novel attention scoring function that mitigates these failures. SSA significantly improves performance on our ICL tasks and outperforms transformer models with Softmax on several NLP benchmarks and linguistic probing tasks, in both decoder-only and encoder-only architectures.
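For orientation, the following is a minimal sketch of the Softmax baseline that the abstract refers to: where the scoring function sits in standard scaled dot-product attention. It is not the authors' code, and it does not implement SSA, whose definition is not given on this page. The function names (`softmax`, `attention_weights`, `attend`) are illustrative assumptions.

```python
import math

def softmax(scores):
    """Map raw attention scores to non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Softmax over the scaled dot products of one query with each key."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)

def attend(query, keys, values):
    """Average the value vectors, weighted by the softmax scores."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

In this sketch the scoring function is isolated in `softmax`, so an alternative scoring function such as the one the paper proposes would replace only that step; the dot-product scores and the weighted sum over values stay the same.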

Authors

Omar Naim, Swarnadeep Bhar, Jérôme Bolte, Nicholas Asher

Topics

Deep Learning > Architectures > Transformers
Deep Learning > Learning Types > In-Context Learning
Artificial Intelligence > Core AI > Attention

Keywords

attention mechanism, in-context learning, distribution shift, scoring function, transformer model, scaled signed averaging
