Robertha: Eigenspectrum Regularized Attention for Robust Natural Language Understanding
Abstract
We study asymmetric vulnerability to embedding corruption in encoder-based language models, in which uniform perturbations degrade low-magnitude embeddings disproportionately more than high-magnitude ones. Because critical words concentrate in the low-norm region of embedding space, this asymmetry causes catastrophic degradation of grammatical and semantic structure even under moderate corruption. Existing robustness approaches either sacrifice clean performance or fail to generalize to higher corruption levels. To address this problem, we propose Robertha, an attention mechanism built on Modern Hopfield Networks, in which semantic patterns act as stable states (attractors) that pull corrupted embeddings toward correct representations. We introduce iterative refinement for differential recovery: heavily corrupted embeddings require multiple convergence steps, while lightly corrupted embeddings converge quickly. To strengthen this mechanism, we introduce Eigenspectrum Regularization (ESR), which enforces low-rank key structures by controlling eigenvalue entropy, creating strong, well-separated attractors with wide recovery basins. Across 13 GLUE and SuperGLUE tasks, Robertha significantly outperforms existing robustness methods while maintaining competitive clean performance.
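The two mechanisms named above can be illustrated with a minimal NumPy sketch. It is not Robertha's implementation; it makes two assumptions. First, iterative refinement is modeled with the standard Modern Hopfield retrieval update, xi <- K^T softmax(beta * K xi), where the rows of K are stored patterns and the update repeats until the state stops moving. Second, ESR is modeled as the Shannon entropy of the normalized eigenvalues of the keys' second-moment matrix. The abstract does not give the exact form. The function names and the parameters `beta`, `max_steps`, and `tol` are hypothetical.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def hopfield_retrieve(keys, query, beta=8.0, max_steps=10, tol=1e-6):
    """Hypothetical sketch: iterate xi <- K^T softmax(beta * K xi).

    keys:  (n_patterns, d) stored patterns (rows) acting as attractors.
    query: (d,) possibly corrupted embedding.
    Returns the refined state and the number of update steps taken.
    """
    xi = query.astype(float).copy()
    for step in range(1, max_steps + 1):
        weights = softmax(beta * (keys @ xi))  # similarity to each stored pattern
        new_xi = weights @ keys                # convex combination of the patterns
        if np.linalg.norm(new_xi - xi) < tol:  # state has settled on an attractor
            return new_xi, step
        xi = new_xi
    return xi, max_steps


def eigen_entropy(keys, eps=1e-12):
    """Hypothetical ESR penalty: entropy of the normalized key eigenspectrum.

    Low entropy means the keys' energy sits in a few directions (near
    low-rank). High entropy means it is spread across many directions.
    """
    gram = keys.T @ keys / keys.shape[0]                 # (d, d) PSD second-moment matrix
    eig = np.clip(np.linalg.eigvalsh(gram), 0.0, None)   # drop tiny negative round-off
    p = eig / (eig.sum() + eps)                          # eigenvalues as a distribution
    return float(-(p * np.log(p + eps)).sum())


if __name__ == "__main__":
    patterns = 3.0 * np.eye(16)[:4]           # four orthogonal, well-separated patterns
    clean = patterns[2]
    noisy = clean + 0.5 * np.ones(16)         # uniformly corrupted copy of pattern 2
    recovered, steps = hopfield_retrieve(patterns, noisy)
    print("steps:", steps, "recovered:", np.allclose(recovered, clean, atol=1e-6))
    print("entropy (rank 4): ", eigen_entropy(patterns))
    print("entropy (rank 16):", eigen_entropy(3.0 * np.eye(16)))
```

In training, a penalty like `eigen_entropy` would be added to the task loss with some weight, which is a hypothetical hyperparameter. The sketch would also have to be written in an autodiff framework so the penalty can shape the keys. The rank-4 key set scores lower entropy than the full-rank one, which matches the "low-rank key structures" the abstract describes.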