Papers
LSNet: See Large, Focus Small
CVPR 2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
IJCNLP 2025
Deconstructing Attention: Investigating Design Principles for Effective Language Modeling
IJCNLP 2025
PowerMLP: An Efficient Version of KAN
AAAI 2025