Neural Network Optimization
3648 directly classified papers
Papers
Transformers without Normalization
CVPR 2025
Taming LLMs with Gradient Grouping
ACL 2025
LESA: Learnable LLM Layer Scaling-Up
ACL 2025