transformer architecture
1555 papers
Also known as
TTE
STTR
TGT
DIT
ENT
BERT
DETR
ROBERTA
TA
Papers
FFT-Based Dynamic Token Mixer for Vision
AAAI 2024
Grokking of Implicit Reasoning in Transformers: A Mechanistic Journey to the Edge of Generalization
NIPS 2024
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
NIPS 2024
Algorithmic progress in language models
NIPS 2024