transformer architecture
1555 papers
Also known as
TTE
STTR
TGT
DIT
ENT
BERT
DETR
RoBERTa
TA
Papers
Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure
NeurIPS 2024
Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration
CVPR 2024
How Much Context Does My Attention-Based ASR System Need?
INTERSPEECH 2024
Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
NeurIPS 2024
Separations in the Representational Capabilities of Transformers and Recurrent Architectures
NeurIPS 2024