Papers
Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
COLING 2024
Rewiring the Transformer with Depth-Wise LSTMs
COLING 2024