Papers
BabyLlama-2: Ensemble-Distilled Models Consistently Outperform Teachers With Limited Data
CoNLL 2024
Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models
COLING 2024
Probe Then Retrieve and Reason: Distilling Probing and Reasoning Capabilities into Smaller Language Models
COLING 2024
MoDE-CoTD: Chain-of-Thought Distillation for Complex Reasoning Tasks with Mixture of Decoupled LoRA-Experts
COLING 2024
Knowledge Distillation for Tiny Speech Enhancement with Latent Feature Augmentation
INTERSPEECH 2024
RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention
INTERSPEECH 2024
TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale
NAACL 2024