knowledge distillation
3680 papers
Also known as
CD
DMD
LORA
KL
DNA
SELF-DISTILLATION
TKD
NBOD
AD
KD
AOTD
KI
GID
FD
MKD
SEQKD
Papers
Choosy Babies Need One Coach: Inducing Mode-Seeking Behavior in BabyLlama with Reverse KL Divergence
CONLL 2024
GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model
INTERSPEECH 2024
Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance
AAAI 2024
Deep Classifier Mimicry without Data Access
AISTATS 2024
BabyLlama-2: Ensemble-Distilled Models Consistently Outperform Teachers With Limited Data
CONLL 2024
Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models
COLING 2024
Probe Then Retrieve and Reason: Distilling Probing and Reasoning Capabilities into Smaller Language Models
COLING 2024
MoDE-CoTD: Chain-of-Thought Distillation for Complex Reasoning Tasks with Mixture of Decoupled LoRA-Experts
COLING 2024
Knowledge Distillation for Tiny Speech Enhancement with Latent Feature Augmentation
INTERSPEECH 2024
RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention
INTERSPEECH 2024