knowledge distillation

3680 papers

Also known as

CD DMD LORA KL DNA SELF-DISTILLATION TKD NBOD AD KD AOTD KI GID FD MKD SEQKD

Co-occurring keywords

model compression (3283), large language model (12755), transfer learning (5442), domain adaptation (4578), representation learning (6174), neural network (6616), language model (4573), continual learning (1164), catastrophic forgetting (939), contrastive learning (3979)

Papers

Choosy Babies Need One Coach: Inducing Mode-Seeking Behavior in BabyLlama with Reverse KL Divergence CONLL 2024

GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model INTERSPEECH 2024

Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance AAAI 2024

Prophecy Distillation for Boosting Abstractive Summarization COLING 2024

Deep Classifier Mimicry without Data Access AISTATS 2024

BabyLlama-2: Ensemble-Distilled Models Consistently Outperform Teachers With Limited Data CONLL 2024

RADCoT: Retrieval-Augmented Distillation to Specialization Models for Generating Chain-of-Thoughts in Query Expansion COLING 2024

All Rivers Run to the Sea: Private Learning with Asymmetric Flows CVPR 2024

Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models COLING 2024

Probe Then Retrieve and Reason: Distilling Probing and Reasoning Capabilities into Smaller Language Models COLING 2024

PIRB: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods COLING 2024

MoDE-CoTD: Chain-of-Thought Distillation for Complex Reasoning Tasks with Mixture of Decoupled LoRA-Experts COLING 2024

UniPTS: A Unified Framework for Proficient Post-Training Sparsity CVPR 2024

Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation COLING 2024

Self-Supervised Quantization-Aware Knowledge Distillation AISTATS 2024

A Dynamic GCN with Cross-Representation Distillation for Event-Based Learning AAAI 2024

CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning ACL 2024

Knowledge Distillation for Tiny Speech Enhancement with Latent Feature Augmentation INTERSPEECH 2024

RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention INTERSPEECH 2024

Neural Machine Translation between Low-Resource Languages with Synthetic Pivoting COLING 2024

Accurate Knowledge Distillation via n-best Reranking NAACL 2024

PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning NAACL 2024

TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale NAACL 2024

CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants NAACL 2024

Mind’s Mirror: Distilling Self-Evaluation Capability and Comprehensive Thinking from Large Language Models NAACL 2024