conftrace_

model compression

3302 papers

Also known as

MC

Co-occurring keywords

knowledge distillation (3725), large language model (13587), neural network (6616), efficient computing (781), neural network optimization (1293), transfer learning (5449), convolutional neural network (4226), neural network pruning (265), language model (4599), parameter efficiency (417)

Papers

AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models EMNLP 2024

Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale EMNLP 2024

Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models EMNLP 2024

2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution NIPS 2024

KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches EMNLP 2024

PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning EMNLP 2024

Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models COLING 2024

Probe Then Retrieve and Reason: Distilling Probing and Reasoning Capabilities into Smaller Language Models COLING 2024

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification NIPS 2024

Multilingual Brain Surgeon: Large Language Models Can Be Compressed Leaving No Language behind COLING 2024

LoNAS: Elastic Low-Rank Adapters for Efficient Large Language Models COLING 2024

Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation COLING 2024

Efficient Audio Captioning with Encoder-Level Knowledge Distillation INTERSPEECH 2024

DeltaDEQ: Exploiting Heterogeneous Convergence for Accelerating Deep Equilibrium Iterations NIPS 2024

In-Context Former: Lightning-fast Compressing Context for Large Language Model EMNLP 2024

FastAST: Accelerating Audio Spectrogram Transformer via Token Merging and Cross-Model Knowledge Distillation INTERSPEECH 2024

LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment NIPS 2024

RRADistill: Distilling LLMs’ Passage Ranking Ability for Long-Tail Queries Document Re-Ranking on a Search Engine EMNLP 2024

Visual Editing with LLM-based Tool Chaining: An Efficient Distillation Approach for Real-Time Applications EMNLP 2024

Divide-or-Conquer? Which Part Should You Distill Your LLM? EMNLP 2024

EAVE: Efficient Product Attribute Value Extraction via Lightweight Sparse-layer Interaction EMNLP 2024

Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch CVPR 2024

Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference NIPS 2024

Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging EMNLP 2024

Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning EMNLP 2024