model compression

3283 papers

Also known as

MC

Co-occurring keywords

knowledge distillation (3680), large language model (12755), neural network (6616), efficient computing (779), neural network optimization (1293), transfer learning (5442), convolutional neural network (4216), neural network pruning (265), language model (4573), parameter efficiency (415)

Papers

KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization NIPS 2024

MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers NIPS 2024

Don't Look Twice: Faster Video Transformers with Run-Length Tokenization NIPS 2024

Compact Language Models via Pruning and Knowledge Distillation NIPS 2024

DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion NIPS 2024

SparseLLM: Towards Global Pruning of Pre-trained Language Models NIPS 2024

LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing NIPS 2024

SlimSAM: 0.1% Data Makes Segment Anything Slim NIPS 2024

Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference NIPS 2024

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild NIPS 2024

Fast and Memory-Efficient Video Diffusion Using Streamlined Inference NIPS 2024

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model NIPS 2024

SnapKV: LLM Knows What You are Looking for Before Generation NIPS 2024

Cluster-Learngene: Inheriting Adaptive Clusters for Vision Transformers NIPS 2024

Binarized Diffusion Model for Image Super-Resolution NIPS 2024

Sparse maximal update parameterization: A holistic approach to sparse training dynamics NIPS 2024

Multistep Distillation of Diffusion Models via Moment Matching NIPS 2024

BiDM: Pushing the Limit of Quantization for Diffusion Models NIPS 2024

SS1: Accelerating Inference with Fast and Expressive Sketch Structured Transform NIPS 2024

Exploring Token Pruning in Vision State Space Models NIPS 2024

HEPrune: Fast Private Training of Deep Neural Networks With Encrypted Data Pruning NIPS 2024

Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization NIPS 2024

DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution NIPS 2024

QTIP: Quantization with Trellises and Incoherence Processing NIPS 2024

Training Binary Neural Networks via Gaussian Variational Inference and Low-Rank Semidefinite Programming NIPS 2024