model compression

3283 papers

Also known as

MC

Co-occurring keywords

knowledge distillation (3680), large language model (12755), neural network (6616), efficient computing (779), neural network optimization (1293), transfer learning (5442), convolutional neural network (4216), neural network pruning (265), language model (4573), parameter efficiency (415)

Papers

Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization EMNLP 2024

QUIK: Towards End-to-end 4-Bit Inference on Generative Large Language Models EMNLP 2024

LRQuant: Learnable and Robust Post-Training Quantization for Large Language Models ACL 2024

MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning ACL 2024

CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework NIPS 2024

Cross-model Control: Improving Multiple Large Language Models in One-time Training NIPS 2024

KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis NIPS 2024

Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learner NIPS 2024

Toward Efficient Inference for Mixture of Experts NIPS 2024

Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers NIPS 2024

TinyLUT: Tiny Look-Up Table for Efficient Image Restoration at the Edge NIPS 2024

MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning EMNLP 2024

Compact 3D Gaussian Representation for Radiance Field CVPR 2024

Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation IJCAI 2024

UPS: Unified Projection Sharing for Lightweight Single-Image Super-resolution and Beyond NIPS 2024

Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction ACL 2024

SparseFlow: Accelerating Transformers by Sparsifying Information Flows ACL 2024

Dual-Space Knowledge Distillation for Large Language Models EMNLP 2024

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech ACL 2024

Group and Shuffle: Efficient Structured Orthogonal Parametrization NIPS 2024

Revisiting Knowledge Distillation for Autoregressive Language Models ACL 2024

Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions EMNLP 2024

2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution NIPS 2024

Efficient Large Multi-modal Models via Visual Context Compression NIPS 2024

Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models COLING 2024