Model Compression

1674 directly classified papers

Papers

DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs NIPS 2024

F³-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis AAAI 2024

UltraSparseBERT: 99% Conditionally Sparse Language Modelling ACL 2024

Structured Unrestricted-Rank Matrices for Parameter Efficient Finetuning NIPS 2024

Streamlining Speech Enhancement DNNs: an Automated Pruning Method Based on Dependency Graph with Advanced Regularized Loss Strategies INTERSPEECH 2024

MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter ACL 2024

Lightweight Transducer Based on Frame-Level Criterion INTERSPEECH 2024

Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression ACL 2024

Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging NIPS 2024

PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA ACL 2024

Expanding Sparse Tuning for Low Memory Usage NIPS 2024

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention NIPS 2024

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model NIPS 2024

QUIK: Towards End-to-end 4-Bit Inference on Generative Large Language Models EMNLP 2024

SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization NIPS 2024

CoMERA: Computing- and Memory-Efficient Training via Rank-Adaptive Tensor Optimization NIPS 2024

Cross-model Control: Improving Multiple Large Language Models in One-time Training NIPS 2024

Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation NIPS 2024

Reasons and Solutions for the Decline in Model Performance after Editing NIPS 2024

2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution NIPS 2024

FM-Delta: Lossless Compression for Storing Massive Fine-tuned Foundation Models NIPS 2024

OneBit: Towards Extremely Low-bit Large Language Models NIPS 2024

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification NIPS 2024

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length NIPS 2024

Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models NIPS 2024