Model Compression

1928 directly classified papers

Papers

DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs NIPS 2024

On the Impact of Calibration Data in Post-training Quantization and Pruning ACL 2024

How Sparse Can We Prune A Deep Network: A Fundamental Limit Perspective NIPS 2024

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer CVPR 2024

Adversarial Distillation Based on Slack Matching and Attribution Region Alignment CVPR 2024

Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space ACL 2024

On the social bias of speech self-supervised models INTERSPEECH 2024

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention NIPS 2024

Exploring compressibility of transformer based text-to-music (TTM) models INTERSPEECH 2024

SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget ACL 2024

Learn and Don't Forget: Adding a New Language to ASR Foundation Models INTERSPEECH 2024

FedMef: Towards Memory-efficient Federated Dynamic Pruning CVPR 2024

Efficient CNNs with Quaternion Transformations and Pruning for Audio Tagging INTERSPEECH 2024

WRP: Weight Recover Prune for Structured Sparsity ACL 2024

Streamlining Speech Enhancement DNNs: an Automated Pruning Method Based on Dependency Graph with Advanced Regularized Loss Strategies INTERSPEECH 2024

FM-Delta: Lossless Compression for Storing Massive Fine-tuned Foundation Models NIPS 2024

MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning CVPR 2024

LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment NIPS 2024

SparseFlow: Accelerating Transformers by Sparsifying Information Flows ACL 2024

Compact 3D Gaussian Representation for Radiance Field CVPR 2024

Data-Free Quantization via Pseudo-label Filtering CVPR 2024

Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression ACL 2024

Efficient Multi-task LLM Quantization and Serving for Multiple LoRA Adapters NIPS 2024

Surgical Feature-Space Decomposition of LLMs: Why, When and How? ACL 2024

BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation ACL 2024