Research Explorer
Model Compression
1674 directly classified papers
Papers per year
2012: 1
2013: 2
2014: 2
2015: 7
2016: 9
2017: 27
2018: 51
2019: 79
2020: 189
2021: 165
2022: 206
2023: 207
2024: 325
2025: 399
2026: 5
Papers
Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
NIPS 2024
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
ACL 2024
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
ACL 2024
UniPTS: A Unified Framework for Proficient Post-Training Sparsity
CVPR 2024
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
NIPS 2024
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
ACL 2024
On the Impact of Calibration Data in Post-training Quantization and Pruning
ACL 2024
USDN: A Unified Sample-Wise Dynamic Network With Mixed-Precision and Early-Exit
WACV 2024
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression
NIPS 2024
BOLD: Boolean Logic Deep Learning
NIPS 2024
Dodo: Dynamic Contextual Compression for Decoder-only LMs
ACL 2024
Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space
ACL 2024
Learning To Compose SuperWeights for Neural Parameter Allocation Search
WACV 2024
NACL: A General and Effective KV Cache Eviction Framework for LLM at Inference Time
ACL 2024
SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget
ACL 2024
PartialFormer: Modeling Part Instead of Whole for Machine Translation
ACL 2024
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
CVPR 2024
WRP: Weight Recover Prune for Structured Sparsity
ACL 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
ACL 2024
SparseFlow: Accelerating Transformers by Sparsifying Information Flows
ACL 2024
MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning
ACL 2024
MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection
CVPR 2024
FM-Delta: Lossless Compression for Storing Massive Fine-tuned Foundation Models
NIPS 2024
QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning
EMNLP 2024
Parameter Competition Balancing for Model Merging
NIPS 2024