Model Compression

1674 directly classified papers

Papers

DCSF-KD: Dynamic Channel-wise Spatial Feature Knowledge Distillation for Object Detection AAAI 2025

Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models ACL 2025

Q-Mamba: Towards more efficient Mamba models via post-training quantization ACL 2025

FPE2M2: Approaching Lossless and Efficient Quantization with Native Floating Point ACL 2025

Numerical Pruning for Efficient Autoregressive Models AAAI 2025

Pretraining Context Compressor for Large Language Models with Embedding-Based Memory ACL 2025

Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference ACL 2025

FocusLLM: Precise Understanding of Long Context by Dynamic Condensing ACL 2025

Bitnet.cpp: Efficient Edge Inference for Ternary LLMs ACL 2025

Run LoRA Run: Faster and Lighter LoRA Implementations ACL 2025

ProCut: LLM Prompt Compression via Attribution Estimation EMNLP 2025

Parameter-Efficient Fine-Tuning via Circular Convolution ACL 2025

LLMs on a Budget? Say HOLA EMNLP 2025

Harmonizing Diverse Models: A Layer-wise Merging Strategy for Consistent Generation EMNLP 2025

Multi-Task Pre-Finetuning of Lightweight Transformer Encoders for Text Classification and NER EMNLP 2025

Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems EMNLP 2025

Low-Rank Interconnected Adaptation across Layers ACL 2025

Maximum Score Routing For Mixture-of-Experts ACL 2025

GenPTQ: Green Post-Training Quantization for Large-Scale ASR Models with Mixed-Precision Bit Allocation EMNLP 2025

Revisiting Pruning vs Quantization for Small Language Models EMNLP 2025

SwiftPrune: Hessian-Free Weight Pruning for Large Language Models EMNLP 2025

Sensitivity-LoRA: Low-Load Sensitivity-Based Fine-Tuning for Large Language Models EMNLP 2025

1+1>2: A Synergistic Sparse and Low-Rank Compression Method for Large Language Models EMNLP 2025

BcQLM: Efficient Vision-Language Understanding with Distilled Q-Gated Cross-Modal Fusion EMNLP 2025

WAVE: Weight Templates for Adaptive Initialization of Variable-sized Models CVPR 2025