model compression

3283 papers

Also known as

MC

Co-occurring keywords

knowledge distillation (3680), large language model (12755), neural network (6616), efficient computing (779), neural network optimization (1293), transfer learning (5442), convolutional neural network (4216), neural network pruning (265), language model (4573), parameter efficiency (415)

Papers

DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs EMNLP 2025

Large Language Models Are Overparameterized Text Encoders NAACL 2025

Integrating Independent Layer-Wise Rank Selection with Low-Rank SVD Training for Model Compression: A Theory-Driven Approach IJCAI 2025

EcoLoRA: Communication-Efficient Federated Fine-Tuning of Large Language Models EMNLP 2025

Comprehensive and Efficient Distillation for Lightweight Sentiment Analysis Models EMNLP 2025

GAP: a Global Adaptive Pruning Method for Large Language Models EMNLP 2025

Revisiting Pruning vs Quantization for Small Language Models EMNLP 2025

HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging EMNLP 2025

StepER: Step-wise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models EMNLP 2025

FISTAPruner: Layer-wise Post-training Pruning for Large Language Models EMNLP 2025

As easy as PIE: understanding when pruning causes language models to disagree NAACL 2025

Word Salad Chopper: Reasoning Models Waste A Ton Of Decoding Budget On Useless Repetitions, Self-Knowingly EMNLP 2025

RankAdaptor: Hierarchical Rank Allocation for Efficient Fine-Tuning Pruned LLMs via Performance Model NAACL 2025

AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models EMNLP 2025

Exploring the Trade-Offs: Quantization Methods, Task Difficulty, and Model Size in Large Language Models From Edge to Giant IJCAI 2025

SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression AAAI 2025

ScaleOT: Privacy-utility-scalable Offsite-tuning with Dynamic LayerReplace and Selective Rank Compression AAAI 2025

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment AAAI 2025

Accurate Sublayer Pruning for Large Language Models by Exploiting Latency and Tunability Information IJCAI 2025

BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference AAAI 2025

MT2ST: Adaptive Multi-Task to Single-Task Learning ACL 2025

Sample-aware Adaptive Structured Pruning for Large Language Models AAAI 2025

WAVE: Weight Templates for Adaptive Initialization of Variable-sized Models CVPR 2025

From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers AAAI 2025

MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models EMNLP 2025