model compression

3283 papers

Also known as

MC

Co-occurring keywords

knowledge distillation (3680), large language model (12755), neural network (6616), efficient computing (779), neural network optimization (1293), transfer learning (5442), convolutional neural network (4216), neural network pruning (265), language model (4573), parameter efficiency (415)

Papers

BiLD: Bi-directional Logits Difference Loss for Large Language Model Distillation COLING 2025

MoKA: Parameter Efficiency Fine-Tuning via Mixture of Kronecker Product Adaption COLING 2025

Memory-Efficient Generative Models via Product Quantization ICCV 2025

Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models ACL 2025

Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models COLING 2025

XQuant: Achieving Ultra-Low Bit KV Cache Quantization with Cross-Layer Compression EMNLP 2025

Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity EMNLP 2025

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning ICCV 2025

MLWQ: Efficient Small Language Model Deployment via Multi-Level Weight Quantization EMNLP 2025

ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning EMNLP 2025

StitchLLM: Serving LLMs, One Block at a Time ACL 2025

Teach Small Models to Reason by Curriculum Distillation EMNLP 2025

Agentic-R1: Distilled Dual-Strategy Reasoning EMNLP 2025

Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster EMNLP 2025

PPC-GPT: Federated Task-Specific Compression of Large Language Models via Pruning and Chain-of-Thought Distillation EMNLP 2025

Power doesn’t reside in size: A Low Parameter Hybrid Language Model (HLM) for Sentiment Analysis in Code-mixed data EMNLP 2025

Less Is More? Examining Fairness in Pruned Large Language Models for Summarising Opinions EMNLP 2025

DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs EMNLP 2025

MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference Engines EMNLP 2025

EcoLoRA: Communication-Efficient Federated Fine-Tuning of Large Language Models EMNLP 2025

GAP: a Global Adaptive Pruning Method for Large Language Models EMNLP 2025

Comprehensive and Efficient Distillation for Lightweight Sentiment Analysis Models EMNLP 2025

Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models EMNLP 2025

GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression EMNLP 2025

Propulsion: Steering LLM with Tiny Fine-Tuning COLING 2025