model compression

3283 papers

Also known as

MC

Co-occurring keywords

knowledge distillation (3680), large language model (12755), neural network (6616), efficient computing (779), neural network optimization (1293), transfer learning (5442), convolutional neural network (4216), neural network pruning (265), language model (4573), parameter efficiency (415)

Papers

AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference AAAI 2025

Multilingual Iterative Model Pruning: What Matters? IJCNLP 2025

C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness AAAI 2025

Argus: A Compact and Versatile Foundation Model for Vision CVPR 2025

Logits-Based Finetuning EMNLP 2025

LangCompress: Language-Aware Compression of Large Language Models IJCNLP 2025

Fast and Slow Gradient Approximation for Binary Neural Network Optimization AAAI 2025

Interpreting the Effects of Quantization on LLMs IJCNLP 2025

Can Students Beyond the Teacher? Distilling Knowledge from Teacher’s Bias AAAI 2025

ControlMed: Adding Reasoning Control to Medical Language Model IJCNLP 2025

DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture CVPR 2025

L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models ACL 2025

NAAM: Node-Aware Attention Mechanism for Distilling GNNs-to-MLP (Student Abstract) AAAI 2025

A High-Efficiency Federated Learning Method Using Complementary Pruning for D2D Communication (Student Abstract) AAAI 2025

Compression-Aware Computing for Scalable and Sustainable AI AAAI 2025

Pre-training Distillation for Large Language Models: A Design Space Exploration ACL 2025

TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models AAAI 2025

BitNet: 1-bit Pre-training for Large Language Models JMLR 2025

Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference EMNLP 2025

PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models ACL 2025

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers AAAI 2025

Walk and Read Less: Improving the Efficiency of Vision-and-Language Navigation via Tuning-Free Multimodal Token Pruning EMNLP 2025

TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization ACL 2025

GradOT: Training-free Gradient-preserving Offsite-tuning for Large Language Models ACL 2025

Efficient Speech Translation through Model Compression and Knowledge Distillation ACL 2025