model compression

3283 papers

Also known as

MC

Co-occurring keywords

knowledge distillation (3680), large language model (12755), neural network (6616), efficient computing (779), neural network optimization (1293), transfer learning (5442), convolutional neural network (4216), neural network pruning (265), language model (4573), parameter efficiency (415)

Papers

OAC: Output-adaptive Calibration for Accurate Post-training Quantization AAAI 2025

Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference AAAI 2025

SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression AAAI 2025

BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference AAAI 2025

Inference-Time Diffusion Model Distillation ICCV 2025

RILQ: Rank-Insensitive LoRA-Based Quantization Error Compensation for Boosting 2-Bit Large Language Model Accuracy AAAI 2025

Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment AAAI 2025

From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers AAAI 2025

MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning AAAI 2025

ScaleOT: Privacy-utility-scalable Offsite-tuning with Dynamic LayerReplace and Selective Rank Compression AAAI 2025

Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models AAAI 2025

Treasures in Discarded Weights for LLM Quantization AAAI 2025

Channel Merging: Preserving Specialization for Merged Experts AAAI 2025

ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization AAAI 2025

MeRino: Entropy-Driven Design for Generative Language Models on IoT Devices AAAI 2025

AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference AAAI 2025

C3oT: Generating Shorter Chain-of-Thought Without Compromising Effectiveness AAAI 2025

Scaffold-BPE: Enhancing Byte Pair Encoding for Large Language Models with Simple and Effective Scaffold Token Removal AAAI 2025

PAT: Pruning-Aware Tuning for Large Language Models AAAI 2025

A Compact Model for Mathematics Problem Representations Distilled from BERT AAAI 2025

CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation AAAI 2025

Hybrid Data-Free Knowledge Distillation AAAI 2025

JAQ: Joint Efficient Architecture Design and Low-Bit Quantization with Hardware-Software Co-Exploration AAAI 2025

Can Students Beyond the Teacher? Distilling Knowledge from Teacher’s Bias AAAI 2025

Random Conditioning for Diffusion Model Compression with Distillation CVPR 2025