Model Compression

1928 directly classified papers

Papers

AnalyticKWS: Towards Exemplar-Free Analytic Class Incremental Learning for Small-footprint Keyword Spotting ACL 2025

MLWQ: Efficient Small Language Model Deployment via Multi-Level Weight Quantization EMNLP 2025

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs CVPR 2025

AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity EMNLP 2025

FREE: Fast and Robust Vision Language Models with Early Exits ACL 2025

Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance EMNLP 2025

Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information AAAI 2025

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework EMNLP 2025

Demystifying Small Language Models for Edge Deployment ACL 2025

Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge EMNLP 2025

Revisiting Pruning vs Quantization for Small Language Models EMNLP 2025

Speculative Decoding for Multi-Sample Inference EMNLP 2025

SwiftPrune: Hessian-Free Weight Pruning for Large Language Models EMNLP 2025

LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation EMNLP 2025

1+1>2: A Synergistic Sparse and Low-Rank Compression Method for Large Language Models EMNLP 2025

Human-Inspired Obfuscation for Model Unlearning: Local and Global Strategies with Hyperbolic Representations EMNLP 2025

Dropping Experts, Recombining Neurons: Retraining-Free Pruning for Sparse Mixture-of-Experts LLMs EMNLP 2025

FuzzAug: Data Augmentation by Coverage-guided Fuzzing for Neural Test Generation EMNLP 2025

TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers’ Guidance EMNLP 2025

FAEDKV: Infinite-Window Fourier Transform for Unbiased KV Cache Compression EMNLP 2025

MONAQ: Multi-Objective Neural Architecture Querying for Time-Series Analysis on Resource-Constrained Devices EMNLP 2025

KurTail: Kurtosis-based LLM Quantization EMNLP 2025

FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference EMNLP 2025

Controllable Memorization in LLMs via Weight Pruning EMNLP 2025

One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments ACL 2025