Model Compression

1928 directly classified papers

Papers

WET: Overcoming Paraphrasing Vulnerabilities in Embeddings-as-a-Service with Linear Transformation Watermarks ACL 2025

AdaTP: Attention-Debiased Token Pruning for Video Large Language Models EMNLP 2025

PIP: Perturbation-based Iterative Pruning for Large Language Models EMNLP 2025

Learning What to Remember: Adaptive Probabilistic Memory Retention for Memory-Efficient Language Models EMNLP 2025

Variable Layerwise Quantization: A Simple and Effective Approach to Quantize LLMs ACL 2025

Beyond the Surface: A Solution-Aware Retrieval Model for Competition-level Code Generation EMNLP 2025

EmByte: Decomposition and Compression Learning for Small yet Private NLP EMNLP 2025

Efficiently Editing Mixture-of-Experts Models with Compressed Experts EMNLP 2025

Q-Mamba: Towards more efficient Mamba models via post-training quantization ACL 2025

Talking Head Anime 4: Distillation for Real-Time Performance WACV 2025

CARVQ: Corrective Adaptor with Group Residual Vector Quantization for LLM Embedding Compression EMNLP 2025

AdaEdit: Advancing Continuous Knowledge Editing For Large Language Models ACL 2025

QSpec: Speculative Decoding with Complementary Quantization Schemes EMNLP 2025

A Drop-In Solution for On-the-Fly Adaptation of Speculative Decoding in Large Language Models ACL 2025

RESF: Regularized-Entropy-Sensitive Fingerprinting for Black-Box Tamper Detection of Large Language Models EMNLP 2025

Squeezed Attention: Accelerating Long Context Length LLM Inference ACL 2025

DART: Distilling Autoregressive Reasoning to Silent Thought EMNLP 2025

FPE2M2: Approaching Lossless and Efficient Quantization with Native Floating Point ACL 2025

QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models EMNLP 2025

Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts ACL 2025

Assigning Distinct Roles to Quantized and Low-Rank Matrices Toward Optimal Weight Decomposition ACL 2025

LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization EMNLP 2025

Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification EMNLP 2025

Rethinking Low-Rank Adaptation in Vision: Exploring Head-Level Responsiveness Across Diverse Tasks WACV 2025

LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression NAACL 2025