Model Compression

1928 directly classified papers

Papers

AROMA: Autonomous Rank-one Matrix Adaptation (EMNLP 2025)

WINS: Winograd Structured Pruning for Fast Winograd Convolution (ICCV 2025)

Hopscotch: Discovering and Skipping Redundancies in Language Models (EMNLP 2025)

ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs (ACL 2025)

Prompt Compression for Large Language Models: A Survey (NAACL 2025)

All You Need in Knowledge Distillation Is a Tailored Coordinate System (AAAI 2025)

SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models (ACL 2025)

HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging (EMNLP 2025)

MPPQ: Enhancing Post-Training Quantization for LLMs via Mixed Supervision, Proxy Rounding, and Pre-Searching (IJCAI 2025)

DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration (ACL 2025)

Libra-Merging: Importance-redundancy and Pruning-merging Trade-off for Acceleration Plug-in in Large Vision-Language Model (CVPR 2025)

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs (CVPR 2025)

VoCo-LLaMA: Towards Vision Compression with Large Language Models (CVPR 2025)

Why Do Some Inputs Break Low-Bit LLM Quantization? (EMNLP 2025)

OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models (EMNLP 2025)

Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference (EMNLP 2025)

CASP: Compression of Large Multimodal Models Based on Attention Sparsity (CVPR 2025)

EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models (CVPR 2025)

Split-Merge: Scalable and Memory-Efficient Merging of Expert LLMs (EMNLP 2025)

Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs (EMNLP 2025)

Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers (CVPR 2025)

SpecCoT: Accelerating Chain-of-Thought Reasoning through Speculative Exploration (EMNLP 2025)

Multimodal Promptable Token Merging for Diffusion Models (AAAI 2025)

UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping (CVPR 2025)

Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization (ACL 2025)