Model Compression

1928 directly classified papers

Papers

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs ICCV 2025

SuLoRA: Subspace Low-Rank Adaptation for Parameter-Efficient Fine-Tuning ACL 2025

Information Theoretic Pruning of Coupled Channels in Deep Neural Networks WACV 2025

CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis AAAI 2025

Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping ACL 2025

MONAQ: Multi-Objective Neural Architecture Querying for Time-Series Analysis on Resource-Constrained Devices EMNLP 2025

Efficient Inference for Large Language Models – Algorithm, Model, and System EMNLP 2025

Beyond Dynamic Quantization: An Efficient Static Hierarchical Mix-precision Framework for Near-Lossless LLM Compression EMNLP 2025

Position-Aware Depth Decay Decoding (D3): Boosting Large Language Model Inference Efficiency ACL 2025

CoViPAL: Layer-wise Contextualized Visual Token Pruning for Large Vision-Language Models EMNLP 2025

Riemannian Optimization for LoRA on the Stiefel Manifold EMNLP 2025

FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction EMNLP 2025

Recover-LoRA: Data-Free Accuracy Recovery of Degraded Language Models via Low-Rank Adaptation EMNLP 2025

KurTail: Kurtosis-based LLM Quantization EMNLP 2025

reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs EMNLP 2025

Rotate, Clip, and Partition: Towards W2A4KV4 Quantization by Integrating Rotation and Learnable Non-uniform Quantizer EMNLP 2025

Comparative Knowledge Distillation WACV 2025

ELMGS: Enhancing Memory and Computation Scalability through Compression for 3D Gaussian Splatting WACV 2025

Knockoff Branch: Model Stealing Attack via Adding Neurons in the Pre-Trained Model WACV 2025

Data Generation for Hardware-Friendly Post-Training Quantization WACV 2025

LowFormer: Hardware Efficient Design for Convolutional Transformer Backbones WACV 2025

Q-TempFusion: Quantization-Aware Temporal Multi-Sensor Fusion on Bird's-Eye View Representation WACV 2025

ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training EMNLP 2025

Block Circulant Adapter for Large Language Models IJCAI 2025

MeMoTune: A Measure and Moment-Driven Fine-Tuning Framework for Quantized Large Language Models ACL 2025