Model Compression

1503 directly classified papers

Papers

A Middle Path for On-Premises LLM Deployment: Preserving Privacy Without Sacrificing Model Confidentiality EMNLP 2025

Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study EMNLP 2025

Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity EMNLP 2025

FaST: Feature-aware Sampling and Tuning for Personalized Preference Alignment with Limited Data EMNLP 2025

XQuant: Achieving Ultra-Low Bit KV Cache Quantization with Cross-Layer Compression EMNLP 2025

Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance EMNLP 2025

Calibrating LLM Confidence by Probing Perturbed Representation Stability EMNLP 2025

SATER: A Self-Aware and Token-Efficient Approach to Routing and Cascading EMNLP 2025

IG-Pruning: Input-Guided Block Pruning for Large Language Models EMNLP 2025

NeuroAda: Activating Each Neuron’s Potential for Parameter-Efficient Fine-Tuning EMNLP 2025

Gamma-Guard: Lightweight Residual Adapters for Robust Guardrails in Large Language Models EMNLP 2025

PPC-GPT: Federated Task-Specific Compression of Large Language Models via Pruning and Chain-of-Thought Distillation EMNLP 2025

Power doesn’t reside in size: A Low Parameter Hybrid Language Model (HLM) for Sentiment Analysis in Code-mixed data EMNLP 2025

FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference EMNLP 2025

Controllable Memorization in LLMs via Weight Pruning EMNLP 2025

ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models EMNLP 2025

An Orthogonal High-Rank Adaptation for Large Language Models EMNLP 2025

MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference Engines EMNLP 2025

EcoLoRA: Communication-Efficient Federated Fine-Tuning of Large Language Models EMNLP 2025

GAP: a Global Adaptive Pruning Method for Large Language Models EMNLP 2025

zFLoRA: Zero-Latency Fused Low-Rank Adapters EMNLP 2025

Mitigating Catastrophic Forgetting in Large Language Models with Forgetting-aware Pruning EMNLP 2025

GraphKV: Breaking the Static Selection Paradigm with Graph-Based KV Cache Eviction EMNLP 2025

Comprehensive and Efficient Distillation for Lightweight Sentiment Analysis Models EMNLP 2025

EfficientCrackNet: A Lightweight Model for Crack Segmentation WACV 2025