Model Compression

1928 directly classified papers

Papers

AROMA: Autonomous Rank-one Matrix Adaptation EMNLP 2025

Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers CVPR 2025

TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection EMNLP 2025

GAP: a Global Adaptive Pruning Method for Large Language Models EMNLP 2025

Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles AAAI 2025

Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge EMNLP 2025

TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning ACL 2025

Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs ACL 2025

MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference Engines EMNLP 2025

DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs EMNLP 2025

FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers IJCAI 2025

CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter ACL 2025

VisiPruner: Decoding Discontinuous Cross-Modal Dynamics for Efficient Multimodal LLMs EMNLP 2025

An Orthogonal High-Rank Adaptation for Large Language Models EMNLP 2025

CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information COLING 2025

TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models EMNLP 2025

Less Is More? Examining Fairness in Pruned Large Language Models for Summarising Opinions EMNLP 2025

Aggregation Mechanism Based Graph Heterogeneous Networks Distillation IJCAI 2025

AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation ACL 2025

CoT-Valve: Length-Compressible Chain-of-Thought Tuning ACL 2025

OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models EMNLP 2025

Grammar Pruning: Enabling Low-Latency Zero-Shot Task-Oriented Language Models for Edge AI EMNLP 2025

Draft Model Knows When to Stop: Self-Verification Speculative Decoding for Long-Form Generation EMNLP 2025

GIL-IIMAS UNAM at SemEval-2025 Task 4: LA-Min(E): LLM Unlearning Approaches Under Function Minimizing Evaluation Constraints ACL 2025

MULTIGUARD: An Efficient Approach for AI Safety Moderation Across Languages and Modalities EMNLP 2025