Research Explorer
Model Compression
1928 directly classified papers
Papers per year
2013: 2
2014: 1
2015: 6
2016: 4
2017: 13
2018: 47
2019: 81
2020: 114
2021: 172
2022: 191
2023: 272
2024: 370
2025: 489
2026: 166
Papers
Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble Based Sample Selection
WACV 2024
EnOF-SNN: Training Accurate Spiking Neural Networks via Enhancing the Output Feature
NIPS 2024
Memory-Efficient Fine-Tuning of Transformers via Token Selection
EMNLP 2024
Quantization of Large Language Models with an Overdetermined Basis
UAI 2024
MediSwift: Efficient Sparse Pre-trained Biomedical Language Models
ACL 2024
ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
NIPS 2024
Dual-Space Knowledge Distillation for Large Language Models
EMNLP 2024
Visual Editing with LLM-based Tool Chaining: An Efficient Distillation Approach for Real-Time Applications
EMNLP 2024
CeeBERT: Cross-Domain Inference in Early Exit BERT
ACL 2024
Token Alignment via Character Matching for Subword Completion
ACL 2024
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
AAAI 2024
MERGE: Fast Private Text Generation
AAAI 2024
Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
AAAI 2024
Fairness-Aware Structured Pruning in Transformers
AAAI 2024
VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks
NIPS 2024
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
NIPS 2024
HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning
NIPS 2024
Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting
NIPS 2024
Task-agnostic Distillation of Encoder-Decoder Language Models
COLING 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
NIPS 2024
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
NIPS 2024
Finding Transformer Circuits With Edge Pruning
NIPS 2024
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
NIPS 2024
Protecting Your LLMs with Information Bottleneck
NIPS 2024
Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs
CVPR 2024