Model Compression

1928 directly classified papers

Papers

Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble Based Sample Selection WACV 2024

EnOF-SNN: Training Accurate Spiking Neural Networks via Enhancing the Output Feature NIPS 2024

Memory-Efficient Fine-Tuning of Transformers via Token Selection EMNLP 2024

Quantization of Large Language Models with an Overdetermined Basis UAI 2024

MediSwift: Efficient Sparse Pre-trained Biomedical Language Models ACL 2024

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification NIPS 2024

Dual-Space Knowledge Distillation for Large Language Models EMNLP 2024

Visual Editing with LLM-based Tool Chaining: An Efficient Distillation Approach for Real-Time Applications EMNLP 2024

CeeBERT: Cross-Domain Inference in Early Exit BERT ACL 2024

Token Alignment via Character Matching for Subword Completion ACL 2024

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation AAAI 2024

MERGE: Fast Private Text Generation AAAI 2024

Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge AAAI 2024

Fairness-Aware Structured Pruning in Transformers AAAI 2024

VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks NIPS 2024

MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers NIPS 2024

HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning NIPS 2024

Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting NIPS 2024

Task-agnostic Distillation of Encoder-Decoder Language Models COLING 2024

Don't Look Twice: Faster Video Transformers with Run-Length Tokenization NIPS 2024

The Mamba in the Llama: Distilling and Accelerating Hybrid Models NIPS 2024

Finding Transformer Circuits With Edge Pruning NIPS 2024

KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization NIPS 2024

Protecting Your LLMs with Information Bottleneck NIPS 2024

Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs CVPR 2024