Model Compression

1928 directly classified papers

Papers

PrivCirNet: Efficient Private Inference via Block Circulant Transformation NIPS 2024

FIARSE: Model-Heterogeneous Federated Learning via Importance-Aware Submodel Extraction NIPS 2024

S2HPruner: Soft-to-Hard Distillation Bridges the Discretization Gap in Pruning NIPS 2024

Activation Map Compression through Tensor Decomposition for Deep Learning NIPS 2024

Spectral Adapter: Fine-Tuning in Spectral Space NIPS 2024

NVRC: Neural Video Representation Compression NIPS 2024

Learn more, but bother less: parameter efficient continual learning NIPS 2024

Learn To be Efficient: Build Structured Sparsity in Large Language Models NIPS 2024

SlimGPT: Layer-wise Structured Pruning for Large Language Models NIPS 2024

Adaptive Layer Sparsity for Large Language Models via Activation Correlation Assessment NIPS 2024

xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token NIPS 2024

Adversarial Moment-Matching Distillation of Large Language Models NIPS 2024

NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention NIPS 2024

Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation NIPS 2024

Q-VLM: Post-training Quantization for Large Vision-Language Models NIPS 2024

LoQT: Low-Rank Adapters for Quantized Pretraining NIPS 2024

Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design NIPS 2024

SLTrain: a sparse plus low rank approach for parameter and memory efficient pretraining NIPS 2024

Unveiling LoRA Intrinsic Ranks via Salience Analysis NIPS 2024

Refusal in Language Models Is Mediated by a Single Direction NIPS 2024

Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models NIPS 2024

Search for Efficient Large Language Models NIPS 2024

Uncovering the Redundancy in Graph Self-supervised Learning Models NIPS 2024

The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information NIPS 2024

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models NIPS 2024