Model Compression

1674 directly classified papers

Papers

Torque Based Structured Pruning for Deep Neural Network WACV 2024

Fast Randomized Low-Rank Adaptation of Pre-trained Language Models with PAC Regularization ACL 2024

Unleashing Channel Potential: Space-Frequency Selection Convolution for SAR Object Detection CVPR 2024

SnapKV: LLM Knows What You are Looking for Before Generation NIPS 2024

LLM can Achieve Self-Regulation via Hyperparameter Aware Generation ACL 2024

PartialFormer: Modeling Part Instead of Whole for Machine Translation ACL 2024

LM-Cocktail: Resilient Tuning of Language Models via Model Merging ACL 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization NIPS 2024

Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction ACL 2024

D-LLM: A Token Adaptive Computing Resource Allocation Strategy for Large Language Models NIPS 2024

Wino Vidi Vici: Conquering Numerical Instability of 8-Bit Winograd Convolution for Accurate Inference Acceleration on Edge WACV 2024

LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning ACL 2024

Reducing the Side-Effects of Oscillations in Training of Quantized YOLO Networks WACV 2024

Reasons and Solutions for the Decline in Model Performance after Editing NIPS 2024

ResLoRA: Identity Residual Mapping in Low-Rank Adaption ACL 2024

DB-LLM: Accurate Dual-Binarization for Efficient LLMs ACL 2024

BASS: Batched Attention-optimized Speculative Sampling ACL 2024

A Brain-Inspired Way of Reducing the Network Complexity via Concept-Regularized Coding for Emotion Recognition AAAI 2024

LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild ACL 2024

IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact ACL 2024

Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning ACL 2024

LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks ACL 2024

Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs ACL 2024

Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices NIPS 2024

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding ACL 2024