conftrace_

model compression

3302 papers

Also known as

MC

Co-occurring keywords

knowledge distillation (3725); large language model (13587); neural network (6616); efficient computing (781); neural network optimization (1293); transfer learning (5449); convolutional neural network (4226); neural network pruning (265); language model (4599); parameter efficiency (417)

Papers

Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention (EMNLP 2024)

Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model (ACL 2024)

DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs (NIPS 2024)

Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging (EMNLP 2024)

Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers (CVPR 2024)

Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions (EMNLP 2024)

ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning (CVPR 2024)

LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking (CVPR 2024)

Tuning Stable Rank Shrinkage: Aiming at the Overlooked Structural Risk in Fine-tuning (CVPR 2024)

2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution (NIPS 2024)

MaxQ: Multi-Axis Query for N:M Sparsity Network (CVPR 2024)

Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs (ACL 2024)

Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning (EMNLP 2024)

MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning (ACL 2024)

Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models (COLING 2024)

Probe Then Retrieve and Reason: Distilling Probing and Reasoning Capabilities into Smaller Language Models (COLING 2024)

ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification (NIPS 2024)

SparseFlow: Accelerating Transformers by Sparsifying Information Flows (ACL 2024)

On the Impact of Calibration Data in Post-training Quantization and Pruning (ACL 2024)

Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs (CVPR 2024)

Multilingual Brain Surgeon: Large Language Models Can Be Compressed Leaving No Language behind (COLING 2024)

Task-agnostic Distillation of Encoder-Decoder Language Models (COLING 2024)

Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization (EMNLP 2024)

MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning (EMNLP 2024)

LoNAS: Elastic Low-Rank Adapters for Efficient Large Language Models (COLING 2024)