conftrace_

model compression

3302 papers

Also known as

MC

Co-occurring keywords

knowledge distillation (3725); large language model (13587); neural network (6616); efficient computing (781); neural network optimization (1293); transfer learning (5449); convolutional neural network (4226); neural network pruning (265); language model (4599); parameter efficiency (417)

Papers

FedPFT: Federated Proxy Fine-Tuning of Foundation Models IJCAI 2024

Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention EMNLP 2024

Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model ACL 2024

A Survey on Efficient Federated Learning Methods for Foundation Model Training IJCAI 2024

Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging EMNLP 2024

BabyLM Challenge: Experimenting with Self-Distillation and Reverse-Distillation for Language Model Pre-Training on Constrained Datasets CoNLL 2024

Global-Pruner: A Stable and Efficient Pruner for Retraining-Free Pruning of Encoder-Based Language Models CoNLL 2024

NACL: A General and Effective KV Cache Eviction Framework for LLM at Inference Time ACL 2024

Further Compressing Distilled Language Models via Frequency-aware Partial Sparse Coding of Embeddings CoNLL 2024

Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation NeurIPS 2024

Minimal Distillation Schedule for Extreme Language Model Compression EACL 2024

To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation ACL 2024

Compact 3D Gaussian Representation for Radiance Field CVPR 2024

Learning Low-Rank Tensor Cores with Probabilistic ℓ0-Regularized Rank Selection for Model Compression IJCAI 2024

Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs ACL 2024

Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning EMNLP 2024

DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs NeurIPS 2024

Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression ACL 2024

PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA ACL 2024

SpaFL: Communication-Efficient Federated Learning With Sparse Models And Low Computational Overhead NeurIPS 2024

Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization EMNLP 2024

QUIK: Towards End-to-end 4-Bit Inference on Generative Large Language Models EMNLP 2024

LRQuant: Learnable and Robust Post-Training Quantization for Large Language Models ACL 2024

MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning ACL 2024

CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework NeurIPS 2024