conftrace_

model compression

3302 papers

Also known as

MC

Co-occurring keywords

knowledge distillation (3725), large language model (13587), neural network (6616), efficient computing (781), neural network optimization (1293), transfer learning (5449), convolutional neural network (4226), neural network pruning (265), language model (4599), parameter efficiency (417)

Papers

Memory-Efficient Fine-Tuning of Transformers via Token Selection EMNLP 2024

xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics EMNLP 2024

Building an Efficient Multilingual Non-Profit IR System for the Islamic Domain Leveraging Multiprocessing Design in Rust EMNLP 2024

GOVERN: Gradient Orientation Vote Ensemble for Multi-Teacher Reinforced Distillation EMNLP 2024

Normalized Narrow Jump To Conclusions: Normalized Narrow Shortcuts for Parameter Efficient Early Exit Transformer Prediction EMNLP 2024

Dual-teacher Knowledge Distillation for Low-frequency Word Translation EMNLP 2024

Llama SLayer 8B: Shallow Layers Hold the Key to Knowledge Injection EMNLP 2024

LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation EMNLP 2024

MobileQuant: Mobile-friendly Quantization for On-device Language Models EMNLP 2024

PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning EMNLP 2024

Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs EMNLP 2024

Exploring Quantization for Efficient Pre-Training of Transformer Language Models EMNLP 2024

DLoRA: Distributed Parameter-Efficient Fine-Tuning Solution for Large Language Model EMNLP 2024

Eigen Attention: Attention in Low-Rank Space for KV Cache Compression EMNLP 2024

Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models NIPS 2024

Certified Machine Unlearning via Noisy Stochastic Gradient Descent NIPS 2024

FedGTST: Boosting Global Transferability of Federated Models via Statistics Tuning NIPS 2024

A Single-Step, Sharpness-Aware Minimization is All You Need to Achieve Efficient and Accurate Sparse Training NIPS 2024

Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective NIPS 2024

LoRA-GA: Low-Rank Adaptation with Gradient Approximation NIPS 2024

ScaleKD: Strong Vision Transformers Could Be Excellent Teachers NIPS 2024

SpikedAttention: Training-Free and Fully Spike-Driven Transformer-to-SNN Conversion with Winner-Oriented Spike Shift for Softmax Operation NIPS 2024

On the Inductive Bias of Stacking Towards Improving Reasoning NIPS 2024

How Sparse Can We Prune A Deep Network: A Fundamental Limit Perspective NIPS 2024

QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation NIPS 2024