Model Compression

1928 directly classified papers

Papers

KG-Adapter: Enabling Knowledge Graph Integration in Large Language Models through Parameter-Efficient Fine-Tuning ACL 2024

Unraveling Babel: Exploring Multilingual Activation Patterns of LLMs and Their Applications EMNLP 2024

Sparsity-Accelerated Training for Large Language Models ACL 2024

CodeAgent: Autonomous Communicative Agents for Code Review EMNLP 2024

SIRIUS: Contextual Sparsity with Correction for Efficient LLMs NIPS 2024

Differentially Private Knowledge Distillation via Synthetic Text Generation ACL 2024

A Comprehensive Evaluation of Quantization Strategies for Large Language Models ACL 2024

Breaking ReLU Barrier: Generalized MoEfication for Dense Pretrained Models EMNLP 2024

Graph-Structured Speculative Decoding ACL 2024

AS-ES Learning: Towards efficient CoT learning in small models ACL 2024

Unlocking Memorization in Large Language Models with Dynamic Soft Prompting EMNLP 2024

ResLoRA: Identity Residual Mapping in Low-Rank Adaption ACL 2024

DB-LLM: Accurate Dual-Binarization for Efficient LLMs ACL 2024

VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models EMNLP 2024

IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact ACL 2024

Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning ACL 2024

Extending Context Window of Large Language Models from a Distributional Perspective EMNLP 2024

BitDelta: Your Fine-Tune May Only Be Worth One Bit NIPS 2024

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ACL 2024

Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion EMNLP 2024

Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model EMNLP 2024

Identifiability of Product of Experts Models AISTATS 2024

Order of Magnitude Speedups for LLM Membership Inference EMNLP 2024

S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity NIPS 2024

Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization EMNLP 2024