Model Compression

1928 directly classified papers

Papers

Blind-Touch: Homomorphic Encryption-Based Distributed Neural Network Inference for Privacy-Preserving Fingerprint Authentication AAAI 2024

Revisiting the Information Capacity of Neural Network Watermarks: Upper Bound Estimation and Beyond AAAI 2024

ShareBERT: Embeddings Are Capable of Learning Hidden Layers AAAI 2024

OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models AAAI 2024

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model NIPS 2024

Building on Efficient Foundations: Effective Training of LLMs with Structured Feedforward Layers NIPS 2024

RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions COLING 2024

Pruning before Fine-tuning: A Retraining-free Compression Framework for Pre-trained Language Models COLING 2024

Probe Then Retrieve and Reason: Distilling Probing and Reasoning Capabilities into Smaller Language Models COLING 2024

Expanding Sparse Tuning for Low Memory Usage NIPS 2024

Multilingual Brain Surgeon: Large Language Models Can Be Compressed Leaving No Language behind COLING 2024

LoNAS: Elastic Low-Rank Adapters for Efficient Large Language Models COLING 2024

Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding ACL 2024

Anchor-based Large Language Models ACL 2024

SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation NIPS 2024

Fast Randomized Low-Rank Adaptation of Pre-trained Language Models with PAC Regularization ACL 2024

Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers NIPS 2024

FlattenQuant: Breaking through the Inference Compute-bound for Large Language Models with Per-tensor Quantization COLING 2024

PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs ACL 2024

Sinkhorn Distance Minimization for Knowledge Distillation COLING 2024

NN-Defined Modulator: Reconfigurable and Portable Software Modulator on IoT Gateways NSDI 2024

Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study COLING 2024

BMRS: Bayesian Model Reduction for Structured Pruning NIPS 2024

ELAD: Explanation-Guided Large Language Models Active Distillation ACL 2024

EFTNAS: Searching for Efficient Language Models in First-Order Weight-Reordered Super-Networks COLING 2024