Model Compression

1928 directly classified papers

Papers

MoE-I2: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition EMNLP 2024

LoRASC: Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning EMNLP 2024

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference EMNLP 2024

Further Compressing Distilled Language Models via Frequency-aware Partial Sparse Coding of Embeddings EMNLP 2024

Less is Fed More: Sparsity Reduces Feature Distortion in Federated Learning EMNLP 2024

STTATTS: Unified Speech-To-Text And Text-To-Speech Model EMNLP 2024

Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging EMNLP 2024

Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper EMNLP 2024

Edge Inference With Fully Differentiable Quantized Mixed Precision Neural Networks WACV 2024

Mini but Mighty: Finetuning ViTs With Mini Adapters WACV 2024

PATROL: Privacy-Oriented Pruning for Collaborative Inference Against Model Inversion Attacks WACV 2024

Task-Agnostic Self-Distillation for Few-Shot Action Recognition IJCAI 2024

Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection INTERSPEECH 2024

Language-Specific Pruning for Efficient Reduction of Large Language Models COLING 2024

Adaptive Rank Selections for Low-Rank Approximation of Language Models NAACL 2024

PEMA: An Offsite-Tunable Plug-in External Memory Adaptation for Language Models NAACL 2024

Investigating Acceleration of LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with ‘LITE’ NAACL 2024

PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation EACL 2024

Style Vectors for Steering Generative Large Language Models EACL 2024

Parameter-Efficient Fine-Tuning: Is There An Optimal Subset of Parameters to Tune? EACL 2024

Resource-Efficient Neural Networks for Embedded Systems JMLR 2024

Elastic Weight Removal for Faithful and Abstractive Dialogue Generation NAACL 2024

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs NeurIPS 2024

LM-HT SNN: Enhancing the Performance of SNN to ANN Counterpart through Learnable Multi-hierarchical Threshold Model NeurIPS 2024

DEPrune: Depth-wise Separable Convolution Pruning for Maximizing GPU Parallelism NeurIPS 2024