Model Compression

1928 directly classified papers

Papers

LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning ACL 2024

Dual-Space Knowledge Distillation for Large Language Models EMNLP 2024

A Simple and Effective L2 Norm-Based Strategy for KV Cache Compression EMNLP 2024

Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference EACL 2024

On the Robustness of Neural Models for Full Sentence Transformation NAACL 2024

DimA: A Parameter-efficient Fine-tuning Method with Knowledge Transfer Based on Transformer COLING 2024

On the Intractability to Synthesize Factual Inconsistencies in Summarization EACL 2024

On the Way to Lossless Compression of Language Transformers: Exploring Cross-Domain Properties of Quantization COLING 2024

ParsNets: A Parsimonious Composition of Orthogonal and Low-Rank Linear Networks for Zero-Shot Learning IJCAI 2024

NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models ACL 2024

AFLoRA: Adaptive Freezing of Low Rank Adaptation in Parameter Efficient Fine-Tuning of Large Models ACL 2024

Minimal Distillation Schedule for Extreme Language Model Compression EACL 2024

MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric CVPR 2024

MediSwift: Efficient Sparse Pre-trained Biomedical Language Models ACL 2024

Representation and Generation of Machine Learning Test Functions EACL 2024

ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models NAACL 2024

LLM-QAT: Data-Free Quantization Aware Training for Large Language Models ACL 2024

PromptFix: Few-shot Backdoor Removal via Adversarial Prompt Tuning NAACL 2024

CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification EMNLP 2024

Divergent Token Metrics: Measuring degradation to prune away LLM components – and optimize quantization NAACL 2024

Neural Video Compression with Feature Modulation CVPR 2024

Pruning as a Domain-specific LLM Extractor NAACL 2024

Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective NeurIPS 2024

Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other NAACL 2024

ELAD: Explanation-Guided Large Language Models Active Distillation ACL 2024