Model Compression

1928 directly classified papers

Papers

VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks NIPS 2024

MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers NIPS 2024

Don't Look Twice: Faster Video Transformers with Run-Length Tokenization NIPS 2024

Protecting Your LLMs with Information Bottleneck NIPS 2024

Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge AAAI 2024

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation AAAI 2024

MERGE: Fast Private Text Generation AAAI 2024

Fairness-Aware Structured Pruning in Transformers AAAI 2024

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression ACL 2024

CeeBERT: Cross-Domain Inference in Early Exit BERT ACL 2024

Papilusion at DAGPap24: Paper or Illusion? Detecting AI-generated Scientific Papers ACL 2024

Token Alignment via Character Matching for Subword Completion ACL 2024

AFLoRA: Adaptive Freezing of Low Rank Adaptation in Parameter Efficient Fine-Tuning of Large Models ACL 2024

Mentor-KD: Making Small Language Models Better Multi-step Reasoners EMNLP 2024

RETAIN: Interactive Tool for Regression Testing Guided LLM Migration EMNLP 2024

ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency EMNLP 2024

Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models EMNLP 2024

Adaptive Feature-based Low-Rank Compression of Large Language Models via Bayesian Optimization EMNLP 2024

When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models EMNLP 2024

MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning EMNLP 2024

ATQ: Activation Transformation for Weight-Activation Quantization of Large Language Models EMNLP 2024

Stochastic Fine-Tuning of Language Models Using Masked Gradients EMNLP 2024

Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning EMNLP 2024

Global-Pruner: A Stable and Efficient Pruner for Retraining-Free Pruning of Encoder-Based Language Models EMNLP 2024

LinChance-NTU for Unconstrained WMT2024 Literary Translation EMNLP 2024