model quantization

279 papers

Also known as

PTQ, INT8, QDNN

Co-occurring keywords

model compression (3283), large language model (12755), knowledge distillation (3680), weight quantization (133), post-training quantization (124), neural network optimization (1293), efficient computing (779), neural network (6616), activation quantization (47), efficient inference (225)

Papers

Q-DETR: An Efficient Low-Bit Quantized Detection Transformer CVPR 2023

Zero-shot Sharpness-Aware Quantization for Pre-trained Language Models EMNLP 2023

Training Transformers with 4-bit Integers NIPS 2023

Low-bit Shift Network for End-to-End Spoken Language Understanding INTERSPEECH 2022

BiT: Robustly Binarized Multi-distilled Transformer NIPS 2022

Scaling Language Model Size in Cross-Device Federated Learning ACL 2022

Fast Lossless Neural Compression with Integer-Only Discrete Flows ICML 2022

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning NIPS 2022

Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models EMNLP 2022

Zero-Shot Dynamic Quantization for Transformer Inference EMNLP 2022

Quantized Training of Gradient Boosting Decision Trees NIPS 2022

4-bit Conformer with Native Quantization Aware Training for Speech Recognition INTERSPEECH 2022

SDQ: Stochastic Differentiable Quantization with Mixed Precision ICML 2022

BMCook: A Task-agnostic Compression Toolkit for Big Models EMNLP 2022

Who Says Elephants Can’t Run: Bringing Large Scale MoE Models into Cloud Scale Production EMNLP 2022

Low-complex and Highly-performed Binary Residual Neural Network for Small-footprint Keyword Spotting INTERSPEECH 2022

Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization INTERSPEECH 2022

Bitwidth Heterogeneous Federated Learning with Progressive Weight Dequantization ICML 2022

ClimbQ: Class Imbalanced Quantization Enabling Robustness on Efficient Inferences NIPS 2022

Sub-8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition INTERSPEECH 2022

Online Hybrid Lightweight Representations Learning: Its Application to Visual Tracking IJCAI 2022

Edinburgh’s Submission to the WMT 2022 Efficiency Task EMNLP 2022

Too Brittle to Touch: Comparing the Stability of Quantization and Distillation towards Developing Low-Resource MT Models EMNLP 2022

Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training ICML 2022

DAdaQuant: Doubly-adaptive quantization for communication-efficient Federated Learning ICML 2022