Papers
Plug-and-Play Diffusion Distillation
CVPR 2024
Efficient Stitchable Task Adaptation
CVPR 2024
SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
NeurIPS 2024
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference
NeurIPS 2024
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
NeurIPS 2024