Papers
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
NIPS 2024
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
NIPS 2024
Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference
NIPS 2024