Papers
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs
EMNLP 2025
AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models
EMNLP 2025
Accurate Sublayer Pruning for Large Language Models by Exploiting Latency and Tunability Information
IJCAI 2025
BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference
AAAI 2025