Zhewei Yao

35 papers · 2018–2026 · 11 top CS/AI conferences

Achievements

🌍 Conference Polyglot (10) 🏃 Academic Marathon (7) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird

🌈 Renaissance Researcher (5) 🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (50) 🏆 Grand Slam 🧬 Topic Evolution 🏆 Keyword Champion (2) 🤝 Dynamic Duo (16) 👑 Triple Crown 🔬 Deep Specialist (12) 🗃️ Keyword Collector (138) ❓ The Questioner (2) ⚡ Prolific Year (7) 💎 Century Club (33) 🔥 Unstoppable (8) 📈 Trend Setter

Conferences

NIPS (7) ICML (6) AAAI (5) EMNLP (4) ACL (3) ICLR (3) CVPR (2) NAACL (2) EACL (1) ICCV (1) WACV (1)

Top co-authors

Kurt Keutzer (16) Yuxiong He (16) Amir Gholami (13) Michael W. Mahoney (10) Michael Mahoney (7) Sheng Shen (7) Xiaoxia Wu (7) Zhen Dong (6) Seung-won Hwang (5) Cheng Li (4)

Keywords

model compression (14) neural network optimization (8) neural network quantization (4) knowledge distillation (4) large language model (4) mixed-precision quantization (4) model quantization (3) transformer architecture (3) inference optimization (3) retrieval-augmented generation (3) mixture of experts (3) weight quantization (3) latency optimization (2) hessian analysis (2) activation quantization (2) sparse model (2) post-training quantization (2) batch normalization (2) deep learning (2) training efficiency (2)

Papers

TAGQuant: Token-Aware Clustering for Group-Wise Quantization · EACL 2026
GRAD: Generalizing RAG Adaptation with Decoding · ACL 2026
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation · EMNLP 2025
Inference Scaling for Bridging Retrieval and Augmented Generation · NAACL 2025
CORD: Balancing COnsistency and Rank Distillation for Robust Retrieval-Augmented Generation · NAACL 2025
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning · ACL 2025
Optimizing Reasoning for Text-to-SQL with Execution Feedback · ACL 2025
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation · AAAI 2024
ZeRO++: Extremely Efficient Collective Communication for Large Model Training · ICLR 2024
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding · NIPS 2024
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing · AAAI 2024
Scaling Vision-Language Models with Sparse Mixture of Experts · EMNLP 2023
DySR: Adaptive Super-Resolution via Algorithm and System Co-design · ICLR 2023
Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases · ICML 2023
How Much Can CLIP Benefit Vision-and-Language Tasks? · ICLR 2022
Hessian-Aware Pruning and Optimal Neural Implant · WACV 2022
XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient · NIPS 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers · NIPS 2022
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale · ICML 2022
What’s Hidden in a One-layer Randomly Weighted Transformer? · EMNLP 2021
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning · AAAI 2021
ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training · ICML 2021
I-BERT: Integer-only BERT Quantization · ICML 2021
HAWQ-V3: Dyadic Neural Network Quantization · ICML 2021
PowerNorm: Rethinking Batch Normalization in Transformers · ICML 2020
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks · NIPS 2020
A Statistical Framework for Low-bitwidth Training of Deep Neural Networks · NIPS 2020
ZeroQ: A Novel Zero Shot Quantization Framework · CVPR 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT · AAAI 2020
Inefficiency of K-FAC for Large Batch Size Training · AAAI 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding · EMNLP 2020
ANODEV2: A Coupled Neural ODE Framework · NIPS 2019
Trust Region Based Adversarial Attack on Neural Networks · CVPR 2019
HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision · ICCV 2019
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries · NIPS 2018