Zhewei Yao
35 papers · 2018–2026 · 11 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (10)
Academic Marathon (7)
Keyword Pioneer
Interdisciplinary Bridge
Hot Topic Early Bird
Renaissance Researcher (5)
Taxonomy Completionist (50)
Grand Slam
Topic Evolution
Keyword Champion (2)
Dynamic Duo (16)
Triple Crown
Deep Specialist (12)
Keyword Collector (138)
The Questioner (2)
Prolific Year (7)
Century Club (33)
Unstoppable (8)
Trend Setter
Conferences
NIPS (7)
ICML (6)
AAAI (5)
EMNLP (4)
ACL (3)
ICLR (3)
CVPR (2)
NAACL (2)
EACL (1)
ICCV (1)
WACV (1)
Keywords
model compression (14)
neural network optimization (8)
neural network quantization (4)
knowledge distillation (4)
large language model (4)
mixed-precision quantization (4)
model quantization (3)
transformer architecture (3)
inference optimization (3)
retrieval-augmented generation (3)
mixture of expert (3)
weight quantization (3)
latency optimization (2)
hessian analysis (2)
activation quantization (2)
sparse model (2)
post-training quantization (2)
batch normalization (2)
deep learning (2)
training efficiency (2)
Papers
TAGQuant: Token-Aware Clustering for Group-Wise Quantization
EACL 2026
GRAD: Generalizing RAG Adaptation with Decoding
ACL 2026
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation
EMNLP 2025
Inference Scaling for Bridging Retrieval and Augmented Generation
NAACL 2025
CORD: Balancing COnsistency and Rank Distillation for Robust Retrieval-Augmented Generation
NAACL 2025
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
ACL 2025
Optimizing Reasoning for Text-to-SQL with Execution Feedback
ACL 2025
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
AAAI 2024
ZeRO++: Extremely Efficient Collective Communication for Large Model Training
ICLR 2024
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
NIPS 2024
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
AAAI 2024
Scaling Vision-Language Models with Sparse Mixture of Experts
EMNLP 2023
DySR: Adaptive Super-Resolution via Algorithm and System Co-design
ICLR 2023
Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases
ICML 2023
How Much Can CLIP Benefit Vision-and-Language Tasks?
ICLR 2022
Hessian-Aware Pruning and Optimal Neural Implant
WACV 2022
XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient
NIPS 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
NIPS 2022
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
ICML 2022
What's Hidden in a One-layer Randomly Weighted Transformer?
EMNLP 2021
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
AAAI 2021
ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
ICML 2021
I-BERT: Integer-only BERT Quantization
ICML 2021
HAWQ-V3: Dyadic Neural Network Quantization
ICML 2021
PowerNorm: Rethinking Batch Normalization in Transformers
ICML 2020
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
NIPS 2020
A Statistical Framework for Low-bitwidth Training of Deep Neural Networks
NIPS 2020
ZeroQ: A Novel Zero Shot Quantization Framework
CVPR 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
AAAI 2020
Inefficiency of K-FAC for Large Batch Size Training
AAAI 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
EMNLP 2020
ANODEV2: A Coupled Neural ODE Framework
NIPS 2019
Trust Region Based Adversarial Attack on Neural Networks
CVPR 2019
HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision
ICCV 2019
Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
NIPS 2018