Yuxiong He

26 papers · 2018–2026 · 8 top CS/AI conferences

Achievements

🌍 Conference Polyglot (7) 🏃 Academic Marathon (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird

🐝 Cross-Pollinator (8) 🗺️ Taxonomy Completionist (42) 🤝 Dynamic Duo (14) 🏆 Grand Slam 🧬 Topic Evolution 💎 Century Club (24) 🚀 Conference Pioneer 🗃️ Keyword Collector (92) ⚡ Prolific Year (5) 🔥 Unstoppable (6)

Conferences

NIPS (7) ICLR (5) AAAI (3) ACL (3) ICML (3) EMNLP (2) NAACL (2) EACL (1)

Top co-authors

Zhewei Yao (16) Minjia Zhang (11) Conglong Li (7) Samyam Rajbhandari (6) Seung-won Hwang (6) Xiaoxia Wu (6) Cheng Li (4) Feng Yan (3) Reza Yazdani Aminabadi (3) Connor Holmes (3)

Keywords

model compression (8) knowledge distillation (5) large language model (5) neural network optimization (3) inference optimization (3) efficient computing (3) mixture of expert (3) weight quantization (3) retrieval-augmented generation (3) batch size (2) transformer model (2) efficient training (2) post-training quantization (2) sparse model (2) training efficiency (2) natural language understanding (1) model pretraining (1) domain adaptation (1) multimodal learning (1) language modeling (1)

Papers

TAGQuant: Token-Aware Clustering for Group-Wise Quantization EACL 2026
GRAD: Generalizing RAG Adaptation with Decoding ACL 2026
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation EMNLP 2025
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning ACL 2025
Optimizing Reasoning for Text-to-SQL with Execution Feedback ACL 2025
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments ICLR 2025
CORD: Balancing COnsistency and Rank Distillation for Robust Retrieval-Augmented Generation NAACL 2025
Inference Scaling for Bridging Retrieval and Augmented Generation NAACL 2025
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing AAAI 2024
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation AAAI 2024
ZeRO++: Extremely Efficient Collective Communication for Large Model Training ICLR 2024
Scaling Vision-Language Models with Sparse Mixture of Experts EMNLP 2023
Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases ICML 2023
DySR: Adaptive Super-Resolution via Algorithm and System Co-design ICLR 2023
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam ICLR 2023
The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models NIPS 2022
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale ICML 2022
Adversarial Data Augmentation for Task-Specific Knowledge Distillation of Pre-trained Transformers AAAI 2022
XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient NIPS 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers NIPS 2022
NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM NIPS 2021
1-bit Adam: Communication Efficient Large-Scale Training with Adam’s Convergence Speed ICML 2021
SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement NIPS 2021
Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping NIPS 2020
Learning Intrinsic Sparse Structures within Long Short-Term Memory ICLR 2018
Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models NIPS 2018