Xiaoxia Wu

14 papers · 2019–2024 · 6 conferences · across top CS/AI conferences

Achievements


🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (6) 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏃 Academic Marathon (5)

🐝 Cross-Pollinator (12) 🏆 Grand Slam 🗃️ Keyword Collector (51) 💎 Century Club (14) 🔥 Unstoppable (6) ❓ The Questioner

Conferences

NIPS (4) AAAI (3) AISTATS (2) ICLR (2) ICML (2) JMLR (1)

Top co-authors

Zhewei Yao (7) Yuxiong He (6) Rachel Ward (5) Minjia Zhang (3) Cheng Li (3) Conglong Li (3) Reza Yazdani Aminabadi (2) Connor Holmes (2) Yuege Xie (2) Olatunji Ruwase (2)

Keywords

model compression (4) linear convergence (3) weight quantization (3) stochastic gradient descent (3) post-training quantization (2) deep learning (2) nonconvex optimization (2) transformer model (2) adaptive gradient (2) knowledge distillation (2) gradient descent (2) convergence rate (2) large language model (2) efficient computing (1) model pretraining (1) convergence analysis (1) efficient training (1) adaptive learning rate (1) attention mechanism (1) outlier robustness (1)

Papers

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing · AAAI 2024
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation · AAAI 2024
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding · NIPS 2024
ZeRO++: Extremely Efficient Collective Communication for Large Model Training · ICLR 2024
Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases · ICML 2023
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers · NIPS 2022
AdaLoss: A Computationally-Efficient and Provably Convergent Adaptive Gradient Method · AAAI 2022
XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient · NIPS 2022
When Do Curricula Work? · ICLR 2021
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes · JMLR 2020
Linear Convergence of Adaptive Stochastic Gradient Descent · AISTATS 2020
Choosing the Sample with Lowest Loss makes SGD Robust · AISTATS 2020
Implicit Regularization and Convergence for Weight Normalization · NIPS 2020
AdaGrad Stepsizes: Sharp Convergence Over Nonconvex Landscapes · ICML 2019