Beidi Chen

48 papers · 2019–2026 · 5 conferences · across top CS/AI conferences

Achievements


🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🗺️ Taxonomy Completionist (16) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (5)

🐝 Cross-Pollinator (12) 🔬 Deep Specialist (11) 🏆 Keyword Champion (4) 🤝 Dynamic Duo (12) 👑 Triple Crown 🗃️ Keyword Collector (135) ❓ The Questioner ⚡ Prolific Year (20) 💎 Century Club (46) 🔥 Unstoppable (7)

Conferences

NIPS (19) ICML (15) ICLR (10) ACL (3) COLT (1)

Top co-authors

Christopher Re (12) Yuandong Tian (10) Anshumali Shrivastava (8) Binhang Yuan (7) Zhuoming Chen (7) Ce Zhang (7) Tri Dao (7) Zhaozhuo Xu (6) Xinyu Yang (6) Zhao Song (5)

Research topics

Education (1)

Keywords

large language model (12) model compression (9) inference efficiency (5) speculative decoding (4) inference optimization (3) language model (3) distributed learning (3) attention mechanism (3) stochastic gradient descent (3) inference acceleration (2) memory optimization (2) foundation model (2) batch processing (2) contextual sparsity (2) structured sparsity (2) knowledge distillation (2) parameter efficient (2) computational efficiency (2) locality sensitive hashing (2) communication compression (2)

Papers

When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents? ACL 2026
MedVerse: Efficient and Reliable Medical Reasoning via DAG-Structured Parallel Execution ACL 2026
GSM-$∞$: How Do your LLMs Behave over Infinitely Increasing Reasoning Complexity and Context Length? ICML 2025
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ICML 2025
Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation ICML 2025
Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity ICLR 2025
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding ICLR 2025
Memory Mosaics ICLR 2025
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding ICLR 2025
MagicPIG: LSH Sampling for Efficient LLM Generation ICLR 2025
On the Surprising Effectiveness of Attention Transfer for Vision Transformers NIPS 2024
Sequoia: Scalable and Robust Speculative Decoding NIPS 2024
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding ACL 2024
Soft Prompt Recovers Compressed LLMs, Transferably ICML 2024
$\texttt{Model-GLUE}$: Democratized LLM Scaling for A Large Model Zoo in the Wild NIPS 2024
SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices NIPS 2024
SIRIUS: Contextual Sparsity with Correction for Efficient LLMs NIPS 2024
S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity NIPS 2024
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding NIPS 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length NIPS 2024
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution NIPS 2024
Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training NIPS 2024
Learn To be Efficient: Build Structured Sparsity in Large Language Models NIPS 2024
JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention ICLR 2024
Efficient Streaming Language Models with Attention Sinks ICLR 2024
LoCoCo: Dropping In Convolutions for Long Context Compression ICML 2024
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference ICML 2024
HexGen: Generative Inference of Large Language Model over Heterogeneous Environment ICML 2024
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ICML 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ICML 2024
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models NIPS 2023
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time ICML 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU ICML 2023
CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks ICML 2023
Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions NIPS 2023
Fast Algorithms for a New Relaxation of Optimal Transport COLT 2023
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer NIPS 2023
Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models ICLR 2022
Monarch: Expressive Structured Matrices for Efficient and Accurate Training ICML 2022
Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees NIPS 2022
Decentralized Training of Foundation Models in Heterogeneous Environments NIPS 2022
SOLAR: Sparse Orthogonal Learned and Random Embeddings ICLR 2021
Locality Sensitive Teaching NIPS 2021
A Tale of Two Efficient and Informative Negative Sampling Distributions ICML 2021
Scatterbrain: Unifying Sparse and Low-rank Attention NIPS 2021
MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training ICLR 2021
Angular Visual Hardness ICML 2020
Fast and Accurate Stochastic Gradient Estimation NIPS 2019