Guangxuan Xiao
8 papers · 2023–2025 · 3 top CS/AI conferences
Achievements
🧭 Keyword Pioneer 🌍 Conference Polyglot (3) 🐝 Cross-Pollinator (12) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird
👑 Triple Crown
Conferences
ICLR (3)
ICML (3)
NeurIPS (2)
Keywords
large language model (2)
weight quantization (2)
model compression (2)
attention mechanism (1)
efficient inference (1)
parameter-efficient fine-tuning (1)
inference efficiency (1)
multi-tenant serving (1)
knowledge compression (1)
inference acceleration (1)
weight activation quantization (1)
context memory (1)
activation quantization (1)
activation outlier smoothing (1)
delta weight (1)
long context extrapolation (1)
token relevant unit (1)
model quantization (1)
post-training quantization (1)
Papers
XAttention: Block Sparse Attention with Antidiagonal Scoring · ICML 2025
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads · ICLR 2025
Retrieval Head Mechanistically Explains Long-Context Factuality · ICLR 2025
QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference · ICML 2024
BitDelta: Your Fine-Tune May Only Be Worth One Bit · NeurIPS 2024
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory · NeurIPS 2024
Efficient Streaming Language Models with Attention Sinks · ICLR 2024
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models · ICML 2023