Sebastian Jaszczur
5 papers · 2021–2025 · 3 top CS/AI conferences
Achievements
🌍 Conference Polyglot (3)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐣 Hot Topic Early Bird
🐝 Cross-Pollinator (15)
Conferences
ICML (2)
NeurIPS (2)
AAAI (1)
Keywords
transformer architecture (1)
autoregressive generation (1)
language modeling (1)
efficient inference (1)
mixture of experts (1)
context window (1)
sparse attention (1)
long context (1)
context utilization (1)
parameter scaling (1)
large language model (1)
sparse layer (1)
continuous MoE (1)
cross-example aggregation (1)
Papers
Structured Packing in LLM Training Improves Long Context Utilization
AAAI 2025
Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient
ICML 2025
Mixture of Tokens: Continuous MoE through Cross-Example Aggregation
NeurIPS 2024
Scaling Laws for Fine-Grained Mixture of Experts
ICML 2024
Sparse is Enough in Scaling Transformers
NeurIPS 2021