Noam Shazeer
21 papers · 2015–2023 · 9 conferences · across top CS/AI conferences
Achievements
🌍 Conference Polyglot (9)
🐣 Hot Topic Early Bird
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🏃 Academic Marathon (8)
🌟 Keyword Trendsetter Combo (3)
👑 Triple Crown
🌱 Topic Pioneer
👥 Mega-Team (67)
🚀 Conference Pioneer
⚡ Prolific Year (8)
🗃️ Keyword Collector (67)
💎 Century Club (21)
❓ The Questioner (2)
📈 Trend Setter
🔥 Unstoppable (9)
Conferences
NIPS (5)
JMLR (4)
ICLR (3)
ICML (3)
EMNLP (2)
ACL (1)
CVPR (1)
INTERSPEECH (1)
NAACL (1)
Keywords
neural network (4)
language model (4)
transformer architecture (4)
autoregressive model (3)
machine translation (3)
neural machine translation (2)
large language model (2)
recurrent neural network (2)
parallel decoding (2)
transformer model (2)
image super-resolution (2)
distributed computing (2)
model scaling (2)
image captioning (1)
language modeling (1)
curriculum learning (1)
question answering (1)
transfer learning (1)
speech recognition (1)
deep learning (1)
Papers
Scaling Up Models and Data with t5x and seqio
JMLR 2023
PaLM: Scaling Language Modeling with Pathways
JMLR 2023
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
JMLR 2022
Do Transformer Modifications Transfer Across Implementations and Applications?
EMNLP 2021
Searching for Efficient Transformers for Language Modeling
NIPS 2021
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
ICLR 2021
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
EMNLP 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
JMLR 2020
Corpora Generation for Grammatical Error Correction
NAACL 2019
Music Transformer: Generating Music with Long-Term Structure
ICLR 2019
Mesh-TensorFlow: Deep Learning for Supercomputers
NIPS 2018
HydraNets: Specialized Dynamic Architectures for Efficient Inference
CVPR 2018
Fast Decoding in Sequence Models Using Discrete Latent Variables
ICML 2018
Image Transformer
ICML 2018
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
ICML 2018
Generating Wikipedia by Summarizing Long Sequences
ICLR 2018
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
ACL 2018
Blockwise Parallel Decoding for Deep Autoregressive Models
NIPS 2018
Attention is All you Need
NIPS 2017
NN-Grams: Unifying Neural Network and n-Gram Language Models for Speech Recognition
INTERSPEECH 2016
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
NIPS 2015