Noam Shazeer

21 papers · 2015–2023 · across 9 top CS/AI conferences

Achievements

🌍 Conference Polyglot (9) 🐣 Hot Topic Early Bird 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (8) 🌟 Keyword Trendsetter Combo (3) 👑 Triple Crown 🌱 Topic Pioneer 👥 Mega-Team (67) 🚀 Conference Pioneer ⚡ Prolific Year (8) 🗃️ Keyword Collector (67) 💎 Century Club (21) ❓ The Questioner (2) 📈 Trend Setter 🔥 Unstoppable (9)

Conferences

NIPS (5) JMLR (4) ICLR (3) ICML (3) EMNLP (2) ACL (1) CVPR (1) INTERSPEECH (1) NAACL (1)

Top co-authors

Jakob Uszkoreit (6) Ashish Vaswani (6) Niki Parmar (6) Adam Roberts (5) Lukasz Kaiser (4) Sharan Narang (4) Colin Raffel (4) Ryan Sepassi (4) Hyung Won Chung (3) Noah Fiedel (3)

Keywords

neural network (4) language model (4) transformer architecture (4) autoregressive model (3) machine translation (3) neural machine translation (2) large language model (2) recurrent neural network (2) parallel decoding (2) transformer model (2) image super-resolution (2) distributed computing (2) model scaling (2) image captioning (1) language modeling (1) curriculum learning (1) question answering (1) transfer learning (1) speech recognition (1) deep learning (1)

Papers

Scaling Up Models and Data with t5x and seqio (JMLR 2023)
PaLM: Scaling Language Modeling with Pathways (JMLR 2023)
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (JMLR 2022)
Do Transformer Modifications Transfer Across Implementations and Applications? (EMNLP 2021)
Searching for Efficient Transformers for Language Modeling (NIPS 2021)
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (ICLR 2021)
How Much Knowledge Can You Pack Into the Parameters of a Language Model? (EMNLP 2020)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (JMLR 2020)
Corpora Generation for Grammatical Error Correction (NAACL 2019)
Music Transformer: Generating Music with Long-Term Structure (ICLR 2019)
Mesh-TensorFlow: Deep Learning for Supercomputers (NIPS 2018)
HydraNets: Specialized Dynamic Architectures for Efficient Inference (CVPR 2018)
Fast Decoding in Sequence Models Using Discrete Latent Variables (ICML 2018)
Image Transformer (ICML 2018)
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost (ICML 2018)
Generating Wikipedia by Summarizing Long Sequences (ICLR 2018)
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation (ACL 2018)
Blockwise Parallel Decoding for Deep Autoregressive Models (NIPS 2018)
Attention is All you Need (NIPS 2017)
NN-Grams: Unifying Neural Network and n-Gram Language Models for Speech Recognition (INTERSPEECH 2016)
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks (NIPS 2015)