Yi Tay

77 papers · 2016–2025 · 11 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🗺️ Taxonomy Completionist (11) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (11)

🐝 Cross-Pollinator (3) 👥 Mega-Team (67) 🏆 Grand Slam 👑 Triple Crown 🔬 Deep Specialist (20) 🏆 Keyword Champion (2) 🤝 Dynamic Duo (25) 🗃️ Keyword Collector (248) ❓ The Questioner (8) ⚡ Prolific Year (8) 🚀 Conference Pioneer 📈 Trend Setter 💎 Century Club (77) 🔥 Unstoppable (8)

Conferences

ACL (17) EMNLP (15) ICLR (15) NIPS (8) ICML (6) IJCNLP (5) CVPR (3) IJCAI (3) AAAI (2) JMLR (2) NAACL (1)

Top co-authors

Donald Metzler (25) Dara Bahri (18) Mostafa Dehghani (17) Anh Tuan Luu (15) Siu Cheung Hui (13) Jinfeng Rao (11) Shuai Zhang (10) Hyung Won Chung (10) Aston Zhang (9) Jai Gupta (9)

Research topics

Architectures (1) Privacy (1) Optimization & Theory (1)

Keywords

large language model (8) few-shot learning (8) question answering (7) model architecture (6) transformer architecture (6) attention mechanism (6) neural network (5) natural language inference (5) representation learning (5) transformer model (4) machine translation (4) natural language processing (4) language model (4) information retrieval (3) text classification (3) reading comprehension (3) model scaling (3) parameter efficiency (3) contrastive learning (2) dependency parsing (2)

Papers

BIG-Bench Extra Hard · ACL 2025
Scaling Instruction-Finetuned Language Models · JMLR 2024
On Scaling Up a Multilingual Vision and Language Model · CVPR 2024
Transcending Scaling Laws with 0.1% Extra Compute · EMNLP 2023
Symbol tuning improves in-context learning in language models · EMNLP 2023
Recommender Systems with Generative Retrieval · NIPS 2023
Scaling Vision Transformers to 22 Billion Parameters · ICML 2023
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints · ICLR 2023
Recitation-Augmented Language Models · ICLR 2023
Language models are multilingual chain-of-thought reasoners · ICLR 2023
UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining · ICLR 2023
UL2: Unifying Language Learning Paradigms · ICLR 2023
Inverse Scaling Can Become U-Shaped · EMNLP 2023
DSI++: Updating Transformer Memory with New Documents · EMNLP 2023
CoLT5: Faster Long-Range Transformers with Conditional Computation · EMNLP 2023
PaLM: Scaling Language Modeling with Pathways · JMLR 2023
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning · ICML 2023
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them · ACL 2023
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? · EMNLP 2023
Transformer Memory as a Differentiable Search Index · NIPS 2022
HyperPrompt: Prompt-based Task-Conditioning of Transformers · ICML 2022
Confident Adaptive Language Modeling · NIPS 2022
Improving Compositional Generalization with Self-Training for Data-to-Text Generation · ACL 2022
Sharpness-Aware Minimization Improves Language Model Generalization · ACL 2022
ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference · ACL 2022
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning · ICLR 2022
Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification · EMNLP 2022
The Efficiency Misnomer · ICLR 2022
Scarf: Self-Supervised Contrastive Learning using Random Feature Corruption · ICLR 2022
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization · ICLR 2022
Scale Efficiently: Insights from Pretraining and Finetuning Transformers · ICLR 2022
Scenic: A JAX Library for Computer Vision Research and Beyond · CVPR 2022
Knowledge Router: Learning Disentangled Representations for Knowledge Graphs · NAACL 2021
Self-Instantiated Recurrent Units with Dynamic Soft Recursion · NIPS 2021
Are Pretrained Convolutions Better than Pretrained Transformers? · ACL 2021
StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling · ACL 2021
On Orthogonality Constraints for Transformers · ACL 2021
How Reliable are Model Diagnostics? · ACL 2021
Do Transformer Modifications Transfer Across Implementations and Applications? · EMNLP 2021
HyperGrid Transformers: Towards A Single Model for Multiple Tasks · ICLR 2021
Long Range Arena: A Benchmark for Efficient Transformers · ICLR 2021
Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters · ICLR 2021
Are Neural Rankers still Outperformed by Gradient Boosted Decision Trees? · ICLR 2021
Synthesizer: Rethinking Self-Attention for Transformer Models · ICML 2021
OmniNet: Omnidirectional Representations from Transformers · ICML 2021
Are Pretrained Convolutions Better than Pretrained Transformers? · IJCNLP 2021
StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling · IJCNLP 2021
On Orthogonality Constraints for Transformers · IJCNLP 2021
How Reliable are Model Diagnostics? · IJCNLP 2021
Would you Rather? A New Benchmark for Learning Machine Alignment with Cultural Values and Social Preferences · ACL 2020
Interactive Machine Comprehension with Information Seeking Agents · ACL 2020
Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder · EMNLP 2020
Sparse Sinkhorn Attention · ICML 2020
Reverse Engineering Configurations of Neural Text Generation Models · ACL 2020
Multi-Level Head-Wise Match and Aggregation in Transformer for Textual Sequence Matching · AAAI 2020
Jacobian Adversarially Regularized Networks for Robustness · ICLR 2020
What It Thinks Is Important Is Important: Robustness Transfers Through Input Gradients · CVPR 2020
Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling · IJCNLP 2019
Confusionset-guided Pointer Networks for Chinese Spelling Check · ACL 2019
Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives · ACL 2019
Robust Representation Learning of Biomedical Names · ACL 2019
Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks · ACL 2019
Compositional De-Attention Networks · NIPS 2019
Holographic Factorization Machines for Recommendation · AAAI 2019
Quaternion Knowledge Graph Embeddings · NIPS 2019
Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling · EMNLP 2019
Quaternion Collaborative Filtering for Recommendation · IJCAI 2019
DeepRec: An Open-source Toolkit for Deep Learning based Recommendation · IJCAI 2019
Reasoning with Sarcasm by Reading In-Between · ACL 2018
Co-Stack Residual Affinity Networks with Multi-level Attention Refinement for Matching Text Sequences · EMNLP 2018
Attentive Gated Lexicon Reader with Contrastive Contextual Co-Attention for Sentiment Classification · EMNLP 2018
Multi-Granular Sequence Encoding via Dilated Compositional Units for Reading Comprehension · EMNLP 2018
Densely Connected Attention Propagation for Reading Comprehension · NIPS 2018
Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference · EMNLP 2018
Recurrently Controlled Recurrent Networks · NIPS 2018
Hermitian Co-Attention Networks for Text Matching in Asymmetrical Domains · IJCAI 2018
Learning Term Embeddings for Taxonomic Relation Identification Using Dynamic Weighting Neural Network · EMNLP 2016