Yi Tay
77 papers · 2016–2025 · 11 top CS/AI conferences
Achievements
Keyword Pioneer
Hot Topic Early Bird
Taxonomy Completionist (11)
Interdisciplinary Bridge
Conference Polyglot (11)
Cross-Pollinator (3)
Mega-Team (67)
Grand Slam
Triple Crown
Deep Specialist (20)
Keyword Champion (2)
Dynamic Duo (25)
Keyword Collector (248)
The Questioner (8)
Prolific Year (8)
Conference Pioneer
Trend Setter
Century Club (77)
Unstoppable (8)
Conferences
ACL (17)
EMNLP (15)
ICLR (15)
NIPS (8)
ICML (6)
IJCNLP (5)
CVPR (3)
IJCAI (3)
AAAI (2)
JMLR (2)
NAACL (1)
Research topics
Keywords
large language model (8)
few-shot learning (8)
question answering (7)
model architecture (6)
transformer architecture (6)
attention mechanism (6)
neural network (5)
natural language inference (5)
representation learning (5)
transformer model (4)
machine translation (4)
natural language processing (4)
language model (4)
information retrieval (3)
text classification (3)
reading comprehension (3)
model scaling (3)
parameter efficiency (3)
contrastive learning (2)
dependency parsing (2)
Papers
BIG-Bench Extra Hard
ACL 2025
Scaling Instruction-Finetuned Language Models
JMLR 2024
On Scaling Up a Multilingual Vision and Language Model
CVPR 2024
Transcending Scaling Laws with 0.1% Extra Compute
EMNLP 2023
Symbol tuning improves in-context learning in language models
EMNLP 2023
Recommender Systems with Generative Retrieval
NIPS 2023
Scaling Vision Transformers to 22 Billion Parameters
ICML 2023
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
ICLR 2023
Recitation-Augmented Language Models
ICLR 2023
Language models are multilingual chain-of-thought reasoners
ICLR 2023
UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining
ICLR 2023
UL2: Unifying Language Learning Paradigms
ICLR 2023
Inverse Scaling Can Become U-Shaped
EMNLP 2023
DSI++: Updating Transformer Memory with New Documents
EMNLP 2023
CoLT5: Faster Long-Range Transformers with Conditional Computation
EMNLP 2023
PaLM: Scaling Language Modeling with Pathways
JMLR 2023
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
ICML 2023
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
ACL 2023
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
EMNLP 2023
Transformer Memory as a Differentiable Search Index
NIPS 2022
HyperPrompt: Prompt-based Task-Conditioning of Transformers
ICML 2022
Confident Adaptive Language Modeling
NIPS 2022
Improving Compositional Generalization with Self-Training for Data-to-Text Generation
ACL 2022
Sharpness-Aware Minimization Improves Language Model Generalization
ACL 2022
ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference
ACL 2022
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
ICLR 2022
Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification
EMNLP 2022
The Efficiency Misnomer
ICLR 2022
Scarf: Self-Supervised Contrastive Learning using Random Feature Corruption
ICLR 2022
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
ICLR 2022
Scale Efficiently: Insights from Pretraining and Finetuning Transformers
ICLR 2022
Scenic: A JAX Library for Computer Vision Research and Beyond
CVPR 2022
Knowledge Router: Learning Disentangled Representations for Knowledge Graphs
NAACL 2021
Self-Instantiated Recurrent Units with Dynamic Soft Recursion
NIPS 2021
Are Pretrained Convolutions Better than Pretrained Transformers?
ACL 2021
StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling
ACL 2021
On Orthogonality Constraints for Transformers
ACL 2021
How Reliable are Model Diagnostics?
ACL 2021
Do Transformer Modifications Transfer Across Implementations and Applications?
EMNLP 2021
HyperGrid Transformers: Towards A Single Model for Multiple Tasks
ICLR 2021
Long Range Arena: A Benchmark for Efficient Transformers
ICLR 2021
Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters
ICLR 2021
Are Neural Rankers still Outperformed by Gradient Boosted Decision Trees?
ICLR 2021
Synthesizer: Rethinking Self-Attention for Transformer Models
ICML 2021
OmniNet: Omnidirectional Representations from Transformers
ICML 2021
Are Pretrained Convolutions Better than Pretrained Transformers?
IJCNLP 2021
StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling
IJCNLP 2021
On Orthogonality Constraints for Transformers
IJCNLP 2021
How Reliable are Model Diagnostics?
IJCNLP 2021
Would you Rather? A New Benchmark for Learning Machine Alignment with Cultural Values and Social Preferences
ACL 2020
Interactive Machine Comprehension with Information Seeking Agents
ACL 2020
Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder
EMNLP 2020
Sparse Sinkhorn Attention
ICML 2020
Reverse Engineering Configurations of Neural Text Generation Models
ACL 2020
Multi-Level Head-Wise Match and Aggregation in Transformer for Textual Sequence Matching
AAAI 2020
Jacobian Adversarially Regularized Networks for Robustness
ICLR 2020
What It Thinks Is Important Is Important: Robustness Transfers Through Input Gradients
CVPR 2020
Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling
IJCNLP 2019
Confusionset-guided Pointer Networks for Chinese Spelling Check
ACL 2019
Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives
ACL 2019
Robust Representation Learning of Biomedical Names
ACL 2019
Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks
ACL 2019
Compositional De-Attention Networks
NIPS 2019
Holographic Factorization Machines for Recommendation
AAAI 2019
Quaternion Knowledge Graph Embeddings
NIPS 2019
Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling
EMNLP 2019
Quaternion Collaborative Filtering for Recommendation
IJCAI 2019
DeepRec: An Open-source Toolkit for Deep Learning based Recommendation
IJCAI 2019
Reasoning with Sarcasm by Reading In-Between
ACL 2018
Co-Stack Residual Affinity Networks with Multi-level Attention Refinement for Matching Text Sequences
EMNLP 2018
Attentive Gated Lexicon Reader with Contrastive Contextual Co-Attention for Sentiment Classification
EMNLP 2018
Multi-Granular Sequence Encoding via Dilated Compositional Units for Reading Comprehension
EMNLP 2018
Densely Connected Attention Propagation for Reading Comprehension
NIPS 2018
Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference
EMNLP 2018
Recurrently Controlled Recurrent Networks
NIPS 2018
Hermitian Co-Attention Networks for Text Matching in Asymmetrical Domains
IJCAI 2018
Learning Term Embeddings for Taxonomic Relation Identification Using Dynamic Weighting Neural Network
EMNLP 2016