Mostafa Dehghani

35 papers · 2018–2024 · across 11 top CS/AI conferences

Achievements

🌍 Conference Polyglot (11) 🏃 Academic Marathon (6) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (3)

🌈 Renaissance Researcher (5) 🗺️ Taxonomy Completionist (57) 🔬 Deep Specialist (12) 👥 Mega-Team (43) 🤝 Dynamic Duo (17) 👑 Triple Crown 🌱 Topic Pioneer 🗃️ Keyword Collector (91) ⚡ Prolific Year (8) 📈 Trend Setter 🚀 Conference Pioneer 💎 Century Club (35) ❓ The Questioner (3)

Conferences

ICLR (12) NIPS (5) CVPR (4) EMNLP (3) ICML (3) ACL (2) IJCNLP (2) ECCV (1) ICCV (1) JMLR (1) NAACL (1)

Top co-authors

Yi Tay (17) Donald Metzler (11) Neil Houlsby (10) Anurag Arnab (9) Dara Bahri (7) Matthias Minderer (6) Jinfeng Rao (5) Mario Lucic (5) Samira Abnar (4) Basil Mustafa (4)

Research topics

Architectures (1) Optimization & Theory (1)

Keywords

few-shot learning (5) transformer architecture (5) model architecture (4) transfer learning (3) model scaling (3) large language model (3) parameter-efficient fine-tuning (3) object detection (2) scaling law (2) pretrained model (2) representation learning (2) adaptive computation (2) convolutional neural network (2) vision transformer (2) video transformer (2) multi-task learning (2) image recognition (2) video understanding (2) action recognition (2) language model (2)

Papers

On Scaling Up a Multilingual Vision and Language Model · CVPR 2024
End-to-End Spatio-Temporal Action Localisation with Video Transformers · CVPR 2024
Scaling Instruction-Finetuned Language Models · JMLR 2024
Fractal Patterns May Illuminate the Success of Next-Token Prediction · NIPS 2024
Low-Rank Adaptation for Multilingual Summarization: An Empirical Study · NAACL 2024
Frozen Feature Augmentation for Few-Shot Image Classification · CVPR 2024
Transcending Scaling Laws with 0.1% Extra Compute · EMNLP 2023
DSI++: Updating Transformer Memory with New Documents · EMNLP 2023
Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution · NIPS 2023
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? · EMNLP 2023
$\Lambda$-DARTS: Mitigating Performance Collapse by Harmonizing Operation Selection among Cells · ICLR 2023
UL2: Unifying Language Learning Paradigms · ICLR 2023
Scaling Vision Transformers to 22 Billion Parameters · ICML 2023
Adaptive Computation with Elastic Input Sequence · ICML 2023
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints · ICLR 2023
Scale Efficiently: Insights from Pretraining and Finetuning Transformers · ICLR 2022
Exploring the Limits of Large Scale Pre-training · ICLR 2022
Scenic: A JAX Library for Computer Vision Research and Beyond · CVPR 2022
Simple Open-Vocabulary Object Detection with Vision Transformers · ECCV 2022
Confident Adaptive Language Modeling · NIPS 2022
Transformer Memory as a Differentiable Search Index · NIPS 2022
The Efficiency Misnomer · ICLR 2022
Discrete Representations Strengthen Vision Transformer Robustness · ICLR 2022
TokenLearner: Adaptive Space-Time Tokenization for Videos · NIPS 2021
Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks · ACL 2021
Are Pretrained Convolutions Better than Pretrained Transformers? · ACL 2021
ViViT: A Video Vision Transformer · ICCV 2021
IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression · ICLR 2021
Long Range Arena: A Benchmark for Efficient Transformers · ICLR 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale · ICLR 2021
OmniNet: Omnidirectional Representations from Transformers · ICML 2021
Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks · IJCNLP 2021
Are Pretrained Convolutions Better than Pretrained Transformers? · IJCNLP 2021
Universal Transformers · ICLR 2019
Fidelity-Weighted Learning · ICLR 2018