Colin Raffel

39 papers · 2017–2025 · 7 top CS/AI conferences

Achievements

🐣 Hot Topic Early Bird 🌍 Conference Polyglot (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🏃 Academic Marathon (8)

🗺️ Taxonomy Completionist (56) 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 👥 Mega-Team (45) 🔬 Deep Specialist (10) 🗃️ Keyword Collector (123) ⚡ Prolific Year (6) 🚀 Conference Pioneer 📈 Trend Setter 💎 Century Club (39) 🔥 Unstoppable (9) ❓ The Questioner (6)

Conferences

ICML (12) ACL (8) ICLR (8) EMNLP (6) JMLR (3) NAACL (1) NIPS (1)

Top co-authors

Adam Roberts (9) Nikhil Kandpal (5) Teven Le Scao (5) M Saiful Bari (4) Noam Shazeer (4) Thomas Wang (4) Sheng Shen (3) Ian Goodfellow (3) Albert Webson (3) Niklas Muennighoff (3)

Research topics

Privacy (1)

Keywords

large language model (8) language model (7) transfer learning (4) prompt engineering (3) model scaling (3) few-shot learning (3) transformer architecture (3) zero-shot generalization (3) distributed learning (2) semi-supervised learning (2) attention mechanism (2) multilingual language model (2) distributed computing (2) question answering (2) text quality (2) cross-lingual transfer (1) entity linking (1) data augmentation (1) model distillation (1) privacy attack (1)

Papers

Scaling Data-Constrained Language Models JMLR 2025
Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator ICML 2025
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution ICLR 2025
Position: The Most Expensive Part of an LLM *should* be its Training Data ICML 2025
The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions ICML 2025
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale NIPS 2024
Learning to Route Among Specialized Experts for Zero-Shot Generalization ICML 2024
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows ACL 2024
Scaling Up Models and Data with t5x and seqio JMLR 2023
Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models ICML 2023
Large Language Models Struggle to Learn Long-Tail Knowledge ICML 2023
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning ACL 2023
Crosslingual Generalization through Multitask Finetuning ACL 2023
Petals: Collaborative Inference and Fine-tuning of Large Models ACL 2023
Evaluating the Factual Consistency of Large Language Models Through News Summarization ACL 2023
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model EMNLP 2023
Knowledge is a Region in Weight Space for Fine-tuned Language Models EMNLP 2023
Bidirectional Language Models Are Also Few-shot Learners ICLR 2023
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts ACL 2022
Learning with Limited Text Data ACL 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization ICLR 2022
What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization? ICML 2022
Deduplicating Training Data Mitigates Privacy Risks in Language Models ICML 2022
What Language Model to Train if You Have One Million GPU Hours? EMNLP 2022
mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer NAACL 2021
Improving and Simplifying Pattern Exploiting Training EMNLP 2021
Do Transformer Modifications Transfer Across Implementations and Applications? EMNLP 2021
Robust and Generalizable Visual Representation Learning via Random Convolutions ICLR 2021
Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions ICLR 2020
How Much Knowledge Can You Pack Into the Parameters of a Language Model? EMNLP 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer JMLR 2020
ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring ICLR 2020
Monotonic Infinite Lookback Attention for Simultaneous Machine Translation ACL 2019
Towards GAN Benchmarks Which Require Generalization ICLR 2019
Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition ICML 2019
Is Generator Conditioning Causally Related to GAN Performance? ICML 2018
Thermometer Encoding: One Hot Way To Resist Adversarial Examples ICLR 2018
A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music ICML 2018
Online and Linear-Time Attention by Enforcing Monotonic Alignments ICML 2017