Colin Raffel
39 papers · 2017–2025 · across 7 top CS/AI conferences
Achievements
Conference Polyglot
(7)
Keyword Pioneer
Academic Marathon
(8)
Taxonomy Completionist
(56)
Hot Topic Early Bird
Interdisciplinary Bridge
Mega-Team
(45)
Deep Specialist
(10)
Keyword Collector
(123)
Prolific Year
(6)
Conference Pioneer
Trend Setter
Century Club
(39)
Unstoppable
(9)
The Questioner
(6)
Conferences
ICML (12)
ACL (8)
ICLR (8)
EMNLP (6)
JMLR (3)
NAACL (1)
NeurIPS (1)
Research topics
Keywords
large language model
(8)
language model
(7)
transfer learning
(4)
prompt engineering
(3)
model scaling
(3)
few-shot learning
(3)
transformer architecture
(3)
zero-shot generalization
(3)
distributed learning
(2)
semi-supervised learning
(2)
attention mechanism
(2)
multilingual language model
(2)
distributed computing
(2)
question answering
(2)
text quality
(2)
cross-lingual transfer
(1)
entity linking
(1)
data augmentation
(1)
model distillation
(1)
privacy attack
(1)
Papers
Scaling Data-Constrained Language Models
JMLR 2025
Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator
ICML 2025
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
ICLR 2025
Position: The Most Expensive Part of an LLM *should* be its Training Data
ICML 2025
The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions
ICML 2025
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
NeurIPS 2024
Learning to Route Among Specialized Experts for Zero-Shot Generalization
ICML 2024
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
ACL 2024
Scaling Up Models and Data with t5x and seqio
JMLR 2023
Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models
ICML 2023
Large Language Models Struggle to Learn Long-Tail Knowledge
ICML 2023
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
ACL 2023
Crosslingual Generalization through Multitask Finetuning
ACL 2023
Petals: Collaborative Inference and Fine-tuning of Large Models
ACL 2023
Evaluating the Factual Consistency of Large Language Models Through News Summarization
ACL 2023
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
EMNLP 2023
Knowledge is a Region in Weight Space for Fine-tuned Language Models
EMNLP 2023
Bidirectional Language Models Are Also Few-shot Learners
ICLR 2023
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
ACL 2022
Learning with Limited Text Data
ACL 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
ICLR 2022
What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization?
ICML 2022
Deduplicating Training Data Mitigates Privacy Risks in Language Models
ICML 2022
What Language Model to Train if You Have One Million GPU Hours?
EMNLP 2022
mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
NAACL 2021
Improving and Simplifying Pattern Exploiting Training
EMNLP 2021
Do Transformer Modifications Transfer Across Implementations and Applications?
EMNLP 2021
Robust and Generalizable Visual Representation Learning via Random Convolutions
ICLR 2021
Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions
ICLR 2020
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
EMNLP 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
JMLR 2020
ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring
ICLR 2020
Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
ACL 2019
Towards GAN Benchmarks Which Require Generalization
ICLR 2019
Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
ICML 2019
Is Generator Conditioning Causally Related to GAN Performance?
ICML 2018
Thermometer Encoding: One Hot Way To Resist Adversarial Examples
ICLR 2018
A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music
ICML 2018
Online and Linear-Time Attention by Enforcing Monotonic Alignments
ICML 2017