Sara Hooker
37 papers · 2019–2026 · 8 top CS/AI conferences
Achievements
Academic Marathon (6) · Cross-Pollinator (13) · Conference Polyglot (8) · Interdisciplinary Bridge · Renaissance Researcher (7)
Taxonomy Completionist
(58)
Conference Polyglot
(8)
Academic Marathon
(6)
Mega-Team
(82)
Triple Crown
Dynamic Duo
(11)
Deep Specialist
(13)
Keyword Champion
(2)
Century Club
(36)
The Questioner
(2)
Prolific Year
(13)
Unstoppable
(5)
Keyword Collector
(149)
Conferences
EMNLP (12)
ACL (9)
ICLR (6)
NIPS (6)
CVPR (1)
ICML (1)
NAACL (1)
UAI (1)
Keywords
large language model
(15)
low-resource language
(5)
multilingual language model
(4)
model compression
(4)
multilingual evaluation
(3)
benchmark evaluation
(3)
cross-lingual transfer
(3)
language model
(3)
synthetic data
(2)
domain adaptation
(2)
multilingual model
(2)
reinforcement learning from human feedback
(2)
language model evaluation
(2)
model alignment
(2)
multilingual nlp
(2)
mathematical reasoning
(2)
machine translation
(2)
knowledge distillation
(2)
model quantization
(2)
text generation
(2)
Papers
One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers
ACL 2026
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
NAACL 2025
MMTEB: Massive Multilingual Text Embedding Benchmark
ICLR 2025
Bridging the Data Provenance Gap Across Text, Speech, and Video
ICLR 2025
To Code or Not To Code? Exploring Impact of Code in Pre-training
ICLR 2025
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
EMNLP 2025
Nexus: Adaptive Upcycling to Efficiently Pretrain Mixture of Experts
EMNLP 2025
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
ICLR 2025
M-RewardBench: Evaluating Reward Models in Multilingual Settings
ACL 2025
Multilingual Arbitration: Optimizing Data Pools to Accelerate Multilingual Progress
ACL 2025
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
ACL 2025
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
EMNLP 2024
How Does Quantization Affect Multilingual LLMs?
EMNLP 2024
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
EMNLP 2024
Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs
ACL 2024
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
ICLR 2024
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
ACL 2024
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
ACL 2024
From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models
ACL 2024
On The Fairness Impacts of Hardware Selection in Machine Learning
ICML 2024
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
NIPS 2024
Consent in Crisis: The Rapid Decline of the AI Data Commons
NIPS 2024
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
ACL 2024
LLM See, LLM Do: Leveraging Active Inheritance to Target Non-Differentiable Objectives
EMNLP 2024
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
EMNLP 2023
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
NIPS 2023
The Grand Illusion: The Myth of Software Portability and Implications for ML Progress
NIPS 2023
Intriguing Properties of Quantization at Scale
NIPS 2023
On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research
EMNLP 2023
Locally Differentially Private Document Generation Using Zero Shot Prompting
EMNLP 2023
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
EMNLP 2023
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics
ICLR 2023
Robust distillation for worst-class performance: on the interplay between teacher and student objectives
UAI 2023
Intriguing Properties of Compression on Multilingual Models
EMNLP 2022
Estimating Example Difficulty Using Variance of Gradients
CVPR 2022
The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation
EMNLP 2021
A Benchmark for Interpretability Methods in Deep Neural Networks
NIPS 2019