Sara Hooker

37 papers · 2019–2026 · 8 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (6) 🐝 Cross-Pollinator (13) 🌍 Conference Polyglot (8) 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (7)

🗺️ Taxonomy Completionist (58) 👥 Mega-Team (82) 👑 Triple Crown 🤝 Dynamic Duo (11) 🔬 Deep Specialist (13) 🏆 Keyword Champion (2) 💎 Century Club (36) ❓ The Questioner (2) ⚡ Prolific Year (13) 🔥 Unstoppable (5) 🗃️ Keyword Collector (149)

Conferences

EMNLP (12) ACL (9) ICLR (6) NIPS (6) CVPR (1) ICML (1) NAACL (1) UAI (1)

Top co-authors

Marzieh Fadaee (12) Julia Kreutzer (10) Ahmet Üstün (10) Beyza Ermis (9) Arash Ahmadian (5) Sebastian Ruder (5) Niklas Muennighoff (5) Shivalika Singh (4) Shayne Longpre (4) Acyr Locatelli (4)

Research topics

Differential Privacy (1)

Keywords

large language model (15) low-resource language (5) multilingual language model (4) model compression (4) multilingual evaluation (3) benchmark evaluation (3) cross-lingual transfer (3) language model (3) synthetic data (2) domain adaptation (2) multilingual model (2) reinforcement learning from human feedback (2) language model evaluation (2) model alignment (2) multilingual nlp (2) mathematical reasoning (2) machine translation (2) knowledge distillation (2) model quantization (2) text generation (2)

Papers

One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers ACL 2026
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models NAACL 2025
MMTEB: Massive Multilingual Text Embedding Benchmark ICLR 2025
Bridging the Data Provenance Gap Across Text, Speech, and Video ICLR 2025
To Code or Not To Code? Exploring Impact of Code in Pre-training ICLR 2025
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs EMNLP 2025
Nexus: Adaptive Upcycling to Efficiently Pretrain Mixture of Experts EMNLP 2025
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge ICLR 2025
M-RewardBench: Evaluating Reward Models in Multilingual Settings ACL 2025
Multilingual Arbitration: Optimizing Data Pools to Accelerate Multilingual Progress ACL 2025
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation ACL 2025
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm EMNLP 2024
How Does Quantization Affect Multilingual LLMs? EMNLP 2024
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs EMNLP 2024
Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs ACL 2024
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning ICLR 2024
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model ACL 2024
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning ACL 2024
From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models ACL 2024
On The Fairness Impacts of Hardware Selection in Machine Learning ICML 2024
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation NIPS 2024
Consent in Crisis: The Rapid Decline of the AI Data Commons NIPS 2024
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning ACL 2024
LLM See, LLM Do: Leveraging Active Inheritance to Target Non-Differentiable Objectives EMNLP 2024
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models EMNLP 2023
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs NIPS 2023
The Grand Illusion: The Myth of Software Portability and Implications for ML Progress. NIPS 2023
Intriguing Properties of Quantization at Scale NIPS 2023
On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research EMNLP 2023
Locally Differentially Private Document Generation Using Zero Shot Prompting EMNLP 2023
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation EMNLP 2023
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics ICLR 2023
Robust distillation for worst-class performance: on the interplay between teacher and student objectives UAI 2023
Intriguing Properties of Compression on Multilingual Models EMNLP 2022
Estimating Example Difficulty Using Variance of Gradients CVPR 2022
The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation EMNLP 2021
A Benchmark for Interpretability Methods in Deep Neural Networks NIPS 2019