Pang Wei Koh

33 papers · 2017–2025 · 8 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (11) 🌍 Conference Polyglot (8)

🐝 Cross-Pollinator (15) 🌟 Keyword Trendsetter Combo (4) 🏆 Keyword Champion (3) 👥 Mega-Team (60) 🤝 Dynamic Duo (11) 🔥 Unstoppable (7) 🗃️ Keyword Collector (137) ⚡ Prolific Year (13) 💎 Century Club (33)

Conferences

ICML (10) ICLR (7) NIPS (6) EMNLP (4) ACL (3) AISTATS (1) CVPR (1) NAACL (1)

Top co-authors

Percy Liang (11) Shiori Sagawa (7) Luca Soldaini (5) Emma Pierson (5) Hannaneh Hajishirzi (4) Noah A. Smith (4) Dirk Groeneveld (4) Sewon Min (4) Chelsea Finn (4) Sewoong Oh (4)

Research topics

Reinforcement Learning (1)

Keywords

language model (5) domain generalization (5) large language model (5) spurious correlation (3) data curation (3) distribution shift (3) knowledge transfer (2) domain adaptation (2) transfer learning (2) vision-language model (2) model merging (2) inductive bias (2) training data (2) uncertainty quantification (1) feature learning (1) embedding space (1) multi-task learning (1) representation learning (1) cross-lingual transfer (1) contrastive learning (1)

Papers

Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder ACL 2025
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens ACL 2025
DataDecide: How to Predict Best Pretraining Data with Small Experiments ICML 2025
S4S: Solving for a Fast Diffusion Model Solver ICML 2025
OLMoE: Open Mixture-of-Experts Language Models ICLR 2025
PLeaS - Merging Models with Permutations and Least Squares CVPR 2025
NICE Data Selection for Instruction Tuning in LLMs with Non-differentiable Evaluation Metric ICML 2025
Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions ICLR 2025
Language models scale reliably with over-training and on downstream tasks ICLR 2025
Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging EMNLP 2024
The Generative AI Paradox: “What It Can Create, It May Not Understand” ICLR 2024
Improving Domain Generalization with Domain Relations ICLR 2024
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better NIPS 2024
DataComp-LM: In search of the next generation of training sets for language models NIPS 2024
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in LLMs NIPS 2024
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning NIPS 2024
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore NIPS 2024
Multilingual Diversity Improves Vision-Language Representations NIPS 2024
Annotation alignment: Comparing LLM and human annotations of conversational safety EMNLP 2024
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation EMNLP 2024
Position Paper: Data-Centric AI in the Age of Large Language Models EMNLP 2024
Instructional Fingerprinting of Large Language Models NAACL 2024
Out-of-Domain Robustness via Targeted Augmentations ICML 2023
Extending the WILDS Benchmark for Unsupervised Adaptation ICLR 2022
Just Train Twice: Improving Group Robustness without Training Group Information ICML 2021
Accuracy on the Line: on the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization ICML 2021
WILDS: A Benchmark of in-the-Wild Distribution Shifts ICML 2021
Selective Classification Can Magnify Disparities Across Groups ICLR 2021
An Investigation of Why Overparameterization Exacerbates Spurious Correlations ICML 2020
Concept Bottleneck Models ICML 2020
ExpBERT: Representation Engineering with Natural Language Explanations ACL 2020
Inferring Multidimensional Rates of Aging from Cross-Sectional Data AISTATS 2019
Understanding Black-box Predictions via Influence Functions ICML 2017