Pang Wei Koh
33 papers · 2017–2025 · 8 top CS/AI conferences
Achievements
Keyword Pioneer
Renaissance Researcher (6)
Interdisciplinary Bridge
Taxonomy Completionist (11)
Conference Polyglot (8)
Cross-Pollinator (15)
Keyword Trendsetter Combo (4)
Keyword Champion (3)
Mega-Team (60)
Dynamic Duo (11)
Unstoppable (7)
Keyword Collector (137)
Prolific Year (13)
Century Club (33)
Conferences
ICML (10)
ICLR (7)
NIPS (6)
EMNLP (4)
ACL (3)
AISTATS (1)
CVPR (1)
NAACL (1)
Research topics
Keywords
language model (5)
domain generalization (5)
large language model (5)
spurious correlation (3)
data curation (3)
distribution shift (3)
knowledge transfer (2)
domain adaptation (2)
transfer learning (2)
vision-language model (2)
model merging (2)
inductive bias (2)
training data (2)
uncertainty quantification (1)
feature learning (1)
embedding space (1)
multi-task learning (1)
representation learning (1)
cross-lingual transfer (1)
contrastive learning (1)
Papers
Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder
ACL 2025
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
ACL 2025
DataDecide: How to Predict Best Pretraining Data with Small Experiments
ICML 2025
S4S: Solving for a Fast Diffusion Model Solver
ICML 2025
OLMoE: Open Mixture-of-Experts Language Models
ICLR 2025
PLeaS - Merging Models with Permutations and Least Squares
CVPR 2025
NICE Data Selection for Instruction Tuning in LLMs with Non-differentiable Evaluation Metric
ICML 2025
Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions
ICLR 2025
Language models scale reliably with over-training and on downstream tasks
ICLR 2025
Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging
EMNLP 2024
The Generative AI Paradox: “What It Can Create, It May Not Understand”
ICLR 2024
Improving Domain Generalization with Domain Relations
ICLR 2024
The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better
NIPS 2024
DataComp-LM: In search of the next generation of training sets for language models
NIPS 2024
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in LLMs
NIPS 2024
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning
NIPS 2024
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
NIPS 2024
Multilingual Diversity Improves Vision-Language Representations
NIPS 2024
Annotation alignment: Comparing LLM and human annotations of conversational safety
EMNLP 2024
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
EMNLP 2024
Position Paper: Data-Centric AI in the Age of Large Language Models
EMNLP 2024
Instructional Fingerprinting of Large Language Models
NAACL 2024
Out-of-Domain Robustness via Targeted Augmentations
ICML 2023
Extending the WILDS Benchmark for Unsupervised Adaptation
ICLR 2022
Just Train Twice: Improving Group Robustness without Training Group Information
ICML 2021
Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization
ICML 2021
WILDS: A Benchmark of in-the-Wild Distribution Shifts
ICML 2021
Selective Classification Can Magnify Disparities Across Groups
ICLR 2021
An Investigation of Why Overparameterization Exacerbates Spurious Correlations
ICML 2020
Concept Bottleneck Models
ICML 2020
ExpBERT: Representation Engineering with Natural Language Explanations
ACL 2020
Inferring Multidimensional Rates of Aging from Cross-Sectional Data
AISTATS 2019
Understanding Black-box Predictions via Influence Functions
ICML 2017