Seonghoon Yang
3 papers · 2024–2025 · 3 top CS/AI conferences
Achievements
Conference Polyglot (3)
Interdisciplinary Bridge
Taxonomy Completionist (11)
Keyword Pioneer
Cross-Pollinator (15)
Conferences
ACL (1)
EMNLP (1)
NAACL (1)
Keywords
data filtering (2)
large language model (2)
data quality (2)
instruction tuning (1)
language model (1)
model ensemble (1)
model scaling (1)
dataset curation (1)
data pipeline (1)
data curation (1)
n-gram model (1)
n-gram language model (1)
text quality (1)
dataset filtering (1)
cpu-based processing (1)
purpose-driven dataset (1)
depth up-scaling (1)
cpu computing (1)
cpu-only processing (1)
purpose-driven datum (1)
Papers
Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora
ACL 2025
LP Data Pipeline: Lightweight, Purpose-driven Data Pipeline for Large Language Models
EMNLP 2025
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
NAACL 2024