Sukyung Lee
7 papers · 2024–2025 · 4 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer
Conference Polyglot (4)
Cross-Pollinator (12)
Interdisciplinary Bridge
Taxonomy Completionist (13)
Prolific Year (5)
Conferences
NAACL (3)
ACL (2)
COLING (1)
EMNLP (1)
Keywords
large language model (6)
data quality (2)
data pipeline (2)
data filtering (2)
korean language (2)
web corpus (1)
continued pretraining (1)
instruction tuning (1)
language model (1)
model ensemble (1)
evaluation benchmark (1)
model scaling (1)
dataset curation (1)
multilingual model (1)
data curation (1)
n-gram model (1)
data leakage analysis (1)
linguistic diversity (1)
cultural understanding (1)
n-gram language model (1)
Papers
LP Data Pipeline: Lightweight, Purpose-driven Data Pipeline for Large Language Models
EMNLP 2025
Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora
ACL 2025
Representing the Under-Represented: Cultural and Core Capability Benchmarks for Developing Thai Large Language Models
COLING 2025
Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs
NAACL 2025
Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models
NAACL 2025
Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark
ACL 2024
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
NAACL 2024