Eunsu Kim
14 papers · 2024–2026 · 9 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (8)
Hot Topic Early Bird
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (13)
Dynamic Duo (10)
Mega-Team (22)
Keyword Collector (52)
The Questioner
Prolific Year (7)
Conference Pioneer
Century Club (11)
Conferences
ACL (5)
EMNLP (2)
AACL (1)
COLING (1)
EACL (1)
IJCNLP (1)
MICCAI (1)
NAACL (1)
NeurIPS (1)
Top co-authors
Keywords
large language model (11)
benchmark dataset (4)
benchmark evaluation (3)
cultural knowledge (3)
multilingual nlp (2)
linguistic understanding (2)
evaluation framework (2)
multimodal learning (1)
text generation (1)
preference alignment (1)
llm evaluation (1)
low-resource language (1)
diffusion model (1)
chain-of-thought prompting (1)
instruction following (1)
cultural awareness (1)
multilingual generation (1)
text-to-image generation (1)
multi-turn interaction (1)
multimodal large language model (1)
Papers
Are they lovers or friends? Evaluating LLMs’ Social Reasoning in English and Korean Dialogues
ACL 2026
LoCar: Localization-Aware Evaluation of In-Vehicle Assistants through Fine-Grained Sociolinguistic Control
ACL 2026
Spotting Out-of-Character Behavior: Atomic-Level Evaluation of Persona Fidelity in Open-Ended Generation
ACL 2025
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation
ACL 2025
Uncovering Factor-Level Preference to Improve Human-Model Alignment
EMNLP 2025
MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language
EMNLP 2025
Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
ACL 2025
BLUCK: A Benchmark Dataset for Bengali Linguistic Understanding and Cultural Knowledge
IJCNLP 2025
WHEN TOM EATS KIMCHI: Evaluating Cultural Awareness of Multimodal Large Language Models in Cultural Mixture Contexts
NAACL 2025
BLUCK: A Benchmark Dataset for Bengali Linguistic Understanding and Cultural Knowledge
AACL 2025
The Generative AI Paradox in Evaluation: “What It Can Solve, It May Not Evaluate”
EACL 2024
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages
NeurIPS 2024
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean
COLING 2024
Clinical-grade Multi-Organ Pathology Report Generation for Multi-scale Whole Slide Images via a Semantically Guided Medical Text Foundation Model
MICCAI 2024