Eunsu Kim

14 papers · 2024–2026 · 9 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (8) 🐣 Hot Topic Early Bird 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (13)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (10) 👥 Mega-Team (22) 🗃️ Keyword Collector (52) ❓ The Questioner ⚡ Prolific Year (7) 🚀 Conference Pioneer 💎 Century Club (11)

Conferences

ACL (5) EMNLP (2) AACL (1) COLING (1) EACL (1) IJCNLP (1) MICCAI (1) NAACL (1) NIPS (1)

Top co-authors

Alice Oh (13) Juhyun Oh (4) Kiwoong Park (3) Junyeong Park (3) James Thorne (3) Seyoung Song (3) Inha Cha (2) Daeen Kabir (2) Sheikh Shafayat (2) Dongkwan Kim (2)

Keywords

large language model (11) benchmark dataset (4) benchmark evaluation (3) cultural knowledge (3) multilingual nlp (2) linguistic understanding (2) evaluation framework (2) multimodal learning (1) text generation (1) preference alignment (1) llm evaluation (1) low-resource language (1) diffusion model (1) chain-of-thought prompting (1) instruction following (1) cultural awareness (1) multilingual generation (1) text-to-image generation (1) multi-turn interaction (1) multimodal large language model (1)

Papers

Are they lovers or friends? Evaluating LLMs’ Social Reasoning in English and Korean Dialogues ACL 2026
LoCar: Localization-Aware Evaluation of In-Vehicle Assistants through Fine-Grained Sociolinguistic Control ACL 2026
Spotting Out-of-Character Behavior: Atomic-Level Evaluation of Persona Fidelity in Open-Ended Generation ACL 2025
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation ACL 2025
Uncovering Factor-Level Preference to Improve Human-Model Alignment EMNLP 2025
MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language EMNLP 2025
Diffusion Models Through a Global Lens: Are They Culturally Inclusive? ACL 2025
BLUCK: A Benchmark Dataset for Bengali Linguistic Understanding and Cultural Knowledge IJCNLP 2025
WHEN TOM EATS KIMCHI: Evaluating Cultural Awareness of Multimodal Large Language Models in Cultural Mixture Contexts NAACL 2025
BLUCK: A Benchmark Dataset for Bengali Linguistic Understanding and Cultural Knowledge AACL 2025
The Generative AI Paradox in Evaluation: “What It Can Solve, It May Not Evaluate” EACL 2024
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages NIPS 2024
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean COLING 2024
Clinical-grade Multi-Organ Pathology Report Generation for Multi-scale Whole Slide Images via a Semantically Guided Medical Text Foundation Model MICCAI 2024