Youngjae Yu

64 papers · 2017–2026 · 10 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (8) 🌍 Conference Polyglot (10) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (13) 🤝 Dynamic Duo (17) 🔬 Deep Specialist (27) 🧬 Topic Evolution 🏆 Keyword Champion (3) ❓ The Questioner (4) 📈 Trend Setter 🗃️ Keyword Collector (254) ⚡ Prolific Year (7) 🔥 Unstoppable (6) 💎 Century Club (58) 🚀 Conference Pioneer

Conferences

EMNLP (15) ACL (14) CVPR (7) AAAI (6) ICCV (6) NAACL (6) NIPS (4) ECCV (3) ICLR (2) MICCAI (1)

Top co-authors

Jiwan Chung (19) Yejin Choi (16) Gunhee Kim (16) Jack Hessel (12) Ximing Lu (9) Seungju Han (7) Jinyoung Yeo (6) Seungbeen Lee (6) Dongha Lee (5) Seungwon Lim (5)

Keywords

multimodal learning (21) large language model (11) video understanding (6) self-supervised learning (5) dialogue system (5) conversational agent (4) zero-shot learning (4) knowledge distillation (4) multimodal large language model (4) visual reasoning (3) visual commonsense (3) commonsense reasoning (3) vision-language model (3) visual question answering (3) reinforcement learning (3) image generation (2) visual grounding (2) visual context (2) data augmentation (2) direct preference optimization (2)

Papers

GuideDog: A Real-World Egocentric Multimodal Dataset for Blind and Low-Vision Accessibility-Aware Guidance · ACL 2026
Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation · AAAI 2026
Do Language Models Associate Sound with Meaning? A Multimodal Study of Sound Symbolism · AAAI 2026
Investigating Counterfactual Unfairness in LLMs towards Identities through Humor · ACL 2026
Right at My Level: A Unified Multilingual Framework for Proficiency-Aware Text Simplification · ACL 2026
Do MLLMs Capture How Interfaces Guide User Behavior? A Benchmark for Multimodal UI/UX Design Understanding · ACL 2026
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation · AAAI 2025
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding · ICCV 2025
MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation · EMNLP 2025
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms · EMNLP 2025
Subtle Risks, Critical Failures: A Framework for Diagnosing Physical Safety of LLMs for Embodied Decision Making · EMNLP 2025
C2: Scalable Auto-Feedback for LLM-based Chart Generation · NAACL 2025
Multimodal UNcommonsense: From Odd to Ordinary and Ordinary to Odd · EMNLP 2025
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues · ACL 2025
Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists? · ACL 2025
Representation Bending for Large Language Model Safety · ACL 2025
Persona Dynamics: Unveiling the Impact of Persona Traits on Agents in Text-Based Games · ACL 2025
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics · NAACL 2025
EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild · NAACL 2025
Zero-shot Multimodal Document Retrieval via Cross-modal Question Generation · EMNLP 2025
VAGUE: Visual Contexts Clarify Ambiguous Expressions · ICCV 2025
V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models · ICCV 2025
Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation · MICCAI 2025
ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO · AAAI 2025
MASS: Overcoming Language Bias in Image-Text Matching · AAAI 2025
How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models · EMNLP 2024
Towards Visual Text Design Transfer Across Languages · NIPS 2024
Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback · ACL 2024
Aligning Large Language Models by On-Policy Self-Judgment · ACL 2024
Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation · ACL 2024
Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset · ACL 2024
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models · NAACL 2024
ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos · ECCV 2024
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding · EMNLP 2024
Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you! · EMNLP 2024
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models · EMNLP 2024
Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory · EMNLP 2024
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization · EMNLP 2023
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text · NIPS 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models · NIPS 2023
Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step · ACL 2023
Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning · CVPR 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation · EMNLP 2023
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms · EMNLP 2023
Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents · EMNLP 2023
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos · ICCV 2023
ProsocialDialog: A Prosocial Backbone for Conversational Agents · EMNLP 2022
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics · NAACL 2022
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer · NAACL 2022
MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound · CVPR 2022
Self-Supervised Learning of Compressed Video Representations · ICLR 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning · ICCV 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos · ICCV 2021
MERLOT: Multimodal Neural Script Knowledge Models · NIPS 2021
Parameter Efficient Multimodal Transformers for Video Representation Learning · ICLR 2021
Transitional Adaptation of Pretrained Models for Visual Storytelling · CVPR 2021
Dual Compositional Learning in Interactive Image Retrieval · AAAI 2021
Character Grounding and Re-Identification in Story of Videos and Text Descriptions · ECCV 2020
Augmenting Data for Sarcasm Detection with Unlabeled Conversation Context · ACL 2020
A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos · CVPR 2018
A Joint Sequence Fusion Model for Video Question Answering and Retrieval · ECCV 2018
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering · CVPR 2017
End-To-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering · CVPR 2017
Supervising Neural Attention Models for Video Captioning by Human Gaze Data · CVPR 2017