Youngjae Yu
64 papers · 2017–2026 · 10 conferences · across top CS/AI conferences
Achievements
Academic Marathon (8)
Conference Polyglot (10)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (13)
Dynamic Duo (17)
Deep Specialist (27)
Topic Evolution
Keyword Champion (3)
The Questioner (4)
Trend Setter
Keyword Collector (254)
Prolific Year (7)
Unstoppable (6)
Century Club (58)
Conference Pioneer
Conferences
EMNLP (15)
ACL (14)
CVPR (7)
AAAI (6)
ICCV (6)
NAACL (6)
NIPS (4)
ECCV (3)
ICLR (2)
MICCAI (1)
Keywords
multimodal learning (21)
large language model (11)
video understanding (6)
self-supervised learning (5)
dialogue system (5)
conversational agent (4)
zero-shot learning (4)
knowledge distillation (4)
multimodal large language model (4)
visual reasoning (3)
visual commonsense (3)
commonsense reasoning (3)
vision-language model (3)
visual question answering (3)
reinforcement learning (3)
image generation (2)
visual grounding (2)
visual context (2)
data augmentation (2)
direct preference optimization (2)
Papers
GuideDog: A Real-World Egocentric Multimodal Dataset for Blind and Low-Vision Accessibility-Aware Guidance
ACL 2026
Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation
AAAI 2026
Do Language Models Associate Sound with Meaning? A Multimodal Study of Sound Symbolism
AAAI 2026
Investigating Counterfactual Unfairness in LLMs towards Identities through Humor
ACL 2026
Right at My Level: A Unified Multilingual Framework for Proficiency-Aware Text Simplification
ACL 2026
Do MLLMs Capture How Interfaces Guide User Behavior? A Benchmark for Multimodal UI/UX Design Understanding
ACL 2026
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation
AAAI 2025
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
ICCV 2025
MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation
EMNLP 2025
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms
EMNLP 2025
Subtle Risks, Critical Failures: A Framework for Diagnosing Physical Safety of LLMs for Embodied Decision Making
EMNLP 2025
C2: Scalable Auto-Feedback for LLM-based Chart Generation
NAACL 2025
Multimodal UNcommonsense: From Odd to Ordinary and Ordinary to Odd
EMNLP 2025
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues
ACL 2025
Are Any-to-Any Models More Consistent Across Modality Transfers Than Specialists?
ACL 2025
Representation Bending for Large Language Model Safety
ACL 2025
Persona Dynamics: Unveiling the Impact of Persona Traits on Agents in Text-Based Games
ACL 2025
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
NAACL 2025
EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild
NAACL 2025
Zero-shot Multimodal Document Retrieval via Cross-modal Question Generation
EMNLP 2025
VAGUE: Visual Contexts Clarify Ambiguous Expressions
ICCV 2025
V.I.P. : Iterative Online Preference Distillation for Efficient Video Diffusion Models
ICCV 2025
Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation
MICCAI 2025
ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO
AAAI 2025
MASS: Overcoming Language Bias in Image-Text Matching
AAAI 2025
How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models
EMNLP 2024
Towards Visual Text Design Transfer Across Languages
NIPS 2024
Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
ACL 2024
Aligning Large Language Models by On-Policy Self-Judgment
ACL 2024
Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation
ACL 2024
Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset
ACL 2024
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
NAACL 2024
ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos
ECCV 2024
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding
EMNLP 2024
Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!
EMNLP 2024
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
EMNLP 2024
Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory
EMNLP 2024
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization
EMNLP 2023
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text
NIPS 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
NIPS 2023
Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step
ACL 2023
Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning
CVPR 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
EMNLP 2023
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
EMNLP 2023
Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents
EMNLP 2023
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
ICCV 2023
ProsocialDialog: A Prosocial Backbone for Conversational Agents
EMNLP 2022
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics
NAACL 2022
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer
NAACL 2022
MERLOT Reserve: Neural Script Knowledge Through Vision and Language and Sound
CVPR 2022
Self-Supervised Learning of Compressed Video Representations
ICLR 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
ICCV 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
ICCV 2021
MERLOT: Multimodal Neural Script Knowledge Models
NIPS 2021
Parameter Efficient Multimodal Transformers for Video Representation Learning
ICLR 2021
Transitional Adaptation of Pretrained Models for Visual Storytelling
CVPR 2021
Dual Compositional Learning in Interactive Image Retrieval
AAAI 2021
Character Grounding and Re-Identification in Story of Videos and Text Descriptions
ECCV 2020
Augmenting Data for Sarcasm Detection with Unlabeled Conversation Context
ACL 2020
A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos
CVPR 2018
A Joint Sequence Fusion Model for Video Question Answering and Retrieval
ECCV 2018
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
CVPR 2017
End-To-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering
CVPR 2017
Supervising Neural Attention Models for Video Captioning by Human Gaze Data
CVPR 2017