Yong Jae Lee

72 papers · 2013–2026 · 10 top CS/AI conferences

Achievements

🏃 Academic Marathon (13) 🧭 Keyword Pioneer 🌍 Conference Polyglot (10) 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (14) 🌟 Keyword Trendsetter Combo (4) 🏠 Conference Loyalist (26) 🔬 Deep Specialist (19) 🏆 Keyword Champion 🧬 Topic Evolution 🤝 Dynamic Duo (16) ⚡ Prolific Year (10) 🗃️ Keyword Collector (264) ❓ The Questioner 📈 Trend Setter 🔥 Unstoppable (14) 🚀 Conference Pioneer 💎 Century Club (71)

Conferences

CVPR (26) ICCV (11) NIPS (10) ECCV (7) WACV (6) ICLR (5) EMNLP (3) ACL (2) ICML (1) UAI (1)

Top co-authors

Krishna Kumar Singh (16) Yuheng Li (15) Haotian Liu (12) Utkarsh Ojha (12) Mu Cai (11) Jianwei Yang (7) Fanyi Xiao (7) Jianfeng Gao (6) Chunyuan Li (6) Zeyi Huang (6)

Keywords

vision-language model (9) transfer learning (9) multimodal learning (8) image generation (8) large language model (7) object detection (6) few-shot learning (5) large multimodal model (5) disentangled representation (4) visual question answering (4) generative model (4) weakly supervised learning (4) domain adaptation (4) domain generalization (4) generative adversarial network (4) weakly-supervised learning (4) zero-shot learning (3) vision language model (3) representation learning (3) image segmentation (3)

Papers

Agentic Very Long Video Understanding · ACL 2026
LASER: Lip Landmark Assisted Speaker Detection for Robustness · WACV 2026
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems · ICCV 2025
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs · CVPR 2025
Yo'Chameleon: Personalized Vision and Language Generation · CVPR 2025
Aligned Datasets Improve Detection of Latent Diffusion-Generated Images · ICLR 2025
Matryoshka Multimodal Models · ICLR 2025
An Investigation on LLMs' Visual Understanding Ability using SVG for Image-Text Bridging · WACV 2025
Stay-Positive: A Case for Ignoring Real Image Features in Fake Image Detection · ICML 2025
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy · ICLR 2025
Customizing Domain Adapters for Domain Generalization · ICCV 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models · ICCV 2025
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models · ICCV 2025
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation · EMNLP 2024
Interfacing Foundation Models' Embeddings · NIPS 2024
Yo'LLaVA: Your Personalized Language and Vision Assistant · NIPS 2024
LP-3DGS: Learning to Prune 3D Gaussian Splatting · NIPS 2024
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples · ACL 2024
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts · CVPR 2024
Edit One for All: Interactive Batch Image Editing · CVPR 2024
Improved Baselines with Visual Instruction Tuning · CVPR 2024
Removing Distributional Discrepancies in Captions Improves Image-Text Alignment · ECCV 2024
MATE: Meet At The Embedding - Connecting Images with Long Texts · EMNLP 2024
Computer Vision on the Edge: Individual Cattle Identification in Real-Time With ReadMyCow System · WACV 2024
Visual Instruction Inversion: Image Editing via Image Prompting · NIPS 2023
Learning Customized Visual Models With Retrieval-Augmented Knowledge · CVPR 2023
GLIGEN: Open-Set Grounded Text-to-Image Generation · CVPR 2023
What Knowledge Gets Distilled in Knowledge Distillation? · NIPS 2023
Segment Everything Everywhere All at Once · NIPS 2023
Visual Instruction Tuning · NIPS 2023
Generalized Decoding for Pixel, Image, and Language · CVPR 2023
Towards Universal Fake Image Detectors That Generalize Across Generative Models · CVPR 2023
A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance · ICCV 2023
InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning · ICLR 2023
The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization · CVPR 2022
Contrastive Learning for Diverse Disentangled Foreground Generation · ECCV 2022
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models · NIPS 2022
Equine Pain Behavior Classification via Self-Supervised Disentangled Pose Representation · WACV 2022
GIRAFFE HD: A High-Resolution 3D-Aware Generative Model · CVPR 2022
Toward learning human-aligned cross-domain robust models by countering misaligned features · UAI 2022
Masked Discrimination for Self-Supervised Learning on Point Clouds · ECCV 2022
Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains · ICLR 2021
Collaging Class-Specific GANs for Semantic Image Synthesis · ICCV 2021
SinGAN-GIF: Learning a Generative Video Model From a Single GIF · WACV 2021
Few-Shot Image Generation via Cross-Domain Correspondence · CVPR 2021
Progressive Temporal Feature Alignment Network for Video Inpainting · CVPR 2021
Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection · CVPR 2020
Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Class-Imbalanced Data · NIPS 2020
Password-conditioned Anonymization and Deanonymization with Face Identity Transformers · ECCV 2020
Action Graphs: Weakly-supervised Action Localization with Graph Convolution Networks · WACV 2020
MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation · CVPR 2020
Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias · CVPR 2020
HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-Scale Point Clouds · CVPR 2019
YOLACT: Real-Time Instance Segmentation · ICCV 2019
Identity From Here, Pose From There: Self-Supervised Disentanglement and Generation of Objects Using Unlabeled Videos · ICCV 2019
You Reap What You Sow: Using Videos to Generate High Precision Object Proposals for Weakly-Supervised Object Detection · CVPR 2019
FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery · CVPR 2019
Video Object Detection with an Aligned Spatial-Temporal Memory · ECCV 2018
A Visual Attention Grounding Neural Model for Multimodal Machine Translation · EMNLP 2018
DOCK: Detecting Objects by transferring Common-sense Knowledge · ECCV 2018
Learning to Anonymize Faces for Privacy Preserving Action Detection · ECCV 2018
Cross-Domain Self-Supervised Multi-Task Feature Learning Using Synthetic Imagery · CVPR 2018
Weakly-Supervised Visual Grounding of Phrases With Linguistic Structures · CVPR 2017
Identifying First-Person Camera Wearers in Third-Person Videos · CVPR 2017
Interspecies Knowledge Transfer for Facial Keypoint Detection · CVPR 2017
Hide-And-Seek: Forcing a Network to Be Meticulous for Weakly-Supervised Object and Action Localization · ICCV 2017
Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection · CVPR 2016
Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals · CVPR 2016
Discovering the Spatial Extent of Relative Attributes · ICCV 2015
FlowWeb: Joint Image Set Alignment by Weaving Consistent, Pixel-Wise Correspondences · CVPR 2015
Weakly-supervised Discovery of Visual Pattern Configurations · NIPS 2014
Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time · ICCV 2013