Yong Jae Lee
72 papers · 2013–2026 · 10 conferences · across top CS/AI conferences
Achievements
🏃 Academic Marathon (13)
🧭 Keyword Pioneer
🌍 Conference Polyglot (10)
🌉 Interdisciplinary Bridge
🐝 Cross-Pollinator (14)
🌟 Keyword Trendsetter Combo (4)
🏠 Conference Loyalist (26)
🔬 Deep Specialist (19)
🏆 Keyword Champion
🧬 Topic Evolution
🤝 Dynamic Duo (16)
⚡ Prolific Year (10)
🗃️ Keyword Collector (264)
❓ The Questioner
📈 Trend Setter
🔥 Unstoppable (14)
🚀 Conference Pioneer
💎 Century Club (71)
Conferences
CVPR (26)
ICCV (11)
NIPS (10)
ECCV (7)
WACV (6)
ICLR (5)
EMNLP (3)
ACL (2)
ICML (1)
UAI (1)
Keywords
vision-language model (9)
transfer learning (9)
multimodal learning (8)
image generation (8)
large language model (7)
object detection (6)
few-shot learning (5)
large multimodal model (5)
disentangled representation (4)
visual question answering (4)
generative model (4)
weakly supervised learning (4)
domain adaptation (4)
domain generalization (4)
generative adversarial network (4)
weakly-supervised learning (4)
zero-shot learning (3)
vision language model (3)
representation learning (3)
image segmentation (3)
Papers
Agentic Very Long Video Understanding
ACL 2026
LASER: Lip Landmark Assisted Speaker Detection for Robustness
WACV 2026
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems
ICCV 2025
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs
CVPR 2025
Yo'Chameleon: Personalized Vision and Language Generation
CVPR 2025
Aligned Datasets Improve Detection of Latent Diffusion-Generated Images
ICLR 2025
Matryoshka Multimodal Models
ICLR 2025
An Investigation on LLMs' Visual Understanding Ability using SVG for Image-Text Bridging
WACV 2025
Stay-Positive: A Case for Ignoring Real Image Features in Fake Image Detection
ICML 2025
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
ICLR 2025
Customizing Domain Adapters for Domain Generalization
ICCV 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models
ICCV 2025
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
ICCV 2025
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation
EMNLP 2024
Interfacing Foundation Models' Embeddings
NIPS 2024
Yo'LLaVA: Your Personalized Language and Vision Assistant
NIPS 2024
LP-3DGS: Learning to Prune 3D Gaussian Splatting
NIPS 2024
CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples
ACL 2024
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
CVPR 2024
Edit One for All: Interactive Batch Image Editing
CVPR 2024
Improved Baselines with Visual Instruction Tuning
CVPR 2024
Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
ECCV 2024
MATE: Meet At The Embedding - Connecting Images with Long Texts
EMNLP 2024
Computer Vision on the Edge: Individual Cattle Identification in Real-Time With ReadMyCow System
WACV 2024
Visual Instruction Inversion: Image Editing via Image Prompting
NIPS 2023
Learning Customized Visual Models With Retrieval-Augmented Knowledge
CVPR 2023
GLIGEN: Open-Set Grounded Text-to-Image Generation
CVPR 2023
What Knowledge Gets Distilled in Knowledge Distillation?
NIPS 2023
Segment Everything Everywhere All at Once
NIPS 2023
Visual Instruction Tuning
NIPS 2023
Generalized Decoding for Pixel, Image, and Language
CVPR 2023
Towards Universal Fake Image Detectors That Generalize Across Generative Models
CVPR 2023
A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance
ICCV 2023
InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning
ICLR 2023
The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization
CVPR 2022
Contrastive Learning for Diverse Disentangled Foreground Generation
ECCV 2022
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
NIPS 2022
Equine Pain Behavior Classification via Self-Supervised Disentangled Pose Representation
WACV 2022
GIRAFFE HD: A High-Resolution 3D-Aware Generative Model
CVPR 2022
Toward learning human-aligned cross-domain robust models by countering misaligned features
UAI 2022
Masked Discrimination for Self-Supervised Learning on Point Clouds
ECCV 2022
Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains
ICLR 2021
Collaging Class-Specific GANs for Semantic Image Synthesis
ICCV 2021
SinGAN-GIF: Learning a Generative Video Model From a Single GIF
WACV 2021
Few-Shot Image Generation via Cross-Domain Correspondence
CVPR 2021
Progressive Temporal Feature Alignment Network for Video Inpainting
CVPR 2021
Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection
CVPR 2020
Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Class-Imbalanced Data
NIPS 2020
Password-conditioned Anonymization and Deanonymization with Face Identity Transformers
ECCV 2020
Action Graphs: Weakly-supervised Action Localization with Graph Convolution Networks
WACV 2020
MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation
CVPR 2020
Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias
CVPR 2020
HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-Scale Point Clouds
CVPR 2019
YOLACT: Real-Time Instance Segmentation
ICCV 2019
Identity From Here, Pose From There: Self-Supervised Disentanglement and Generation of Objects Using Unlabeled Videos
ICCV 2019
You Reap What You Sow: Using Videos to Generate High Precision Object Proposals for Weakly-Supervised Object Detection
CVPR 2019
FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery
CVPR 2019
Video Object Detection with an Aligned Spatial-Temporal Memory
ECCV 2018
A Visual Attention Grounding Neural Model for Multimodal Machine Translation
EMNLP 2018
DOCK: Detecting Objects by transferring Common-sense Knowledge
ECCV 2018
Learning to Anonymize Faces for Privacy Preserving Action Detection
ECCV 2018
Cross-Domain Self-Supervised Multi-Task Feature Learning Using Synthetic Imagery
CVPR 2018
Weakly-Supervised Visual Grounding of Phrases With Linguistic Structures
CVPR 2017
Identifying First-Person Camera Wearers in Third-Person Videos
CVPR 2017
Interspecies Knowledge Transfer for Facial Keypoint Detection
CVPR 2017
Hide-And-Seek: Forcing a Network to Be Meticulous for Weakly-Supervised Object and Action Localization
ICCV 2017
Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection
CVPR 2016
Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals
CVPR 2016
Discovering the Spatial Extent of Relative Attributes
ICCV 2015
FlowWeb: Joint Image Set Alignment by Weaving Consistent, Pixel-Wise Correspondences
CVPR 2015
Weakly-supervised Discovery of Visual Pattern Configurations
NIPS 2014
Style-Aware Mid-level Representation for Discovering Visual Connections in Space and Time
ICCV 2013