Sangho Lee

16 papers · 2017–2025 · 8 top CS/AI conferences

Achievements

🏃 Academic Marathon (8) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (8) 🐝 Cross-Pollinator (13)

🗺️ Taxonomy Completionist (37) 👥 Mega-Team (50) ❓ The Questioner 🗃️ Keyword Collector (79) 💎 Century Club (16)

Conferences

CVPR (5) AAAI (3) ICCV (2) ICLR (2) AISTATS (1) ECCV (1) EMNLP (1) ICML (1)

Top co-authors

Gunhee Kim (8) Aniruddha Kembhavi (4) Christopher Clark (4) Youngjae Yu (4) Jiasen Lu (3) Yale Song (3) Joonseok Lee (2) Seongsu Ha (2) Donghwa Kim (2) Thomas Breuel (2)

Keywords

multimodal learning (3) image generation (2) mutual information (2) self-supervised learning (2) benchmark evaluation (1) image captioning (1) transformer architecture (1) pose estimation (1) contrastive learning (1) visual question answering (1) video captioning (1) online learning (1) audio-visual learning (1) self-attention mechanism (1) representation learning (1) depth estimation (1) image synthesis (1) instruction following (1) efficient computing (1) unsupervised learning (1)

Papers

One Diffusion to Generate Them All · CVPR 2025
MAMS: Model-Agnostic Module Selection Framework for Video Captioning · AAAI 2025
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models · CVPR 2025
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams · CVPR 2025
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation · ECCV 2024
Proxyformer: Nyström-Based Linear Transformer with Trainable Proxy Tokens · AAAI 2024
Towards a Complete Benchmark on Video Moment Localization · AISTATS 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action · CVPR 2024
Can Language Models Laugh at YouTube Short-form Videos? · EMNLP 2023
Unsupervised Representation Learning via Neural Activation Coding · ICML 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning · ICCV 2021
Parameter Efficient Multimodal Transformers for Video Representation Learning · ICLR 2021
Self-Supervised Learning of Compressed Video Representations · ICLR 2021
URNet: User-Resizable Residual Networks with Conditional Gating Module · AAAI 2020
A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos · CVPR 2018
A Read-Write Memory Network for Movie Story Understanding · ICCV 2017