Kevin Qinghong Lin

19 papers · 2022–2025 · 7 top CS/AI conferences

Achievements

🌈 Renaissance Researcher (7) 🧭 Keyword Pioneer 🌍 Conference Polyglot (7) 🐝 Cross-Pollinator (14) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (41) 🤝 Dynamic Duo (18) 🏆 Grand Slam 🔬 Deep Specialist (12) ⚡ Prolific Year (5) 💎 Century Club (19) 🗃️ Keyword Collector (84)

Conferences

CVPR (8) NIPS (4) ICCV (3) AAAI (1) ECCV (1) ICLR (1) ICML (1)

Top co-authors

Mike Zheng Shou (18) Difei Gao (7) Joya Chen (6) Shiwei Wu (3) Jinheng Xie (3) Ziteng Gao (3) Rui Yan (3) Pengchuan Zhang (3) Yuchao Gu (2) Lijuan Wang (2)

Keywords

multimodal learning (5) transfer learning (4) video understanding (4) vision transformer (3) video-language pre-training (2) video-language pretraining (2) action recognition (2) video-language model (2) graphical user interface (2) contrastive learning (2) video-text retrieval (2) video generation (2) scene understanding (2) multi-modal learning (2) bounding box (2) large language model (2) video captioning (2) vision-language model (2) egocentric vision (1) text generation (1)

Papers

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction · ICML 2025
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting · AAAI 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary · CVPR 2025
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation · CVPR 2025
ROICtrl: Boosting Instance Control for Visual Generation · CVPR 2025
ShowUI: One Vision-Language-Action Model for GUI Visual Agent · CVPR 2025
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation · ICLR 2025
Bootstrapping SparseFormers from Vision Foundation Models · CVPR 2024
VideoGUI: A Benchmark for GUI Automation from Instructional Videos · NIPS 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation · NIPS 2024
Learning Video Context as Interleaved Multimodal Sequences · ECCV 2024
VideoLLM-online: Online Video Large Language Model for Streaming Video · CVPR 2024
All in One: Exploring Unified Video-Language Pre-Training · CVPR 2023
Affordance Grounding From Demonstration Video To Target Image · CVPR 2023
Too Large; Data Reduction for Vision-Language Pre-Training · ICCV 2023
UniVTG: Towards Unified Video-Language Temporal Grounding · ICCV 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone · ICCV 2023
Learning Visual Prior via Generative Pre-Training · NIPS 2023
Egocentric Video-Language Pretraining · NIPS 2022