Kevin Qinghong Lin
19 papers · 2022–2025 · 7 conferences · across top CS/AI conferences
Achievements
Renaissance Researcher (7)
Keyword Pioneer
Conference Polyglot (7)
Cross-Pollinator (14)
Interdisciplinary Bridge
Taxonomy Completionist (41)
Dynamic Duo (18)
Grand Slam
Deep Specialist (12)
Prolific Year (5)
Century Club (19)
Keyword Collector (84)
Conferences
CVPR (8)
NeurIPS (4)
ICCV (3)
AAAI (1)
ECCV (1)
ICLR (1)
ICML (1)
Keywords
multimodal learning (5)
transfer learning (4)
video understanding (4)
vision transformer (3)
video-language pre-training (2)
video-language pretraining (2)
action recognition (2)
video-language model (2)
graphical user interface (2)
contrastive learning (2)
video-text retrieval (2)
video generation (2)
scene understanding (2)
multi-modal learning (2)
bounding box (2)
large language model (2)
video captioning (2)
vision-language model (2)
egocentric vision (1)
text generation (1)
Papers
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
ICML 2025
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting
AAAI 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
CVPR 2025
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
CVPR 2025
ROICtrl: Boosting Instance Control for Visual Generation
CVPR 2025
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
CVPR 2025
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
ICLR 2025
Bootstrapping SparseFormers from Vision Foundation Models
CVPR 2024
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
NeurIPS 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
NeurIPS 2024
Learning Video Context as Interleaved Multimodal Sequences
ECCV 2024
VideoLLM-online: Online Video Large Language Model for Streaming Video
CVPR 2024
All in One: Exploring Unified Video-Language Pre-Training
CVPR 2023
Affordance Grounding From Demonstration Video To Target Image
CVPR 2023
Too Large; Data Reduction for Vision-Language Pre-Training
ICCV 2023
UniVTG: Towards Unified Video-Language Temporal Grounding
ICCV 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
ICCV 2023
Learning Visual Prior via Generative Pre-Training
NeurIPS 2023
Egocentric Video-Language Pretraining
NeurIPS 2022