Yinan He

16 papers · 2021–2025 · 5 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (29) 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (5) 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (14) 👥 Mega-Team (38) ⚡ Prolific Year (6) 🗃️ Keyword Collector (69) 🔥 Unstoppable (5) 💎 Century Club (16) ❓ The Questioner

Conferences

CVPR (6) ICCV (4) ECCV (3) ICLR (2) NeurIPS (1)

Top co-authors

Yu Qiao (14) Yi Wang (12) Limin Wang (12) Yali Wang (10) Kunchang Li (7) Jiashuo Yu (5) Ziwei Liu (4) Yizhuo Li (4) Xinhao Li (4) Guo Chen (3)

Keywords

video understanding (4) zero-shot learning (2) large language model (2) diffusion model (2) benchmark evaluation (2) vision transformer (2) masked autoencoder (2) image restoration (1) multi-task learning (1) semantic segmentation (1) video classification (1) self-supervised learning (1) weakly supervised learning (1) in-context learning (1) video recognition (1) video generation (1) multi-modal learning (1) multimodal learning (1) question answering (1) temporal reasoning (1)

Papers

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment · CVPR 2025
WISNet: Pseudo Label Generation on Unbalanced and Patch Annotated Waste Images · CVPR 2025
DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations · ICCV 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text · ICLR 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos · ICCV 2025
VideoMamba: State Space Model for Efficient Video Understanding · ECCV 2024
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark · CVPR 2024
VBench: Comprehensive Benchmark Suite for Video Generative Models · CVPR 2024
Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection? · NeurIPS 2024
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding · ECCV 2024
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation · ICLR 2024
Unmasked Teacher: Towards Training-Efficient Video Foundation Models · ICCV 2023
VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking · CVPR 2023
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding · ICCV 2023
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation · ECCV 2022
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis · CVPR 2021