Zhenheng Yang

16 papers · 2017–2026 · 6 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (5) 🏃 Academic Marathon (8) 🌈 Renaissance Researcher (5) 🗺️ Taxonomy Completionist (35) 💎 Century Club (15) 🗃️ Keyword Collector (59) ⚡ Prolific Year (7) 🔥 Unstoppable (5)

Conferences

CVPR (7) ICCV (4) ICLR (2) AAAI (1) ECCV (1) EMNLP (1)

Top co-authors

Ram Nevatia (5) Jian Yang (3) Yang Wang (3) Peng Wang (3) Rui Xie (3) Penghao Zhou (3) Wei Xu (3) Ying Tai (3) Jiyang Gao (2) Tiehan Fan (2)

Keywords

video generation (3) unsupervised learning (3) large language model (2) multimodal learning (2) temporal consistency (2) weakly supervised learning (2) convolutional neural network (2) object detection (1) video segmentation (1) action recognition (1) video captioning (1) mathematical reasoning (1) depth estimation (1) video super-resolution (1) image restoration (1) context modeling (1) edge detection (1) joint learning (1) geometric prior (1) instance segmentation (1)

Papers

UniAPO: Unified Multimodal Automated Prompt Optimization · AAAI 2026
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning · EMNLP 2025
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption · CVPR 2025
Parallelized Autoregressive Visual Generation · CVPR 2025
Long Context Tuning for Video Generation · ICCV 2025
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution · ICCV 2025
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation · ICLR 2025
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation · ICLR 2025
Weakly Supervised Instance Segmentation for Videos With Temporal Mask Consistency · CVPR 2021
SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization · ECCV 2020
Activity Driven Weakly Supervised Object Detection · CVPR 2019
UnOS: Unified Unsupervised Optical-Flow and Stereo-Depth Estimation by Watching Videos · CVPR 2019
Occlusion Aware Unsupervised Learning of Optical Flow · CVPR 2018
LEGO: Learning Edge With Geometry All at Once by Watching Videos · CVPR 2018
TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals · ICCV 2017
TALL: Temporal Activity Localization via Language Query · ICCV 2017