Xitong Yang

19 papers · 2015–2025 · 5 top CS/AI conferences

Achievements


🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (5) 🏃 Academic Marathon (10) 🌈 Renaissance Researcher (5) 🗺️ Taxonomy Completionist (36)

🐣 Hot Topic Early Bird 👥 Mega-Team (100) 💎 Century Club (19) 🗃️ Keyword Collector (79) ⚡ Prolific Year (5) 🔥 Unstoppable (7)

Conferences

CVPR (11) ECCV (4) ICCV (2) ICML (1) NIPS (1)

Top co-authors

Zuxuan Wu (6) Lorenzo Torresani (5) Yu-Gang Jiang (4) Md Mohaiminul Islam (3) Zejia Weng (3) Larry S. Davis (3) Huiyu Wang (3) Gedas Bertasius (3) Fu-Jen Chu (3) Tushar Nagarajan (3)

Research topics

Core AI (1)

Keywords

video understanding (6) action recognition (5) egocentric video (3) weakly supervised learning (3) video captioning (2) video recognition (2) weakly-supervised learning (2) long video (2) multiple instance learning (2) multimodal learning (2) video generation (1) object detection (1) vision transformer (1) zero-shot learning (1) curriculum learning (1) temporal dynamics (1) metric learning (1) entity linking (1) temporal reasoning (1) pose estimation (1)

Papers

Progress-Aware Video Frame Captioning · CVPR 2025
GenRec: Unifying Video Generation and Recognition with Diffusion Models · NIPS 2024
Learning to Segment Referred Objects from Narrated Egocentric Videos · CVPR 2024
Video ReCap: Recursive Captioning of Hour-Long Videos · CVPR 2024
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives · CVPR 2024
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos · ECCV 2024
Vision Transformers Are Good Mask Auto-Labelers · CVPR 2023
Relational Space-Time Query in Long-Form Videos · CVPR 2023
Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization · ICML 2023
Towards Scalable Neural Representation for Diverse Videos · CVPR 2023
ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization · CVPR 2022
Semi-Supervised Vision Transformers · ECCV 2022
Efficient Video Transformers with Spatial-Temporal Token Selection · ECCV 2022
Beyond Short Clips: End-to-End Video-Level Learning With Collaborative Memories · CVPR 2021
A Generic Visualization Approach for Convolutional Neural Networks · ECCV 2020
Cross-X Learning for Fine-Grained Visual Categorization · ICCV 2019
STEP: Spatio-Temporal Progressive Learning for Video Action Detection · CVPR 2019
Deep Multimodal Representation Learning From Temporal Data · CVPR 2017
Semantic Video Entity Linking Based on Visual Content and Metadata · ICCV 2015