Zehuan Yuan

46 papers · 2017–2026 · 8 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (8) 🏃 Academic Marathon (8) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (11) 🗺️ Taxonomy Completionist (61) 👑 Triple Crown 🤝 Dynamic Duo (25) 👥 Mega-Team (22) 🔬 Deep Specialist (11) 🏆 Grand Slam 🏆 Keyword Champion (3) 💎 Century Club (45) 🔥 Unstoppable (6) ❓ The Questioner ⚡ Prolific Year (13) 🗃️ Keyword Collector (181)

Conferences

CVPR (16) NIPS (8) ECCV (6) ICCV (6) AAAI (4) ICLR (4) ICML (1) IJCAI (1)

Top co-authors

Yi Jiang (26) Ping Luo (15) Peize Sun (11) Changhu Wang (9) Chuofan Ma (5) Jiannan Wu (5) Bin Yan (5) Chuang Lin (4) Dongdong Yu (4) Huchuan Lu (4)

Keywords

object detection (10) vision-language model (5) image generation (5) semantic segmentation (4) transformer architecture (4) knowledge distillation (4) contrastive learning (4) model compression (3) zero-shot learning (3) object tracking (3) diffusion model (3) visual generation (3) instance segmentation (3) region proposal (3) image classification (3) multi-modal learning (2) multimodal learning (2) video understanding (2) representation learning (2) video generation (2)

Papers

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation · AAAI 2026
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis · CVPR 2025
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation · CVPR 2025
Goku: Flow Based Video Generative Foundation Models · CVPR 2025
Recognize Any Regions · NIPS 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction · NIPS 2024
Generative Region-Language Pretraining for Open-Ended Object Detection · CVPR 2024
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE · AAAI 2024
General Object Foundation Model for Images and Videos at Scale · CVPR 2024
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models · ECCV 2024
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation · NIPS 2024
Meta Compositional Referring Expression Segmentation · CVPR 2023
CoDet: Co-occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection · NIPS 2023
Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-Commerce · CVPR 2023
Universal Instance Perception As Object Discovery and Retrieval · CVPR 2023
Token Boosting for Robust Self-Supervised Visual Transformer Pre-Training · CVPR 2023
EGC: Image Generation and Classification via a Diffusion Energy-Based Model · ICCV 2023
Segment Every Reference Object in Spatial and Temporal Spaces · ICCV 2023
Exploring Transformers for Open-world Instance Segmentation · ICCV 2023
Learning Object-Language Alignments for Open-Vocabulary Object Detection · ICLR 2023
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling · ICLR 2023
Content-Variant Reference Image Quality Assessment via Knowledge Distillation · AAAI 2022
Objects in Semantic Topology · ICLR 2022
Language As Queries for Referring Video Object Segmentation · CVPR 2022
QueryPose: Sparse Multi-Person Pose Regression via Spatial-Aware Part-Level Query · NIPS 2022
Focal and Global Knowledge Distillation for Detectors · CVPR 2022
Rethinking Resolution in the Context of Efficient Video Recognition · NIPS 2022
You Should Look at All Objects · ECCV 2022
Masked Generative Distillation · ECCV 2022
Towards Grand Unification of Object Tracking · ECCV 2022
ByteTrack: Multi-Object Tracking by Associating Every Detection Box · ECCV 2022
Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation · ECCV 2022
Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding · NIPS 2022
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion · CVPR 2022
Exploring Balanced Feature Spaces for Representation Learning · ICLR 2021
Slimmable Generative Adversarial Networks · AAAI 2021
Domain-Invariant Disentangled Network for Generalizable Object Detection · ICCV 2021
Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective · ICCV 2021
Weakly Supervised Person Search With Region Siamese Networks · ICCV 2021
Sparse R-CNN: End-to-End Object Detection With Learnable Proposals · CVPR 2021
Disentangled Contrastive Learning on Graphs · NIPS 2021
What Makes for End-to-End Object Detection? · ICML 2021
Controllable Orthogonalization in Training DNNs · CVPR 2020
Non-Local Neural Networks With Grouped Bilinear Attentional Transforms · CVPR 2020
Temporal Action Localization by Structured Maximal Sums · CVPR 2017
Deep-dense Conditional Random Fields for Object Co-segmentation · IJCAI 2017