Zehuan Yuan
46 papers · 2017–2026 · 8 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (8)
Academic Marathon (8)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (11)
Taxonomy Completionist (61)
Triple Crown
Dynamic Duo (25)
Mega-Team (22)
Deep Specialist (11)
Grand Slam
Keyword Champion (3)
Century Club (45)
Unstoppable (6)
The Questioner
Prolific Year (13)
Keyword Collector (181)
Conferences
CVPR (16)
NIPS (8)
ECCV (6)
ICCV (6)
AAAI (4)
ICLR (4)
ICML (1)
IJCAI (1)
Keywords
object detection (10)
vision-language model (5)
image generation (5)
semantic segmentation (4)
transformer architecture (4)
knowledge distillation (4)
contrastive learning (4)
model compression (3)
zero-shot learning (3)
object tracking (3)
diffusion model (3)
visual generation (3)
instance segmentation (3)
region proposal (3)
image classification (3)
multi-modal learning (2)
multimodal learning (2)
video understanding (2)
representation learning (2)
video generation (2)
Papers
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
AAAI 2026
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
CVPR 2025
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
CVPR 2025
Goku: Flow Based Video Generative Foundation Models
CVPR 2025
Recognize Any Regions
NIPS 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
NIPS 2024
Generative Region-Language Pretraining for Open-Ended Object Detection
CVPR 2024
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
AAAI 2024
General Object Foundation Model for Images and Videos at Scale
CVPR 2024
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
ECCV 2024
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
NIPS 2024
Meta Compositional Referring Expression Segmentation
CVPR 2023
CoDet: Co-occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
NIPS 2023
Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-Commerce
CVPR 2023
Universal Instance Perception As Object Discovery and Retrieval
CVPR 2023
Token Boosting for Robust Self-Supervised Visual Transformer Pre-Training
CVPR 2023
EGC: Image Generation and Classification via a Diffusion Energy-Based Model
ICCV 2023
Segment Every Reference Object in Spatial and Temporal Spaces
ICCV 2023
Exploring Transformers for Open-world Instance Segmentation
ICCV 2023
Learning Object-Language Alignments for Open-Vocabulary Object Detection
ICLR 2023
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling
ICLR 2023
Content-Variant Reference Image Quality Assessment via Knowledge Distillation
AAAI 2022
Objects in Semantic Topology
ICLR 2022
Language As Queries for Referring Video Object Segmentation
CVPR 2022
QueryPose: Sparse Multi-Person Pose Regression via Spatial-Aware Part-Level Query
NIPS 2022
Focal and Global Knowledge Distillation for Detectors
CVPR 2022
Rethinking Resolution in the Context of Efficient Video Recognition
NIPS 2022
You Should Look at All Objects
ECCV 2022
Masked Generative Distillation
ECCV 2022
Towards Grand Unification of Object Tracking
ECCV 2022
ByteTrack: Multi-Object Tracking by Associating Every Detection Box
ECCV 2022
Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation
ECCV 2022
Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
NIPS 2022
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
CVPR 2022
Exploring Balanced Feature Spaces for Representation Learning
ICLR 2021
Slimmable Generative Adversarial Networks
AAAI 2021
Domain-Invariant Disentangled Network for Generalizable Object Detection
ICCV 2021
Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective
ICCV 2021
Weakly Supervised Person Search With Region Siamese Networks
ICCV 2021
Sparse R-CNN: End-to-End Object Detection With Learnable Proposals
CVPR 2021
Disentangled Contrastive Learning on Graphs
NIPS 2021
What Makes for End-to-End Object Detection?
ICML 2021
Controllable Orthogonalization in Training DNNs
CVPR 2020
Non-Local Neural Networks With Grouped Bilinear Attentional Transforms
CVPR 2020
Temporal Action Localization by Structured Maximal Sums
CVPR 2017
Deep-dense Conditional Random Fields for Object Co-segmentation
IJCAI 2017