Shoufa Chen

14 papers · 2021–2026 · 6 top CS/AI conferences

Achievements

🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (5) 🏃 Academic Marathon (5) 🗺️ Taxonomy Completionist (22)

🐝 Cross-Pollinator (15) 🏆 Grand Slam 👑 Triple Crown 👥 Mega-Team (22) 🤝 Dynamic Duo (11) 🔥 Unstoppable (5) 💎 Century Club (13)

Conferences

ICCV (4) ICLR (4) CVPR (2) ICML (2) AAAI (1) NIPS (1)

Top co-authors

Ping Luo (12) Peize Sun (8) Chongjian GE (8) Shilong Zhang (3) Yibing Song (3) Runjian Chen (3) Wenqi Shao (2) Jiangliu Wang (2) Zhan Tong (2) Zehuan Yuan (2)

Keywords

text-to-video generation (3) object detection (2) diffusion transformer (2) diffusion model (2) video generation (2) text-to-image generation (2) flow matching (2) action recognition (1) multi-task learning (1) contrastive learning (1) video recognition (1) preference alignment (1) prompt engineering (1) generative model (1) vision transformer (1) action classification (1) state representation (1) transfer learning (1) parameter-efficient transfer learning (1) image generation (1)

Papers

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation · AAAI 2026
ControlAR: Controllable Image Generation with Autoregressive Models · ICLR 2025
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM · ICCV 2025
Goku: Flow Based Video Generative Foundation Models · CVPR 2025
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing · ICLR 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis · ICML 2024
GenTron: Diffusion Transformers for Image and Video Generation · CVPR 2024
Going Denser with Open-Vocabulary Part Segmentation · ICCV 2023
Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning · ICLR 2023
DiffusionDet: Diffusion Model for Object Detection · ICCV 2023
CycleMLP: A MLP-like Architecture for Dense Prediction · ICLR 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition · NIPS 2022
CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer · ICML 2022
Watch Only Once: An End-to-End Video Action Detection Framework · ICCV 2021