Zhongang Qi

27 papers · 2019–2025 · 7 top CS/AI conferences

Achievements


🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🏃 Academic Marathon (6) 🌍 Conference Polyglot (7) 🗺️ Taxonomy Completionist (55)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (22) ❓ The Questioner ⚡ Prolific Year (8) 💎 Century Club (27) 🗃️ Keyword Collector (135) 🔥 Unstoppable (7)

Conferences

CVPR (8) AAAI (7) ICCV (6) NIPS (3) ECCV (1) ICML (1) IJCAI (1)

Top co-authors

Ying Shan (22) Yuxin Chen (9) Chunfeng Yuan (7) Xintao Wang (7) Weiming Hu (7) Bing Li (7) Ziqi Zhang (7) Zongyang Ma (6) Xi Li (4) Xiaohu Qie (4)

Keywords

diffusion model (5) controllable generation (3) object detection (3) image synthesis (3) contrastive learning (2) image generation (2) spherical geometry (2) convolutional neural network (2) image-text retrieval (2) fine-grained understanding (2) transfer learning (2) video understanding (2) multimodal large language model (2) text-to-image generation (2) video captioning (1) mathematical reasoning (1) video generation (1) benchmark evaluation (1) disparity estimation (1) multimodal learning (1)

Papers

VisionMath: Vision-Form Mathematical Problem-Solving · ICCV 2025
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion · CVPR 2025
Taming Rectified Flow for Inversion and Editing · ICML 2025
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities · AAAI 2025
Less is More: Empowering GUI Agent with Context-Aware Simplification · ICCV 2025
DOGR: Towards Versatile Visual Document Grounding and Referring · ICCV 2025
Mamba-3VL: Taming State Space Model for 3D Vision Language Learning · ICCV 2025
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding · NIPS 2024
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding · CVPR 2024
How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval? · CVPR 2024
T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models · AAAI 2024
EA-VTR: Event-Aware Video-Text Retrieval · ECCV 2024
SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model · AAAI 2024
SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation · IJCAI 2023
Exploiting Contextual Objects and Relations for 3D Visual Grounding · NIPS 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval · AAAI 2023
Accelerating the Training of Video Super-resolution Models · AAAI 2023
LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation · CVPR 2023
ViLEM: Visual-Language Error Modeling for Image-Text Retrieval · CVPR 2023
Order-Prompted Tag Sequence Generation for Video Tagging · ICCV 2023
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing · ICCV 2023
BTS: A Bi-Lingual Benchmark for Text Segmentation in the Wild · CVPR 2022
Open-Book Video Captioning With Retrieve-Copy-Generate Network · CVPR 2021
Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution · NIPS 2021
Visualizing Deep Networks by Optimizing with Integrated Gradients · AAAI 2020
ScaleNet - Improve CNNs through Recursively Rescaling Objects · AAAI 2020
PointConv: Deep Convolutional Networks on 3D Point Clouds · CVPR 2019