Qi Dai

38 papers · 2015–2026 · 10 top CS/AI conferences

Achievements

🌍 Conference Polyglot (10) 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🏃 Academic Marathon (11) 🤝 Dynamic Duo (11) 🏆 Grand Slam 🔬 Deep Specialist (10) 🧬 Topic Evolution 📈 Trend Setter 🚀 Conference Pioneer ⚡ Prolific Year (6) 🗃️ Keyword Collector (170) 💎 Century Club (35) 🔥 Unstoppable (9)

Conferences

CVPR (14) ICCV (10) AAAI (3) ACL (2) ICLR (2) NIPS (2) WACV (2) ECCV (1) ICML (1) IJCAI (1)

Top co-authors

Zuxuan Wu (11) Chong Luo (11) Zhi-Qi Cheng (10) Yu-Gang Jiang (10) Han Hu (8) Zhen Xing (6) Kai Qiu (5) Jianmin Bao (5) Yifan Yang (4) Zheng Zhang (4)

Keywords

diffusion model (9) video generation (7) representation learning (4) image generation (4) vision-language model (3) multimodal learning (3) reinforcement learning (3) video editing (3) text-to-video generation (3) variational autoencoder (2) image classification (2) weakly supervised learning (2) transfer learning (2) convolutional neural network (2) depth estimation (2) vision transformer (2) video recognition (2) image retrieval (2) contrastive learning (2) self-supervised learning (2)

Papers

LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation · AAAI 2026
MageBench: Bridging Large Multimodal Models to Agents · WACV 2026
SimRPD: Optimizing Recruitment Proactive Dialogue Agents through Simulator-Based Data Evaluation and Selection · ACL 2026
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models · AAAI 2026
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers · ICCV 2025
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents · ICCV 2025
FaceA-Net: Facial Attribute-Driven ID Preserving Image Generation Network · AAAI 2025
UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval · WACV 2025
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance · ICCV 2025
MotionFollower: Editing Video Motion via Score-Guided Diffusion · ICCV 2025
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction · ICCV 2025
FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis · CVPR 2025
StableAnimator: High-Quality Identity-Preserving Human Image Animation · CVPR 2025
HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion · CVPR 2025
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms · NIPS 2024
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions · NIPS 2024
MotionEditor: Editing Video Motion via Content-Aware Diffusion · CVPR 2024
SimDA: Simple Diffusion Adapter for Efficient Video Generation · CVPR 2024
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation · CVPR 2024
BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition · CVPR 2024
SVFormer: Semi-Supervised Video Transformer for Action Recognition · CVPR 2023
ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules · ICCV 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition · ICCV 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token · ICCV 2023
HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer · ICLR 2023
ResFormer: Scaling ViTs With Multi-Resolution Training · CVPR 2023
On Data Scaling in Masked Image Modeling · CVPR 2023
MPII: Multi-Level Mutual Promotion for Inference and Interpretation · ACL 2022
On the Connection between Local Attention and Dynamic Depth-wise Convolution · ICLR 2022
Rethinking Spatial Invariance of Convolutional Networks for Object Counting · CVPR 2022
SimMIM: A Simple Framework for Masked Image Modeling · CVPR 2022
Temporal Action Detection With Multi-Level Supervision · ICCV 2021
Informative Dropout for Robust Representation Learning: A Shape-bias Perspective · ICML 2020
Weakly-Supervised Action Localization by Generative Attention Modeling · CVPR 2020
Deep Incremental Hashing Network for Efficient Image Retrieval · CVPR 2019
Learning Spatial Awareness to Improve Crowd Counting · ICCV 2019
Recurrent Tubelet Proposal and Recognition Networks for Action Detection · ECCV 2018
Optimal Bayesian Hashing for Efficient Face Recognition · IJCAI 2015