Qi Dai
38 papers · 2015–2026 · 10 top CS/AI conferences
Achievements
🌍 Conference Polyglot (10)
🐣 Hot Topic Early Bird
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🏃 Academic Marathon (11)
🤝 Dynamic Duo (11)
🏆 Grand Slam
🔬 Deep Specialist (10)
🧬 Topic Evolution
📈 Trend Setter
🚀 Conference Pioneer
⚡ Prolific Year (6)
🗃️ Keyword Collector (170)
💎 Century Club (35)
🔥 Unstoppable (9)
Conferences
CVPR (14)
ICCV (10)
AAAI (3)
ACL (2)
ICLR (2)
NIPS (2)
WACV (2)
ECCV (1)
ICML (1)
IJCAI (1)
Keywords
diffusion model (9)
video generation (7)
representation learning (4)
image generation (4)
vision-language model (3)
multimodal learning (3)
reinforcement learning (3)
video editing (3)
text-to-video generation (3)
variational autoencoder (2)
image classification (2)
weakly supervised learning (2)
transfer learning (2)
convolutional neural network (2)
depth estimation (2)
vision transformer (2)
video recognition (2)
image retrieval (2)
contrastive learning (2)
self-supervised learning (2)
Papers
LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation
AAAI 2026
MageBench: Bridging Large Multimodal Models to Agents
WACV 2026
SimRPD: Optimizing Recruitment Proactive Dialogue Agents through Simulator-Based Data Evaluation and Selection
ACL 2026
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
AAAI 2026
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
ICCV 2025
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
ICCV 2025
FaceA-Net: Facial Attribute-Driven ID Preserving Image Generation Network
AAAI 2025
UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval
WACV 2025
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
ICCV 2025
MotionFollower: Editing Video Motion via Score-Guided Diffusion
ICCV 2025
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
ICCV 2025
FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis
CVPR 2025
StableAnimator: High-Quality Identity-Preserving Human Image Animation
CVPR 2025
HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion
CVPR 2025
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
NIPS 2024
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions
NIPS 2024
MotionEditor: Editing Video Motion via Content-Aware Diffusion
CVPR 2024
SimDA: Simple Diffusion Adapter for Efficient Video Generation
CVPR 2024
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
CVPR 2024
BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition
CVPR 2024
SVFormer: Semi-Supervised Video Transformer for Action Recognition
CVPR 2023
ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules
ICCV 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
ICCV 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
ICCV 2023
HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer
ICLR 2023
ResFormer: Scaling ViTs With Multi-Resolution Training
CVPR 2023
On Data Scaling in Masked Image Modeling
CVPR 2023
MPII: Multi-Level Mutual Promotion for Inference and Interpretation
ACL 2022
On the Connection between Local Attention and Dynamic Depth-wise Convolution
ICLR 2022
Rethinking Spatial Invariance of Convolutional Networks for Object Counting
CVPR 2022
SimMIM: A Simple Framework for Masked Image Modeling
CVPR 2022
Temporal Action Detection With Multi-Level Supervision
ICCV 2021
Informative Dropout for Robust Representation Learning: A Shape-bias Perspective
ICML 2020
Weakly-Supervised Action Localization by Generative Attention Modeling
CVPR 2020
Deep Incremental Hashing Network for Efficient Image Retrieval
CVPR 2019
Learning Spatial Awareness to Improve Crowd Counting
ICCV 2019
Recurrent Tubelet Proposal and Recognition Networks for Action Detection
ECCV 2018
Optimal Bayesian Hashing for Efficient Face Recognition
IJCAI 2015