Tao Mei

123 papers · 2015–2026 · 10 top CS/AI conferences

Achievements

🌍 Conference Polyglot (10) 🏃 Academic Marathon (10) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (10)

🌈 Renaissance Researcher (6) 🗺️ Taxonomy Completionist (123) 🏠 Conference Loyalist (56) 🧬 Topic Evolution 🤝 Dynamic Duo (77) 🏆 Keyword Champion (3) 🏆 Grand Slam 🔬 Deep Specialist (19) 🚀 Conference Pioneer 🗃️ Keyword Collector (457) 📈 Trend Setter 🔥 Unstoppable (11) 💎 Century Club (122) ⚡ Prolific Year (17)

Conferences

CVPR (56) ICCV (22) ECCV (18) AAAI (11) IJCAI (6) NIPS (4) ICML (3) ACL (1) EMNLP (1) ICLR (1)

Top co-authors

Ting Yao (78) Yingwei Pan (51) Zhaofan Qiu (30) Yehao Li (23) Fuchen Long (13) Chong-Wah Ngo (13) Wu Liu (12) Jiebo Luo (11) Wei Zhang (9) Hailin Shi (8)

Keywords

representation learning (13) convolutional neural network (13) diffusion model (10) video understanding (9) image captioning (8) action recognition (8) attention mechanism (7) domain adaptation (6) metric learning (6) semantic segmentation (6) image generation (6) video captioning (6) self-supervised learning (6) transfer learning (5) visual question answering (5) contrastive learning (5) object detection (5) recurrent neural network (5) multimodal learning (4) unsupervised learning (4)

Papers

FreeInpaint: Tuning-free Prompt Alignment and Visual Rationality Enhancement in Image Inpainting AAAI 2026
Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots ICML 2025
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion AAAI 2025
Aligning Global Semantics and Local Textures in Generative Video Enhancement ICCV 2025
Denoising Token Prediction in Masked Autoregressive Models ICCV 2025
Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On ICLR 2025
Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction CVPR 2025
MotionPro: A Precise Motion Controller for Image-to-Video Generation CVPR 2025
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution CVPR 2024
DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation ECCV 2024
VideoStudio: Generating Consistent-Content and Multi-Scene Videos ECCV 2024
Improving Virtual Try-On with Garment-focused Diffusion Models ECCV 2024
Improving Text-guided Object Inpainting with Semantic Pre-inpainting ECCV 2024
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation CVPR 2024
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer CVPR 2024
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models CVPR 2024
Boosting Diffusion Models with Moving Average Sampling in Frequency Domain CVPR 2024
Prompt Refinement with Image Pivot for Text-to-Image Generation ACL 2024
TRACE: 5D Temporal Regression of Avatars With Dynamic Cameras in 3D Environments CVPR 2023
Learning Neural Implicit Surfaces with Object-Aware Radiance Fields ICCV 2023
ObjectFusion: Multi-modal 3D Object Detection with Object-Centric Fusion ICCV 2023
PointClustering: Unsupervised Point Cloud Pre-Training Using Transformation Invariance in Clustering CVPR 2023
Modality-Agnostic Debiasing for Single Domain Generalization CVPR 2023
HGNet: Learning Hierarchical Geometry From Points, Edges, and Surfaces CVPR 2023
Semantic-Conditional Diffusion Networks for Image Captioning CVPR 2023
AnchorFormer: Point Cloud Completion From Discriminative Nodes CVPR 2023
Learning To Generate Language-Supervised and Open-Vocabulary Scene Graph Using Pre-Trained Visual-Semantic Space CVPR 2023
Gait Recognition in the Wild With Dense 3D Representations and a Benchmark CVPR 2022
Memory-Augmented Non-Local Attention for Video Super-Resolution CVPR 2022
Putting People in Their Place: Monocular Regression of 3D People in Depth CVPR 2022
Out-of-Distribution Detection via Conditional Kernel Independence Model NIPS 2022
Generalized One-shot Domain Adaptation of Generative Adversarial Networks NIPS 2022
Directional Self-Supervised Learning for Heavy Image Augmentations CVPR 2022
Comprehending and Ordering Semantics for Image Captioning CVPR 2022
Stand-Alone Inter-Frame Attention in Video Models CVPR 2022
MLP-3D: A MLP-Like 3D Architecture With Grouped Time Mixing CVPR 2022
Responsive Listening Head Generation: A Benchmark Dataset and Baseline ECCV 2022
Dynamic Temporal Filtering In Video Models ECCV 2022
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning ECCV 2022
CAViT: Contextual Alignment Vision Transformer for Video Object Re-identification ECCV 2022
SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement ECCV 2022
Exploring Structure-Aware Transformer Over Interaction Proposals for Human-Object Interaction Detection CVPR 2022
Monocular, One-Stage, Regression of Multiple 3D People ICCV 2021
Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration NIPS 2021
Exploiting Relationship for Complex-scene Image Generation AAAI 2021
Weakly Supervised Semantic Segmentation for Large-Scale Point Cloud AAAI 2021
Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network AAAI 2021
SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning AAAI 2021
Dive Into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition CVPR 2021
Boosting Video Representation Learning With Multi-Faceted Integration CVPR 2021
Representing Videos As Discriminative Sub-Graphs for Action Recognition CVPR 2021
Group-aware Label Transfer for Domain Adaptive Person Re-identification CVPR 2021
Action Unit Memory Network for Weakly Supervised Temporal Action Localization CVPR 2021
Condensing a Sequence to One Informative Frame for Video Recognition ICCV 2021
Explainable Person Re-Identification With Attribute-Guided Metric Distillation ICCV 2021
Motion-Focused Contrastive Learning of Video Representations ICCV 2021
A Style and Semantic Memory Mechanism for Domain Generalization ICCV 2021
CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification ICCV 2021
Optimization Planning for 3D ConvNets ICML 2021
Mis-Classified Vector Guided Softmax Loss for Face Recognition AAAI 2020
Learning to Localize Actions from Moments ECCV 2020
Semi-Siamese Training for Shallow Face Learning ECCV 2020
Edge-aware Graph Representation Learning and Reasoning for Face Parsing ECCV 2020
Classes Matter: A Fine-grained Adversarial Approach to Cross-domain Semantic Segmentation ECCV 2020
Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition ECCV 2020
Joint Contrastive Learning with Infinite Possibilities NIPS 2020
Learning the Compositional Visual Coherence for Complementary Recommendations IJCAI 2020
Transferring and Regularizing Prediction for Semantic Segmentation CVPR 2020
Exploring Category-Agnostic Clusters for Open-Set Domain Adaptation CVPR 2020
Look-Into-Object: Self-Supervised Structure Modeling for Object Recognition CVPR 2020
X-Linear Attention Networks for Image Captioning CVPR 2020
Learning a Unified Sample Weighting Network for Object Detection CVPR 2020
Loss Function Search for Face Recognition ICML 2020
A New Dataset and Boundary-Attention Semantic Segmentation for Face Parsing AAAI 2020
Learning Spatio-Temporal Representation With Local and Global Diffusion CVPR 2019
Customizable Architecture Search for Semantic Segmentation CVPR 2019
Destruction and Construction Learning for Fine-Grained Image Recognition CVPR 2019
Social Relation Recognition From Videos via Multi-Scale Spatial-Temporal Reasoning CVPR 2019
Unsupervised Person Image Generation With Semantic Parsing Transformation CVPR 2019
ScratchDet: Training Single-Shot Object Detectors From Scratch CVPR 2019
Transferrable Prototypical Networks for Unsupervised Domain Adaptation CVPR 2019
Gaussian Temporal Awareness Networks for Action Localization CVPR 2019
Hierarchy Parsing for Image Captioning ICCV 2019
Human Mesh Recovery From Monocular Images via a Skeleton-Disentangled Representation ICCV 2019
Relation Distillation Networks for Video Object Detection ICCV 2019
Sampling Wisely: Deep Image Embedding by Top-K Precision Optimization ICCV 2019
Co-Mining: Deep Face Recognition With Noisy Labels ICCV 2019
VrR-VG: Refocusing Visually-Relevant Relationships ICCV 2019
Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation IJCAI 2019
To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression AAAI 2019
Structured Two-Stream Attention Network for Video Question Answering AAAI 2019
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning AAAI 2019
Pointing Novel Objects in Image Captioning CVPR 2019
Deep Attention Neural Tensor Network for Visual Question Answering ECCV 2018
Exploring Visual Relationship for Image Captioning ECCV 2018
DA-GAN: Instance-Level Image Translation by Deep Attention Generative Adversarial Networks CVPR 2018
Memory Matching Networks for One-Shot Image Recognition CVPR 2018
Part-Aligned Bilinear Representations for Person Re-Identification ECCV 2018
Fully Convolutional Adaptation Networks for Semantic Segmentation CVPR 2018
Jointly Localizing and Describing Events for Dense Video Captioning CVPR 2018
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions EMNLP 2018
Recurrent Tubelet Proposal and Recognition Networks for Action Detection ECCV 2018
Boosting Image Captioning With Attributes ICCV 2017
Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition ICCV 2017
Learning Spatio-Temporal Representation With Pseudo-3D Residual Networks ICCV 2017
Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks IJCAI 2017
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects CVPR 2017
Deep Quantization: Encoding Convolutional Activations With Deep Generative Model CVPR 2017
Video Captioning With Transferred Semantic Attributes CVPR 2017
Multi-Level Attention Networks for Visual Question Answering CVPR 2017
Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition CVPR 2017
Joint Detection and Recounting of Abnormal Events by Learning Deep Generic Knowledge ICCV 2017
Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure IJCAI 2016
Highlight Detection With Pairwise Deep Ranking for First-Person Video Summarization CVPR 2016
You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images CVPR 2016
Deep Semantic-Preserving and Ranking-Based Hashing for Image Retrieval IJCAI 2016
Beyond Object Recognition: Visual Sentiment Analysis with Deep Coupled Adjective and Noun Neural Networks IJCAI 2016
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language CVPR 2016
Jointly Modeling Embedding and Translation to Bridge Video and Language CVPR 2016
Learning Query and Image Similarities With Ranking Canonical Correlation Analysis ICCV 2015
Relaxing From Vocabulary: Robust Weakly-Supervised Deep Learning for Vocabulary-Free Image Tagging ICCV 2015
Multi-Task Deep Visual-Semantic Embedding for Video Thumbnail Selection CVPR 2015
Semi-Supervised Domain Adaptation With Subspace Learning for Visual Recognition CVPR 2015