Zhuowen Tu

75 papers · 2013–2026 · 13 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (13) 🏃 Academic Marathon (13) 🌈 Renaissance Researcher (6) 🗺️ Taxonomy Completionist (107)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (29) 🤝 Dynamic Duo (12) 🏆 Keyword Champion (2) 🔬 Deep Specialist (11) 🏆 Grand Slam 🧬 Topic Evolution 💎 Century Club (75) ❓ The Questioner 🔥 Unstoppable (14) 📈 Trend Setter 🗃️ Keyword Collector (303) ⚡ Prolific Year (11) 🚀 Conference Pioneer

Conferences

CVPR (29) ICCV (16) ECCV (5) ICLR (4) NIPS (4) WACV (4) EMNLP (3) ICML (3) AAAI (2) AISTATS (2) ACL (1) IJCAI (1) IJCNLP (1)

Top co-authors

Stefano Soatto (12) Weijian Xu (10) Xiang Zhang (9) Yifan Xu (9) Zheng Ding (9) Zeyuan Chen (8) Saining Xie (5) Justin Lazarow (5) Mingze Xu (4) Zhaowei Cai (4)

Research topics

Techniques (1)

Keywords

image classification (9) convolutional neural network (7) diffusion model (6) generative model (6) instance segmentation (6) representation learning (5) transformer architecture (5) semi-supervised learning (4) object detection (4) multimodal learning (4) panoptic segmentation (4) knowledge distillation (4) vision transformer (3) scene understanding (3) self-supervised learning (3) image segmentation (3) image generation (3) 3d reconstruction (3) 3d vision (3) unsupervised learning (3)

Papers

Gaussian Swaying: Surface-Based Framework for Aerodynamic Simulation with 3D Gaussians · WACV 2026
AuthGuard: Generalizable Deepfake Detection via Language Guidance · WACV 2026
CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning · WACV 2026
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels · CVPR 2025
DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion · ICCV 2025
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation · ICCV 2025
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers · ICCV 2025
On the Scalability of Diffusion-based Text-to-Image Generation · CVPR 2024
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data · CVPR 2024
Non-autoregressive Sequence-to-Sequence Vision-Language Models · CVPR 2024
TokenCompose: Text-to-Image Diffusion with Token-level Supervision · CVPR 2024
Enhancing Vision-Language Pre-training with Rich Supervisions · CVPR 2024
Bayesian Diffusion Models for 3D Shape Reconstruction · CVPR 2024
Restoration by Generation with Constrained Priors · CVPR 2024
Patched Denoising Diffusion Models For High-Resolution Image Synthesis · ICLR 2024
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions · AAAI 2024
When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages · EMNLP 2024
DocKD: Knowledge Distillation from LLMs for Open-World Document Understanding Models · EMNLP 2024
Dolfin: Diffusion Layout Transformers without Autoencoder · ECCV 2024
Open-World Dynamic Prompt and Continual Visual Representation Learning · ECCV 2024
DiffusionRig: Learning Personalized Priors for Facial Appearance Editing · CVPR 2023
Guided Recommendation for Model Fine-Tuning · CVPR 2023
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability · ICCV 2023
MasQCLIP for Open-Vocabulary Universal Image Segmentation · ICCV 2023
Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction · ICCV 2023
Object-Centric Multiple Object Tracking · ICCV 2023
SkeleTR: Towards Skeleton-based Action Recognition in the Wild · ICCV 2023
DocTr: Document Transformer for Structured Information Extraction in Documents · ICCV 2023
Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction · ICCV 2023
On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning · ICLR 2023
Open-Vocabulary Universal Image Segmentation with MaskCLIP · ICML 2023
An In-depth Study of Stochastic Backpropagation · NIPS 2022
X-DETR: A Versatile Architecture for Instance-Wise Vision-Language Tasks · ECCV 2022
The Geometry of Multilingual Language Model Representations · EMNLP 2022
Instance Segmentation With Mask-Supervised Polygonal Boundary Transformers · CVPR 2022
Text Spotting Transformers · CVPR 2022
MeMOT: Multi-Object Tracking With Memory · CVPR 2022
Semi-supervised Vision Transformers at Scale · NIPS 2022
ViTGAN: Training GANs with Vision Transformers · ICLR 2022
Co-Scale Conv-Attentional Image Transformers · ICCV 2021
Visual Relationship Detection Using Part-and-Sum Transformers With Composite Queries · ICCV 2021
Exponential Moving Average Normalization for Self-Supervised and Semi-Supervised Learning · CVPR 2021
Attentional Constellation Nets for Few-Shot Learning · ICLR 2021
Long Short-Term Transformer for Online Action Detection · NIPS 2021
Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models · ACL 2021
Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models · IJCNLP 2021
Dual Contradistinctive Generative Autoencoder · CVPR 2021
Line Segment Detection Using Transformers Without Edges · CVPR 2021
Compatibility-Aware Heterogeneous Visual Search · CVPR 2021
Pose Recognition With Cascade Transformers · CVPR 2021
One-Pixel Signature: Characterizing CNN Models for Backdoor Detection · ECCV 2020
Local Binary Pattern Networks · WACV 2020
Guided Variational Autoencoder for Disentanglement Learning · CVPR 2020
Recognizing Objects From Any View With Object and Viewer-Centered Representations · CVPR 2020
Learning Instance Occlusion for Panoptic Segmentation · CVPR 2020
3D Volumetric Modeling with Introspective Neural Networks · AAAI 2019
Attentional ShapeContextNet for Point Cloud Recognition · CVPR 2018
Wasserstein Introspective Neural Networks · CVPR 2018
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification · ECCV 2018
Deep Convolutional Neural Networks with Merge-and-Run Mappings · IJCAI 2018
Deeply Supervised Salient Object Detection With Short Connections · CVPR 2017
Aggregated Residual Transformations for Deep Neural Networks · CVPR 2017
Introspective Neural Networks for Generative Modeling · ICCV 2017
Introspective Classification with Convolutional Nets · NIPS 2017
Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree · AISTATS 2016
Deeply-Supervised Nets · AISTATS 2015
Holistically-Nested Edge Detection · ICCV 2015
MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation · CVPR 2014
Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification · ICCV 2013
Action Recognition with Actons · ICCV 2013
Sparse Subspace Denoising for Image Manifolds · CVPR 2013
Robust Estimation of Nonrigid Transformation for Point Set Registration · CVPR 2013
Harvesting Mid-level Visual Concepts from Large-Scale Internet Images · CVPR 2013
Max-Margin Multiple-Instance Dictionary Learning · ICML 2013
Fixed-Point Model For Structured Labeling · ICML 2013