Zuxuan Wu

84 papers · 2016–2026 · 11 top CS/AI conferences

Achievements


🌍 Conference Polyglot (11) 🏃 Academic Marathon (9) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (12)

🐣 Hot Topic Early Bird 🗺️ Taxonomy Completionist (114) 🌉 Interdisciplinary Bridge 🌟 Keyword Trendsetter Combo (3) 🏠 Conference Loyalist (32) 📛 The Namer 🤝 Dynamic Duo (48) 🏆 Grand Slam 👥 Mega-Team (20) 🔬 Deep Specialist (15) 🧬 Topic Evolution 🏆 Keyword Champion (11) 📈 Trend Setter ⚡ Prolific Year (15) 🚀 Conference Pioneer ❓ The Questioner 🔥 Unstoppable (10) 💎 Century Club (82) 🗃️ Keyword Collector (373)

Conferences

CVPR (32) ICCV (15) AAAI (12) ECCV (9) NIPS (9) ICLR (2) ACL (1) EMNLP (1) ICML (1) IJCAI (1) WACV (1)

Top co-authors

Yu-Gang Jiang (49) Larry S. Davis (14) Qi Dai (11) Zhen Xing (11) Xintong Han (10) Jingjing Chen (8) Dongdong Chen (8) Shiyi Lan (8) Junke Wang (7) Ser-Nam Lim (7)

Research topics

Core AI (1)

Keywords

diffusion model (11) video recognition (11) object detection (9) video generation (8) multimodal learning (7) reinforcement learning (7) action recognition (6) contrastive learning (5) adversarial attack (5) adversarial perturbation (5) vision transformer (5) image generation (5) video understanding (5) knowledge distillation (5) semantic segmentation (4) convolutional neural network (4) zero-shot learning (4) video classification (4) multi-modal learning (4) transformer architecture (4)

Papers

DriveSuprim: Towards Precise Trajectory Selection for End-to-End Planning AAAI 2026
Human2Robot: Learning Robot Actions from Paired Human-Robot Videos AAAI 2026
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance ICCV 2025
Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis ICCV 2025
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation ICCV 2025
Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training ICCV 2025
MotionFollower: Editing Video Motion via Score-Guided Diffusion ICCV 2025
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers CVPR 2025
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents ICCV 2025
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks ICCV 2025
ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning EMNLP 2025
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation CVPR 2025
Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning ICCV 2025
Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection AAAI 2025
FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-from-gradients AAAI 2025
FOCUS: Towards Universal Foreground Segmentation AAAI 2025
AdaDiff: Adaptive Step Selection for Fast Diffusion Models AAAI 2025
AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments ACL 2025
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction ICCV 2025
StableAnimator: High-Quality Identity-Preserving Human Image Animation CVPR 2025
Adaptive Retention & Correction: Test-Time Training for Continual Learning ICLR 2025
MotionEditor: Editing Video Motion via Content-Aware Diffusion CVPR 2024
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing ECCV 2024
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation ECCV 2024
DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation ECCV 2024
PromptFusion: Decoupling Stability and Plasticity for Continual Learning ECCV 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs NIPS 2024
Zero-shot High-fidelity and Pose-controllable Character Animation IJCAI 2024
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation NIPS 2024
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms NIPS 2024
GenRec: Unifying Video Generation and Recognition with Diffusion Models NIPS 2024
SimDA: Simple Diffusion Adapter for Efficient Video Generation CVPR 2024
Synthesize Diagnose and Optimize: Towards Fine-Grained Vision-Language Understanding CVPR 2024
BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection CVPR 2024
Learning to Rank Patches for Unbiased Image Redundancy Reduction CVPR 2024
OmniViD: A Generative Framework for Universal Video Understanding CVPR 2024
Vision Transformers Are Good Mask Auto-Labelers CVPR 2023
SVFormer: Semi-Supervised Video Transformer for Action Recognition CVPR 2023
Look Before You Match: Instance Understanding Matters in Video Object Segmentation CVPR 2023
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning CVPR 2023
Enhancing the Self-Universality for Transferable Targeted Attacks CVPR 2023
Prototypical Residual Networks for Anomaly Detection and Localization CVPR 2023
Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization ICML 2023
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding CVPR 2023
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection NIPS 2023
Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation NIPS 2023
Resolving Task Confusion in Dynamic Expansion Architectures for Class Incremental Learning AAAI 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition ICCV 2023
Towards Scalable Neural Representation for Diverse Videos CVPR 2023
ResFormer: Scaling ViTs With Multi-Resolution Training CVPR 2023
ObjectFormer for Image Manipulation Detection and Localization CVPR 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks NIPS 2022
Attacking Video Recognition Models with Bullet-Screen Comments AAAI 2022
Rethinking Pseudo Labels for Semi-supervised Object Detection AAAI 2022
Boosting the Transferability of Video Adversarial Examples via Temporal Translation AAAI 2022
Towards Transferable Adversarial Attacks on Vision Transformers AAAI 2022
Robust Optimization As Data Augmentation for Large-Scale Graphs CVPR 2022
Cross-Modal Transferable Adversarial Attacks From Images to Videos CVPR 2022
BEVT: BERT Pretraining of Video Transformers CVPR 2022
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition CVPR 2022
Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors ECCV 2022
Semi-Supervised Vision Transformers ECCV 2022
Efficient Video Transformers with Spatial-Temporal Token Selection ECCV 2022
M3DETR: Multi-Representation, Multi-Scale, Mutual-Relation 3D Object Detection With Transformers WACV 2022
Intentonomy: A Dataset and Study Towards Human Intent Understanding CVPR 2021
VideoLT: Large-Scale Long-Tailed Video Recognition ICCV 2021
2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition CVPR 2021
Efficient Object Embedding for Spliced Image Retrieval CVPR 2021
Exploring Visual Engagement Signals for Representation Learning ICCV 2021
Encoding Robustness to Image Style via Adversarial Feature Perturbations NIPS 2021
Learning From Noisy Anchors for One-Stage Object Detection CVPR 2020
Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors ECCV 2020
Recognizing Instagram Filtered Images with Feature De-Stylization AAAI 2020
ACE: Adapting to Changing Environments for Semantic Segmentation ICCV 2019
LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition NIPS 2019
The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation CVPR 2019
AdaFrame: Adaptive Frame Selection for Fast Video Recognition CVPR 2019
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation ICLR 2019
FiNet: Compatible and Diverse Fashion Image Inpainting ICCV 2019
VITON: An Image-Based Virtual Try-On Network CVPR 2018
DCAN: Dual Channel-wise Alignment Networks for Unsupervised Scene Adaptation ECCV 2018
BlockDrop: Dynamic Inference Paths in Residual Networks CVPR 2018
Automatic Spatially-Aware Fashion Concept Discovery ICCV 2017
Harnessing Object and Scene Semantics for Large-Scale Video Understanding CVPR 2016