Zuxuan Wu
84 papers · 2016–2026 · 11 top CS/AI conferences
Achievements
Conference Polyglot (11)
Academic Marathon (9)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (12)
Hot Topic Early Bird
Taxonomy Completionist (114)
Keyword Trendsetter Combo (3)
Conference Loyalist (32)
The Namer
Dynamic Duo (48)
Grand Slam
Mega-Team (20)
Deep Specialist (15)
Topic Evolution
Keyword Champion (11)
Trend Setter
Prolific Year (15)
Conference Pioneer
The Questioner
Unstoppable (10)
Century Club (82)
Keyword Collector (373)
Conferences
CVPR (32)
ICCV (15)
AAAI (12)
ECCV (9)
NIPS (9)
ICLR (2)
ACL (1)
EMNLP (1)
ICML (1)
IJCAI (1)
WACV (1)
Keywords
diffusion model (11)
video recognition (11)
object detection (9)
video generation (8)
multimodal learning (7)
reinforcement learning (7)
action recognition (6)
contrastive learning (5)
adversarial attack (5)
adversarial perturbation (5)
vision transformer (5)
image generation (5)
video understanding (5)
knowledge distillation (5)
semantic segmentation (4)
convolutional neural network (4)
zero-shot learning (4)
video classification (4)
multi-modal learning (4)
transformer architecture (4)
Papers
DriveSuprim: Towards Precise Trajectory Selection for End-to-End Planning
AAAI 2026
Human2Robot: Learning Robot Actions from Paired Human-Robot Videos
AAAI 2026
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
ICCV 2025
Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis
ICCV 2025
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
ICCV 2025
Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training
ICCV 2025
MotionFollower: Editing Video Motion via Score-Guided Diffusion
ICCV 2025
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
CVPR 2025
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
ICCV 2025
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
ICCV 2025
ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning
EMNLP 2025
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation
CVPR 2025
Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning
ICCV 2025
Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection
AAAI 2025
FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-from-gradients
AAAI 2025
FOCUS: Towards Universal Foreground Segmentation
AAAI 2025
AdaDiff: Adaptive Step Selection for Fast Diffusion Models
AAAI 2025
AgentGym: Evaluating and Training Large Language Model-based Agents across Diverse Environments
ACL 2025
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
ICCV 2025
StableAnimator: High-Quality Identity-Preserving Human Image Animation
CVPR 2025
Adaptive Retention & Correction: Test-Time Training for Continual Learning
ICLR 2025
MotionEditor: Editing Video Motion via Content-Aware Diffusion
CVPR 2024
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
ECCV 2024
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
ECCV 2024
DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
ECCV 2024
PromptFusion: Decoupling Stability and Plasticity for Continual Learning
ECCV 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
NIPS 2024
Zero-shot High-fidelity and Pose-controllable Character Animation
IJCAI 2024
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
NIPS 2024
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
NIPS 2024
GenRec: Unifying Video Generation and Recognition with Diffusion Models
NIPS 2024
SimDA: Simple Diffusion Adapter for Efficient Video Generation
CVPR 2024
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
CVPR 2024
BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection
CVPR 2024
Learning to Rank Patches for Unbiased Image Redundancy Reduction
CVPR 2024
OmniViD: A Generative Framework for Universal Video Understanding
CVPR 2024
Vision Transformers Are Good Mask Auto-Labelers
CVPR 2023
SVFormer: Semi-Supervised Video Transformer for Action Recognition
CVPR 2023
Look Before You Match: Instance Understanding Matters in Video Object Segmentation
CVPR 2023
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning
CVPR 2023
Enhancing the Self-Universality for Transferable Targeted Attacks
CVPR 2023
Prototypical Residual Networks for Anomaly Detection and Localization
CVPR 2023
Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization
ICML 2023
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
CVPR 2023
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection
NIPS 2023
Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation
NIPS 2023
Resolving Task Confusion in Dynamic Expansion Architectures for Class Incremental Learning
AAAI 2023
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
ICCV 2023
Towards Scalable Neural Representation for Diverse Videos
CVPR 2023
ResFormer: Scaling ViTs With Multi-Resolution Training
CVPR 2023
ObjectFormer for Image Manipulation Detection and Localization
CVPR 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
NIPS 2022
Attacking Video Recognition Models with Bullet-Screen Comments
AAAI 2022
Rethinking Pseudo Labels for Semi-supervised Object Detection
AAAI 2022
Boosting the Transferability of Video Adversarial Examples via Temporal Translation
AAAI 2022
Towards Transferable Adversarial Attacks on Vision Transformers
AAAI 2022
Robust Optimization As Data Augmentation for Large-Scale Graphs
CVPR 2022
Cross-Modal Transferable Adversarial Attacks From Images to Videos
CVPR 2022
BEVT: BERT Pretraining of Video Transformers
CVPR 2022
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
CVPR 2022
Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors
ECCV 2022
Semi-Supervised Vision Transformers
ECCV 2022
Efficient Video Transformers with Spatial-Temporal Token Selection
ECCV 2022
M3DETR: Multi-Representation, Multi-Scale, Mutual-Relation 3D Object Detection With Transformers
WACV 2022
Intentonomy: A Dataset and Study Towards Human Intent Understanding
CVPR 2021
VideoLT: Large-Scale Long-Tailed Video Recognition
ICCV 2021
2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition
CVPR 2021
Efficient Object Embedding for Spliced Image Retrieval
CVPR 2021
Exploring Visual Engagement Signals for Representation Learning
ICCV 2021
Encoding Robustness to Image Style via Adversarial Feature Perturbations
NIPS 2021
Learning From Noisy Anchors for One-Stage Object Detection
CVPR 2020
Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors
ECCV 2020
Recognizing Instagram Filtered Images with Feature De-Stylization
AAAI 2020
ACE: Adapting to Changing Environments for Semantic Segmentation
ICCV 2019
LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition
NIPS 2019
The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation
CVPR 2019
AdaFrame: Adaptive Frame Selection for Fast Video Recognition
CVPR 2019
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
ICLR 2019
FiNet: Compatible and Diverse Fashion Image Inpainting
ICCV 2019
VITON: An Image-Based Virtual Try-On Network
CVPR 2018
DCAN: Dual Channel-wise Alignment Networks for Unsupervised Scene Adaptation
ECCV 2018
BlockDrop: Dynamic Inference Paths in Residual Networks
CVPR 2018
Automatic Spatially-Aware Fashion Concept Discovery
ICCV 2017
Harnessing Object and Scene Semantics for Large-Scale Video Understanding
CVPR 2016