Ziwei Liu
212 papers · 2015–2026 · 11 top CS/AI conferences
Achievements
🏃 Academic Marathon (10)
🌍 Conference Polyglot (11)
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🐝 Cross-Pollinator (11)
🌈 Renaissance Researcher (11)
🏠 Conference Loyalist (22)
🌟 Keyword Trendsetter Combo (4)
🤝 Dynamic Duo (43)
👑 Triple Crown
🏆 Grand Slam
👥 Mega-Team (23)
🌱 Topic Pioneer
🔬 Deep Specialist (34)
🧬 Topic Evolution
🏆 Keyword Champion (28)
📈 Trend Setter
⚡ Prolific Year (25)
🚀 Conference Pioneer
🔥 Unstoppable (11)
❓ The Questioner (4)
💎 Century Club (208)
🗃️ Keyword Collector (698)
Conferences
CVPR (71)
ECCV (38)
ICCV (38)
NIPS (22)
ICLR (21)
AAAI (7)
ACL (7)
ICML (3)
WACV (3)
IJCAI (1)
NAACL (1)
Keywords
diffusion model (28)
image generation (12)
3d reconstruction (11)
multimodal learning (11)
semantic segmentation (10)
human pose estimation (8)
neural radiance field (8)
generative model (8)
video generation (8)
representation learning (8)
novel view synthesis (8)
3d vision (7)
few-shot learning (7)
point cloud (6)
generative adversarial network (6)
contrastive learning (6)
benchmark evaluation (6)
gaussian splatting (5)
autonomous driving (5)
domain adaptation (5)
Papers
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark
ACL 2026
MMSearch-R1: Incentivizing LMMs to Search
ACL 2026
Video-MMMU: Evaluating Knowledge Acquisition from Multidisciplinary Professional Videos
ACL 2026
Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models
AAAI 2026
EgoLife: Towards Egocentric Life Assistant
CVPR 2025
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
CVPR 2025
WildAvatar: Learning In-the-wild 3D Avatars from the Web
CVPR 2025
EgoLM: Multi-Modal Language Model of Egocentric Motions
CVPR 2025
Generative Gaussian Splatting for Unbounded 3D City Generation
CVPR 2025
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
CVPR 2025
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
CVPR 2025
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
CVPR 2025
Material Anything: Generating Materials for Any 3D Object via Diffusion
CVPR 2025
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
ICLR 2025
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
CVPR 2025
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
CVPR 2025
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes
CVPR 2025
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
CVPR 2025
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes
ICLR 2025
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation
ICLR 2025
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
ICLR 2025
SIGMA: Selective Gated Mamba for Sequential Recommendation
AAAI 2025
Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
ACL 2025
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
ACL 2025
Dynamic Parallel Tree Search for Efficient LLM Reasoning
ACL 2025
MMInA: Benchmarking Multihop Multimodal Internet Agents
ACL 2025
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
ICLR 2025
VistaDream: Sampling multiview consistent images for single-view scene reconstruction
ICCV 2025
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
ICCV 2025
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives
ICCV 2025
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
ICCV 2025
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
ICCV 2025
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
ICCV 2025
DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior
ICCV 2025
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
ICCV 2025
Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
ICCV 2025
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
ICCV 2025
GauUpdate: New Object Insertion in 3D Gaussian Fields with Consistent Global Illumination
ICCV 2025
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency
ICCV 2025
Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding
WACV 2025
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
NAACL 2025
Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
ICML 2025
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
CVPR 2025
VBench: Comprehensive Benchmark Suite for Video Generative Models
CVPR 2024
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
CVPR 2024
Link-Context Learning for Multimodal LLMs
CVPR 2024
Digital Life Project: Autonomous 3D Characters with Social Intelligence
CVPR 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
CVPR 2024
Multi-Space Alignments Towards Universal LiDAR Segmentation
CVPR 2024
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
CVPR 2024
InstructVideo: Instructing Video Diffusion Models with Human Feedback
CVPR 2024
VideoBooth: Diffusion-based Video Generation with Image Prompts
CVPR 2024
StructLDM: Structured Latent Diffusion for 3D Human Generation
ECCV 2024
TC4D: Trajectory-Conditioned Text-to-4D Generation
ECCV 2024
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
ECCV 2024
GroupDiff: Diffusion-based Group Portrait Editing
ECCV 2024
WHAC: World-grounded Humans and Cameras
ECCV 2024
Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation
ECCV 2024
Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild
ECCV 2024
ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
ECCV 2024
Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time
ECCV 2024
MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
ECCV 2024
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
CVPR 2024
URHand: Universal Relightable Hands
CVPR 2024
4D Contrastive Superflows are Dense 3D Representation Learners
ECCV 2024
FunQA: Towards Surprising Video Comprehension
ECCV 2024
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
ECCV 2024
AID: Attention Interpolation of Text-to-Image Diffusion
NIPS 2024
Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials
NIPS 2024
Move Anything with Layered Scene Diffusion
CVPR 2024
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
ECCV 2024
MMBENCH: Is Your Multi-Modal Model an All-around Player?
ECCV 2024
Large Motion Model for Unified Multi-Modal Motion Generation
ECCV 2024
FreeInit: Bridging Initialization Gap in Video Diffusion Models
ECCV 2024
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
NIPS 2024
L4GM: Large 4D Gaussian Reconstruction Model
NIPS 2024
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
CVPR 2024
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
CVPR 2024
AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
CVPR 2024
FreeU: Free Lunch in Diffusion U-Net
CVPR 2024
Large-Vocabulary 3D Diffusion Model with Transformer
ICLR 2024
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
ICLR 2024
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
ICLR 2024
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
ICLR 2024
Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment
ICLR 2024
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
ICLR 2024
SinSR: Diffusion-Based Image Super-Resolution in a Single Step
CVPR 2024
SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering
CVPR 2024
Vlogger: Make Your Dream A Vlog
CVPR 2024
Sparse Mixture-of-Experts are Domain Generalizable Learners
ICLR 2023
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars
NIPS 2023
SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation
NIPS 2023
PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation
NIPS 2023
FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
NIPS 2023
Towards Robust and Expressive Whole-body Human Pose and Shape Estimation
NIPS 2023
What Makes Good Examples for Visual In-Context Learning?
NIPS 2023
Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
NIPS 2023
InsActor: Instruction-driven Physics-based Characters
NIPS 2023
4D Panoptic Scene Graph Generation
NIPS 2023
Large Language Models are Visual Reasoning Coordinators
NIPS 2023
Robust Video Portrait Reenactment via Personalized Representation Quantization
AAAI 2023
F2-NeRF: Fast Neural Radiance Field Training With Free Camera Trajectories
CVPR 2023
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generator
CVPR 2023
LaserMix for Semi-Supervised LiDAR Semantic Segmentation
CVPR 2023
Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
CVPR 2023
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation
CVPR 2023
Panoptic Video Scene Graph Generation
CVPR 2023
Detecting and Grounding Multi-Modal Media Manipulation
CVPR 2023
Collaborative Diffusion for Multi-Modal Face Generation and Editing
CVPR 2023
Deep Geometrized Cartoon Line Inbetweening
ICCV 2023
Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing
ICCV 2023
SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling
ICCV 2023
Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
ICCV 2023
DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-Centric Rendering
ICCV 2023
SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis
ICCV 2023
DeformToon3D: Deformable Neural Radiance Fields for 3D Toonification
ICCV 2023
UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation
ICCV 2023
StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces
ICCV 2023
ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model
ICCV 2023
Rethinking Range View Representation for LiDAR Segmentation
ICCV 2023
Text2Performer: Text-Driven Human Video Generation
ICCV 2023
SHERF: Generalizable Human NeRF from a Single Image
ICCV 2023
Masked Frequency Modeling for Self-Supervised Visual Pre-Training
ICLR 2023
DiffMimic: Efficient Motion Mimicking with Differentiable Physics
ICLR 2023
EVA3D: Compositional 3D Human Generation from 2D Image Collections
ICLR 2023
Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction
ICLR 2023
BiBench: Benchmarking and Analyzing Network Binarization
ICML 2023
UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation
ECCV 2022
Benchmarking and Analyzing Point Cloud Classification under Corruptions
ICML 2022
TCTrack: Temporal Contexts for Aerial Tracking
CVPR 2022
OpenOOD: Benchmarking Generalized Out-of-Distribution Detection
NIPS 2022
Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
CVPR 2022
Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms
NIPS 2022
AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies
NIPS 2022
Versatile Multi-Modal Pre-Training for Human-Centric Perception
CVPR 2022
Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts
CVPR 2022
Mind the Gap in Distilling StyleGANs
ECCV 2022
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
ECCV 2022
StyleGAN-Human: A Data-Centric Odyssey of Human Generation
ECCV 2022
StyleLight: HDR Panorama Generation for Lighting Estimation and Editing
ECCV 2022
Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis
ECCV 2022
StyleSwap: Style-Based Generator Empowers Robust Face Swapping
ECCV 2022
Relighting4D: Neural Relightable Human from Videos
ECCV 2022
Panoptic Scene Graph Generation
ECCV 2022
Audio-Driven Co-Speech Gesture Video Generation
NIPS 2022
BiBERT: Accurate Fully Binarized BERT
ICLR 2022
TAda! Temporally-Adaptive Convolutions for Video Understanding
ICLR 2022
Detecting and Recovering Sequential DeepFake Manipulation
ECCV 2022
Unsupervised Image-to-Image Translation With Generative Prior
CVPR 2022
Full-Range Virtual Try-On With Recurrent Tri-Level Transform
CVPR 2022
Conditional Prompt Learning for Vision-Language Models
CVPR 2022
Bailando: 3D Dance Generation by Actor-Critic GPT With Choreographic Memory
CVPR 2022
Balanced MSE for Imbalanced Visual Regression
CVPR 2022
CelebV-HQ: A Large-Scale Video Facial Attributes Dataset
ECCV 2022
Benchmarking Omni-Vision Representation through the Lens of Visual Realms
ECCV 2022
SepFusion: Finding Optimal Fusion Structures for Visual Sound Separation
AAAI 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
AAAI 2022
HuMMan: Multi-modal 4D Human Dataset for Versatile Sensing and Modeling
ECCV 2022
Differentiable Dynamic Wirings for Neural Networks
ICCV 2021
Robust Reference-Based Super-Resolution via C2-Matching
CVPR 2021
Unsupervised Domain Adaptive 3D Detection With Multi-Level Consistency
ICCV 2021
Talk-To-Edit: Fine-Grained Facial Editing via Dialog
ICCV 2021
Incorporating Convolution Designs Into Visual Transformers
ICCV 2021
Semantically Coherent Out-of-Distribution Detection
ICCV 2021
BlockPlanner: City Block Generation With Vectorized Graph Representation
ICCV 2021
Energy-Based Open-World Uncertainty Modeling for Confidence Calibration
ICCV 2021
Garment4D: Garment Reconstruction from Point Cloud Sequences
NIPS 2021
Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation
IJCAI 2021
Person-in-Context Synthesis With Compositional Structural Space
WACV 2021
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
CVPR 2021
Deep Animation Video Interpolation in the Wild
CVPR 2021
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
CVPR 2021
Variational Relational Point Completion Network
CVPR 2021
Seesaw Loss for Long-Tailed Instance Segmentation
CVPR 2021
LiDAR-Based Panoptic Segmentation via Dynamic Shifting Network
CVPR 2021
Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination
CVPR 2021
Adversarial Robustness Under Long-Tailed Distribution
CVPR 2021
Visually Informed Binaural Audio Generation without Binaural Audios
CVPR 2021
Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs
ICLR 2021
Long-tailed Recognition by Routing Diverse Distribution-Aware Experts
ICLR 2021
Few-Shot Object Detection via Association and DIscrimination
NIPS 2021
Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion
NIPS 2021
Unsupervised Object-Level Representation Learning from Scene Images
NIPS 2021
PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention
WACV 2020
CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations
ECCV 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
ECCV 2020
Knowledge Distillation Meets Self-Supervision
ECCV 2020
Online Deep Clustering for Unsupervised Representation Learning
CVPR 2020
When NAS Meets Robustness: In Search of Robust Architectures Against Adversarial Attacks
CVPR 2020
Self-Supervised Scene De-Occlusion
CVPR 2020
Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations
ECCV 2020
Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets
ECCV 2020
Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement
ECCV 2020
Rotate-and-Render: Unsupervised Photorealistic Face Rotation From Single-View Images
CVPR 2020
MaskGAN: Towards Diverse and Interactive Facial Image Manipulation
CVPR 2020
Open Compound Domain Adaptation
CVPR 2020
Instance-Level Facial Attributes Transfer with Geometry-Aware Flow
AAAI 2019
Hybrid Task Cascade for Instance Segmentation
CVPR 2019
Self-Supervised Learning via Conditional Motion Propagation
CVPR 2019
Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild
ICCV 2019
CARAFE: Content-Aware ReAssembly of FEatures
ICCV 2019
Vision-Infused Deep Audio Inpainting
ICCV 2019
Large-Scale Long-Tailed Recognition in an Open World
CVPR 2019
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
AAAI 2019
Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition
ECCV 2018
Adaptive Affinity Fields for Semantic Segmentation
ECCV 2018
Video Frame Synthesis Using Deep Voxel Flow
ICCV 2017
Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade
CVPR 2017
DeepFashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations
CVPR 2016
Semantic Image Segmentation via Deep Parsing Network
ICCV 2015
Deep Learning Face Attributes in the Wild
ICCV 2015