Ziwei Liu

212 papers · 2015–2026 · 11 top CS/AI conferences

Achievements

🏃 Academic Marathon (10) 🌍 Conference Polyglot (11) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (11) 🌈 Renaissance Researcher (11) 🏠 Conference Loyalist (22) 🌟 Keyword Trendsetter Combo (4) 🤝 Dynamic Duo (43) 👑 Triple Crown 🏆 Grand Slam 👥 Mega-Team (23) 🌱 Topic Pioneer 🔬 Deep Specialist (34) 🧬 Topic Evolution 🏆 Keyword Champion (28) 📈 Trend Setter ⚡ Prolific Year (25) 🚀 Conference Pioneer 🔥 Unstoppable (11) ❓ The Questioner (4) 💎 Century Club (208) 🗃️ Keyword Collector (698)

Conferences

CVPR (71) ECCV (38) ICCV (38) NIPS (22) ICLR (21) AAAI (7) ACL (7) ICML (3) WACV (3) IJCAI (1) NAACL (1)

Top co-authors

Chen Change Loy (43) Dahua Lin (34) Liang Pan (29) Zhongang Cai (22) Lei Yang (21) Fangzhou Hong (20) Hang Zhou (16) Jingkang Yang (15) Bo Li (14) Tong Wu (14)

Research topics

Representation (1) Core AI (1)

Keywords

diffusion model (28) image generation (12) 3d reconstruction (11) multimodal learning (11) semantic segmentation (10) human pose estimation (8) neural radiance field (8) generative model (8) video generation (8) representation learning (8) novel view synthesis (8) 3d vision (7) few-shot learning (7) point cloud (6) generative adversarial network (6) contrastive learning (6) benchmark evaluation (6) gaussian splatting (5) autonomous driving (5) domain adaptation (5)

Papers

Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark · ACL 2026
MMSearch-R1: Incentivizing LMMs to Search · ACL 2026
Video-MMMU: Evaluating Knowledge Acquisition from Multidisciplinary Professional Videos · ACL 2026
Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models · AAAI 2026
EgoLife: Towards Egocentric Life Assistant · CVPR 2025
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers · CVPR 2025
WildAvatar: Learning In-the-wild 3D Avatars from the Web · CVPR 2025
EgoLM: Multi-Modal Language Model of Egocentric Motions · CVPR 2025
Generative Gaussian Splatting for Unbounded 3D City Generation · CVPR 2025
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D · CVPR 2025
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models · CVPR 2025
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion · CVPR 2025
Material Anything: Generating Materials for Any 3D Object via Diffusion · CVPR 2025
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality · ICLR 2025
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters · CVPR 2025
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation · CVPR 2025
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes · CVPR 2025
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image · CVPR 2025
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes · ICLR 2025
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation · ICLR 2025
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution · ICLR 2025
SIGMA: Selective Gated Mamba for Sequential Recommendation · AAAI 2025
Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models · ACL 2025
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models · ACL 2025
Dynamic Parallel Tree Search for Efficient LLM Reasoning · ACL 2025
MMInA: Benchmarking Multihop Multimodal Internet Agents · ACL 2025
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion · ICLR 2025
VistaDream: Sampling multiview consistent images for single-view scene reconstruction · ICCV 2025
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion · ICCV 2025
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives · ICCV 2025
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models · ICCV 2025
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model · ICCV 2025
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding · ICCV 2025
DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior · ICCV 2025
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers · ICCV 2025
Dual-Expert Consistency Model for Efficient and High-Quality Video Generation · ICCV 2025
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography · ICCV 2025
GauUpdate: New Object Insertion in 3D Gaussian Fields with Consistent Global Illumination · ICCV 2025
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency · ICCV 2025
Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding · WACV 2025
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models · NAACL 2025
Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM · ICML 2025
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion · CVPR 2025
VBench: Comprehensive Benchmark Suite for Video Generative Models · CVPR 2024
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation · CVPR 2024
Link-Context Learning for Multimodal LLMs · CVPR 2024
Digital Life Project: Autonomous 3D Characters with Social Intelligence · CVPR 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation · CVPR 2024
Multi-Space Alignments Towards Universal LiDAR Segmentation · CVPR 2024
Towards Language-Driven Video Inpainting via Multimodal Large Language Models · CVPR 2024
InstructVideo: Instructing Video Diffusion Models with Human Feedback · CVPR 2024
VideoBooth: Diffusion-based Video Generation with Image Prompts · CVPR 2024
StructLDM: Structured Latent Diffusion for 3D Human Generation · ECCV 2024
TC4D: Trajectory-Conditioned Text-to-4D Generation · ECCV 2024
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer · ECCV 2024
GroupDiff: Diffusion-based Group Portrait Editing · ECCV 2024
WHAC: World-grounded Humans and Cameras · ECCV 2024
Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation · ECCV 2024
Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild · ECCV 2024
ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance · ECCV 2024
Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time · ECCV 2024
MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo · ECCV 2024
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos · CVPR 2024
URHand: Universal Relightable Hands · CVPR 2024
4D Contrastive Superflows are Dense 3D Representation Learners · ECCV 2024
FunQA: Towards Surprising Video Comprehension · ECCV 2024
Octopus: Embodied Vision-Language Programmer from Environmental Feedback · ECCV 2024
AID: Attention Interpolation of Text-to-Image Diffusion · NIPS 2024
Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials · NIPS 2024
Move Anything with Layered Scene Diffusion · CVPR 2024
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation · ECCV 2024
MMBENCH: Is Your Multi-Modal Model an All-around Player? · ECCV 2024
Large Motion Model for Unified Multi-Modal Motion Generation · ECCV 2024
FreeInit: Bridging Initialization Gap in Video Diffusion Models · ECCV 2024
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models · NIPS 2024
L4GM: Large 4D Gaussian Reconstruction Model · NIPS 2024
CityDreamer: Compositional Generative Model of Unbounded 3D Cities · CVPR 2024
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting · CVPR 2024
AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation · CVPR 2024
FreeU: Free Lunch in Diffusion U-Net · CVPR 2024
Large-Vocabulary 3D Diffusion Model with Transformer · ICLR 2024
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion · ICLR 2024
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling · ICLR 2024
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation · ICLR 2024
Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment · ICLR 2024
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation · ICLR 2024
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction · ICLR 2024
SinSR: Diffusion-Based Image Super-Resolution in a Single Step · CVPR 2024
SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering · CVPR 2024
Vlogger: Make Your Dream A Vlog · CVPR 2024
Sparse Mixture-of-Experts are Domain Generalizable Learners · ICLR 2023
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars · NIPS 2023
SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation · NIPS 2023
PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation · NIPS 2023
FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing · NIPS 2023
Towards Robust and Expressive Whole-body Human Pose and Shape Estimation · NIPS 2023
What Makes Good Examples for Visual In-Context Learning? · NIPS 2023
Segment Any Point Cloud Sequences by Distilling Vision Foundation Models · NIPS 2023
InsActor: Instruction-driven Physics-based Characters · NIPS 2023
4D Panoptic Scene Graph Generation · NIPS 2023
Large Language Models are Visual Reasoning Coordinators · NIPS 2023
Robust Video Portrait Reenactment via Personalized Representation Quantization · AAAI 2023
F2-NeRF: Fast Neural Radiance Field Training With Free Camera Trajectories · CVPR 2023
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generator · CVPR 2023
LaserMix for Semi-Supervised LiDAR Semantic Segmentation · CVPR 2023
Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation · CVPR 2023
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation · CVPR 2023
Panoptic Video Scene Graph Generation · CVPR 2023
Detecting and Grounding Multi-Modal Media Manipulation · CVPR 2023
Collaborative Diffusion for Multi-Modal Face Generation and Editing · CVPR 2023
Deep Geometrized Cartoon Line Inbetweening · ICCV 2023
Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing · ICCV 2023
SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling · ICCV 2023
Robo3D: Towards Robust and Reliable 3D Perception against Corruptions · ICCV 2023
DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-Centric Rendering · ICCV 2023
SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis · ICCV 2023
DeformToon3D: Deformable Neural Radiance Fields for 3D Toonification · ICCV 2023
UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation · ICCV 2023
StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces · ICCV 2023
ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model · ICCV 2023
Rethinking Range View Representation for LiDAR Segmentation · ICCV 2023
Text2Performer: Text-Driven Human Video Generation · ICCV 2023
SHERF: Generalizable Human NeRF from a Single Image · ICCV 2023
Masked Frequency Modeling for Self-Supervised Visual Pre-Training · ICLR 2023
DiffMimic: Efficient Motion Mimicking with Differentiable Physics · ICLR 2023
EVA3D: Compositional 3D Human Generation from 2D Image Collections · ICLR 2023
Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction · ICLR 2023
BiBench: Benchmarking and Analyzing Network Binarization · ICML 2023
UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation · ECCV 2022
Benchmarking and Analyzing Point Cloud Classification under Corruptions · ICML 2022
TCTrack: Temporal Contexts for Aerial Tracking · CVPR 2022
OpenOOD: Benchmarking Generalized Out-of-Distribution Detection · NIPS 2022
Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer · CVPR 2022
Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms · NIPS 2022
AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies · NIPS 2022
Versatile Multi-Modal Pre-Training for Human-Centric Perception · CVPR 2022
Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts · CVPR 2022
Mind the Gap in Distilling StyleGANs · ECCV 2022
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation · ECCV 2022
StyleGAN-Human: A Data-Centric Odyssey of Human Generation · ECCV 2022
StyleLight: HDR Panorama Generation for Lighting Estimation and Editing · ECCV 2022
Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis · ECCV 2022
StyleSwap: Style-Based Generator Empowers Robust Face Swapping · ECCV 2022
Relighting4D: Neural Relightable Human from Videos · ECCV 2022
Panoptic Scene Graph Generation · ECCV 2022
Audio-Driven Co-Speech Gesture Video Generation · NIPS 2022
BiBERT: Accurate Fully Binarized BERT · ICLR 2022
TAda! Temporally-Adaptive Convolutions for Video Understanding · ICLR 2022
Detecting and Recovering Sequential DeepFake Manipulation · ECCV 2022
Unsupervised Image-to-Image Translation With Generative Prior · CVPR 2022
Full-Range Virtual Try-On With Recurrent Tri-Level Transform · CVPR 2022
Conditional Prompt Learning for Vision-Language Models · CVPR 2022
Bailando: 3D Dance Generation by Actor-Critic GPT With Choreographic Memory · CVPR 2022
Balanced MSE for Imbalanced Visual Regression · CVPR 2022
CelebV-HQ: A Large-Scale Video Facial Attributes Dataset · ECCV 2022
Benchmarking Omni-Vision Representation through the Lens of Visual Realms · ECCV 2022
SepFusion: Finding Optimal Fusion Structures for Visual Sound Separation · AAAI 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing · AAAI 2022
HuMMan: Multi-modal 4D Human Dataset for Versatile Sensing and Modeling · ECCV 2022
Differentiable Dynamic Wirings for Neural Networks · ICCV 2021
Robust Reference-Based Super-Resolution via C2-Matching · CVPR 2021
Unsupervised Domain Adaptive 3D Detection With Multi-Level Consistency · ICCV 2021
Talk-To-Edit: Fine-Grained Facial Editing via Dialog · ICCV 2021
Incorporating Convolution Designs Into Visual Transformers · ICCV 2021
Semantically Coherent Out-of-Distribution Detection · ICCV 2021
BlockPlanner: City Block Generation With Vectorized Graph Representation · ICCV 2021
Energy-Based Open-World Uncertainty Modeling for Confidence Calibration · ICCV 2021
Garment4D: Garment Reconstruction from Point Cloud Sequences · NIPS 2021
Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation · IJCAI 2021
Person-in-Context Synthesis With Compositional Structural Space · WACV 2021
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation · CVPR 2021
Deep Animation Video Interpolation in the Wild · CVPR 2021
ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis · CVPR 2021
Variational Relational Point Completion Network · CVPR 2021
Seesaw Loss for Long-Tailed Instance Segmentation · CVPR 2021
LiDAR-Based Panoptic Segmentation via Dynamic Shifting Network · CVPR 2021
Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination · CVPR 2021
Adversarial Robustness Under Long-Tailed Distribution · CVPR 2021
Visually Informed Binaural Audio Generation without Binaural Audios · CVPR 2021
Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs · ICLR 2021
Long-tailed Recognition by Routing Diverse Distribution-Aware Experts · ICLR 2021
Few-Shot Object Detection via Association and DIscrimination · NIPS 2021
Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion · NIPS 2021
Unsupervised Object-Level Representation Learning from Scene Images · NIPS 2021
PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention · WACV 2020
CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations · ECCV 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation · ECCV 2020
Knowledge Distillation Meets Self-Supervision · ECCV 2020
Online Deep Clustering for Unsupervised Representation Learning · CVPR 2020
When NAS Meets Robustness: In Search of Robust Architectures Against Adversarial Attacks · CVPR 2020
Self-Supervised Scene De-Occlusion · CVPR 2020
Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations · ECCV 2020
Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets · ECCV 2020
Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement · ECCV 2020
Rotate-and-Render: Unsupervised Photorealistic Face Rotation From Single-View Images · CVPR 2020
MaskGAN: Towards Diverse and Interactive Facial Image Manipulation · CVPR 2020
Open Compound Domain Adaptation · CVPR 2020
Instance-Level Facial Attributes Transfer with Geometry-Aware Flow · AAAI 2019
Hybrid Task Cascade for Instance Segmentation · CVPR 2019
Self-Supervised Learning via Conditional Motion Propagation · CVPR 2019
Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild · ICCV 2019
CARAFE: Content-Aware ReAssembly of FEatures · ICCV 2019
Vision-Infused Deep Audio Inpainting · ICCV 2019
Large-Scale Long-Tailed Recognition in an Open World · CVPR 2019
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation · AAAI 2019
Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition · ECCV 2018
Adaptive Affinity Fields for Semantic Segmentation · ECCV 2018
Video Frame Synthesis Using Deep Voxel Flow · ICCV 2017
Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade · CVPR 2017
DeepFashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations · CVPR 2016
Semantic Image Segmentation via Deep Parsing Network · ICCV 2015
Deep Learning Face Attributes in the Wild · ICCV 2015