Xihui Liu
61 papers · 2017–2026 · 8 top CS/AI conferences
Achievements
🏃 Academic Marathon (8)
🌍 Conference Polyglot (7)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (10)
🌈 Renaissance Researcher (8)
🏆 Grand Slam
👑 Triple Crown
🤝 Dynamic Duo (10)
🔬 Deep Specialist (17)
🧬 Topic Evolution
🏆 Keyword Champion (2)
🗃️ Keyword Collector (234)
🔥 Unstoppable (5)
🚀 Conference Pioneer
💎 Century Club (59)
⚡ Prolific Year (17)
Conferences
CVPR (19)
ICCV (14)
NIPS (11)
ECCV (7)
ICML (3)
WACV (3)
AAAI (2)
ICLR (2)
Top co-authors
Research topics
Keywords
diffusion model (12)
image generation (10)
multimodal large language model (4)
vision-language model (4)
video generation (4)
representation learning (4)
point cloud (4)
compositional generation (3)
3d representation learning (3)
benchmark evaluation (3)
semantic segmentation (3)
autoregressive model (3)
text-to-image generation (3)
depth estimation (3)
contrastive learning (3)
transfer learning (3)
foundation model (3)
image editing (3)
transformer architecture (2)
3d vision (2)
Papers
Self-NPO: Data-Free Diffusion Model Enhancement via Truncated Diffusion Fine-Tuning
AAAI 2026
GENMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
AAAI 2026
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities
ICCV 2025
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
ICCV 2025
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
ICCV 2025
GameFactory: Creating New Games with Generative Interactive Videos
ICCV 2025
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
ICCV 2025
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
ICCV 2025
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
ICCV 2025
DreamCube: RGB-D Panorama Generation via Multi-plane Synchronization
ICCV 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
ICCV 2025
WorldSimBench: Towards Video Generation Models as World Simulators
ICML 2025
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
CVPR 2025
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
CVPR 2025
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
CVPR 2025
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
CVPR 2025
Parallelized Autoregressive Visual Generation
CVPR 2025
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
CVPR 2025
UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation
ICML 2025
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
ICLR 2025
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
ICCV 2025
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
CVPR 2024
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
CVPR 2024
Point Transformer V3: Simpler Faster Stronger
CVPR 2024
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities
ECCV 2024
TC4D: Trajectory-Conditioned Text-to-4D Generation
ECCV 2024
FiT: Flexible Vision Transformer for Diffusion Model
ICML 2024
PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
ECCV 2024
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
NIPS 2024
Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation
WACV 2024
Shape-Guided Diffusion With Inside-Outside Attention
WACV 2024
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
ICLR 2024
Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation
NIPS 2024
4Diffusion: Multi-view Video Diffusion Model for 4D Generation
NIPS 2024
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions
NIPS 2024
BEACON: Benchmark for Comprehensive RNA Tasks and Language Models
NIPS 2024
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
CVPR 2024
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training
CVPR 2024
Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
CVPR 2023
Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images
NIPS 2023
CorresNeRF: Image Correspondence Priors for Neural Radiance Fields
NIPS 2023
OV-PARTS: Towards Open-Vocabulary Part Segmentation
NIPS 2023
T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
NIPS 2023
GLeaD: Improving GANs With a Generator-Leading Task
CVPR 2023
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer
CVPR 2023
Back to the Source: Diffusion-Driven Adaptation To Test-Time Corruption
CVPR 2023
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge
CVPR 2023
DDP: Diffusion Model for Dense Visual Prediction
ICCV 2023
More Control for Free! Image Synthesis With Semantic Diffusion Guidance
WACV 2023
Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
NIPS 2022
Bridging Video-Text Retrieval With Multiple Choice Questions
CVPR 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval
ECCV 2022
Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions
ECCV 2020
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
ICCV 2019
Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis
NIPS 2019
Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing
CVPR 2019
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association
ECCV 2018
Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data
ECCV 2018
HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis
ICCV 2017
Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-Identification
ICCV 2017
Object Detection in Videos With Tubelet Proposal Networks
CVPR 2017