Xihui Liu

61 papers · 2017–2026 · 8 top CS/AI conferences

Achievements

🏃 Academic Marathon (8) 🌍 Conference Polyglot (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (10)

🌈 Renaissance Researcher (8) 🏆 Grand Slam 👑 Triple Crown 🤝 Dynamic Duo (10) 🔬 Deep Specialist (17) 🧬 Topic Evolution 🏆 Keyword Champion (2) 🗃️ Keyword Collector (234) 🔥 Unstoppable (5) 🚀 Conference Pioneer 💎 Century Club (59) ⚡ Prolific Year (17)

Conferences

CVPR (19) ICCV (14) NIPS (11) ECCV (7) ICML (3) WACV (3) AAAI (2) ICLR (2)

Top co-authors

Hongsheng Li (10) Xiaogang Wang (9) Jing Shao (9) Lei Bai (6) Hengshuang Zhao (6) Ping Luo (6) Zhenguo Li (6) Wanli Ouyang (6) Xian Liu (6) Lu Sheng (5)

Research topics

Privacy (1)

Keywords

diffusion model (12) image generation (10) multimodal large language model (4) vision-language model (4) video generation (4) representation learning (4) point cloud (4) compositional generation (3) 3d representation learning (3) benchmark evaluation (3) semantic segmentation (3) autoregressive model (3) text-to-image generation (3) depth estimation (3) contrastive learning (3) transfer learning (3) foundation model (3) image editing (3) transformer architecture (2) 3d vision (2)

Papers

Self-NPO: Data-Free Diffusion Model Enhancement via Truncated Diffusion Fine-Tuning AAAI 2026
GENMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration AAAI 2026
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D Capabilities ICCV 2025
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation ICCV 2025
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation ICCV 2025
GameFactory: Creating New Games with Generative Interactive Videos ICCV 2025
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ICCV 2025
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints ICCV 2025
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos ICCV 2025
DreamCube: RGB-D Panorama Generation via Multi-plane Synchronization ICCV 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation ICCV 2025
WorldSimBench: Towards Video Generation Models as World Simulators ICML 2025
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation CVPR 2025
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation CVPR 2025
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation CVPR 2025
MBQ: Modality-Balanced Quantization for Large Vision-Language Models CVPR 2025
Parallelized Autoregressive Visual Generation CVPR 2025
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation CVPR 2025
UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation ICML 2025
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding ICLR 2025
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation ICCV 2025
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI CVPR 2024
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions CVPR 2024
Point Transformer V3: Simpler Faster Stronger CVPR 2024
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities ECCV 2024
TC4D: Trajectory-Conditioned Text-to-4D Generation ECCV 2024
FiT: Flexible Vision Transformer for Diffusion Model ICML 2024
PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines ECCV 2024
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing NIPS 2024
Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation WACV 2024
Shape-Guided Diffusion With Inside-Outside Attention WACV 2024
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion ICLR 2024
Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation NIPS 2024
4Diffusion: Multi-view Video Diffusion Model for 4D Generation NIPS 2024
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions NIPS 2024
BEACON: Benchmark for Comprehensive RNA Tasks and Language Models NIPS 2024
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting CVPR 2024
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training CVPR 2024
Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning CVPR 2023
Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images NIPS 2023
CorresNeRF: Image Correspondence Priors for Neural Radiance Fields NIPS 2023
OV-PARTS: Towards Open-Vocabulary Part Segmentation NIPS 2023
T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation NIPS 2023
GLeaD: Improving GANs With a Generator-Leading Task CVPR 2023
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer CVPR 2023
Back to the Source: Diffusion-Driven Adaptation To Test-Time Corruption CVPR 2023
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge CVPR 2023
DDP: Diffusion Model for Dense Visual Prediction ICCV 2023
More Control for Free! Image Synthesis With Semantic Diffusion Guidance WACV 2023
Point Transformer V2: Grouped Vector Attention and Partition-based Pooling NIPS 2022
Bridging Video-Text Retrieval With Multiple Choice Questions CVPR 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval ECCV 2022
Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions ECCV 2020
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval ICCV 2019
Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis NIPS 2019
Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing CVPR 2019
Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association ECCV 2018
Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data ECCV 2018
HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis ICCV 2017
Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-Identification ICCV 2017
Object Detection in Videos With Tubelet Proposal Networks CVPR 2017