Xiangtai Li

57 papers · 2020–2026 · 7 top CS/AI conferences

Achievements

🌍 Conference Polyglot (7) 🏃 Academic Marathon (5) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (5) 🌈 Renaissance Researcher (5) 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (19) 🏆 Grand Slam 👥 Mega-Team (32) 🔬 Deep Specialist (11) 🏆 Keyword Champion (2) 🗃️ Keyword Collector (193) ⚡ Prolific Year (16) ❓ The Questioner (3) 🔥 Unstoppable (6) 💎 Century Club (56)

Conferences

CVPR (16) ECCV (9) ICCV (8) NeurIPS (8) ICLR (7) AAAI (6) ICML (3)

Top co-authors

Yunhai Tong (19) Chen Change Loy (11) Jiangning Zhang (10) Lu Qi (10) Guangliang Cheng (10) Shuicheng Yan (9) Haobo Yuan (9) Kai Chen (8) Yining Li (7) Jianzong Wu (7)

Keywords

semantic segmentation (14) diffusion model (8) image segmentation (6) object detection (5) panoptic segmentation (4) attention mechanism (4) instance segmentation (3) 3D vision (3) vision-language model (3) multimodal large language model (3) state space model (3) convolutional neural network (3) large language model (3) domain generalization (2) image generation (2) image editing (2) feature learning (2) scene graph generation (2) scene graph (2) point cloud (2)

Papers

PointDGRWKV: Generalizing RWKV-like Architecture to Unseen Domains for Point Cloud Classification · AAAI 2026
Unified Dense Prediction of Video Diffusion · CVPR 2025
DreamRelation: Bridging Customization and Relation Generation · CVPR 2025
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model · CVPR 2025
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language · CVPR 2025
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene · CVPR 2025
Towards Semantic Equivalence of Tokenization in Multimodal LLM · ICLR 2025
RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything · ICLR 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer · ICCV 2025
QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing · ICCV 2025
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs · ICCV 2025
Point Cloud Mamba: Point Cloud Learning via State Space Model · AAAI 2025
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers For Motion Transfer · ICCV 2025
On Path to Multimodal Generalist: General-Level and General-Bench · ICML 2025
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation · ICLR 2025
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis · ICLR 2025
RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection · ICLR 2025
Three-Dimensional Trajectory Prediction with 3DMoTraj Dataset · ICML 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video · ICML 2025
PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning · AAAI 2025
Explore In-Context Segmentation via Latent Diffusion Models · AAAI 2025
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation · CVPR 2025
PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model · AAAI 2025
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control · ECCV 2024
MotionBooth: Motion-Aware Customized Text-to-Video Generation · NeurIPS 2024
MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection · NeurIPS 2024
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding · NeurIPS 2024
Synergistic Dual Spatial-aware Generation of Image-to-text and Text-to-image · NeurIPS 2024
SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow · NeurIPS 2024
OMG-Seg: Is One Model Good Enough For All Segmentation? · CVPR 2024
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model · CVPR 2024
Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning · CVPR 2024
RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation · CVPR 2024
Towards Language-Driven Video Inpainting via Multimodal Large Language Models · CVPR 2024
Referring Image Editing: Object-level Image Editing via Referring Expressions · CVPR 2024
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively · ECCV 2024
Improving Video Segmentation via Dynamic Anchor Queries · ECCV 2024
GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning · ECCV 2024
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction · ICLR 2024
Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class-Incremental Learning · ICLR 2023
Explore In-Context Learning for 3D Point Cloud Understanding · NeurIPS 2023
Rethinking Mobile Block for Efficient Attention-based Models · ICCV 2023
4D Panoptic Scene Graph Generation · NeurIPS 2023
Panoptic Video Scene Graph Generation · CVPR 2023
Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation · ICCV 2023
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation · ICCV 2023
PolyphonicFormer: Unified Query Learning for Depth-Aware Video Panoptic Segmentation · ECCV 2022
Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network? · NeurIPS 2022
Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition · ECCV 2022
Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation · ECCV 2022
Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation · CVPR 2022
Enhanced Boundary Learning for Glass-Like Object Segmentation · ICCV 2021
PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation · CVPR 2021
Involution: Inverting the Inherence of Convolution for Visual Recognition · CVPR 2021
Gated Fully Fusion for Semantic Segmentation · AAAI 2020
Semantic Flow for Fast and Accurate Scene Parsing · ECCV 2020
Improving Semantic Segmentation via Decoupled Body and Edge Supervision · ECCV 2020