Xiangtai Li
57 papers · 2020–2026 · 7 top CS/AI conferences
Achievements
🌍 Conference Polyglot (7)
🏃 Academic Marathon (5)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (5)
🌈 Renaissance Researcher (5)
🐣 Hot Topic Early Bird
🤝 Dynamic Duo (19)
🏆 Grand Slam
👥 Mega-Team (32)
🔬 Deep Specialist (11)
🏆 Keyword Champion (2)
🗃️ Keyword Collector (193)
⚡ Prolific Year (16)
❓ The Questioner (3)
🔥 Unstoppable (6)
💎 Century Club (56)
Conferences
CVPR (16)
ECCV (9)
ICCV (8)
NeurIPS (8)
ICLR (7)
AAAI (6)
ICML (3)
Keywords
semantic segmentation (14)
diffusion model (8)
image segmentation (6)
object detection (5)
panoptic segmentation (4)
attention mechanism (4)
instance segmentation (3)
3D vision (3)
vision-language model (3)
multimodal large language model (3)
state space model (3)
convolutional neural network (3)
large language model (3)
domain generalization (2)
image generation (2)
image editing (2)
feature learning (2)
scene graph generation (2)
scene graph (2)
point cloud (2)
Papers
PointDGRWKV: Generalizing RWKV-like Architecture to Unseen Domains for Point Cloud Classification
AAAI 2026
Unified Dense Prediction of Video Diffusion
CVPR 2025
DreamRelation: Bridging Customization and Relation Generation
CVPR 2025
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model
CVPR 2025
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
CVPR 2025
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
CVPR 2025
Towards Semantic Equivalence of Tokenization in Multimodal LLM
ICLR 2025
RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything
ICLR 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
ICCV 2025
QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing
ICCV 2025
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs
ICCV 2025
Point Cloud Mamba: Point Cloud Learning via State Space Model
AAAI 2025
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers For Motion Transfer
ICCV 2025
On Path to Multimodal Generalist: General-Level and General-Bench
ICML 2025
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
ICLR 2025
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
ICLR 2025
RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection
ICLR 2025
Three-Dimensional Trajectory Prediction with 3DMoTraj Dataset
ICML 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
ICML 2025
PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning
AAAI 2025
Explore In-Context Segmentation via Latent Diffusion Models
AAAI 2025
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
CVPR 2025
PointDGMamba: Domain Generalization of Point Cloud Classification via Generalized State Space Model
AAAI 2025
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
ECCV 2024
MotionBooth: Motion-Aware Customized Text-to-Video Generation
NeurIPS 2024
MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection
NeurIPS 2024
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
NeurIPS 2024
Synergistic Dual Spatial-aware Generation of Image-to-text and Text-to-image
NeurIPS 2024
SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow
NeurIPS 2024
OMG-Seg: Is One Model Good Enough For All Segmentation?
CVPR 2024
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
CVPR 2024
Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning
CVPR 2024
RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation
CVPR 2024
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
CVPR 2024
Referring Image Editing: Object-level Image Editing via Referring Expressions
CVPR 2024
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
ECCV 2024
Improving Video Segmentation via Dynamic Anchor Queries
ECCV 2024
GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning
ECCV 2024
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
ICLR 2024
Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class-Incremental Learning
ICLR 2023
Explore In-Context Learning for 3D Point Cloud Understanding
NeurIPS 2023
Rethinking Mobile Block for Efficient Attention-based Models
ICCV 2023
4D Panoptic Scene Graph Generation
NeurIPS 2023
Panoptic Video Scene Graph Generation
CVPR 2023
Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation
ICCV 2023
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
ICCV 2023
PolyphonicFormer: Unified Query Learning for Depth-Aware Video Panoptic Segmentation
ECCV 2022
Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network?
NeurIPS 2022
Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition
ECCV 2022
Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation
ECCV 2022
Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation
CVPR 2022
Enhanced Boundary Learning for Glass-Like Object Segmentation
ICCV 2021
PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
CVPR 2021
Involution: Inverting the Inherence of Convolution for Visual Recognition
CVPR 2021
Gated Fully Fusion for Semantic Segmentation
AAAI 2020
Semantic Flow for Fast and Accurate Scene Parsing
ECCV 2020
Improving Semantic Segmentation via Decoupled Body and Edge Supervision
ECCV 2020