Xiang Bai
121 papers · 2008–2026 · 12 top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (20)
Interdisciplinary Bridge
Renaissance Researcher (6)
Hot Topic Early Bird
Keyword Trendsetter Combo (4)
Conference Loyalist (20)
Topic Pioneer
Deep Specialist (28)
Dynamic Duo (17)
Keyword Champion (8)
Grand Slam
Trend Setter
Century Club (115)
Keyword Collector (53)
Conference Pioneer
Prolific Year (17)
Unstoppable (15)
The Questioner
Conferences
CVPR (43)
ECCV (24)
ICCV (20)
AAAI (12)
NIPS (9)
ACL (4)
EMNLP (2)
ICML (2)
IJCAI (2)
ICLR (1)
MICCAI (1)
WACV (1)
Keywords
object detection (13)
semantic segmentation (12)
text detection (9)
scene text detection (8)
scene text (8)
convolutional neural network (7)
multimodal learning (7)
scene text recognition (6)
3d object detection (6)
transfer learning (6)
document understanding (5)
image segmentation (5)
vision-language model (5)
text recognition (5)
semi-supervised learning (5)
point cloud (4)
instance segmentation (4)
neural network (4)
multimodal large language model (4)
text spotting (4)
Papers
StreamKV: Streaming Video Question-Answering with Segment-based KV Cache Retrieval and Compression
AAAI 2026
I2E: From Image Pixels to Actionable Interactive Environments for Text-Guided Image Editing
ACL 2026
AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale
AAAI 2026
Doc-V*: Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA
ACL 2026
Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution
AAAI 2026
OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward
AAAI 2026
VIP: Vision Instructed Pre-training for Robotic Manipulation
ICML 2025
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid
ICLR 2025
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
ICCV 2025
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
CVPR 2025
A Unified Image-Dense Annotation Generation Model for Underwater Scenes
CVPR 2025
MINIMA: Modality Invariant Image Matching
CVPR 2025
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
ACL 2025
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
ICCV 2025
Training-free Geometric Image Editing on Diffusion Models
ICCV 2025
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
ICCV 2025
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
ICCV 2025
Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method
ICCV 2025
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
ICCV 2025
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
ICCV 2025
Multi-scenario Overlapping Text Segmentation with Depth Awareness
ICCV 2025
Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval
ICCV 2025
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
ICCV 2025
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?
EMNLP 2025
Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning
EMNLP 2025
PathVG: A New Benchmark and Dataset for Pathology Visual Grounding
MICCAI 2025
LION: Linear Group RNN for 3D Object Detection in Point Clouds
NIPS 2024
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
CVPR 2024
General Object Foundation Model for Images and Videos at Scale
CVPR 2024
Bridging the Gap Between End-to-End and Two-Step Text Spotting
CVPR 2024
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
CVPR 2024
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
CVPR 2024
PointMamba: A Simple State Space Model for Point Cloud Analysis
NIPS 2024
A Unified Framework for 3D Scene Understanding
NIPS 2024
Deciphering Oracle Bone Language with Diffusion Models
ACL 2024
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
ECCV 2024
WAS: Dataset and Methods for Artistic Text Segmentation
ECCV 2024
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
ECCV 2024
PSALM: Pixelwise Segmentation with Large Multi-modal Model
ECCV 2024
OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
ECCV 2024
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
ECCV 2024
SEED: A Simple and Effective 3D DETR in Point Clouds
ECCV 2024
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
NIPS 2024
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
ICCV 2023
StereoDistill: Pick the Cream from LiDAR for Distilling Stereo-Based 3D Object Detection
AAAI 2023
Query-based Temporal Fusion with Explicit Motion for 3D Object Detection
NIPS 2023
Modeling Entities As Semantic Points for Visual Information Extraction in the Wild
CVPR 2023
InstMove: Instance Motion for Object-Centric Video Segmentation
CVPR 2023
Turning a CLIP Model Into a Scene Text Detector
CVPR 2023
CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
CVPR 2023
Side Adapter Network for Open-Vocabulary Semantic Segmentation
CVPR 2023
SOOD: Towards Semi-Supervised Oriented Object Detection
CVPR 2023
CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model
CVPR 2023
A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection
ICCV 2023
Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition
ECCV 2022
Knowledge Mining With Scene Text for Fine-Grained Recognition
CVPR 2022
An Empirical Study of End-to-End Temporal Action Detection
CVPR 2022
Vision-Language Pre-Training for Boosting Scene Text Detectors
CVPR 2022
Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection
CVPR 2022
Syntax-Aware Network for Handwritten Mathematical Expression Recognition
CVPR 2022
An End-to-End Transformer Model for Crowd Localization
ECCV 2022
GitNet: Geometric Prior-Based Transformation for Birds-Eye-View Segmentation
ECCV 2022
CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer
ECCV 2022
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition
ECCV 2022
Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning
ECCV 2022
SeqFormer: Sequential Transformer for Video Instance Segmentation
ECCV 2022
In Defense of Online Models for Video Instance Segmentation
ECCV 2022
A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-Language Model
ECCV 2022
Bootstrap Your Object Detector via Mixed Training
NIPS 2021
End-to-End Semi-Supervised Object Detection With Soft Teacher
ICCV 2021
Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship
CVPR 2021
Scene Text Retrieval via Joint Text Detection and Similarity Learning
CVPR 2021
WDNet: Watermark-Decomposition Network for Visible Watermark Removal
WACV 2021
Multi-Shot Temporal Event Localization: A Benchmark
CVPR 2021
MOST: A Multi-Oriented Scene Text Detector With Localization Refinement
CVPR 2021
FaceController: Controllable Attribute Editing for Face in the Wild
AAAI 2021
EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection
ECCV 2020
AutoSTR: Efficient Backbone Search for Scene Text Recognition
ECCV 2020
TANet: Robust 3D Object Detection from Point Clouds with Triple Attention
AAAI 2020
Real-Time Scene Text Detection with Differentiable Binarization
AAAI 2020
Semantically Multi-Modal Image Synthesis
CVPR 2020
Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation
CVPR 2020
All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting
AAAI 2020
TextScanner: Reading Characters in Order for Robust Scene Text Recognition
AAAI 2020
Intra-class Feature Variation Distillation for Semantic Segmentation
ECCV 2020
Scene Text Image Super-Resolution in the Wild
ECCV 2020
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting
ECCV 2020
Progressive Pose Attention Transfer for Person Image Generation
CVPR 2019
Human-Like Delicate Region Erasing Strategy for Weakly Supervised Detection
AAAI 2019
DeepFlux for Skeletons in the Wild
CVPR 2019
Asymmetric Non-Local Neural Networks for Semantic Segmentation
ICCV 2019
View N-Gram Network for 3D Object Retrieval
ICCV 2019
Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting
ICCV 2019
Symmetry-Constrained Rectification Network for Scene Text Recognition
ICCV 2019
Scene Text Recognition from Two-Dimensional Perspective
AAAI 2019
DOTA: A Large-Scale Dataset for Object Detection in Aerial Images
CVPR 2018
Triplet-Center Loss for Multi-View 3D Object Retrieval
CVPR 2018
Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes
ECCV 2018
Adaptively Transforming Graph Matching
ECCV 2018
Cascaded SR-GAN for Scale-Adaptive Low Resolution Person Re-identification
IJCAI 2018
Hard-Aware Point-to-Set Deep Metric for Person Re-identification
ECCV 2018
Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation
CVPR 2018
Rotation-Sensitive Regression for Oriented Scene Text Detection
CVPR 2018
Dynamic Multi-Task Learning with Convolutional Neural Network
IJCAI 2017
Scalable Person Re-Identification on Supervised Smoothed Manifold
CVPR 2017
Detecting Oriented Text in Natural Images by Linking Segments
CVPR 2017
Multiple Instance Detection Network With Online Instance Classifier Refinement
CVPR 2017
Richer Convolutional Features for Edge Detection
CVPR 2017
Ensemble Diffusion for Retrieval
ICCV 2017
Multi-Oriented Text Detection With Fully Convolutional Networks
CVPR 2016
GIFT: A Real-Time and Scalable 3D Shape Search Engine
CVPR 2016
Robust Scene Text Recognition With Automatic Rectification
CVPR 2016
Object Skeleton Extraction in Natural Images by Fusing Scale-Associated Deep Side Outputs
CVPR 2016
Relaxed Multiple-Instance SVM With Application to Object Discovery
ICCV 2015
Symmetry-Based Text Line Detection in Natural Scenes
CVPR 2015
DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection
CVPR 2015
Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition
CVPR 2014
Max-Margin Multiple-Instance Dictionary Learning
ICML 2013
Fusion with Diffusion for Robust Visual Tracking
NIPS 2012
Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
NIPS 2011
Multiscale Random Fields with Application to Contour Grouping
NIPS 2008