conftrace_

Xiang Bai

121 papers · 2008–2026 · 12 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (20) 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird

🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (20) 🧭 Keyword Pioneer 🌟 Keyword Trendsetter Combo (4) 🏠 Conference Loyalist (20) 🌱 Topic Pioneer 🔬 Deep Specialist (28) 🤝 Dynamic Duo (17) 🏆 Keyword Champion (8) 🏆 Grand Slam 📈 Trend Setter 💎 Century Club (115) 🗃️ Keyword Collector (53) 🚀 Conference Pioneer ⚡ Prolific Year (17) 🔥 Unstoppable (15) ❓ The Questioner

Conferences

CVPR (43) ECCV (24) ICCV (20) AAAI (12) NIPS (9) ACL (4) EMNLP (2) ICML (2) IJCAI (2) ICLR (1) MICCAI (1) WACV (1)

Top co-authors

Dingkang Liang (18) Yuliang Liu (17) Song Bai (17) Cong Yao (15) Xiaoqing Ye (12) Wenyu Liu (10) Zhen Zhu (9) Zhe Liu (9) Yongchao Xu (7) Minghui Liao (7)

Keywords

object detection (13) semantic segmentation (12) text detection (9) scene text detection (8) scene text (8) convolutional neural network (7) multimodal learning (7) scene text recognition (6) 3d object detection (6) transfer learning (6) document understanding (5) image segmentation (5) vision-language model (5) text recognition (5) semi-supervised learning (5) point cloud (4) instance segmentation (4) neural network (4) multimodal large language model (4) text spotting (4)

Papers

StreamKV: Streaming Video Question-Answering with Segment-based KV Cache Retrieval and Compression AAAI 2026
I2E: From Image Pixels to Actionable Interactive Environments for Text-Guided Image Editing ACL 2026
AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale AAAI 2026
Doc-V*: Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA ACL 2026
Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution AAAI 2026
OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward AAAI 2026
VIP: Vision Instructed Pre-training for Robotic Manipulation ICML 2025
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid ICLR 2025
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation ICCV 2025
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting CVPR 2025
A Unified Image-Dense Annotation Generation Model for Underwater Scenes CVPR 2025
MINIMA: Modality Invariant Image Matching CVPR 2025
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering ACL 2025
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance ICCV 2025
Training-free Geometric Image Editing on Diffusion Models ICCV 2025
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding ICCV 2025
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation ICCV 2025
Towards Comprehensive Lecture Slides Understanding: Large-scale Dataset and Effective Method ICCV 2025
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video ICCV 2025
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models ICCV 2025
Multi-scenario Overlapping Text Segmentation with Depth Awareness ICCV 2025
Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval ICCV 2025
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation ICCV 2025
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? EMNLP 2025
Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning EMNLP 2025
PathVG: A New Benchmark and Dataset for Pathology Visual Grounding MICCAI 2025
LION: Linear Group RNN for 3D Object Detection in Point Clouds NIPS 2024
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models CVPR 2024
General Object Foundation Model for Images and Videos at Scale CVPR 2024
Bridging the Gap Between End-to-End and Two-Step Text Spotting CVPR 2024
OmniParser: A Unified Framework for Text Spotting Key Information Extraction and Table Recognition CVPR 2024
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis CVPR 2024
PointMamba: A Simple State Space Model for Point Cloud Analysis NIPS 2024
A Unified Framework for 3D Scene Understanding NIPS 2024
Deciphering Oracle Bone Language with Diffusion Models ACL 2024
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects ECCV 2024
WAS: Dataset and Methods for Artistic Text Segmentation ECCV 2024
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression ECCV 2024
PSALM: Pixelwise Segmentation with Large Multi-modal Model ECCV 2024
OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection ECCV 2024
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer ECCV 2024
SEED: A Simple and Effective 3D DETR in Point Clouds ECCV 2024
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks NIPS 2024
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer ICCV 2023
StereoDistill: Pick the Cream from LiDAR for Distilling Stereo-Based 3D Object Detection AAAI 2023
Query-based Temporal Fusion with Explicit Motion for 3D Object Detection NIPS 2023
Modeling Entities As Semantic Points for Visual Information Extraction in the Wild CVPR 2023
InstMove: Instance Motion for Object-Centric Video Segmentation CVPR 2023
Turning a CLIP Model Into a Scene Text Detector CVPR 2023
CAPE: Camera View Position Embedding for Multi-View 3D Object Detection CVPR 2023
Side Adapter Network for Open-Vocabulary Semantic Segmentation CVPR 2023
SOOD: Towards Semi-Supervised Oriented Object Detection CVPR 2023
CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model CVPR 2023
A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection ICCV 2023
Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition ECCV 2022
Knowledge Mining With Scene Text for Fine-Grained Recognition CVPR 2022
An Empirical Study of End-to-End Temporal Action Detection CVPR 2022
Vision-Language Pre-Training for Boosting Scene Text Detectors CVPR 2022
Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection CVPR 2022
Syntax-Aware Network for Handwritten Mathematical Expression Recognition CVPR 2022
An End-to-End Transformer Model for Crowd Localization ECCV 2022
GitNet: Geometric Prior-Based Transformation for Birds-Eye-View Segmentation ECCV 2022
CCPL: Contrastive Coherence Preserving Loss for Versatile Style Transfer ECCV 2022
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition ECCV 2022
Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning ECCV 2022
SeqFormer: Sequential Transformer for Video Instance Segmentation ECCV 2022
In Defense of Online Models for Video Instance Segmentation ECCV 2022
A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-Language Model ECCV 2022
Bootstrap Your Object Detector via Mixed Training NIPS 2021
End-to-End Semi-Supervised Object Detection With Soft Teacher ICCV 2021
Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship CVPR 2021
Scene Text Retrieval via Joint Text Detection and Similarity Learning CVPR 2021
WDNet: Watermark-Decomposition Network for Visible Watermark Removal WACV 2021
Multi-Shot Temporal Event Localization: A Benchmark CVPR 2021
MOST: A Multi-Oriented Scene Text Detector With Localization Refinement CVPR 2021
FaceController: Controllable Attribute Editing for Face in the Wild AAAI 2021
EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection ECCV 2020
AutoSTR: Efficient Backbone Search for Scene Text Recognition ECCV 2020
TANet: Robust 3D Object Detection from Point Clouds with Triple Attention AAAI 2020
Real-Time Scene Text Detection with Differentiable Binarization AAAI 2020
Semantically Multi-Modal Image Synthesis CVPR 2020
Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation CVPR 2020
All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting AAAI 2020
TextScanner: Reading Characters in Order for Robust Scene Text Recognition AAAI 2020
Intra-class Feature Variation Distillation for Semantic Segmentation ECCV 2020
Scene Text Image Super-resolution in the wild ECCV 2020
Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting ECCV 2020
Progressive Pose Attention Transfer for Person Image Generation CVPR 2019
Human-Like Delicate Region Erasing Strategy for Weakly Supervised Detection AAAI 2019
DeepFlux for Skeletons in the Wild CVPR 2019
Asymmetric Non-Local Neural Networks for Semantic Segmentation ICCV 2019
View N-Gram Network for 3D Object Retrieval ICCV 2019
Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting ICCV 2019
Symmetry-Constrained Rectification Network for Scene Text Recognition ICCV 2019
Scene Text Recognition from Two-Dimensional Perspective AAAI 2019
DOTA: A Large-Scale Dataset for Object Detection in Aerial Images CVPR 2018
Triplet-Center Loss for Multi-View 3D Object Retrieval CVPR 2018
Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes ECCV 2018
Adaptively Transforming Graph Matching ECCV 2018
Cascaded SR-GAN for Scale-Adaptive Low Resolution Person Re-identification IJCAI 2018
Hard-Aware Point-to-Set Deep Metric for Person Re-identification ECCV 2018
Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation CVPR 2018
Rotation-Sensitive Regression for Oriented Scene Text Detection CVPR 2018
Dynamic Multi-Task Learning with Convolutional Neural Network IJCAI 2017
Scalable Person Re-Identification on Supervised Smoothed Manifold CVPR 2017
Detecting Oriented Text in Natural Images by Linking Segments CVPR 2017
Multiple Instance Detection Network With Online Instance Classifier Refinement CVPR 2017
Richer Convolutional Features for Edge Detection CVPR 2017
Ensemble Diffusion for Retrieval ICCV 2017
Multi-Oriented Text Detection With Fully Convolutional Networks CVPR 2016
GIFT: A Real-Time and Scalable 3D Shape Search Engine CVPR 2016
Robust Scene Text Recognition With Automatic Rectification CVPR 2016
Object Skeleton Extraction in Natural Images by Fusing Scale-Associated Deep Side Outputs CVPR 2016
Relaxed Multiple-Instance SVM With Application to Object Discovery ICCV 2015
Symmetry-Based Text Line Detection in Natural Scenes CVPR 2015
DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection CVPR 2015
Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition CVPR 2014
Max-Margin Multiple-Instance Dictionary Learning ICML 2013
Fusion with Diffusion for Robust Visual Tracking NIPS 2012
Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning NIPS 2011
Multiscale Random Fields with Application to Contour Grouping NIPS 2008