Song Bai

58 papers · 2016–2026 · 9 conferences · across top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🏃 Academic Marathon (10) 🌍 Conference Polyglot (8) 🌈 Renaissance Researcher (8) 🗺️ Taxonomy Completionist (91)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (23) 🏆 Keyword Champion (2) 🧬 Topic Evolution 🤝 Dynamic Duo (17) 🚀 Conference Pioneer 💎 Century Club (57) 🔥 Unstoppable (11) 🗃️ Keyword Collector (223) 📈 Trend Setter ❓ The Questioner ⚡ Prolific Year (10)

Conferences

CVPR (23) ICCV (14) ECCV (11) ICLR (4) AAAI (2) AACL (1) IJCNLP (1) NIPS (1) WACV (1)

Top co-authors

Xiang Bai (17) Wenqing Zhang (9) Chuhui Xue (8) Alan Yuille (7) Philip H.S. Torr (7) Qihao Liu (6) Shijian Lu (5) Yi Jiang (5) Junfeng Wu (5) Yuyin Zhou (4)

Keywords

semantic segmentation (8) object detection (5) instance segmentation (4) video object segmentation (4) video understanding (3) person re-identification (3) neural network (3) 3d object retrieval (3) diffusion model (3) contrastive learning (2) prompt engineering (2) instruction following (2) metric learning (2) object tracking (2) transfer learning (2) representation learning (2) zero-shot learning (2) video generation (2) model compression (2) action recognition (2)

Papers

Learning to Animate Images from A Few Videos to Portray Delicate Human Actions WACV 2026
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding ICCV 2025
Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval ICCV 2025
Structured Outputs in Prompt Engineering: Enhancing LLM Adaptability on Counterintuitive Instructions IJCNLP 2025
Structured Outputs in Prompt Engineering: Enhancing LLM Adaptability on Counterintuitive Instructions AACL 2025
Versatile Transition Generation with Image-to-Video Diffusion ICCV 2025
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing CVPR 2024
Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search ICLR 2024
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects ECCV 2024
Free-ATM: Harnessing Free Attention Masks for Representation Learning on Diffusion-Generated Images ECCV 2024
DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data CVPR 2024
General Object Foundation Model for Images and Videos at Scale CVPR 2024
InstMove: Instance Motion for Object-Centric Video Segmentation CVPR 2023
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding CVPR 2023
Mixed Samples as Probes for Unsupervised Model Selection in Domain Adaptation NIPS 2023
MOSE: A New Dataset for Video Object Segmentation in Complex Scenes ICCV 2023
SRFormer: Permuted Self-Attention for Single Image Super-Resolution ICCV 2023
PV3D: A 3D Generative Model for Portrait Video Generation ICLR 2023
Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning ICLR 2023
Is Synthetic Data from Generative Models Ready for Image Recognition? ICLR 2023
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion CVPR 2022
Explicit Occlusion Reasoning for Multi-Person 3D Human Pose Estimation ECCV 2022
Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting ECCV 2022
Contextual Text Block Detection towards Scene Text Understanding ECCV 2022
SeqFormer: Sequential Transformer for Video Instance Segmentation ECCV 2022
In Defense of Online Models for Video Instance Segmentation ECCV 2022
Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning CVPR 2022
An Empirical Study of End-to-End Temporal Action Detection CVPR 2022
Knowledge Distillation As Efficient Pre-Training: Faster Convergence, Higher Data-Efficiency, and Better Transferability CVPR 2022
Fourier Document Restoration for Robust Document Dewarping and Recognition CVPR 2022
TransMix: Attend To Mix for Vision Transformers CVPR 2022
YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset CVPR 2022
Multi-Shot Temporal Event Localization: A Benchmark CVPR 2021
PlaneTR: Structure-Guided Transformers for 3D Plane Recovery ICCV 2021
SwiftNet: Real-Time Video Object Segmentation CVPR 2021
Anchor-Free Person Search CVPR 2021
Holistically-Attracted Wireframe Parsing CVPR 2020
Neural Architecture Search for Lightweight Non-Local Networks CVPR 2020
Regional Homogeneity: Towards Learning Transferable Universal Adversarial Perturbations Against Defenses ECCV 2020
Importance-Aware Semantic Segmentation in Self-Driving with Discrete Wasserstein Training AAAI 2020
XingGAN for Person Image Generation ECCV 2020
Learning Transferable Adversarial Examples via Ghost Networks AAAI 2020
Corner Proposal Network for Anchor-free, Two-stage Object Detection ECCV 2020
Learning Attraction Field Representation for Robust Line Segment Detection CVPR 2019
Improving Transferability of Adversarial Examples With Input Diversity CVPR 2019
Asymmetric Non-Local Neural Networks for Semantic Segmentation ICCV 2019
Anchor Diffusion for Unsupervised Video Object Segmentation ICCV 2019
CenterNet: Keypoint Triplets for Object Detection ICCV 2019
View N-Gram Network for 3D Object Retrieval ICCV 2019
Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting ICCV 2019
Symmetry-Constrained Rectification Network for Scene Text Recognition ICCV 2019
Prior-Aware Neural Network for Partially-Supervised Multi-Organ Segmentation ICCV 2019
Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification CVPR 2019
Triplet-Center Loss for Multi-View 3D Object Retrieval CVPR 2018
Hard-Aware Point-to-Set Deep Metric for Person Re-identification ECCV 2018
Ensemble Diffusion for Retrieval ICCV 2017
Scalable Person Re-Identification on Supervised Smoothed Manifold CVPR 2017
GIFT: A Real-Time and Scalable 3D Shape Search Engine CVPR 2016