Xiaoshuai Sun

58 papers · 2013–2026 · 9 top CS/AI conferences

Achievements

🌍 Conference Polyglot (9) 🏃 Academic Marathon (12) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (9)

🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (45) 🏆 Grand Slam 🔬 Deep Specialist (15) 🧬 Topic Evolution 🏆 Keyword Champion (2) 📈 Trend Setter 🗃️ Keyword Collector (244) 🚀 Conference Pioneer 🔥 Unstoppable (8) 💎 Century Club (57) ⚡ Prolific Year (8)

Conferences

AAAI (17) CVPR (13) NIPS (9) ECCV (5) ICCV (4) ICML (4) ICLR (3) IJCAI (2) EMNLP (1)

Top co-authors

Rongrong Ji (45) Jiayi Ji (25) Yiyi Zhou (21) Gen Luo (14) Yiwei Ma (13) Yongjian Wu (9) Haowei Wang (9) Feiyue Huang (5) Mingrui Wu (5) Liujuan Cao (5)

Keywords

multimodal learning (10) attention mechanism (7) image captioning (6) semantic segmentation (5) object detection (4) knowledge distillation (4) contrastive learning (4) referring expression (4) multimodal large language model (4) convolutional neural network (4) image segmentation (3) visual question answering (3) referring expression comprehension (3) zero-shot learning (3) vision-language model (3) multi-modal learning (3) image retrieval (3) image generation (3) diffusion model (3) weakly supervised learning (2)

Papers

Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach AAAI 2026
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression CVPR 2025
Towards General Visual-Linguistic Face Forgery Detection CVPR 2025
StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization AAAI 2025
IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation AAAI 2025
$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models ICLR 2025
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models ICLR 2025
Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models ICLR 2025
ACL: Activating Capability of Linear Attention for Image Restoration CVPR 2025
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models ICCV 2025
Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks AAAI 2024
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing NIPS 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models NIPS 2024
DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion NIPS 2024
RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation NIPS 2024
Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation AAAI 2024
X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks AAAI 2024
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation AAAI 2024
Toward Open-Set Human Object Interaction Detection AAAI 2024
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation CVPR 2024
Multi-branch Collaborative Learning Network for 3D Visual Grounding ECCV 2024
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model ECCV 2024
AnyTrans: Translate AnyText in the Image with Large Scale Models EMNLP 2024
X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation ICML 2024
Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models ICML 2024
SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation ICML 2024
Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization ICML 2024
Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network AAAI 2023
RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension CVPR 2023
Clover: Towards a Unified Video-Language Alignment and Fusion Model CVPR 2023
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension CVPR 2023
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models NIPS 2023
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models NIPS 2023
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance ICCV 2023
End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation AAAI 2023
Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach NIPS 2022
Active Teacher for Semi-Supervised Object Detection CVPR 2022
An Information Theoretic Approach for Attention-Driven Face Forgery Detection ECCV 2022
PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation ECCV 2022
SeqTR: A Simple Yet Universal Network for Visual Grounding ECCV 2022
DIFNet: Boosting Visual Information Flow for Image Captioning CVPR 2022
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering ICCV 2021
RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words CVPR 2021
Dual-level Collaborative Transformer for Image Captioning AAAI 2021
Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network AAAI 2021
SSAH: Semi-Supervised Adversarial Deep Hashing with Self-Paced Hard Sample Generation AAAI 2020
Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation CVPR 2020
Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images ICCV 2019
Hypergraph Induced Convolutional Manifold Networks IJCAI 2019
Variational Structured Semantic Inference for Diverse Image Captioning NIPS 2019
Towards Optimal Discrete Online Hashing with Balanced Similarity AAAI 2019
Towards Optimal Fine Grained Retrieval via Decorrelated Centralized Loss with Normalize-Scale Layer AAAI 2019
Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning AAAI 2019
Information Competing Process for Learning Diversified Representations NIPS 2019
Dynamic Capsule Attention for Visual Question Answering AAAI 2019
GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints CVPR 2018
Centralized Ranking Loss with Weakly Supervised Localization for Fine-Grained Object Retrieval IJCAI 2018
Exploring Implicit Image Statistics for Visual Representativeness Modeling CVPR 2013