Xiaoshuai Sun
58 papers · 2013–2026 · 9 top CS/AI conferences
Achievements
🌍 Conference Polyglot (9)
🏃 Academic Marathon (12)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (9)
🌈 Renaissance Researcher (6)
🐣 Hot Topic Early Bird
🤝 Dynamic Duo (45)
🏆 Grand Slam
🔬 Deep Specialist (15)
🧬 Topic Evolution
🏆 Keyword Champion (2)
📈 Trend Setter
🗃️ Keyword Collector (244)
🚀 Conference Pioneer
🔥 Unstoppable (8)
💎 Century Club (57)
⚡ Prolific Year (8)
Conferences
AAAI (17)
CVPR (13)
NIPS (9)
ECCV (5)
ICCV (4)
ICML (4)
ICLR (3)
IJCAI (2)
EMNLP (1)
Keywords
multimodal learning (10)
attention mechanism (7)
image captioning (6)
semantic segmentation (5)
object detection (4)
knowledge distillation (4)
contrastive learning (4)
referring expression (4)
multimodal large language model (4)
convolutional neural network (4)
image segmentation (3)
visual question answering (3)
referring expression comprehension (3)
zero-shot learning (3)
vision-language model (3)
multi-modal learning (3)
image retrieval (3)
image generation (3)
diffusion model (3)
weakly supervised learning (2)
Papers
Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach
AAAI 2026
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
CVPR 2025
Towards General Visual-Linguistic Face Forgery Detection
CVPR 2025
StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization
AAAI 2025
IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation
AAAI 2025
$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
ICLR 2025
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
ICLR 2025
Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models
ICLR 2025
ACL: Activating Capability of Linear Attention for Image Restoration
CVPR 2025
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
ICCV 2025
Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks
AAAI 2024
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing
NIPS 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
NIPS 2024
DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion
NIPS 2024
RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation
NIPS 2024
Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation
AAAI 2024
X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks
AAAI 2024
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation
AAAI 2024
Toward Open-Set Human Object Interaction Detection
AAAI 2024
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
CVPR 2024
Multi-branch Collaborative Learning Network for 3D Visual Grounding
ECCV 2024
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
ECCV 2024
AnyTrans: Translate AnyText in the Image with Large Scale Models
EMNLP 2024
X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation
ICML 2024
Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models
ICML 2024
SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation
ICML 2024
Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization
ICML 2024
Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network
AAAI 2023
RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension
CVPR 2023
Clover: Towards a Unified Video-Language Alignment and Fusion Model
CVPR 2023
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension
CVPR 2023
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
NIPS 2023
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
NIPS 2023
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
ICCV 2023
End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation
AAAI 2023
Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
NIPS 2022
Active Teacher for Semi-Supervised Object Detection
CVPR 2022
An Information Theoretic Approach for Attention-Driven Face Forgery Detection
ECCV 2022
PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation
ECCV 2022
SeqTR: A Simple Yet Universal Network for Visual Grounding
ECCV 2022
DIFNet: Boosting Visual Information Flow for Image Captioning
CVPR 2022
TRAR: Routing the Attention Spans in Transformer for Visual Question Answering
ICCV 2021
RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words
CVPR 2021
Dual-level Collaborative Transformer for Image Captioning
AAAI 2021
Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network
AAAI 2021
SSAH: Semi-Supervised Adversarial Deep Hashing with Self-Paced Hard Sample Generation
AAAI 2020
Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
CVPR 2020
Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images
ICCV 2019
Hypergraph Induced Convolutional Manifold Networks
IJCAI 2019
Variational Structured Semantic Inference for Diverse Image Captioning
NIPS 2019
Towards Optimal Discrete Online Hashing with Balanced Similarity
AAAI 2019
Towards Optimal Fine Grained Retrieval via Decorrelated Centralized Loss with Normalize-Scale Layer
AAAI 2019
Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning
AAAI 2019
Information Competing Process for Learning Diversified Representations
NIPS 2019
Dynamic Capsule Attention for Visual Question Answering
AAAI 2019
GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints
CVPR 2018
Centralized Ranking Loss with Weakly Supervised Localization for Fine-Grained Object Retrieval
IJCAI 2018
Exploring Implicit Image Statistics for Visual Representativeness Modeling
CVPR 2013