Yong Xu

87 papers · 2014–2026 · 14 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (21) 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (14)

🏃 Academic Marathon (11) 🐝 Cross-Pollinator (11) 🏠 Conference Loyalist (22) 🏆 Keyword Champion (2) 🤝 Dynamic Duo (17) 🔬 Deep Specialist (15) 🏆 Grand Slam 🧬 Topic Evolution 🗃️ Keyword Collector (57) 🔥 Unstoppable (12) ⚡ Prolific Year (18) 🚀 Conference Pioneer 📈 Trend Setter 💎 Century Club (83)

Conferences

AAAI (22) INTERSPEECH (17) CVPR (12) NIPS (7) ICML (6) IJCAI (6) ACL (5) ICCV (4) COLING (2) ECCV (2) EMNLP (1) ICLR (1) MICCAI (1) NAACL (1)

Top co-authors

Chao Huang (17) Jie Wen (16) Chengliang Liu (13) Dong Yu (12) Meng Yu (12) Shi-Xiong Zhang (12) Xiaoling Luo (10) Lianghao Xia (7) Si Wu (6) Yabo Liu (6)

Keywords

speech separation (9) attention mechanism (8) graph neural network (7) large language model (7) representation learning (6) multi-view learning (5) diabetic retinopathy (4) convolutional neural network (4) multimodal learning (4) multi-label classification (4) zero-shot learning (4) multi-view clustering (4) incomplete multi-view (3) generative adversarial network (3) image restoration (3) domain adaptation (3) speech enhancement (3) prompt engineering (3) contrastive learning (3) speech recognition (3)

Papers

Towards Zero-Shot Diabetic Retinopathy Grading: Learning Generalized Knowledge via Prompt-Driven Matching and Emulating · AAAI 2026
Vision-Language Models Guided Graph Concept Reasoning for Interpretable Diabetic Retinopathy Diagnosis · AAAI 2026
PA-FAS: Towards Interpretable and Generalizable Multimodal Face Anti-Spoofing via Path-Augmented Reinforcement Learning · AAAI 2026
Frequency-Aligned Cross-Modal Learning with Top-K Wavelet Fusion and Dynamic Expert Routing for Enhanced Retinal Disease Diagnosis · AAAI 2026
RetouchGPT: LLM-based Interactive High-Fidelity Face Retouching via Imperfection Prompting · AAAI 2025
Federated Weakly Supervised Video Anomaly Detection with Multimodal Prompt · AAAI 2025
Self-Correcting Robot Manipulation via Gaussian-Splatted Foresight · AAAI 2025
Hard Sample Mining-based Tongue Diagnosis for Fatty Liver Disease Severity Classification · MICCAI 2025
Ex-VAD: Explainable Fine-grained Video Anomaly Detection Based on Visual-Language Models · ICML 2025
Mutual Learning for SAM Adaptation: A Dual Collaborative Network Framework for Source-Free Domain Transfer · ICML 2025
Base-Detail Feature Learning Framework for Visible-Infrared Person Re-Identification · IJCAI 2025
LogiGraph: Logical Reasoning with Contrastive Learning and Lightweight Graph Networks · COLING 2025
EducationQ: Evaluating LLMs’ Teaching Capabilities Through Multi-Agent Dialogue Framework · ACL 2025
Spectral Compressive Imaging via Unmixing-driven Subspace Diffusion Refinement · ICLR 2025
Generator-Assistant Stepwise Rollback Framework for Large Language Model Agent · EMNLP 2025
Zero-Shot Low-Light Image Enhancement via Latent Diffusion Models · AAAI 2025
Deep Hierarchies and Invariant Disease-Indicative Feature Learning for Computer Aided Diagnosis of Multiple Fundus Diseases · AAAI 2025
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision · AAAI 2025
FlashST: A Simple and Universal Prompt-Tuning Framework for Traffic Prediction · ICML 2024
Zero-Shot Event-Intensity Asymmetric Stereo via Visual Prompting from Image Domain · NIPS 2024
MambaSCI: Efficient Mamba-UNet for Quad-Bayer Patterned Video Snapshot Compressive Imaging · NIPS 2024
Multi-Channel Multi-Speaker ASR Using Target Speaker’s Solo Segment · INTERSPEECH 2024
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization · INTERSPEECH 2024
HACDR-Net: Heterogeneous-Aware Convolutional Network for Diabetic Retinopathy Multi-Lesion Segmentation · AAAI 2024
Attention-Induced Embedding Imputation for Incomplete Multi-View Partial Multi-Label Classification · AAAI 2024
QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback based Self-Correction · ACL 2024
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation · ACL 2024
Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments · ACL 2024
Unsupervised Sign Language Translation and Generation · ACL 2024
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework · COLING 2024
Diffusion-based Missing-view Generation With the Application on Incomplete Multi-view Clustering · ICML 2024
Partial Multi-View Multi-Label Classification via Semantic Invariance Learning and Prototype Modeling · ICML 2024
Language-Driven Cross-Modal Classifier for Zero-Shot Multi-Label Image Recognition · ICML 2024
VRetouchEr: Learning Cross-frame Feature Interdependence with Imperfection Flow for Face Retouching in Videos · CVPR 2024
Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis · CVPR 2024
Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance · ECCV 2024
Highly Confident Local Structure Based Consensus Graph Learning for Incomplete Multi-View Clustering · CVPR 2023
CIGAR: Cross-Modality Graph Reasoning for Domain Adaptive Object Detection · CVPR 2023
MVCINN: Multi-View Diabetic Retinopathy Detection Using a Deep Cross-Interaction Neural Network · AAAI 2023
Coherent Event Guided Low-Light Video Enhancement · ICCV 2023
Incomplete Multi-View Multi-Label Learning via Label-Guided Masked View- and Category-Aware Transformers · AAAI 2023
DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification · AAAI 2023
Zoneformer: On-device Neural Beamformer For In-car Multi-zone Speech Separation, Enhancement and Echo Cancellation · INTERSPEECH 2023
GPT-ST: Generative Pre-Training of Spatio-Temporal Graph Neural Networks · NIPS 2023
Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning · NIPS 2023
Streamable Speech Representation Disentanglement and Multi-Level Prosody Modeling for Live One-Shot Voice Conversion · INTERSPEECH 2022
SwinTrack: A Simple and Strong Baseline for Transformer Tracking · NIPS 2022
Fine-Grained Object Classification via Self-Supervised Pose Alignment · CVPR 2022
SphericGAN: Semi-Supervised Hyper-Spherical Generative Adversarial Networks for Fine-Grained Image Synthesis · CVPR 2022
CTL-MTNet: A Novel CapsNet and Transfer Learning-Based Mixed Task Net for Single-Corpus and Cross-Corpus Speech Emotion Recognition · IJCAI 2022
Joint Neural AEC and Beamforming with Double-Talk Detection · INTERSPEECH 2022
Audio Visual Multi-Speaker Tracking with Improved GCF and PMBM Filter · INTERSPEECH 2022
Encoding Spatial Distribution of Convolutional Features for Texture Representation · NIPS 2021
Traffic Flow Forecasting with Spatial-Temporal Graph Diffusion Network · AAAI 2021
Knowledge-aware Coupled Graph Neural Network for Social Recommendation · AAAI 2021
Spatial-Temporal Sequential Hypergraph Network for Crime Prediction with Dynamic Multiplex Relation Learning · IJCAI 2021
Dual-Octave Convolution for Accelerated Parallel MR Image Reconstruction · AAAI 2021
Graph-Enhanced Multi-Task Learning of Multi-Level Transition Dynamics for Session-based Recommendation · AAAI 2021
Knowledge-Enhanced Hierarchical Graph Transformer Network for Multi-Behavior Recommendation · AAAI 2021
Unified Tensor Framework for Incomplete Multi-view Clustering and Missing-view Inferring · AAAI 2021
TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation · INTERSPEECH 2021
MIMO Self-Attentive RNN Beamformer for Multi-Speaker Speech Separation · INTERSPEECH 2021
MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment · INTERSPEECH 2021
Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation · INTERSPEECH 2021
Deep Texture Recognition via Exploiting Cross-Layer Statistical Self-Similarity · CVPR 2021
Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation · ICCV 2021
Hypergraph Neural Networks for Hypergraph Matching · ICCV 2021
CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network · IJCAI 2020
Neural Spatio-Temporal Beamformer for Target Speech Separation · INTERSPEECH 2020
Audio-Visual Multi-Channel Recognition of Overlapped Speech · INTERSPEECH 2020
Improved Speaker-Dependent Separation for CHiME-5 Challenge · INTERSPEECH 2019
Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information · INTERSPEECH 2019
LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking · CVPR 2019
A Comprehensive Study of Speech Separation: Spectrogram vs Waveform Separation · INTERSPEECH 2019
Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks · IJCAI 2019
Unified Embedding Alignment with Missing Views Inferring for Incomplete Multi-View Clustering · AAAI 2019
Adaptive GNN for Image Analysis and Editing · NIPS 2019
Highly-Economized Multi-View Binary Compression for Scalable Image Clustering · ECCV 2018
Bidirectional Attentive Fusion With Context Gating for Dense Video Captioning · CVPR 2018
Intelligibilities of Mandarin Chinese Sentences with Spectral “Holes” · INTERSPEECH 2017
Attention and Localization Based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging · INTERSPEECH 2017
Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation · CVPR 2017
Beyond Object Recognition: Visual Sentiment Analysis with Deep Coupled Adjective and Noun Neural Networks · IJCAI 2016
TransRead: Designing a Bilingual Reading Experience with Machine Translation Technologies · NAACL 2016
Sparse Coding for Classification via Discrimination Ensemble · CVPR 2016
Removing Rain From a Single Image via Discriminative Sparse Coding · ICCV 2015
Lacunarity Analysis on Image Patterns for Texture Classification · CVPR 2014