Ya Li

25 papers · 2015–2026 · 9 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🗺️ Taxonomy Completionist (11) 🐣 Hot Topic Early Bird

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🧬 Topic Evolution 🏆 Keyword Champion (2) 🗃️ Keyword Collector (116) 📈 Trend Setter 💎 Century Club (24) 🔥 Unstoppable (11) 🚀 Conference Pioneer

Conferences

INTERSPEECH (13) AAAI (2) CVPR (2) ICML (2) IJCAI (2) ACL (1) ECCV (1) EMNLP (1) NIPS (1)

Top co-authors

Yingming Gao (8) Jianhua Tao (7) Xinmei Tian (6) Zhengqi Wen (5) Tongliang Liu (4) Dacheng Tao (3) Yibin Zheng (3) Fengping Wang (2) Bingsong Bai (2) Jiangyan Yi (2)

Keywords

diffusion model (3) text-to-speech synthesis (3) speech synthesis (3) knowledge distillation (3) phoneme embedding (2) large language model (2) prosodic boundary (2) audio codec (2) bidirectional lstm (2) word embedding (2) zero-shot learning (2) gradient boosting decision tree (2) singing voice conversion (2) attention mechanism (2) black-box attack (2) deep neural network (2) image classification (2) adversarial example (2) depression detection (2) neural network (2)

Papers

HQ-SVC: Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios · AAAI 2026
OV-MER: Towards Open-Vocabulary Multimodal Emotion Recognition · ICML 2025
Controllable 3D Dance Generation Using Diffusion-Based Transformer U-Net · AAAI 2025
Beyond Surface Simplicity: Revealing Hidden Reasoning Attributes for Precise Commonsense Diagnosis · ACL 2025
Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition · INTERSPEECH 2024
SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion · INTERSPEECH 2024
Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining · INTERSPEECH 2024
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model · INTERSPEECH 2024
FTA-net: A Frequency and Time Attention Network for Speech Depression Detection · INTERSPEECH 2023
Towards Lightweight Black-Box Attack Against Deep Neural Networks · NIPS 2022
ECAPA-TDNN Based Depression Detection from Clinical Speech · INTERSPEECH 2022
Cross Attention Augmented Transducer Networks for Simultaneous Translation · EMNLP 2021
Dual-Path Distillation: A Unified Framework to Improve Black-Box Attacks · ICML 2020
Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking · CVPR 2020
Compact Feature Learning for Multi-Domain Image Classification · CVPR 2019
Deep Domain Generalization via Conditional Invariant Adversarial Networks · ECCV 2018
BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End · INTERSPEECH 2018
Speech Emotion Recognition from Variable-Length Inputs with Triplet Loss Function · INTERSPEECH 2018
Classification and Representation Joint Learning via Deep Networks · IJCAI 2017
Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction · INTERSPEECH 2017
Distilling Knowledge from an Ensemble of Models for Punctuation Prediction · INTERSPEECH 2017
Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach · INTERSPEECH 2016
The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network Based Speech Synthesis · INTERSPEECH 2016
The Rhythmic Constraint on Prosodic Boundaries in Mandarin Chinese Based on Corpora of Silent Reading and Speech Perception · INTERSPEECH 2016
Multi-Task Model and Feature Joint Learning · IJCAI 2015