Jinyu Li
59 papers · 2016–2026 · 7 top CS/AI conferences
Achievements
Keyword Pioneer
Interdisciplinary Bridge
Taxonomy Completionist (26)
Renaissance Researcher (5)
Hot Topic Early Bird
Cross-Pollinator (13)
Conference Loyalist (46)
Keyword Champion (2)
Deep Specialist (10)
Topic Evolution
Dynamic Duo (21)
Keyword Collector (90)
Century Club (57)
Trend Setter
Conference Pioneer
Prolific Year (9)
Unstoppable (10)
The Questioner
Conferences
INTERSPEECH (46)
ACL (6)
EMNLP (2)
NIPS (2)
CVPR (1)
ICLR (1)
ICML (1)
Keywords
automatic speech recognition (20)
speech recognition (15)
end-to-end speech recognition (11)
word error rate (10)
end-to-end model (7)
recurrent neural network transducer (7)
speech translation (6)
speech separation (5)
speech enhancement (5)
transformer transducer (5)
domain adaptation (4)
speech synthesis (4)
multi-task learning (3)
attention mechanism (3)
teacher-student learning (3)
speaker adaptation (3)
speaker identification (3)
adversarial learning (3)
acoustic model (3)
feature mapping (3)
Papers
Closing the Modality Reasoning Gap for Speech Large Language Models
ACL 2026
SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation
ACL 2026
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
ICLR 2025
Autoregressive Speech Synthesis without Vector Quantization
ACL 2025
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training
ACL 2025
Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
ACL 2025
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
ICML 2024
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
INTERSPEECH 2024
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS
INTERSPEECH 2024
Total-Duration-Aware Duration Modeling for Text-to-Speech Systems
INTERSPEECH 2024
A multimodal approach to study the nature of coordinative patterns underlying speech rhythm
INTERSPEECH 2024
Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation
INTERSPEECH 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
NIPS 2024
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
NIPS 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
EMNLP 2024
Accurate and Structured Pruning for Efficient Automatic Speech Recognition
INTERSPEECH 2023
LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers
INTERSPEECH 2023
PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds
CVPR 2023
Accelerating Transducers through Adjacent Token Merging
INTERSPEECH 2023
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
EMNLP 2022
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
ACL 2022
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings
INTERSPEECH 2022
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition
INTERSPEECH 2022
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
INTERSPEECH 2022
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data
INTERSPEECH 2022
Large-Scale Streaming End-to-End Speech Translation with Neural Transducers
INTERSPEECH 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
INTERSPEECH 2022
Streaming Multi-Talker ASR with Token-Level Serialized Output Training
INTERSPEECH 2022
Separating Long-Form Speech with Group-wise Permutation Invariant Training
INTERSPEECH 2022
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer
INTERSPEECH 2021
Improving Multilingual Transformer Transducer Models by Reducing Language Confusions
INTERSPEECH 2021
Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS
INTERSPEECH 2021
Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need
INTERSPEECH 2021
Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems
INTERSPEECH 2021
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification
INTERSPEECH 2021
A Light-Weight Contextual Spelling Correction Model for Customizing Transducer-Based Speech Recognition Systems
INTERSPEECH 2021
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition
INTERSPEECH 2021
Ultra Fast Speech Separation Model with Teacher Student Learning
INTERSPEECH 2021
Investigation of Practical Aspects of Single Channel Speech Separation for ASR
INTERSPEECH 2021
Sequence-Level Self-Learning with Multiple Hypotheses
INTERSPEECH 2020
Exploring Transformers for Large-Scale Speech Recognition
INTERSPEECH 2020
On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition
INTERSPEECH 2020
An End-to-End Architecture of Online Multi-Channel Speech Separation
INTERSPEECH 2020
Semantic Mask for Transformer Based End-to-End Speech Recognition
INTERSPEECH 2020
Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator
INTERSPEECH 2020
Combination of End-to-End and Hybrid Models for Speech Recognition
INTERSPEECH 2020
Low Latency End-to-End Streaming Speech Recognition with a Scout Network
INTERSPEECH 2020
Transfer Learning Approaches for Streaming End-to-End Speech Recognition System
INTERSPEECH 2020
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability
INTERSPEECH 2020
Acoustic-to-Phrase Models for Speech Recognition
INTERSPEECH 2019
Layer Trajectory BLSTM
INTERSPEECH 2019
Speaker Adaptation for Attention-Based End-to-End Speech Recognition
INTERSPEECH 2019
Adversarial Feature-Mapping for Speech Enhancement
INTERSPEECH 2018
Improved Training for Online End-to-end Speech Recognition Systems
INTERSPEECH 2018
Layer Trajectory LSTM
INTERSPEECH 2018
Cycle-Consistent Speech Enhancement
INTERSPEECH 2018
Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection
INTERSPEECH 2017
Large-Scale Domain Adaptation via Teacher-Student Learning
INTERSPEECH 2017
Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention
INTERSPEECH 2016