Jiatong Shi

49 papers · 2020–2026 · 9 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (9) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (12) 🧭 Keyword Pioneer 🏃 Academic Marathon (5) 🐝 Cross-Pollinator (12) 🌈 Renaissance Researcher (6) 🏠 Conference Loyalist (24) 🔬 Deep Specialist (15) 👥 Mega-Team (76) 🤝 Dynamic Duo (35) 🗃️ Keyword Collector (165) 💎 Century Club (47) 🔥 Unstoppable (6) ⚡ Prolific Year (19)

Conferences

INTERSPEECH (24) ACL (12) NAACL (5) EACL (2) ICLR (2) AAAI (1) EMNLP (1) ICML (1) IJCNLP (1)

Top co-authors

Shinji Watanabe (37) Xuankai Chang (12) William Chen (11) Jinchuan Tian (10) Brian Yan (9) Qin Jin (8) Siddhant Arora (8) Yifan Peng (7) Yuning Wu (7) Yuxun Tang (7)

Keywords

self-supervised learning (13) automatic speech recognition (13) speech recognition (9) speech translation (8) speech synthesis (7) singing voice synthesis (5) speech representation (5) transfer learning (4) language documentation (4) simultaneous translation (4) speech-to-speech translation (4) speech processing (4) spoken language translation (3) beam search (3) end-to-end model (3) machine translation (3) knowledge distillation (3) multilingual speech (3) endangered language (3) end-to-end speech recognition (3)

Papers

Full-Duplex-Bench-v2: A Multi-Turn Evaluation Framework for Duplex Dialogue Systems with an Automated Examiner · ACL 2026
BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction · EACL 2026
ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems · NAACL 2025
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks · ICLR 2025
VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music · NAACL 2025
ESPnet-SpeechLM: An Open Speech Language Model Toolkit · NAACL 2025
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners · ACL 2024
Findings of the IWSLT 2024 Evaluation Campaign · ACL 2024
Towards Robust Speech Representation Learning for Thousands of Languages · EMNLP 2024
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction · ICLR 2024
UniAudio: Towards Universal Audio Generation with Large Language Models · ICML 2024
PL-TTS: A Generalizable Prompt-based Diffusion TTS Augmented by Large Language Model · INTERSPEECH 2024
CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection · INTERSPEECH 2024
Self-supervised Speech Representations Still Struggle with African American Vernacular English · INTERSPEECH 2024
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models · INTERSPEECH 2024
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios · INTERSPEECH 2024
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model · INTERSPEECH 2024
SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models · INTERSPEECH 2024
TokSing: Singing Voice Synthesis based on Discrete Tokens · INTERSPEECH 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units · INTERSPEECH 2024
Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing · INTERSPEECH 2024
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head · AAAI 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer · INTERSPEECH 2024
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets · INTERSPEECH 2024
Wav2Gloss: Generating Interlinear Glossed Text from Speech · ACL 2024
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit · ACL 2023
UniLG: A Unified Structure-aware Framework for Lyrics Generation · ACL 2023
Findings of the IWSLT 2023 Evaluation Campaign · ACL 2023
CMU’s IWSLT 2023 Simultaneous Speech Translation System · ACL 2023
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark · INTERSPEECH 2023
Exploration on HuBERT with Multiple Resolution · INTERSPEECH 2023
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders · INTERSPEECH 2023
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy · INTERSPEECH 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation · INTERSPEECH 2022
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation · INTERSPEECH 2022
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States · INTERSPEECH 2022
CMU’s IWSLT 2022 Dialect Speech Translation System · ACL 2022
Findings of the IWSLT 2022 Evaluation Campaign · ACL 2022
Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis · INTERSPEECH 2022
Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection · INTERSPEECH 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities · ACL 2022
ESPnet-ST IWSLT 2021 Offline Speech Translation System · ACL 2021
SUPERB: Speech Processing Universal PERformance Benchmark · INTERSPEECH 2021
ESPnet-ST IWSLT 2021 Offline Speech Translation System · IJCNLP 2021
Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation · NAACL 2021
End-to-End Automatic Speech Recognition: Its Impact on the Workflow in Documenting Yoloxóchitl Mixtec · NAACL 2021
Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec · EACL 2021
Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning · INTERSPEECH 2020
Context-Aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training · INTERSPEECH 2020