Jiatong Shi
49 papers · 2020–2026 · 9 conferences · across top CS/AI conferences
Achievements
🌍 Conference Polyglot (9)
🌉 Interdisciplinary Bridge
🗺️ Taxonomy Completionist (12)
🧭 Keyword Pioneer
🏃 Academic Marathon (5)
🐝 Cross-Pollinator (12)
🌈 Renaissance Researcher (6)
🏠 Conference Loyalist (24)
🔬 Deep Specialist (15)
👥 Mega-Team (76)
🤝 Dynamic Duo (35)
🗃️ Keyword Collector (165)
💎 Century Club (47)
🔥 Unstoppable (6)
⚡ Prolific Year (19)
Conferences
INTERSPEECH (24)
ACL (12)
NAACL (5)
EACL (2)
ICLR (2)
AAAI (1)
EMNLP (1)
ICML (1)
IJCNLP (1)
Keywords
self-supervised learning (13)
automatic speech recognition (13)
speech recognition (9)
speech translation (8)
speech synthesis (7)
singing voice synthesis (5)
speech representation (5)
transfer learning (4)
language documentation (4)
simultaneous translation (4)
speech-to-speech translation (4)
speech processing (4)
spoken language translation (3)
beam search (3)
end-to-end model (3)
machine translation (3)
knowledge distillation (3)
multilingual speech (3)
endangered language (3)
end-to-end speech recognition (3)
Papers
Full-Duplex-Bench-v2: A Multi-Turn Evaluation Framework for Duplex Dialogue Systems with an Automated Examiner
ACL 2026
BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction
EACL 2026
ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems
NAACL 2025
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
ICLR 2025
VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music
NAACL 2025
ESPnet-SpeechLM: An Open Speech Language Model Toolkit
NAACL 2025
Make-A-Voice: Revisiting Voice Large Language Models as Scalable Multilingual and Multitask Learners
ACL 2024
Findings of the IWSLT 2024 Evaluation Campaign
ACL 2024
Towards Robust Speech Representation Learning for Thousands of Languages
EMNLP 2024
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
ICLR 2024
UniAudio: Towards Universal Audio Generation with Large Language Models
ICML 2024
PL-TTS: A Generalizable Prompt-based Diffusion TTS Augmented by Large Language Model
INTERSPEECH 2024
CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
INTERSPEECH 2024
Self-supervised Speech Representations Still Struggle with African American Vernacular English
INTERSPEECH 2024
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
INTERSPEECH 2024
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios
INTERSPEECH 2024
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
INTERSPEECH 2024
SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models
INTERSPEECH 2024
TokSing: Singing Voice Synthesis based on Discrete Tokens
INTERSPEECH 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
INTERSPEECH 2024
Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing
INTERSPEECH 2024
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
AAAI 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
INTERSPEECH 2024
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
INTERSPEECH 2024
Wav2Gloss: Generating Interlinear Glossed Text from Speech
ACL 2024
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
ACL 2023
UniLG: A Unified Structure-aware Framework for Lyrics Generation
ACL 2023
Findings of the IWSLT 2023 Evaluation Campaign
ACL 2023
CMU’s IWSLT 2023 Simultaneous Speech Translation System
ACL 2023
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
INTERSPEECH 2023
Exploration on HuBERT with Multiple Resolution
INTERSPEECH 2023
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders
INTERSPEECH 2023
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy
INTERSPEECH 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
INTERSPEECH 2022
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation
INTERSPEECH 2022
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States
INTERSPEECH 2022
CMU’s IWSLT 2022 Dialect Speech Translation System
ACL 2022
Findings of the IWSLT 2022 Evaluation Campaign
ACL 2022
Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis
INTERSPEECH 2022
Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection
INTERSPEECH 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
ACL 2022
ESPnet-ST IWSLT 2021 Offline Speech Translation System
ACL 2021
SUPERB: Speech Processing Universal PERformance Benchmark
INTERSPEECH 2021
ESPnet-ST IWSLT 2021 Offline Speech Translation System
IJCNLP 2021
Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation
NAACL 2021
End-to-End Automatic Speech Recognition: Its Impact on the Workflow in Documenting Yoloxóchitl Mixtec
NAACL 2021
Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec
EACL 2021
Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning
INTERSPEECH 2020
Context-Aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training
INTERSPEECH 2020