Ye Bai

15 papers · 2019–2025 · 3 conferences (top CS/AI venues)

Achievements

🌍 Conference Polyglot (3) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏃 Academic Marathon (6) 🌈 Renaissance Researcher (6) 🤝 Dynamic Duo (11) 🧬 Topic Evolution 💎 Century Club (15) 🗃️ Keyword Collector (60) ❓ The Questioner 🔥 Unstoppable (7)

Conferences

INTERSPEECH (13) EMNLP (1) ICLR (1)

Top co-authors

Jiangyan Yi (11) Jianhua Tao (11) Zhengkun Tian (10) Zhengqi Wen (8) SHUAI ZHANG (4) Cunhang Fan (2) Tao Wang (2) Haoxin Ma (2) Chenglong Wang (2) Zhuo Zhang (2)

Keywords

knowledge distillation (3) automatic speech recognition (3) speech recognition (3) parameter efficiency (2) model compression (2) end-to-end model (2) character error rate (2) end-to-end speech recognition (2) fake audio detection (2) connectionist temporal classification (2) attention mechanism (1) visual grounding (1) keyword spotting (1) spoofing detection (1) decision making (1) catastrophic forgetting (1) cross-modal learning (1) latent representation (1) sequence labeling (1) audio source separation (1)

Papers

Discrete Minds in a Continuous World: Do Language Models Know Time Passes? · EMNLP 2025
PolyVoice: Language Models for Speech to Speech Translation · ICLR 2024
TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking · INTERSPEECH 2024
Image-driven Audio-visual Universal Source Separation · INTERSPEECH 2023
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition · INTERSPEECH 2022
FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization · INTERSPEECH 2021
End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition · INTERSPEECH 2021
Continual Learning for Fake Audio Detection · INTERSPEECH 2021
Half-Truth: A Partially Fake Audio Detection Dataset · INTERSPEECH 2021
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition · INTERSPEECH 2020
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition · INTERSPEECH 2020
Focal Loss for Punctuation Prediction · INTERSPEECH 2020
A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting · INTERSPEECH 2019
Self-Attention Transducers for End-to-End Speech Recognition · INTERSPEECH 2019
Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition · INTERSPEECH 2019