Wei-Ning Hsu

50 papers · 2016–2025 · 13 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (17) 🌍 Conference Polyglot (13)

🏃 Academic Marathon (9) 🐝 Cross-Pollinator (5) 🤝 Dynamic Duo (14) 👑 Triple Crown 🏆 Keyword Champion (4) 🧬 Topic Evolution 🔬 Deep Specialist (19) 📈 Trend Setter 🔥 Unstoppable (10) 🚀 Conference Pioneer ⚡ Prolific Year (14) 🗃️ Keyword Collector (165) 💎 Century Club (50)

Conferences

INTERSPEECH (19) ACL (7) NIPS (5) ICLR (4) ICML (4) EMNLP (3) NAACL (2) COLING (1) CVPR (1) ECCV (1) IJCNLP (1) JMLR (1) SEMEVAL (1)

Top co-authors

James Glass (14) Bowen Shi (13) Yossi Adi (12) Michael Auli (11) Ann Lee (9) Changhan Wang (9) Juan Pino (9) Alexei Baevski (9) Abdelrahman Mohamed (8) Jiatao Gu (6)

Research topics

Analysis (1)

Keywords

self-supervised learning (15) speech recognition (12) speech synthesis (7) unsupervised learning (6) discrete representation (5) disentangled representation (4) speech-to-speech translation (4) multimodal learning (4) speech generation (4) automatic speech recognition (4) variational autoencoder (4) speech translation (3) representation learning (3) speaker identity (3) speech representation (3) zero-shot learning (3) speaker verification (3) domain adaptation (3) generative model (3) language model (3)

Papers

FlowDec: A flow-based full-band general audio codec with high perceptual quality · ICLR 2025
Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning · INTERSPEECH 2024
Scaling Speech Technology to 1,000+ Languages · JMLR 2024
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception · ACL 2024
Generative Pre-training for Speech with Flow Matching · ICLR 2024
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos · ECCV 2024
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation · ICML 2024
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation · INTERSPEECH 2023
Simple and Effective Unsupervised Speech Translation · ACL 2023
Speech-to-Speech Translation for a Real-world Unwritten Language · ACL 2023
ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration · CVPR 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale · NIPS 2023
Toward Joint Language Modeling for Speech Units and Text · EMNLP 2023
Scaling Laws for Generative Mixed-Modal Language Models · ICML 2023
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language · ICML 2023
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning · NIPS 2023
Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis · INTERSPEECH 2023
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation · INTERSPEECH 2022
Textless Speech-to-Speech Translation on Real Data · NAACL 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction · ICLR 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality · NIPS 2022
Simple and Effective Unsupervised Speech Synthesis · INTERSPEECH 2022
Text-Free Prosody-Aware Generative Spoken Language Modeling · ACL 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language · ICML 2022
On-demand compute reduction with stochastic wav2vec 2.0 · INTERSPEECH 2022
Direct Speech-to-Speech Translation With Discrete Units · ACL 2022
textless-lib: a Library for Textless Spoken Language Processing · NAACL 2022
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT · INTERSPEECH 2022
Unified Speech-Text Pre-training for Speech Translation and Recognition · ACL 2022
Robust Self-Supervised Audio-Visual Speech Recognition · INTERSPEECH 2022
Textless Speech Emotion Conversion using Discrete & Decomposed Representations · EMNLP 2022
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations · INTERSPEECH 2021
Unsupervised Speech Recognition · NIPS 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units · ACL 2021
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit · EMNLP 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units · IJCNLP 2021
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training · INTERSPEECH 2021
Unsupervised Methods for Evaluating Speech Representations · INTERSPEECH 2020
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning · INTERSPEECH 2020
Transfer Learning from Audio-Visual Grounding to Speech Recognition · INTERSPEECH 2019
An Unsupervised Autoregressive Model for Speech Representation Learning · INTERSPEECH 2019
Hierarchical Generative Modeling for Controllable Speech Synthesis · ICLR 2019
A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition · INTERSPEECH 2018
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition · INTERSPEECH 2018
Scalable Factorized Hierarchical Variational Autoencoder Training · INTERSPEECH 2018
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data · NIPS 2017
Learning Latent Representations for Speech Generation and Transformation · INTERSPEECH 2017
Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition · INTERSPEECH 2016
Neural Attention for Learning to Rank Questions in Community Question Answering · COLING 2016
SLS at SemEval-2016 Task 3: Neural-based Approaches for Ranking in Community Question Answering · SEMEVAL 2016