Wei-Ning Hsu
50 papers · 2016–2025 · 13 top CS/AI conferences
Achievements
Keyword Pioneer
Hot Topic Early Bird
Interdisciplinary Bridge
Taxonomy Completionist (17)
Conference Polyglot (13)
Academic Marathon (9)
Cross-Pollinator (5)
Dynamic Duo (14)
Triple Crown
Keyword Champion (4)
Topic Evolution
Deep Specialist (19)
Trend Setter
Unstoppable (10)
Conference Pioneer
Prolific Year (14)
Keyword Collector (165)
Century Club (50)
Conferences
INTERSPEECH (19)
ACL (7)
NIPS (5)
ICLR (4)
ICML (4)
EMNLP (3)
NAACL (2)
COLING (1)
CVPR (1)
ECCV (1)
IJCNLP (1)
JMLR (1)
SEMEVAL (1)
Research topics
Keywords
self-supervised learning (15)
speech recognition (12)
speech synthesis (7)
unsupervised learning (6)
discrete representation (5)
disentangled representation (4)
speech-to-speech translation (4)
multimodal learning (4)
speech generation (4)
automatic speech recognition (4)
variational autoencoder (4)
speech translation (3)
representation learning (3)
speaker identity (3)
speech representation (3)
zero-shot learning (3)
speaker verification (3)
domain adaptation (3)
generative model (3)
language model (3)
Papers
FlowDec: A flow-based full-band general audio codec with high perceptual quality
ICLR 2025
Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning
INTERSPEECH 2024
Scaling Speech Technology to 1,000+ Languages
JMLR 2024
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
ACL 2024
Generative Pre-training for Speech with Flow Matching
ICLR 2024
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
ECCV 2024
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
ICML 2024
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
INTERSPEECH 2023
Simple and Effective Unsupervised Speech Translation
ACL 2023
Speech-to-Speech Translation for a Real-world Unwritten Language
ACL 2023
ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration
CVPR 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
NIPS 2023
Toward Joint Language Modeling for Speech Units and Text
EMNLP 2023
Scaling Laws for Generative Mixed-Modal Language Models
ICML 2023
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
ICML 2023
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
NIPS 2023
Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
INTERSPEECH 2023
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
INTERSPEECH 2022
Textless Speech-to-Speech Translation on Real Data
NAACL 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
ICLR 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
NIPS 2022
Simple and Effective Unsupervised Speech Synthesis
INTERSPEECH 2022
Text-Free Prosody-Aware Generative Spoken Language Modeling
ACL 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
ICML 2022
On-demand compute reduction with stochastic wav2vec 2.0
INTERSPEECH 2022
Direct Speech-to-Speech Translation With Discrete Units
ACL 2022
textless-lib: a Library for Textless Spoken Language Processing
NAACL 2022
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT
INTERSPEECH 2022
Unified Speech-Text Pre-training for Speech Translation and Recognition
ACL 2022
Robust Self-Supervised Audio-Visual Speech Recognition
INTERSPEECH 2022
Textless Speech Emotion Conversion using Discrete & Decomposed Representations
EMNLP 2022
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
INTERSPEECH 2021
Unsupervised Speech Recognition
NIPS 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
ACL 2021
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit
EMNLP 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
IJCNLP 2021
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training
INTERSPEECH 2021
Unsupervised Methods for Evaluating Speech Representations
INTERSPEECH 2020
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning
INTERSPEECH 2020
Transfer Learning from Audio-Visual Grounding to Speech Recognition
INTERSPEECH 2019
An Unsupervised Autoregressive Model for Speech Representation Learning
INTERSPEECH 2019
Hierarchical Generative Modeling for Controllable Speech Synthesis
ICLR 2019
A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition
INTERSPEECH 2018
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition
INTERSPEECH 2018
Scalable Factorized Hierarchical Variational Autoencoder Training
INTERSPEECH 2018
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
NIPS 2017
Learning Latent Representations for Speech Generation and Transformation
INTERSPEECH 2017
Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition
INTERSPEECH 2016
Neural Attention for Learning to Rank Questions in Community Question Answering
COLING 2016
SLS at SemEval-2016 Task 3: Neural-based Approaches for Ranking in Community Question Answering
SEMEVAL 2016