Helen Meng
117 papers · 2005–2026 · 10 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (39)
Interdisciplinary Bridge
Renaissance Researcher (7)
Hot Topic Early Bird
Conference Loyalist (91)
Dynamic Duo (43)
Keyword Champion (6)
Mega-Team (20)
Deep Specialist (18)
Trend Setter
Conference Pioneer
Prolific Year (5)
Unstoppable (11)
The Questioner
Keyword Collector (130)
Century Club (116)
Conferences
INTERSPEECH (91)
ACL (6)
EMNLP (6)
NAACL (5)
AAAI (2)
IJCAI (2)
NIPS (2)
COLING (1)
IJCNLP (1)
SEMEVAL (1)
Keywords
automatic speech recognition (13)
speech recognition (13)
text-to-speech synthesis (10)
voice conversion (10)
speaker adaptation (10)
language model (8)
speech synthesis (8)
large language model (7)
recurrent neural network (7)
speaker verification (6)
speaker embedding (6)
domain adaptation (6)
data augmentation (5)
long short-term memory (5)
disordered speech (5)
deep neural network (5)
dysarthric speech (5)
unsupervised learning (4)
bayesian inference (4)
speech separation (4)
Papers
DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models
AAAI 2026
Large Language Model-based FMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder
INTERSPEECH 2024
Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask
INTERSPEECH 2024
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning
NAACL 2024
Rethinking Machine Ethics – Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
NAACL 2024
SongCreator: Lyrics-based Universal Song Generation
NIPS 2024
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
ACL 2024
COKE: A Cognitive Knowledge Graph for Machine Theory of Mind
ACL 2024
Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models
INTERSPEECH 2024
LoRA-MER: Low-Rank Adaptation of Pre-Trained Speech Models for Multimodal Emotion Recognition Using Mutual Information
INTERSPEECH 2024
Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System
INTERSPEECH 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
INTERSPEECH 2024
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
INTERSPEECH 2024
UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner
NIPS 2024
SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes
AAAI 2024
Prompting Large Language Models with Mispronunciation Detection and Diagnosis Abilities
INTERSPEECH 2024
Parameter-efficient Fine-tuning of Speaker-Aware Dynamic Prompts for Speaker Verification
INTERSPEECH 2024
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models
INTERSPEECH 2024
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
INTERSPEECH 2024
Search Augmented Instruction Learning
EMNLP 2023
ConvRGX: Recognition, Generation, and Extraction for Self-trained Conversational Question Answering
ACL 2023
SGP-TOD: Building Task Bots Effortlessly via Schema-Guided LLM Prompting
EMNLP 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
INTERSPEECH 2023
Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders
INTERSPEECH 2023
Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition
INTERSPEECH 2023
On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition
INTERSPEECH 2023
PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts
INTERSPEECH 2023
Exploiting Cross-Domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition
INTERSPEECH 2023
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
INTERSPEECH 2023
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis
INTERSPEECH 2023
Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator
INTERSPEECH 2023
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
INTERSPEECH 2023
On Controlling Fallback Responses for Grounded Dialogue Generation
ACL 2022
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification
INTERSPEECH 2022
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information
INTERSPEECH 2022
Speech Enhancement with Fullband-Subband Cross-Attention Network
INTERSPEECH 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS
INTERSPEECH 2022
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS
INTERSPEECH 2022
Context-aware Multimodal Fusion for Emotion Recognition
INTERSPEECH 2022
Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus
INTERSPEECH 2022
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion
INTERSPEECH 2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
INTERSPEECH 2022
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition
INTERSPEECH 2022
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems
INTERSPEECH 2022
Exploring linguistic feature and model combination for speech recognition based automatic AD detection
INTERSPEECH 2022
Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information
INTERSPEECH 2022
Spoofing-Aware Speaker Verification by Multi-Level Fusion
INTERSPEECH 2022
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection
INTERSPEECH 2022
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis
INTERSPEECH 2022
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis
INTERSPEECH 2022
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset
INTERSPEECH 2022
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
INTERSPEECH 2022
Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis
COLING 2022
Partner Personas Generation for Dialogue Response Generation
NAACL 2022
Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout
ACL 2022
Towards Identifying Social Bias in Dialog Systems: Framework, Dataset, and Benchmark
EMNLP 2022
COLD: A Benchmark for Chinese Offensive Language Detection
EMNLP 2022
Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion
INTERSPEECH 2021
Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition
INTERSPEECH 2021
Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion
INTERSPEECH 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-Shot Voice Conversion
INTERSPEECH 2021
Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization
INTERSPEECH 2021
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis
INTERSPEECH 2021
Transformer Based End-to-End Mispronunciation Detection and Diagnosis
INTERSPEECH 2021
Channel-Wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks
INTERSPEECH 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis
INTERSPEECH 2021
Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition
INTERSPEECH 2021
Adversarial Data Augmentation for Disordered Speech Recognition
INTERSPEECH 2021
Speech-XLNet: Unsupervised Acoustic Model Pretraining for Self-Attention Networks
INTERSPEECH 2020
Transferring Source Style in Non-Parallel Voice Conversion
INTERSPEECH 2020
Group Gated Fusion on Attention-Based Bidirectional Alignment for Multimodal Emotion Recognition
INTERSPEECH 2020
SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition
INTERSPEECH 2020
Investigation of Data Augmentation Techniques for Disordered Speech Recognition
INTERSPEECH 2020
Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition
INTERSPEECH 2020
Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification
INTERSPEECH 2020
Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting
INTERSPEECH 2020
Speaker-Aware Linear Discriminant Analysis in Speaker Verification
INTERSPEECH 2020
Enhancing Monotonicity for Robust Autoregressive Transformer TTS
INTERSPEECH 2020
Audio-Visual Multi-Channel Recognition of Overlapped Speech
INTERSPEECH 2020
The CUHK Dysarthric Speech Recognition Systems for English and Cantonese
INTERSPEECH 2019
Exploiting Visual Features Using Bayesian Gated Neural Networks for Disordered Speech Recognition
INTERSPEECH 2019
On the Use of Pitch Features for Disordered Speech Recognition
INTERSPEECH 2019
Towards Discriminative Representation Learning for Speech Emotion Recognition
IJCAI 2019
Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis
INTERSPEECH 2019
One-Shot Voice Conversion with Global Speaker Embeddings
INTERSPEECH 2019
Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams
INTERSPEECH 2019
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT
INTERSPEECH 2019
Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition
INTERSPEECH 2019
LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition
INTERSPEECH 2019
Unsupervised Methods for Audio Classification from Lecture Discussion Recordings
INTERSPEECH 2019
Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models
INTERSPEECH 2019
Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection
INTERSPEECH 2018
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms
INTERSPEECH 2018
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus
INTERSPEECH 2018
Speech and Language Processing for Learning and Wellbeing
INTERSPEECH 2018
Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance
INTERSPEECH 2018
Gaussian Process Neural Networks for Speech Recognition
INTERSPEECH 2018
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis
INTERSPEECH 2018
Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis
INTERSPEECH 2018
Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method
INTERSPEECH 2018
Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space
INTERSPEECH 2017
DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances
INTERSPEECH 2017
Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion
INTERSPEECH 2017
Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer
INTERSPEECH 2017
Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams
INTERSPEECH 2016
Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis
INTERSPEECH 2016
Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data
INTERSPEECH 2016
Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition
INTERSPEECH 2016
Analysis on Gated Recurrent Unit Based Question Detection Approach
INTERSPEECH 2016
Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings
EMNLP 2015
Modelling High-Dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation
IJCAI 2015
SeemGo: Conditional Random Fields Labeling and Maximum Entropy Classification for Aspect Based Sentiment Analysis
SEMEVAL 2014
Automatic Story Segmentation using a Bayesian Decision Framework for Statistical Models of Lexical Chain Features
IJCNLP 2009
Automatic Story Segmentation using a Bayesian Decision Framework for Statistical Models of Lexical Chain Features
ACL 2009
Combined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News
NAACL 2007
A Maximum Entropy Framework that Integrates Word Dependencies and Grammatical Relations for Reading Comprehension
NAACL 2006
The Use of Metadata, Web-derived Answer Patterns and Passage Context to Improve Reading Comprehension Performance
EMNLP 2005