Daniel Povey

48 papers · 2012–2025 · across 4 top CS/AI conferences

Achievements


🗺️ Taxonomy Completionist (23) 🧭 Keyword Pioneer 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird

🌍 Conference Polyglot (4) 🐝 Cross-Pollinator (9) 🏠 Conference Loyalist (44) 🏆 Keyword Champion (3) 🧬 Topic Evolution 👥 Mega-Team (20) 🔬 Deep Specialist (19) 🤝 Dynamic Duo (36) 🚀 Conference Pioneer 🔥 Unstoppable (11) ⚡ Prolific Year (5) 💎 Century Club (48) 🗃️ Keyword Collector (95) 📈 Trend Setter

Conferences

INTERSPEECH (44) ICLR (2) AISTATS (1) EMNLP (1)

Top co-authors

Sanjeev Khudanpur (36) Yiming Wang (10) David Snyder (8) Vimal Manohar (8) Hainan Xu (7) Pegah Ghahremani (7) Wei Kang (6) Zengwei Yao (6) Long Lin (6) Fangjun Kuang (6)

Keywords

automatic speech recognition (13) deep neural network (10) word error rate (6) speaker diarization (5) speaker recognition (5) speech recognition (5) connectionist temporal classification (4) speaker embedding (4) probabilistic linear discriminant analysis (4) neural transducer (3) neural network (3) acoustic modeling (3) speaker verification (3) acoustic model (3) time delay neural network (3) lattice-free maximum mutual information (3) feature extraction (3) stochastic gradient descent (2) temporal modeling (2) speech corpus (2)

Papers

CR-CTC: Consistency regularization on CTC for improved speech recognition · ICLR 2025
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization · INTERSPEECH 2024
Multi-Channel Multi-Speaker ASR Using Target Speaker’s Solo Segment · INTERSPEECH 2024
Zipformer: A faster and better encoder for automatic speech recognition · ICLR 2024
Enhancing Neural Transducer for Multilingual ASR with Synchronized Language Diarization · INTERSPEECH 2024
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation · INTERSPEECH 2024
Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts · INTERSPEECH 2023
Blank-regularized CTC for Frame Skipping in Neural Transducer · INTERSPEECH 2023
GPU-accelerated Guided Source Separation for Meeting Transcription · INTERSPEECH 2023
Delay-penalized CTC Implemented Based on Finite State Transducer · INTERSPEECH 2023
Pruned RNN-T for fast, memory-efficient ASR training · INTERSPEECH 2022
speechocean762: An Open-Source Non-Native English Speech Corpus for Pronunciation Assessment · INTERSPEECH 2021
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio · INTERSPEECH 2021
An Alternative to MFCCs for ASR · INTERSPEECH 2020
Efficient MDI Adaptation for n-Gram Language Models · INTERSPEECH 2020
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems · INTERSPEECH 2020
Wake Word Detection with Alignment-Free Lattice-Free MMI · INTERSPEECH 2020
Neural Language Modeling with Implicit Cache Pointers · INTERSPEECH 2020
PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR · INTERSPEECH 2020
Improving Emotion Identification Using Phone Posteriors in Raw Speech Waveform Based DNN · INTERSPEECH 2019
Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network · INTERSPEECH 2019
Multi-PLDA Diarization on Children’s Speech · INTERSPEECH 2019
State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18 · INTERSPEECH 2019
x-Vector DNN Refinement with Full-Length Recordings for Speaker Recognition · INTERSPEECH 2019
Speaker Recognition Benchmark Using the CHiME-5 Corpus · INTERSPEECH 2019
The JHU Speaker Recognition System for the VOiCES 2019 Challenge · INTERSPEECH 2019
The JHU ASR System for VOiCES from a Distance Challenge 2019 · INTERSPEECH 2019
Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification · INTERSPEECH 2018
Output-Gate Projected Gated Recurrent Unit for Speech Recognition · INTERSPEECH 2018
Acoustic Modeling from Frequency Domain Representations of Speech · INTERSPEECH 2018
End-to-end Deep Neural Network Age Estimation · INTERSPEECH 2018
End-to-end Speech Recognition Using Lattice-free MMI · INTERSPEECH 2018
Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition · INTERSPEECH 2018
Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks · INTERSPEECH 2018
Emotion Identification from Raw Speech Signals Using DNNs · INTERSPEECH 2018
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge · INTERSPEECH 2018
A GPU-based WFST Decoder with Exact Lattice Generation · INTERSPEECH 2018
An Exploration of Dropout with LSTMs · INTERSPEECH 2017
Phone Duration Modeling for LVCSR Using Neural Networks · INTERSPEECH 2017
Deep Neural Network Embeddings for Text-Independent Speaker Verification · INTERSPEECH 2017
Backstitch: Counteracting Finite-Sample Bias via Negative Steps · INTERSPEECH 2017
The Kaldi OpenKWS System: Improving Low Resource Keyword Search · INTERSPEECH 2017
Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection Framework · INTERSPEECH 2017
Acoustic Modelling from the Signal Domain Using CNNs · INTERSPEECH 2016
Far-Field ASR Without Parallel Data · INTERSPEECH 2016
Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI · INTERSPEECH 2016
A Coarse-Grained Model for Optimal Coupling of ASR and SMT Systems for Speech Translation · EMNLP 2015
Krylov Subspace Descent for Deep Learning · AISTATS 2012