Mark Hasegawa-Johnson

61 papers · 2012–2025 · 11 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (19) 🌍 Conference Polyglot (11)

🐝 Cross-Pollinator (13) 🌈 Renaissance Researcher (11) 🏠 Conference Loyalist (37) 👥 Mega-Team (20) 🔬 Deep Specialist (16) 🗃️ Keyword Collector (251) ⚡ Prolific Year (9) 🚀 Conference Pioneer 📈 Trend Setter 💎 Century Club (61) 🔥 Unstoppable (10) ❓ The Questioner

Conferences

INTERSPEECH (37) ACL (6) ICML (5) COLING (3) EMNLP (3) NAACL (2) AAAI (1) AISTATS (1) CVPR (1) ICCV (1) WACV (1)

Top co-authors

Yang Zhang (9) John Harvill (9) Chang Yoo (8) Shiyu Chang (8) Kaizhi Qian (8) Hee Suk Yoon (7) Eunseop Yoon (7) Heting Gao (7) Chang D. Yoo (6) Junrui Ni (5)

Research topics

Education (1)

Keywords

speech recognition (12) automatic speech recognition (11) transfer learning (8) unsupervised learning (7) multi-task learning (6) deep neural network (5) neural network (4) cross-lingual transfer (4) zero-shot learning (4) probabilistic transcription (4) self-supervised learning (4) speech processing (4) multimodal learning (3) voice conversion (3) speech synthesis (3) under-resourced language (3) representation learning (3) convolutional neural network (3) spoken language understanding (2) speech enhancement (2)

Papers

SyncDiff: Diffusion-Based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization · WACV 2025
Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility · INTERSPEECH 2024
TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback · ACL 2024
Visualization for improving foreign language pronunciation · INTERSPEECH 2024
Finding Spoken Identifications: Using GPT-4 Annotation for an Efficient and Fast Dataset Creation Pipeline · COLING 2024
Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis · INTERSPEECH 2024
LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition · INTERSPEECH 2024
A Theory of Unsupervised Speech Recognition · ACL 2023
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions · INTERSPEECH 2023
Wav2ToBI: a new approach to automatic ToBI transcription · INTERSPEECH 2023
Listen, Decipher and Sign: Toward Unsupervised Speech-to-Sign Language Recognition · ACL 2023
Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction · INTERSPEECH 2023
INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition · ACL 2023
One-Shot Exemplification Modeling via Latent Sense Representations · ACL 2023
Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio · INTERSPEECH 2023
One-Shot and Few-Shot Exemplification Modeling · EMNLP 2023
Equivariance Discovery by Learned Parameter-Sharing · AISTATS 2022
Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition · ACL 2022
Fast and Efficient MMD-Based Fair PCA via Optimization over Stiefel Manifold · AAAI 2022
SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation · EMNLP 2022
Forget-free Continual Learning with Winning Subnetworks · ICML 2022
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers · ICML 2022
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition · INTERSPEECH 2022
Cross-lingual articulatory feature information transfer for speech recognition using recurrent progressive neural networks · INTERSPEECH 2022
WavPrompt: Towards Few-Shot Spoken Language Understanding with Frozen Language Models · INTERSPEECH 2022
Frame-Level Stutter Detection · INTERSPEECH 2022
Syn2Vec: Synset Colexification Graphs for Lexical Semantic Similarity · NAACL 2022
Zero-Shot Cross-Lingual Phonetic Recognition with External Language Embedding · INTERSPEECH 2021
Classification of COVID-19 from Cough Using Autoregressive Predictive Coding Pretraining and Spectral Data Augmentation · INTERSPEECH 2021
Worldly Wise (WoW) - Cross-Lingual Knowledge Fusion for Fact-based Visual Spoken-Question Answering · NAACL 2021
Global Prosody Style Transfer Without Text Transcriptions · ICML 2021
Interpretable Visual Reasoning via Induced Symbolic Space · ICCV 2021
Context-Aware Automatic Text Simplification of Health Materials in Low-Resource Domains · EMNLP 2020
Unsupervised Speech Decomposition via Triple Information Bottleneck · ICML 2020
Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous? · INTERSPEECH 2020
Automatic Estimation of Intelligibility Measure for Consonants in Speech · INTERSPEECH 2020
A DNN-HMM-DNN Hybrid Model for Discovering Word-Like Units from Spoken Captions and Image Regions · INTERSPEECH 2020
Deep F-Measure Maximization for End-to-End Speech Understanding · INTERSPEECH 2020
Evaluating Automatically Generated Phoneme Captions for Images · INTERSPEECH 2020
Identify Speakers in Cocktail Parties with End-to-End Attention · INTERSPEECH 2020
That Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages · INTERSPEECH 2020
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss · ICML 2019
Improved ASR for Under-resourced Languages through Multi-task Learning with Acoustic Landmarks · INTERSPEECH 2018
Speaker Adaptive Audio-Visual Fusion for the Open-Vocabulary Section of AVICAR · INTERSPEECH 2018
Improving DNNs Trained with Non-Native Transcriptions Using Knowledge Distillation and Target Interpolation · INTERSPEECH 2018
Infant Emotional Outbursts Detection in Infant-parent Spoken Interactions · INTERSPEECH 2018
Topic and Keyword Identification for Low-resourced Speech Using Cross-Language Transfer Learning · INTERSPEECH 2018
Visualizing Phoneme Category Adaptation in Deep Neural Networks · INTERSPEECH 2018
Speech Enhancement Using Bayesian Wavenet · INTERSPEECH 2017
Deep Auto-Encoder Based Multi-Task Learning Using Probabilistic Transcriptions · INTERSPEECH 2017
Glottal Model Based Speech Beamforming for ad-hoc Microphone Arrays · INTERSPEECH 2017
Mismatched Crowdsourcing from Multiple Annotator Languages for Recognizing Zero-Resourced Languages: A Nullspace Clustering Approach · INTERSPEECH 2017
Semantic Image Inpainting With Deep Generative Models · CVPR 2017
Multi-Task Learning Using Mismatched Transcription for Under-Resourced Speech Recognition · INTERSPEECH 2017
Using Approximated Auditory Roughness as a Pre-Filtering Feature for Human Screaming and Affective Speech AED · INTERSPEECH 2017
Team ELISA System for DARPA LORELEI Speech Evaluation 2016 · INTERSPEECH 2017
An Investigation on Training Deep Neural Networks Using Probabilistic Transcriptions · INTERSPEECH 2016
Analysis of Mismatched Transcriptions Generated by Humans and Machines for Under-Resourced Languages · INTERSPEECH 2016
Automatic Speech Recognition Using Probabilistic Transcriptions in Swahili, Amharic, and Dinka · INTERSPEECH 2016
A PAC-Bayesian Approach to Minimum Perplexity Language Modeling · COLING 2014
Detection of Acoustic-Phonetic Landmarks in Mismatched Conditions using a Biomimetic Model of Human Auditory Processing · COLING 2012