speech recognition

1223 papers

Also known as

STT, WER, HSR, SRS, ASR, SR

Co-occurring keywords

automatic speech recognition (1764), word error rate (406), acoustic model (277), speech translation (413), multimodal learning (4622), language model (4573), self-supervised learning (3751), machine translation (2472), deep neural network (1801), neural network (6616)

Papers

Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training NAACL 2024

Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech INTERSPEECH 2024

Evaluating the Santa Barbara Corpus: Challenges of the Breadth of Conversational Spoken Language INTERSPEECH 2024

A Dataset and Two-pass System for Reading Miscue Detection INTERSPEECH 2024

Transfer Learning from Whisper for Microscopic Intelligibility Prediction INTERSPEECH 2024

Efficient Sample-Specific Encoder Perturbations NAACL 2024

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification ACL 2024

SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR INTERSPEECH 2024

Fine-Tuning ASR models for Very Low-Resource Languages: A Study on Mvskoke ACL 2024

Cross-Modality Diffusion Modeling and Sampling for Speech Recognition INTERSPEECH 2024

Multimodal Contextual Dialogue Breakdown Detection for Conversational AI Models NAACL 2024

Leveraging Local Variance for Pseudo-Label Selection in Semi-supervised Learning AAAI 2024

MSNER: A Multilingual Speech Dataset for Named Entity Recognition COLING 2024

Wav2Gloss: Generating Interlinear Glossed Text from Speech ACL 2024

Regeneration Learning: A Learning Paradigm for Data Generation AAAI 2024

HypR: A comprehensive study for ASR hypothesis revising with a reference corpus INTERSPEECH 2024

SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding INTERSPEECH 2024

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language CVPR 2024

Learnings from curating a trustworthy, well-annotated, and useful dataset of disordered English speech INTERSPEECH 2024

Beam-search SIEVE for low-memory speech recognition INTERSPEECH 2024

Speech Recognition Models are Strong Lip-readers INTERSPEECH 2024

Speech-based Slot Filling using Large Language Models ACL 2024

A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition INTERSPEECH 2024

Quantifying Unintended Memorization in BEST-RQ ASR Encoders INTERSPEECH 2024

Auto311: A Confidence-Guided Automated System for Non-emergency Calls AAAI 2024