speech recognition

1223 papers

Also known as

STT, WER, HSR, SRS, ASR, SR

Co-occurring keywords

automatic speech recognition (1764), word error rate (406), acoustic model (277), speech translation (413), multimodal learning (4622), language model (4573), self-supervised learning (3751), machine translation (2472), deep neural network (1801), neural network (6616)

Papers

VoxRAG: A Step Toward Transcription-Free RAG Systems in Spoken Question Answering ACL 2025

MetaMixSpeech: Meta Task Augmentation for Low-Resource Speech Recognition EMNLP 2025

CourtNav: Voice-Guided, Anchor-Accurate Navigation of Long Legal Documents in Courtrooms EMNLP 2025

FSboard: Over 3 Million Characters of ASL Fingerspelling Collected via Smartphones CVPR 2025

Seeing isn’t Hearing: Benchmarking Vision Language Models at Interpreting Spectrograms AACL 2025

VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation COLING 2025

AppTek’s Automatic Speech Translation: Generating Accurate and Well-Readable Subtitles ACL 2025

Privacy Preserving Data Selection for Bias Mitigation in Speech Models ACL 2025

KIT’s Offline Speech Translation and Instruction Following Submission for IWSLT 2025 ACL 2025

Can LLMs Understand Unvoiced Speech? Exploring EMG-to-Text Conversion with LLMs ACL 2025

InTriage: Intelligent Telephone Triage in Pre-Hospital Emergency Care EMNLP 2025

Code-switching Mediated Sentence-level Semantic Learning AAAI 2025

The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence ACL 2025

IWSLT 2025 Indic Track System Description Paper: Speech-to-Text Translation from Low-Resource Indian Languages (Bengali and Tamil) to English ACL 2025

SpeechLLMs for Large-scale Contextualized Zero-shot Slot Filling EMNLP 2025

WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models ACL 2025

Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks ACL 2025

Summarizing Speech: A Comprehensive Survey EMNLP 2025

BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation ACL 2025

Harnessing Whisper for Prosodic Stress Analysis ACL 2025

HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation ACL 2025

Quantum-Infused Whisper: A Framework for Replacing Classical Components IJCNLP 2025

StuD: A Multimodal Approach for Stuttering Detection with RAG and Fusion Strategies IJCNLP 2025

Indonesian Speech Content De-Identification in Low Resource Transcripts COLING 2025

YodiV3: NLP for Togolese Languages with Eyaa-Tom Dataset and the Lom Metric ACL 2025