Samuel Thomas

30 papers · 2013–2025 · 4 top CS/AI conferences

Achievements

🐣 Hot Topic Early Bird 🗺️ Taxonomy Completionist (13) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (4)

🌈 Renaissance Researcher (6) 🏠 Conference Loyalist (24) 🤝 Dynamic Duo (13) 🧬 Topic Evolution 🔬 Deep Specialist (10) 📈 Trend Setter 🚀 Conference Pioneer 🔥 Unstoppable (11) ⚡ Prolific Year (6) ❓ The Questioner 🗃️ Keyword Collector (139) 💎 Century Club (30)

Conferences

INTERSPEECH (24) CVPR (3) IJCAI (2) ICCV (1)

Top co-authors

Brian Kingsbury (13) Andrew Rouditchenko (8) Hilde Kuehne (8) Rogerio Feris (7) James Glass (7) Kartik Audhkhasi (6) Michael Picheny (6) Gakuto Kurata (6) Brian Chen (5) David Harwath (5)

Keywords

self-supervised learning (6) speech recognition (6) neural network (4) spoken language understanding (4) video retrieval (4) automatic speech recognition (4) cross-lingual transfer (3) convolutional neural network (3) domain adaptation (3) zero-shot retrieval (3) multimodal learning (3) long short-term memory (3) acoustic model (3) knowledge distillation (3) contrastive learning (3) word error rate (3) low-resource language (2) acoustic modeling (2) model compression (2) zero-shot learning (2)

Papers

CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment · CVPR 2025
What When and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions · CVPR 2024
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation · INTERSPEECH 2024
ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding · INTERSPEECH 2023
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages · INTERSPEECH 2023
Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval · CVPR 2022
Extending RNN-T-based speech recognition systems with emotion and language classification · INTERSPEECH 2022
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems · INTERSPEECH 2022
Global RNN Transducer Models For Multi-dialect Speech Recognition · INTERSPEECH 2022
Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs · INTERSPEECH 2021
Multimodal Clustering Networks for Self-Supervised Learning From Unlabeled Videos · ICCV 2021
Integrating Dialog History into End-to-End Spoken Language Understanding Systems · INTERSPEECH 2021
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos · INTERSPEECH 2021
Cascaded Multilingual Audio-Visual Learning from Videos · INTERSPEECH 2021
Knowledge Distillation Based Training of Universal ASR Source Models for Cross-Lingual Transfer · INTERSPEECH 2021
Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework · INTERSPEECH 2020
End-to-End Spoken Language Understanding Without Full Transcripts · INTERSPEECH 2020
Resource-Adaptive Deep Learning for Visual Speech Recognition · INTERSPEECH 2020
Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings · INTERSPEECH 2020
Detection and Recovery of OOVs for Improved English Broadcast News Captioning · INTERSPEECH 2019
Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks · INTERSPEECH 2019
Data Augmentation Improves Recognition of Foreign Accented Speech · INTERSPEECH 2018
Inference-Invariant Transformation of Batch Normalization for Domain Adaptation of Acoustic Models · INTERSPEECH 2018
English Conversational Telephone Speech Recognition by Humans and Machines · INTERSPEECH 2017
Efficient Knowledge Distillation from an Ensemble of Teachers · INTERSPEECH 2017
Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings · INTERSPEECH 2016
An Investigation on the Use of i-Vectors for Robust ASR · INTERSPEECH 2016
Multilingual Data Selection for Low Resource Speech Recognition · INTERSPEECH 2016
Compiling Constraint Networks into Multivalued Decomposable Decision Graphs · IJCAI 2015
Knowledge Compilation for Model Counting: Affine Decision Trees · IJCAI 2013