Boris Ginsburg

48 papers · 2016–2026 · 7 conferences · top CS/AI venues

Achievements


🧭 Keyword Pioneer 🌍 Conference Polyglot (7) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (12) 🏃 Academic Marathon (9)

🐝 Cross-Pollinator (14) 🌈 Renaissance Researcher (5) 🏠 Conference Loyalist (26) 🧬 Topic Evolution 🤝 Dynamic Duo (14) 🏆 Keyword Champion (2) 🔬 Deep Specialist (13) 🗃️ Keyword Collector (152) 💎 Century Club (45) 📈 Trend Setter 🔥 Unstoppable (8) ⚡ Prolific Year (11) 🚀 Conference Pioneer

Conferences

INTERSPEECH (26) ACL (7) EMNLP (4) ICLR (4) ICML (4) NAACL (2) NIPS (1)

Top co-authors

Jagadeesh Balam (14) Vitaly Lavrukhin (13) Somshubra Majumdar (9) Evelina Bakhturina (7) Zhehuai Chen (7) Kunal Dhawan (6) Nithin Rao Koluguri (6) He Huang (6) Oleksii Hrinchuk (5) Fei Jia (5)

Research topics

Natural Language Processing (1)

Keywords

automatic speech recognition (10) large language model (7) speech recognition (6) end-to-end model (5) weighted finite-state transducer (4) speech translation (4) speech synthesis (3) language model (3) inverse text normalization (3) machine translation (3) multimodal learning (3) word error rate (3) convolutional neural network (3) parameter efficiency (2) transfer learning (2) translation quality (2) speech processing (2) synthetic data generation (2) speaker recognition (2) connectionist temporal classification (2)

Papers

Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech · ACL 2026
Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models · ACL 2026
Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception · ACL 2026
NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model · ACL 2025
SWAN: An Efficient and Scalable Approach for Long-Context Language Modeling · EMNLP 2025
Extending Automatic Machine Translation Evaluation to Book-Length Documents · EMNLP 2025
Nvidia-Nemo’s WMT 2025 Metrics Shared Task Submission · EMNLP 2025
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning · NAACL 2025
Anticipating Future with Large Language Model for Simultaneous Machine Translation · NAACL 2025
nGPT: Normalized Transformer with Representation Learning on the Hypersphere · ICLR 2025
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR · ICLR 2025
Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems · ICML 2025
Star Attention: Efficient LLM Inference over Long Sequences · ICML 2025
Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models · ACL 2025
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models · INTERSPEECH 2024
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations · ICML 2024
Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter · INTERSPEECH 2024
Schrödinger Bridge for Generative Speech Enhancement · INTERSPEECH 2024
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations · INTERSPEECH 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment · INTERSPEECH 2024
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data · INTERSPEECH 2024
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment · INTERSPEECH 2024
BigVGAN: A Universal Neural Vocoder with Large-Scale Training · ICLR 2023
Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer · EMNLP 2023
Adapter-Based Extension of Multi-Speaker Text-To-Speech Model for New Speakers · INTERSPEECH 2023
NeMo Forced Aligner and its application to word alignment for subtitle generation · INTERSPEECH 2023
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification · INTERSPEECH 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations · ICML 2023
NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2023 · ACL 2023
SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings · INTERSPEECH 2023
Confidence-based Ensembles of End-to-End Speech Recognition Models · INTERSPEECH 2023
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator · INTERSPEECH 2023
Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling · INTERSPEECH 2023
Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization · INTERSPEECH 2022
Thutmose Tagger: Single-pass neural model for Inverse Text Normalization · INTERSPEECH 2022
NeMo Open Source Speaker Diarization System · INTERSPEECH 2022
CTC Variations Through New WFST Topologies · INTERSPEECH 2022
Multi-scale Speaker Diarization with Dynamic Scale Weighting · INTERSPEECH 2022
TalkNet: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis · INTERSPEECH 2021
NeMo (Inverse) Text Normalization: From Development to Production · INTERSPEECH 2021
SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition · INTERSPEECH 2021
Hi-Fi Multi-Speaker English TTS Dataset · INTERSPEECH 2021
NeMo Inverse Text Normalization: From Development to Production · INTERSPEECH 2021
MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition · INTERSPEECH 2020
Jasper: An End-to-End Convolutional Neural Acoustic Model · INTERSPEECH 2019
Mixed Precision Training · ICLR 2018
OpenSeq2Seq: Extensible Toolkit for Distributed and Mixed Precision Training of Sequence-to-Sequence Models · ACL 2018
SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques · NIPS 2016