Boris Ginsburg
48 papers · 2016–2026 · 7 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer
Conference Polyglot (7)
Interdisciplinary Bridge
Taxonomy Completionist (12)
Academic Marathon (9)
Cross-Pollinator (14)
Renaissance Researcher (5)
Conference Loyalist (26)
Topic Evolution
Dynamic Duo (14)
Keyword Champion (2)
Deep Specialist (13)
Keyword Collector (152)
Century Club (45)
Trend Setter
Unstoppable (8)
Prolific Year (11)
Conference Pioneer
Conferences
INTERSPEECH (26)
ACL (7)
EMNLP (4)
ICLR (4)
ICML (4)
NAACL (2)
NIPS (1)
Research topics
Keywords
automatic speech recognition (10)
large language model (7)
speech recognition (6)
end-to-end model (5)
weighted finite-state transducer (4)
speech translation (4)
speech synthesis (3)
language model (3)
inverse text normalization (3)
machine translation (3)
multimodal learning (3)
word error rate (3)
convolutional neural network (3)
parameter efficiency (2)
transfer learning (2)
translation quality (2)
speech processing (2)
synthetic data generation (2)
speaker recognition (2)
connectionist temporal classification (2)
Papers
Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech
ACL 2026
Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models
ACL 2026
Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception
ACL 2026
NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model
ACL 2025
SWAN: An Efficient and Scalable Approach for Long-Context Language Modeling
EMNLP 2025
Extending Automatic Machine Translation Evaluation to Book-Length Documents
EMNLP 2025
Nvidia-Nemo's WMT 2025 Metrics Shared Task Submission
EMNLP 2025
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
NAACL 2025
Anticipating Future with Large Language Model for Simultaneous Machine Translation
NAACL 2025
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
ICLR 2025
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
ICLR 2025
Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems
ICML 2025
Star Attention: Efficient LLM Inference over Long Sequences
ICML 2025
Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models
ACL 2025
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models
INTERSPEECH 2024
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
ICML 2024
Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
INTERSPEECH 2024
Schrödinger Bridge for Generative Speech Enhancement
INTERSPEECH 2024
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
INTERSPEECH 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
INTERSPEECH 2024
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data
INTERSPEECH 2024
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
INTERSPEECH 2024
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
ICLR 2023
Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer
EMNLP 2023
Adapter-Based Extension of Multi-Speaker Text-To-Speech Model for New Speakers
INTERSPEECH 2023
NeMo Forced Aligner and its application to word alignment for subtitle generation
INTERSPEECH 2023
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification
INTERSPEECH 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
ICML 2023
NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2023
ACL 2023
SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings
INTERSPEECH 2023
Confidence-based Ensembles of End-to-End Speech Recognition Models
INTERSPEECH 2023
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
INTERSPEECH 2023
Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling
INTERSPEECH 2023
Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization
INTERSPEECH 2022
Thutmose Tagger: Single-pass neural model for Inverse Text Normalization
INTERSPEECH 2022
NeMo Open Source Speaker Diarization System
INTERSPEECH 2022
CTC Variations Through New WFST Topologies
INTERSPEECH 2022
Multi-scale Speaker Diarization with Dynamic Scale Weighting
INTERSPEECH 2022
TalkNet: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis
INTERSPEECH 2021
NeMo (Inverse) Text Normalization: From Development to Production
INTERSPEECH 2021
SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition
INTERSPEECH 2021
Hi-Fi Multi-Speaker English TTS Dataset
INTERSPEECH 2021
NeMo Inverse Text Normalization: From Development to Production
INTERSPEECH 2021
MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition
INTERSPEECH 2020
Jasper: An End-to-End Convolutional Neural Acoustic Model
INTERSPEECH 2019
Mixed Precision Training
ICLR 2018
OpenSeq2Seq: Extensible Toolkit for Distributed and Mixed Precision Training of Sequence-to-Sequence Models
ACL 2018
SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques
NIPS 2016