Yossi Adi

53 papers · 2016–2026 · 11 conferences · across top CS/AI conferences

Achievements


🗺️ Taxonomy Completionist (20) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🌍 Conference Polyglot (11)

🐣 Hot Topic Early Bird 🏠 Conference Loyalist (20) 🤝 Dynamic Duo (12) 👑 Triple Crown 🏆 Keyword Champion (2) 🏆 Grand Slam 🔬 Deep Specialist (14) 🧬 Topic Evolution 🔥 Unstoppable (6) 🚀 Conference Pioneer ⚡ Prolific Year (5) 📈 Trend Setter 💎 Century Club (52) 🗃️ Keyword Collector (52)

Conferences

INTERSPEECH (20) NIPS (8) EMNLP (6) ACL (5) ICLR (3) AAAI (2) CVPR (2) ICML (2) JMLR (2) NAACL (2) ICCV (1)

Top co-authors

Jade Copet (12) Wei-Ning Hsu (12) Felix Kreuk (11) Gabriel Synnaeve (10) Itai Gat (10) Emmanuel Dupoux (9) Joseph Keshet (9) Adam Polyak (8) Ann Lee (7) Alexandre Defossez (7)

Research topics

Analysis (1)

Keywords

self-supervised learning (9) speech synthesis (6) discrete representation (5) unsupervised learning (5) speech-to-speech translation (4) speech generation (4) neural network (4) speech language model (4) language model (3) automatic speech recognition (3) speaker identity (3) speech recognition (3) data augmentation (3) speech resynthesis (3) recurrent neural network (2) convolutional neural network (2) video generation (2) temporal alignment (2) diffusion model (2) adversarial example (2)

Papers

LaMI: Augmenting Large Language Models via Late Multi-Image Fusion ACL 2026
GmSLM: Generative Marmoset Spoken Language Modeling EMNLP 2025
Slamming: Training a Speech Language Model on One GPU in a Day ACL 2025
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation CVPR 2025
CAFA: a Controllable Automatic Foley Artist ICCV 2025
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation AAAI 2024
Masked Audio Generation using a Single Non-Autoregressive Transformer ICLR 2024
Layer Collaboration in the Forward-Forward Algorithm AAAI 2024
NAST: Noise Aware Speech Tokenization for Speech Language Models INTERSPEECH 2024
Audio Enhancement from Multiple Crowdsourced Recordings: A Simple and Effective Baseline INTERSPEECH 2024
A Language Modeling Approach to Diacritic-Free Hebrew TTS INTERSPEECH 2024
HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing INTERSPEECH 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units INTERSPEECH 2024
Scaling Speech Technology to 1,000+ Languages JMLR 2024
Discrete Flow Matching NIPS 2024
Transformers are Multi-State RNNs EMNLP 2024
An Independence-promoting Loss for Music Generation with Language Models ICML 2024
Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling ACL 2023
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion NIPS 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale NIPS 2023
Simple and Controllable Music Generation NIPS 2023
Textually Pretrained Speech Language Models NIPS 2023
ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration CVPR 2023
Generative Spoken Language Model based on continuous word-sized audio tokens EMNLP 2023
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units EMNLP 2023
AudioGen: Textually Guided Audio Generation ICLR 2023
Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis INTERSPEECH 2023
Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation INTERSPEECH 2023
Learning Discrete Structured Variational Auto-Encoder using Natural Evolution Strategies ICLR 2022
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement INTERSPEECH 2022
Probing phoneme, language and speaker information in unsupervised speech representations INTERSPEECH 2022
Unsupervised Symbolic Music Segmentation using Ensemble Temporal Prediction Errors INTERSPEECH 2022
Deep Audio Waveform Prior INTERSPEECH 2022
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation INTERSPEECH 2022
textless-lib: a Library for Textless Spoken Language Processing NAACL 2022
Textless Speech Emotion Conversion using Discrete & Decomposed Representations EMNLP 2022
Text-Free Prosody-Aware Generative Spoken Language Modeling ACL 2022
Direct Speech-to-Speech Translation With Discrete Units ACL 2022
On the Importance of Gradient Norm in PAC-Bayesian Bounds NIPS 2022
Textless Speech-to-Speech Translation on Real Data NAACL 2022
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations INTERSPEECH 2021
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit EMNLP 2021
Voice Separation with an Unknown Number of Multiple Speakers ICML 2020
Unsupervised Cross-Domain Singing Voice Conversion INTERSPEECH 2020
Real Time Speech Enhancement in the Waveform Domain INTERSPEECH 2020
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation INTERSPEECH 2020
Hide and Speak: Towards Deep Neural Networks for Speech Steganography INTERSPEECH 2020
Out-of-Distribution Detection using Multiple Semantic Label Representations NIPS 2018
Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples NIPS 2017
Automatic Measurement of Pre-Aspiration INTERSPEECH 2017
Learning Similarity Functions for Pronunciation Variations INTERSPEECH 2017
StructED: Risk Minimization in Structured Prediction JMLR 2016
Automatic Measurement of Voice Onset Time and Prevoicing Using Recurrent Neural Networks INTERSPEECH 2016