Yossi Adi
53 papers · 2016–2026 · 11 top CS/AI conferences
Achievements
Taxonomy Completionist (20)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Conference Polyglot (11)
Hot Topic Early Bird
Conference Loyalist (20)
Dynamic Duo (12)
Triple Crown
Keyword Champion (2)
Grand Slam
Deep Specialist (14)
Topic Evolution
Unstoppable (6)
Conference Pioneer
Prolific Year (5)
Trend Setter
Century Club (52)
Keyword Collector (52)
Conferences
INTERSPEECH (20)
NIPS (8)
EMNLP (6)
ACL (5)
ICLR (3)
AAAI (2)
CVPR (2)
ICML (2)
JMLR (2)
NAACL (2)
ICCV (1)
Keywords
self-supervised learning (9)
speech synthesis (6)
discrete representation (5)
unsupervised learning (5)
speech-to-speech translation (4)
speech generation (4)
neural network (4)
speech language model (4)
language model (3)
automatic speech recognition (3)
speaker identity (3)
speech recognition (3)
data augmentation (3)
speech resynthesis (3)
recurrent neural network (2)
convolutional neural network (2)
video generation (2)
temporal alignment (2)
diffusion model (2)
adversarial example (2)
Papers
LaMI: Augmenting Large Language Models via Late Multi-Image Fusion
ACL 2026
GmSLM: Generative Marmoset Spoken Language Modeling
EMNLP 2025
Slamming: Training a Speech Language Model on One GPU in a Day
ACL 2025
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
CVPR 2025
CAFA: a Controllable Automatic Foley Artist
ICCV 2025
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
AAAI 2024
Masked Audio Generation using a Single Non-Autoregressive Transformer
ICLR 2024
Layer Collaboration in the Forward-Forward Algorithm
AAAI 2024
NAST: Noise Aware Speech Tokenization for Speech Language Models
INTERSPEECH 2024
Audio Enhancement from Multiple Crowdsourced Recordings: A Simple and Effective Baseline
INTERSPEECH 2024
A Language Modeling Approach to Diacritic-Free Hebrew TTS
INTERSPEECH 2024
HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing
INTERSPEECH 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
INTERSPEECH 2024
Scaling Speech Technology to 1,000+ Languages
JMLR 2024
Discrete Flow Matching
NIPS 2024
Transformers are Multi-State RNNs
EMNLP 2024
An Independence-promoting Loss for Music Generation with Language Models
ICML 2024
Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling
ACL 2023
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
NIPS 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
NIPS 2023
Simple and Controllable Music Generation
NIPS 2023
Textually Pretrained Speech Language Models
NIPS 2023
ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration
CVPR 2023
Generative Spoken Language Model based on continuous word-sized audio tokens
EMNLP 2023
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units
EMNLP 2023
AudioGen: Textually Guided Audio Generation
ICLR 2023
Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
INTERSPEECH 2023
Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
INTERSPEECH 2023
Learning Discrete Structured Variational Auto-Encoder using Natural Evolution Strategies
ICLR 2022
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement
INTERSPEECH 2022
Probing phoneme, language and speaker information in unsupervised speech representations
INTERSPEECH 2022
Unsupervised Symbolic Music Segmentation using Ensemble Temporal Prediction Errors
INTERSPEECH 2022
Deep Audio Waveform Prior
INTERSPEECH 2022
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
INTERSPEECH 2022
textless-lib: a Library for Textless Spoken Language Processing
NAACL 2022
Textless Speech Emotion Conversion using Discrete & Decomposed Representations
EMNLP 2022
Text-Free Prosody-Aware Generative Spoken Language Modeling
ACL 2022
Direct Speech-to-Speech Translation With Discrete Units
ACL 2022
On the Importance of Gradient Norm in PAC-Bayesian Bounds
NIPS 2022
Textless Speech-to-Speech Translation on Real Data
NAACL 2022
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
INTERSPEECH 2021
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit
EMNLP 2021
Voice Separation with an Unknown Number of Multiple Speakers
ICML 2020
Unsupervised Cross-Domain Singing Voice Conversion
INTERSPEECH 2020
Real Time Speech Enhancement in the Waveform Domain
INTERSPEECH 2020
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation
INTERSPEECH 2020
Hide and Speak: Towards Deep Neural Networks for Speech Steganography
INTERSPEECH 2020
Out-of-Distribution Detection using Multiple Semantic Label Representations
NIPS 2018
Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples
NIPS 2017
Automatic Measurement of Pre-Aspiration
INTERSPEECH 2017
Learning Similarity Functions for Pronunciation Variations
INTERSPEECH 2017
StructED: Risk Minimization in Structured Prediction
JMLR 2016
Automatic Measurement of Voice Onset Time and Prevoicing Using Recurrent Neural Networks
INTERSPEECH 2016