FlowSynth: Simplifying Complex Audio Generation Through Explorable Latent Spaces with Normalizing Flows

Philippe Esling; Naotake Masuda; Axel Chemla--Romeu-Santos

IJCAI 2020

Abstract

Audio synthesizers are pervasive in modern music production. These highly complex audio generation functions provide a unique diversity through their large sets of parameters. However, this same feature can make them extremely hard and opaque to use, especially for non-expert users with no formal knowledge of signal processing. We recently introduced a novel formalization of the problem of synthesizer control as learning an invertible mapping between an audio latent space, extracted from the audio signal, and a target parameter latent space, extracted from the synthesizer's presets, using normalizing flows. In addition to modeling a continuous representation that eases the intuitive exploration of the synthesizer, this approach provides a groundbreaking method for audio-based parameter inference, vocal control and macro-control learning. Here, we discuss the details of integrating these high-level features to develop new interaction schemes between a human user and the generating device: parameter inference from audio, and high-level preset visualization and interpolation, which can be used in both off-time and real-time situations. Moreover, we leverage LeapMotion devices to allow the control of hundreds of parameters simply by moving one hand across space to explore the low-dimensional latent space, which both empowers the user and facilitates their interaction with the synthesizer.
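
To make the idea of an invertible mapping between the two latent spaces more concrete, the following is a minimal sketch of a single affine coupling layer, one common building block of normalizing flows. It is an illustration only: the abstract does not state which flow architecture is used, so the coupling design, the class name `AffineCoupling`, the random fixed weights (standing in for trained networks), and the variable names `z_audio` and `z_params` are all assumptions.

```python
import math
import random


class AffineCoupling:
    """Toy affine coupling layer (hypothetical; invertible by construction)."""

    def __init__(self, dim, seed=0):
        if dim < 2:
            raise ValueError("dim must be at least 2")
        rng = random.Random(seed)
        self.split = dim // 2
        rest = dim - self.split
        # Fixed random weights stand in for trained networks (assumption).
        self.w_scale = [[rng.uniform(-0.5, 0.5) for _ in range(self.split)]
                        for _ in range(rest)]
        self.w_shift = [[rng.uniform(-0.5, 0.5) for _ in range(self.split)]
                        for _ in range(rest)]

    def _scale_shift(self, head):
        # Log-scale and shift depend only on the untouched first half.
        log_s = [math.tanh(sum(w * h for w, h in zip(row, head)))
                 for row in self.w_scale]
        shift = [sum(w * h for w, h in zip(row, head))
                 for row in self.w_shift]
        return log_s, shift

    def forward(self, z_audio):
        """Map an audio latent vector to a parameter latent vector."""
        head, tail = z_audio[:self.split], z_audio[self.split:]
        log_s, shift = self._scale_shift(head)
        out = [t * math.exp(s) + b for t, s, b in zip(tail, log_s, shift)]
        # The log-determinant of the Jacobian is the sum of the log-scales.
        return head + out, sum(log_s)

    def inverse(self, z_params):
        """Recover the audio latent vector exactly from the parameter latent."""
        head, tail = z_params[:self.split], z_params[self.split:]
        log_s, shift = self._scale_shift(head)
        out = [(t - b) * math.exp(-s) for t, s, b in zip(tail, log_s, shift)]
        return head + out


layer = AffineCoupling(dim=4)
z_audio = [0.3, -1.2, 0.8, 0.1]
z_params, log_det = layer.forward(z_audio)
z_back = layer.inverse(z_params)  # matches z_audio up to rounding error
```

Because the scale and shift are computed only from the half of the vector that passes through unchanged, the inverse can be evaluated in closed form. This property is what would let the same model be run in either direction between the audio latent space and the parameter latent space.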
