Organized Sound Spaces with Machine Learning: 2.2 Continuous Latent Audio Spaces

2.2 Continuous Latent Audio Spaces

So far we have covered two examples of discrete latent audio spaces. Now, let's dive into continuous latent audio spaces. As we covered earlier in the time scales of music, continuous latent audio spaces are not mathematically continuous. However, their discrete elements are perceptually so short that we can treat them as if they were infinitely small, that is, as audio quanta.

Fig. The spectrogram of a frequency sweep from 220 Hz to 1000 Hz or so.

In the example above, we have a frequency sweep that starts at 220 Hz and rises to about 1000 Hz. If we look at the spectrogram of that sweep in the image above, we can see that it looks almost like a path. The question here is, how can we come up with a more complex 2D representation of an audio space? And in that 2D representation, how can we represent an audio file almost like a path? This is an interesting research question because there are quite interesting musical applications that we could explore using such an approach.
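To make the idea of a sweep tracing a path more concrete, here is a minimal sketch in Python with NumPy. It synthesizes a linear sweep from 220 Hz to 1000 Hz, computes a magnitude spectrogram, and keeps the strongest frequency in each window, so that the sweep becomes a sequence of (time, frequency) points. The sample rate, duration, window size, hop size, and the choice of a linear sweep are assumptions made for illustration; they are not taken from the recording shown in the figure.

```python
import numpy as np

# All of these values are assumptions chosen for illustration.
SAMPLE_RATE = 22050               # samples per second
DURATION = 2.0                    # seconds
F_START, F_END = 220.0, 1000.0    # sweep range in Hz
WINDOW_SIZE = 2048                # samples per audio window
HOP_SIZE = 512                    # samples between consecutive windows


def linear_sweep(f_start, f_end, duration, sample_rate):
    """Sine wave whose frequency rises linearly from f_start to f_end."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    # The phase is 2*pi times the integral of the instantaneous frequency
    # f(t) = f_start + (f_end - f_start) * t / duration.
    phase = 2 * np.pi * (f_start * t + 0.5 * (f_end - f_start) * t**2 / duration)
    return np.sin(phase)


def magnitude_spectrogram(signal, window_size, hop_size):
    """Magnitude spectrum of each Hann-windowed frame, shape (frames, bins)."""
    window = np.hanning(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop_size
    frames = np.stack([
        signal[i * hop_size : i * hop_size + window_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))


sweep = linear_sweep(F_START, F_END, DURATION, SAMPLE_RATE)
magnitudes = magnitude_spectrogram(sweep, WINDOW_SIZE, HOP_SIZE)

# Centre frequency of each FFT bin and centre time of each window.
bin_frequencies = np.fft.rfftfreq(WINDOW_SIZE, d=1.0 / SAMPLE_RATE)
frame_times = (np.arange(len(magnitudes)) * HOP_SIZE + WINDOW_SIZE / 2) / SAMPLE_RATE

# The loudest bin in each window gives one point; together they form a path.
peak_frequencies = bin_frequencies[np.argmax(magnitudes, axis=1)]
path = np.column_stack([frame_times, peak_frequencies])
```

Each row of `path` is one audio window. Plotting the second column against the first would give a rising line similar to the one visible in the spectrogram, which is the sense in which the sweep can be read as a path through a 2D space.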

Fig. Excerpt from the latent space of Latent Timbre Synthesis.

For example, in the image above, we see three colours: red, green, and blue. And in those three colours, we see some small circles. The red path and the blue path are recordings from an audio dataset. Each circle in those paths represents one audio window. Those paths are interesting because they are an emergent property of a machine learning approach that I will be talking about later. The latent space in the image above is not a mathematically continuous space; it is still a grid. However, because each audio quantum, or audio window, is quite small, we could almost approach each audio sample as if it were a continuously changing representation.
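As a rough illustration of how recordings can become paths of audio windows in a 2D space, the sketch below cuts two synthetic signals into short windows, computes a spectrum for each window, and projects all windows onto two dimensions. Note the assumptions: the projection used here is plain PCA (via a singular value decomposition), chosen only as a simple stand-in. It is not the machine learning approach behind the figure, and the two toy signals, the window size, and the hop size are hypothetical.

```python
import numpy as np

# All of these values are assumptions chosen for illustration.
SAMPLE_RATE = 22050   # samples per second
WINDOW_SIZE = 1024    # samples per audio window
HOP_SIZE = 512        # samples between consecutive windows


def window_spectra(signal):
    """Log-magnitude spectrum of each Hann-windowed audio window."""
    window = np.hanning(WINDOW_SIZE)
    n_windows = 1 + (len(signal) - WINDOW_SIZE) // HOP_SIZE
    frames = np.stack([
        signal[i * HOP_SIZE : i * HOP_SIZE + WINDOW_SIZE] * window
        for i in range(n_windows)
    ])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))


def project_to_2d(features):
    """PCA stand-in: project each row onto the top two principal axes."""
    centered = features - features.mean(axis=0)
    _, _, components = np.linalg.svd(centered, full_matrices=False)
    return centered @ components[:2].T


# Two hypothetical one-second "recordings".
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
recording_a = np.sin(2 * np.pi * (220 * t + 0.5 * 780 * t**2))   # sweep 220 -> 1000 Hz
recording_b = np.sin(2 * np.pi * 440 * t) * np.linspace(1.0, 0.1, t.size)  # fading tone

spectra_a = window_spectra(recording_a)
spectra_b = window_spectra(recording_b)

# Project all windows into the same 2D space, then split them back
# into one path per recording. Each row is one circle on a path.
points = project_to_2d(np.vstack([spectra_a, spectra_b]))
path_a = points[: len(spectra_a)]
path_b = points[len(spectra_a):]
```

Drawing `path_a` and `path_b` in two colours, with consecutive rows connected, would give a picture in the same spirit as the figure: each recording is an ordered sequence of points, one per audio window.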