
Organized Sound Spaces with Machine Learning

Dr. Kıvanç Tatar

2. Latent Spaces

Latent Spaces in Machine Learning

But what is a latent space? Let's look at an example. Consider a latent space of colours, where each colour is defined by three RGB values: red, green and blue. Now think about a way of organizing those colours on a 2D surface. Here is an example in which a machine learning algorithm generates a latent space of colours:

The machine learning algorithm in the video above is called a self-organizing map (SOM). It has a predefined number of nodes that move during training so that, together, they take the shape of the latent space. In our case, by the end of training the map takes the shape of the colour space. We can see a variety of colours and how they relate to each other, that is, how similar or dissimilar they are.
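The training loop of a self-organizing map can be sketched in plain Python. This is a minimal sketch, not the implementation used in the video: the 10 × 10 grid, the number of training steps, and the exponential decay of the learning rate and neighbourhood radius are assumptions chosen for illustration. Each node holds an RGB weight vector; at every step a random colour is drawn, the closest node (the best matching unit) is found, and that node and its grid neighbours are pulled towards the colour.

```python
import math
import random


def train_som(colours, rows=10, cols=10, steps=2000, lr0=0.5, seed=0):
    """Train a minimal self-organizing map on RGB colours with values in [0, 1].

    Returns a rows x cols grid in which each node is an [r, g, b] weight vector.
    All hyperparameters are illustrative assumptions, not values from the lecture.
    """
    rng = random.Random(seed)
    # Initialise every node with a random RGB weight vector.
    grid = [[[rng.random() for _ in range(3)] for _ in range(cols)] for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0  # initial neighbourhood radius, in grid units

    for t in range(steps):
        frac = t / steps
        lr = lr0 * math.exp(-3.0 * frac)        # learning rate decays over time
        sigma = sigma0 * math.exp(-3.0 * frac)  # neighbourhood shrinks over time
        x = rng.choice(colours)

        # Best matching unit (BMU): the node whose weight is closest to x.
        bmu_r, bmu_c = min(
            ((r, c) for r in range(rows) for c in range(cols)),
            key=lambda rc: sum((w - v) ** 2 for w, v in zip(grid[rc[0]][rc[1]], x)),
        )

        # Pull the BMU and its grid neighbours towards x, weighted by a Gaussian
        # of their distance to the BMU on the 2D grid.
        for r in range(rows):
            for c in range(cols):
                d2 = (r - bmu_r) ** 2 + (c - bmu_c) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                node = grid[r][c]
                for k in range(3):
                    node[k] += lr * h * (x[k] - node[k])
    return grid


if __name__ == "__main__":
    data_rng = random.Random(42)
    colours = [tuple(data_rng.random() for _ in range(3)) for _ in range(200)]
    som = train_som(colours)
    # Print the top-left corner of the trained map as 0-255 RGB triples.
    for row in som[:3]:
        print([tuple(round(255 * v) for v in node) for node in row[:3]])
```

Because the neighbourhood is wide early in training and narrow at the end, the map first settles into a smooth overall arrangement and then fine-tunes individual nodes. After training, neighbouring nodes on the grid hold similar colours, which is what makes the grid readable as a 2D colour space.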

Latent Audio Spaces

Now that we have covered the musical perspective and background needed for latent audio spaces, we can dive into the main topic of this lecture. We will look at two types of approaches to latent audio spaces: discrete approaches and continuous approaches.


Fig.: Time scales of music by Curtis Roads (2004).

What, then, is a discrete approach, and what is a continuous approach? Mathematically, both categories that we discuss in this section are discrete approaches. The difference lies in the time scale: the continuous approach works at the micro time scale, whereas the discrete approach works at the time scales of the sound object and the mesoscale.

In discrete latent spaces, we organize audio samples that range from a fraction of a second to a couple of seconds in length. In an abstract sense, we can think of these as sound objects, such as short sound gestures.

In continuous latent spaces, we work at the micro scale, with relatively short audio windows of around a few milliseconds to 50 milliseconds. By placing one audio window after another, we treat an audio signal as time-series data in which each data point is one audio window. A continuous latent audio space therefore consists of an organization of audio windows, where each data point represents one audio window.
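The windowing step of the continuous approach can be sketched as follows. This is a minimal illustration under stated assumptions: a mono signal stored as a list of floats, a 50 ms window (the upper end of the range mentioned above), and root-mean-square (RMS) amplitude as a stand-in feature for each window. The helper names `frame_signal` and `rms` are hypothetical, and a single RMS value per window is used only to keep the example short; it is not the representation used by any particular latent audio space system.

```python
import math


def frame_signal(signal, sample_rate, window_ms=50.0, hop_ms=None):
    """Split a mono signal into fixed-length windows (micro-scale frames).

    With hop_ms=None the windows are placed back to back (no overlap);
    a trailing partial window is dropped.
    """
    win = int(round(sample_rate * window_ms / 1000.0))
    hop = win if hop_ms is None else int(round(sample_rate * hop_ms / 1000.0))
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must each span at least one sample")
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]


def rms(window):
    """Root-mean-square amplitude of one window (an illustrative feature)."""
    return math.sqrt(sum(s * s for s in window) / len(window))


if __name__ == "__main__":
    sr = 16000  # assumed sample rate in Hz
    # One second of a 440 Hz sine wave as a stand-in audio signal.
    signal = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    frames = frame_signal(signal, sr, window_ms=50.0)
    # The signal becomes a time series: one data point per 50 ms window.
    series = [rms(f) for f in frames]
    print(len(frames), len(frames[0]))  # 20 windows of 800 samples each
```

At 16 kHz, a 50 ms window spans 800 samples, so one second of audio yields 20 back-to-back windows; passing `hop_ms=25.0` instead gives 39 half-overlapping windows. Each entry of `series` is one data point of the time series described above, and in a continuous latent audio space it is these per-window data points that get organized.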