Self-organizing maps (SOMs) as an AI tool for music analysis and production

Dr. Simon Linke

4. Application

4.5. Sonification

The last application example turns things upside down. In the previous sections, SOMs were used to analyze music and sound; now, sound is used to analyze SOMs. Section 2.4 discussed some problems with the u-matrix. A u-matrix presents an overview of the entire SOM, even when a high-dimensional feature space is investigated. As a result, a lot of detail is lost, because it is hardly possible to visualize more than three dimensions at once.

In music, by contrast, many parameters are perceived at the same time and, depending on one's musical training and experience, can be analyzed directly. Thus, each feature of a high-dimensional feature space can be assigned to a specific psychoacoustic parameter. The psychoacoustic parameters should differ as much as possible so that the features can be distinguished clearly. This approach can be experienced in the provided interactive online demonstration.
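
The assignment of features to psychoacoustic parameters can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the demonstration: it assumes that each feature is min-max scaled over all nodes of the trained SOM and that the scaled value directly drives one parameter. The function names and the toy weights are hypothetical.

```python
# Order of the psychoacoustic parameters; one per feature (assumed mapping).
PARAMETERS = ["chroma", "roughness", "sharpness", "fluctuation"]


def normalize_features(weights):
    """Min-max scale every feature column of the SOM weights to [0, 1]."""
    n_features = len(weights[0])
    lows = [min(w[i] for w in weights) for i in range(n_features)]
    highs = [max(w[i] for w in weights) for i in range(n_features)]
    return [
        [
            # A feature that is constant over the map carries no information;
            # it is placed at the middle of the range.
            (w[i] - lows[i]) / (highs[i] - lows[i]) if highs[i] > lows[i] else 0.5
            for i in range(n_features)
        ]
        for w in weights
    ]


def node_to_parameters(node, normalized):
    """Return the parameter settings for one node of the map."""
    return dict(zip(PARAMETERS, normalized[node]))


# Toy SOM with three nodes and four features (hypothetical values).
toy_weights = [[0, 10, 5, 1], [2, 20, 5, 3], [4, 30, 5, 2]]
toy_normalized = normalize_features(toy_weights)
print(node_to_parameters(1, toy_normalized))
```

Scaling each feature separately ensures that every psychoacoustic parameter uses its full range on the map, regardless of the units of the underlying feature.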

In this example, the first psychoacoustic parameter is chroma, which is similar to pitch. Changing chroma changes the fundamental frequency without affecting the perceived brightness. As a second parameter, roughness has been implemented similarly to the demo above. Third, the sharpness of the sound can be varied by changing the prominence of higher frequencies. As the last parameter, a periodic loudness fluctuation has been implemented. All four parameters are applied to different features used to train the SOM. Furthermore, by moving the assigned sliders in the interactive demo, the parameters can be changed individually to get an impression of how they sound. This is especially important when using the tool for the first time: the approach can be learned, and one can adapt to it.
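
How four such controls might act on a single tone can be sketched as follows. This is not the synthesis used in the demo, which is not documented here. The sketch rests on several assumptions: chroma is realized with octave-spaced partials under a fixed spectral envelope (in the manner of a Shepard tone), sharpness shifts the centre of that envelope, and roughness and loudness fluctuation are amplitude modulations at 70 Hz and 4 Hz. All names and constants are hypothetical.

```python
import math

SAMPLE_RATE = 22050  # assumed sample rate in Hz


def render(chroma, roughness, sharpness, fluctuation, duration=0.5):
    """Render one short tone; all four controls are expected in [0, 1]."""
    base = 55.0 * 2.0 ** chroma        # chroma moves the partials up by one octave
    centre = 400.0 * 4.0 ** sharpness  # sharpness moves the envelope centre upwards

    # Octave-spaced partials. Their amplitudes follow a Gaussian envelope over
    # log-frequency that depends on sharpness only, not on chroma.
    partials = []
    for octave in range(7):
        freq = base * 2 ** octave
        distance = math.log2(freq / centre)
        partials.append((freq, math.exp(-0.5 * (distance / 1.5) ** 2)))
    norm = sum(amp for _, amp in partials)

    samples = []
    for n in range(int(duration * SAMPLE_RATE)):
        t = n / SAMPLE_RATE
        tone = sum(amp * math.sin(2 * math.pi * freq * t) for freq, amp in partials) / norm
        # Fast amplitude modulation (70 Hz) is heard as roughness.
        rough = 1.0 - roughness * 0.5 * (1.0 + math.sin(2 * math.pi * 70.0 * t))
        # Slow amplitude modulation (4 Hz) is heard as a loudness fluctuation.
        slow = 1.0 - fluctuation * 0.5 * (1.0 + math.sin(2 * math.pi * 4.0 * t))
        samples.append(tone * rough * slow)
    return samples


plain = render(0.5, 0.0, 0.5, 0.0, duration=0.1)
rough = render(0.5, 1.0, 0.5, 0.0, duration=0.1)
print(len(plain), plain != rough)
```

Because each control acts on a different aspect of the signal (partial frequencies, spectral envelope, fast modulation, slow modulation), the four can be varied independently, which is the property the sonification relies on.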

The SOM was trained to sort different genres of techno music based on features derived from music production tools. Several black regions can be distinguished in the u-matrix, and the nodes inside each of these black regions are known to be very similar. By relying solely on the u-matrix, however, it is impossible to judge similarities and differences between black areas. Here, this can be done simply by clicking on the map and moving the mouse to different regions, which changes the sound. The sound changes only slightly when moving, e.g., between the green dots, but it changes drastically when moving away to the violet dots. The sound changes further when moving towards the blue dots at the top.
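
Why the u-matrix alone cannot compare two black regions can be seen in a toy example. The sketch below assumes a common definition of the u-matrix height, the mean Euclidean distance of a node to its directly adjacent nodes on a rectangular grid with at least two nodes. The function name and the toy map are hypothetical.

```python
import math


def u_matrix(grid):
    """Mean Euclidean distance from each node to its 4-connected neighbours."""
    rows, cols = len(grid), len(grid[0])
    heights = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            distances = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    distances.append(math.dist(grid[r][c], grid[rr][cc]))
            heights[r][c] = sum(distances) / len(distances)
    return heights


# A 1 x 6 map with two homogeneous clusters in a two-dimensional feature space.
toy_map = [[(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]]
heights = u_matrix(toy_map)
print(heights[0][0], heights[0][5])
```

The outermost nodes of both clusters have the same height of zero and would be drawn equally dark, although their weight vectors differ in every feature. Only the weights themselves, made audible in the sonification, reveal that difference.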

In this example, the component planes may still be analyzed individually, even though this is no longer necessary, as the entire Kohonen map can be explored using one's ears instead of one's eyes. Nevertheless, when one is not used to the tool, it can be confusing to listen to the changes of all four parameters simultaneously while moving the mouse along the map. In that case, one can activate the '1-D' button while looking at the component planes. Then only changes related to the selected parameter affect the sound. This is, of course, a disadvantage, but it may be helpful in learning to judge the output of the tool when one is new to it.
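
The behaviour of such a '1-D' mode can be sketched as a filter between the map and the sound generator. The sketch assumes that the parameters that are not selected are held at a fixed neutral value; how the demo treats them is not specified here, and the function name is hypothetical.

```python
def active_parameters(node_params, selected=None, neutral=0.5):
    """Return the settings sent to the synthesizer for the current node.

    With selected=None, all parameters follow the map. In 1-D mode, only the
    selected parameter follows the map; the others stay at the neutral value.
    """
    if selected is None:
        return dict(node_params)
    if selected not in node_params:
        raise KeyError(f"unknown parameter: {selected}")
    return {
        name: (value if name == selected else neutral)
        for name, value in node_params.items()
    }


node = {"chroma": 0.9, "roughness": 0.1, "sharpness": 0.7, "fluctuation": 0.3}
print(active_parameters(node, selected="roughness"))
```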

The demo tool also provides an extended version in which a seven-dimensional feature space can be explored. It does not differ from the previous four-dimensional version other than being slightly more confusing and complicated. Therefore, a modulation matrix has been added. Each parameter can be removed to reduce complexity when the user is new to this topic or if not all parameters are equally important. Furthermore, some combinations of features and psychoacoustic parameters are perceived to be more intuitive, e.g., the tempo of the training data matches well with the beating of the loudness fluctuation. Thus, the tool can be customized using the modulation matrix to allow the most intuitive user experience possible.
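
A modulation matrix of this kind can be sketched as a routing table from features to psychoacoustic parameters. The sketch makes several assumptions: the matrix entries are non-negative weights, a parameter with no routed feature counts as removed and is held at a neutral value, and the feature names other than tempo are placeholders. It does not reproduce the matrix of the demo tool.

```python
def apply_matrix(matrix, features, neutral=0.0):
    """Compute each parameter as the weighted mean of the features routed to it."""
    settings = {}
    for parameter, routing in matrix.items():
        total = sum(routing.values())
        if total == 0:
            # No feature is routed here: the parameter is removed from the sound.
            settings[parameter] = neutral
        else:
            settings[parameter] = sum(
                weight * features[name] for name, weight in routing.items()
            ) / total
    return settings


# Normalized feature values of the node under the mouse (hypothetical).
features = {"tempo": 0.8, "feature_a": 0.25, "feature_b": 0.6}

# Tempo drives the loudness fluctuation; roughness is switched off.
matrix = {
    "fluctuation": {"tempo": 1.0},
    "chroma": {"feature_a": 1.0},
    "roughness": {},
}
print(apply_matrix(matrix, features))
```

Changing the routing only requires editing the table, so a user can try different pairings and keep the one that sounds most intuitive.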

The focus of this study was to produce sounds that are as distinct from each other as possible. The results are easy to distinguish but also sound rather annoying when the tool is used for a long time. A worthwhile alternative might be to map the features of the SOM to more pleasing, musical parameters, even though these might be harder to distinguish.