Self-organizing maps (SOMs) as an AI tool for music analysis and production
3. How to describe music to a computer
Finding suitable parameters is straightforward when using colors, but how can this be transferred to music?
Sometimes music is described as sheet music. Setting aside the fact that sheet music may lack important parameters, such as the actual sound of the instruments or their articulations, this approach is usually limited to Western classical music and cannot be transferred to modern electronic pop music or to most of the world's folk music.
Another approach could be to use the raw waveforms of digital music recordings as they occur, e.g., on a CD or in modern recording software. Although this data is very precise, its sheer amount (typically 44,100 to 96,000 values per second) massively increases the computational cost of computing a SOM. More importantly, however, these raw digital waveforms lack musical meaning. Even though a SOM relying on them might sort music successfully, it would be impossible to know the reasons for its decisions. Even the enormous number of possible component planes would hardly reveal any helpful information.
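To illustrate the scale of the problem, the following sketch (the concrete song length and channel count are assumptions, not from the text) estimates how many raw waveform values a single three-minute stereo song at CD quality produces:

```python
# Illustrative back-of-the-envelope calculation: the amount of raw
# waveform data in a typical three-minute song at CD quality.
SAMPLE_RATE = 44_100   # CD-quality samples per second (lower end of the range above)
DURATION_S = 3 * 60    # assumed song length: three minutes
CHANNELS = 2           # assumed stereo recording

raw_values = SAMPLE_RATE * DURATION_S * CHANNELS
print(raw_values)  # 15876000 -- nearly 16 million values for one song
```

An input vector of this size per song is far beyond what a SOM can handle sensibly, which is why compact, perceptually meaningful descriptors are needed instead.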
A much better solution is to use psychoacoustic parameters. They are directly linked to musical perception and, as such, many aspects of music and its perception can be described with a limited number of these parameters. A few parameters that describe musical timbre are explained in detail below. There are many more, however, and which of them to use for training a SOM must be decided carefully according to the research question.
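As a small illustration (the specific descriptor and code are an example of this idea, not taken from the text), one widely used timbre feature, the spectral centroid, which correlates with perceived brightness, condenses a whole block of waveform samples into a single perceptually meaningful value:

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Spectral centroid (in Hz) of a mono signal: the amplitude-weighted
    mean frequency of its spectrum, a common correlate of brightness."""
    spectrum = np.abs(np.fft.rfft(signal))            # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    if spectrum.sum() == 0:                           # silence: no centroid
        return 0.0
    return float((freqs * spectrum).sum() / spectrum.sum())

# Sanity check: a pure 440 Hz sine tone has its centroid at 440 Hz.
sr = 44_100
t = np.arange(sr) / sr                                # one second of audio
tone = np.sin(2 * np.pi * 440 * t)
print(round(spectral_centroid(tone, sr)))             # 440
```

A handful of such descriptors per piece, rather than millions of raw samples, yields input vectors that a SOM can process and whose component planes remain interpretable.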