AI Composition: New, artistic workflows in dealing with generative audio tools

Genoël von Lilienstern

I. AI tools: a dynamic field

1.1 Who is this text aimed at?

This presentation is aimed at anyone from the fields of music, composition and sound art who is interested in experimenting with digitised sound.

It is also aimed at anyone who has questions about an artistic approach to AI and would like to know more, for example:

  • How exactly do people make music and sound art with AI?
  • What are new techniques and compositional concepts when working with AI?
  • What new sound experiences or even new musical styles could result from the use of AI audio tools?

1.2 Working with tools which are under development

We are operating in an extremely dynamic, rapidly changing field. People are often amazed at what is already possible with AI today. There are also voices that fear an imminent takeover of most areas of our lives by AI. It must be clearly stated, however, that many technologies and approaches are still in their infancy and that what the title announces, new artistic workflows, is only just beginning to emerge. For now it is more a matter of gaining experience, especially in the artistic and creative fields, because most music-related AI tools are still under development. Nevertheless, the aim here is also to discuss actual applications: apps, code, repositories and even finished pieces of music.

At the same time, however, it is often appropriate to say, "You could imagine working with this application in such and such a way," or, "You have to try that out first."

If you want to work with AI on a more fundamental level, continually thinking about how to improve and expand the artistic-technological tools is part of the process, and also part of its special appeal. You learn something along the way, both about technology and about music and artistic processes.


1.3 After the AI tsunami

This text was written at the beginning of 2024, which is worth mentioning given the short half-life of specific AI technologies. We are currently at the provisional end of a period that many have labelled "the AI tsunami." The AI tsunami covers the period from early/mid-2022 to mid-2023 and is characterised by two events in particular: in 2022, the emergence of generative image models such as DALL-E, Bing and, in particular, Midjourney; and, from the end of 2022 onward, the release and rapid uptake of ChatGPT, a chatbot built on a large language model.

Although it remained abstract to many, the hot topic of machine learning and artificial intelligence had been simmering on the side for quite some time. From this point onward, however, many people began to gain concrete experience with AI models themselves.

The metaphor of a tsunami-like wave is quite apt, as the development of AI has been taking place in waves for several decades. Between moments marked by particularly outstanding events, such as the introduction of the first chess computer, there were times when less progress was made; these down periods are known as AI winters. It must be added, however, that the field of artificial intelligence has never before seen anything like the present AI tsunami.


1.4 AI versus AIs

It should be clear that there is no such thing as a single AI, but rather many areas of application, at varying stages of development, for a wide range of tasks. In everyday conversation there still seems to be an occasional perception of a uniform, overarching AI superpower. This is, of course, a utopian or dystopian fantasy.

Incidentally, the distinctions between artificial intelligence, machine learning, deep learning and neural networks will not be discussed in detail here. There are certainly many good articles on this, including on the MUTOR platform.


1.5 A focus on the dynamics of creative processes

This text is not so much about how the technology works, nor is it about discursive issues such as copyright. It is intended to provide an overview of technological applications and the associated artistic and technical processes.

It may well happen that you become interested in neural networks, attend a workshop on the subject, learn about the various ways of programming networks and end up being able to create a functioning model. After all that, you may still not know whether, and how, you can do anything useful with your network.

Or, in another case, you find your way to a lecture on the sensitive social aspects of AI music. The pros and cons are highlighted, but you don't really learn anything about the artistic potential and detailed possibilities of AI, which are perhaps worth taking seriously.

The aim here is to shed some light on this little-discussed area of concrete work and the associated, sometimes novel and unfamiliar, steps when creating an AI music or sound composition.