Interactive Machine Learning for Music
3. How can we support instrument designers in using ML in practice?
3.1.3 How should we improve or change a model?
As mentioned above, much of the human work in conventional applied machine learning goes into finding a good modeling approach for a given dataset. This can involve trying different learning algorithms, for instance swapping a neural network for a random forest (Breiman 2001), or changing the configuration or type of neurons in a neural network. It can also involve computing different representations, or features, of the data to use as inputs to the model. For instance, when building models for musical audio analysis, it is common practice to use spectral representations rather than raw audio waveforms as inputs, because these can result in better accuracy.
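One way to see why a spectral representation can help is with a toy comparison. The sketch below is an illustration only, not a method taken from this chapter: it synthesizes three short sine tones (the helper names `tone`, `magnitude_spectrum`, and the chosen lengths and frequencies are all assumptions made for the example) and compares distances between them as raw waveforms and as magnitude spectra.

```python
import cmath
import math

N = 64  # samples per toy signal (assumed for illustration)

def tone(cycles, phase):
    """A sine wave completing `cycles` periods over N samples."""
    return [math.sin(2 * math.pi * cycles * t / N + phase) for t in range(N)]

def magnitude_spectrum(signal):
    """Magnitudes of a naive O(n^2) discrete Fourier transform, bins 0..n/2."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]

a = tone(4, 0.0)           # reference tone
b = tone(4, math.pi / 2)   # same pitch, shifted phase
c = tone(8, 0.0)           # different pitch

# As raw waveforms, the phase-shifted copy is as far from `a` as the
# different pitch is.
waveform_same_pitch = math.dist(a, b)
waveform_diff_pitch = math.dist(a, c)

# As magnitude spectra, the two same-pitch tones nearly coincide, while the
# different pitch is clearly separated.
spectrum_same_pitch = math.dist(magnitude_spectrum(a), magnitude_spectrum(b))
spectrum_diff_pitch = math.dist(magnitude_spectrum(a), magnitude_spectrum(c))
```

In this toy case the waveform distances cannot tell "same pitch, different phase" apart from "different pitch", whereas the spectral distances can. Real audio features are more elaborate, but the example suggests why the choice of representation matters to a learning algorithm.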
Such work often proceeds under the assumption that the training dataset is sufficient and thorough, that is, a good enough representation of the modeling task to be undertaken. Effort therefore focuses on the engineering and experimentation needed to find the best approach to modeling the patterns in that dataset.
When building an instrument, however, the barrier to providing additional training examples may be very low, particularly if the designer herself is the one demonstrating examples. Suppose a designer tries out a model and discovers that it makes mistakes on particular variants of a hand gesture that is crucial to an intended performance: perhaps when her wrist is slightly rotated, or when her hand is less well lit. It can be straightforward to remedy those mistakes by providing additional training examples that illustrate the variants causing them. The learning algorithm can then build a new model from this augmented training set, and there is reasonable hope that the new model will not make the same mistakes.
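The loop described above (try the model, find a mistake, add examples, retrain) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular tool: it uses a 1-nearest-neighbor classifier, for which "training" is simply storing the examples, and the two features (hand position and wrist rotation), the gesture labels, and all numeric values are hypothetical.

```python
import math

def train(examples):
    """Build a 1-nearest-neighbor model from (features, label) pairs.

    Returns a predict(features) function that gives a new input the label
    of the closest stored example.
    """
    stored = list(examples)

    def predict(features):
        _, label = min(stored, key=lambda ex: math.dist(ex[0], features))
        return label

    return predict

# Hypothetical features: (hand position, wrist rotation), scaled to 0..1.
training_set = [
    ((0.20, 0.00), "wave"),
    ((0.25, 0.05), "wave"),
    ((0.80, 0.00), "point"),
    ((0.75, 0.05), "point"),
]
model = train(training_set)

# The designer performs "wave" with a rotated wrist and the model gets it wrong.
rotated_wave = (0.60, 0.90)
before = model(rotated_wave)   # "point": a mistake

# Remedy: demonstrate a few rotated-wrist examples and retrain.
extra_examples = [((0.55, 0.85), "wave"), ((0.60, 0.95), "wave")]
training_set = training_set + extra_examples
model = train(training_set)
after = model(rotated_wave)    # "wave"

# Inputs near the original examples are still handled as before.
unchanged = model((0.78, 0.02))  # "point"
```

With other learning algorithms retraining takes longer and the outcome is less predictable than with nearest neighbor, but the workflow is the same: the correction is made by editing the data, not the algorithm.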
Further, model mistakes are far from the only reason an instrument designer may wish to change a model. I may build a model, play with it, and wish to make it more complex by adding the capacity to respond to new movements. Or I may wish to change the sound associated with one of my movements to see if it feels more musically satisfying. Or I may wish to add a new sensor to my instrument to see if I can capture aspects of my movement more accurately. In all these cases, the appropriate starting point for modifying the model is to modify the training dataset.
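Each of the three changes just described corresponds to a different edit of the training data. The sketch below makes that concrete under illustrative assumptions: examples are stored as dictionaries with hypothetical `features` and `sound` fields, the mapping is a simple nearest-neighbor lookup, and all values (including the readings from the added sensor) are invented for the example.

```python
import math

def nearest_sound(dataset, features):
    """Nearest-neighbor mapping: return the sound of the closest example."""
    closest = min(dataset, key=lambda ex: math.dist(ex["features"], features))
    return closest["sound"]

# Initial dataset: two movements, each described by two sensor values.
dataset = [
    {"features": [0.1, 0.2], "sound": "drone"},
    {"features": [0.9, 0.8], "sound": "bell"},
]

# 1. Respond to a new movement: add examples carrying a new sound label.
with_new_movement = dataset + [{"features": [0.5, 0.1], "sound": "click"}]
sound_new = nearest_sound(with_new_movement, [0.5, 0.15])    # "click"

# 2. Change the sound for an existing movement: relabel its examples.
relabelled = [
    dict(ex, sound="chime") if ex["sound"] == "bell" else ex
    for ex in with_new_movement
]
sound_relabel = nearest_sound(relabelled, [0.85, 0.8])       # "chime"

# 3. Add a sensor: every example gains an extra feature. In practice the
#    examples would be re-recorded with the sensor attached; these readings
#    are made up.
sensor_readings = [0.0, 1.0, 0.5]
with_sensor = [
    dict(ex, features=ex["features"] + [reading])
    for ex, reading in zip(relabelled, sensor_readings)
]
sound_sensor = nearest_sound(with_sensor, [0.1, 0.2, 0.1])   # "drone"
```

Each step produces a new dataset rather than overwriting the old one, so earlier versions remain available for comparison.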
We should also keep in mind that an instrument designer rarely knows exactly what form they want the instrument and mapping to take before they begin; rather, they discover this over time through experimentation with the evolving instrument. This makes the requirement to dynamically retrain a model on many variations of a training set even clearer.