Sound Gesture Intelligence

Dr. Greg Beller

Direct mapping between voice and gesture

During my doctoral thesis on generative models of expressivity and their applications to speech and music, an artificial intelligence algorithm based on a corpus of expressive sentences allowed me to generate “emotional” speech by modulating the prosody of a “neutral” utterance (Beller 2009a; Beller 2009b; Beller 2010). Several times during the development of this vocal emotion synthesizer, I felt the desire to control prosody by gesture. After all, gesture seems to accompany speech naturally, so why not the other way around?

At the same time, another research team at IRCAM was developing one of the first instrumental gesture sensors, which measured the dynamics of a bow using small embedded accelerometers and gyroscopes (6 DoF, degrees of freedom). The motion data was transmitted in real time over WiFi to a computer, which triggered sounds and modulated effects according to the dynamics of the gesture (Bevilacqua 2006).
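As a concrete illustration of that pipeline, here is a minimal sketch of the receiving side in Python, assuming the sensor streams its six values as OSC messages over UDP, a common convention in real-time gesture tools. The address /sensor/6dof, the port, and the units are hypothetical choices for illustration, not the actual protocol of the system described by Bevilacqua (2006).

```python
# Minimal sketch: receive streamed 6-DoF motion data on the computer side.
# Assumptions: the sensor sends OSC messages to UDP port 9000 at the
# (hypothetical) address "/sensor/6dof" carrying six floats.
from pythonosc import dispatcher, osc_server

def on_motion(address, ax, ay, az, gx, gy, gz):
    # ax..az: accelerometer values, gx..gz: gyroscope values
    # (units depend on the sensor; g and deg/s are assumed here)
    print(f"accel=({ax:.2f}, {ay:.2f}, {az:.2f})  "
          f"gyro=({gx:.1f}, {gy:.1f}, {gz:.1f})")

disp = dispatcher.Dispatcher()
disp.map("/sensor/6dof", on_motion)

server = osc_server.BlockingOSCUDPServer(("0.0.0.0", 9000), disp)
server.serve_forever()  # one handler call per incoming sensor frame
```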

In the rest of this unit, we will show which sensors can be used to capture hand dynamics, and how to transform that data into sound-triggering parameters to create an aerial percussion instrument.
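As a preview of the kind of transformation involved, the sketch below turns a stream of accelerometer samples into percussive note-on events by thresholding the frame-to-frame change in acceleration magnitude. This is one simple possible design, not the unit's actual method; the threshold, refractory time, and velocity mapping are all illustrative assumptions.

```python
# Sketch of an "aerial percussion" trigger: a sharp rise in acceleration
# magnitude is interpreted as a hit. All parameter values are
# illustrative defaults, not the article's settings.
import math
import time

class AerialPercussionTrigger:
    def __init__(self, threshold=2.0, refractory_s=0.08):
        self.threshold = threshold        # minimum jump in magnitude to count as a hit
        self.refractory_s = refractory_s  # dead time to avoid double triggers
        self._prev_mag = 0.0
        self._last_hit = 0.0

    def process(self, ax, ay, az):
        """Feed one accelerometer sample; return a striking velocity
        in 0..1 when a hit is detected, else None."""
        mag = math.sqrt(ax * ax + ay * ay + az * az)
        jump = mag - self._prev_mag   # frame-to-frame change in magnitude
        self._prev_mag = mag
        now = time.monotonic()
        if jump > self.threshold and now - self._last_hit > self.refractory_s:
            self._last_hit = now
            # Map the excess energy of the stroke to a bounded velocity.
            return min(1.0, jump / (4 * self.threshold))
        return None

# Illustrative usage with synthetic samples: the spike in the third
# sample produces a single hit event.
trigger = AerialPercussionTrigger()
for sample in [(0, 0, 1.0), (0.1, 0, 1.0), (2.5, 1.0, 1.5), (0, 0, 1.0)]:
    velocity = trigger.process(*sample)
    if velocity is not None:
        print(f"hit! velocity={velocity:.2f}")
```

In a complete instrument, the detected velocity would drive a sampler or synthesizer (for example via MIDI or OSC) instead of a print statement.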