2019 International Conference on Multimodal Interaction | 2019
Determining Iconic Gesture Forms based on Entity Image Representation
Abstract
Iconic gestures are used to depict physical objects mentioned in speech, and the gesture form is assumed to be based on the image of a given object in the speaker’s mind. Building on this idea, this study proposes a model that learns iconic gesture forms from an image representation obtained from pictures of physical entities. First, we collect a set of pictures of each entity from the web and create an average image representation from them. The average image representation is then fed to a fully connected neural network that selects the gesture form. In the evaluation experiment, our two-step gesture form selection method classified seven types of gesture forms with over 62% accuracy. Furthermore, we demonstrate an example of gesture generation in a virtual agent system, in which our model is used to create a gesture dictionary that assigns a gesture form to each entry word.
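The pipeline described in the abstract (average the image representations of an entity, then classify the average with a fully connected network) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the feature dimension, the hidden layer size, the random untrained weights, and all function names are hypothetical, and the per-picture feature vectors are random stand-ins for whatever image features the paper extracts. The two-step selection procedure is not specified in the abstract and is not modeled here.

```python
import math
import random

NUM_GESTURE_FORMS = 7  # the abstract reports seven gesture form classes


def average_representation(feature_vectors):
    """Element-wise mean of the per-picture feature vectors of one entity."""
    if not feature_vectors:
        raise ValueError("need at least one feature vector")
    dim = len(feature_vectors[0])
    if any(len(v) != dim for v in feature_vectors):
        raise ValueError("all feature vectors must have the same length")
    n = len(feature_vectors)
    return [sum(v[i] for v in feature_vectors) / n for i in range(dim)]


def dense(x, weights, biases):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_in, n_out):
    """Random, untrained weights (illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_gesture_form(avg_vec, layers):
    """Forward pass: ReLU hidden layers, softmax over gesture forms."""
    h = avg_vec
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    weights, biases = layers[-1]
    probs = softmax(dense(h, weights, biases))
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


rng = random.Random(0)
FEATURE_DIM, HIDDEN_DIM = 16, 8  # hypothetical sizes

# Stand-in for features extracted from five web pictures of one entity.
pictures = [[rng.random() for _ in range(FEATURE_DIM)] for _ in range(5)]
avg = average_representation(pictures)

layers = [init_layer(rng, FEATURE_DIM, HIDDEN_DIM),
          init_layer(rng, HIDDEN_DIM, NUM_GESTURE_FORMS)]
form_index, probs = predict_gesture_form(avg, layers)
print("predicted gesture form index:", form_index)
```

Because the weights are untrained, the predicted index carries no meaning; the sketch only shows how a set of per-picture vectors collapses into one average vector that a fully connected classifier maps to a distribution over the seven gesture forms.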