Yoshiko Arimoto
Tokyo University of Technology
Publications
Featured research published by Yoshiko Arimoto.
meeting of the association for computational linguistics | 2007
Yukiko I. Nakano; Kazuyoshi Murata; Mika Enomoto; Yoshiko Arimoto; Yasuhiro Asa; Hirohiko Sagawa
The aim of this paper is to develop animated agents that can control multimodal instruction dialogues by monitoring users' behaviors. First, this paper reports on our Wizard-of-Oz experiments, and then, using the collected corpus, proposes a probabilistic model of fine-grained timing dependencies among multimodal communication behaviors: speech, gestures, and mouse manipulations. A preliminary evaluation revealed that our model can predict an instructor's grounding judgment and a listener's successful mouse manipulation quite accurately, suggesting that the model is useful in estimating the user's understanding and can be applied to determining the agent's next action.
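As a rough, hypothetical illustration of the kind of timing-dependency statistic such a corpus analysis could yield (not the authors' model), the sketch below estimates how often a listener's mouse manipulation succeeds shortly after an instructor's utterance ends, split by whether the utterance was accompanied by a gesture; the event fields and the 2-second window are assumptions.

```python
# Hypothetical sketch: from an annotated Wizard-of-Oz log, estimate how often the
# listener's mouse manipulation succeeds within a short window after the
# instructor's utterance ends. Field names and the 2.0 s window are illustrative
# assumptions, not values from the paper.
from dataclasses import dataclass

@dataclass
class Utterance:
    end_time: float        # end of the instructor's utterance (seconds)
    with_gesture: bool     # whether a pointing gesture overlapped the utterance

@dataclass
class MouseEvent:
    time: float            # time of the listener's mouse manipulation
    successful: bool       # whether it operated the intended widget

def success_rate(utterances, mouse_events, window=2.0):
    """P(successful manipulation within `window` s of utterance end),
    split by whether the utterance was accompanied by a gesture."""
    counts = {True: [0, 0], False: [0, 0]}   # with_gesture -> [successes, total]
    for utt in utterances:
        counts[utt.with_gesture][1] += 1
        if any(utt.end_time <= ev.time <= utt.end_time + window and ev.successful
               for ev in mouse_events):
            counts[utt.with_gesture][0] += 1
    return {k: (s / n if n else float("nan")) for k, (s, n) in counts.items()}
```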
international conference on control, automation and systems | 2007
Kazuyoshi Murata; Mika Enomoto; Yoshiko Arimoto; Yukiko I. Nakano
In multimodal communication, verbal and nonverbal behaviors such as gestures and manipulating objects in a workspace occur in parallel and are coordinated with proper timing relative to each other. This paper focuses on the interaction between a beginner user operating a video recorder application on a PC and a multimodal animated help agent, and presents a probabilistic model of fine-grained timing dependencies among different behaviors of different modalities. First, we collect user-agent dialogues using a Wizard-of-Oz experimental setting; the collected verbal and nonverbal behavior data are then used to build a Bayesian network model, which can predict the likelihood of a successful mouse click in the near future, given evidence associated with the status of speech, the agent's gestures, and the user's mouse actions. Finally, we attempt to determine the proper timing at which the agent should give additional instructions by estimating the likelihood of a mouse click occurrence.
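A minimal, hand-rolled stand-in for the kind of decision the paper describes (reduced here to a single conditional probability table rather than a full Bayesian network): predict whether a successful click looks likely given the current speech, gesture, and mouse status, and have the agent elaborate when it does not. All variables, probabilities, and the threshold below are made-up placeholders, not values estimated from the authors' corpus.

```python
# CPT: P(click_soon = True | speech_finished, gesture_shown, cursor_near_target)
# All probabilities are invented for illustration.
CPT = {
    (True,  True,  True):  0.85,
    (True,  True,  False): 0.55,
    (True,  False, True):  0.60,
    (True,  False, False): 0.30,
    (False, True,  True):  0.40,
    (False, True,  False): 0.20,
    (False, False, True):  0.25,
    (False, False, False): 0.05,
}

def p_click_soon(speech_finished: bool, gesture_shown: bool, cursor_near_target: bool) -> float:
    """Look up the likelihood of a successful mouse click in the near future."""
    return CPT[(speech_finished, gesture_shown, cursor_near_target)]

def agent_should_elaborate(evidence, threshold=0.3) -> bool:
    """If a successful click looks unlikely, the agent gives an additional instruction."""
    return p_click_soon(*evidence) < threshold

# Speech not finished, no gesture, cursor far from target -> click unlikely -> elaborate
print(agent_should_elaborate((False, False, False)))   # True
```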
9th International Conference on Speech Prosody 2018 | 2018
Yoshiko Arimoto; Yasuo Horiuchi; Sumio Ohno
To investigate the consistency of base frequency (Fb) labelling of the F0 contour generation model for expressive and/or authentic emotional speech, an Fb labelling experiment was conducted with three trained labellers using the Online-gaming voice chat corpus with emotional labelling (OGVC), a parallel corpus of emotional speech. Twenty-four utterances from spontaneous dialog speech and emotion-acted speech in the OGVC were labelled with Fb, phrase commands, and accent commands by the three labellers. A repeated-measures analysis of variance on the Fb value of each utterance was performed with the factors of corpus type, gender, speaker, emotion, and labeller. The results show significant main effects of gender, speaker, and emotion and a significant interaction between speaker and emotion. The results also indicate that the value of Fb varied when different emotions were expressed, even when uttered by the same speaker. Moreover, a closer inspection of the Fb of each utterance suggests that Fb also varied when the linguistic content of the utterances differed, even if the same emotion was expressed in those utterances.
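An F0 contour generation model with a base frequency Fb, phrase commands, and accent commands is conventionally Fujisaki's command-response model; assuming that is the model meant here, its standard form is shown below for reference, making explicit where the labelled Fb, phrase commands (magnitudes Ap, onset times T0), and accent commands (amplitudes Aa, on/offset times T1, T2) enter the contour.

```latex
% Fujisaki command-response F0 model (reference form; assumed, not quoted from the abstract)
\ln F_0(t) = \ln F_b
  + \sum_{i=1}^{I} A_{p_i}\, G_p(t - T_{0i})
  + \sum_{j=1}^{J} A_{a_j} \bigl[\, G_a(t - T_{1j}) - G_a(t - T_{2j}) \,\bigr]

% Phrase- and accent-control mechanisms
G_p(t) = \begin{cases} \alpha^{2} t\, e^{-\alpha t} & (t \ge 0) \\ 0 & (t < 0) \end{cases}
\qquad
G_a(t) = \begin{cases} \min\!\bigl[\, 1 - (1 + \beta t)\, e^{-\beta t},\ \gamma \,\bigr] & (t \ge 0) \\ 0 & (t < 0) \end{cases}
```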
2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA) | 2014
Yoshiko Arimoto; Kazuo Okanoya
Covariation of behavioral/physiological reactions may cause emotional synchrony between interlocutors. Based on this assumption, this paper investigated (1) how emotions between the interlocutors are synchronized during a dialog and (2) what types of behavioral or physiological reactions correlate with each other. Speakers' verbal and non-verbal emotional behavior (vocal/facial) and physiological responses (heart rate/skin conductance) were recorded while they engaged in competitive/cooperative tasks. After recording, the speakers annotated their own and their interlocutors' emotional states (arousal/valence/positivity). An analysis of variance on the correlation coefficients between emotional states suggested that male speakers were less emotionally synchronized than female speakers in the competitive dialog. It also suggested that they believed their emotions were more synchronized with each other than they actually were. Moreover, the results of the correlation tests revealed that the behavioral or physiological reactions of most of the pairs in the same dialog were positively correlated.
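A rough sketch of the kind of correlation test described above: check whether two interlocutors' self-annotated arousal ratings over a dialog are positively correlated. The per-utterance rating arrays and the 1–5 scale are illustrative placeholders, not data from the corpus.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-utterance arousal annotations for one speaker pair (1-5 scale)
speaker_a = np.array([2, 3, 3, 4, 5, 4, 3, 4, 5, 5])
speaker_b = np.array([2, 2, 3, 4, 4, 4, 3, 3, 5, 4])

# Pearson correlation as a simple index of emotional synchrony for this pair
r, p = pearsonr(speaker_a, speaker_b)
print(f"arousal synchrony: r = {r:.2f}, p = {p:.3f}")

# The paper then compares such coefficients across conditions
# (gender, competitive vs. cooperative task) with an analysis of variance.
```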
Journal of the Acoustical Society of America | 2008
Yoshiko Arimoto; Sumio Ohno; Hitoshi Iida
With the great advances in automatic speech recognition (ASR), ASR and voice command systems are expected to be more sensitive to users' intentions or emotions. These systems currently process linguistic information, but do not process the nonlinguistic or paralinguistic information that users express during dialogs. For that reason, computers can obtain less information about a user through a dialog than human listeners can. If computers can recognize users' emotions conveyed by acoustic information, more appropriate responses can be made to users. Toward such emotion recognition, we have continued our study on anger degree estimation using both prosodic and segmental features of anger utterances recorded during two kinds of pseudo-dialogs. This report focuses only on segmental features related to voice quality and examines their capability to estimate anger degree. The first cepstral coefficient of anger utterances has been analyzed to obtain acoustic parameters r...
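As an illustration of the kind of segmental, voice-quality-related feature named above (not the authors' exact procedure), the sketch below computes low-order real-cepstral coefficients of a speech frame; the frame length, window, and sampling rate are assumptions.

```python
import numpy as np

def real_cepstrum(frame: np.ndarray) -> np.ndarray:
    """Real cepstrum of one windowed speech frame: IFFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-10)   # small offset avoids log(0)
    return np.fft.irfft(log_mag)

# Hypothetical 25 ms frame at 16 kHz (random noise stands in for real speech here)
frame = np.random.randn(400)
c = real_cepstrum(frame)
print("c0 (overall level):", c[0], " c1 (related to spectral tilt):", c[1])
```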
Journal of the Acoustical Society of America | 2006
Yoshiko Arimoto; Sumio Ohno; Hitoshi Iida
Automatic speech recognition (ASR) systems are in great demand for customer service systems. With advanced interactive voice response systems, humans have more opportunities to hold dialogues with computers. Existing dialogue systems process linguistic information but do not process paralinguistic information. Therefore, computers are able to obtain less information during a human-computer dialogue than a human can during a human-human dialogue. This report describes a study of a method for estimating the degree of speakers' anger using acoustic features and linguistic representations expressed in utterances during a natural dialogue. To record utterances expressing the users' internal anger, we set up pseudo-dialogues to induce irritation arising from discontent with the ASR system's performance and to induce exasperation toward the operator while the user makes a complaint. A five-scale subjective evaluation was conducted to mark each utterance with a score as the actual measurement of ang...
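A minimal regression sketch of the estimation task described above: map acoustic features of an utterance to its rated degree of anger on a five-point scale. The feature set, training values, and model choice are invented for illustration and may differ from the paper's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical features per utterance: [mean F0 (Hz), F0 range (Hz),
# mean intensity (dB), speech rate (mora/s)]
X = np.array([
    [180.0,  60.0, 62.0, 7.5],
    [210.0,  95.0, 68.0, 8.2],
    [240.0, 130.0, 74.0, 9.0],
    [170.0,  50.0, 60.0, 7.0],
    [230.0, 120.0, 72.0, 8.8],
])
y = np.array([1.5, 3.0, 4.5, 1.0, 4.0])   # mean anger score from the five-scale rating

# Fit the toy regressor and estimate the anger degree of a new utterance
model = LinearRegression().fit(X, y)
new_utterance = np.array([[220.0, 110.0, 70.0, 8.5]])
print("estimated anger degree:", model.predict(new_utterance)[0])
```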
Acoustical Science and Technology | 2012
Yoshiko Arimoto; Hiromi Kawatsu; Sumio Ohno; Hitoshi Iida
Journal of Natural Language Processing | 2007
Yoshiko Arimoto; Sumio Ohno; Hitoshi Iida
conference of the international speech communication association | 2008
Yoshiko Arimoto; Hiromi Kawatsu; Sumio Ohno; Hitoshi Iida
Acoustical Science and Technology | 2015
Yoshiko Arimoto; Kazuo Okanoya