Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Masafumi Nishida is active.

Publication


Featured research published by Masafumi Nishida.


International Conference on Acoustics, Speech, and Signal Processing | 2003

Unsupervised speaker indexing using speaker model selection based on Bayesian information criterion

Masafumi Nishida; Tatsuya Kawahara

The paper addresses unsupervised speaker indexing for discussion audio archives. In discussions, speakers change frequently, so utterances are often very short and their durations vary widely, which causes significant problems when applying conventional methods such as model adaptation and the variance-BIC (Bayesian information criterion) method. We propose a flexible framework that selects an optimal speaker model (GMM or VQ) based on the BIC according to the duration of the utterances. When a speech segment is short, the simple and robust VQ-based method is expected to be chosen, while a GMM can be reliably trained for long segments. For a discussion archive with a total duration of 10 hours, we demonstrate that the proposed method achieves higher indexing performance than conventional methods.
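
For reference, the trade-off underlying this kind of BIC-based model selection can be written generically as follows; the penalty weight λ is a tuning constant and the notation is ours, not taken verbatim from the paper:

```latex
\mathrm{BIC}(M) \;=\; \log L\!\left(X \mid \hat{\theta}_M\right) \;-\; \frac{\lambda}{2}\,\#(M)\,\log N
```

Here L is the likelihood of the data X under model M with maximum-likelihood parameters, #(M) is the number of free parameters of M, and N is the number of frames; the candidate model (GMM or VQ) with the higher BIC is selected.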


International Universal Communication Symposium | 2009

Eye-gaze experiments for conversation monitoring

Kristiina Jokinen; Masafumi Nishida; Seiichi Yamamoto

Eye-tracking technology has recently matured to the point where unobtrusive, natural user studies have become easier to conduct. At the same time, human-computer interactions have become more conversational in style, and more challenging in that they require various human conversational strategies, such as giving feedback and managing turn-taking. In this paper, we focus on eye-gaze in order to investigate turn-taking signals and conversation monitoring in naturally occurring dialogues. We seek to build models that capture which interlocutor the speaker is talking to and what kind of turn-taking signals the partners elicit, and we report the first results of our eye-tracking experiments.


IEEE Transactions on Speech and Audio Processing | 2005

Speaker model selection based on the Bayesian information criterion applied to unsupervised speaker indexing

Masafumi Nishida; Tatsuya Kawahara

In conventional speaker recognition tasks, the amount of training data is almost the same for each speaker, and the speaker model structure is uniform and specified manually according to the nature of the task and the available amount of training data. In real-world speech data such as telephone conversations and meetings, however, serious problems arise in applying a uniform model because variations in the utterance durations of speakers are large, with numerous short utterances. We therefore propose a flexible framework in which an optimal speaker model (GMM or VQ) is automatically selected based on the Bayesian information criterion (BIC) according to the amount of training data available. The framework makes it possible to use a discrete model when the data are sparse and to seamlessly switch to a continuous model once a large amount of data has been obtained. The proposed framework was implemented in unsupervised speaker indexing of discussion audio. For a real discussion archive with a total duration of 10 hours, we demonstrate that the proposed method achieves higher indexing performance than conventional methods. The speaker index is also used to adapt a speaker-independent acoustic model to each participant for automatic transcription of the discussion. We demonstrate that speaker indexing with our method is sufficiently accurate for adaptation of the acoustic model.
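
A minimal sketch of the selection step is shown below. The helper names, the penalty weight, and the parameter counts are illustrative assumptions, not taken from the paper; in particular, treating VQ distortion as a pseudo-likelihood is a simplification.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

def bic_score(log_likelihood, n_params, n_frames, penalty=1.0):
    # Generic BIC: likelihood term minus a complexity penalty.
    return log_likelihood - 0.5 * penalty * n_params * np.log(n_frames)

def select_speaker_model(features, n_components=8):
    """Choose between a GMM and a VQ codebook for one speaker's data,
    keeping whichever model yields the higher BIC. Parameter counts
    below are illustrative (diagonal-covariance GMM vs. codebook)."""
    n_frames, dim = features.shape

    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(features)
    gmm_ll = gmm.score(features) * n_frames       # total log-likelihood
    gmm_params = n_components * (2 * dim + 1)     # means, variances, weights

    vq = KMeans(n_clusters=n_components, n_init=10).fit(features)
    vq_ll = -vq.inertia_                          # distortion as pseudo-likelihood
    vq_params = n_components * dim                # centroids only

    if bic_score(gmm_ll, gmm_params, n_frames) >= bic_score(vq_ll, vq_params, n_frames):
        return "GMM", gmm
    return "VQ", vq
```

With short segments the VQ codebook's smaller parameter count keeps its penalty low, so it tends to win; with long segments the GMM's higher likelihood dominates, matching the behavior the abstract describes.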


International Conference on Multimedia Computing and Systems | 1999

Speaker indexing for news articles, debates and drama in broadcasted TV programs

Masafumi Nishida; Yasuo Ariki

We propose a method to extract and verify individual speakers' utterances using a subspace method. The method extracts speech sections of the same speaker by repeatedly verifying the current speech section against the immediately preceding one. The speaker models are trained automatically during the verification process, without constructing speaker templates in advance. This speaker verification method is then applied to speaker indexing. In this study, announcer utterances are automatically separated from news speech data that also contains reporter and interviewer utterances. The utterances of each participant in a TV debate program are likewise automatically extracted, as are the speech sections of individual actors in a TV drama.
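
A hedged sketch of the sequential verification idea follows. The mean-vector representation, similarity function, and threshold are placeholders standing in for the paper's subspace method.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sequential_speaker_indexing(segments, threshold=0.9):
    """Assign a speaker label to each segment by verifying it against
    the running model of the current speaker; start a new label when
    verification fails. Each segment (frames x dims) is summarized by
    its mean feature vector here, a stand-in for the paper's subspace
    representation. Recurring speakers would need a further merging
    step, omitted for brevity."""
    labels, current_label = [0], 0
    model = np.mean(segments[0], axis=0)
    for seg in segments[1:]:
        vec = np.mean(seg, axis=0)
        if cosine_similarity(model, vec) >= threshold:
            # Same speaker: fold the new segment into the model.
            model = 0.5 * (model + vec)
        else:
            # Speaker change: open a new index.
            current_label += 1
            model = vec
        labels.append(current_label)
    return labels
```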


Proceedings of the 2010 Workshop on Eye Gaze in Intelligent Human Machine Interaction | 2010

On eye-gaze and turn-taking

Kristiina Jokinen; Masafumi Nishida; Seiichi Yamamoto

In this paper we describe our eye-tracking data collection and preliminary experiments concerning the relation between eye-gaze and turn-taking in natural human-human conversations, and how these observations can be extended to multimodal human-machine interactions. We confirm earlier findings that eye-gaze is important in coordinating turn-taking and information flow in dialogues, but note that in multiparty dialogues head movement also seems to play a crucial role in signalling a person's intention to take, hold, or yield the turn.


Proceedings of the 4th Workshop on Eye Gaze in Intelligent Human Machine Interaction | 2012

Multimodal corpus of conversations in mother tongue and second language by same interlocutors

Kosuke Kabashima; Kristiina Jokinen; Masafumi Nishida; Seiichi Yamamoto

In this paper, we describe multimodal data collected from conversations held both in the participants' mother tongue and in their second language. We compare eye movements and utterance styles between conversations in the mother tongue and the second language, and present the results obtained from these analyses.


Language Resources and Evaluation | 2015

Multimodal corpus of multiparty conversations in L1 and L2 languages and findings obtained from it

Seiichi Yamamoto; Keiko Taguchi; Koki Ijuin; Ichiro Umata; Masafumi Nishida

To investigate the differences in communicative activities by the same interlocutors in Japanese (their L1) and in English (their L2), an 8-hour multimodal corpus of multiparty conversations was collected. Three subjects participated in each conversational group, and they held conversations on free-flowing and goal-oriented topics in Japanese and in English. Their utterances, eye gazes, and gestures were recorded with microphones, eye trackers, and video cameras. The utterances and eye gazes were manually annotated; the utterances were transcribed, and the transcriptions of each participant were aligned with those of the others along the time axis. Quantitative analyses compared the communicative activities arising from differences in the conversational language, the conversation type, and the participants' levels of L2 expertise. The results reveal different utterance characteristics and gaze patterns that reflect the differences in difficulty the participants felt in each conversational condition. Both the total and average durations of utterances were shorter in L2 than in L1 conversations. Differences in eye gaze were mainly found in gazes toward the information senders: speakers were gazed at more in second-language than in native-language conversations. Our findings on the characteristics of second-language conversations suggest possible directions for future research in psychology, cognitive science, and human-computer interaction technologies.


International Conference on Acoustics, Speech, and Signal Processing | 2004

Speaker indexing and adaptation using speaker clustering based on statistical model selection

Masafumi Nishida; Tatsuya Kawahara

The paper addresses unsupervised speaker indexing and automatic speech recognition of discussions. In speaker indexing there are two cases: the number of speakers may be known beforehand or unknown. When the number is unknown, conventional methods are difficult to apply to diverse data because several parameters, such as thresholds, must be determined. In addition, serious problems arise in applying a uniform model because variations in the utterance durations of speakers are large. We therefore propose a method that performs speaker indexing robustly in both cases, using a flexible framework in which an optimal speaker model (GMM or VQ) is selected based on the BIC (Bayesian information criterion). Moreover, we propose a method that combines speaker adaptation based on speaker selection with the indexing method. For real discussion archives, we demonstrate that indexing performance is higher than that of conventional methods in both cases and that speech recognition performance is improved by the combined method.
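
When the number of speakers is unknown, a common strategy consistent with BIC-based selection is agglomerative clustering that stops when no merge improves the criterion. The sketch below uses the classic variance-BIC change test; the penalty weight and the greedy merge loop are assumptions for illustration, not details from the paper.

```python
import numpy as np

def delta_bic(x, y, penalty=1.0):
    """Variance-BIC change test between two feature clusters x and y
    (frames x dims), each modeled as a single full-covariance Gaussian.
    Positive values suggest two different speakers; merging is favored
    when the value is <= 0. The penalty weight is a tuning constant."""
    z = np.vstack([x, y])
    d = z.shape[1]
    def logdet(c):
        return np.linalg.slogdet(np.cov(c, rowvar=False))[1]
    r = 0.5 * (len(z) * logdet(z)
               - len(x) * logdet(x)
               - len(y) * logdet(y))
    n_params = 0.5 * (d + 0.5 * d * (d + 1))  # mean + symmetric covariance
    return r - penalty * n_params * np.log(len(z))

def cluster_unknown_speakers(clusters):
    """Greedy agglomerative clustering: repeatedly merge the pair with
    the lowest delta-BIC until every remaining pair looks like distinct
    speakers, which also estimates the number of speakers."""
    while len(clusters) > 1:
        pairs = [(delta_bic(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        best, i, j = min(pairs)
        if best > 0:
            break
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```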


AMCP '98 Proceedings of the First International Conference on Advanced Multimedia Content Processing | 1998

News Dictation and Article Classification Using Automatically Extracted Announcer Utterance

Yasuo Ariki; Jun Ogata; Masafumi Nishida

To construct a news database with a video-on-demand (VOD) function, news articles must be classified into topics. In this study, we describe a system that can dictate news speech, extract keywords, and classify news articles based on the extracted keywords. We propose that dictating only the announcer utterances is sufficient for classifying the news articles and that doing so reduces processing time. As an experiment, we compared the classification performance of news articles in two cases: dictating only the automatically extracted announcer utterances, and dictating the whole speech, including reporter and interviewer utterances.
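
As a rough illustration of keyword-based topic classification over dictated transcripts, the sketch below matches each transcript against topic keyword lists under TF-IDF cosine similarity. The representation, topic names, and keywords are hypothetical; the paper's actual keyword extraction and classification scheme may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def classify_articles(transcripts, topic_keywords):
    """Assign each dictated transcript to the topic whose keyword list
    it matches best under TF-IDF cosine similarity."""
    topics = list(topic_keywords)
    # Fit one vocabulary over transcripts and keyword lists together.
    docs = transcripts + [" ".join(topic_keywords[t]) for t in topics]
    tfidf = TfidfVectorizer().fit_transform(docs)
    article_vecs = tfidf[: len(transcripts)]
    topic_vecs = tfidf[len(transcripts):]
    sims = cosine_similarity(article_vecs, topic_vecs)
    return [topics[row.argmax()] for row in sims]

# Usage with hypothetical data:
labels = classify_articles(
    ["the prime minister announced a new budget today"],
    {"politics": ["minister", "budget", "election"],
     "sports": ["match", "tournament", "score"]},
)
```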


Sensors | 2013

Development of a compact wireless Laplacian electrode module for electromyograms and its human interface applications.

Yutaka Fukuoka; Kenji Miyazawa; Hiroki Mori; Manabi Miyagi; Masafumi Nishida; Yasuo Horiuchi; Akira Ichikawa; Hiroshi Hoshino; Makoto Noshiro; Akinori Ueno

In this study, we developed a compact wireless Laplacian electrode module for electromyograms (EMGs). One advantage of the Laplacian electrode configuration is that EMGs obtained with it are expected to be sensitive to the firing of the muscle directly beneath the measurement site. The performance of the developed electrode module was investigated in two human interface applications: a character-input interface and the detection of finger movements during finger Braille typing. In the former application, the electrode module was combined with an EMG-to-mouse-click converter circuit; in the latter, four electrode modules were used to detect finger movements during finger Braille typing. The investigation of the character-input interface indicated that characters could be input stably by contraction of (a) the masseter, (b) trapezius, (c) anterior tibialis, and (d) flexor carpi ulnaris muscles. This wide applicability is desirable when the interface is used by persons with physical disabilities, because disabilities differ from person to person. The investigation also demonstrated that the electrode module works properly without any skin preparation. Finger movement detection experiments showed that each finger movement was detected more clearly than with EMGs recorded by conventional electrodes, suggesting that the Laplacian electrode module is more suitable for detecting the timing of finger movements during typing. This could be because the Laplacian configuration records EMGs from directly beneath the electrode. These results demonstrate the advantages of the Laplacian electrode module.
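
A hedged sketch of how an EMG trace might be turned into click events, in the spirit of the EMG-to-mouse-click converter described above: rectify, smooth, and threshold. The sampling rate, window length, and threshold are assumptions, not values from the paper.

```python
import numpy as np

def detect_contractions(emg, fs=1000, window_ms=100, threshold=0.2):
    """Convert a raw EMG trace into contraction onset times by
    rectifying, smoothing with a moving average, and thresholding.
    All parameter values are illustrative."""
    window = int(fs * window_ms / 1000)
    # Full-wave rectification followed by a moving-average envelope.
    envelope = np.convolve(np.abs(emg), np.ones(window) / window, mode="same")
    active = envelope > threshold * envelope.max()
    # Rising edges of the active mask correspond to click events.
    onsets = np.flatnonzero(np.diff(active.astype(int)) == 1)
    return onsets / fs  # onset times in seconds
```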

Collaboration


Dive into Masafumi Nishida's collaborations.

Top Co-Authors

Jin-Song Zhang

Beijing Language and Culture University


Ichiro Umata

National Institute of Information and Communications Technology
