Taro Asada
Kyoto Prefectural University
Publications
Featured research published by Taro Asada.
Artificial Life and Robotics | 2011
Yasunari Yoshitomi; Taro Asada; Kyouhei Shimada; Masayoshi Tabuse
We have previously developed a method for recognizing the facial expression of a speaker. For facial expression recognition, we previously selected three images: (i) just before speaking, (ii) while speaking the first vowel, and (iii) while speaking the last vowel in an utterance. Using the speech recognition system Julius, static thermal images are saved at the timing positions of just before speaking and of speaking the first and last vowels. To implement our method, we recorded three subjects who spoke 25 Japanese first names covering all combinations of the first and last vowels. These recordings were used to prepare the training data and then the test data. Julius sometimes misrecognizes the first and/or last vowel(s); for example, /a/ as the first vowel is sometimes misrecognized as /i/. In the training data, we corrected these misrecognitions, but no such correction can be carried out on the test data. In the implementation of our method, the facial expressions of the three subjects were distinguished with a mean accuracy of 79.8% when they exhibited one of the intentional facial expressions "angry," "happy," "neutral," "sad," and "surprised." The mean accuracy of vowel recognition by Julius was 84.1%.
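For illustration, the sketch below shows how the timing-selection step could be implemented once the recognizer's phoneme alignment is available; the alignment format, the pre-speech margin, and the timings for the name "Taro" are assumptions, not values from the paper.

```python
# A minimal sketch of selecting thermal-frame capture times from a phoneme
# alignment, assuming the alignment has already been parsed into
# (phoneme, start_sec, end_sec) tuples. Names and values are illustrative.
VOWELS = {"a", "i", "u", "e", "o"}

def select_frame_times(alignment, pre_speech_margin=0.1):
    """Return capture times for (i) just before speaking,
    (ii) the first vowel, and (iii) the last vowel."""
    vowel_segments = [(p, s, e) for p, s, e in alignment if p in VOWELS]
    if not vowel_segments:
        raise ValueError("no vowel found in the alignment")
    utterance_start = alignment[0][1]
    first_vowel_mid = (vowel_segments[0][1] + vowel_segments[0][2]) / 2
    last_vowel_mid = (vowel_segments[-1][1] + vowel_segments[-1][2]) / 2
    return utterance_start - pre_speech_margin, first_vowel_mid, last_vowel_mid

# Example: a made-up alignment for the Japanese name "Taro" (/t a r o/)
alignment = [("t", 0.52, 0.60), ("a", 0.60, 0.78),
             ("r", 0.78, 0.84), ("o", 0.84, 1.05)]
print(select_frame_times(alignment))  # times at which thermal frames would be taken
```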
Journal of Information Security | 2011
Yasunari Yoshitomi; Taro Asada; Yohei Kinugawa; Masayoshi Tabuse
Recently, several digital watermarking techniques have been proposed for hiding data in the frequency domain of audio files in order to protect their copyrights. In general, there is a tradeoff between the quality of the watermarked audio and the tolerance of the watermarks to signal processing such as compression. In previous research, we improved both simultaneously by formulating a multipurpose optimization problem for deciding the positions of watermarks in the frequency domain of the audio data and obtaining a near-optimum solution using a wavelet transform and a genetic algorithm. However, obtaining the near-optimum solution was very time consuming. To fundamentally overcome this issue, we have developed an authentication method for digital audio that uses a discrete wavelet transform. In contrast to digital watermarking, the proposed method inserts no additional information into the original audio; the audio is authenticated using features extracted by the wavelet transform and characteristic coding. Accordingly, one can always use the copyright-protected original audio. The experimental results show that the authentication is highly tolerant to MP3, AAC, and WMA compression, and that the processing time of the method is acceptable for everyday use.
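As a rough illustration of wavelet-based audio authentication, the sketch below builds a fingerprint from coarse discrete-wavelet-transform coefficients using PyWavelets and compares it against a candidate recording. The paper's actual feature extraction and characteristic coding are not reproduced; the sign-pattern fingerprint, wavelet choice, decomposition level, and threshold are all assumptions.

```python
# A minimal sketch of DWT-based feature matching for audio authentication.
# As a stand-in for the paper's features, the signs of the coarse-level
# approximation coefficients are used as a compression-tolerant fingerprint.
import numpy as np
import pywt  # PyWavelets

def audio_fingerprint(samples, wavelet="db4", level=4):
    """Binary fingerprint from the approximation coefficients at the given level."""
    coeffs = pywt.wavedec(samples, wavelet, level=level)
    approx = coeffs[0]                     # low-frequency content tends to survive lossy compression
    return (approx >= 0).astype(np.uint8)  # keep only the sign pattern

def authenticate(original, candidate, threshold=0.95):
    """Accept the candidate when enough fingerprint bits agree."""
    fp_orig, fp_cand = audio_fingerprint(original), audio_fingerprint(candidate)
    n = min(len(fp_orig), len(fp_cand))
    agreement = np.mean(fp_orig[:n] == fp_cand[:n])
    return agreement >= threshold

# Demo with synthetic audio: the "candidate" is the original plus a small
# perturbation, standing in for a compressed copy.
rng = np.random.default_rng(0)
original = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
candidate = original + 0.01 * rng.standard_normal(original.shape)
print(authenticate(original, candidate))  # expected: True (fingerprints largely agree)
```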
Artificial Life and Robotics | 2011
Tomoko Fujimura; Yasunari Yoshitomi; Taro Asada; Masayoshi Tabuse
For facial expression recognition, we selected three images: (i) just before speaking, (ii) while speaking the first vowel, and (iii) while speaking the last vowel in an utterance. In this study, as a pre-processing module, we added a judgment function that distinguishes a front-view face for facial expression recognition. A frame with a front-view face in a dynamic image is selected by estimating the face direction. The judgment function measures four feature parameters using thermal image processing and selects the thermal images whose feature-parameter values all fall within limited ranges, which were decided on the basis of training thermal images of front-view faces. As an initial investigation, we adopted the utterance of the Japanese name "Taro," which is semantically neutral. The mean judgment accuracy for the front-view face was 99.5% for six subjects who changed their face direction freely. Using the proposed method, the facial expressions of six subjects were distinguished with 84.0% accuracy when they exhibited one of the intentional facial expressions "angry," "happy," "neutral," "sad," and "surprised." We expect the proposed method to be applicable to recognizing facial expressions in daily conversation.
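A minimal sketch of the range-based judgment is given below, assuming the four feature parameters have already been measured from each thermal frame. The parameter values, the margin, and the way the acceptance ranges are learned from training frames are illustrative, not taken from the paper.

```python
# A minimal sketch of the front-view judgment: a frame is accepted only when
# all four feature parameters lie within ranges learned from training frames
# of front-view faces. Parameter values and margin are made up for the demo.
import numpy as np

def learn_ranges(training_params, margin=0.05):
    """Per-parameter (min, max) acceptance ranges from front-view training frames."""
    params = np.asarray(training_params, dtype=float)  # shape (n_frames, 4)
    lo, hi = params.min(axis=0), params.max(axis=0)
    span = hi - lo
    return lo - margin * span, hi + margin * span

def is_front_view(frame_params, ranges):
    """True only if every parameter of the frame falls inside its range."""
    lo, hi = ranges
    p = np.asarray(frame_params, dtype=float)
    return bool(np.all((p >= lo) & (p <= hi)))

# Example with made-up parameter values
training = [[0.48, 1.20, 0.33, 0.90], [0.50, 1.18, 0.35, 0.88], [0.49, 1.22, 0.34, 0.91]]
ranges = learn_ranges(training)
print(is_front_view([0.49, 1.21, 0.34, 0.89], ranges))  # True: accepted as front-view
print(is_front_view([0.30, 1.21, 0.34, 0.89], ranges))  # False: face turned away
```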
Archive | 2011
Yasunari Yoshitomi; Taro Asada; Masayoshi Tabuse
To better integrate robots into society, a robot should be able to interact in a friendly manner with humans. The aim of our research is to contribute to the development of a robot that can perceive human feelings and mental states. A robot that could do so could, for example, better take care of an elderly person, support a handicapped person in his or her life, encourage a person who looks sad, or advise an individual to stop working and take a rest when he or she looks tired. Our study concerns the first stage of the development of a robot that has the ability to visually detect human feelings or mental states. Although mechanisms for recognizing facial expressions have received considerable attention in the field of computer vision research (Harashima et al., 1989; Kobayashi & Hara, 1994; Mase, 1990, 1991; Matsuno et al., 1994; Yuille et al., 1989), they currently still fall far short of human capability, especially from the viewpoint of robustness under widely varying lighting conditions. One of the reasons for this is that the nuances of shade, reflection, and localized darkness, which result from inevitable changes in gray levels, influence the accuracy of the discernment of facial expressions. To develop a robust method of facial expression recognition applicable under widely varied lighting conditions, we do not use a visible-ray (VR) image; instead, we use an image produced by infrared rays (IR), which shows the temperature distribution of the face (Fujimura et al., 2011; Ikezoe et al., 2004; Koda et al., 2009; Nakano et al., 2009; Sugimoto et al., 2000; Yoshitomi et al., 1996, 1997a, 1997b, 2000, 2011a, 2011b; Yoshitomi, 2010). Although a human cannot detect IR, a robot can process the information contained in the thermal images created by IR. Therefore, as a new mode of robot vision, thermal image processing is a practical method that is viable under natural conditions. The timing for recognizing facial expressions is also important for a robot, because processing can be time consuming. We adopted an utterance as the key to expressing human feelings or mental states, because humans tend to say something to express their feelings (Fujimura et al., 2011; Ikezoe et al., 2004; Koda et al., 2009; Nakano et al., 2009; Yoshitomi et al., 2000; Yoshitomi, 2010). In conversation, we utter many phonemes. We have selected vowel utterances as the timings for recognizing facial expressions because the number of vowels is very limited, and the waveforms of vowels tend to have a larger amplitude and a longer utterance period than those of consonants. Accordingly, the timing range of each vowel can be decided relatively easily by a speech recognition system.
International Journal of Advanced Robotic Systems | 2013
Shota Yamamoto; Yasunari Yoshitomi; Masayoshi Tabuse; Kou Kushida; Taro Asada
We developed a method for pattern recognition of a baby's emotions (uncomfortable, hungry, or sleepy) expressed in the baby's cries. A 32-dimensional fast Fourier transform is performed on sound form clips, detected by our previously reported method and used as training data. The power of the sound form judged to be a silent region is subtracted from the power of each frequency element, and the power of each frequency element after the subtraction is treated as one element of the feature vector. We perform principal component analysis (PCA) on the feature vectors of the training data. The emotion of the baby is recognized by the nearest neighbor criterion applied to the feature vector obtained from the test-data sound form clips after projecting the feature vector onto the PCA space obtained from the training data. Then, the emotion with the highest frequency among the recognition results for a sound form clip is judged to be the emotion expressed by the baby's cry. We successfully applied the proposed method to pattern recognition of a baby's emotions. The present investigation concerns the first stage of the development of a robotic baby caregiver that has the ability to detect a baby's emotions; in this first stage, we have developed a method for detecting them. We expect that the proposed method could be used in robots that help take care of babies.
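The pipeline lends itself to a compact sketch: per-clip power spectra with the silent-region power subtracted, PCA fitted on the training vectors, nearest-neighbor matching in the PCA space, and a majority vote over clips. The frame length, PCA dimensionality, and synthetic data below are assumptions, not values from the paper.

```python
# A minimal sketch of the cry-classification pipeline under assumed parameters.
import numpy as np
from collections import Counter

N_FFT = 64  # 64 samples -> 32 one-sided frequency bins, matching the 32-dim feature

def clip_features(clip, silence_power):
    """Per-bin spectral power minus the silent-region power (floored at zero)."""
    spectrum = np.abs(np.fft.rfft(clip, n=N_FFT))[:32] ** 2
    return np.maximum(spectrum - silence_power[:32], 0.0)

def fit_pca(X, n_components=8):
    """Mean and principal axes of the training feature vectors."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(x, mean, components):
    return (x - mean) @ components.T

def classify_clips(clips, silence_power, train_feats, train_labels, mean, comps):
    """Nearest-neighbor label per clip, then the most frequent label wins."""
    votes = []
    for clip in clips:
        z = project(clip_features(clip, silence_power), mean, comps)
        dists = np.linalg.norm(train_feats - z, axis=1)
        votes.append(train_labels[int(np.argmin(dists))])
    return Counter(votes).most_common(1)[0][0]

# Demo with synthetic data (demo only: reuses training clips as test input)
rng = np.random.default_rng(1)
silence = 0.1 * np.ones(32)
train_clips = rng.standard_normal((30, N_FFT))
train_labels = ["uncomfortable", "hungry", "sleepy"] * 10
X = np.array([clip_features(c, silence) for c in train_clips])
mean, comps = fit_pca(X)
train_feats = np.array([project(x, mean, comps) for x in X])
print(classify_clips(train_clips[:5], silence, train_feats, train_labels, mean, comps))
```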
Journal of Robotics, Networking and Artificial Life | 2015
Taro Asada; Yasunari Yoshitomi; Ryota Kato; Masayoshi Tabuse; Jin Narumoto
A video is analyzed by image processing, and the feature parameters of facial expression and movement are extracted in the mouth area. The feature parameter expressing facial expression is defined as the average of the facial expression intensity. The parameter expressing the movement of a person is defined as the average of the absolute value of the vertical coordinate of the center of gravity of the mouth area in the relative coordinate system. The experimental results show the usefulness of the proposed method.
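The movement parameter can be sketched as follows, assuming a binary mouth-area mask is available per frame; the mask extraction and the facial-expression-intensity term are not reproduced here, and using the first frame as the reference is an assumption.

```python
# A minimal sketch of the movement parameter: the centre of gravity of a
# binary mouth-area mask is tracked per frame, and the average absolute
# vertical displacement relative to the first frame is reported.
import numpy as np

def mouth_centroid_y(mask):
    """Vertical coordinate of the centre of gravity of a binary mouth mask."""
    ys, _ = np.nonzero(mask)
    return ys.mean()

def movement_parameter(masks):
    """Average |y - y_ref| of the mouth centroid over a video (y_ref = first frame)."""
    ys = np.array([mouth_centroid_y(m) for m in masks])
    return float(np.mean(np.abs(ys - ys[0])))

# Example: the mouth region shifts down by two pixels in the second frame
frame1 = np.zeros((10, 10), dtype=bool); frame1[4:6, 3:7] = True
frame2 = np.zeros((10, 10), dtype=bool); frame2[6:8, 3:7] = True
print(movement_parameter([frame1, frame2]))  # 1.0 (averaged over both frames)
```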
Artificial Life and Robotics | 2008
Taro Asada; Yasunari Yoshitomi; Risa Hayashi
We propose a new approach to sign language animation based on skin region detection in an infrared image. To generate various animations expressing personality and/or emotion appropriately, conventional systems require many manual operations. A promising way to reduce this workload is to manually refine an animation that is first made automatically from a dynamic image of real motion. In the proposed method, a 3D CG model corresponding to a characteristic posture in sign language is made automatically by pattern recognition on a thermal image, and then a hand part, made manually beforehand, is set in the CG model. If necessary, the model can be replaced manually with a more appropriate model corresponding to training key frames, and/or the model can be refined manually. In our experiments, a person experienced in sign language recognized Japanese sign language animations of 71 words with 88.3% accuracy, and three persons experienced in sign language recognized sign language animations representing three emotions (neutral, happy, and angry) with 88.9% accuracy.
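The skin-region detection step might look roughly like the sketch below, which keeps the largest warm connected component of a thermal image; the temperature threshold is an assumption, and the posture recognition and CG model generation are not reproduced.

```python
# A minimal sketch of skin-region detection on a thermal image: threshold on
# temperature, then keep the largest connected warm region.
import numpy as np
from scipy import ndimage

def detect_skin_region(thermal, temp_threshold=30.0):
    """Largest connected warm region of a thermal image given in degrees Celsius."""
    warm = thermal >= temp_threshold
    labels, n = ndimage.label(warm)
    if n == 0:
        return warm
    sizes = ndimage.sum(warm, labels, index=range(1, n + 1))
    return labels == (int(np.argmax(sizes)) + 1)

# Example: a synthetic 8x8 "thermal image" with a warm 3x3 patch
thermal = np.full((8, 8), 22.0)
thermal[2:5, 3:6] = 34.0          # warm patch standing in for a hand
print(int(detect_skin_region(thermal).sum()))  # 9 pixels detected as skin
```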
Journal of Robotics, Networking and Artificial Life | 2016
Taro Asada; Yasunari Yoshitomi; Airi Tsuji; Ryota Kato; Masayoshi Tabuse; Noriaki Kuwahara; Jin Narumoto
We have been developing a method for analyzing the facial expressions of a person using a video phone system (Skype) to talk to another person. The video is recorded and analyzed using image processing software (OpenCV) and a newly proposed feature vector of facial expression. The newly proposed facial expression intensity can be used to analyze changes of facial expression. In our study, we also judge whether the person is speaking. The experimental results show the usefulness of the proposed method.
Journal of Robotics, Networking and Artificial Life | 2014
Yasunari Yoshitomi; Taro Asada; Ryota Kato; Masayoshi Tabuse
To improve the quality of life of elderly people living at home or in a healthcare facility, especially in rural areas, we have been developing a method for analyzing the facial expressions of a person using a video phone system (Skype) to speak with another person. In the present study, we propose a method for analyzing the facial expressions of a person using the video phone system to talk to another person. The recorded video is analyzed by thermal image processing and a newly proposed feature vector of facial expression, which is extracted in the mouth area by applying a 2D-DCT. The facial expression intensity, defined as the norm of the difference vector between the feature vector of the neutral facial expression and that of the observed expression, can be used to analyze changes of facial expression. The judgment of utterance is performed using the intensity of the sound wave. The experimental results show the usefulness of the proposed method. We intend to use the proposed method of facial expression analysis to develop a method for estimating the emotions or mental states of people, especially elderly patients.
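A minimal sketch of the intensity measure is given below, assuming the mouth area has already been cropped to a grayscale image: low-frequency 2D-DCT coefficients form the feature vector, and the intensity is the norm of the difference from the neutral feature vector. The coefficient selection and image sizes are assumptions.

```python
# A minimal sketch of the facial-expression-intensity measure on an assumed
# mouth-area crop; the coefficient selection (8x8 block) is illustrative.
import numpy as np
from scipy.fft import dctn

def mouth_feature(mouth_gray, n_coeffs=8):
    """Low-frequency 2D-DCT coefficients of the mouth-area grayscale image."""
    coeffs = dctn(mouth_gray.astype(float), norm="ortho")
    return coeffs[:n_coeffs, :n_coeffs].ravel()

def expression_intensity(mouth_gray, neutral_feature):
    """Norm of the difference between the current and neutral feature vectors."""
    return float(np.linalg.norm(mouth_feature(mouth_gray) - neutral_feature))

# Example with synthetic mouth-area images
rng = np.random.default_rng(2)
neutral = rng.random((32, 48))
changed = neutral + 0.2 * rng.random((32, 48))   # stand-in for a changed expression
neutral_feature = mouth_feature(neutral)
print(expression_intensity(neutral, neutral_feature))  # 0.0
print(expression_intensity(changed, neutral_feature))  # > 0, larger change -> larger intensity
```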
Robot and Human Interactive Communication | 2009
Ryoto Kato; Yasunari Yoshitomi; Taro Asada; Yoko Fujita; Masayoshi Tabuse
We have proposed a method for communication between a human and a CG character by exploiting thermal image processing. The CG character can synchronize its nod with a human's nod by predicting his or her nod. As a feature parameter, we use the ratio of the horizontal length to the vertical length of the face skin region on a thermal image. The measured parameter is input to a fuzzy algorithm system to obtain the nod angle of the person in front of the infrared camera, and then a moving average model is used to predict the person's nod. The average error of the nod angle obtained by the present method was estimated to be about 6°.
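Only the prediction step is sketched below, with a simple moving average standing in for the paper's moving-average model; the thermal-image feature and the fuzzy system that produce the nod-angle series are not reproduced, and the window size is an assumption.

```python
# A minimal sketch of nod-angle prediction from an already-measured angle
# series, so a CG character could time its nod to the user's.
import numpy as np

def predict_next_angle(angle_history, window=5):
    """Predict the next nod angle as the mean of the most recent angles."""
    recent = np.asarray(angle_history[-window:], dtype=float)
    return float(recent.mean())

# Example: nod angles (degrees) measured over the last few frames
angles = [0.0, 2.5, 6.0, 9.5, 12.0, 13.5]
print(predict_next_angle(angles))  # predicted angle for the next frame
```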