Yoshinori Kitahara
Hitachi
Publication
Featured research published by Yoshinori Kitahara.
international conference on acoustics, speech, and signal processing | 2006
Nobuo Nukaga; Ryota Kamoshida; Kenji Nagamatsu; Yoshinori Kitahara
In this paper we propose two methods for implementing a unit selection-based text-to-speech engine on resource-limited embedded systems. Although unit selection-based text-to-speech technology has improved the quality of synthesized speech, there is a practical trade-off between the size of the database and the quality of the synthesized speech: generating highly natural-sounding voices requires a large database and expensive computation, while the text-to-speech system must still meet the specifications of the target platform. To address this problem, we introduced frequency-based approaches to reduce the size of the speech database. The experimental results showed that the step-by-step downsizing method was better than the direct one in terms of the cumulative join cost and the target cost. Furthermore, several techniques were introduced and evaluated for implementing our text-to-speech engine on an embedded system. The experiments showed that the run-time workload for the test sentences was approximately 80 MIPS and that the implemented engine was useful and scalable for mid-class embedded systems.
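The search the abstract refers to, minimizing a cumulative join cost plus a target cost over candidate units, is the standard dynamic-programming formulation of unit selection. The following is a minimal sketch of that formulation, not the paper's actual engine; the cost functions are placeholders supplied by the caller:

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one unit per position minimizing total target + join cost
    via dynamic programming (Viterbi search).

    candidates: list of lists; candidates[i] is the set of database
    units that could realize target position i.
    target_cost(i, u): mismatch between unit u and the target at i.
    join_cost(u, v): cost of concatenating unit u before unit v.
    """
    n = len(candidates)
    # best[i][j] = (cumulative cost, backpointer) for unit j at position i
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, n):
        row = []
        for u in candidates[i]:
            tc = target_cost(i, u)
            cost, back = min(
                (best[i - 1][k][0] + join_cost(v, u) + tc, k)
                for k, v in enumerate(candidates[i - 1])
            )
            row.append((cost, back))
        best.append(row)
    # Trace back the cheapest path from the final position
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))
```

Downsizing the database shrinks each `candidates[i]`, which is exactly why it trades synthesis quality (higher cumulative cost) for memory and computation.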
Journal of the Acoustical Society of America | 1988
Yoshinori Kitahara; Yoh'ichi Tohkura
For the purpose of natural and high‐quality speech synthesis, the role of prosody in speech perception has been studied. Prosodic components, which contribute to the expression of emotions and their intensity, were clarified by analyzing emotional speech and by performing listening tests on synthetic speech. It has been confirmed that prosodic components, which are composed of pitch structure, temporal structure, and amplitude structure, contribute more to the expression of emotions than the spectral structure of speech does. Listening test results also showed that the temporal structure was the most important for the expression of anger, while both amplitude structure and pitch structure were much more important for the intensity of anger. Pitch structure also played a significant role in the expression of joy and its intensity. These results suggest the possibility of converting a neutral utterance (i.e., one with no particular emotion) into utterances expressing various kinds of emotions. These results ca...
multimedia signal processing | 1999
Nobuo Hataoka; Hiroaki Kokubo; Nobuo Nukaga; Yasunari Obuchi; Akio Amano; Yoshinori Kitahara
This paper describes speech processing middleware developed on RISC microprocessors for embedded speech applications. The middleware consists of a speech recognition module and a speech synthesis module; in particular, the speech recognition module is robust to environmental noise and speaker differences. The speech middleware provides sophisticated user interfaces for microprocessor-based multimedia systems such as car navigation systems, mobile information equipment, and game machines.
Journal of the Acoustical Society of America | 1988
Yoh'ichi Tohkura; Yoshinori Kitahara
The segmental duration of each phoneme changes depending upon the speaking rate. Generally, vowel segments are more readily compressed in fast speech, or expanded in slow speech, than consonant segments are. Questions raised in this paper include how the speaking rate can be extracted from the speech signal without knowing its content (i.e., phonetic information) and what kind of time‐scale modification can be chosen in order to control speaking rate. First, the segmental duration compressibility of the speech signal was defined by path slopes in DTW spectral matching when utterances at various speaking rates were matched to a reference utterance at a normal speaking rate. On the assumption that the compressibility is inversely proportional to segmental spectrum changes, the relationship between the compressibility and the average cepstral time difference Δcep [S. Furui, IEEE Trans. Acoust. Speech Signal Process. ASSP‐34, 52–59 (1986)] was studied. The results showed that the Δcep is an efficient pa...
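The Δcep measure cited above (Furui's delta cepstrum) is, in its common form, the least-squares regression slope of each cepstral coefficient's trajectory over a short frame window. The following is an illustrative sketch under that assumption, not the paper's exact computation; the window half-width K and the edge padding are choices the abstract does not specify:

```python
import numpy as np

def delta_cepstrum(cep, K=2):
    """Per-frame cepstral time derivative, estimated as a least-squares
    regression slope over a +/-K frame window.
    cep: array of shape (frames, coefficients)."""
    ks = np.arange(-K, K + 1)
    denom = np.sum(ks ** 2)
    # Repeat edge frames so every frame has a full window
    padded = np.pad(cep, ((K, K), (0, 0)), mode="edge")
    delta = np.zeros_like(cep, dtype=float)
    for t in range(cep.shape[0]):
        window = padded[t:t + 2 * K + 1]   # frames t-K .. t+K
        delta[t] = ks @ window / denom     # regression slope per coefficient
    return delta

def mean_spectral_change(cep, K=2):
    """Scalar rate of spectral change for an utterance: mean Euclidean
    norm of the delta-cepstrum vectors. A slowly changing (highly
    compressible) segment yields a small value, in line with the
    inverse relationship assumed in the abstract."""
    return float(np.linalg.norm(delta_cepstrum(cep, K), axis=1).mean())
```

A stationary cepstral trajectory gives a value of zero, while rapid spectral movement (as in consonant transitions) drives the measure up, which is the behavior that makes it a candidate predictor of duration compressibility.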
Archive | 1993
Haru Ando; Yoshinori Kitahara
Archive | 1993
Yoshinori Kitahara; Takehiro Fujita; Shigeru Yabuuchi; Keiichi Yoshioka
Journal of the Acoustical Society of America | 2005
Atsuko Koizumi; Hiroyuki Kaji; Yasunari Obuchi; Yoshinori Kitahara
Archive | 1992
Seiji Futatsugi; Keiji Kojima; Yoshiki Matsuda; Yoshinori Kitahara; Masato Mogaki
Archive | 2001
Yoshinori Kitahara; Yasunari Obuchi; Atsuko Koizumi; Seiki Mizutani
Archive | 1996
Takashi Hasegawa; Yoshinori Kitahara