Tomio Takara
University of the Ryukyus
Publication
Featured research published by Tomio Takara.
International Conference on Acoustics, Speech, and Signal Processing | 2002
Pusadee Seresangtakul; Tomio Takara
Thai speech synthesis by rule is developed using cepstral parameters. To synthesize the pitch contours of Thai tones, we applied an extension of Fujisaki's model for tonal languages. Our study of the pitch contours of the five Thai tones using this model shows that synthesizing them requires both positive and negative commands in the command pattern for the local F0 components.
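As a rough illustration of the idea in this abstract, the sketch below generates an F0 contour from the commonly cited form of Fujisaki's model, with local (tone) commands that may be positive or negative. The function names, the constants `alpha`, `beta`, and `gamma`, and all command values are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def tone_response(t, beta=20.0, gamma=0.9):
    """Step response of the local (tone) control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def f0_contour(times, base_f0, phrase_cmds, tone_cmds):
    """Sum the command responses in the log-F0 domain.

    phrase_cmds: list of (amplitude, onset_time)
    tone_cmds:   list of (amplitude, onset_time, offset_time);
                 the amplitude may be negative to pull F0 below the baseline.
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        for amp, t0 in phrase_cmds:
            log_f0 += amp * phrase_response(t - t0)
        for amp, t1, t2 in tone_cmds:
            log_f0 += amp * (tone_response(t - t1) - tone_response(t - t2))
        contour.append(math.exp(log_f0))
    return contour

# Hypothetical syllable pair: a positive tone command followed by a negative one.
times = [i * 0.01 for i in range(60)]
contour = f0_contour(
    times,
    base_f0=120.0,
    phrase_cmds=[(0.4, 0.0)],
    tone_cmds=[(0.3, 0.10, 0.25), (-0.3, 0.30, 0.45)],
)
```

With no commands the contour stays at the baseline frequency; a positive tone command raises F0 over its interval and a negative one lowers it, which is the behavior the abstract reports as necessary for the five Thai tones.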
International Conference on Acoustics, Speech, and Signal Processing | 2003
Pusadee Seresangtakul; Tomio Takara
Thai speech synthesis by rule has been developed using cepstral parameters. To synthesize F0 contours of Thai tones, the generative model of F0 contours (Fujisaki's model) for tonal languages is applied. Using our method, the pitch contours of Thai disyllabic words were analyzed. Based on the analysis of Thai polysyllabic words using this model, rules are derived to synthesize Thai disyllabic words, which we then applied. We performed listening tests to evaluate the intelligibility of the model for Thai tone generation. The correct rates were 95% for meaningless words and 99% for meaningful words. The generative model of F0 contours for Thai words was shown to be effective.
International Conference on Acoustics, Speech, and Signal Processing | 1997
Tomio Takara; Kazuya Higa; Itaru Nagayama
Hidden Markov models (HMMs) are widely used for automatic speech recognition because a powerful algorithm exists for estimating the model's parameters and because they achieve high performance. Once the structure of the model is given, the model's parameters are obtained automatically from training data. There is, however, no effective design method leading to an optimal structure of the HMMs. We propose a new application of a genetic algorithm to search for such an optimal structure. In this method, left-right structures are adopted for the HMMs and the likelihood is used as the fitness of the genetic algorithm. We report the results of our experiment showing the effectiveness of the genetic algorithm in automatic speech recognition.
Proceedings of the First Workshop on Language Technologies for African Languages | 2009
Tadesse Anberbir; Tomio Takara
This paper presents a speech synthesis system for the Amharic language and describes how the important prosodic features of the language were modeled in the system. The developed Amharic Text-to-Speech system (AmhTTS) is a parametric, rule-based system that employs a cepstral method. The system uses a source-filter model for speech production and a Log Magnitude Approximation (LMA) filter as the vocal tract filter. The intelligibility and naturalness of the system were evaluated by word and sentence listening tests, respectively. We achieved a 98% correct rate for words and an average Mean Opinion Score (MOS) of 3.2 (which is categorized as good) for sentences. The synthesized speech has high intelligibility and moderate naturalness. Compared with a previous similar study, our system produced speech of similar quality with fairly good prosody. In particular, our system is suitable for building systems for new languages with little modification.
International Conference on Acoustics, Speech, and Signal Processing | 2003
Tu Trong Do; Tomio Takara
We propose a Vietnamese text-to-speech (VieTTS) system, which is a parametric, rule-based speech synthesis system. The fundamental speech units of this system are demisyllables with level tone. VieTTS uses a source-filter model for speech production and a log magnitude approximation (LMA) filter as the vocal tract filter. We chose the Hanoi dialect for VieTTS. Tone synthesis of Vietnamese is implemented using fundamental frequency (F0) patterns and power pattern control. F0 is the most important factor in Vietnamese tone synthesis, and the power control strongly affects broken and drop tones. Applying power control for tone synthesis is effective and is unique to Vietnamese compared with other tonal languages such as Chinese and Thai.
Journal of the Acoustical Society of America | 2016
Tomio Takara; Akichika Higa; Syouki Kaneshiro; Yuuya Oshiro
We have studied an effective method that uses principal components spanning a feature space of isolated vowels. A covariance matrix is calculated from many log-amplitude spectra of isolated vowels uttered by a speaker, and the eigenvalue equation of this covariance matrix is solved. The resulting eigenvectors are called principal vectors. In the analysis system, the log-amplitude spectrum of each frame of a word uttered by the same speaker is transformed into components along the principal vectors. In the synthesis system, a log-amplitude spectrum is reconstructed from the components along the principal vectors with the largest eigenvalues, and the spoken word is synthesized using the LMA filter. We plotted the distribution of the first and second principal components extracted from Japanese vowel data. The plot was very similar to the F1-F2 distribution of vowels, and therefore to the vowel classification map whose axes are the degree of constriction and the tongue hump position. The listening tests showed that the q...
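The analysis and synthesis steps described in this abstract can be sketched in a few lines. The code below is a minimal illustration only: it assumes each spectrum is a short list of log-amplitude values, uses power iteration with deflation in place of a full eigen solver, and omits the LMA filter stage entirely. All function names and the toy vowel spectra are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def covariance(rows, mean):
    """Sample covariance matrix of the log-amplitude spectra."""
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    dim = len(mean)
    return [[sum(c[i] * c[j] for c in centered) / (len(rows) - 1)
             for j in range(dim)] for i in range(dim)]

def principal_vectors(cov, k, iters=500):
    """Top-k eigenvectors by power iteration with deflation.

    Assumes k does not exceed the rank of cov.
    """
    dim = len(cov)
    mat = [row[:] for row in cov]
    vectors = []
    for n in range(k):
        v = [1.0 + 0.1 * (i + n) for i in range(dim)]  # deterministic start
        for _ in range(iters):
            w = [dot(row, v) for row in mat]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = dot(v, [dot(row, v) for row in mat])
        vectors.append(v)
        # Remove the component just found so the next pass finds the next one.
        mat = [[mat[i][j] - eigval * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    return vectors

def analyze(spectrum, mean, vectors):
    """Project one frame's log-amplitude spectrum onto the principal vectors."""
    centered = [x - m for x, m in zip(spectrum, mean)]
    return [dot(centered, v) for v in vectors]

def synthesize(components, mean, vectors):
    """Reconstruct a log-amplitude spectrum from its principal components."""
    out = mean[:]
    for c, v in zip(components, vectors):
        out = [o + c * x for o, x in zip(out, v)]
    return out

# Hypothetical 4-bin log-amplitude spectra standing in for isolated vowels.
vowels = [
    [3.0, 2.0, 0.0, -1.0], [1.0, 0.0, 0.0, -1.0],
    [2.0, 1.0, 0.5, -1.5], [2.0, 1.0, -0.5, -0.5],
    [4.0, 3.0, 0.25, -1.25], [0.0, -1.0, -0.25, -0.75],
]
mean = mean_vector(vowels)
vectors = principal_vectors(covariance(vowels, mean), k=2)
frame_components = analyze(vowels[0], mean, vectors)
rebuilt = synthesize(frame_components, mean, vectors)
```

Keeping only the components with the largest eigenvalues is what makes the representation compact: each frame is stored as a handful of coefficients rather than a full spectrum.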
Journal of the Acoustical Society of America | 2016
Tomio Takara; Akichika Higa
Fujisaki’s model is a generative model of fundamental frequency that approximates the original time pattern effectively. In this study, we propose a new generative model of spectral sequences using Fujisaki’s model. We have already proposed a speech synthesis method using vowel space parameters. A vowel space parameter is defined as the component on a principal axis obtained by PCA of the log-amplitude spectra of isolated vowels. A log-amplitude spectrum can be reconstructed from a linear combination of a few principal vectors, with the vowel space parameters as coefficients. Using the time sequence of the vowel space parameters, we can synthesize a spoken word. In this study, we apply Fujisaki’s model to the time pattern of each vowel space parameter. We adopt the parameters of Fujisaki’s model as the genes of a genetic algorithm. We then run the genetic algorithm, using as fitness the similarity between the time pattern produced by Fujisaki’s model and the pattern generated from the vowel space parameter. We obtained synthesized speech wh...
International Conference on Future Generation Information Technology | 2010
Kyawt Yin Win; Tomio Takara
In this paper, an optimization method is proposed to minimize the differences in fundamental frequency (F0) and in length among speakers and phonemes. Because tone languages use pitch variation to distinguish the meanings of words, we need to define an optimized F0 and length to obtain natural-sounding synthetic speech. Large variability exists in the F0 and the length of utterances across different speakers and different syllables. Hence, for speech synthesis, normalization of F0 and length is important for discriminating tones. Here, we implement a tone rule using two parameters: optimized F0 and length. An advantage of the proposed method is that the optimized parameters can be separated into male and female groups. The effectiveness of the proposed method is confirmed by the distribution of F0 and length. Listening tests with high correct rates confirm the intelligibility of the synthetic sound.
Acoustical Science and Technology | 2004
Tu Trong Do; Tomio Takara
Conference of the International Speech Communication Association | 1998
Tomio Takara; Yasushi Iha; Itaru Nagayama