Publication


Featured research published by Kazue Hata.


International Journal of Speech Technology | 1997

Combining Concatenation and Formant Synthesis for Improved Intelligibility and Naturalness in Text-to-Speech Systems

Steve Pearson; Frode Holm; Kazue Hata

A general method which combines formant synthesis by rule and time-domain concatenation is proposed. This method utilizes the advantages of both techniques by maintaining naturalness while minimizing difficulties such as prosodic modification and spectral discontinuities at the point of concatenation. An integrated sampled natural glottal source (Matsui et al., 1991) and sampled voiceless consonants were incorporated into a real-time text-to-speech formant synthesizer. In special cases, voicing amplitude envelopes and formant transitions derived from natural speech were also utilized. Several listening tests were performed to evaluate these methods. We obtained a significant overall improvement in intelligibility over our previous formant synthesizer. Such improvements in intelligibility were previously obtained with a Japanese text-to-speech system using a related hybrid system (Kamai and Matsui, 1993), indicating the applicability of this method for multi-lingual synthesis. The results of subjective analyses showed that these methods can also improve naturalness and listenability factors.


Journal of the Acoustical Society of America | 1988

Delayed pitch fall in Japanese

Yoko Hasegawa; Kazue Hata

The Tokyo dialect of Japanese is regarded as a prototypical pitch accent language where the location of F0 fall is the only acoustic correlate of the accent. Neustupný [Onsei‐gakkai Kaihoo 121 (1966)], however, reported that F0 fall does not always synchronize with an accented mora but may be carried over to the following mora. Sugito [Shooin Joshi Daigaku Ronshuu 10 (1972)] found that native speakers perceived a pitch accent on a mora when it was followed by a falling F0 contour and also that the F0 peak value in the accented mora need not be higher than that in the following mora. Not well investigated though is the correlation between the peak location and the steepness of falling contour. The working hypothesis is that the later the F0 fall occurs, the faster F0 drops. This delayed pitch fall is characterized in terms of (1) the F0 peak location and (2) the steepness of F0 contour computed in Hz/cs. The results from 560 tokens uttered by seven speakers of the Tokyo dialect confirmed our hypothesis. [W...


Journal of the Acoustical Society of America | 1988

Delayed pitch fall in Japanese: Perceptual experiment

Kazue Hata; Yoko Hasegawa

Hasegawa and Hata [J. Acoust. Soc. Am. Suppl. 1 83, S29 (1988)] investigated the delayed pitch fall phenomenon in Tokyo dialect Japanese in production data and examined the relationships of perceived accent to the peak location and the steepness of the falling contour. Pitch fall, the only acoustic correlate of the accent in Japanese, sometimes occurred on the syllable following the accented syllable. A delayed pitch fall tends to be steeper the later it occurs. The present paper examines delayed pitch fall from a perceptual point of view. Three‐syllable synthetic stimuli /ma ma ma/ were prepared with the F0 peak in different locations and different falling slopes. These stimuli were presented to native Japanese subjects in order to determine whether perception and production are correlated in this phenomenon, i.e., whether a change in the location of the peak and the degree of F0 fall in the second /ma/ causes the accent to be perceived on the first syllable. Implications for speech recognition will be d...


Journal of the Acoustical Society of America | 2002

Cues for question intonation in Arabic: Disambiguation techniques for use in automatic speech recognizer systems

Leslie Melbourne Barrett; Kazue Hata

The focus of this study is to determine the extent to which prosodic characteristics can contribute to the improvement of speech recognition in Arabic. F0 rising rate was chosen to disambiguate yes–no question from declarative sentences. In Arabic, as in English, a rising intonation indicates a yes–no question whether the question takes lexical question markers or uses an inverted word order or whether the sentence takes just a declarative form. We conducted a production study with 55 yes–no question sentences uttered by a female native speaker of Arabic. Two types of measurements were taken for F0 rise rate. First, we visually obtained the best‐fit rise in sentence‐final position. Second, we computed the rate based on minimum and maximum F0 values within the sentence‐final 500 ms. The results show that although the rise obtained from the final 500 ms (0.41 Hz/ms) is different from the best‐fit rise rate (0.49 Hz/ms) (p<0.05), when examining two different F0 rising shapes and considering JND for the risin...
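The second measurement described above (a rise rate taken from the minimum and maximum F0 values in the sentence-final 500 ms) might be sketched as follows. This is only an illustrative reading of the abstract: the function name `final_rise_rate`, the input format (parallel lists of times in ms and F0 in Hz for voiced frames), and the rule that a peak preceding the minimum counts as no rise are all assumptions, not the authors' procedure.

```python
def final_rise_rate(times_ms, f0_hz, window_ms=500.0):
    """Return an F0 rise rate in Hz/ms over the final window of an F0 track.

    Hypothetical helper: times_ms and f0_hz are parallel lists holding
    voiced-frame timestamps (ms) and F0 values (Hz).
    """
    if not times_ms or len(times_ms) != len(f0_hz):
        raise ValueError("times_ms and f0_hz must be non-empty and equal in length")

    # Keep only the frames that fall inside the sentence-final window.
    start = times_ms[-1] - window_ms
    window = [(t, f) for t, f in zip(times_ms, f0_hz) if t >= start]

    # Locate the minimum and maximum F0 values within that window.
    t_min, f_min = min(window, key=lambda p: p[1])
    t_max, f_max = max(window, key=lambda p: p[1])

    # Assumption: if the peak does not follow the minimum, report no rise.
    if t_max <= t_min:
        return 0.0
    return (f_max - f_min) / (t_max - t_min)
```

For a track that climbs from 200 Hz to 400 Hz across the last 500 ms, this sketch gives 0.4 Hz/ms, which is on the same scale as the rates quoted in the abstract.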


Journal of the Acoustical Society of America | 1994

The use of sampled consonants for improved intelligibility in formant synthesizers

Heather Moran; Kazue Hata; Steve Pearson

This paper investigates the improvement of intelligibility in a formant synthesizer by using a library of sampled consonants extracted from natural waveforms. Four hundred and seventy‐one monosyllable words containing consonants in all possible vowel environments from a male American English speaker were recorded. The current focus is on the voiceless consonants. A comparison test was conducted between our current synthesizer [K. Matsui et al., Proc. ICASSP 2, 769–772 (1991)] and the sampled consonant system. Eight naive listeners participated in a simple intelligibility test. The stimuli list consisted of 110 tokens, 60% of which were nonsense words. The results showed that the sampled consonant system was significantly higher (by 20%) in overall intelligibility score. In terms of consonant classes, initial stops showed the most improvement with a 26% increase. Weak fricatives did not show a dramatic difference between the two systems. In further experiments these were improved utilizing additional facto...


Journal of the Acoustical Society of America | 1990

Perceptual shift of accent in English

Yoko Hasegawa; Kazue Hata

It has been found that the perception of an accentual high tone (H) in Japanese is determined by both the F0 peak location and the F0 drop rate thereafter. The actual F0 peak need not occur on the accented syllable: If the peak is approximately within the first half of the postaccent syllable, and the drop rate is greater than a certain value, the H will be associated with the accented syllable [Y. Hasegawa and K. Hata, J. Acoust. Soc. Am. Suppl. 1 83, S29 (1988); K. Hata and Y. Hasegawa, J. Acoust. Soc. Am. Suppl. 1 84, S156 (1988)]. The preliminary experiment indicates that this perceptual accent shift is also observable in English; therefore, the effect must be taken into consideration in synthesizing English utterances. The present study investigates the case where the intonational H occurs on the utterance‐final syllable, e.g., “This is my net.” In such a case, due to the exaggerated F0 lowering at utterance end, the drop rate after the peak may be so great that the perceptual shift can occur, i.e., ...


Journal of the Acoustical Society of America | 1986

Intrinsic pitch of vowels in read speech at different speech rates

Kazue Hata

Although there is abundant evidence for systematic differences in the average F0 of individual vowels (so‐called “intrinsic pitch”), other things being equal, it has been claimed that these differences disappear in connected speech [N. Umeda, J. Acoust. Soc. Am. 70, 350–355 (1981)]. In the hope that it could be possible to identify the other things which, when not equal, influence intrinsic vowel pitch, how it is affected by speaking rate and emphasis (contrastive stress) was examined. Ten speakers of American English (five male and five female) read sentences containing “Leo” and “Lolly” in the target sentence‐medial position with and without emphasis at three different rates. A preliminary analysis of the data reveals that, although these factors influence the intrinsic pitch differences, namely, emphasis amplified it and fast rate attenuated it, the effect was preserved on the average under all conditions.


Journal of the Acoustical Society of America | 1999

Prosody templates in word‐level synthesis

Frode Holm; Kazue Hata

This paper describes an approach to word‐level prosody with the goal of achieving natural‐sounding human intonation. This study comprised about 3000 words in sentence initial position, uttered by a female native speaker of American English. From these recordings general F0 and duration templates were extracted, initially based on stress‐pattern alone. Results with regards to F0 templates [F. Holm and K. Hata, ICSLP (1998)] had previously been reported, which showed great promise for a practical synthesis implementation. In this paper duration templates will be focused on. Obtaining temporal prosody patterns is a much harder problem than it is for F0 contours. This is largely due to the fact that one cannot separate a high‐level prosodic intent from purely articulatory constraints merely by examining individual segmental data. This methodology relies on a concept of stretchability, which gives a clear measure of how much a given cluster of segments can change its duration in natural speech. The higher this...


Journal of the Acoustical Society of America | 1995

Acoustic cues for /θ/ in American English

Nicholas Kibre; Kazue Hata

Distinguishing between the voiceless fricatives /f/ and /θ/ is a difficult problem in natural and synthetic speech. In a previous experiment using natural stimuli [K. Hata et al., Proc. ICSLP 327–330 (1994)], it was found that adding vowel transitions increased identification for /f/ at least 15% in comparison with frication‐only stimuli. However, with vowel transitions, the identification of /θ/ failed to show significant improvement. The purpose of the current study was to investigate, with an improved procedure, significant cues for /θ/ which we can use in our synthesizer. Six monosyllabic nonsense words (e.g., /fiyk/, /θayk/) were recorded. Segments of approximately 30‐ms duration from different locations of /θ/ and its following vowel were spliced into f‐initial words. Eight subjects were asked to identify each stimulus as “th,” “f” or “indistinguishable.” In the /iy/ context, /f/‐initial stimuli spliced with fricative‐vowel transitions from /θ/ were perceived as /θ/ 55% of the time, while stim...


Journal of the Acoustical Society of America | 1989

The perception of the low‐high (LH) tonal sequence

Kazue Hata; Yoko Hasegawa

It has been reported that the primary cue for the HL tonal perception in Japanese is not the actual F0 peak location but rather a falling F0 contour. The F0 fall may be significantly delayed, resulting in the F0 peak within the L‐toned syllable. Furthermore, it was found earlier that (1) the later the F0 fall in the L‐toned syllable, the steeper the fall rate required and (2) the fall must begin within the first two‐thirds of the duration of the vowel in the L‐toned syllable. The present experiment investigates whether a lack of synchronization between F0 change and syllable boundary can be found in the perception of the LH as well. Synthesized nonsense words /mamama/ were prepared in such a way that both the onset of F0 rise and the F0 peak occur at various locations, while maintaining the overall F0 contour (level‐rise‐peak‐slight fall). The stimuli were presented to native speakers of Japanese to determine the boundary between the categorical perception of LHH and LLH. The results show that the LH sequen...

Collaboration


Dive into Kazue Hata's collaboration.

Top Co-Authors

Yoko Hasegawa

University of California

Leslie Melbourne Barrett

United Kingdom Atomic Energy Authority
