Kimiko Yamakawa
Aichi Shukutoku University
Publications
Featured research published by Kimiko Yamakawa.
2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) | 2011
Shuichi Itahashi; Tomoko Kajiyama; Kimiko Yamakawa; Yuichi Ishimoto; Tomoko Matsui
We have previously reported a corpus-similarity visualization method, based on corpus attributes and multidimensional scaling, that makes it easy for users to utilize various speech corpora. In this paper, we present a revised visualization method based on a ring structure like a planisphere. Using only a mouse, a user can choose appropriate search keys for each of multiple attributes and can easily filter information by adjusting the keys. Retrieved results are displayed inside the rings, and the user can filter and browse them in real time. This facilitates efficient searching for the specific corpus that fits a user's needs.
Journal of the Acoustical Society of America | 2018
Kimiko Yamakawa; Shigeaki Amano
Spoken words used in broadcasts via outdoor loudspeakers were analyzed in terms of word frequency, word familiarity, and phoneme composition to clarify their characteristics. Based on the analysis, we identified 13 original words that are frequently used but probably difficult to listen to, and proposed alternative words with higher word familiarity and fewer noise-intolerant phonemes. To verify the proposal, a listening experiment was conducted. The 13 original and 13 proposed words were presented with and without pink noise (SNR = 0 dB) to 21 participants with normal hearing ability. Error ratios of word identification were obtained from the participants' responses. In the with-noise condition, the error ratio of the proposed words (7.1%) was significantly lower than that of the original words (18.3%), but no difference in error ratio was observed between the original and proposed words in the without-noise condition. These results indicate that the proposed words have higher noise robustness and lower listening difficulty than the original words, and that they are useful for transmitting accurate information in loudspeaker broadcasts.
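The SNR = 0 dB pink-noise condition described above can be illustrated with a short sketch. The `mix_at_snr` helper and the stand-in signals below are hypothetical, not the authors' experimental materials; they only show the standard RMS-based way such a mixing condition is constructed.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the mixture has the requested speech-to-noise
    ratio (dB), using RMS levels, then add it to the speech."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return speech + gain * noise

# At SNR = 0 dB the scaled noise has the same RMS as the speech.
rng = np.random.default_rng(0)
speech = np.sin(np.linspace(0, 100, 16000))  # stand-in for a spoken word
noise = rng.standard_normal(16000)           # stand-in for pink noise
mixed = mix_at_snr(speech, noise, 0.0)
```

At 0 dB the noise is neither attenuated nor amplified relative to the speech, which is why identification errors rise sharply for noise-intolerant phonemes in that condition.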
Journal of the Acoustical Society of America | 2018
Shigeaki Amano; Kimiko Yamakawa; Katuhiro Maki
Speech sound transmitted from an outdoor loudspeaker is sometimes difficult to recognize because it is overlapped by time-delayed speech sound from other loudspeakers. This difficulty is assumed to depend on the delay time and the intensity difference between the original and the overlapping time-delayed sound. To clarify the effects of delay time and intensity difference on speech recognition, a listening experiment was conducted with 21 Japanese adults using 105 Japanese spoken words in a carrier sentence, with delay times and intensity differences ranging over 0–250 ms and 0–9 dB, respectively. The experiment revealed that recognition ratios are significantly lower for delay times of 100–250 ms than for 0 ms. Similarly, the ratios are significantly lower for intensity differences of 0–6 dB than for 9 dB. These results suggest that speech sound from an outdoor loudspeaker is difficult to recognize over a large area where the difference of distances from two loudspeakers ranges between 34 and 85 m and the intensity difference between the original and the overlapping time-delayed sound is less than 6 dB. [This study was supported by JSPS KAKENHI Grant Numbers JP15K12494, JP15H03207, JP17K02705, and by Aichi-Shukutoku University Cooperative Research Grant 2017-2018.]
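The link between the 100–250 ms delays and the 34–85 m range of path differences follows directly from the speed of sound. A minimal sketch of that conversion, assuming the usual value of roughly 343 m/s in air at 20 °C:

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C

def path_difference(delay_s):
    """Distance difference between two loudspeakers that produces a
    given arrival-time delay at the listener."""
    return SPEED_OF_SOUND * delay_s

print(path_difference(0.100))  # 100 ms delay -> about 34 m
print(path_difference(0.250))  # 250 ms delay -> about 86 m
```

So a listener standing where one loudspeaker is 34–85 m farther away than another falls into the delay range where recognition was found to degrade.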
Journal of the Acoustical Society of America | 2016
Shigeaki Amano; Kimiko Yamakawa; Mariko Kondo
Previous studies have shown that a devoiced vowel makes burst duration of a singleton stop consonant longer than a voiced vowel does. However, the effects of devoiced vowels on stop consonants including a geminate have not been fully investigated. To clarify the effects, durations of a burst and a closure in Japanese stops pronounced by 19 native Japanese speakers were analyzed by stop type (singleton or geminate) and speaking rate (fast, normal, or slow). Analysis revealed that at all speaking rates a devoiced vowel makes the burst duration of singleton and geminate stops longer than a voiced vowel does. Analysis also revealed that at normal and fast speaking rates a devoiced vowel has no effect on closure duration in singleton or geminate stops. However, at a slow speaking rate, a devoiced vowel makes the closure duration of a singleton stop longer. These results indicate that a devoiced vowel consistently lengthens the burst duration of singleton and geminate stops but that its effects on closure durat...
Journal of the Acoustical Society of America | 2016
Ken-Ichi Sakakibara; Kimiko Yamakawa; Hiroshi Imagawa; Takao Goto; Akihito Yamauchi; Katsuhiro Maki; Shigeaki Amano
In this study, we examined the physiological and acoustic characteristics of plosive- and fricative-type geminates. Three minimal pairs each for the plosive- and fricative-type geminates were recorded at different speaking rates, ranging from 3 to 13 mora/s. Physiological analysis showed that (1) the open duration of the vocal folds depends on the speaking rate; and (2) the closing duration of the vocal folds depends on the speaking rate, but the opening duration does not. Acoustic analysis showed that, as the speaking rate increases, (1) the logarithms of the closure and fricative durations decrease linearly; and (2) the ratio of the duration of a singleton to that of a geminate decreases.
Journal of the Acoustical Society of America | 2014
Shigeaki Amano; Kimiko Yamakawa
Previous studies suggested that a plosive-type geminate stop in Japanese is discriminated from a single stop with variables of stop closure duration and subword duration that spans from the mora preceding the geminate stop to the vowel following the stop. However, this suggestion does not apply to a fricative-type geminate stop that does not have a stop closure. To overcome this problem, this study proposes Inter-Vowel Interval (IVI) and Successive Vowel Interval (SVI) as discriminant variables. IVI is the duration between the end of the vowel preceding the stop and the beginning of the vowel following the stop. SVI is the duration between the beginning of the vowel preceding the stop and the end of the vowel following the stop. When discriminant analysis was conducted between single and geminate stops of plosive and fricative types using IVI and SVI as independent variables, the discriminant ratio was very high (99.5%, n = 368). This result indicates that IVI and SVI are the general variables that repres...
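The two proposed variables are simple functions of the boundary times of the vowels flanking the stop. A minimal sketch of their definitions as given in the abstract, with hypothetical boundary times for illustration:

```python
def intervals(v1_onset, v1_offset, v2_onset, v2_offset):
    """IVI and SVI (in seconds) from the boundary times of the vowels
    flanking a stop.

    IVI: end of the preceding vowel to start of the following vowel.
    SVI: start of the preceding vowel to end of the following vowel.
    """
    ivi = v2_onset - v1_offset   # consonantal interval between the vowels
    svi = v2_offset - v1_onset   # total span covering both vowels and the stop
    return ivi, svi

# Hypothetical boundary times (s) for a word containing a geminate stop:
ivi, svi = intervals(0.10, 0.18, 0.40, 0.50)  # IVI is about 0.22 s, SVI about 0.40 s
```

Because IVI needs only vowel boundaries, it applies equally to fricative-type geminates, which lack the stop closure that earlier closure-duration measures relied on.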
2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA) | 2014
Kimiko Yamakawa; Shigeaki Amano; Mariko Kondo
A Japanese read-word database was developed for 343 Japanese words spoken by 129 non-native speakers of Japanese (Korean, Chinese, Thai, Vietnamese, French, and English speakers) and 18 native speakers of Japanese. The Japanese words contained segmental contrasts such as single/geminate stops, normal/lengthened vowels, presence/absence of a moraic nasal, contracted sound/vowel /i/, and voiced/voiceless consonants, which non-native speakers of Japanese frequently mispronounce. The speech files of the Japanese words, and phoneme labels for a part of the speech files, were registered in the database. The database will contribute to scientific research on the characteristics of Japanese speech by non-native speakers. It will also contribute to the development of a computer-aided learning system of Japanese speech for non-native speakers.
Journal of the Acoustical Society of America | 2013
Shigeaki Amano; Kimiko Yamakawa; Mariko Kondo
Twenty-nine durational variables were examined to clarify rhythmic characteristics in non-fluent Japanese utterances by non-native speakers. Discriminant analysis with these variables was performed on 343 Japanese words, each pronounced in a carrier sentence by six native Japanese speakers and 14 non-native Japanese speakers (7 Vietnamese with low Japanese proficiency and 7 Chinese with high Japanese proficiency). The results showed that a combination of two durational variables could discriminate Japanese speakers from Vietnamese speakers with a small error (8.7%, n = 4458), namely the percentage of vowel duration and the average of "Normalized Voice Onset Asynchrony," which is the interval between the onsets of two successive vowels divided by the first vowel's duration. However, these two variables made a large error (39.4%, n = 4458) in the discrimination of Japanese speakers from Chinese speakers, who had higher Japanese proficiency than the Vietnamese speakers. These results suggest that the two varia...
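The "Normalized Voice Onset Asynchrony" measure defined in the abstract can be written down directly. A minimal sketch, with hypothetical vowel timings for illustration:

```python
def nvoa(v1_onset, v2_onset, v1_duration):
    """Normalized Voice Onset Asynchrony: the interval between the
    onsets of two successive vowels, divided by the first vowel's
    duration."""
    return (v2_onset - v1_onset) / v1_duration

# Hypothetical timings (s): second vowel onset 0.25 s after the first,
# first vowel lasting 0.08 s -> NVOA of about 3.1
value = nvoa(0.10, 0.35, 0.08)
```

Dividing by the first vowel's duration normalizes away overall speaking rate, so the measure reflects rhythmic timing rather than raw tempo.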
Journal of the Acoustical Society of America | 2013
Kimiko Yamakawa; Shigeaki Amano
Fricatives [s] and affricates [ts] uttered at a normal speaking rate are successfully discriminated with two variables: a rise part duration and a steady + decay part duration [Yamakawa et al., Acoust. Sci. Tech. 33(3), 154–159 (2012)]. This study examined whether [s] and [ts] uttered at various speaking rates are also well discriminated with these variables. Discriminant analyses with the two variables were performed on [s] and [ts] in word-initial position in a carrier sentence pronounced by eight native Japanese speakers at fast, normal, and slow speaking rates. Discriminant error rates were low for the fast (9.8%, n = 512), normal (5.7%, n = 512), and slow (13.1%, n = 512) speaking rates. However, the error rate was high (22.0%, n = 1536) when the three speaking rates were analyzed together. It decreased to 15.7% when the two variables were normalized with an averaged mora duration of the carrier sentence. These results suggest that [s] and [ts] can be discriminated at various speaking rates with a nor...
Journal of the Acoustical Society of America | 2010
Kimiko Yamakawa; Shigeaki Amano
A previous study has shown that the perception boundary between the fricative /s/ and the affricate /ts/ for Japanese speakers is represented by a linear function of two variables: a rise time and the sum of steady and decay times [Amano et al. (2009)]. This study investigated characteristics of the boundary in non-native speakers of Japanese such as Koreans, who often have trouble distinguishing /s/ and /ts/. Thirty-two Korean speakers listened to 2-D stimulus continua coordinated by the rise time and the sum of steady and decay times of [s]/[ts] in 1–4-mora Japanese words and made a two-alternative forced choice between /s/ and /ts/. A logistic regression analysis of the identification ratios of these phonemes showed that the perception boundary for Korean speakers was well represented by a linear function of the same variables as for Japanese speakers, but the boundary shifted to smaller values in both the rise time and the sum of steady and decay times, indicating a greater identification ratio of /s/. The results suggest that Korean speakers' perception of [s]/[ts] is affected by their native language, which does not have /ts/.
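A logistic model of this kind can be sketched briefly. The coefficients below are hypothetical placeholders, not the fitted values from the study; the sketch only shows how a linear perception boundary falls out of a logistic regression on the two duration variables.

```python
import math

def p_affricate(rise_ms, steady_decay_ms,
                w0=4.0, w_rise=-0.12, w_sd=-0.02):
    """Logistic model of the /s/-/ts/ decision on two duration cues.
    Short rise and steady+decay parts favor the affricate /ts/.
    All coefficients here are hypothetical, for illustration only."""
    z = w0 + w_rise * rise_ms + w_sd * steady_decay_ms
    return 1.0 / (1.0 + math.exp(-z))

# The perception boundary is the line where p = 0.5, i.e. where
# w0 + w_rise * rise + w_sd * steady_decay = 0. Shifting that line
# toward smaller rise and steady+decay values, as reported for the
# Korean listeners, enlarges the region answered as /s/.
```

Because the boundary is the zero set of the linear term, comparing boundaries across listener groups reduces to comparing the fitted coefficients.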