Keiko Ochi
University of Tokyo
Publications
Featured research published by Keiko Ochi.
international conference on acoustics, speech, and signal processing | 2009
Keiko Ochi; Keikichi Hirose; Nobuaki Minematsu
A fully corpus-based process for generating prosodic features from text is developed. The process first predicts pauses and phone durations, and then generates F0 contours. Since the F0 contour generation is based on the generation process model, it is rather easy to manipulate the generated F0 contours at the command level. A method was developed for generating sentence F0 contours when a focus is placed on one of the “bunsetsu” of an utterance. The method predicts the differences in the F0 model commands between utterances with and without focus, and applies them to the F0 model commands predicted beforehand by the baseline method. The validity of the method was confirmed by experiments on F0 contour generation and speech synthesis.
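The generation process (Fujisaki) model mentioned in the abstract represents the log-F0 contour as a baseline value plus the responses to phrase and accent commands, which is what makes command-level manipulation (e.g. amplifying the accent command of a focused “bunsetsu”) straightforward. A minimal sketch of the model follows; the time constants and command values are illustrative assumptions, not numbers from the paper:

```python
import numpy as np

def phrase_comp(t, alpha=3.0):
    # Phrase command response: Gp(t) = alpha^2 * t * exp(-alpha*t) for t >= 0.
    tc = np.clip(t, 0.0, None)
    return np.where(t >= 0, alpha**2 * tc * np.exp(-alpha * tc), 0.0)

def accent_comp(t, beta=20.0, gamma=0.9):
    # Accent command response: Ga(t) = min(1 - (1 + beta*t)*exp(-beta*t), gamma).
    tc = np.clip(t, 0.0, None)
    g = 1.0 - (1.0 + beta * tc) * np.exp(-beta * tc)
    return np.where(t >= 0, np.minimum(g, gamma), 0.0)

def f0_contour(t, fb, phrases, accents):
    # ln F0(t) = ln Fb + sum of phrase responses + sum of accent responses.
    # phrases: list of (amplitude Ap, onset T0);
    # accents: list of (amplitude Aa, onset T1, offset T2).
    ln_f0 = np.full_like(t, np.log(fb))
    for ap, t0 in phrases:
        ln_f0 += ap * phrase_comp(t - t0)
    for aa, t1, t2 in accents:
        ln_f0 += aa * (accent_comp(t - t1) - accent_comp(t - t2))
    return np.exp(ln_f0)
```

Under this parameterization, placing focus on a bunsetsu amounts to adjusting the amplitude (and possibly timing) of the corresponding accent command, which is the kind of command-level difference the method predicts.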
conference of the international speech communication association | 2016
Keiko Ochi; Nobutaka Ono; Shigeki Miyabe; Shoji Makino
In this paper, we present a multi-talker speech recognition system based on blind source separation with an ad hoc microphone array consisting of smartphones and cloud storage. In this system, a mixture of voices from multiple speakers is recorded by each speaker’s smartphone and automatically transferred to online cloud storage. Our prototype system is realized using iPhones and Dropbox. Although the signals recorded by different iPhones are not synchronized, a blind synchronization technique compensates for both the time offset and the sampling-frequency mismatch. Then, auxiliary-function-based independent vector analysis separates the synchronized mixture into the individual speakers’ voices. Finally, automatic speech recognition is applied to transcribe the speech. Through an experimental evaluation of the multi-talker speech recognition system using Julius, we confirm that it effectively reduces speech overlap and improves speech recognition performance.
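The blind synchronization step has two parts: compensating a constant time offset between recordings and a sampling-frequency mismatch between devices. The offset part can be illustrated with a standard FFT-based cross-correlation; this is a generic sketch, not the authors' algorithm:

```python
import numpy as np

def estimate_offset(x, y):
    """Estimate the integer-sample delay of y relative to x
    (positive result: y is delayed) via the peak of their
    FFT-based cross-correlation, zero-padded to avoid wrap-around."""
    n = len(x) + len(y) - 1
    nfft = 1 << (n - 1).bit_length()     # next power of two >= n
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(y, nfft)
    cc = np.fft.irfft(np.conj(X) * Y, nfft)  # cc[m] = sum_p x[p] * y[p + m]
    lag = int(np.argmax(np.abs(cc)))
    if lag > nfft // 2:
        lag -= nfft                      # map large indices to negative lags
    return lag
```

In practice the sampling-frequency mismatch between independent devices must also be estimated (for example, by tracking how the offset drifts over time) and compensated by resampling, which this sketch omits.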
9th International Conference on Speech Prosody 2018 | 2018
Toshiko Isei-Jaakkola; Yasuko Nagano-Madsen; Keiko Ochi
This paper reports the results of a pilot study that examines the respiratory control exerted by the chest and abdominal muscles during the reading of a long text in the mother tongue (L1) and in a foreign language being learned (L2), with reference to syntax and prosody in Japanese and Swedish. Three datasets of read speech were obtained from Swedish speakers (SwL1), Swedish learners of Japanese (SwL2), and Japanese speakers (JL1). The results showed that the subjects used respiratory control differently when reading L1 and L2 texts. Both SwL1 and JL1 used chest and abdominal muscles almost simultaneously, and the peaks of their muscular movements co-occurred at the onset of major syntactic units such as sentences and clauses. SwL2 used the chest muscles more than the abdominal muscles, with muscular movements that were more frequent, more irregular, and smaller. There was no significant difference among JL1, SwL1, and SwL2 in terms of tonal control (pitch range). Some pitch peaks and pauses that appeared at major syntactic boundaries coincided with the peaks of the muscular movements, but others did not. These results led to the hypothesis that the acquisition of intonation precedes that of respiratory control in L2 learning.
The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) | 2017
Yoshiki Mitsui; Satoshi Mizoguchi; Hiroshi Saruwatari; Keiko Ochi; Daichi Kitamura; Nobutaka Ono; Masaru Ishimura; Narumi Mae; Moe Takakusaki; Yutaro Matsui; Kouei Yamaoka; Shoji Makino
Yoshiki Mitsui (The University of Tokyo); Satoshi Mizoguchi (The University of Tokyo); Hiroshi Saruwatari (The University of Tokyo); Keiko Ochi (National Institute of Informatics); Daichi Kitamura (SOKENDAI); Nobutaka Ono (National Institute of Informatics/SOKENDAI); Masaru Ishimura (University of Tsukuba); Narumi Mae (University of Tsukuba); Moe Takakusaki (University of Tsukuba); Yutaro Matsui (University of Tsukuba); Kouei Yamaoka (University of Tsukuba); Shoji Makino (University of Tsukuba)
conference of the international speech communication association | 2016
Keiko Ochi; Koichi Mori; Naomi Sakai; Nobutaka Ono
Soft onset vocalization is used in certain speech therapies. However, it is not easy to practice at home because the acoustic evaluation itself requires training. Objective feedback during training would therefore be helpful for clients of speech therapy. In this paper, new parameters for identifying soft onset with high accuracy are described. One of the parameters measures an aspect of the soft voice onset in which the vocal folds start to oscillate periodically before coming into contact with each other at the beginning of vocalization. Combined with an onset time exceeding a threshold, the proposed parameters achieved about 99% accuracy in identifying soft onset vocalization.
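The abstract does not spell out the parameter definitions, but the general idea of detecting a gradual (soft) onset can be illustrated with a simple amplitude-envelope rise-time measure. The smoothing window, thresholds, and the 50 ms decision rule below are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def onset_rise_time(signal, sr, lo=0.1, hi=0.9):
    # Time for the amplitude envelope to rise from lo*peak to hi*peak:
    # a longer rise time suggests a softer voice onset.
    env = np.abs(signal)
    win = max(1, int(0.01 * sr))                       # ~10 ms smoothing
    env = np.convolve(env, np.ones(win) / win, mode="same")
    peak = env.max()
    t_lo = int(np.argmax(env >= lo * peak))            # first crossing of lo*peak
    t_hi = int(np.argmax(env >= hi * peak))            # first crossing of hi*peak
    return (t_hi - t_lo) / sr

def is_soft_onset(signal, sr, threshold=0.05):
    # Hypothetical decision rule: classify as a soft onset when the
    # envelope rise time exceeds a threshold (here 50 ms).
    return onset_rise_time(signal, sr) > threshold
```

A real classifier would also need the periodicity-before-contact parameter described in the paper, which requires voicing analysis rather than just an amplitude envelope.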
Journal of the Acoustical Society of America | 2016
Rongna A; Keiko Ochi; Keiichi Yasu; Naomi Sakai; Koichi Mori
Purpose: Previous studies indicate that people who stutter (PWS) speak more slowly than people who do not stutter (PWNS), even in fluent utterances. The present study compared the articulation rates of PWS and PWNS in two conditions, oral reading and speech shadowing, in order to elucidate the factors that affect speech rate in PWS. Method: All participants were instructed to read a text aloud and to shadow a model speech without seeing its transcript. The articulation rate (moras per second) was analyzed with the open-source speech recognition engine “Julius” version 4.3.1 (https://github.com/julius-speech/julius). Pauses and disfluencies were excluded from the calculation of the articulation rate. Results: The mean articulation rate of PWS was significantly lower than that of PWNS in oral reading, but not in speech shadowing. PWS showed a significantly faster articulation rate, comparable to that of the model speech, in shadowing than in oral reading, while PW...
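Given speech segments with pauses and disfluencies already excluded (e.g. from a forced alignment produced by Julius), the articulation rate in moras per second is simply the mora count divided by the net speaking time. A minimal sketch, where the segment format is an assumption for illustration:

```python
def articulation_rate(segments):
    """Articulation rate in moras per second.

    segments: list of (start_sec, end_sec, n_moras) tuples covering only
    fluent speech stretches, with pauses and disfluencies excluded.
    """
    speech_time = sum(end - start for start, end, _ in segments)
    total_moras = sum(n for _, _, n in segments)
    return total_moras / speech_time
```

Excluding pauses matters here: dividing by total elapsed time instead would measure speaking rate, which conflates pausing behavior with articulation speed.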
international conference on signal processing | 2008
Keikichi Hirose; Keiko Ochi; Nobuaki Minematsu
A fully corpus-based process for generating prosodic features from text is developed. The process first predicts pauses and phone durations, and then generates F0 contours. Since the F0 contour generation is based on the generation process model, it is rather easy to manipulate the generated F0 contours at the command level. A method was developed for generating sentence F0 contours when a focus is placed on one of the “bunsetsu” of an utterance. The method predicts the differences in the F0 model commands between utterances with and without focus, and applies them to the F0 model commands predicted beforehand by the baseline method. The validity of the method was confirmed by experiments on F0 contour generation and speech synthesis.
conference of the international speech communication association | 2007
Keikichi Hirose; Keiko Ochi; Nobuaki Minematsu
conference of the international speech communication association | 2011
Keikichi Hirose; Keiko Ochi; Ryusuke Mihara; Hiroya Hashimoto; Daisuke Saito; Nobuaki Minematsu
international conference on signal processing | 2010
Keikichi Hirose; Keiko Ochi; Nobuaki Minematsu