Michiko Kazama
Waseda University
Publications
Featured research published by Michiko Kazama.
Journal of the Acoustical Society of America | 2010
Michiko Kazama; Satoru Gotoh; Mikio Tohyama; Tammo Houtgast
This paper investigates the significance of the magnitude or the phase in the short-term Fourier spectrum for speech intelligibility as a function of the time-window length. For a wide range of window lengths (1/16-2048 ms), two hybrid signals were obtained by a cross-wise combination of the magnitude and phase spectra of speech and white noise. Speech intelligibility data showed the significance of the phase spectrum for longer windows (>256 ms) and for very short windows (<4 ms), and that of the magnitude spectrum for medium-range window lengths. The hybrid signals used in the intelligibility test were analyzed in terms of the preservation of the original narrow-band speech envelopes. Correlations between the narrow-band envelopes of the original speech and the hybrid signals show a similar pattern as a function of window length. This result illustrates the importance of the preservation of narrow-band envelopes for speech intelligibility. The observed significance of the phase spectrum in recovering the narrow-band envelopes for the long-term windows and for the very short-term windows is discussed.
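The cross-wise combination described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the Hann window, and the 50% overlap-add scheme are assumptions, since the abstract does not state how frames were windowed or recombined. The window length is given in samples here; converting from milliseconds depends on the sampling rate.

```python
import numpy as np

def hybrid_frame(mag_source, phase_source):
    """Combine the magnitude spectrum of one frame with the phase spectrum of another."""
    if len(mag_source) != len(phase_source):
        raise ValueError("frames must have equal length")
    magnitude = np.abs(np.fft.rfft(mag_source))
    phase = np.angle(np.fft.rfft(phase_source))
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=len(mag_source))

def hybrid_signal(mag_source, phase_source, frame_len):
    """Frame both signals, swap spectra frame by frame, and overlap-add.

    Assumption: periodic Hann analysis window with 50% overlap, which sums
    to one across overlapping frames. frame_len should be even.
    """
    mag_source = np.asarray(mag_source, dtype=float)
    phase_source = np.asarray(phase_source, dtype=float)
    hop = frame_len // 2
    window = np.hanning(frame_len + 1)[:-1]  # periodic Hann
    n = min(len(mag_source), len(phase_source))
    out = np.zeros(n)
    for start in range(0, n - frame_len + 1, hop):
        a = mag_source[start:start + frame_len] * window
        b = phase_source[start:start + frame_len] * window
        out[start:start + frame_len] += hybrid_frame(a, b)
    return out
```

Calling `hybrid_signal(speech, noise, frame_len)` would give a signal with speech magnitude and noise phase, and swapping the first two arguments gives the complementary hybrid.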
International Conference on Acoustics, Speech, and Signal Processing | 2002
Kazuaki Yoshida; Michiko Kazama; Mikio Tohyama
This article describes a method of intelligible speech representation that uses narrow-band envelopes and their carriers. This method enables modification of the talker's voice pitch and speech rate without sacrificing intelligibility. The carrier, which shows the instantaneous phase, conveys pitch information, while the temporal envelope conveys speech-rate information and preserves speech intelligibility. The carriers, however, can be replaced by sinusoidal signals without severely degrading intelligibility or voice quality. Consequently, we can modify the pitch by shifting each envelope's carrier frequency and convert the speech rate by stretching or shrinking the envelopes. These findings could be useful in frequency scaling of the speech spectrum to assist hearing-impaired listeners or in time scaling of the speech signal for speech signal reproduction.
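The resynthesis idea in this abstract, sinusoidal carriers modulated by narrow-band envelopes, can be sketched as below. Everything here is an illustrative assumption: the function name, the use of one fixed-frequency sinusoid per band, and linear interpolation for stretching the envelopes are not taken from the paper. The envelopes are assumed to have been extracted already, one row per band.

```python
import numpy as np

def sinusoidal_resynthesis(envelopes, carrier_freqs, fs,
                           pitch_factor=1.0, rate_factor=1.0):
    """Sum of sinusoidal carriers, each modulated by a band envelope.

    envelopes:     array of shape (n_bands, n_samples)
    carrier_freqs: one carrier frequency in Hz per band (hypothetical choice)
    pitch_factor:  scales every carrier frequency
    rate_factor:   scales duration; values above 1 slow the speech down
    """
    envelopes = np.asarray(envelopes, dtype=float)
    n_in = envelopes.shape[1]
    n_out = max(1, int(round(n_in * rate_factor)))
    src_pos = np.arange(n_in)
    dst_pos = np.linspace(0, n_in - 1, n_out)  # resampling grid for the envelopes
    t = np.arange(n_out) / fs
    out = np.zeros(n_out)
    for env, fc in zip(envelopes, carrier_freqs):
        stretched = np.interp(dst_pos, src_pos, env)
        out += stretched * np.sin(2 * np.pi * fc * pitch_factor * t)
    return out
```

Because the envelopes and carriers are handled separately, the pitch and the speech rate can be changed independently of each other in this sketch.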
International Symposium on Signal Processing and Information Technology | 2006
Satoru Gotoh; Michiko Kazama; Mikio Tohyama; Yoshio Yamasaki
We confirmed that a speaker's vocal individuality is contained in the inter-band correlations of narrow-band (1/4 or 1/8 octave bands) temporal envelopes. Two types of envelope correlation matrices (ECMs) were made for 53 speakers, using three utterances of an identical sentence (assuming a situation where a password for verification was stolen) so that any differences in the spoken contents might not greatly influence their individuality. Type-A (reference) ECMs of two of the utterances were constructed to make a speaker's individual template, and a type-B ECM was constructed using the other utterance. Speaker matching tests between the two types of ECMs, based on Gaussian mixture model (GMM) matching scores, verified the validity of the individual speakers. In particular, a speaker's voice could be verified using spoken materials through the telephone band (250 Hz - 3 kHz), a high frequency range (2 - 11.3 kHz), or a wide frequency range (250 Hz - 11.3 kHz).
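An envelope correlation matrix of the kind described in this abstract could be computed roughly as follows. This is a sketch under stated assumptions: the brick-wall FFT band-pass filter, the analytic-signal envelope, and the function names are illustrative choices, and the abstract does not say how the bands were filtered or the envelopes extracted. The GMM matching step is not sketched.

```python
import numpy as np

def band_envelope(x, fs, f_lo, f_hi):
    """Temporal envelope of one band of x.

    Assumption: brick-wall band-pass in the FFT domain, followed by the
    magnitude of the analytic signal of that band.
    """
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    # Keep only positive frequencies inside the band; doubling them yields
    # the analytic signal of the band-limited component.
    mask = (freqs >= f_lo) & (freqs < f_hi)
    analytic = np.fft.ifft(2.0 * spectrum * mask)
    return np.abs(analytic)

def envelope_correlation_matrix(x, fs, f_min=250.0, f_max=3000.0,
                                bands_per_octave=4):
    """Correlation coefficients between the envelopes of adjacent narrow bands."""
    x = np.asarray(x, dtype=float)
    n_bands = int(np.floor(bands_per_octave * np.log2(f_max / f_min)))
    edges = f_min * 2.0 ** (np.arange(n_bands + 1) / bands_per_octave)
    envelopes = np.array([band_envelope(x, fs, lo, hi)
                          for lo, hi in zip(edges[:-1], edges[1:])])
    return np.corrcoef(envelopes)
```

The defaults correspond to 1/4-octave bands across the telephone band; passing `bands_per_octave=8` or different band limits would cover the other configurations the abstract mentions.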
Journal of the Acoustical Society of America | 1999
Michiko Kazama; Mikio Tohyama; Akira Morita
Noise reduction is a fundamental issue for smart microphone systems and hearing aids. Noise reduction by spectral subtraction has been investigated for speech signals. However, identifying whether a frame is a speech or a silence portion is difficult under nonstationary noisy conditions when using this method. Extracting the desired speech based on the sinusoidal wave model [T. Quatieri and R. McAulay, IEEE ASSP 34, 1449–1464 (1986)] was investigated. It was confirmed that intelligible speech sound could be synthesized using only five dominant sinusoidal waves [M. Kazama et al., 5th ICSV 2079–2086 (1997)]. In this article, a new noise reduction method by extracting the dominant sinusoidal waves in each frame (32 ms) according to the energy ratio of the signal to noise was proposed. The signal‐to‐noise ratio was improved by 10 dB (S/N ratio) when the original S/N ratio was 0 dB. Speech quality could also be improved by reconstructing the higher harmonics from the noisy vowels using the frame‐dependent comb fi...
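The idea of keeping only a few dominant sinusoidal components per frame can be illustrated with the simplified sketch below. It is not the method of the paper: it keeps the largest-magnitude FFT bins rather than estimated sinusoidal peaks, it ignores the signal-to-noise energy-ratio criterion and the comb filtering mentioned in the abstract, and it uses non-overlapping rectangular frames. The function names and defaults are hypothetical.

```python
import numpy as np

def keep_dominant_bins(frame, n_keep=5):
    """Resynthesize a frame from its n_keep largest-magnitude FFT bins."""
    spectrum = np.fft.rfft(frame)
    keep = np.argsort(np.abs(spectrum))[-n_keep:]
    pruned = np.zeros_like(spectrum)
    pruned[keep] = spectrum[keep]
    return np.fft.irfft(pruned, n=len(frame))

def reduce_noise(x, fs, frame_ms=32.0, n_keep=5):
    """Apply keep_dominant_bins to consecutive non-overlapping frames.

    Assumption: rectangular, non-overlapping frames. Samples after the
    last full frame are left as zeros.
    """
    x = np.asarray(x, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000.0))
    out = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, frame_len):
        out[start:start + frame_len] = keep_dominant_bins(
            x[start:start + frame_len], n_keep)
    return out
```

At a sampling rate of 16 kHz, the 32 ms frame corresponds to 512 samples, so each frame keeps 5 of 257 spectral bins.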
Journal of the Audio Engineering Society | 2003
Michiko Kazama; Kazuaki Yoshida; Mikio Tohyama
Archive | 2006
Michiko Kazama; Mikio Tohyama; Koji Kushida
Journal of the Acoustical Society of America | 2011
Mikio Tohyama; Michiko Kazama; Yoshinori Takahashi; Kiyoaki Terada; Shinichi Sakamoto; Keisuke Watanuki; Takeshi Nakaichi
Archive | 2007
Mikio Tohyama; Michiko Kazama; Satoru Goto; Takehiko Kawahara; Yasuo Yoshioka
Archive | 2006
Mikio Tohyama; Michiko Kazama; Yoshinori Takahashi; Kiyoaki Terada; Shinichi Sakamoto; Keisuke Watanuki; Takeshi Nakaichi
Archive | 2006
Michiko Kazama; Takeshi Nakaichi; Shinichi Sakamoto; Yoshinori Takahashi; Kiyoaki Terada; Mikio Tohyama; Keisuke Watanuki