Kazuaki Obara | Researchain

Publication

Featured research published by Kazuaki Obara.

Journal of the Acoustical Society of America | 2003

Voice selection apparatus, voice response apparatus, and game apparatus using word tables from which selected words are output as voice selections

Hidetsugu Maekawa; Tatsumi Watanabe; Kazuaki Obara; Kazuhiro Kayashima; Kenji Matsui; Yoshihiko Matsukawa

A game apparatus of the invention includes: a voice input section for inputting at least one voice set including voice uttered by an operator, for converting the voice set into a first electric signal, and for outputting the first electric signal; a voice recognition section for recognizing the voice set on the basis of the first electric signal output from the voice input means; an image input section for optically detecting a movement of the lips of the operator, for converting the detected movement of lips into a second electric signal, and for outputting the second electric signal; a speech period detection section for receiving the second electric signal, and for obtaining a period in which the voice is uttered by the operator on the basis of the received second electric signal; an overall judgment section for extracting the voice uttered by the operator from the input voice set, on the basis of the voice set recognized by the voice recognition means and the period obtained by the speech period detection means; and a control means for controlling an object on the basis of the voice extracted by the overall judgment means.
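The overall judgment step described above can be illustrated with a small sketch. It is not the patented method: it assumes the recognizer returns time-stamped words and that the lip-movement detector returns non-overlapping speech periods, and it keeps a word when enough of its duration falls inside those periods. The names `extract_operator_words`, `lip_periods`, and `min_fraction` are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def extract_operator_words(recognized, lip_periods, min_fraction=0.5):
    """Keep recognized words that mostly coincide with lip movement.

    recognized:   list of (word, start_sec, end_sec) from a recognizer (assumed format).
    lip_periods:  list of non-overlapping (start_sec, end_sec) speech periods
                  derived from lip movement (assumed format).
    min_fraction: hypothetical threshold on the covered share of each word.
    """
    kept = []
    for word, start, end in recognized:
        duration = end - start
        if duration <= 0:
            continue  # skip malformed entries
        covered = sum(overlap(start, end, p0, p1) for p0, p1 in lip_periods)
        if covered / duration >= min_fraction:
            kept.append(word)
    return kept


if __name__ == "__main__":
    recognized = [("jump", 0.2, 0.6), ("left", 1.0, 1.4), ("fire", 2.0, 2.5)]
    lip_periods = [(0.1, 0.7), (2.2, 2.6)]
    print(extract_operator_words(recognized, lip_periods))
```

In this example "left" is discarded because no lip movement was detected while it was heard, which is the kind of background voice the abstract aims to reject.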

Journal of the Acoustical Society of America | 1992

Word recognition using an auditory model front‐end incorporating spectrotemporal masking effect

Kazuaki Obara; Kiyoaki Aikawa; Hideki Kawahara

An auditory model front‐end that reflects spectrotemporal masking characteristics is proposed. The model gives excellent performance in a multi‐speaker word recognition system using a cochlear filter. Recent auditory perception research shows that the forward masking pattern spreads more widely over the frequency axis as the masker‐signal interval increases [E. Miyasaka, J. Acoust. Soc. Jpn. 39, 614–623 (1983)]. This spectrotemporal masking characteristic appears to be effective for eliminating the speaker‐dependent spectral tilt that reflects individual source variation and for enhancing the spectral dynamics that convey phonological information in speech signals. The spectrotemporal masking characteristic is modeled and applied to a multi‐speaker word recognition system. The current masking level is calculated as the weighted sum of the smoothed preceding spectra. The weight values become smaller and the smoothing window becomes wider on the frequency axis as the masker‐signal interval increases. The power spectra are extracted using a 64‐channel fixed Q cochlear filter (FQF). The FQF covers the frequency range from 1.5 to 18.5 Bark. The current masked spectrum is obtained by subtracting the masking level from the current spectrum. Recognition experiments on 216 phonetically balanced Japanese words uttered by 10 male speakers demonstrate that introducing the spectrotemporal masking model improves recognition performance in the multi‐speaker word recognition system.
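The masking computation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no weight values, window shapes, or spectral domain, so the moving-average smoothing and the `weights` and `half_widths` parameters are assumptions chosen only to follow the stated trend (smaller weights and wider windows for older frames).

```python
def smooth(spectrum, half_width):
    """Moving average over frequency channels; window is 2*half_width + 1 (assumed shape)."""
    n = len(spectrum)
    out = []
    for k in range(n):
        lo = max(0, k - half_width)
        hi = min(n, k + half_width + 1)
        out.append(sum(spectrum[lo:hi]) / (hi - lo))
    return out


def masked_spectrum(frames, t, weights, half_widths):
    """Subtract a masking level built from preceding frames from frame t.

    frames:         list of per-frame spectra, one value per channel.
    weights[i]:     hypothetical weight for the frame i + 1 steps before t.
    half_widths[i]: hypothetical smoothing half-width for that same frame.
    """
    current = frames[t]
    masking = [0.0] * len(current)
    for i, (w, hw) in enumerate(zip(weights, half_widths)):
        lag = i + 1
        if t - lag < 0:
            break  # no earlier frames available
        smoothed = smooth(frames[t - lag], hw)
        masking = [m + w * s for m, s in zip(masking, smoothed)]
    return [c - m for c, m in zip(current, masking)]


if __name__ == "__main__":
    weights = [0.5, 0.3, 0.2]   # decrease with masker-signal interval
    half_widths = [0, 1, 2]     # widen with masker-signal interval
    steady = [[5.0] * 8 for _ in range(4)]
    print(masked_spectrum(steady, 3, weights, half_widths))
    onset = [[0.0] * 4, [0.0] * 4, [0.0, 3.0, 0.0, 0.0]]
    print(masked_spectrum(onset, 2, weights, half_widths))
```

With weights that sum to one, a spectrum that stays constant over time is cancelled, while a new spectral peak passes through unchanged. This mirrors the abstract's point that static, speaker-dependent components are suppressed and spectral dynamics are emphasized.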

Journal of the Acoustical Society of America | 1993

Speaker‐independent speech recognition using an auditory model front end incorporating the spectro‐temporal masking effect

Kazuaki Obara; Kiyoaki Aikawa; Hideki Kawahara

In speaker‐independent speech recognition experiments, an auditory model front end with a spectro‐temporal masking model improved recognition performance, outperforming both auditory front ends without the masking model and traditional LPC‐based front ends. An auditory model front end composed of an adaptive Q cochlear filter bank incorporating spectro‐temporal masking was proposed previously [J. Acoust. Soc. Am. 92, 2476 (A) (1992)]. The spectro‐temporal masking model can enhance common phonetic features by eliminating the speaker‐dependent spectral tilt that reflects individual source variation. It can also enhance the spectral dynamics that convey phonological information in speech signals. These advantages result in an effective new spectral parameter for representing speech models for speaker‐independent speech recognition. Speaker‐independent word and phoneme recognition experiments were carried out for Japanese word and phrase databases. The masked spectrum was calculat...

Journal of the Acoustical Society of America | 1991

Auditory front end in DTW word recognition under noisy, reverberant, and multispeaker conditions.

Kazuaki Obara; Tatsuya Hirahara

In this report, three front ends, a fixed Q cochlear filter (FQF), an adaptive Q cochlear filter (AQF), and a Bark DFT (DFT), are compared for use as the front end of a DTW system. The FQF is a conventional cascade/parallel‐type cochlear filter that simulates the asymmetrical filtering characteristics of a basilar membrane system. The AQF is a nonlinear cochlear filter that simulates three level‐dependent characteristics of a basilar membrane system [T. Hirahara et al., Proc. ICASSP, 496–499 (1989)]. The DFT front end generates 64‐channel Bark scale coefficients based on a 512‐point DFT magnitude spectrum. These three front ends have 64 channels covering the frequency range from 1.5 to 19.5 Bark. Recognition performance is compared for clean speech, for speech degraded by added noise and/or reverberation, and under multispeaker conditions. Four signal‐to‐noise ratios, S/N=∞, 40, 20, and 10 dB, are set by adding different levels of pink noise to speech data. As for reverberant speech, the impulse responses ...
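The S/N conditions mentioned in this abstract can be reproduced in principle by scaling a noise signal before adding it to the speech. The sketch below shows the standard power-ratio calculation. It is an assumption about the procedure, not a description of the authors' setup: the function name `mix_at_snr` is hypothetical, the noise is any sample sequence of the same length as the speech, and pink-noise generation is left out.

```python
import math


def mean_power(x):
    """Average squared amplitude of a sample sequence."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio equals snr_db.

    An infinite snr_db returns the clean speech unchanged.
    Assumes speech and noise have equal length and nonzero power.
    """
    if math.isinf(snr_db):
        return list(speech)
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    speech = [1.0, -1.0, 1.0, -1.0]
    noise = [0.5, 0.5, -0.5, -0.5]
    for snr in (math.inf, 40, 20, 10):
        print(snr, mix_at_snr(speech, noise, snr))
```

The gain is derived from average power over the whole signal, so the stated S/N is a long-term average rather than a per-frame value.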

Archive | 1995