Publication


Featured research published by Shigeki Okawa.


International Conference on Acoustics, Speech, and Signal Processing | 1994

Automatic training of phoneme dictionary based on mutual information criterion

Shigeki Okawa; Tetsunori Kobayashi; Katsuhiko Shirai

This paper proposes an automatic training mechanism for phoneme recognition that uses unlabeled speech data, under the condition that only the orthographic phonemic symbol sequence is given. To obtain better recognition performance, the authors realize an automatic labeling procedure based on a phoneme classification method driven by a mutual information criterion. By iteratively training a phoneme dictionary on a large amount of speech data, the performance and convergence properties of the dictionary can be investigated. Experimental results show that labeling accuracy exceeds 98% after three iterations, and very high accuracy is also obtained for phoneme recognition.
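As a rough illustration of the mutual information criterion underlying such a classification method, the sketch below computes the mutual information between phoneme labels and acoustic clusters from a co-occurrence table. The function name, table layout, and toy counts are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mutual_information(counts):
    """Mutual information I(P;C) in nats from a phoneme-by-cluster
    contingency table of co-occurrence counts."""
    joint = counts / counts.sum()                 # joint distribution p(p, c)
    p_phone = joint.sum(axis=1, keepdims=True)    # marginal p(p)
    p_clust = joint.sum(axis=0, keepdims=True)    # marginal p(c)
    nz = joint > 0                                # avoid log(0)
    return float((joint[nz] * np.log(joint[nz] / (p_phone @ p_clust)[nz])).sum())

# Toy table: rows = 3 phoneme classes, columns = 4 acoustic clusters.
# A strongly diagonal table means labels and clusters are highly informative
# about each other, so the mutual information is large.
table = np.array([[30.0, 2.0, 1.0, 0.0],
                  [1.0, 25.0, 3.0, 1.0],
                  [0.0, 4.0, 20.0, 5.0]])
mi = mutual_information(table)
```

A labeling that maximizes this quantity makes the acoustic clusters maximally predictive of the phoneme identities, which is the intuition behind iterating the labeling and dictionary training.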


Speech Communication | 2011

Temporal AM-FM combination for robust speech recognition

Yotaro Kubo; Shigeki Okawa; Akira Kurematsu; Katsuhiko Shirai

A novel method for extracting features from the frequency modulation (FM) in speech signals is proposed for robust speech recognition. To exploit multistream speech recognizers, each stream should compensate for the shortcomings of the other streams. In this light, FM features are promising as complements to amplitude modulation (AM) features. To extract effective features from FM patterns, we applied data-driven modulation analysis to the instantaneous frequency. By evaluating the frequency responses of the temporal filters obtained by the proposed method, we confirmed that the modulation observed around 4 Hz is important for discriminating FM patterns, as in the case of AM features. We evaluated the robustness of our method through noisy speech recognition experiments and confirmed that the FM features improve noise robustness even when they are not combined with conventional AM and/or spectral envelope features. We also performed multistream speech recognition experiments. The results show that combining the conventional AM system with the proposed FM system reduced the word error rate by 43.6% at 10 dB SNR relative to the baseline MFCC system, and by 20.2% relative to the conventional AM system. We investigated the complementarity of the AM and FM features through recognition experiments in artificial noisy environments and found the FM features to be robust to wide-band noise, which severely degrades the performance of AM features. Further, we evaluated the efficiency of multiconditional training: although the performance of the proposed combination method degraded under multiconditional training, the performance of the proposed FM method improved. Through this series of experiments, we confirmed that our FM features can serve as independent features as well as complementary ones.
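Although the paper's actual front end is more elaborate, the basic AM/FM decomposition of a narrowband signal via the analytic signal can be sketched as follows. The test tone, modulation rate, and sampling rate are illustrative assumptions.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the frequency-domain Hilbert transform:
    zero out negative frequencies, double the positive ones."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(X * h)

fs = 8000.0
t = np.arange(0, 0.1, 1 / fs)
# Narrowband test tone: 1 kHz carrier with 50 Hz frequency modulation.
x = np.cos(2 * np.pi * 1000 * t + 3 * np.sin(2 * np.pi * 50 * t))

z = analytic_signal(x)
am = np.abs(z)                            # amplitude envelope (AM)
phase = np.unwrap(np.angle(z))
fm = np.diff(phase) / (2 * np.pi) * fs    # instantaneous frequency (FM), Hz
```

The envelope `am` carries the amplitude information that conventional features use, while `fm` traces the instantaneous frequency around the 1 kHz carrier; temporal filtering of trajectories like `fm` is the kind of analysis the abstract refers to.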


International Conference on Acoustics, Speech, and Signal Processing | 2008

Noisy speech recognition using temporal AM-FM combination

Yotaro Kubo; Akira Kurematsu; Katsuhiko Shirai; Shigeki Okawa

The efficiency of multistream speech recognizers is investigated through several experiments. To take advantage of multistream features, each stream should compensate for the weaknesses of the other streams. Our objective is to utilize frequency modulation (FM), which can compensate for the errors of traditional analysis methods. To achieve informational independence from features based on the spectral/time envelope of signals, our features contain no amplitude information; instead, they capture the temporal structure of the frequency modulation. The method is evaluated on continuous digit recognition of noisy speech, and we confirmed that the AM-FM combination is effective for noisy speech recognition.


IEICE Transactions on Information and Systems | 2008

Recognizing Reverberant Speech Based on Amplitude and Frequency Modulation

Yotaro Kubo; Shigeki Okawa; Akira Kurematsu; Katsuhiko Shirai

We have attempted to recognize reverberant speech using a novel speech recognition system that depends not only on the spectral envelope and amplitude modulation but also on frequency modulation. Most features used by modern speech recognition systems, such as MFCC, PLP, and TRAPS, are derived from the energy envelopes of narrowband signals, discarding the information in the carrier signals. However, some experiments show that apart from the spectral/time envelope and its modulation, the zero-crossing points of the carrier signals also play a significant role in human speech recognition. In realistic environments, a feature that depends on a limited set of signal properties is easily corrupted. To use an automatic speech recognizer in an unknown environment, it is important to exploit information from other signal properties and combine them so as to minimize the effects of the environment. In this paper, we propose a method for analyzing the carrier signals that most speech recognition systems discard. Our system consists of two nonlinear discriminant analyzers that use multilayer perceptrons. One is HATS, which can efficiently capture the amplitude modulation of narrowband signals. The other is a pseudo-instantaneous frequency analyzer, proposed in this paper, which can efficiently capture the frequency modulation of narrowband signals. The two analyzers are combined by the method based on the entropy of the feature introduced by Okawa et al. In Sect. 2, we introduce pseudo-instantaneous frequencies to capture a property of the carrier signal. Previous AM analysis methods are described in Sect. 3, the proposed system in Sect. 4, the experimental setup in Sect. 5, and the results in Sect. 6.
We evaluate the performance of the proposed method on continuous digit recognition of reverberant speech. The proposed system exhibits considerable improvement over the MFCC feature extraction system.
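Entropy-based stream combination of the kind mentioned above can be sketched roughly as follows: each frame's streams are weighted by the inverse entropy of their class posteriors, so a confident (low-entropy) stream dominates. The inverse-entropy weighting rule and the toy posteriors are illustrative assumptions; the exact formulation in the cited work may differ.

```python
import numpy as np

def entropy_weighted_combination(post_a, post_b):
    """Combine two streams of per-frame class posteriors, weighting each
    stream by the inverse of its posterior entropy per frame."""
    def entropy(p):
        return -(p * np.log(p + 1e-12)).sum(axis=-1)
    w_a = 1.0 / (entropy(post_a) + 1e-12)
    w_b = 1.0 / (entropy(post_b) + 1e-12)
    w = w_a / (w_a + w_b)                       # per-frame weight for stream A
    combined = w[:, None] * post_a + (1.0 - w)[:, None] * post_b
    return combined / combined.sum(axis=-1, keepdims=True)

# Two frames, three classes: stream A is confident on frame 0, B on frame 1.
a = np.array([[0.90, 0.05, 0.05], [0.34, 0.33, 0.33]])
b = np.array([[0.40, 0.30, 0.30], [0.05, 0.90, 0.05]])
c = entropy_weighted_combination(a, b)
```

On each frame the combined posterior follows the more confident stream, which is exactly the behavior wanted when one stream (e.g. AM) degrades while the other (e.g. FM) stays reliable.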


International Conference on Spoken Language Processing | 1996

Estimation of statistical phoneme center considering phonemic environments

Shigeki Okawa; Katsuhiko Shirai

The paper presents a new scheme of acoustic modeling for speech recognition based on the idea of a statistical phoneme center (SPC), which has several properties that make more reliable phoneme extraction feasible. First, the authors assume that there is a fictitious center point in every phoneme. The center is determined statistically by an iterative procedure that maximizes the local likelihood over a large amount of speech data. Next, to evaluate the performance of phoneme extraction, phoneme recognition is realized by optimizing the likelihood with the dynamic time warping technique. As an experimental result, 71.6% recognition accuracy is obtained for speaker-independent phoneme recognition. This result demonstrates that the proposed SPC is an effective new concept for obtaining a more stable acoustic model for speaker-independent speech recognition.
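As a generic illustration of the dynamic time warping step (not the paper's specific likelihood formulation), a minimal DTW distance between two feature sequences looks like this; the sequences below are illustrative.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two feature
    sequences (frames x dims), with Euclidean local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Best of insertion, deletion, and match predecessors.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

ref = np.array([[0.0], [1.0], [2.0], [1.0], [0.0]])
test = np.array([[0.0], [1.0], [1.0], [2.0], [1.0], [0.0]])  # time-warped copy
```

Because `test` is only a temporal stretching of `ref`, DTW aligns them at zero cost, which is why such alignment is attractive for matching phoneme templates of varying duration.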


Conference of the International Speech Communication Association | 1999

A recombination strategy for multi-band speech recognition based on mutual information criterion

Shigeki Okawa; Takehiro Nakajima; Katsuhiko Shirai


Acoustical Science and Technology | 2006

Instantaneous frequencies of signals obtained by the analytic signal method

Hideo Suzuki; Furong Ma; Hideaki Izumi; Osamu Yamazaki; Shigeki Okawa; Ken’iti Kido


Conference of the International Speech Communication Association | 2005

Discrimination of speech, musical instruments and singing voices using the temporal patterns of sinusoidal segments in audio signals

Toru Taniguchi; Akishige Adachi; Shigeki Okawa; Masaaki Honda; Katsuhiko Shirai


Conference of the International Speech Communication Association | 2008

A comparative study on AM and FM features

Yotaro Kubo; Shigeki Okawa; Akira Kurematsu; Katsuhiko Shirai


IEICE Transactions on Information and Systems | 1993

Phrase recognition in conversational speech using prosodic and phonemic information

Shigeki Okawa; Takashi Endo; Tetsunori Kobayashi; Katsuhiko Shirai

Collaboration


Shigeki Okawa's top co-authors and their affiliations:

Akira Kurematsu, University of Electro-Communications
Takashi Endo, East Japan Railway Company
Furong Ma, Tokyo Institute of Technology
Hideaki Izumi, Chiba Institute of Technology