Katsuhiko Shirai
Waseda University
Publications
Featured research published by Katsuhiko Shirai.
International Conference on Acoustics, Speech, and Signal Processing | 2008
Yotaro Kubo; Akira Kurematsu; Katsuhiko Shirai; Shigeki Okawa
The efficiency of multistream speech recognizers is investigated through several experiments. To take advantage of multistream features, each stream should compensate for the weaknesses of the other streams. Our objective is to utilize frequency modulation (FM), which can compensate for errors from traditional analysis methods. To achieve informational independence from other features based on the spectral/time envelope of the signal, our features contain no amplitude information, only the temporal structure of the frequency modulation. Our method is evaluated on continuous digit recognition of noisy speech. We confirmed that our AM-FM combination method is effective for noisy speech recognition.
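The FM feature described above discards amplitude information and keeps only the temporal structure of the frequency modulation. A minimal sketch of that idea, assuming the instantaneous frequency is taken from the phase of the analytic signal (function names are illustrative, not the authors' implementation):

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(x, fs):
    """Estimate the instantaneous frequency (Hz) of a narrowband signal.

    Amplitude information is discarded: only the phase of the analytic
    signal (via the Hilbert transform) is used, so the feature captures
    frequency modulation rather than the energy envelope.
    """
    analytic = hilbert(x)
    phase = np.unwrap(np.angle(analytic))
    # Derivative of phase -> angular frequency -> Hz
    return np.diff(phase) * fs / (2.0 * np.pi)

# Example: a 440 Hz tone with a slowly varying amplitude envelope;
# the FM feature is essentially unaffected by the amplitude modulation.
fs = 16000
t = np.arange(fs) / fs
x = (1 + 0.5 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 440 * t)
f_inst = instantaneous_frequency(x, fs)
print(round(float(np.median(f_inst)), 1))
```

Because the envelope factor cancels out of the phase, an amplitude-modulated tone still yields an instantaneous-frequency estimate near the carrier frequency, which is why such a feature is informationally independent of envelope-based features.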
IEICE Transactions on Information and Systems | 2008
Yotaro Kubo; Shigeki Okawa; Akira Kurematsu; Katsuhiko Shirai
We have attempted to recognize reverberant speech using a novel speech recognition system that depends not only on the spectral envelope and amplitude modulation but also on frequency modulation. Most of the features used by modern speech recognition systems, such as MFCC, PLP, and TRAPS, are derived from the energy envelopes of narrowband signals, discarding the information in the carrier signals. However, some experiments show that, apart from the spectral/time envelope and its modulation, the zero-crossing points of the carrier signals also play a significant role in human speech recognition. In realistic environments, a feature that depends on limited properties of the signal may easily be corrupted. To use an automatic speech recognizer in an unknown environment, it is important to obtain information from other signal properties and combine them so as to minimize the effects of the environment. In this paper, we propose a method to analyze the carrier signals that are discarded by most speech recognition systems. Our system consists of two nonlinear discriminant analyzers that use multilayer perceptrons. One is HATS, which can efficiently capture the amplitude modulation of narrowband signals. The other is a pseudo-instantaneous frequency analyzer proposed in this paper, which can efficiently capture the frequency modulation of narrowband signals. The two analyzers are combined by the method based on the entropy of the feature introduced by Okawa et al. In Sect. 2, we first introduce pseudo-instantaneous frequencies to capture a property of the carrier signal. The previous AM analysis method is described in Sect. 3, the proposed system in Sect. 4, and the experimental setup in Sect. 5; the results are discussed in Sect. 6.
We evaluate the performance of the proposed method on continuous digit recognition of reverberant speech. The proposed system exhibits considerable improvement over the MFCC-based feature extraction system.
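The abstract stresses that the zero-crossing points of a narrowband carrier convey frequency information independent of the energy envelope. A toy sketch of that observation, estimating a carrier frequency purely from zero-crossing spacing (this is an illustration of the underlying idea, not the paper's pseudo-instantaneous frequency analyzer):

```python
import numpy as np

def zero_crossing_frequency(x, fs):
    """Rough carrier-frequency estimate from zero-crossing spacing.

    The amplitude of x never enters the computation: only the sample
    indices where the waveform changes sign are used, so the estimate
    depends solely on the carrier, not the energy envelope.
    """
    # Indices where the signal changes sign
    crossings = np.where(np.diff(np.signbit(x)))[0]
    if len(crossings) < 2:
        return 0.0
    # Two zero crossings per period on average
    mean_interval = np.mean(np.diff(crossings)) / fs
    return 1.0 / (2.0 * mean_interval)

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
print(round(zero_crossing_frequency(x, fs), 1))
```

A feature built this way is corrupted by different distortions than envelope-based features, which is the motivation for combining the two analyzers.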
International Symposium on Signal Processing and Information Technology | 2006
Toru Taniguchi; Mikio Tohyama; Katsuhiko Shirai
Taniguchi et al. proposed a sinusoidal decomposition framework for classifying audio sounds. In this framework, spectral tracking is important, yet it remains an unsolved problem, although it has been investigated for the purposes of sound synthesis and sound modification. Conventional methods developed for these purposes are either ad hoc but computationally cheap, or principled but computationally expensive. In this paper, we propose an optimal yet computationally efficient method based on dynamic programming and iterative improvement. We have evaluated this method in experiments using synthesized sound and found that it works well.
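Dynamic programming is a natural fit for spectral tracking because it finds the globally cheapest sequence of peak assignments rather than matching peaks greedily frame by frame. A minimal sketch under simplifying assumptions (one track, transition cost = absolute frequency jump; this is not the paper's exact algorithm):

```python
import numpy as np

def track_peaks(peak_freqs, jump_cost=1.0):
    """Link one spectral peak per frame into a track via dynamic programming.

    peak_freqs: list of 1-D arrays, candidate peak frequencies per frame.
    The path cost is the sum of |f_t - f_{t-1}|; DP guarantees the
    globally optimal track, unlike greedy nearest-peak matching.
    """
    T = len(peak_freqs)
    # cost[t][j]: best cumulative cost for a track ending at peak j of frame t
    cost = [np.zeros(len(p)) for p in peak_freqs]
    back = [np.zeros(len(p), dtype=int) for p in peak_freqs]
    for t in range(1, T):
        for j, f in enumerate(peak_freqs[t]):
            trans = cost[t - 1] + jump_cost * np.abs(peak_freqs[t - 1] - f)
            back[t][j] = int(np.argmin(trans))
            cost[t][j] = trans[back[t][j]]
    # Backtrack the optimal path from the cheapest final peak
    j = int(np.argmin(cost[-1]))
    track = [j]
    for t in range(T - 1, 0, -1):
        j = back[t][j]
        track.append(j)
    track.reverse()
    return [float(peak_freqs[t][track[t]]) for t in range(T)]

frames = [np.array([100.0, 400.0]),
          np.array([105.0, 390.0]),
          np.array([110.0, 380.0])]
print(track_peaks(frames))  # [100.0, 105.0, 110.0] -- the smoother track
```

The DP table costs O(T * K^2) time for K candidate peaks per frame, which is the "less computationally complex" regime the abstract contrasts with exhaustive search over all peak assignments.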
Complex, Intelligent and Software Intensive Systems | 2009
Satoshi Ushiyama; Kazunori Matsui; Motoi Yamagiwa; Makoto Murakami; Minoru Uehara; Katsuhiko Shirai
The purpose of this paper is to develop a robot that actively communicates with humans and explicitly elicits information from the human mind that is rarely expressed verbally. A spoken dialogue system for information collection must independently decide whether or not it may start communicating with a person. In this paper, we assume that the system begins to communicate with a person sitting and working at a desk, analyze the relationship between the person's behavioral patterns and other people's decisions on whether to initiate communication with them, and construct a decision model for the system to start communicating with the person.
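The decision model maps observed behavioral patterns to a start/don't-start decision. A deliberately toy sketch of such a rule; the feature names and threshold here are hypothetical stand-ins for the desk-work behavioral cues the paper's learned model would use:

```python
def should_initiate(idle_seconds, typing, gaze_on_screen,
                    idle_threshold=30.0):
    """Toy decision rule for when a dialogue robot may interrupt.

    Hypothetical features: how long the person has been idle, whether
    they are typing, and whether their gaze is on the screen. The real
    model would be trained from human judgments of the same situations.
    """
    return (idle_seconds >= idle_threshold
            and not typing
            and not gaze_on_screen)

print(should_initiate(45.0, typing=False, gaze_on_screen=False))  # True
print(should_initiate(5.0, typing=True, gaze_on_screen=True))     # False
```

In the paper's setting, the interesting part is estimating such a decision boundary from observations of when other humans chose to interrupt, rather than hand-coding the thresholds as done here.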
Journal of the Acoustical Society of America | 2002
Toru Hayakawa; Katsuhiko Shirai; Hiroaki Kato; Yoshinori Sagisaka
For precise description of temporal characteristics, disagreements between manual and automatic labeling were quantitatively analyzed with respect to spectral feature extraction, the adoption of acoustic matchers (HMM models), and the acoustic matcher itself. Error analysis shows that boundaries are shifted at phone boundaries where the speech spectrum changes quite rapidly; this disagreement results from spectral feature extraction averaged over a given window. For the adoption of the model, large errors are found at phone boundaries where the spectrum changes slowly. The third, model-dependent errors appear at phones whose duration cannot be shorter than the frame increment period times the number of HMM states. To account for these error factors individually and reduce alignment errors, we modified the automatic alignment results context-dependently using statistical characteristics of phone boundary displacement. This post-processing of boundary modification reduces boundary errors f...
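The post-processing step amounts to subtracting, per boundary context, the mean displacement between automatic and manual labels estimated on held-out data. A minimal sketch, assuming contexts are phone-class pairs and the shift table is given (names and values are hypothetical):

```python
def correct_boundaries(auto_bounds, contexts, mean_shift):
    """Context-dependent correction of automatic segment boundaries.

    auto_bounds: boundary times (s) from forced alignment.
    contexts:    one context key per boundary, e.g. a phone-class pair.
    mean_shift:  context -> mean displacement (s) of automatic labels
                 relative to manual labels, estimated on held-out data.
    Contexts with no statistics are left unchanged.
    """
    return [b - mean_shift.get(c, 0.0) for b, c in zip(auto_bounds, contexts)]

# Suppose alignment at vowel->stop boundaries is 15 ms late on average:
shifts = {("vowel", "stop"): 0.015}
bounds = [0.120, 0.305]
ctxs = [("vowel", "stop"), ("nasal", "vowel")]
print([round(b, 3) for b in correct_boundaries(bounds, ctxs, shifts)])
# [0.105, 0.305]
```

Keying the correction on boundary context is what lets the three error factors identified above (window averaging, slow spectral change, minimum-duration constraints) be compensated individually.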
Information Technology Interfaces | 2001
Makoto Murakami; Katsuhiko Shirai; Masahide Yoneyama
A subspace method that can represent facial images efficiently by linear projection onto a lower-dimensional subspace has wide application in face recognition, e.g., identification, facial pose detection, etc. Preprocessing for this method requires accurate extraction of the face region, but extraction is affected by lighting conditions, varying backgrounds, individual variation, and so on, so it has not yet been put into practical use. In this paper, we examine the subspace method by comparing search spaces, apply a genetic algorithm to face extraction, and show that effective results were obtained.
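The subspace projection at the core of the method can be sketched with PCA: face-like image patches lie close to a learned low-dimensional subspace, so reconstruction error discriminates face regions from background, and a genetic algorithm could use the negated error as its fitness when searching candidate regions. A minimal sketch (illustrative, not the paper's system):

```python
import numpy as np

def fit_subspace(images, k):
    """Learn a k-dimensional subspace (PCA) from flattened image vectors."""
    mean = images.mean(axis=0)
    centered = images - mean
    # Right singular vectors = principal axes of the training set
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def reconstruction_error(x, mean, basis):
    """Distance from the subspace; small for face-like patches.

    A GA searching over candidate face regions could use -error
    as the fitness of each candidate.
    """
    coeffs = basis @ (x - mean)
    recon = mean + basis.T @ coeffs
    return float(np.linalg.norm(x - recon))

# Synthetic demo: "faces" lie in a 3-D subspace of a 64-D image space.
rng = np.random.default_rng(0)
basis_dirs = rng.standard_normal((3, 64))
faces = rng.standard_normal((20, 3)) @ basis_dirs
mean, B = fit_subspace(faces, 3)
face_like = rng.standard_normal(3) @ basis_dirs   # in the subspace
non_face = rng.standard_normal(64)                # generic background
print(reconstruction_error(face_like, mean, B) <
      reconstruction_error(non_face, mean, B))    # True
```

The preprocessing difficulties the abstract lists (lighting, background, individual variation) all show up as increased reconstruction error, which is why robust face-region search is the bottleneck the genetic algorithm addresses.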
Speech Communication | 2008
Toru Taniguchi; Mikio Tohyama; Katsuhiko Shirai
Conference of the International Speech Communication Association | 2005
Toru Taniguchi; Akishige Adachi; Shigeki Okawa; Masaaki Honda; Katsuhiko Shirai
Conference of the International Speech Communication Association | 2003
Makiko Muto; Yoshinori Sagisaka; Takuro Naito; Daiju Maeki; Aki Kondo; Katsuhiko Shirai
Conference of the International Speech Communication Association | 2007
Yotaro Kubo; Shigeki Okawa; Akira Kurematsu; Katsuhiko Shirai