Katsuhiko Shirai
Waseda University
Publications
Featured research published by Katsuhiko Shirai.
International Conference on Acoustics, Speech, and Signal Processing | 2008
Yotaro Kubo; Akira Kurematsu; Katsuhiko Shirai; Shigeki Okawa
The efficiency of multistream speech recognizers is investigated through several experiments. To take advantage of multistream features, each stream should compensate for the weaknesses of the other streams. Our objective is to utilize frequency modulation (FM), which can compensate for errors from traditional analysis methods. To achieve informational independence from other features based on the spectral/time envelope of the signal, our features contain no amplitude information, only the temporal structure of the frequency modulation. Our method is evaluated on continuous digit recognition of noisy speech. We confirmed that our AM-FM combination method is effective for noisy speech recognition.
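The FM feature described above discards amplitude information and keeps only the temporal structure of the frequency modulation. A minimal sketch of that idea, assuming the instantaneous frequency is taken from the phase of the analytic signal (function names are illustrative, not the authors' implementation):

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(x, fs):
    """Estimate the instantaneous frequency (Hz) of a narrowband signal.

    Amplitude information is discarded: only the phase of the analytic
    signal (via the Hilbert transform) is used, so the feature captures
    frequency modulation rather than the energy envelope.
    """
    analytic = hilbert(x)
    phase = np.unwrap(np.angle(analytic))
    # Derivative of phase -> angular frequency -> Hz
    return np.diff(phase) * fs / (2.0 * np.pi)

# Example: a 440 Hz tone with a slowly varying amplitude envelope;
# the FM feature is essentially unaffected by the amplitude modulation.
fs = 16000
t = np.arange(fs) / fs
x = (1 + 0.5 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 440 * t)
f_inst = instantaneous_frequency(x, fs)
print(round(float(np.median(f_inst)), 1))
```

Because the envelope factor cancels out of the phase, an amplitude-modulated tone still yields an instantaneous-frequency estimate near the carrier frequency, which is why such a feature is informationally independent of envelope-based features.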
IEICE Transactions on Information and Systems | 2008
Yotaro Kubo; Shigeki Okawa; Akira Kurematsu; Katsuhiko Shirai
We have attempted to recognize reverberant speech using a novel speech recognition system that depends not only on the spectral envelope and amplitude modulation but also on frequency modulation. Most of the features used by modern speech recognition systems, such as MFCC, PLP, and TRAPS, are derived from the energy envelopes of narrowband signals, discarding the information in the carrier signals. However, some experiments show that, apart from the spectral/time envelope and its modulation, the zero-crossing points of the carrier signals also play a significant role in human speech recognition. In realistic environments, a feature that depends on limited properties of the signal may easily be corrupted. To use an automatic speech recognizer in an unknown environment, it is important to obtain information from other signal properties and combine them so as to minimize the effects of the environment. In this paper, we propose a method to analyze the carrier signals that are discarded by most speech recognition systems. Our system consists of two nonlinear discriminant analyzers that use multilayer perceptrons. One is HATS, which can efficiently capture the amplitude modulation of narrowband signals. The other is a pseudo-instantaneous frequency analyzer proposed in this paper, which can efficiently capture the frequency modulation of narrowband signals. The two analyzers are combined by the method based on the entropy of the feature introduced by Okawa et al. In Sect. 2, we first introduce pseudo-instantaneous frequencies to capture a property of the carrier signal. The previous AM analysis method is described in Sect. 3, the proposed system in Sect. 4, and the experimental setup in Sect. 5; the results are discussed in Sect. 6.
We evaluate the performance of the proposed method on continuous digit recognition of reverberant speech. The proposed system exhibits considerable improvement over the MFCC-based feature extraction system.
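The abstract stresses that the zero-crossing points of a narrowband carrier convey frequency information independent of the energy envelope. A toy sketch of that observation, estimating a carrier frequency purely from zero-crossing spacing (this is an illustration of the underlying idea, not the paper's pseudo-instantaneous frequency analyzer):

```python
import numpy as np

def zero_crossing_frequency(x, fs):
    """Rough carrier-frequency estimate from zero-crossing spacing.

    The amplitude of x never enters the computation: only the sample
    indices where the waveform changes sign are used, so the estimate
    depends solely on the carrier, not the energy envelope.
    """
    # Indices where the signal changes sign
    crossings = np.where(np.diff(np.signbit(x)))[0]
    if len(crossings) < 2:
        return 0.0
    # Two zero crossings per period on average
    mean_interval = np.mean(np.diff(crossings)) / fs
    return 1.0 / (2.0 * mean_interval)

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
print(round(zero_crossing_frequency(x, fs), 1))
```

A feature built this way is corrupted by different distortions than envelope-based features, which is the motivation for combining the two analyzers.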
International Symposium on Signal Processing and Information Technology | 2006
Toru Taniguchi; Mikio Tohyama; Katsuhiko Shirai
Taniguchi et al. proposed a sinusoidal decomposition framework for classifying audio sounds. In this framework, spectral tracking is important, yet it remains an unsolved problem, although it has been investigated for the purposes of sound synthesis and sound modification. Conventional methods developed for these purposes are either ad hoc but computationally cheap, or principled but computationally expensive. In this paper, we propose an optimal yet computationally efficient method based on dynamic programming and iterative improvement. We have evaluated this method in experiments using synthesized sound and found that it works well.
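Dynamic programming is a natural fit for spectral tracking because it finds the globally cheapest sequence of peak assignments rather than matching peaks greedily frame by frame. A minimal sketch under simplifying assumptions (one track, transition cost = absolute frequency jump; this is not the paper's exact algorithm):

```python
import numpy as np

def track_peaks(peak_freqs, jump_cost=1.0):
    """Link one spectral peak per frame into a track via dynamic programming.

    peak_freqs: list of 1-D arrays, candidate peak frequencies per frame.
    The path cost is the sum of |f_t - f_{t-1}|; DP guarantees the
    globally optimal track, unlike greedy nearest-peak matching.
    """
    T = len(peak_freqs)
    # cost[t][j]: best cumulative cost for a track ending at peak j of frame t
    cost = [np.zeros(len(p)) for p in peak_freqs]
    back = [np.zeros(len(p), dtype=int) for p in peak_freqs]
    for t in range(1, T):
        for j, f in enumerate(peak_freqs[t]):
            trans = cost[t - 1] + jump_cost * np.abs(peak_freqs[t - 1] - f)
            back[t][j] = int(np.argmin(trans))
            cost[t][j] = trans[back[t][j]]
    # Backtrack the optimal path from the cheapest final peak
    j = int(np.argmin(cost[-1]))
    track = [j]
    for t in range(T - 1, 0, -1):
        j = back[t][j]
        track.append(j)
    track.reverse()
    return [float(peak_freqs[t][track[t]]) for t in range(T)]

frames = [np.array([100.0, 400.0]),
          np.array([105.0, 390.0]),
          np.array([110.0, 380.0])]
print(track_peaks(frames))  # [100.0, 105.0, 110.0] -- the smoother track
```

The DP table costs O(T * K^2) time for K candidate peaks per frame, which is the "less computationally complex" regime the abstract contrasts with exhaustive search over all peak assignments.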
Complex, Intelligent and Software Intensive Systems | 2009
Satoshi Ushiyama; Kazunori Matsui; Motoi Yamagiwa; Makoto Murakami; Minoru Uehara; Katsuhiko Shirai
The purpose of this paper is to develop a robot that actively communicates with humans and explicitly elicits information from the human mind that is rarely expressed verbally. A spoken dialogue system for information collection must independently decide whether or not it may start communicating with a person. In this paper, we assume that the system begins to communicate with a person sitting and working at a desk, analyze the relationship between the person's behavioral patterns and other people's decisions on whether to initiate communication with them, and construct a decision model for the system to start communicating with the person.
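The decision model maps observed behavioral patterns to a start/don't-start decision. A deliberately toy sketch of such a rule; the feature names and threshold here are hypothetical stand-ins for the desk-work behavioral cues the paper's learned model would use:

```python
def should_initiate(idle_seconds, typing, gaze_on_screen,
                    idle_threshold=30.0):
    """Toy decision rule for when a dialogue robot may interrupt.

    Hypothetical features: how long the person has been idle, whether
    they are typing, and whether their gaze is on the screen. The real
    model would be trained from human judgments of the same situations.
    """
    return (idle_seconds >= idle_threshold
            and not typing
            and not gaze_on_screen)

print(should_initiate(45.0, typing=False, gaze_on_screen=False))  # True
print(should_initiate(5.0, typing=True, gaze_on_screen=True))     # False
```

In the paper's setting, the interesting part is estimating such a decision boundary from observations of when other humans chose to interrupt, rather than hand-coding the thresholds as done here.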
Journal of the Acoustical Society of America | 2002
Toru Hayakawa; Katsuhiko Shirai; Hiroaki Kato; Yoshinori Sagisaka
For precise description of temporal characteristics, disagreements between manual and automatic labeling were quantitatively analyzed with respect to spectral feature extraction, the adoption of acoustic matchers (HMM models), and the acoustic matcher itself. Error analysis shows that boundaries are shifted at phone boundaries where the speech spectrum changes quite rapidly; this disagreement results from spectral feature extraction averaged over a given window. For the adoption of the model, large errors are found at phone boundaries where the spectrum changes slowly. The third, model-dependent errors appear at phones whose duration cannot be shorter than the frame increment period times the number of HMM states. To account for these error factors individually and reduce alignment errors, we modified the automatic alignment results context-dependently using statistical characteristics of phone boundary displacement. This post-processing of boundary modification reduces boundary errors f...
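The post-processing step amounts to subtracting, per boundary context, the mean displacement between automatic and manual labels estimated on held-out data. A minimal sketch, assuming contexts are phone-class pairs and the shift table is given (names and values are hypothetical):

```python
def correct_boundaries(auto_bounds, contexts, mean_shift):
    """Context-dependent correction of automatic segment boundaries.

    auto_bounds: boundary times (s) from forced alignment.
    contexts:    one context key per boundary, e.g. a phone-class pair.
    mean_shift:  context -> mean displacement (s) of automatic labels
                 relative to manual labels, estimated on held-out data.
    Contexts with no statistics are left unchanged.
    """
    return [b - mean_shift.get(c, 0.0) for b, c in zip(auto_bounds, contexts)]

# Suppose alignment at vowel->stop boundaries is 15 ms late on average:
shifts = {("vowel", "stop"): 0.015}
bounds = [0.120, 0.305]
ctxs = [("vowel", "stop"), ("nasal", "vowel")]
print([round(b, 3) for b in correct_boundaries(bounds, ctxs, shifts)])
# [0.105, 0.305]
```

Keying the correction on boundary context is what lets the three error factors identified above (window averaging, slow spectral change, minimum-duration constraints) be compensated individually.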
Information Technology Interfaces | 2001
Makoto Murakami; Katsuhiko Shirai; Masahide Yoneyama
A subspace method that can represent facial images efficiently by linear projection onto a lower-dimensional subspace has wide application in face recognition, e.g., identification, facial pose detection, etc. Preprocessing for this method requires accurate extraction of the face region, but extraction is affected by lighting conditions, varying backgrounds, individual variation, and so on, so it has not yet been put into practical use. In this paper, we examine the subspace method by comparing search spaces, apply a genetic algorithm to face extraction, and show that effective results were obtained.
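The subspace projection at the core of the method can be sketched with PCA: face-like image patches lie close to a learned low-dimensional subspace, so reconstruction error discriminates face regions from background, and a genetic algorithm could use the negated error as its fitness when searching candidate regions. A minimal sketch (illustrative, not the paper's system):

```python
import numpy as np

def fit_subspace(images, k):
    """Learn a k-dimensional subspace (PCA) from flattened image vectors."""
    mean = images.mean(axis=0)
    centered = images - mean
    # Right singular vectors = principal axes of the training set
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def reconstruction_error(x, mean, basis):
    """Distance from the subspace; small for face-like patches.

    A GA searching over candidate face regions could use -error
    as the fitness of each candidate.
    """
    coeffs = basis @ (x - mean)
    recon = mean + basis.T @ coeffs
    return float(np.linalg.norm(x - recon))

# Synthetic demo: "faces" lie in a 3-D subspace of a 64-D image space.
rng = np.random.default_rng(0)
basis_dirs = rng.standard_normal((3, 64))
faces = rng.standard_normal((20, 3)) @ basis_dirs
mean, B = fit_subspace(faces, 3)
face_like = rng.standard_normal(3) @ basis_dirs   # in the subspace
non_face = rng.standard_normal(64)                # generic background
print(reconstruction_error(face_like, mean, B) <
      reconstruction_error(non_face, mean, B))    # True
```

The preprocessing difficulties the abstract lists (lighting, background, individual variation) all show up as increased reconstruction error, which is why robust face-region search is the bottleneck the genetic algorithm addresses.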
Speech Communication | 2008
Toru Taniguchi; Mikio Tohyama; Katsuhiko Shirai
Conference of the International Speech Communication Association | 2005
Toru Taniguchi; Akishige Adachi; Shigeki Okawa; Masaaki Honda; Katsuhiko Shirai
Conference of the International Speech Communication Association | 2003
Makiko Muto; Yoshinori Sagisaka; Takuro Naito; Daiju Maeki; Aki Kondo; Katsuhiko Shirai
Conference of the International Speech Communication Association | 2007
Yotaro Kubo; Shigeki Okawa; Akira Kurematsu; Katsuhiko Shirai