Publication


Featured research published by Akira Kurematsu.


Speech Communication | 1990

ATR Japanese speech database as a tool of speech recognition and synthesis

Akira Kurematsu; Kazuya Takeda; Yoshinori Sagisaka; Shigeru Katagiri; Hisao Kuwabara; Kiyohiro Shikano

A large-scale Japanese speech database is described. The database consists of (1) a word speech database, (2) a continuous speech database, (3) a database covering a large number of speakers, and (4) a database for speech synthesis. Multiple transcriptions have been made in five layers, from simple phonemic descriptions to fine acoustic-phonetic transcriptions. The database has been used to develop algorithms for speech recognition and synthesis and to gather the acoustic, phonetic, and linguistic evidence that serves as basic data for speech technologies.


International Conference on Acoustics, Speech, and Signal Processing | 2002

Speech signal band width extension and noise removal using subband HMM

Mitsuhiro Hosoki; Takayuki Nagai; Akira Kurematsu

In this paper, a novel approach for generating wideband speech from narrowband speech is proposed. The method is based on a subband hidden Markov model (subband HMM). To train the HMM, a set of wideband speech signals is divided into a number of subbands and features are extracted from each subband independently. The extracted features are recombined and the HMM is trained with the EM algorithm, so that each state models the features of a single subband together with the corresponding features of all other subbands. This correspondence makes it possible to estimate unobserved frequency components from the fit of the narrowband signal to the HMM. We further investigate the application of the proposed method to denoising. Experimental results are presented to confirm the validity of the proposed method.
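
As a rough illustration of the subband-HMM idea, the sketch below (hypothetical band layout and model sizes, using the hmmlearn library rather than the authors' own implementation) splits a signal into subbands, stacks per-subband log energies, trains one Gaussian HMM on the stack, and then reads unobserved-band values off the decoded state means:

```python
import numpy as np
from scipy.signal import butter, sosfilt
from hmmlearn.hmm import GaussianHMM

FS = 16000
BANDS = [(100, 2000), (2000, 4000), (4000, 7900)]  # hypothetical band layout

def subband_log_energies(x, frame=400, hop=160):
    """Stack per-subband frame log energies into one feature matrix."""
    feats = []
    for lo, hi in BANDS:
        sos = butter(4, [lo, hi], btype="band", fs=FS, output="sos")
        y = sosfilt(sos, x)
        n = (len(y) - frame) // hop + 1
        feats.append([np.log(np.sum(y[i*hop:i*hop+frame]**2) + 1e-10)
                      for i in range(n)])
    return np.array(feats).T                     # (frames, n_bands)

rng = np.random.default_rng(0)
train = rng.standard_normal(FS * 5)              # stand-in for wideband speech
X = subband_log_energies(train)
hmm = GaussianHMM(n_components=8, covariance_type="diag", n_iter=20).fit(X)

# At test time only the low bands (columns 0-1) are observed: decode the state
# path with a narrowband copy of the model, then take the high-band component
# of each decoded state's mean as the bandwidth-extension estimate.
nb = GaussianHMM(n_components=8, covariance_type="diag")
nb.startprob_, nb.transmat_ = hmm.startprob_, hmm.transmat_
nb.means_ = hmm.means_[:, :2]
nb.covars_ = np.array([np.diag(c)[:2] for c in hmm.covars_])
states = nb.predict(X[:, :2])
highband_estimate = hmm.means_[states, 2]        # one value per frame
```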


Journal of the Acoustical Society of America | 1985

Adaptive pitch detection system for voice signal

Fumihiro Yato; Seishi Kitayama; Junso Tamura; Hikoichi Ishigami; Akira Kurematsu

A system for detecting the pitch of a voice signal. A plurality of pitch searching periods is determined so that pitch components in a multiple relationship are not included in the same searching period. After the searching period containing the pitch is detected, the searching periods are adaptively shifted in a manner that follows the direction of pitch change predicted from the previous detection results.
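
A minimal sketch of the adaptive search idea described above (parameters are hypothetical): pitch is estimated by autocorrelation, with the lag search window re-centred on each frame's detected pitch so that the tracker follows gradual pitch changes and excludes multiple-period candidates:

```python
import numpy as np

FS = 8000

def track_pitch(frames, lag_lo=FS // 400, lag_hi=FS // 60, width=20):
    """Autocorrelation pitch tracking with an adaptively shifted lag window."""
    pitches, lo, hi = [], lag_lo, lag_hi
    for frame in frames:
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lag = lo + int(np.argmax(ac[lo:hi]))
        pitches.append(FS / lag)
        # shift the searching period to follow the detected pitch
        lo = max(lag_lo, lag - width)
        hi = min(lag_hi, lag + width)
    return pitches

# toy usage: a 150 Hz pulse train sliced into 32 ms frames
t = np.arange(FS) / FS
x = np.sign(np.sin(2 * np.pi * 150 * t))
frames = x[:256 * 31].reshape(-1, 256)
print(track_pitch(frames)[:5])                   # ~150-151 Hz per frame
```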


International Conference on Acoustics, Speech, and Signal Processing | 2001

Estimation of source location based on 2-D MUSIC and its application to speech recognition in cars

Takayuki Nagai; Keisuke Kondo; Masahide Kaneko; Akira Kurematsu

This paper proposes a speech recognition and enhancement system for noisy car environments based on a microphone array. In the system, multiple microphones are arranged in two-dimensional space around the interior of a car, and the speaker's location is first estimated by our proposed HE (harmonic-enhanced) 2-D MUSIC (MUltiple SIgnal Classification). Then, 2-D delay-and-sum (DS) beamforming is applied to enhance the target speech. This pre-processing makes robust speech recognition in noisy car environments possible. A further advantage of the system is that not only the driver but also a passenger can control car electronics by voice, regardless of where they sit. The results of a simulation and a preliminary experiment in a real car environment are presented to confirm the validity of the proposed system.
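
For background, the sketch below shows the core subspace step of MUSIC for a one-dimensional uniform linear array (a simplification; the paper's HE 2-D MUSIC scans two-dimensional positions with harmonic weighting, and the geometry here is assumed):

```python
import numpy as np

M, d, c, f = 8, 0.04, 343.0, 1000.0      # mics, spacing (m), speed of sound, Hz
lam = c / f

def steering(theta):
    """Plane-wave steering vector for a uniform linear array."""
    return np.exp(-2j * np.pi / lam * d * np.arange(M) * np.sin(theta))

rng = np.random.default_rng(1)
theta_src = np.deg2rad(25)
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)   # snapshots
X = np.outer(steering(theta_src), s)
X += 0.1 * (rng.standard_normal(X.shape) + 1j * rng.standard_normal(X.shape))

R = X @ X.conj().T / X.shape[1]          # spatial covariance estimate
_, V = np.linalg.eigh(R)                 # eigenvalues in ascending order
En = V[:, :-1]                           # noise subspace (one source assumed)

# MUSIC pseudo-spectrum: peaks where steering vectors are orthogonal
# to the noise subspace
angles = np.deg2rad(np.linspace(-90, 90, 361))
p = [1 / np.linalg.norm(En.conj().T @ steering(a))**2 for a in angles]
print("estimated DOA:", np.rad2deg(angles[int(np.argmax(p))]), "deg")
```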


Speech Communication | 2011

Temporal AM-FM combination for robust speech recognition

Yotaro Kubo; Shigeki Okawa; Akira Kurematsu; Katsuhiko Shirai

A novel method for extracting features from the frequency modulation (FM) of speech signals is proposed for robust speech recognition. To exploit the advantages of multistream speech recognizers, each stream should compensate for the shortcomings of the others; in this light, FM features are promising complements to amplitude modulation (AM) features. To extract effective features from FM patterns, we apply data-driven modulation analysis to the instantaneous frequency. By evaluating the frequency responses of the temporal filters obtained by the proposed method, we confirmed that modulation around 4 Hz is important for discriminating FM patterns, as is the case for AM features. We evaluated the robustness of the method in noisy speech recognition experiments and confirmed that the FM features improve noise robustness even when they are not combined with conventional AM and/or spectral-envelope features. We also performed multistream speech recognition experiments: combining the conventional AM system and the proposed FM system reduced the word error rate by 43.6% at 10 dB SNR compared with the baseline MFCC system, and by 20.2% compared with the conventional AM system. We investigated the complementarity of the AM and FM features in artificial noisy environments and found the FM features to be robust to wide-band noise, which clearly degrades the performance of AM features. Finally, we evaluated the efficiency of multiconditional training: although the performance of the combination method degraded under multiconditional training, the performance of the FM method alone improved. Through this series of experiments, we confirmed that the proposed FM features can be used both as independent features and as complementary features.
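
A minimal sketch of FM-style feature extraction (settings are hypothetical and the temporal filters here are fixed rather than data-driven as in the paper): band-pass the signal, take the analytic signal, differentiate its phase to obtain the instantaneous frequency, and keep modulations around the 4 Hz range highlighted above:

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

FS = 16000

def instantaneous_frequency(x, band=(900, 1100)):
    """Instantaneous frequency (Hz) of one narrowband channel."""
    sos = butter(4, band, btype="band", fs=FS, output="sos")
    phase = np.unwrap(np.angle(hilbert(sosfilt(sos, x))))
    return np.diff(phase) * FS / (2 * np.pi)

def modulation_feature(if_track, frame_rate=100):
    """Downsample the IF track, then keep ~1-8 Hz temporal modulations."""
    track = if_track[::FS // frame_rate]
    sos = butter(2, [1.0, 8.0], btype="band", fs=frame_rate, output="sos")
    return sosfilt(sos, track)

t = np.arange(FS) / FS
x = np.sin(2 * np.pi * (1000 * t + 20 * np.sin(2 * np.pi * 4 * t)))  # 4 Hz FM
feat = modulation_feature(instantaneous_frequency(x))
```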


International Conference on Acoustics, Speech, and Signal Processing | 2008

Noisy speech recognition using temporal AM-FM combination

Yotaro Kubo; Akira Kurematsu; Katsuhiko Shirai; Shigeki Okawa

The efficiency of multistream speech recognizers is investigated through several experiments. To take advantage of multistream features, each stream should compensate for the weaknesses of the other streams. Our objective is to utilize frequency modulation (FM), which can compensate for errors made by traditional analysis methods. To achieve informational independence from features based on the spectral/time envelope of the signal, our features contain no amplitude information, only the temporal structure of the frequency modulation. The method is evaluated on continuous digit recognition of noisy speech. We confirmed that the proposed AM-FM combination is effective for noisy speech recognition.
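
The stream-combination step might look like the following sketch (the fusion rule and weights are assumptions, not taken from the paper): per-frame class posteriors from the AM and FM streams are fused log-linearly before decoding:

```python
import numpy as np

def combine_streams(p_am, p_fm, w=0.5):
    """Log-linear fusion of two posterior streams, renormalised per frame."""
    log_p = w * np.log(p_am + 1e-12) + (1 - w) * np.log(p_fm + 1e-12)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

# toy posteriors over 3 classes for 2 frames
p_am = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
p_fm = np.array([[0.6, 0.3, 0.1], [0.2, 0.6, 0.2]])
print(combine_streams(p_am, p_fm))
```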


International Work-Conference on Artificial and Natural Neural Networks | 2001

Speaker Recognition Using Gaussian Mixture Models

Eric Simancas-Acevedo; Akira Kurematsu; Mariko Nakano-Miyatake; Hector Perez-Meana

Controlling access to secret or personal information by means of the speaker's voice transmitted over long-distance communication systems, such as the telephone network, requires an accurate and robust identification or identity-verification system, since the speech signal is distorted during transmission. Taking these requirements into consideration, a robust text-independent speaker identification system is proposed in which speaker features are extracted as linear prediction cepstral coefficients (LPC cepstrum), and Gaussian mixture models, which model the feature distribution and estimate an optimal model for each speaker, are used for identification. The proposed system was evaluated on a database of 80 speakers, each providing a spoken phrase of 3-5 s and digits in Japanese, collected over four months. Evaluation results show that the proposed system achieves a recognition rate of more than 90%.
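
A minimal sketch of GMM-based text-independent speaker identification (feature dimensions and data are stand-ins): fit one Gaussian mixture per enrolled speaker on that speaker's cepstral features, then identify a test utterance by the highest average log-likelihood:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
DIM, N = 12, 500                          # e.g. 12 cepstral coefficients

# stand-in enrollment data: one feature matrix per speaker
enroll = {s: rng.normal(loc=s, size=(N, DIM)) for s in range(3)}
models = {s: GaussianMixture(n_components=8, covariance_type="diag",
                             random_state=0).fit(X)
          for s, X in enroll.items()}

def identify(features):
    """Return the speaker whose GMM gives the highest mean log-likelihood."""
    return max(models, key=lambda s: models[s].score(features))

test = rng.normal(loc=1, size=(200, DIM))  # utterance from speaker 1
print("identified speaker:", identify(test))
```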


IEICE Transactions on Information and Systems | 2008

Recognizing Reverberant Speech Based on Amplitude and Frequency Modulation

Yotaro Kubo; Shigeki Okawa; Akira Kurematsu; Katsuhiko Shirai

We have attempted to recognize reverberant speech using a novel speech recognition system that depends not only on the spectral envelope and amplitude modulation but also on frequency modulation. Most features used by modern speech recognition systems, such as MFCC, PLP, and TRAPS, are derived from the energy envelopes of narrowband signals, discarding the information in the carrier signals. However, experiments show that, apart from the spectral/time envelope and its modulation, the zero-crossing points of the carrier signals also play a significant role in human speech recognition. In realistic environments, a feature that depends on limited properties of the signal may easily be corrupted. To use an automatic speech recognizer in an unknown environment, it is important to draw on other signal properties and combine them so as to minimize the effects of the environment. In this paper, we propose a method for analyzing the carrier signals that most speech recognition systems discard. Our system consists of two nonlinear discriminant analyzers that use multilayer perceptrons. One is HATS, which efficiently captures the amplitude modulation of narrowband signals. The other is a pseudo-instantaneous frequency analyzer, proposed in this paper, which efficiently captures the frequency modulation of narrowband signals. The two analyzers are combined by the entropy-based method introduced by Okawa et al. Sect. 2 introduces pseudo-instantaneous frequencies for capturing a property of the carrier signal; Sect. 3 describes previous AM analysis methods; Sect. 4 describes the proposed system; Sect. 5 presents the experimental setup; and Sect. 6 discusses the results. We evaluate the performance of the proposed method on continuous digit recognition of reverberant speech. The proposed system shows considerable improvement over the MFCC feature-extraction system.
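
The pseudo-instantaneous frequency analyzer operates on carrier information such as zero crossings; the sketch below (framing and band choice are assumptions) approximates a per-frame carrier frequency from the zero-crossing rate of a narrowband signal:

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 16000

def pseudo_if(x, band=(900, 1100), frame=400, hop=160):
    """Per-frame carrier frequency estimate from the zero-crossing rate."""
    sos = butter(4, band, btype="band", fs=FS, output="sos")
    y = sosfilt(sos, x)
    out = []
    for i in range(0, len(y) - frame, hop):
        seg = np.signbit(y[i:i + frame]).astype(np.int8)
        crossings = np.count_nonzero(np.diff(seg))
        out.append(crossings * FS / (2 * frame))  # two crossings per cycle
    return np.array(out)

t = np.arange(FS) / FS
print(pseudo_if(np.sin(2 * np.pi * 1000 * t))[:3])   # ~1000 Hz per frame
```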


International Conference on Acoustics, Speech, and Signal Processing | 2001

Frequency domain multi-channel speech separation and its applications

Masaki Handa; Takayuki Nagai; Akira Kurematsu

A multi-channel speech separation method for real environments is proposed. The method is based on frequency assignment: the magnitudes of the channels in each frequency bin are compared, and the bin is assigned to the channel to which it originally belongs. This method is a direct consequence of a frequency-domain interpretation of the eigendecomposition method proposed by Cao et al. (1997); furthermore, it does not require the eigendecomposition itself, which is computationally expensive. We also present two example applications of the proposed method: voice-controlled computers in a multiuser environment, and noise removal in cellular telephony using two microphones.
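
A minimal sketch of the frequency-assignment step for two channels (window and FFT sizes are assumptions): compare the STFT magnitudes of the channels in each time-frequency bin and assign the bin to the dominant channel, which amounts to a binary mask per channel:

```python
import numpy as np
from scipy.signal import stft, istft

FS = 16000
t = np.arange(FS) / FS
src1 = np.sin(2 * np.pi * 440 * t)       # stand-ins for two sources as
src2 = np.sin(2 * np.pi * 1320 * t)      # observed by two channels
ch1, ch2 = src1 + 0.3 * src2, 0.3 * src1 + src2

_, _, X1 = stft(ch1, fs=FS, nperseg=512)
_, _, X2 = stft(ch2, fs=FS, nperseg=512)
mask = np.abs(X1) >= np.abs(X2)          # does the bin belong to channel 1?
_, y1 = istft(np.where(mask, X1, 0), fs=FS, nperseg=512)
_, y2 = istft(np.where(mask, 0, X2), fs=FS, nperseg=512)
```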


International Conference on Acoustics, Speech, and Signal Processing | 2000

Generalized unequal length lapped orthogonal transform for subband image coding

Takayuki Nagai; Masaaki Ikehara; Masahide Kaneko; Akira Kurematsu

In this paper, generalized linear-phase lapped orthogonal transforms with unequal-length basis functions (GULLOT) are considered. The length of each basis function of the proposed GULLOT can differ from the others, whereas all bases of the conventional GenLOT have equal length. To apply the GULLOT to subband image coding, we also investigate a size-limited structure for processing finite-length signals, which is important in practice.
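
The GULLOT construction itself is beyond a short example, but the equal-length lapped orthogonal transform it generalizes can be sketched compactly: the MDCT with a sine window is a classic lapped orthogonal transform with 50% overlap and perfect reconstruction by time-domain alias cancellation (parameters below are illustrative):

```python
import numpy as np

N = 8                                     # subbands; each basis has length 2N
n, k = np.arange(2 * N), np.arange(N)
C = np.cos(np.pi / N * np.outer(n + 0.5 + N / 2, k + 0.5))   # (2N, N) basis
w = np.sin(np.pi / (2 * N) * (n + 0.5))                      # sine window

def mdct(x):                              # len(x) must be a multiple of N
    x = np.concatenate([np.zeros(N), x, np.zeros(N)])
    blocks = [x[i:i + 2 * N] * w for i in range(0, len(x) - N, N)]
    return np.array([b @ C for b in blocks])

def imdct(X):
    y = np.zeros(N * (len(X) + 1))
    for i, coeffs in enumerate(X):        # windowed overlap-add, 50% hop
        y[i * N:i * N + 2 * N] += (2 / N) * (C @ coeffs) * w
    return y[N:-N]                        # drop the zero padding

x = np.random.default_rng(0).standard_normal(64)
assert np.allclose(imdct(mdct(x)), x)     # perfect reconstruction
```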

Collaboration


Dive into Akira Kurematsu's collaboration network.

Top Co-Authors

- Takayuki Nagai (University of Electro-Communications)
- Masahide Kaneko (University of Electro-Communications)
- Kiyohiro Shikano (Nara Institute of Science and Technology)
- Shigeki Okawa (Chiba Institute of Technology)
- Hector Perez-Meana (Instituto Politécnico Nacional)
- Mariko Nakano-Miyatake (Universidad Autónoma Metropolitana)