Publication


Featured research published by Tetsuo Kosaka.


international conference on acoustics, speech, and signal processing | 1994

Tree-structured speaker clustering for fast speaker adaptation

Tetsuo Kosaka; Shigeki Sagayama

The paper proposes a tree-structured speaker clustering algorithm and discusses its application to fast speaker adaptation. By tracing the clustering tree from top to bottom, adaptation is performed step-by-step from global to local individuality of speech. This adaptation method employs successive branch selection in the speaker clustering tree rather than parameter training and hence achieves fast adaptation using only a small amount of training data. This speaker adaptation method was applied to a hidden Markov network (HMnet) and evaluated in Japanese phoneme and phrase recognition experiments, in which it significantly outperformed speaker-independent recognition methods. In the phrase recognition experiments, the method reduced the error rate by 26.6% using three phrase utterances (approximately 2.7 seconds).
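The top-down branch selection described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `ClusterNode` class, the single one-dimensional Gaussian per cluster (standing in for a full HMnet), and the rule of stopping when no child scores better than the current node are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ClusterNode:
    """Hypothetical speaker cluster: one 1-D Gaussian stands in for a full model."""
    name: str
    mean: float
    var: float
    children: list = field(default_factory=list)


def log_likelihood(node, frames):
    """Log-likelihood of the adaptation frames under the node's Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * node.var) + (x - node.mean) ** 2 / node.var)
        for x in frames
    )


def select_cluster(root, frames):
    """Descend from the root, picking the best-scoring child at each level.

    No parameters are re-trained; adaptation is only a sequence of branch
    choices. Descent stops at a leaf, or (an assumption of this sketch) when
    no child scores better than the current node.
    """
    node, path = root, [root.name]
    while node.children:
        best = max(node.children, key=lambda c: log_likelihood(c, frames))
        if log_likelihood(best, frames) <= log_likelihood(node, frames):
            break
        node = best
        path.append(node.name)
    return node, path
```

Because each step only scores a handful of existing models, the cost grows with the depth of the tree rather than with the number of model parameters, which is consistent with the abstract's point that selection needs far less data than training.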


Journal of the Acoustical Society of America | 2000

Speech recognition method and apparatus for recognizing phonemes using a plurality of speech analyzing and recognizing methods for each kind of phoneme

Tetsuo Kosaka

A method and apparatus for recognizing speech employing a word dictionary in which the phonemes of words are stored, and for recognizing speech based on the recognition of the phonemes. The method and apparatus recognize phonemes and produce data associated with each phoneme according to different speech analyzing and recognizing methods for each kind of phoneme, normalize the produced data, and match the recognized phonemes with words in the word dictionary by means of dynamic programming based on the normalized data.
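As a rough illustration of the final matching step only, the sketch below aligns a recognized phoneme sequence against dictionary entries with dynamic programming. It assumes unit insertion, deletion, and substitution costs (plain edit distance); the patent's normalized per-phoneme data are not modeled, and the function names and the toy dictionary in the usage note are hypothetical.

```python
def edit_distance(recognized, reference):
    """Dynamic-programming alignment cost between two phoneme sequences.

    Unit costs are an assumption of this sketch; a real system would derive
    costs from the normalized recognition scores.
    """
    rows, cols = len(recognized) + 1, len(reference) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i  # delete every recognized phoneme so far
    for j in range(cols):
        cost[0][j] = j  # insert every reference phoneme so far
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if recognized[i - 1] == reference[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j] + 1,                 # deletion
                cost[i][j - 1] + 1,                 # insertion
                cost[i - 1][j - 1] + substitution,  # match or substitution
            )
    return cost[-1][-1]


def best_word(recognized, dictionary):
    """Return the dictionary word whose phoneme string aligns most cheaply."""
    return min(dictionary, key=lambda word: edit_distance(recognized, dictionary[word]))
```

For example, with a dictionary holding `"sakura"` as `["s", "a", "k", "u", "r", "a"]` and `"sakana"` as `["s", "a", "k", "a", "n", "a"]`, the recognized sequence `["s", "a", "k", "u", "l", "a"]` is one substitution away from the first entry and two from the second, so `best_word` selects `"sakura"`.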


international conference on acoustics, speech, and signal processing | 1995

Speaker adaptation based on transfer vector field smoothing using maximum a posteriori probability estimation

Masahiro Tonomura; Tetsuo Kosaka; Shoichi Matsunaga

The paper proposes a novel speech adaptation algorithm that enables adaptation even with a small amount of speech data. This is a unified algorithm of two efficient conventional speaker adaptation techniques, which are maximum a posteriori (MAP) estimation and transfer vector field smoothing (VFS). This algorithm is designed to avoid the weaknesses of both MAP and VFS. A higher phoneme recognition performance was obtained by using this algorithm than with individual methods, showing the superiority of the proposed algorithm. The phoneme recognition error rate was reduced from 22.0% to 19.1% using this algorithm for a speaker-independent model with seven adaptation phrases. Furthermore, a priori knowledge concerning speaker characteristics was obtained for this algorithm by generating an initial HMM with the speech of a selected speaker cluster based on speaker similarity. The adaptation using this initial model reduced the phoneme recognition error rate from 22.0% to 17.7%.
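The MAP half of this combination can be sketched for the simplest case, a single Gaussian mean with a conjugate prior. This is the textbook estimate, not the paper's unified algorithm: the VFS smoothing step is not shown, and the function name and the prior weight `tau` are assumptions made for illustration.

```python
def map_mean(prior_mean, tau, samples):
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior.

    prior_mean: mean of the speaker-independent model (the prior).
    tau:        assumed prior weight, in units of equivalent observations.
    samples:    adaptation observations assigned to this Gaussian.

    With no samples the prior mean is returned unchanged; as the number of
    samples grows, the estimate moves toward the sample mean.
    """
    n = len(samples)
    return (tau * prior_mean + sum(samples)) / (tau + n)
```

The interpolation explains why MAP alone adapts slowly with little data: Gaussians that receive no adaptation frames keep their prior means, which is the gap that a smoothing step over neighboring models is meant to fill.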


Computer Speech & Language | 1996

Speaker-independent speech recognition based on tree-structured speaker clustering

Tetsuo Kosaka; Shoichi Matsunaga; Shigeki Sagayama

We have already proposed the application of tree-structured speaker clustering to supervised speaker adaptation. This paper proposes its application to unsupervised speaker adaptation and speaker-independent (SI) speech recognition. This clustering involves the selection of a speaker cluster from among multiple reference speaker clusters arranged in a tree structure. Cluster selection, unlike parameter training, enables quick adaptation using only a small amount of training data. This method was applied to a hidden Markov network (HMnet) and evaluated in Japanese phoneme and phrase recognition experiments. Results show effective unsupervised speaker adaptation using only 5 s of calibration speech. In the SI speech recognition experiments, the method reduced the error rate by 8.5% compared with the conventional speaker-independent speech recognition method.


international conference on acoustics, speech, and signal processing | 1993

Rapid speaker adaptation using speaker-mixture allophone models applied to speaker-independent speech recognition

Tetsuo Kosaka; Jun-ichi Takami; Shigeki Sagayama

A speaker mixture principle that allows the creation of speaker-independent phone models is proposed. Speaker-tied training for rapid speaker adaptation using utterances shorter than one second is derived from this principle. The concept of speaker pruning is also introduced for reducing computational cost without degrading the speaker adaptation performance. The above principle is combined with context-dependent phone models, which have been automatically generated by the successive state splitting algorithm. In a Japanese phrase recognition experiment, speaker mixture allophone models achieved an error reduction of 29.0% relative to the conventional speaker-independent HMM (hidden Markov model)-LR method. Speaker adaptation by speaker-tied training attained an error reduction of 16.8% using a 0.6-s Japanese word utterance. Speaker pruning reduced the number of phone model mixtures by between 50% and 92% without lowering recognition performance.


Journal of the Acoustical Society of America | 1994

Encoding method for syllables

Atsushi Sakurai; Junichi Tamura; Tetsuo Kosaka

A method for encoding syllables of a language, particularly the Japanese language, and for facilitating the extraction of sound codes from the input syllables, for voice recognition or voice synthesis includes the step of providing a syllable classifying table, in which each syllable is represented by an upper byte code indicating the consonant part of the syllable and a lower byte code indicating the non-consonant part of the syllable. The consonants constitute a first category of data classified by phonetic features, while the non-consonants constitute a second category of data classified by phonetic features, so that the extraction of consonant or non-consonant sounds can be made by a search in only the first or the second categories. The encoding of diphthongs is made in such a manner that those containing the same vowel have the same remainder corresponding to the code of this vowel, when the codes are divided by the number of vowels contained in the second category, so that the extraction of a vowel from diphthongs can be achieved by a simple mathematical division.
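The two-byte layout and the remainder trick can be sketched as follows. The code tables here are hypothetical and far smaller than a real syllable table; the only properties taken from the abstract are that the upper byte holds the consonant code, the lower byte holds the non-consonant code, and a diphthong's code leaves the same remainder as its vowel's code when divided by the number of vowels.

```python
# Hypothetical code tables, for illustration only.
VOWELS = ["a", "i", "u", "e", "o"]           # vowel codes 0..4
NON_CONSONANT = {
    "a": 0, "i": 1, "u": 2, "e": 3, "o": 4,  # plain vowels
    "ya": 5, "yu": 7, "yo": 9,               # diphthongs: code % 5 gives the vowel code
}
CONSONANT = {"": 0, "k": 1, "s": 2, "t": 3, "n": 4}


def encode(consonant, tail):
    """Pack the consonant code into the upper byte and the tail into the lower byte."""
    return (CONSONANT[consonant] << 8) | NON_CONSONANT[tail]


def consonant_code(code):
    """Extract the consonant part by looking only at the upper byte."""
    return code >> 8


def vowel_of(code):
    """Recover the vowel from a plain vowel or diphthong with one division."""
    return VOWELS[(code & 0xFF) % len(VOWELS)]
```

With these tables, `encode("k", "yo")` yields `0x0109`, and `vowel_of` maps it to `"o"` because 9 leaves remainder 4 when divided by 5, with no table lookup over the diphthongs.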


Journal of the Acoustical Society of America | 1998

Method and apparatus for processing speech

Junichi Tamura; Atsushi Sakurai; Tetsuo Kosaka

The speech processing apparatus and method includes a microphone, an analyzer, a selector, and a memory. The microphone converts input speech into an electrical signal representing speech data. The analyzer converts the speech data into non-linear frequency converted speech data in accordance with a non-linear frequency conversion. The selector selects a coefficient of the non-linear frequency conversion suitable for each of the phonemes or frames of the speech. The memory stores the speech data.


international conference on acoustics, speech, and signal processing | 1997

Fast speech recognition algorithm under noisy environment using modified CMS-PMC and improved IDMM+SQ

Hiroki Yamamoto; Tetsuo Kosaka; Masayuki Yamada; Yasuhiro Komori; Minoru Fujita

We describe a fast speech recognition algorithm for noisy environments. Accurate and fast speech recognition in a noisy environment requires a very fast recognition algorithm together with a model that is well adapted to the noise. First, for the model adaptation, we propose the MCMS-PMC: an integration of parallel model combination (PMC) and modified cepstral mean subtraction (MCMS), which estimates the cepstrum mean by taking account of the additive noise. Then, for the fast speech recognition, we propose new techniques to create the noise-adapted scalar quantized codebook in order to introduce the MCMS-PMC into the IDMM+SQ, which we proposed previously as a fast speech recognition algorithm using the scalar quantization approach. Finally, the effect of the proposed method is shown through a speaker-independent, telephone-bandwidth continuous speech recognition experiment.


international conference on acoustics, speech, and signal processing | 1995

Speaker-independent phone modeling based on speaker-dependent HMMs' composition and clustering

Tetsuo Kosaka; Shoichi Matsunaga; Mikio Kuraoka

This paper proposes a novel method for speaker-independent phone modeling based on the composition and clustering method (CCL) of speaker-dependent HMMs. In general, HMM phone models are trained by the Baum-Welch (B-W) algorithm. We, however, propose a speaker-independent phone modeling method in which speaker-dependent (SD) HMMs are combined to form speaker-independent (SI) HMMs without parameter reestimation. Furthermore, by using this method, we investigate how different kinds of reference speakers influence the development of the SI models. The method is evaluated in Japanese phoneme and phrase recognition experiments. Results show that the performance of this method is similar to that of the conventional B-W algorithm, with a great reduction in computational cost.


international conference on acoustics, speech, and signal processing | 1998

Instantaneous environment adaptation techniques based on fast PMC and MAP-CMS methods

Tetsuo Kosaka; Hiroki Yamamoto; Masayuki Yamada; Yasuhiro Komori

This paper proposes instantaneous environment adaptation techniques for both additive noise and channel distortion based on the fast PMC (FPMC) and the MAP-CMS methods. The instantaneous adaptation techniques enable a recognizer to improve recognition on a single sentence that is used for the adaptation in real time. The key innovations enabling the system to achieve the instantaneous adaptation are: (1) a cepstral mean subtraction method based on maximum a posteriori estimation (MAP-CMS), (2) real-time implementation of the fast parallel model combination (PMC) that we proposed previously, (3) utilization of multi-pass search, and (4) a new combination method of MAP-CMS and FPMC to solve the problem of both channel distortion and additive noise. Experimental results showed that the proposed methods enabled the system to perform recognition and adaptation simultaneously in nearly real time and obtained good improvements in performance.
