Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Toshiyuki Hanazawa is active.

Publication


Featured research published by Toshiyuki Hanazawa.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1989

Phoneme recognition using time-delay neural networks

Alex Waibel; Toshiyuki Hanazawa; Geoffrey E. Hinton; Kiyohiro Shikano; Kevin J. Lang

The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: (1) using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation; and (2) the time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independently of position in time and therefore not blurred by temporal shifts in the input. As a recognition task, the speaker-dependent recognition of the phonemes B, D, and G in varying phonetic contexts was chosen. For comparison, several discrete hidden Markov models (HMM) were trained to perform the same task. Performance evaluation over 1946 testing tokens from three speakers showed that the TDNN achieves a recognition rate of 98.5% correct while the rate obtained by the best of the HMMs was only 93.7%.
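
The time-delay arrangement described here is, in modern terms, a one-dimensional convolution over time: the same small weight window is applied at every shift, so a learned feature does not depend on where it occurs. Below is a minimal numpy sketch of one such layer; the layer sizes, the three-frame delay window, and the sigmoid unit are illustrative choices, not values taken from the paper.

```python
import numpy as np

def tdnn_layer(frames, weights, bias):
    """One time-delay layer: the same weight window is applied at every time
    shift, so a learned feature does not depend on its absolute position.

    frames:  (T, F)     T spectral frames with F coefficients each
    weights: (H, D, F)  H hidden units, each looking at D consecutive frames
    bias:    (H,)
    returns: (T - D + 1, H) activations, one row per window position
    """
    H, D, F = weights.shape
    T = frames.shape[0]
    out = np.empty((T - D + 1, H))
    for t in range(T - D + 1):
        window = frames[t:t + D]                       # D consecutive frames
        z = np.tensordot(weights, window, axes=([1, 2], [0, 1])) + bias
        out[t] = 1.0 / (1.0 + np.exp(-z))              # sigmoid squashing unit
    return out

# Toy usage: 15 frames of 16 spectral coefficients, 8 hidden units, delay window of 3
rng = np.random.default_rng(0)
x = rng.standard_normal((15, 16))
h = tdnn_layer(x, 0.1 * rng.standard_normal((8, 3, 16)), np.zeros(8))
print(h.shape)   # (13, 8)
```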


international conference on acoustics, speech, and signal processing | 1988

Phoneme recognition: neural networks vs. hidden Markov models

Alex Waibel; Toshiyuki Hanazawa; Geoffrey E. Hinton; Kiyohiro Shikano; K. Lang

A time-delay neural network (TDNN) for phoneme recognition is discussed. By the use of two hidden layers in addition to an input and output layer, it is capable of representing complex nonlinear decision surfaces. Three important properties of the TDNN have been observed. First, it was able to invent, without human interference, meaningful linguistic abstractions in time and frequency, such as formant tracking and segmentation. Second, it learned to form alternate representations linking different acoustic events with the same higher-level concept. In this fashion it can implement trading relations between lower-level acoustic events, leading to robust recognition performance despite considerable variability in the input speech. Third, the network is translation-invariant and does not rely on precise alignment or segmentation of the input. The TDNN's performance is compared with the best of the hidden Markov models (HMMs) on a speaker-dependent phoneme-recognition task. The TDNN achieved a recognition rate of 98.5%, compared to 93.7% for the HMM, i.e., a fourfold reduction in error.
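
The "fourfold reduction in error" follows from comparing error rates rather than recognition rates; a quick check with the figures quoted above:

```python
# Error rates implied by the recognition rates quoted in the abstract
tdnn_error = 100.0 - 98.5   # 1.5 %
hmm_error = 100.0 - 93.7    # 6.3 %
print(hmm_error / tdnn_error)   # ~4.2, i.e. roughly a fourfold reduction in error
```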


international conference on acoustics, speech, and signal processing | 1990

ATR HMM-LR continuous speech recognition system

Toshiyuki Hanazawa; Kenji Kita; Satoshi Nakamura; Takeshi Kawabata; Kiyohiro Shikano

An improvement of the hidden Markov model (HMM) LR continuous-speech recognizer using multiple codebooks, HMM state duration control and fuzzy vector quantization is described. The system recognizes Japanese phrases (Bunsetsu) according to a context-free grammar including 1035 words. In speaker-dependent conditions, a phrase recognition rate of 88.4% (99.0% for the top five candidates) was attained. The system was tested with speaker-adaptation based on a codebook mapping algorithm. An average speaker-adapted phrase recognition rate of 81.6% (98.8% for the top five candidates) was attained.
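
A minimal sketch of the fuzzy vector quantization idea mentioned above: each input frame is described by membership weights over several codewords rather than a single VQ index. The codebook size and fuzziness exponent below are illustrative assumptions, not the system's actual settings.

```python
import numpy as np

def fuzzy_vq(frame, codebook, fuzziness=2.0):
    """Fuzzy VQ: return membership weights over all codewords for one frame
    (fuzzy c-means style) instead of a single hard codeword index.

    frame:    (F,)   feature vector
    codebook: (K, F) codeword vectors
    """
    d2 = np.sum((codebook - frame) ** 2, axis=1) + 1e-12   # squared distances
    w = d2 ** (-1.0 / (fuzziness - 1.0))                    # closer codewords weigh more
    return w / w.sum()                                      # memberships sum to 1

rng = np.random.default_rng(1)
codebook = rng.standard_normal((256, 16))        # e.g. a 256-entry codebook
memberships = fuzzy_vq(rng.standard_normal(16), codebook)
print(memberships.sum(), memberships.argmax())
```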


international conference on acoustics, speech, and signal processing | 1990

HMM continuous speech recognition using stochastic language models

Kenji Kita; T. Kawabata; Toshiyuki Hanazawa

Three stochastic language models are investigated for a hidden Markov model (HMM) continuous-speech recognition system. They are the trigram model of Japanese syllables, the stochastic shift/reduce model in LR parsing, and the trigram model of context-free rewriting rules. These stochastic language models are incorporated into the HMM-LR continuous-speech recognition system. The phrase recognition rate is improved from 72.4% to 81.0%. Moreover, for a high-quality HMM-LR speech recognition system which uses separate vector quantization (VQ) and fuzzy VQ, the phrase recognition rate is improved from 88.2% to 93.2%, and a rate of 100% is achieved for the top four choices.
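
As a rough illustration of the trigram language models investigated here, the sketch below estimates P(c | a, b) over symbol sequences (syllables or rewriting rules) with simple add-alpha smoothing; the smoothing scheme and toy data are assumptions for the example, not the paper's estimation method.

```python
from collections import Counter

def train_trigram(sequences, alpha=0.1):
    """Estimate trigram probabilities P(c | a, b) with add-alpha smoothing
    from a list of symbol sequences (syllables, rewriting rules, ...)."""
    tri, bi, vocab = Counter(), Counter(), set()
    for seq in sequences:
        seq = ["<s>", "<s>"] + list(seq) + ["</s>"]
        vocab.update(seq)
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            tri[(a, b, c)] += 1
            bi[(a, b)] += 1
    V = len(vocab)

    def prob(a, b, c):
        return (tri[(a, b, c)] + alpha) / (bi[(a, b)] + alpha * V)

    return prob

# Toy usage with a few "syllable" strings
prob = train_trigram([list("katana"), list("kana"), list("tanaka")])
print(prob("k", "a", "n"))   # ~0.31 on this toy data
```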


Journal of the Acoustical Society of America | 1988

Word spotting method based on HMM phoneme recognition

Takeshi Kawabata; Toshiyuki Hanazawa; Kiyohiro Shikano

A new technique for detecting and locating key words in continuous speech using hidden Markov model (HMM) phoneme recognition is proposed. HMM word models are composed of HMM phone models trained on an isolated word database. Because the speaking rates for isolated words and continuous speech are different, phoneme spectra and durations change considerably. An HMM consists of several states and arcs. Each arc has output probabilities for each VQ code. In order to cope with the spectral changes, the output probabilities are smoothed with the probabilities of their spectral neighbor codes. In order to cope with the duration changes, HMM state duration parameters are shifted according to a second‐order duration calibration curve. The calibration curve is obtained from a speaking rate ratio of continuous speech to isolated words. The word detection rate for 8 key words in 25 sentences uttered by one speaker was 98.4%. Accurate word spotting is accomplished using the HMM output probability smoothing technique...
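
A minimal sketch of the output-probability smoothing idea: each VQ code's probability is blended with the probabilities of its nearest spectral neighbor codes. The neighbor count and distance kernel below are illustrative assumptions, not the paper's exact smoothing rule.

```python
import numpy as np

def smooth_output_probs(b, codebook, n_neighbors=4, tau=1.0):
    """Blend each VQ code's output probability with the probabilities of its
    nearest spectral neighbor codes, then renormalize.

    b:        (K,)   output probabilities of one HMM arc
    codebook: (K, F) codeword spectra, used to find neighboring codes
    """
    d = np.linalg.norm(codebook[:, None, :] - codebook[None, :, :], axis=-1)
    smoothed = np.empty_like(b)
    for k in range(len(b)):
        nb = np.argsort(d[k])[:n_neighbors + 1]    # code k plus its neighbors
        w = np.exp(-d[k, nb] / tau)                # closer codes weigh more
        smoothed[k] = np.dot(w, b[nb]) / w.sum()
    return smoothed / smoothed.sum()

rng = np.random.default_rng(2)
codebook = rng.standard_normal((64, 16))
b = rng.dirichlet(np.ones(64))                     # one arc's output distribution
print(smooth_output_probs(b, codebook).sum())      # 1.0
```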


complex, intelligent and software intensive systems | 2018

Reducing Computational Complexity of Multichannel Nonnegative Matrix Factorization Using Initial Value Setting for Speech Recognition

Taiki Izumi; Ryo Aihara; Toshiyuki Hanazawa; Yohei Okato; Takanobu Uramoto; Shingo Uenohara; Ken’ichi Furuya

In this paper, we propose a method that reduces the number of computational iterations of multichannel nonnegative matrix factorization (MNMF) for speech recognition. The proposed method initializes the MNMF algorithm with an estimated spatial correlation matrix, which reduces the number of iterations of the update algorithm. Experimental results show that our method reduces the computational complexity of MNMF.
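
The initialization idea can be sketched as follows: compute a time-averaged spatial correlation matrix per frequency bin from the observed multichannel STFT and use it as the starting point of MNMF's spatial model instead of a random start. This shows only the initial value setting, not the MNMF update rules, and the array shapes are illustrative.

```python
import numpy as np

def initial_spatial_correlations(X):
    """Time-averaged spatial correlation matrix per frequency bin, for use as
    the initial value of MNMF's spatial model instead of a random start.

    X: (F, T, M) complex STFT (F frequency bins, T frames, M microphones)
    returns: (F, M, M) Hermitian correlation matrices
    """
    F, T, M = X.shape
    R = np.einsum('ftm,ftn->fmn', X, X.conj()) / T   # average of x x^H over frames
    R += 1e-6 * np.eye(M)                            # keep matrices well conditioned
    return R

rng = np.random.default_rng(3)
X = rng.standard_normal((257, 100, 2)) + 1j * rng.standard_normal((257, 100, 2))
R0 = initial_spatial_correlations(X)
print(R0.shape)   # (257, 2, 2)
```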


New Era for Robust Speech Recognition, Exploiting Deep Learning | 2017

Advanced ASR Technologies for Mitsubishi Electric Speech Applications.

Yuuki Tachioka; Toshiyuki Hanazawa; Tomohiro Narita; Jun Ishii

Mitsubishi Electric Corporation has been developing speech applications for 20 years. Our main targets are car navigation systems, elevator-controlling systems, and other industrial devices. This chapter deals with automatic speech recognition technologies which were developed for these applications. To realize real-time processing with small resources, syllable N-gram-based text search is proposed. To deal with reverberant environments in elevators, spectral-subtraction-based dereverberation techniques with reverberation time estimation are used. In addition, discriminative methods for acoustic and language models are developed.
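
A generic sketch of spectral-subtraction-based dereverberation of the kind mentioned above: late reverberation at each frame is modeled as an exponentially decayed copy of an earlier frame, with the decay derived from an estimated reverberation time, and is subtracted with a spectral floor. The delay, floor, and decay model are illustrative assumptions, not Mitsubishi Electric's implementation.

```python
import numpy as np

def suppress_late_reverb(power, rt60, frame_shift, delay=4, floor=0.05):
    """Subtract an estimated late-reverberation power spectrum: the late
    reverb at frame t is modeled as an exponentially decayed copy of the
    spectrum `delay` frames earlier.

    power:       (T, F) observed power spectrogram
    rt60:        estimated reverberation time in seconds
    frame_shift: frame shift in seconds
    """
    # Energy decay over `delay` frames implied by the estimated RT60
    decay = 10.0 ** (-6.0 * delay * frame_shift / rt60)
    late = np.zeros_like(power)
    late[delay:] = decay * power[:-delay]
    return np.maximum(power - late, floor * power)   # subtraction with a spectral floor

rng = np.random.default_rng(4)
spec = rng.random((200, 257))                        # 200 frames, 257 frequency bins
print(suppress_late_reverb(spec, rt60=0.6, frame_shift=0.01).shape)
```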


Journal of the Acoustical Society of America | 1996

Evaluation of a phrase‐spotting method incorporated with prosodic information in spontaneous speech

Toshiyuki Hanazawa; Yoshiharu Abe; Kunio Nakajima

Spotting phrases in continuous speech is a difficult task, especially in spontaneous speech. To cope with this problem, a phrase‐spotting method incorporating prosodic information has been proposed. In this paper, evaluation results of the method are described when it is applied to spontaneous speech. In this method, the prosodic likelihood of the phrase boundaries is calculated statistically based on a pitch pattern HMM network and integrated into the spotting score of the phrase. The pitch pattern HMM network models the pitch contour of sentences by connecting pitch pattern HMMs, which model the pitch contours of phrases. The use of the prosodic likelihood is expected to reduce false alarms. The method was evaluated using the ATR spontaneous speech database. As for the pitch pattern HMMs, two types of HMMs with different lengths were used for long‐ and short‐duration phrases and filled pauses. To construct an accurate pitch pattern HMM network, a bigram model was applied to transition the pr...
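
A minimal sketch of how a prosodic likelihood can be integrated with a spotting score, as a simple log-linear combination; the weight and the example numbers are illustrative assumptions, not values from the paper.

```python
def phrase_score(acoustic_loglik, prosodic_loglik, weight=0.3):
    """Combine a phrase hypothesis's spotting score with the prosodic
    log-likelihood of its boundaries (simple log-linear combination)."""
    return acoustic_loglik + weight * prosodic_loglik

# Two competing hypotheses: the prosodic term can demote a false alarm whose
# boundaries do not match a plausible pitch pattern.
print(phrase_score(-120.0, -5.0))    # boundaries agree with the pitch pattern model
print(phrase_score(-118.0, -40.0))   # acoustically better, prosodically implausible
```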


International Journal of Pattern Recognition and Artificial Intelligence | 1994

Dictation machine based on Japanese character source modeling

Kiyohiro Shikano; Tomokazu Yamada; Takeshi Kawabata; Shoichi Matsunaga; Sadaoki Furui; Toshiyuki Hanazawa

This paper describes a phonetic typewriter and a dictation machine that utilize the underlying statistical structure of phoneme or character sequences. The approach of using syllable or character trigrams is applied to language source modeling. The language source models are obtained by calculating trigram probabilities from a large text database. These models are combined with the HMM-LR continuous speech recognition system [3, 6]. The phonetic typewriter is tested using 274 phrases uttered by one male speaker. The syllable source model achieves a 94.9% phoneme recognition rate with a test-set phoneme perplexity of 3.9. Without the syllable source model, the phoneme recognition rate is only 73.2%. A trigram model based on characters is also evaluated. This character source model can reduce the syllable perplexity significantly to 7.7, compared with 10.5 for the syllable source model. The character source model achieves a 78.5% character transcription rate for the 274 phrase utterances. The experimental results show that a syllable source model and a character source model are very effective for realizing a Japanese dictation machine.
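
The perplexity figures quoted above relate to average per-symbol probability as PP = exp(-mean log P); a small sketch, where the 0.13 average probability is chosen only to reproduce a value near 7.7:

```python
import numpy as np

def perplexity(log_probs):
    """Test-set perplexity from per-symbol log probabilities:
    PP = exp(-mean(log P(symbol | history)))."""
    return float(np.exp(-np.mean(log_probs)))

# An average per-character probability of about 0.13 corresponds to a
# perplexity near 7.7 (the value quoted for the character source model).
print(perplexity(np.log([0.13] * 100)))   # ~7.69
```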


Journal of the Acoustical Society of America | 1988

Duration control methods for HMM phoneme recognition

Toshiyuki Hanazawa; Takeshi Kawabata; Kiyohiro Shikano

Two kinds of duration control for HMM (hidden Markov model) phoneme recognition are proposed: phoneme duration control for an HMM phone model and event duration control for an HMM state. The phoneme duration control is carried out by combining an HMM output probability with a phoneme duration probability. The phoneme duration probability is calculated using a phoneme duration histogram obtained from training samples. Phoneme duration control is effective in discriminating phonemes with different durations such as /n/ and /N/. Event duration control is realized as a state duration penalty calculated from an HMM state duration probability of training samples. Event duration control is effective in discriminating phonemes with different event structures such as /s/ and /ts/. Recognition experiments are carried out using Japanese phonemes extracted from an isolated word database uttered by one male speaker. The phoneme recognition rate is improved from 84.8% to 90.0% using these duration control techniques.
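
A minimal sketch of the phoneme duration control idea: the HMM output log-likelihood is combined with a duration log-probability read from a histogram of training durations. The toy histograms for /n/ and /N/ and the weight are illustrative assumptions.

```python
import numpy as np

def duration_controlled_score(hmm_loglik, duration_frames, duration_hist, weight=1.0):
    """Combine an HMM output log-likelihood with a phoneme duration
    log-probability read from a histogram of training durations.

    duration_hist: (max_frames + 1,) relative frequencies of durations
    """
    p = duration_hist[min(duration_frames, len(duration_hist) - 1)]
    return hmm_loglik + weight * np.log(p + 1e-12)

# Toy histograms: /N/ tends to be longer than /n/, so a long segment is
# penalized far less under the /N/ model than under the /n/ model.
hist_n = np.array([0.0, 0.3, 0.4, 0.2, 0.1, 0.0, 0.0])   # short phoneme /n/
hist_N = np.array([0.0, 0.0, 0.1, 0.2, 0.3, 0.3, 0.1])   # long phoneme /N/
print(duration_controlled_score(-50.0, 5, hist_n))        # heavily penalized
print(duration_controlled_score(-50.0, 5, hist_N))        # mildly penalized
```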

Collaboration


Dive into Toshiyuki Hanazawa's collaboration.

Top Co-Authors

Kiyohiro Shikano

Nara Institute of Science and Technology

Kenji Kita

University of Tokushima

Alex Waibel

Karlsruhe Institute of Technology

Satoshi Nakamura

Nara Institute of Science and Technology
