Kazunaga Yoshida
NEC
Publication
Featured research published by Kazunaga Yoshida.
International Conference on Acoustics, Speech, and Signal Processing | 1989
Hiroaki Sakoe; Ryosuke Isotani; Kazunaga Yoshida; Ken-ichi Iso; Takao Watanabe
A description is given of speaker-independent word recognition based on a new neural network model called the dynamic programming neural network (DNN), which can treat time-sequence patterns. DNN is based on the integration of a multilayer neural network and dynamic-programming-based matching. Speaker-independent isolated Japanese digit recognition experiments were carried out using data uttered by 107 speakers (50 speakers for training and 57 speakers for testing). The recognition accuracy was 99.3%, suggesting that the model can be effective for speech recognition.
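A minimal sketch of the idea, assuming a small per-frame multilayer network whose outputs are accumulated along a dynamic-programming alignment path; the network sizes, feature dimension, and stay-or-advance step pattern are illustrative assumptions, not details from the paper.

# Sketch: DP alignment over per-frame neural-network scores, in the spirit of
# the DNN described above.  All sizes and weights are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM = 12        # assumed feature vector size (e.g. cepstral coefficients)
REF_FRAMES = 20      # assumed length of the word reference pattern
HIDDEN = 16

# One small multilayer network per reference frame position: it maps an input
# feature vector to a similarity score for that position.
W1 = rng.standard_normal((REF_FRAMES, HIDDEN, FEAT_DIM)) * 0.1
b1 = np.zeros((REF_FRAMES, HIDDEN))
W2 = rng.standard_normal((REF_FRAMES, HIDDEN)) * 0.1
b2 = np.zeros(REF_FRAMES)

def frame_scores(x):
    """Similarity of one input frame x to every reference frame position."""
    h = np.tanh(W1 @ x + b1)                  # (REF_FRAMES, HIDDEN)
    return np.einsum('rh,rh->r', W2, h) + b2  # (REF_FRAMES,)

def dp_align(features):
    """Accumulate frame scores along the best monotonic alignment path
    (stay on or advance by one reference frame per input frame)."""
    T = len(features)
    score = np.full((T, REF_FRAMES), -np.inf)
    score[0, 0] = frame_scores(features[0])[0]
    for t in range(1, T):
        s = frame_scores(features[t])
        for j in range(REF_FRAMES):
            prev = score[t - 1, j]
            if j > 0:
                prev = max(prev, score[t - 1, j - 1])
            score[t, j] = prev + s[j]
    return score[-1, -1]   # best path must end at the last reference frame

# Example: score a random 30-frame utterance against this one word model.
utterance = rng.standard_normal((30, FEAT_DIM))
print("DP-accumulated network score:", dp_align(utterance))

In a recognizer, one such model per vocabulary word would be scored and the word with the highest accumulated score selected.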
Journal of the Acoustical Society of America | 1994
Kazunaga Yoshida; Takao Watanabe
A speech recognition apparatus of the speaker adaptation type operates to recognize an input speech pattern produced by a particular speaker by using a reference pattern produced from the voice of a standard speaker. The apparatus is adapted to the speech of the particular speaker by converting the reference pattern into a normalized pattern with a neural network unit whose internal parameters are modified through a learning operation. The learning uses a normalized feature vector of the training pattern, produced by the voice of the particular speaker and normalized on the basis of the reference pattern, so that the neural network unit provides an output close to the corresponding normalized feature vector of the training pattern. Alternatively, the apparatus recognizes an input speech pattern by converting it into a normalized speech pattern with the neural network unit, whose internal parameters are modified through a learning operation using a feature vector of the reference pattern normalized on the basis of the training pattern, so that the neural network unit provides an output close to the corresponding normalized feature vector of the reference pattern; the normalized speech pattern is then recognized against the reference pattern.
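A hedged sketch of the mapping idea, assuming the standard-speaker reference vectors and the target speaker's training vectors are already time-aligned (for example by DTW); the network size, learning rate, and toy data are placeholders, not values from the patent.

# Sketch: train a small network so standard-speaker reference vectors map toward
# the aligned vectors of the target speaker, then use the converted reference
# patterns for recognition.  Sizes, rates, and data are assumptions.
import numpy as np

rng = np.random.default_rng(1)
FEAT_DIM, HIDDEN = 12, 24

# Paired, time-aligned vectors: reference (standard speaker) -> target speaker.
ref_vecs = rng.standard_normal((200, FEAT_DIM))
spk_vecs = ref_vecs @ rng.standard_normal((FEAT_DIM, FEAT_DIM)) * 0.3 + 0.1  # toy target

W1 = rng.standard_normal((FEAT_DIM, HIDDEN)) * 0.1
W2 = rng.standard_normal((HIDDEN, FEAT_DIM)) * 0.1

def forward(x):
    h = np.tanh(x @ W1)
    return h, h @ W2

# Modify the internal parameters by plain gradient descent on the mapping error.
lr = 0.01
for epoch in range(200):
    h, out = forward(ref_vecs)
    err = out - spk_vecs
    gW2 = h.T @ err / len(ref_vecs)
    gh = err @ W2.T * (1.0 - h ** 2)
    gW1 = ref_vecs.T @ gh / len(ref_vecs)
    W2 -= lr * gW2
    W1 -= lr * gW1

# At recognition time, convert every reference-pattern frame before matching.
adapted_reference = forward(ref_vecs)[1]
print("mean squared adaptation error:",
      float(np.mean((adapted_reference - spk_vecs) ** 2)))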
International Conference on Acoustics, Speech, and Signal Processing | 1989
Kazunaga Yoshida; Takao Watanabe
The authors present a large-vocabulary speech recognition method based on hidden Markov models (HMMs) and aimed at high recognition performance with a small amount of training data. The recognition model is designed to treat contextual and allophonic variations utilizing acoustic-phonetic knowledge. The demisyllable is used as a recognition unit to treat contextual variations caused by the coarticulation effect. A single Gaussian probability density function is used as the HMM output probability, and allophonic units are defined to deal with greater allophonic variations, such as vowel devoicing. In an experiment, demisyllable models were trained using a 250 training word set, and 99.0% and 97.5% recognition rates were obtained for 500-word and 1800-word vocabularies, respectively. The result demonstrates the effectiveness of the method.
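A minimal sketch of the scoring ingredient named here: a left-to-right HMM whose per-state output probability is a single Gaussian, evaluated with Viterbi. The state count, feature dimension, diagonal covariance, and parameter values are assumptions for illustration.

# Sketch: Viterbi log-likelihood of a feature sequence under a left-to-right HMM
# with one diagonal-covariance Gaussian per state (e.g. one demisyllable unit).
import numpy as np

rng = np.random.default_rng(2)
FEAT_DIM, N_STATES = 12, 3

means = rng.standard_normal((N_STATES, FEAT_DIM))
variances = np.ones((N_STATES, FEAT_DIM))
# Left-to-right transitions: stay in the state or move to the next one.
trans = np.array([
    [0.6, 0.4, 0.0],
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
])
with np.errstate(divide="ignore"):
    log_trans = np.log(trans)

def log_gauss(x, state):
    """Log-density of the single Gaussian attached to a state."""
    d = x - means[state]
    return -0.5 * np.sum(d * d / variances[state]
                         + np.log(2.0 * np.pi * variances[state]))

def viterbi_score(features):
    T = len(features)
    delta = np.full((T, N_STATES), -np.inf)
    delta[0, 0] = log_gauss(features[0], 0)       # must start in the first state
    for t in range(1, T):
        for j in range(N_STATES):
            best = np.max(delta[t - 1] + log_trans[:, j])
            delta[t, j] = best + log_gauss(features[t], j)
    return delta[-1, -1]                           # must end in the last state

obs = rng.standard_normal((15, FEAT_DIM))
print("Viterbi log-likelihood:", viterbi_score(obs))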
Systems and Computers in Japan | 1989
Hiroaki Sakoe; Hiromi Fujii; Kazunaga Yoshida; Masao Watari
This paper discusses high-speed DP-matching as a speech recognition algorithm, including connected word sequence recognition. The first improvement is frame synchronization; it speeds up matching by roughly one order of magnitude compared with the connected word recognition of the two-level DP-matching type, in which DP-matching is iterated under the assumption that any time in the input speech can be a word boundary. The second improvement is the introduction of beam search. The paper discusses the practical aspects of combining beam search with DP-matching, including the construction of the work area and the control of the DP recursive expression, with the aim of effectively reducing the computational cost of the recursion. The third improvement is built-in vector quantization: an effective reduction of the computational cost of the local distance is obtained by integrating the beam search with vector quantization. An evaluation experiment on isolated words indicates a possible speed improvement by a factor of 30, which corresponds to two or more orders of magnitude compared with two-level DP-matching for connected word sequence recognition.
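A rough sketch of two of these ideas, frame-synchronous DP-matching over several word templates with beam pruning; the templates, beam width, and the simple stay-or-advance step pattern are assumptions, and the built-in vector quantization is omitted.

# Sketch: frame-synchronous DP over all word templates with beam pruning.
import numpy as np

rng = np.random.default_rng(3)
FEAT_DIM, BEAM = 12, 8.0

templates = {name: rng.standard_normal((n, FEAT_DIM))
             for name, n in [("zero", 18), ("one", 14), ("two", 16)]}
utterance = templates["one"] + 0.1 * rng.standard_normal((14, FEAT_DIM))

# One DP column (accumulated distance per template frame) for each template,
# updated frame-synchronously as every input frame arrives.
cols = {name: np.full(len(t), np.inf) for name, t in templates.items()}

for t_idx, x in enumerate(utterance):
    best_so_far = np.inf
    for name, tmpl in templates.items():
        dist = np.linalg.norm(tmpl - x, axis=1)                # local distances
        if t_idx == 0:
            new = np.full(len(tmpl), np.inf)
            new[0] = dist[0]                                   # all paths start at (0, 0)
        else:
            col = cols[name]
            shifted = np.concatenate(([np.inf], col[:-1]))     # advance one template frame
            new = np.minimum(col, shifted) + dist              # stay or advance
        cols[name] = new
        best_so_far = min(best_so_far, float(np.min(new)))
    # Beam pruning: deactivate grid points that fall too far behind the best hypothesis.
    for name in cols:
        cols[name][cols[name] > best_so_far + BEAM] = np.inf

scores = {name: float(col[-1]) for name, col in cols.items()}
print("accumulated distances:", scores)
print("recognized word:", min(scores, key=scores.get))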
Journal of the Acoustical Society of America | 1993
Kazunaga Yoshida
A continuous speech recognition unit that uses forward probabilities to recognize continuous speech against standard patterns for given units of recognition comprises: a standard template memory for storing Markov-model standard templates of standard speech, composed of state sequences and transition probabilities between the states; an observation probability computing device for computing a forward probability for a feature vector time sequence; and a cumulative value computing device for determining a cumulative value based on the sum of previous cumulative values. The unit further comprises a matching pass memory for storing the maximum values produced by the cumulative value computing device and a result processor for determining recognition results indicative of recognized words. For each state, the unit stores in memory the transition giving the best probability and traces back the recognition result for the word sequence based on the stored transitions.
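A minimal sketch of this structure: cumulative log values are summed over predecessor states frame by frame (a forward-probability style update), while the best predecessor is stored per state so the recognized sequence can be traced back. The toy model (a single left-to-right chain with unit-variance Gaussian outputs) is an assumption for illustration only.

# Sketch: forward-style accumulation with a matching-pass memory and traceback.
import numpy as np

rng = np.random.default_rng(4)
FEAT_DIM = 12
N_STATES = 6                       # e.g. two 3-state word models concatenated
means = rng.standard_normal((N_STATES, FEAT_DIM))

# Left-to-right chain: self-loop or move to the next state.
trans = np.zeros((N_STATES, N_STATES))
for i in range(N_STATES - 1):
    trans[i, i] = 0.5
    trans[i, i + 1] = 0.5
trans[-1, -1] = 1.0
with np.errstate(divide="ignore"):
    log_trans = np.log(trans)

def log_output(x, j):
    """Unit-variance Gaussian log-density for state j, up to an additive constant."""
    d = x - means[j]
    return -0.5 * float(d @ d)

def logsumexp(a):
    m = float(np.max(a))
    return m if m == -np.inf else m + float(np.log(np.sum(np.exp(a - m))))

# Observation sequence that roughly walks through the states.
obs = means[[0, 0, 1, 2, 3, 4, 5, 5]] + 0.1 * rng.standard_normal((8, FEAT_DIM))

T = len(obs)
score = np.full((T, N_STATES), -np.inf)      # cumulative (log) forward values
back = np.zeros((T, N_STATES), dtype=int)    # matching-pass memory: best predecessor
score[0, 0] = log_output(obs[0], 0)
for t in range(1, T):
    for j in range(N_STATES):
        cand = score[t - 1] + log_trans[:, j]
        back[t, j] = int(np.argmax(cand))            # transition giving the best probability
        score[t, j] = logsumexp(cand) + log_output(obs[t], j)

# Trace back from the best final state through the stored transitions.
path = [int(np.argmax(score[-1]))]
for t in range(T - 1, 0, -1):
    path.append(int(back[t, path[-1]]))
path.reverse()
print("recognized state sequence:", path)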
Journal of the Acoustical Society of America | 1988
Hiroaki Hattori; Kazunaga Yoshida
The manner of pronunciation varies depending on the noise in the speech environment; this is known as the Lombard effect. It affects acoustic features of speech such as intensity, pitch, duration, and spectral shape. Because vowels play an important role in Japanese speech recognition, special attention was paid to the variability of the acoustic features of vowels uttered in noise. First, the variability of the five Japanese vowels was examined using speech uttered while hearing noise through headphones. It was observed that the higher formants are unstable and that the energy in the middle frequency range of the vowel spectrum increases with the noise level. Based on these observations, a normalization method combining band limitation and spectral tilt compensation was proposed, and its effectiveness in vowel recognition was shown by experiment. Next, a new word recognition method utilizing this normalization technique was proposed. This method performs vowel normalization at the portion wh...
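A hedged sketch of a normalization combining band limitation and spectral tilt compensation; the band edges, the first-order tilt model fitted by linear regression, and the FFT-based log spectrum are illustrative assumptions rather than the paper's exact procedure.

# Sketch: band-limit a frame's log spectrum and remove its linear spectral tilt.
import numpy as np

FS = 8000                     # assumed sampling rate, Hz
LOW, HIGH = 300.0, 3000.0     # assumed band limits keeping the stable mid range

def normalize_frame(frame):
    """Return the tilt-compensated, band-limited log spectrum of one frame."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / FS)
    log_spec = 20.0 * np.log10(spec + 1e-10)

    band = (freqs >= LOW) & (freqs <= HIGH)          # band limitation
    f, s = freqs[band], log_spec[band]

    slope, intercept = np.polyfit(f, s, 1)           # estimate spectral tilt
    return s - (slope * f + intercept)               # tilt compensation

# Example: a synthetic vowel-like frame with two harmonics.
t = np.arange(256) / FS
frame = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
print("normalized band spectrum, first 5 bins:",
      np.round(normalize_frame(frame)[:5], 2))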
Archive | 1986
Kazunaga Yoshida; Hiroshi Shimizu; Masao Watari
Archive | 1990
Kazunaga Yoshida
Archive | 1998
Yoshikazu Ikebata; Kazunaga Yoshida
Archive | 1997
Yutaka Nakashima; Kazunaga Yoshida; Yoshikazu Ikebata