
Publication


Featured research published by Hisashi Wakita.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1987

Spectral slope distance measures with linear prediction analysis for word recognition in noise

Brian A. Hanson; Hisashi Wakita

This paper discusses the approximation and use of spectral slope distance measures derived from linear prediction analysis models of speech, with emphasis on their application for recognition of noisy speech. Initial testing of these slope-based measures for speaker-dependent isolated word recognition indicates that they give considerable performance improvement over the standard cepstral distance measure in several noise conditions. Comparisons are also made to two related distance measures which have been recently reported by other researchers.
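The cepstral-domain form of such a slope-based measure can be sketched briefly. The specific weighting below (plain index weighting, k·c_k, as in the authors' related root-power sum work) is an assumption for illustration, not necessarily the exact measure tested in the paper:

```python
import numpy as np

def cepstral_distance(c_test, c_ref):
    # standard cepstral distance: plain Euclidean distance in cepstrum space
    c_test, c_ref = np.asarray(c_test, float), np.asarray(c_ref, float)
    return np.sqrt(np.sum((c_test - c_ref) ** 2))

def spectral_slope_distance(c_test, c_ref):
    # slope-based measure (assumed form): weight each cepstral coefficient
    # by its index k before taking the Euclidean distance, which emphasizes
    # differences in the slope of the log spectrum rather than its level
    c_test, c_ref = np.asarray(c_test, float), np.asarray(c_ref, float)
    k = np.arange(1, c_test.size + 1)
    return np.sqrt(np.sum((k * (c_test - c_ref)) ** 2))
```

The index weighting de-emphasizes the low-order coefficients, which carry overall spectral level and tilt — the components most disturbed by additive noise.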


Speech Communication | 1985

Low-dimensional representation of vowels based on all-pole modeling in the psychophysical domain

Hynek Hermansky; Brian A. Hanson; Hisashi Wakita

A novel speech analysis method which uses several established psychoacoustic concepts is applied to the analysis of vowels. This perceptually based linear predictive analysis (PLP) models the auditory spectrum by the spectrum of the low-order all-pole model. The auditory spectrum is derived from the speech waveform by critical-band filtering, equal-loudness curve pre-emphasis, and intensity-loudness root compression. We demonstrate through analysis of both natural and synthetic speech that psychoacoustic concepts of spectral auditory integration in vowel perception, namely the F1, F2′ concept of Carlson and Fant and the 3.5 Bark auditory integration concept of Chistovich, are well modeled by the PLP method. A complete speech analysis-synthesis system based on the PLP method is also described in the paper.
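A minimal sketch of the auditory-spectrum stages named above, with an assumed Bark warping, simplified triangular critical-band filters, and a rough equal-loudness weight (the paper's exact curves differ); the final step — fitting a low-order all-pole model to this spectrum, e.g. Levinson-Durbin on its inverse DFT — is omitted:

```python
import numpy as np

def hz_to_bark(f):
    # one common Bark-scale approximation (an assumption; PLP papers use
    # a warping of this general form)
    return 6.0 * np.arcsinh(f / 600.0)

def plp_spectrum(frame, fs, n_bands=18, compress=0.33):
    """Auditory spectrum for one windowed frame: power spectrum ->
    critical-band integration -> equal-loudness weighting -> root
    (intensity-loudness) compression. Simplified sketch, not the
    paper's exact filter shapes."""
    n_fft = 512
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    bark = hz_to_bark(np.fft.rfftfreq(n_fft, 1.0 / fs))
    centers = np.linspace(bark[1], bark[-1], n_bands)
    bands = np.empty(n_bands)
    for i, c in enumerate(centers):
        # triangular weight about 1 Bark wide, standing in for the
        # critical-band masking curve
        w = np.clip(1.0 - np.abs(bark - c), 0.0, None)
        bands[i] = np.sum(w * power)
    f_c = 600.0 * np.sinh(centers / 6.0)         # band centers back in Hz
    eql = (f_c ** 2 / (f_c ** 2 + 1.6e5)) ** 2   # crude equal-loudness weight
    return (eql * bands) ** compress             # root compression
```

The root compression and critical-band integration are what make the resulting all-pole fit low-order: the auditory spectrum is much smoother than the raw power spectrum.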


International Conference on Acoustics, Speech, and Signal Processing | 1986

Perceptually based processing in automatic speech recognition

Hynek Hermansky; K. Tsuga; S. Makino; Hisashi Wakita

The perceptually based linear predictive (PLP) speech analysis method is applied to isolated word automatic speech recognition (ASR). Low dimensionality of the PLP analysis vector, which is otherwise identical in form to the standard linear predictive (LP) analysis vector, allows for computational and storage savings in ASR. We show that in speaker-dependent recognition of the alpha-numeric vocabulary, the PLP method in VQ-based ASR yields similar recognition scores as does the standard ASR system. The main focus of the paper is on cross-speaker ASR. We demonstrate in experiments with vowel centroids of two male and one female speakers that PLP speech representation is more consistent with the underlying phonetic information than the standard LP method. Conclusions from the experiments are confirmed by superior performance of the PLP method in cross-speaker isolated word recognition.
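The VQ stage can be illustrated with a toy k-means codebook; the training procedure, codebook size, and distance below are assumptions for illustration, not the paper's setup. The computational and storage saving comes from the short PLP analysis vectors: every distance in the quantizer's inner loop is over a low-dimensional vector.

```python
import numpy as np

def vq_codebook(vectors, k, iters=20, seed=0):
    # toy k-means codebook standing in for the VQ training stage
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each analysis vector to its nearest codeword
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = vectors[labels == j].mean(axis=0)
    return codebook

def avg_distortion(vectors, codebook):
    # mean distance to the nearest codeword: lower means a better fit
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()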


IEEE Transactions on Speech and Audio Processing | 1993

Evaluation and optimization of perceptually-based ASR front-end

Jean-Claude Junqua; Hisashi Wakita; Hynek Hermansky

Several recently proposed automatic speech recognition (ASR) front-ends are experimentally compared in speaker-dependent and speaker-independent (or cross-speaker) recognition. The perceptually based linear predictive (PLP) front-end, with the root-power sums (RPS) distance measure, yields generally the highest accuracies, especially in cross-speaker recognition. It is experimentally shown that one can optimize the system and further improve recognition accuracy for speaker-independent recognition by controlling the distance measure's sensitivity to spectral peaks and the spectral tilt and by utilizing the speech dynamic features. For a digit vocabulary and five reference templates obtained with a clustering algorithm, the optimization improves recognition accuracy from 97% to 98.1% with respect to the PLP-RPS front-end.


Speech Communication | 1986

Vowel normalization by frequency warped spectral matching

Hiroshi Matsumoto; Hisashi Wakita

Normalization of formant frequencies has frequently been used to eliminate inter-speaker differences in vowel recognition. However, estimation of formant frequencies becomes difficult under certain circumstances, such as for telephone speech. This paper presents an approach to vowel normalization based on frequency warped spectral matching. A frequency normalized distance between test and reference spectra is defined on the basis of the minimum mean square difference over all possible choices of frequency warping functions under certain nonlinearity constraints and boundary conditions. After adaptively eliminating spectral slope differences due to the individual glottal characteristics, the spectral distance is computed by means of dynamic programming. The vowel identification experiments were conducted on the nine American English vowels in /hvd/ utterances spoken by 12 male and 12 female speakers. The results indicated that the frequency warping method substantially increased the identification scores for female vowels when the male vowels were used as reference. They also indicated that although the improvement in identification was attributed mainly to the linear frequency scaling, an additional improvement for vowel /ae/ was obtained by a slight nonlinear frequency warping. In addition, an application to speaker normalization for word detection in connected speech is discussed.
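The warped matching can be sketched as a dynamic-programming alignment along the frequency axis — a DTW over frequency bins rather than time frames. The paper's nonlinearity constraints, boundary conditions, and adaptive tilt removal are omitted in this simplified version:

```python
import numpy as np

def warped_spectral_distance(s_test, s_ref):
    """Minimum mean-square difference between two log spectra over
    monotonic frequency warpings, by dynamic programming (simplified:
    unconstrained local path, fixed endpoints)."""
    n, m = len(s_test), len(s_ref)
    cost = np.full((n, m), np.inf)
    cost[0, 0] = (s_test[0] - s_ref[0]) ** 2
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # best predecessor: expand test bin, expand reference bin,
            # or advance both (the diagonal)
            prev = min(cost[i - 1, j] if i > 0 else np.inf,
                       cost[i, j - 1] if j > 0 else np.inf,
                       cost[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            cost[i, j] = prev + (s_test[i] - s_ref[j]) ** 2
    return cost[-1, -1] / (n + m)
```

Because the warping can slide spectral peaks along the frequency axis, a formant shifted by a speaker difference costs far less than it would under a bin-by-bin Euclidean distance.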


Journal of the Acoustical Society of America | 1988

SAIPH: A segmentation system for automatic labeling of a large speech database. Application to speech recognition

Jean-Claude Junqua; Hisashi Wakita

Several attempts for automatic segmentation and labeling of large speech databases have been made. Those previous studies proposed to (1) align the input utterance with the manually labeled reference utterance using dynamic time warping, (2) segment and label the utterance into broad phonetic classes prior to time alignment, and (3) use trained reference patterns. In some cases, phonetic or heuristic knowledge has been used to refine the boundaries. This approach does not use reference units and thus can be applied also to speech recognition. A transition measure derived from the perceptually based linear predictive analysis (PLP) was defined. Basic segmentation is obtained by use of heuristic knowledge based on this transition measure in addition to the energy and the zero‐crossings parameters. An evaluation on a keyboard database (104 words) spoken by two speakers (one male and one female) showed that more than 92% of the good segments are obtained after this stage. A broad classification, knowledge of ...
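The auxiliary energy and zero-crossing parameters are straightforward to sketch; the PLP-based transition measure itself is not reimplemented here, and the frame length and threshold below are assumed placeholders:

```python
import numpy as np

def frame_features(x, fs, frame_ms=10):
    """Short-time log energy (dB) and zero-crossing rate per frame --
    the two auxiliary parameters combined with the transition measure."""
    step = int(fs * frame_ms / 1000)
    frames = [x[i:i + step] for i in range(0, len(x) - step + 1, step)]
    energy = np.array([10 * np.log10(np.sum(f ** 2) + 1e-10) for f in frames])
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])
    return energy, zcr

def coarse_boundaries(energy, thresh_db):
    # frames where energy crosses the threshold: a crude
    # silence/speech boundary detector
    active = energy > thresh_db
    return np.flatnonzero(np.diff(active.astype(int)) != 0) + 1
```

On a clean utterance this yields the coarse silence/speech boundaries; the transition measure and heuristic rules then subdivide the speech regions into phone-sized segments.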


Journal of the Acoustical Society of America | 1985

Root‐power sums and spectral slope distortion measures for all‐pole models of speech

Brian A. Hanson; Hynek Hermansky; Hisashi Wakita

Distortion measures for use in speech processing are presented with emphasis on improvement of efficiency and recognition performance for noisy speech. Good results have been obtained from measures calculated from differences of root‐power sums (i.e., differences of index‐weighted cepstral coefficients k·c_k). This type of distortion measure can be derived as an approximation to the spectral slope distortion measure, which expresses slope differences between test and reference logarithmic spectra. However, when used on all‐pole model spectra, the root‐power sum distortion is computationally more efficient. It is well‐suited for use with both linear predictive (LP) analysis and the recently proposed perceptually based LP [H. Hermansky, B. A. Hanson, and H. Wakita, IEEE Proc. ICASSP 85, 509–512 (1985)]. The proposed distortion measures are shown to provide equal or better recognition scores compared to standard LP‐cepstral matching in low noise conditions and substantial improvements in noisy conditions.
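The root-power sum measure and its relation to the spectral slope distance can be checked numerically. The log-spectrum convention S(ω) = 2 Σ_k c_k cos(kω) used below is an assumption; under it, Parseval's theorem makes the RMS difference between the spectral slopes equal the RPS distance up to a constant factor (√2 for this convention), while the RPS form needs no spectral evaluation at all:

```python
import numpy as np

def rps_distance(c1, c2):
    # root-power sum distance: Euclidean distance between index-weighted
    # cepstra (k * c_k), computed directly from the model coefficients
    dc = np.asarray(c1, float) - np.asarray(c2, float)
    k = np.arange(1, dc.size + 1)
    return np.sqrt(np.sum((k * dc) ** 2))

def slope_rms(c1, c2, n=4096):
    # direct RMS difference between log-spectral slopes, for comparison:
    # with S(w) = 2 * sum_k c_k cos(k w), the slope is
    # S'(w) = -2 * sum_k k * c_k * sin(k w)
    w = np.linspace(0, np.pi, n, endpoint=False)
    dc = (np.asarray(c1, float) - np.asarray(c2, float))[:, None]
    k = np.arange(1, dc.size + 1)[:, None]
    return np.sqrt(np.mean((-2 * np.sum(k * dc * np.sin(k * w), axis=0)) ** 2))
```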


Journal of the Acoustical Society of America | 1988

Comparative study of ASR front‐ends in noise

Jean-Claude Junqua; Hisashi Wakita

In automatic speech recognition (ASR) of speech corrupted by noise, the performance tends to deteriorate rapidly depending on the choice of analysis method and distance measure. In order to evaluate the recognition performance for several analysis methods and distance measures, a series of isolated word recognition experiments was performed. Analysis methods selected are critical‐band filtering, perceptually based linear prediction (PLP), linear prediction (LP), and time synchronous linear prediction (SLP). The weighted Euclidean distance with different weightings [unity, root power sums (RPS), and exponential filtering] was applied in the cepstrum domain. Experiments were carried out for clean speech and for two noise conditions (white and low‐pass filtered white, added to the clean speech) at different SNRs (25 to 5 dB), using an alphanumeric vocabulary (ten speakers). It is shown that improvements in robustness of the recognizer in noise can be achieved by a proper selection of analysis method an...


Journal of the Acoustical Society of America | 1988

Improving the performance of backpropagation‐trained vowel classifiers

Gregory R. De Haan; Ömer Eğecioğlu; Hisashi Wakita

Classification experiments using nine steady‐state vowels were performed to compare artificial neural networks trained via backpropagation and K‐nearest neighbor (KNN) classifiers. Normalized critical‐band filterbank outputs served as input patterns in all experiments. Initial experiments used prototypical feedforward networks [R. Lippmann, IEEE ASSP Mag. 4(2), 4–22 (1987)], with fully interconnected adjacent layers of units. Once the critical number of hidden units [D. J. Burr, J. Acoust. Soc. Am. Suppl. 1 83, S46 (1988)] was established for a given experiment, the networks compared favorably to KNN. Significantly, while networks with two hidden layers did better than networks with one hidden layer for (binary) front‐back vowel distinctions, they performed worse for (nine‐class) vowel classification. It appears that backpropagation may be particularly powerful for binary classification [e.g., R. P. Gorman and T. J. Sejnowski, Neural Networks 1, 75–89 (1988)]. Experiments were run comparing prototypical n...
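The KNN baseline is simple to state; the value of K and any feature normalization are not specified in the abstract, so the defaults below are placeholders:

```python
import numpy as np

def knn_classify(x, train_x, train_y, k=5):
    """K-nearest-neighbor majority vote over labeled feature vectors
    (e.g. normalized critical-band filterbank outputs)."""
    d = np.linalg.norm(np.asarray(train_x, float) - np.asarray(x, float), axis=1)
    nearest = np.asarray(train_y)[np.argsort(d)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[counts.argmax()]
```

KNN has no training phase and no architecture to tune, which is what makes it a natural reference point for the backpropagation networks: any network that cannot beat it is not extracting structure beyond local similarity in the feature space.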


Journal of the Acoustical Society of America | 1984

Lexical analysis for word recognition based on phoneme‐pair differences

Shozo Makino; Hisashi Wakita; Ted H. Applebaum

In large vocabulary word recognition systems, it is important to know the phonetic properties of the vocabulary. This paper presents observed phonetic properties of the 5000 most frequent words in the Brown Corpus, and describes a method to evaluate the effects of phoneme recognition errors on word recognition. The study was conducted for two cases: the 5000‐word vocabulary with one standard pronunciation per word, and the same vocabulary with multiple pronunciations per word. A distance was defined as the number of different phoneme pairs between two words, taking phoneme deletion and insertion into account. The distance was calculated for every word pair using dynamic programming. Detailed analysis was made of word pairs with distances 0, 1, and 2, and some properties of the vocabulary were obtained which provide useful information in designing a word recognition system. Relations among phoneme recognition score, word recognition score, and vocabulary size were also investigated.
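The distance as described — the number of differing phoneme pairs between two words, with deletion and insertion taken into account, computed by dynamic programming — matches a Levenshtein distance over phoneme symbols. The unit costs below are an assumption consistent with that description:

```python
def phoneme_pair_distance(a, b):
    """Levenshtein distance between two phoneme sequences: the minimum
    number of substitutions, deletions, and insertions turning one
    pronunciation into the other, by dynamic programming."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # delete all remaining phonemes
    for j in range(m + 1):
        d[0][j] = j                      # insert all remaining phonemes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[n][m]
```

Word pairs at distance 1 under this measure are exactly those confusable by a single phoneme recognition error, which is why the distance-0/1/2 pair counts characterize the difficulty of a vocabulary.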

Collaboration


Dive into Hisashi Wakita's collaboration.

Top Co-Authors

Yunxin Zhao

University of Missouri
