Publication


Featured research published by Hiroaki Hattori.


International Conference on Multimodal Interfaces | 2002

An automatic speech translation system on PDAs for travel conversation

Ryosuke Isotani; Kiyoshi Yamabana; Shinichi Ando; Ken Hanazawa; Shinya Ishikawa; Tadashi Emori; Ken-ichi Iso; Hiroaki Hattori; Akitoshi Okumura; Takao Watanabe

We present an automatic speech-to-speech translation system for personal digital assistants (PDAs) that helps oral communication between Japanese and English speakers in various situations while traveling. Our original compact large vocabulary continuous speech recognition engine, compact translation engine based on a lexicalized grammar, and compact Japanese speech synthesis engine lead to the development of a Japanese/English bi-directional speech translation system that works with limited computational resources.


Journal of the Acoustical Society of America | 1998

Feature extraction and normalization for speech recognition

Eiko Yamada; Hiroaki Hattori

Speech data is converted into logarithmic spectrum data and orthogonally transformed to develop feature vectors. Normalization coefficient data and unit vector data are stored. An inner product of the feature vector data and the unit vector data is calculated. The inner product may be the average of inner products for a word or a sentence, or may be a regressive average of them. A normalization vector, which corresponds to a second or higher order curve obtained by least-square error approximation of the speech data on logarithmic spectrum space, is calculated on the transformed feature vector space by using the inner product, the normalization coefficient data, and the unit vector data. Normalization of the feature vectors is performed by subtracting the normalization vector from the feature vectors on the transformed feature vector space. Recognition is then performed on the normalized feature vectors.
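The inner-product step described above can be illustrated with a minimal sketch. It is not the patented procedure: it works directly on log-spectrum frames rather than on the orthogonally transformed feature space, it assumes a second-order curve, and the names `orthonormal_poly_basis` and `normalize_utterance` are hypothetical. With orthonormal unit vectors spanning the polynomials up to degree 2, the least-squares curve is the sum of each unit vector weighted by its inner product with the data.

```python
import math


def orthonormal_poly_basis(n_bins, degree=2):
    """Unit vectors spanning polynomials up to `degree` over n_bins points.

    Assumption: classical Gram-Schmidt on 1, x, x^2 sampled on [0, 1].
    """
    xs = [i / (n_bins - 1) for i in range(n_bins)]
    basis = []
    for d in range(degree + 1):
        v = [x ** d for x in xs]
        for u in basis:
            proj = sum(a * b for a, b in zip(v, u))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    return basis


def normalize_utterance(frames, basis):
    """Subtract the utterance-average least-squares curve from every frame."""
    n_frames = len(frames)
    n_bins = len(frames[0])
    # Average inner product of each unit vector with the frames of the utterance.
    mean_ip = [
        sum(sum(a * b for a, b in zip(frame, u)) for frame in frames) / n_frames
        for u in basis
    ]
    # Normalization vector: unit vectors weighted by their average inner products.
    norm_vec = [
        sum(ip * u[k] for ip, u in zip(mean_ip, basis)) for k in range(n_bins)
    ]
    return [[a - b for a, b in zip(frame, norm_vec)] for frame in frames]
```

Because the transform in the abstract is orthogonal and therefore linear, subtracting the curve in log-spectrum space and subtracting its transform in feature space should give equivalent results; the sketch uses the former only for brevity.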


International Conference on Acoustics, Speech, and Signal Processing | 1995

Rapid environment adaptation for robust speech recognition

Keizaburo Takagi; Hiroaki Hattori; Takao Watanabe

The paper proposes a rapid environment adaptation algorithm based on spectrum equalization (REALISE). In practical speech recognition applications, differences between training and testing environments often seriously diminish recognition accuracy. These environmental differences can be classified into two types: difference in additive noise and difference in multiplicative noise in the spectral domain. The proposed method calculates time-alignment between a testing utterance and the closest reference pattern to it, and then calculates the noise differences between the two according to the time-alignment. Then, the authors adapt all reference patterns to the testing environment using the differences. Finally, the testing utterance is recognized using the adapted reference patterns. In a 250 Japanese word recognition task, in which the training and testing microphones were of two different types, REALISE improved recognition accuracy from 87% to 96%.
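The adaptation step can be sketched as follows. This is an illustrative stand-in rather than the published REALISE estimator: it assumes the time-alignment has already produced paired frames in the linear spectral domain, and it estimates the multiplicative and additive differences per frequency band with an ordinary least-squares line fit. The function names `estimate_band_difference` and `adapt_reference` are hypothetical.

```python
def estimate_band_difference(ref_frames, test_frames):
    """Per band, fit test ~= gain * ref + offset over aligned frame pairs.

    Assumption: `gain` stands in for the multiplicative difference and
    `offset` for the additive difference; the paper's estimator may differ.
    """
    n = len(ref_frames)
    n_bands = len(ref_frames[0])
    gains, offsets = [], []
    for k in range(n_bands):
        xs = [frame[k] for frame in ref_frames]
        ys = [frame[k] for frame in test_frames]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var = sum((x - mean_x) ** 2 for x in xs)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        gain = cov / var if var > 0 else 1.0  # fall back to no scaling
        gains.append(gain)
        offsets.append(mean_y - gain * mean_x)
    return gains, offsets


def adapt_reference(pattern, gains, offsets):
    """Apply the estimated differences to every frame of a reference pattern."""
    return [
        [g * v + o for v, g, o in zip(frame, gains, offsets)]
        for frame in pattern
    ]
```

In the flow the abstract describes, the differences would be estimated once from the closest reference pattern and then applied to all reference patterns before recognizing the utterance.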


Journal of the Acoustical Society of America | 1993

Device for normalizing a speech spectrum

Hiroaki Hattori

A device for use in a speech recognizer or similar apparatus for normalizing the spectrum of speech as preprocessing for speech recognition. The device divides the spectrum of input speech at a predetermined frequency and determines a linear approximate line for each of the divided spectra such that the resulting approximate lines join each other at the point of division, thereby normalizing the spectrum.
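One way to picture the joined approximate lines is a two-segment linear fit that shares a single value at the division frequency. The sketch below is an assumption-laden illustration: the abstract does not say how the lines are determined, so a least-squares fit is used here, and `solve_linear` and `normalize_spectrum` are hypothetical names.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def normalize_spectrum(freqs, log_spec, split):
    """Subtract a continuous two-segment linear fit from a log spectrum.

    Model (assumed): y = level + low_slope * min(f - split, 0)
                               + high_slope * max(f - split, 0),
    so both lines pass through `level` at the division frequency.
    """
    basis = [[1.0, min(f - split, 0.0), max(f - split, 0.0)] for f in freqs]
    # Normal equations of the least-squares problem (3 unknowns).
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)]
           for i in range(3)]
    aty = [sum(row[i] * y for row, y in zip(basis, log_spec)) for i in range(3)]
    coef = solve_linear(ata, aty)
    fitted = [sum(c * b for c, b in zip(coef, row)) for row in basis]
    return [y - f for y, f in zip(log_spec, fitted)], coef
```

Sharing the `level` parameter between the two segments is what forces the lines to meet at the point of division; fitting each band independently would generally leave a step there.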


Journal of the Acoustical Society of America | 1993

Phoneme recognition utilizing relative positions of reference phoneme patterns and input vectors in a feature space

Hiroaki Hattori

In a phoneme recognition apparatus, first distances between input vectors and reference vectors are determined. If the first distances are lower than a threshold value, the input vectors are identified as corresponding to the reference vectors. If the first distances are higher than the threshold value, the input vectors are identified as being indeterminate, and input differential vectors between the input vectors identified as corresponding to a reference vector and those identified as not corresponding to any of the reference vectors are determined. In addition, reference differential vectors between the reference vectors having corresponding input vectors and those having no corresponding input vectors are determined. Second distances between the input differential vectors and the reference differential vectors are calculated and summed. The indeterminate input vectors are then identified as corresponding to the reference vectors in accordance with combined values of the first distances and the summed second distances.
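A rough reading of the two-stage decision is sketched below. Several details are assumptions not stated in the abstract: Euclidean distance, a simple weighted sum as the "combined value", and restricting candidates for indeterminate inputs to references that have no matched input. The names `recognize`, `dist`, and `diff` are hypothetical.

```python
import math


def dist(u, v):
    """Euclidean distance (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def diff(u, v):
    """Differential vector u - v."""
    return [a - b for a, b in zip(u, v)]


def recognize(inputs, references, threshold, weight=1.0):
    """Map each input index to a reference label in two stages."""
    result, matched, pending = {}, [], []
    # Stage 1: accept inputs whose nearest reference is within the threshold.
    for i, x in enumerate(inputs):
        label, d = min(
            ((name, dist(x, r)) for name, r in references.items()),
            key=lambda pair: pair[1],
        )
        if d < threshold:
            result[i] = label
            matched.append((x, references[label]))
        else:
            pending.append(i)
    # Stage 2: score indeterminate inputs against unmatched references.
    used = set(result.values())
    candidates = {n: r for n, r in references.items() if n not in used}
    candidates = candidates or dict(references)  # fallback if all are matched
    for i in pending:
        x = inputs[i]

        def score(name):
            r = candidates[name]
            # Summed second distances between input and reference differentials.
            second = sum(dist(diff(x, xm), diff(r, rm)) for xm, rm in matched)
            return dist(x, r) + weight * second

        result[i] = min(candidates, key=score)
    return result
```

The second distances compare where an input sits relative to confidently matched inputs with where a candidate reference sits relative to the matched references, which is the "relative positions" idea in the title.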


International Conference on Acoustics, Speech, and Signal Processing | 1991

Speaker adaptation based on Markov modeling of speakers in speaker-independent speech recognition

Hiroaki Hattori

An unsupervised speaker adaptation method for HMM (hidden Markov model) based speaker-independent speech recognition is presented. This method reduces the confusion between models, which is caused by training on large training data sets, by controlling the influence of the training samples used in HMM training according to the similarity of speaker individuality. A Markov model and a hidden Markov model are used to represent an input speaker's individuality. These models are compared through their entropy and a /b, d, g, m, n, N/ recognition task. The results show that a hidden Markov model is more suitable than a Markov model.


International Conference on Spoken Language Processing | 1996

Speech recognition using sub-word units dependent on phonetic contexts of both training and recognition vocabularies

Hiroaki Hattori; Eiko Yamada

This paper proposes a new speech recognition algorithm using a new context-dependent recognition unit design method for efficient and precise acoustic modeling. This algorithm uses both training and recognition vocabularies to select context-dependent units which precisely represent acoustic variations due to phonetic contexts in a recognition vocabulary. An efficient training algorithm for selected context-dependent units is also proposed. In speaker-independent isolated-word recognition experiments, the proposed algorithm gave an 11% error reduction for 5000-word recognition and a 43% error reduction for 10-digit recognition. These results confirmed the effectiveness of the proposed method.


International Conference on Spoken Language Processing | 1996

Unsupervised and incremental speaker adaptation under adverse environmental conditions

Keizaburo Takagi; Koichi Shinoda; Hiroaki Hattori; Takao Watanabe

A speaker adaptation method is described. In practical applications of speaker adaptation, adaptation and testing environments change significantly and are unknown beforehand. In such cases, since the speaker adaptation adapts a reference pattern to the adaptation utterances with regard to differences in both environment and speaker at the same time, performance in speaker adaptation would be degraded. To cope with this problem, our proposed method first eliminates the environmental differences between each input utterance and a reference pattern by using a rapid environment adaptation algorithm based on spectrum equalization (REALISE) (K. Takagi et al., 1995). Then we apply an unsupervised and incremental speaker adaptation with autonomous control using tree structure pdfs (ACTS) (K. Shinoda and T. Watanabe, 1995) to the environmentally adapted reference pattern. By combining these two methods, the resulting system is expected to perform well under adverse environmental conditions and to show a stable improvement, regardless of the amount of adaptation data. Evaluation experiments were carried out for utterances under three vehicle speed conditions. Recognition rates for a 100 Japanese word recognition task after 100 word adaptation were improved from 92% (ACTS alone) to 95% (proposed method).


Journal of the Acoustical Society of America | 1996

Evaluation of a rapid environment adaptation algorithm in adverse environments

Hiroaki Hattori; Keizaburo Takagi; Takao Watanabe

This paper reports results for a rapid environment adaptation algorithm in adverse car environments. It is known that the difference between training and testing environments degrades speech recognition performance. This degradation becomes especially serious in applications such as telephone speech recognition and speech recognition inside a running vehicle, where the testing environment may change drastically. To solve this problem, a rapid environment adaptation method (hereafter referred to as REALISE) is proposed and its performance is measured in telephone speech recognition. REALISE estimates the differences in multiplicative and additive noise in the spectral domain between the training and testing environments and uses them to adapt acoustic features of reference patterns to the testing environment. Utterances in a car were also recorded to investigate the performance of REALISE under more severe conditions. The testing data were 100 city names uttered by three males and three females under t...


Journal of the Acoustical Society of America | 1988

Recognition of speech produced in a simulated noisy environment

Hiroaki Hattori; Kazunaga Yoshida

Pronunciation manner varies depending on the noise of the speech environment. This is known as the Lombard effect. It affects the acoustic features of speech such as intensity, pitch, duration, and spectral shape. Because vowels play an important role in Japanese speech recognition, special attention was paid to the variability of the acoustic features of vowels uttered in noise. First, the variability of the five Japanese vowels was examined using speech uttered while hearing noise through headphones. It was observed that the higher formants are unstable and that the energy in the middle frequency range of the vowel spectrum increases depending on the noise level. Based on these observations, a normalization method combining band limitation and spectral tilt compensation was proposed, and its effectiveness in the recognition of vowels was shown by experiment. Next, a new word recognition method utilizing this normalization technique was proposed. This method performs vowel normalization at the portion wh...
