Publication


Featured research published by Masakatsu Hoshimi.


Journal of the Acoustical Society of America | 1998

Voice recognition method for recognizing a word in speech

Maki Yamada; Masakatsu Hoshimi; Taisuke Watanabe; Katsuyuki Niyada

An inter-frame similarity between an input voice and a standard patterned word is calculated for each frame and for each standard patterned word, and a posterior probability similarity is produced by subtracting a constant value from each inter-frame similarity. The constant value is determined by analyzing voice data obtained from specified persons, so that the posterior probability similarities are positive when a word in the input voice matches the standard patterned word and negative when it does not. Next, for each standard patterned word, an accumulated similarity is calculated by accumulating the posterior probability similarities over the frames of the input voice according to a continuous dynamic programming matching operation. Finally, the standard patterned word whose accumulated similarity is the maximum among the accumulated similarities is output as the recognized word of the input voice.
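The scoring scheme in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the cosine frame similarity, the bias value of 0.5, the allowed path steps (stay or advance by one template frame), and all function names are assumptions made for illustration. Starts and ends of a word are left free, which is a simplified stand-in for continuous dynamic programming matching.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (an assumed frame similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def accumulated_similarity(template, frames, bias):
    """Best accumulated (similarity - bias) over any stretch of the input frames."""
    neg_inf = float("-inf")
    prev = [neg_inf] * len(template)
    best = neg_inf
    for x in frames:
        cur = [neg_inf] * len(template)
        for j, t in enumerate(template):
            # Subtracting the constant makes matches positive and mismatches negative.
            local = cosine(t, x) - bias
            if j == 0:
                # A word may start at any input frame, or dwell on its first frame.
                cur[j] = local + max(0.0, prev[0])
            else:
                # Stay on the same template frame or advance by one.
                cur[j] = local + max(prev[j], prev[j - 1])
        best = max(best, cur[-1])  # the word may end at any input frame
        prev = cur
    return best

def recognize(templates, frames, bias=0.5):
    """Return the word with the maximum accumulated similarity, plus all scores."""
    scores = {w: accumulated_similarity(t, frames, bias) for w, t in templates.items()}
    return max(scores, key=scores.get), scores

# Toy two-frame templates for two hypothetical words.
templates = {"a": [[1, 0], [0, 1]], "b": [[0, 1], [1, 0]]}
word, scores = recognize(templates, [[1, 0], [0, 1]])
print(word, scores)
```

In this toy case the matching word accumulates a positive score and the non-matching word a negative one, which is the behavior the constant is chosen to produce.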


Journal of the Acoustical Society of America | 1990

Method and apparatus for speech recognition

Masakatsu Hoshimi; Katsuyuki Niyada

Speech parameters (Ph and Pl) for consonant classification and recognition are derived by separating a speech signal into low and high frequency bands and, in each band, obtaining the first time derivative, from which the min-max differences (power dips) Ph and Pl are obtained. The distribution of Ph and Pl in a two-dimensional discriminant diagram classifies the consonant phoneme.
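As a rough illustration of the power-dip parameters, the sketch below assumes the per-frame power in each band has already been computed (the band-splitting filter is omitted) and approximates the time derivative by a first difference. The function name and the sample values are hypothetical, and the discriminant diagram itself is not reproduced because the abstract gives no boundaries for it.

```python
def power_dip(band_power):
    """Max-min difference of the first time difference of a band's power."""
    deriv = [b - a for a, b in zip(band_power, band_power[1:])]
    return max(deriv) - min(deriv)

# Hypothetical per-frame power (dB) around a consonant in each band.
low_band = [10, 6, 2, 6, 10]   # pronounced dip in the low band
high_band = [5, 5, 5, 5, 5]    # no dip in the high band

pl = power_dip(low_band)
ph = power_dip(high_band)
print((pl, ph))  # the point that would be placed on the Pl/Ph diagram
```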


International Conference on Acoustics, Speech, and Signal Processing | 1985

Large vocabulary speaker-independent Japanese speech recognition system

S. Morii; Katsuyuki Niyada; S. Fujii; Masakatsu Hoshimi

This paper describes a speaker-independent, large-vocabulary speech recognition system based on phoneme recognition. Phoneme recognition employs LPC cepstrum coefficients as the feature parameter and a statistical distance measure between an input pattern and a phoneme reference template. Consonant segments are detected using power dips in the low and high frequency ranges, similarity to the unvoiced feature, and similarity to the nasal feature. Phonemes are discriminated separately for vowels, semivowels, and consonants. The phoneme sequence produced by phoneme recognition is matched with each item of the word dictionary, and the item with the highest similarity is output as the recognition result. The average phoneme recognition score is 81.4% for 212 words uttered by forty speakers, both male and female: 90.6% for vowels, 78.0% for semivowels, and 71.9% for consonants. The average word recognition score is 95.6% for 274 Japanese city names uttered by forty speakers.
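The dictionary-matching step can be sketched as follows. The abstract does not specify the similarity measure, so this example substitutes the matching ratio from Python's standard `difflib.SequenceMatcher`; the three-entry dictionary and the misrecognized input sequence are hypothetical.

```python
from difflib import SequenceMatcher

def best_word(phonemes, dictionary):
    """Return the dictionary item whose phoneme sequence is most similar to the input."""
    def similarity(word):
        return SequenceMatcher(None, phonemes, dictionary[word]).ratio()
    return max(dictionary, key=similarity)

# Hypothetical dictionary of city names as phoneme sequences.
dictionary = {
    "Osaka": list("osaka"),
    "Tokyo": list("tokyo"),
    "Kyoto": list("kyoto"),
}

# A recognized sequence with one phoneme error (/k/ heard as /g/).
print(best_word(list("osaga"), dictionary))
```

Choosing the item by sequence similarity rather than exact match is what lets word recognition tolerate individual phoneme errors.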


International Conference on Acoustics, Speech, and Signal Processing | 1992

Speaker independent speech recognition method using training speech from a small number of speakers

Masakatsu Hoshimi; Maki Miyata; Shoji Hiraoka; Katsuyuki Niyada

A novel speaker-independent speech recognition method, which registers speech uttered by a small number of speakers into a dictionary as model speech, is presented. It is based on the hypothesis that movement of the vocal tract differs little among individuals when the same word is spoken. This idea leads to the conclusion that dynamic characteristics extracted from the utterances of a small number of speakers are effective for speaker-independent speech recognition. A speech recognition method using model utterances is described, in which similarity values of an input word are calculated by matching the utterances of a small number of speakers with phoneme templates for speaker-independent recognition. When tested with 212 Japanese words, a word recognition rate of 95.8% was obtained. An evaluation of noise robustness is also reported.


International Conference on Acoustics, Speech, and Signal Processing | 1986

Compact isolated word recognition system for large vocabulary

Shoji Hiraoka; S. Morii; Masakatsu Hoshimi; Katsuyuki Niyada

This paper presents the development of a high-speed speaker-independent speech recognition system for a large vocabulary of Japanese words, based on phoneme recognition. The system is implemented compactly with a pair of digital signal processors and a general-purpose microprocessor on three printed circuit boards. An experiment using 212 Japanese word samples, pronounced by 20 males and 20 females, showed an average word recognition rate of 95.5%. The word recognition process completes within 0.8 seconds after the end of the utterance.


International Conference on Spoken Language Processing | 1996

An experimental Japanese/English interpreting video phone system

Murat Karaorman; Ted H. Applebaum; Tatsuro Itoh; Mitsuru Endo; Yoshio Ohno; Masakatsu Hoshimi; Takahiro Kamai; Kenji Matsui; Kazue Hata; Steve Pearson; Jean-Claude Junqua

We report on the architectural design issues and experiences gained while building and demonstrating an experimental interpreting video phone (IVP) system. The IVP system has been demonstrated in an Internet home shopping simulation simultaneously before live audiences in Japan and the US. An American shop assistant and a Japanese customer engaged in task-directed dialogues using their native languages. In addition to their direct audio/visual contact by ISDN video phone, each participant heard a translation of the remote speaker's utterances in a synthetic voice in real time. Each site used a medium size vocabulary, a continuous speech recognition system and a text to speech synthesis (TTS) system for the local language. Recognition results were transmitted over the Internet to the remote site, where the corresponding translated sentence was spoken by TTS in the listener's native language. All of the speech and language processing software components of the system were independently developed proprietary technologies of the authors' laboratories, which were integrated using commercially available hardware and communication media. Difficulties encountered in developing the system, the accommodations which were made, and other experiences gained through the process are reported.


Systems and Computers in Japan | 1987

Consonant recognition methods for unspecified speakers using bpf powers and time sequence of LPC cepstrum coefficients

Katsuyuki Niyada; Masakatsu Hoshimi

This paper discusses the recognition of consonants in a word (except the consonant at the top of the word) for unspecified speakers. First, the consonant section is detected based on the power dips extracted from the low- and high-frequency power information, together with the nasal and unvoiced properties. Applying the discrimination diagram to the detected low- and high-frequency dips, the phoneme is classified into one of four phoneme classes (rough classification). Then methods are discussed which discriminate the individual phonemes within the phoneme group by pattern matching (fine classification). Using the time-series pattern of the LPC cepstrum coefficients as the parameter, it is shown that comparison with the standard phoneme patterns using the Bayes discriminant and the Mahalanobis distance is the most useful. The results of a recognition experiment using the segmentation, rough classification, and fine classification are presented. For twenty subjects of both sexes, a mean recognition rate of 78.1 percent was achieved.
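The fine-classification idea, comparing an input pattern with standard phoneme patterns by Mahalanobis distance, can be sketched as below. To stay dependency-free the sketch restricts the covariance to a diagonal (per-dimension variances), which is a simplification and not necessarily what the paper uses; the phoneme labels, means, and variances are hypothetical.

```python
def mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance under an assumed diagonal covariance."""
    return sum((xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))

def classify(x, models):
    """Pick the phoneme whose standard pattern is nearest to x."""
    return min(models, key=lambda name: mahalanobis_sq(x, *models[name]))

# Hypothetical standard patterns: (mean vector, per-dimension variance).
models = {
    "p": ([0.0, 0.0], [1.0, 1.0]),
    "t": ([4.0, 0.0], [16.0, 1.0]),
}

# This point is closer to the mean of "p" in Euclidean terms, but the larger
# variance of "t" along the first dimension makes "t" the nearer class.
print(classify([1.5, 0.0], models))
```

Scaling each dimension by its variance is what distinguishes this from a plain Euclidean nearest-template rule.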


Systems and Computers in Japan | 1999

Practical method of unspecified‐speaker speech recognition on single‐chip DSP

Masakatsu Hoshimi; Maki Yamada; Katsuyuki Niyada

This article describes a speech recognition method using phonemic similarity as a characteristic parameter, and development of a practicable system on a single-chip DSP. High performance of the proposed method was confirmed in a real-life environment. Using phonemic similarity as a characteristic parameter was shown to reduce memory size by 1/32, and the number of calculations by 1/4. As a result, the speech recognition system became implementable on a single-chip DSP. In the word-spotting algorithm, a simple method of posterior randomization was employed (namely, subtraction of a constant from a similarity value found through the correlation cosine). Word recognition using the product of posterior probabilities proved efficient and was implemented in a DSP device. The proposed method involves two-stage processing, and it was shown that the rated performance can be maintained provided that 24 types of standard phoneme patterns at the first stage are adjusted to the type of input noise and S/N ratio.


International Conference on Acoustics, Speech, and Signal Processing | 1997

An experimental bidirectional Japanese/English interpreting video phone system using Internet

Shoji Hiraoka; Masakatsu Hoshimi; Kenji Matsui; Jean-Claude Junqua

We report on an experimental bidirectional Japanese/English interpreting video phone system using the Internet. We particularly emphasize the motivation for this work, the task, and the experiments conducted. Using in-house technology developed both in Japan and in the United States, we demonstrate an Internet home shopping application where an American shop assistant and a Japanese customer engage in task-directed dialogues, using their native languages. The experiments show that when users are familiar with the application language, a natural interaction can be obtained.


Journal of the Acoustical Society of America | 1988

A telephone number information system for private branch exchanges

Shoji Hiraoka; Toshiyuki Morii; Masakatsu Hoshimi; Katsuyuki Niyada

A telephone speech‐recognition information system has been developed using a personal computer, a speech recognizer, and a voice synthesizer. This speech recognizer is speaker independent and has a large vocabulary [S. Hiraoka et al., IEEE ICASSP86, pp. 2.8.1–4]. The user calls the system and gives the surname and division name of the addressee; both words are recognized and the extension number is supplied. Previously developed microphone phoneme templates were exchanged and made telephone compatible. The speech‐sample data, which were used for making the previous templates, were reused and converted to sound via a mouth simulator. The data were then returned via a handset to the computer through a PBX line. The new phoneme templates were validated using 18 individuals' calls of 212 words each, giving a recognition rate of 91.2%. The new effective templates were easily made because the data had been previously labeled. The recognizer produces 15 candidates for each of the two words and the personal compu...
