
Publication


Featured research published by Yasunori Ohora.


Journal of the Acoustical Society of America | 1999

Method and apparatus for recognizing previously unrecognized speech by requesting a predicted-category-related domain-dictionary-linking word

Masayuki Yamada; Yasunori Ohora; Yasuhiro Komori

A voice communication method includes the steps of: inputting speech into an apparatus; recognizing the input speech using a first dictionary; predicting the category of an unrecognized word included in the input speech, based on the result of the recognition step; outputting a question asking an operator to input a word that is included in the first dictionary and that, based on the predicted category, can specify a second dictionary for recognizing the unrecognized word; and re-recognizing the unrecognized word with the second dictionary specified in response to the word input by the operator. The invention also relates to an apparatus performing these functions and to a computer program product instructing a computer to perform them.
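A minimal sketch of this two-pass flow. The dictionary contents, the category names, and the `predict_category` rule are illustrative assumptions, not details from the patent:

```python
# First-pass vocabulary and hypothetical domain dictionaries selected
# by a dictionary-linking word (here, the category name itself).
FIRST_DICTIONARY = {"play", "song", "by", "artist"}
DOMAIN_DICTIONARIES = {
    "artist": {"beatles", "queen"},
    "song": {"yesterday", "bohemian"},
}

def predict_category(recognized):
    """Guess the category of the unknown word from recognized context."""
    return "artist" if "by" in recognized else "song"

def recognize(utterance_words):
    """First-pass recognition, category prediction, and re-recognition."""
    recognized = [w for w in utterance_words if w in FIRST_DICTIONARY]
    unknown = [w for w in utterance_words if w not in FIRST_DICTIONARY]
    if not unknown:
        return recognized, None
    category = predict_category(recognized)
    # The apparatus would ask the operator for a linking word here;
    # we simulate the operator answering with the predicted category.
    second_dict = DOMAIN_DICTIONARIES[category]
    rerecognized = [w for w in unknown if w in second_dict]
    return recognized + rerecognized, category
```

For instance, `recognize(["play", "song", "by", "queen"])` falls back to the "artist" dictionary to recover the word missing from the first dictionary.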


Journal of the Acoustical Society of America | 1998

Speech synthesis apparatus and method for synthesizing speech from a character series comprising a text and pitch information

Mitsuru Otsuka; Yasunori Ohora; Takashi Aso; Toshiaki Fukada

A speech synthesis method and apparatus for synthesizing speech from a character series comprising a text and pitch information. The apparatus includes a parameter generator for generating power spectrum envelopes as parameters of a speech waveform to be synthesized representing the input text in accordance with the input character series. The apparatus also includes a pitch waveform generator for generating pitch waveforms whose period equals the pitch specified by the pitch information. The pitch waveform generator generates the pitch waveforms from the input pitch information and the power spectrum envelopes generated by the parameter generator. Also provided is a speech waveform output device for outputting the speech waveform obtained by connecting the generated pitch waveforms.
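The pitch-waveform step can be sketched as follows, under the illustrative simplification that the power spectrum envelope is reduced to a list of harmonic magnitudes: one period is a sum of cosines whose fundamental equals the specified pitch, and the output connects such periods.

```python
import math

def pitch_waveform(envelope, period):
    """One pitch period built from envelope magnitudes.

    `envelope` holds harmonic magnitudes (a stand-in for the parameter
    generator's power spectrum envelopes); the waveform is a sum of
    harmonics of the fundamental, so its length equals the pitch period.
    """
    return [
        sum(a * math.cos(2 * math.pi * (k + 1) * n / period)
            for k, a in enumerate(envelope))
        for n in range(period)
    ]

def synthesize(envelope, period, n_periods):
    """Connect identical pitch waveforms into the output speech waveform."""
    return pitch_waveform(envelope, period) * n_periods
```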


Journal of the Acoustical Society of America | 1998

Syllable-beat-point synchronized rule-based speech synthesis from coded utterance-speed-independent phoneme combination parameters

Mitsuru Ohtsuka; Yasunori Ohora; Takashi Asou; Takeshi Fujita; Toshiaki Fukada

In a speech synthesizer, each frame for generating a speech waveform has an expansion degree by which the frame is expanded or compressed in accordance with the production speed of synthetic speech. For the set speech production speed, the time interval between beat synchronization points is determined from the speed of the speech to be produced, and the time length of each frame between the beat synchronization points is determined from the frame's expansion degree. Parameters for producing a speech waveform in each frame are then generated according to the time length determined for that frame. In a speech synthesizer that outputs a speech signal by coupling phonemes constituted by one or more frames holding vowel-consonant combination parameters (VcV, cV, or V) of the speech waveform, the number of frames can thus be held constant regardless of changes in the speech production speed. This prevents degradation of tone quality and variation in processing load caused by changes in the speech production speed.
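The time-allocation idea can be sketched in a few lines. The proportional-share rule below is an illustrative assumption; the patent only requires that frame lengths follow the expansion degrees while the frame count stays fixed:

```python
def frame_durations(expansion_degrees, interval):
    """Distribute the time between two beat synchronization points.

    Each frame receives a share of the interval proportional to its
    expansion degree, so the number of frames stays constant while the
    total duration tracks the speech production speed.
    """
    total = sum(expansion_degrees)
    return [interval * d / total for d in expansion_degrees]
```

Halving `interval` (faster speech) compresses every frame by the same factor without dropping any frame.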


Journal of the Acoustical Society of America | 1998

Method and apparatus for speech processing

Takashi Aso; Yasunori Ohora; Takeshi Fujita

An apparatus and method for processing vocal information includes an extractor for extracting plural pieces of spectrum information from parameters of vocal information, a vector quantizer for vector-quantizing the extracted spectrum information and producing a plurality of parameter patterns from it, a memory for storing the parameter patterns so obtained, and a memory for storing positional information indicating where the parameter patterns are stored, together with code information specifying the parameter patterns and corresponding to the positional information. The parameter patterns and code information can be used to synthesize speech. Because only a small number of parameter patterns are used, little memory capacity is needed and vocal information can be processed efficiently.
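The vector-quantization step amounts to a nearest-pattern lookup; a minimal sketch, with an illustrative codebook and codebook training (e.g. k-means) omitted:

```python
def quantize(vectors, codebook):
    """Replace each spectrum vector by the index (code) of its nearest
    parameter pattern, so only the small codebook plus short codes need
    to be stored instead of the full spectrum sequence."""
    def dist2(a, b):
        # Squared Euclidean distance between two parameter vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda i: dist2(v, codebook[i]))
            for v in vectors]
```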


Journal of the Acoustical Society of America | 2004

Recognizing speech data using a state transition model

Yasuhiro Komori; Yasunori Ohora; Masayuki Yamada

Detecting an unknown word in input speech data reduces the search space and the memory capacity required for the unknown word. For this purpose, an HMM data memory stores data describing a state transition model for the unknown word, defined by a number of states and the transition probabilities between the states. An output probability calculation unit acquires, at each time of the speech data, the state of maximum likelihood among the plural states of the state transition model for a known word, as employed in the speech recognition of the known word. The obtained result is applied to the state transition model for the unknown word, stored in the HMM data memory, to obtain a likelihood for the unknown word. A second output probability calculation unit determines the likelihood of the state transition model for the known word. A language search unit then performs the language search, utilizing the likelihoods determined by the two output probability calculation units, in the portions where the dictionary permits the presence of an unknown word.
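The core idea, reusing the best known-word state score per frame as the unknown-word score, can be sketched as below. The flat per-frame penalty is an illustrative assumption standing in for the unknown-word model's transition probabilities:

```python
def garbage_log_likelihood(frame_state_loglikes, penalty_per_frame=0.7):
    """Score an unknown word from known-word state likelihoods.

    `frame_state_loglikes[t]` holds the log output probabilities of all
    known-word states at frame t, already computed during normal
    recognition, so no extra acoustic computation is needed. The best
    state per frame is taken and a simple penalty keeps the garbage
    score from dominating genuine known-word hypotheses.
    """
    best = [max(frame) for frame in frame_state_loglikes]
    return sum(best) - penalty_per_frame * len(best)
```

A language search can then compare this score against known-word scores wherever the dictionary allows an unknown word.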


Journal of the Acoustical Society of America | 1995

Speech synthesizer and method for synthesizing speech for superposing and adding a waveform onto a waveform obtained by delaying a previously obtained waveform

Takashi Aso; Yasunori Ohora

A speech synthesizer includes a first indicator for indicating the amplitude of a waveform by using a random number, a second indicator for indicating the superposition period for waveforms by using a random number, a waveform generator for generating first and second waveforms having an amplitude indicated by the first indicator, and a waveform superposition device for synthesizing an unvoiced speech waveform by superposing the second waveform generated by the waveform generator onto a waveform obtained by delaying the first waveform by the superposition period indicated by the second indicator. The speech synthesizer is capable of making the frequency characteristic of unvoiced speech analogous to that of white noise, and of generating synthesized speech that is natural and analogous to an actual human voice.
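A minimal sketch of the overlap-add idea: elementary waveforms with random amplitudes are superposed at randomly delayed positions. The triangular pulse shape and the delay range are illustrative assumptions:

```python
import random

def unvoiced_waveform(n_pulses, length, seed=0):
    """Overlap-add of random-amplitude waveforms at random delays.

    Randomizing both amplitude (the first indicator) and superposition
    period (the second indicator) flattens the spectrum toward white
    noise, which is what makes the result sound unvoiced.
    """
    rng = random.Random(seed)
    pulse = [0.5, 1.0, 0.5]          # illustrative elementary waveform
    out = [0.0] * length
    pos = 0
    for _ in range(n_pulses):
        amp = rng.uniform(-1.0, 1.0)  # random amplitude
        pos += rng.randint(1, 4)      # random superposition period
        for i, p in enumerate(pulse):
            if pos + i < length:
                out[pos + i] += amp * p
    return out
```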


Journal of the Acoustical Society of America | 1995

Speech perception apparatus

Yasunori Ohora; Koichi Miyashiba

Speech perception is performed by obtaining the similarity between an input speech and a reference speech and discriminating the input speech; selectively outputting discrimination information or color information corresponding to the derived similarity; and displaying the selected discrimination or color information with a predetermined symbol or color. With this apparatus, the speaker can see both the quality of the utterance and the accuracy of the apparatus's speech perception.


Speech Communication | 1994

A spoken dialogue system with active/non-active word control for CD-ROM information retrieval

Masayuki Yamada; Fumiaki Itoh; Keiichi Sakai; Yasuhiro Komori; Yasunori Ohora; Minoru Fujita

This paper describes the development of a spoken dialogue travel guidance system, TARSAN. TARSAN uses commercial CD-ROM guidebooks, which contain a large amount of travel information, as its knowledge source. To deal with this amount of information, the speech recognizer has to accept a large vocabulary without losing performance. We therefore propose a two-step active/non-active word control method: (1) a word/grammar prediction strategy, and (2) an unknown-word re-evaluation algorithm. The word/grammar prediction strategy dynamically changes the recognition network according to the conversation situation, making use of results retrieved from the CD-ROMs; this enables users to access almost all data on the CD-ROMs with a small-vocabulary speech recognizer. The unknown-word re-evaluation algorithm handles unknown words and non-active words by integrating garbage models into the recognition network; once a garbage model is recognized, the unknown portion is compared against the non-active words. This algorithm enhances the word/grammar prediction. In an experiment without garbage models, 80.9% of the utterances were correctly understood. In the unknown-word re-evaluation experiment using garbage models, 86.4% were correctly re-evaluated, with a false-alarm rate of 5%.
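The word/grammar prediction step might be sketched as rebuilding the recognizer's active vocabulary from whatever the last CD-ROM query returned. The record structure and the `names` field are assumptions for illustration:

```python
def predict_active_words(retrieved_records, base_vocabulary):
    """Activate only the words the current dialogue situation can need.

    The base vocabulary (command and function words) is always active;
    content words are activated dynamically from the records retrieved
    by the previous query, keeping the recognition network small.
    """
    active = set(base_vocabulary)
    for record in retrieved_records:
        active.update(record.get("names", []))
    return active
```

All other content words stay non-active and are reachable only through the garbage-model re-evaluation path.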


international conference on acoustics speech and signal processing | 1996

Fast output probability computation using scalar quantization and independent dimension multi-mixture

Masayuki Yamada; Hiroki Yamamoto; Tetsuo Kosaka; Yasuhiro Komori; Yasunori Ohora

In this paper, we propose a high-speed output probability computation algorithm for multi-mixture continuous HMMs. The algorithm adopts three techniques: (1) independent-dimension multi-mixture computation (IDMM), (2) scalar quantization (SQ), and (3) output probability recalculation. In the first step, the state probability is roughly estimated using IDMM and SQ. IDMM is an approximate computation of the multi-mixture probability density function which, together with SQ, realizes fast probability estimation. The rough estimates are used to select the states with high output probabilities, and the exact probability calculation is then carried out only on the selected states. Our experiment on speaker-independent continuous speech recognition showed that the proposed algorithm saves 81% of the time for the output probability computation and 71% of the total speech recognition time, without degrading the recognition rate.
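Under simplifying assumptions (the scalar-quantization tables are replaced by direct per-dimension evaluation, and states are diagonal-covariance Gaussian mixtures with a max approximation over mixtures), the two-pass scheme might look like:

```python
import math

def gaussian_logprob(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def exact_state_logprob(x, state):
    """Full multi-mixture log output probability (max over mixtures)."""
    return max(
        math.log(weight) + sum(
            gaussian_logprob(xi, m, v)
            for xi, m, v in zip(x, means, vars_))
        for weight, means, vars_ in state)

def idmm_logprob(x, state):
    """IDMM rough estimate: pick the best mixture per dimension
    independently. This over-estimates but is cheap; with scalar
    quantization each term becomes one table lookup (tables omitted
    here for brevity)."""
    return sum(
        max(gaussian_logprob(xi, means[d], vars_[d])
            for _, means, vars_ in state)
        for d, xi in enumerate(x))

def fast_output_probs(x, states, keep=2):
    """Two-pass computation: rough IDMM scores select candidate states,
    then the exact probability is recalculated only for those."""
    rough = sorted(range(len(states)),
                   key=lambda i: idmm_logprob(x, states[i]), reverse=True)
    return {i: exact_state_logprob(x, states[i]) for i in rough[:keep]}
```

Because IDMM never underestimates a state's score, states pruned in the rough pass are unlikely to have been the best ones, which is why the recognition rate survives the pruning.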


Journal of the Acoustical Society of America | 1994

Speech synthesis apparatus and method

Tetsuo Kosaka; Atsushi Sakurai; Junichi Tamura; Yasunori Ohora; Takeshi Fujita; Takashi Aso; Katsuhiko Kawasaki
