Ryo Mochizuki
Panasonic
Publication
Featured research published by Ryo Mochizuki.
Journal of the Acoustical Society of America | 2003
Toshimitsu Minowa; Hirofumi Nishimura; Ryo Mochizuki
A method and apparatus for speech synthesis utilize a plurality of stored prosodic templates, each generated from a series of enunciations of a single syllable executed in accordance with the rhythm, pitch and speech power variations of an enunciated sample speech item, whereby the templates express the rhythm, speech power and pitch characteristics of respectively different sample speech items. Data representing an object speech item are converted to a sequence of acoustic waveform segments which respectively express the syllables of the speech item. The number of morae (syllable intervals) and the accent type of the speech item are judged, and a prosodic template having the same number of morae and accent type is selected. Waveform shaping is then applied to the waveform segments so as to match the rhythm, speech power and pitch characteristics of the object speech item to those expressed by the selected prosodic template. The shaped acoustic waveform segments are then linked to form a continuous acoustic waveform, thereby obtaining synthesized speech which closely resembles natural speech.
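The template selection step in the abstract above (match on number of morae and accent type) can be sketched as a simple lookup. This is only an illustration under stated assumptions: the `ProsodicTemplate` class, its field names, and the `select_template` helper are hypothetical and do not come from the patent, which does not say how templates are stored or searched.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ProsodicTemplate:
    """Hypothetical container; the field names are illustrative assumptions."""
    mora_count: int
    accent_type: int
    durations: Tuple[float, ...]  # rhythm: one value per mora
    pitches: Tuple[float, ...]    # pitch: one value per mora
    powers: Tuple[float, ...]     # speech power: one value per mora


def select_template(
    templates: Sequence[ProsodicTemplate],
    mora_count: int,
    accent_type: int,
) -> Optional[ProsodicTemplate]:
    """Return the first template with the same mora count and accent type."""
    for template in templates:
        if template.mora_count == mora_count and template.accent_type == accent_type:
            return template
    return None  # no stored template matches the object speech item
```

The subsequent waveform shaping step is not sketched here, because the abstract does not give enough detail to reproduce it.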
International Conference on Spoken Language Processing | 1996
Yasuhiko Arai; Ryo Mochizuki; Hirofumi Nishimura; Takashi Honda
A novel pitch waveform extraction method is proposed. Unlike conventional pitch mark decision algorithms, such as the peak search method, the new algorithm decides excitation points based on the phase-equalized residual excited linear prediction (PE-RELP) model. A pitch waveform is extracted from two adjacent excitation intervals by using an asymmetrical Hanning window. The new method is free from the extraction errors caused by formant resonance and is fully automatic. Therefore, no manual manipulation is required and no roughness is heard in the pitch-modified speech sound. The superiority of the new method has been confirmed by means of spectral distortion measurement and subjective quality evaluation. Finally, spoken word generation by means of VCV (vowel-consonant-vowel) waveform concatenation is demonstrated, showing that the generation of very natural-sounding spoken words is possible.
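One plausible reading of the asymmetrical Hanning window mentioned in the abstract above is a window whose rising half spans the preceding excitation interval and whose falling half spans the following one, peaking at the central excitation point. The sketch below assumes that reading; the abstract does not specify the exact window shape. The function names are hypothetical, and the excitation points are taken as given inputs rather than derived from a PE-RELP analysis.

```python
import math
from typing import List, Sequence


def asymmetric_hann(left: int, right: int) -> List[float]:
    """Window of length left + right with its peak (1.0) at index `left`.

    The rising part is the first half of a Hann window of length 2 * left,
    the falling part is the second half of a Hann window of length 2 * right.
    """
    rise = [0.5 - 0.5 * math.cos(math.pi * n / left) for n in range(left)]
    fall = [0.5 + 0.5 * math.cos(math.pi * n / right) for n in range(right)]
    return rise + fall


def extract_pitch_waveform(
    signal: Sequence[float], prev_mark: int, mark: int, next_mark: int
) -> List[float]:
    """Cut one pitch waveform spanning two adjacent excitation intervals.

    prev_mark, mark and next_mark are sample indices of three consecutive
    excitation points (assumed to be known already).
    """
    window = asymmetric_hann(mark - prev_mark, next_mark - mark)
    return [signal[prev_mark + i] * w for i, w in enumerate(window)]
```

Because the two halves have independent lengths, the window follows the local pitch period even when adjacent excitation intervals differ in length.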
Journal of the Acoustical Society of America | 2007
Ryo Mochizuki; Toshiyuki Isono; Hirofumi Nishimura
A speech synthesis apparatus (10) comprises speech segment disassembling means (101) for disassembling speech segments, each including at least one phoneme, into a plurality of pitch waveforms, phase characteristic transforming means (103) for transforming the phase characteristics of the pitch waveforms into a uniform phase characteristic, pitch waveform classifying means (104) for classifying the pitch waveforms into a plurality of groups, pitch waveform registering means (106) for registering the pitch waveforms in the database (111) by extracting one pitch waveform from among the pitch waveforms in each of the groups, and synthesizing means (107) for synthesizing the speech with the pitch waveforms registered in the database (111). The speech synthesis apparatus (10) thus constructed can synthesize natural-sounding speech using a relatively small database capacity.
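The idea of classifying pitch waveforms into groups and registering one waveform per group can be illustrated with a greedy grouping by Euclidean distance. This is an assumption made for illustration only: the abstract does not state the classification criterion or how the representative is chosen, and `select_representatives` and its `threshold` parameter are hypothetical. Waveforms are assumed to have equal length and already-equalized phase.

```python
import math
from typing import List, Sequence


def select_representatives(
    waveforms: Sequence[Sequence[float]], threshold: float
) -> List[Sequence[float]]:
    """Keep one waveform per group of mutually similar waveforms.

    A waveform starts a new group when it is farther than `threshold`
    from every representative kept so far; otherwise it is treated as
    a member of an existing group and is not stored.
    """
    representatives: List[Sequence[float]] = []
    for waveform in waveforms:
        if all(math.dist(waveform, rep) > threshold for rep in representatives):
            representatives.append(waveform)
    return representatives
```

Storing only the representatives is what would keep the database small in this sketch: similar pitch waveforms collapse to a single stored entry.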
Journal of the Acoustical Society of America | 1998
Yasuhiko Arai; Ryo Mochizuki; Takashi Honda
A speech output system in which messages are composed by embedding phonetic words in prerecorded sentence patterns requires the embedded phonetic words to sound very natural. Especially where a very large or an unlimited vocabulary is required, synthesis of very natural‐sounding phonetic words is desired. Therefore, a novel Japanese VCV‐concatenation synthesis method based on pitch waveform concatenation has been studied. The VCV waveforms are extracted from the VCV‐balanced spoken word database and pitch waveforms are extracted by means of the excitation synchronous pitch waveform extraction method [Y. Arai et al., ICSLP 96, 1437–1440 (1996)]. The wavelength of each pitch waveform is modified depending on the pitch modification rate, and the global pitch contour of each VCV waveform is modified along with the target pitch pattern, while the micro‐prosody and the small perturbation in pitch frequency inherent in that VCV waveform are retained. The auditory tests confirmed ...
Archive | 2010
Ryo Mochizuki; Shusaku Okamoto
Archive | 2010
Atsushi Nojiri; Ryo Mochizuki; Yashio Kishi; Shinya Aizawa
Archive | 1999
Toshimitsu Minowa; Ryo Mochizuki; Hirofumi Nishimura
Systems and Computers in Japan | 2007
Ryo Mochizuki; Tadashi Okubo; Tetsunori Kobayashi
IEICE Transactions on Information and Systems | 2004
Ryo Mochizuki; Tetsunori Kobayashi
Archive | 2001
Ryo Mochizuki; Toshiyuki Isono; Hirofumi Nishimura