Publication


Featured research published by Hiroya Fujisaki.


International Conference on Acoustics, Speech, and Signal Processing | 1986

Proposal and evaluation of models for the glottal source waveform

Hiroya Fujisaki; Mats Ljungqvist

Speech analysis for high quality speech synthesis or high accuracy speech recognition requires realistic models not only for the vocal tract but also for the voice source. In the present paper, we investigate models for the glottal volume velocity waveform. Previously proposed models are reviewed and classified according to their level of elaboration in expressing the glottal characteristics. A new model is then proposed which possesses all the important features of previously proposed models. A method is also described for simultaneously estimating the glottal source and vocal tract parameters. Using this method, evaluation of glottal model parameters is carried out on real speech by varying the number of parameters in the proposed model. The results indicate the importance of detailed modeling of the period of glottal closure for accurate analysis.


Speech Communication | 2005

Analysis and synthesis of fundamental frequency contours of Standard Chinese using the command–response model

Hiroya Fujisaki; Changfu Wang; Sumio Ohno; Wentao Gu

While the tonal characteristics of Chinese syllables have been qualitatively described in traditional phonetics, quantitative analysis requires a mathematical model. This paper presents such a model for the fundamental frequency contours of Standard Chinese, based on an extension of a model that has already been shown to be applicable to non-tone languages including Japanese, English, and others. The model allows one to interpret a given fundamental frequency contour in terms of tone commands and phrase commands, and to analyze various tonal phenomena in quantitative terms. The paper then describes the results of analysis of fundamental frequency contours of a number of utterances, revealing systematic relationships between the timing of the tone commands and the final of each syllable. The results are used to derive constraints for tone and phrase command generation in speech synthesis. The validity of introducing these constraints in speech synthesis of Standard Chinese is confirmed by perceptual tests on naturalness of prosody as well as on intelligibility of tones, using speech synthesized with and without these constraints.


International Conference on Acoustics, Speech, and Signal Processing | 2002

A method for automatic extraction of model parameters from fundamental frequency contours of speech

Shuichi Narusawa; Nobuaki Minematsu; Keikichi Hirose; Hiroya Fujisaki

The process of generating the F0 contour of speech has been modeled quite accurately in mathematical terms by Fujisaki and his coworkers, but the extraction of parameters of the underlying commands from an observed F0 contour is an inverse problem that can be solved only by successive approximation. In order to guarantee an efficient and accurate search for the solution, one needs to start with a set of initial values that are close enough to the optimum. This paper presents a method for pre-processing a measured F0 contour to obtain its approximation consisting of third-order polynomial segments that are continuous and differentiable everywhere. It is shown that the proposed method allows one to obtain first-order approximations to the parameters of accent commands for about 90% of all the accent commands, and of phrase commands for about 84% of all the phrase commands.
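To make the idea of "third-order polynomial segments that are continuous and differentiable everywhere" concrete, here is a minimal sketch using cubic Hermite segments between knots. It is not the paper's pre-processing procedure: the knot positions, the finite-difference slope estimate, and the names `knot_slopes`, `hermite_segment`, and `piecewise_cubic` are all assumptions made for illustration.

```python
import bisect

def knot_slopes(ts, ys):
    """Estimate a slope at each knot (central differences inside, one-sided at the ends)."""
    n = len(ts)
    slopes = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        slopes.append((ys[hi] - ys[lo]) / (ts[hi] - ts[lo]))
    return slopes

def hermite_segment(t, t0, t1, y0, y1, m0, m1):
    """Evaluate the cubic that matches values y0, y1 and slopes m0, m1 at t0, t1."""
    h = t1 - t0
    s = (t - t0) / h
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * y0 + h10 * h * m0 + h01 * y1 + h11 * h * m1

def piecewise_cubic(ts, ys):
    """Return a function evaluating a C1 piecewise-cubic curve through (ts, ys).

    ts must be strictly increasing; values outside [ts[0], ts[-1]] are
    extrapolated from the nearest segment.
    """
    if len(ts) < 2 or len(ts) != len(ys):
        raise ValueError("need at least two knots and matching lengths")
    slopes = knot_slopes(ts, ys)

    def evaluate(t):
        # Locate the segment containing t, clamped to the valid range.
        i = bisect.bisect_right(ts, t) - 1
        i = min(max(i, 0), len(ts) - 2)
        return hermite_segment(t, ts[i], ts[i + 1], ys[i], ys[i + 1],
                               slopes[i], slopes[i + 1])

    return evaluate
```

Because adjacent segments share both the value and the slope at each knot, the resulting curve has a continuous first derivative, which is the property the abstract relies on when reading off initial estimates of command timing.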


International Conference on Acoustics, Speech, and Signal Processing | 1987

Estimation of voice source and vocal tract parameters based on ARMA analysis and a model for the glottal source waveform

Hiroya Fujisaki; Mats Ljungqvist

Conventional speech analysis methods based on linear prediction often fail to separate and estimate the source and vocal-tract characteristics, especially in the case of voiced sounds, because of oversimplified assumptions regarding the voice source. We have already proposed a model that is capable of expressing a wide range of voice source characteristics, and demonstrated that source and vocal-tract parameters can be well separated and correctly estimated, for vowel and vowel-like sounds, by combining the proposed source model with the linear predictive analysis. The present paper extends our approach to apply to a wider variety of speech sounds including nasal vowels and nasal consonants, by combining the proposed source model with the ARMA analysis. The validity of the system was demonstrated by analysis of synthetic and natural speech.


International Conference on Acoustics, Speech, and Signal Processing | 1988

Realization of linguistic information in the voice fundamental frequency contour of the spoken Japanese

Hiroya Fujisaki; Hisashi Kawai

Although it has been well known that prosody plays an important role both in the intelligibility and in the naturalness of speech, the process of generating natural prosody from linguistic information has not been fully understood. The authors first define units of prosody of spoken Japanese on the basis of analysis of fundamental frequency contours. Prosodic words are defined by the presence of an accent component, while prosodic phrases and clauses are defined by the presence/absence of a pause and resetting/addition of a phrase component. It is shown that the accent components, representing the information concerning lexical word accent, are modified systematically by the syntactic and the discourse information. Classifications of prosodic boundaries are also presented, and the relationship between these two kinds of boundary is described.


International Conference on Acoustics, Speech, and Signal Processing | 1983

Analysis and synthesis of speech based on spectral transform linear predictive method

Hynek Hermansky; Hiroya Fujisaki; Yasuo Sato

The Linear Predictive (LP) method has been widely used in speech analysis, mainly because of the simple mathematical formulation of the model and the straightforward computation of its parameters. However, there still remain certain difficulties that cause errors in the result of the analysis. Often encountered are the errors due to harmonic structure of the excitation source. The Spectral Transform LP (STLP) method proposed in the present paper aims at reducing these errors. Amplitude transforms on the input spectrum and on the spectrum of the model are introduced to modify the error criterion and the model adopted in the standard LP analysis. We show by analyses of both synthetic and natural speech that the STLP method offers significant improvement over the standard LP method. A method of STLP speech synthesis using the standard LP model is proposed. A perceptual experiment confirms the superiority of the STLP method in analysis of speech.
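For orientation, the standard LP analysis that the abstract takes as its baseline can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is only the conventional baseline, not the STLP method, whose amplitude transforms are not specified in the abstract; the function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one frame of samples."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations for the given autocorrelation sequence.

    Returns (a, err) where a[0] == 1 and the prediction error is
    e[n] = x[n] + sum_{k>=1} a[k] * x[n-k]; err is the residual energy.
    """
    if r[0] <= 0:
        raise ValueError("frame has zero energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

For an autocorrelation sequence of the form r[k] = 0.5**k, the recursion recovers a first-order predictor (a[1] = -0.5) and leaves the second coefficient at zero, which is a quick sanity check on the sign convention.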


International Conference on Acoustics, Speech, and Signal Processing | 1984

Spectral envelope sampling and interpolation in linear predictive analysis of speech

Hynek Hermansky; Hiroya Fujisaki; Yasuo Sato

In spite of its extensive use, speech analysis based on linear prediction (LP) is liable to various causes of inaccuracy. This paper presents a novel approach to improve the accuracy in the estimation of the voiced speech production model based on the LP method. The presented method uses interpolation between spectral points which are least influenced by artifacts in the spectral analysis and by noise in the signal. We show, on analyses of both synthetic and natural speech, that the averaged parabolic approximation between harmonic peaks of voiced speech spectrum reduces the sensitivity of the LP analysis to changes in the fundamental frequency F0 and to noise. The method is well suited for combination with the Spectral Transform LP method, previously proposed by the authors [1].


International Conference on Acoustics, Speech, and Signal Processing | 1992

A scheme for pitch extraction of speech using autocorrelation function with frame length proportional to the time lag

Keikichi Hirose; Hiroya Fujisaki; Shigenobu Seto

Although pitch extraction schemes based on the short-time autocorrelation function offer reliable results for most of the speech signal, the autocorrelation peaks indicating fundamental period fluctuate with frame position, causing occasional pitch extraction errors. In order to reduce these errors, a scheme using a new definition of normalized short-time autocorrelation function is proposed. One of the major advantages of the definition over conventional ones is that the frame length changes in proportion to the time lag, and, therefore, the input speech can be analyzed without any knowledge of the fundamental frequency range of the speaker. Two methods are proposed for the normalization, and the one compensating for variations of the short-time power of the waveform is shown to offer better results. A system for pitch extraction is constructed on a workstation, and the validity of the proposed scheme is demonstrated by experiments using the connected speech of male and female announcers.
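The central idea, a frame whose length grows in proportion to the lag being tested, can be illustrated roughly as follows. The abstract does not give the paper's exact definition or either of its two normalizations, so the energy normalization used here, the `frames_per_lag` factor, and the function names are assumptions.

```python
import math

def normalized_autocorr(x, start, lag, frames_per_lag=2):
    """Normalized autocorrelation at `lag` over a frame of frames_per_lag * lag samples."""
    n = frames_per_lag * lag
    a = x[start:start + n]
    b = x[start + lag:start + lag + n]
    if len(b) < n:
        raise ValueError("signal too short for this lag")
    cross = sum(u * v for u, v in zip(a, b))
    # Normalize by the energies of both frames so power changes do not bias the peak.
    energy = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
    return cross / energy if energy > 0 else 0.0

def estimate_period(x, start, min_lag, max_lag, frames_per_lag=2):
    """Return (lag, value) of the largest normalized autocorrelation in the lag range."""
    best_lag, best_val = None, -2.0
    for lag in range(min_lag, max_lag + 1):
        r = normalized_autocorr(x, start, lag, frames_per_lag)
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag, best_val

# Demo on a synthetic signal: 100 Hz fundamental plus its second harmonic at 8 kHz.
SAMPLE_RATE = 8000
demo = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE + 0.3)
        for n in range(800)]
demo_lag, demo_peak = estimate_period(demo, start=0, min_lag=20, max_lag=120)
print(demo_lag, SAMPLE_RATE / demo_lag)
```

Since every candidate lag is evaluated over the same number of periods of itself, no fixed frame length has to be chosen in advance for a particular speaker. The lag search range in the demo is kept below two periods of the test signal, because a purely periodic input scores equally well at integer multiples of its period.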


International Conference on Acoustics, Speech, and Signal Processing | 1993

Analysis and modeling of word accent and sentence intonation in Swedish

Hiroya Fujisaki; Mats Ljungqvist; Hiroshi Murata

In Swedish, the fundamental frequency contour (F0 contour) is known to be the main acoustic feature for word accent and sentence intonation. The authors propose a model for the generation process of F0 contours of Swedish, based on an extension of the model for Japanese. As the input to this model, two kinds of command are assumed: phrase commands, which are always positive impulses, and accent commands, which are stepwise functions of both polarities. Analysis-by-synthesis of F0 contours of both isolated words and sentences, uttered by two native speakers from the Stockholm region, indicated that the model can always generate very close approximations to observed F0 contours, and that the extracted parameters are systematically related to the underlying lexical word accent, syntactic structure, and focus. This demonstrates the basic validity of the model and its utility in speech synthesis.


International Conference on Acoustics, Speech, and Signal Processing | 1982

Analysis and synthesis of voice fundamental frequency contours of spoken sentences

Keikichi Hirose; Hiroya Fujisaki

A model is presented for the analysis and synthesis of F0 contours of spoken sentences. It is based on a quantitative formulation of the process whereby the logarithmic fundamental frequency is controlled by the phrase and accent commands, being respectively related to the syntactic and lexical information of a sentence. Analysis of natural utterances revealed that the model can generate close approximations to observed F0 contours. Requirements for the accuracy of model parameters were determined by listening tests of synthetic utterances. These results indicate the model's utility in speech synthesis.
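The structure described here, with log F0 formed as a baseline plus responses to phrase and accent commands, can be sketched in a few lines. The response functions and the default constants (`alpha`, `beta`, `gamma`) follow the form commonly cited for this model in later literature and are not given in the abstract, so treat them as assumptions. The function names are illustrative.

```python
import math

def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, limited to the ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def f0_contour(times, fb, phrase_cmds, accent_cmds,
               alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at each time in `times`.

    fb          -- baseline frequency in Hz
    phrase_cmds -- list of (onset_time, magnitude) impulses
    accent_cmds -- list of (onset_time, offset_time, amplitude) steps
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        for t0, ap in phrase_cmds:
            log_f0 += ap * phrase_response(t - t0, alpha)
        for t1, t2, aa in accent_cmds:
            log_f0 += aa * (accent_response(t - t1, beta, gamma)
                            - accent_response(t - t2, beta, gamma))
        contour.append(math.exp(log_f0))
    return contour
```

A call such as `f0_contour([0.01 * i for i in range(200)], 100.0, [(0.0, 0.5)], [(0.3, 0.6, 0.4)])` yields a slowly declining phrase component with one local accent rise superposed on it. Before any command onset the contour stays at the baseline `fb`.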

Collaboration


Dive into Hiroya Fujisaki's collaboration.

Top Co-Authors

Sumio Ohno
Tokyo University of Technology

Hansjörg Mixdorff
Humboldt University of Berlin

Changfu Wang
University of Science and Technology of China

Hisashi Kawai
National Institute of Information and Communications Technology

Kenji Abe
Tokyo University of Science