Jiri Pribil
University of West Bohemia
Publication
Featured research published by Jiri Pribil.
international conference on telecommunications | 2012
Jiri Pribil; Anna Pribilova
The paper is aimed at the statistical analysis and comparison of formant features (FF), which describe vocal tract characteristics in emotional and neutral speech of a male and a female voice. The experiment was realized using Czech and Slovak speech material extracted from stories performed by professional actors. The statistical results and values will be used for classification of emotional speech types.
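The paper does not give implementation details, but as a rough illustration of how formant features of the kind analysed here can be extracted, the sketch below estimates the first few formant frequencies of one speech frame by linear prediction (autocorrelation method) using numpy/scipy. The sampling rate, LPC order, window, and function names are assumptions for illustration only, not taken from the paper.

```python
# A minimal sketch of LPC-based formant estimation, assuming a mono speech
# frame sampled at 16 kHz; all parameter values are illustrative assumptions.
import numpy as np
from scipy.linalg import toeplitz, solve

def lpc_formants(frame, fs=16000, order=12, n_formants=3):
    """Estimate the first few formant frequencies of one speech frame."""
    frame = frame * np.hamming(len(frame))          # reduce spectral leakage
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve(toeplitz(r[:order]), r[1:order + 1])  # autocorrelation-method LPC
    roots = np.roots(np.concatenate(([1.0], -a)))   # roots of the prediction polynomial
    roots = roots[np.imag(roots) > 0.01]            # one root per complex-conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    return freqs[:n_formants]                       # lowest resonances ~ formants

# Example on a noisy synthetic frame standing in for a real vowel segment
rng = np.random.default_rng(0)
t = np.arange(0, 0.03, 1.0 / 16000)
frame = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
frame += 0.01 * rng.standard_normal(len(t))
print(lpc_formants(frame))
```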
Archive | 2011
Jiri Pribil; Anna Pribilova
The methods of analysis of the human voice are based on knowledge of speaker individuality. One of the basic studies on speaker acoustic characteristics can be found in (Kuwabara & Sagisaka 1995). According to it, voice individuality is affected by the voice source (the average pitch frequency, the pitch contour, the pitch frequency fluctuation, the glottal wave shape) and by the vocal tract (the shape of the spectral envelope and spectral tilt, the absolute values of the formant frequencies, the formant trajectories, the long-term average speech spectrum, the formant bandwidths). The most important factors for individuality are the pitch frequency and the resonance characteristics of the vocal tract, though the order of these two factors differs between research studies.

According to (Scherer 2003), larynx and pharynx expansion, relaxation of the vocal tract walls, and upward retraction of the mouth corners lead to a falling first formant and rising higher formants during pleasant emotions. On the other hand, larynx and pharynx constriction, tension of the vocal tract walls, and downward retraction of the mouth corners lead to a rising first formant and falling higher formants for unpleasant emotions. Thus, the first formant and the higher formants of emotional speech shift in opposite directions in the frequency ranges divided by a frequency lying between the first and the second formant. In practice, the formant frequencies differ to some extent between languages and their ranges overlap.

According to (Stevens 1997), the frequency of vibration of the vocal folds during normal speech production is usually in the range of 80–160 Hz for adult males, 170–340 Hz for adult females, and 250–500 Hz for younger children. This means that female pitch frequencies are about twice the male pitch frequencies, and pitch frequencies of younger children are about 1.5 times higher than those of females and about 3 times higher than those of males. As regards the formant frequencies, females have them on average 20 % higher than males, but the relation between male and female formant frequencies is non-uniform and deviates from a simple scale factor (Fant 2004).

The emotional state of a speaker is accompanied by physiological changes affecting respiration, phonation, and articulation. These acoustic changes are transmitted to the ears of the listener and perceived via the auditory perceptual system (Scherer 2003). From the literature and from our experiments it follows that different types of emotions are manifested not only in prosodic patterns (F0, energy, duration) and several voice quality features (e.g. jitter, shimmer, glottal-to-noise excitation ratio, Hammarberg index) (Li et al. 2007) but also by significant
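As a small illustration of the pitch ranges quoted above from (Stevens 1997), the sketch below maps a mean F0 value to a coarse speaker group. The handling of the overlapping female/child range and the function name are illustrative assumptions, not part of the cited study.

```python
# A minimal sketch turning the quoted F0 ranges (Stevens 1997) into a rough
# speaker-group heuristic; the overlap handling is an illustrative assumption.
def speaker_group_from_f0(mean_f0_hz: float) -> str:
    """Map a mean pitch frequency to the coarse speaker groups quoted above."""
    if 80 <= mean_f0_hz < 170:
        return "adult male"
    if 170 <= mean_f0_hz <= 250:
        return "adult female"
    if 250 < mean_f0_hz <= 340:
        return "adult female or child"   # female and child ranges overlap here
    if 340 < mean_f0_hz <= 500:
        return "child"
    return "outside the quoted ranges"

print(speaker_group_from_f0(120))   # -> adult male
print(speaker_group_from_f0(300))   # -> adult female or child
```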
international conference on telecommunications | 2015
Jiri Pribil; Anna Pribilova; Jindrich Matousek
The paper describes an experiment with a statistical approach based on analysis of variance (ANOVA) and hypothesis tests for detection of artefacts in synthetic speech produced by the Czech text-to-speech system employing the unit selection principle. In addition, the paper analyses the influence of different speech spectral features and supra-segmental parameters, as well as the length of the feature vector, on the resulting artefact detection accuracy. Other factors that can also influence the stability of the artefact detection process are analysed, too. The results of the performed experiments confirm that the chosen concept works properly and that the presented artefact detector can be used as an alternative to standard listening tests.
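The detector itself is specified in the paper; the sketch below only illustrates the general idea of an ANOVA-based check on per-frame features with scipy, where a statistically significant difference between a reference region and a tested region of the synthetic signal is taken as a possible artefact. The feature choice, threshold, and names are assumptions for illustration only.

```python
# A schematic sketch of an ANOVA-based artefact check, assuming per-frame
# spectral feature vectors are already extracted; alpha is an assumption.
import numpy as np
from scipy.stats import f_oneway

def frames_differ(reference_frames, tested_frames, alpha=0.05):
    """Return True if any feature dimension differs significantly between the
    reference frames and the tested frames (one-way ANOVA per dimension)."""
    reference_frames = np.asarray(reference_frames)
    tested_frames = np.asarray(tested_frames)
    for dim in range(reference_frames.shape[1]):
        _, p_value = f_oneway(reference_frames[:, dim], tested_frames[:, dim])
        if p_value < alpha:
            return True      # significant difference -> possible artefact
    return False

# Example with random data standing in for e.g. cepstral feature frames
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(200, 13))
suspect = rng.normal(0.5, 1.0, size=(200, 13))   # shifted mean in every dimension
print(frames_differ(clean, suspect))             # -> True
```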
international conference on acoustics, speech, and signal processing | 2011
Jiri Pribil; Anna Pribilova
This paper analyzes and compares spectral properties (positions of the first three formants and spectral flatness measure values) and prosodic parameters (F0, energy, microintonation, and jitter) of male and female acted emotional speech in the Czech and Slovak languages. The statistical results and parameter ratios will be used for modification of the text-to-speech (TTS) system enabling expressive speech production with male/female voices, based on a cepstral speech description.
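One of the spectral properties mentioned above, the spectral flatness measure, is the ratio of the geometric to the arithmetic mean of the power spectrum. The sketch below shows a minimal numpy version computed on a single frame; the window and FFT handling are assumptions, not taken from the paper.

```python
# A minimal sketch of the spectral flatness measure (SFM) on one frame.
import numpy as np

def spectral_flatness(frame, eps=1e-12):
    """SFM of one frame: close to 1 for white noise, close to 0 for a pure tone."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2 + eps
    geometric_mean = np.exp(np.mean(np.log(spectrum)))
    arithmetic_mean = np.mean(spectrum)
    return geometric_mean / arithmetic_mean

rng = np.random.default_rng(1)
noise = rng.normal(size=1024)
tone = np.sin(2 * np.pi * 0.1 * np.arange(1024))
print(spectral_flatness(noise))  # close to 1
print(spectral_flatness(tone))   # close to 0
```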
2016 First International Workshop on Sensing, Processing and Learning for Intelligent Machines (SPLINE) | 2016
Jiri Pribil; Anna Pribilova; Jindrich Matousek
This paper describes an experiment using Gaussian mixture models (GMM) for classification of speaker gender/age and for evaluation of the success achieved in the voice conversion process. The main motivation of the work was to test whether this type of classifier can be utilized as an alternative to the conventional listening test in the area of speech evaluation. The proposed two-level GMM classifier was first verified for detection of four age categories (child, young, adult, senior) as well as discrimination of gender for all but children's voices in the Czech and Slovak languages. Then the classifier was applied for gender/age determination of the basic adult male/female original speech together with its conversion. The obtained classification accuracy confirms the usability of the proposed evaluation method and the effectiveness of the performed voice conversions.
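The sketch below is a hedged illustration of a two-level GMM classifier in the spirit of the approach described above: the first level picks the gender, the second level picks an age category within that gender, each decision taken by the class model with the highest log-likelihood. The feature extraction, the exact category split, and all hyper-parameters are assumptions for illustration only, not the paper's configuration.

```python
# A hedged sketch of a two-level GMM gender/age classifier with scikit-learn;
# random data stands in for real speech feature vectors.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(features_by_class, n_components=8):
    """Train one GMM per class label on its feature matrix (frames x dims)."""
    return {label: GaussianMixture(n_components=n_components,
                                   covariance_type="diag",
                                   random_state=0).fit(X)
            for label, X in features_by_class.items()}

def classify(gmms, X):
    """Pick the class whose GMM gives the highest mean log-likelihood."""
    return max(gmms, key=lambda label: gmms[label].score(X))

rng = np.random.default_rng(0)
# Level 1: gender models; level 2: age models per gender (stand-in data)
gender_gmms = train_gmms({"male": rng.normal(0, 1, (500, 13)),
                          "female": rng.normal(2, 1, (500, 13))})
age_gmms = {"male": train_gmms({"young": rng.normal(-1, 1, (500, 13)),
                                "adult": rng.normal(0, 1, (500, 13)),
                                "senior": rng.normal(1, 1, (500, 13))}),
            "female": train_gmms({"young": rng.normal(1, 1, (500, 13)),
                                  "adult": rng.normal(2, 1, (500, 13)),
                                  "senior": rng.normal(3, 1, (500, 13))})}

utterance = rng.normal(0, 1, (120, 13))       # stand-in for one test utterance
gender = classify(gender_gmms, utterance)     # first level: gender
age = classify(age_gmms[gender], utterance)   # second level: age within gender
print(gender, age)
```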
international conference on telecommunications | 2017
Jiri Pribil; Anna Pribilova; Ivan Frollo
The paper is focused on evaluating the success of de-noising of the speech signal recorded in an open-air magnetic resonance imager during phonation for 3D human vocal tract modelling. Automatic evaluation methods based on classification by Gaussian mixture models (GMM) are described in more detail. The first experiments performed have confirmed that the proposed GMM classifier of speech quality is functional and fully comparable with the standard evaluation based on the listening test. Our investigations have shown a relatively strong influence of the number of mixtures and the type of covariance matrix on the computational complexity. However, these parameters had less effect on the variability of the GMM classifier results for different de-noising methods, and less effect on the gender invariability of the results.
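The observation above about computational complexity can be illustrated by simply timing GMM training for different numbers of mixtures and covariance matrix types, as in the sketch below. The data are random stand-ins for speech feature vectors; nothing here reproduces the paper's actual experiment.

```python
# An illustrative timing sketch: effect of the number of mixtures and the
# covariance matrix type on GMM training time (random stand-in data).
import time
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 13))      # stand-in for de-noised speech features

for covariance_type in ("diag", "full"):
    for n_components in (4, 16, 64):
        start = time.perf_counter()
        GaussianMixture(n_components=n_components,
                        covariance_type=covariance_type,
                        random_state=0).fit(X)
        elapsed = time.perf_counter() - start
        print(f"{covariance_type:>4} covariance, {n_components:3d} mixtures: "
              f"{elapsed:.2f} s")
```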
international conference on telecommunications | 2016
Jiri Pribil; Anna Pribilova; Jindrich Matousek
The paper describes an experiment using Gaussian mixture models (GMM) for automatic classification of speaker age and gender. The developed two-level architecture is compared with the standard one-level GMM classifier, analysing in more detail the influence of the number of mixtures and the types of speech features used for GMM gender/age classification, as well as the computational complexity as a function of the number of mixtures. Finally, the GMM classification accuracy is compared with evaluation by the conventional listening test method. The obtained mean age classification accuracy of 92.3 % for the proposed two-level architecture is better than that of the standard one-level architecture (78.7 %) and that of the listening test evaluation (74.6 %). However, the computational complexity of the two-level approach is about twice that of the one-level one, both for the GMM model creation and for the classification phase.
COST 2102 Workshop (Patras) | 2007
Jiri Pribil; Anna Pribilova
international conference on applied electronics | 2012
Jiri Pribil; Anna Pribilova
ACTA IMEKO | 2016
Jiri Pribil; Anna Pribilova; Ivan Frollo