V. N. Sorokin
Russian Academy of Sciences
Publications
Featured research published by V. N. Sorokin.
Speech Communication | 2000
V. N. Sorokin; Alexander S. Leonov; Alexander V. Trushkin
Abstract The inverse problem for the vocal tract is considered from the viewpoint of the theory of ill-posed problems. The proposed approach, which overcomes the difficulties related to ambiguity and instability, is based on variational regularization with constraints. The work of the articulators is used as the regularization functional and as the optimality criterion for finding an approximate solution. The measured acoustical parameters of the speech signal serve as external constraints, while the geometry of the vocal tract, the mechanics of articulation, and the phonetic properties of the language play the role of internal constraints. An effective numerical implementation of the proposed approach is based on a local piecewise linear approximation of the articulatory-to-acoustic mapping and a polynomial approximation of the discrepancy measure. A heuristic method named the “calibrating curves method” is applied to estimate the accuracy of the approximate solution. It is shown that in some cases the error of the inverse problem solution depends only weakly on the errors of formant frequency measurements. The vocal tract shapes obtained with the proposed approach are very close to those measured in X-ray experiments.
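The role of regularization in an unstable inverse problem can be illustrated with a minimal sketch. It uses generic Tikhonov regularization on a small linear system, which is an assumption made for illustration only: the abstract above describes a constrained variational method whose functional is the work of the articulators, and here the penalty ||x||^2 merely stands in for that functional. The matrix `A`, the data `b`, and the function name `tikhonov_solve` are hypothetical.

```python
import numpy as np

def tikhonov_solve(A, b, alpha):
    """Minimize ||A x - b||^2 + alpha * ||x||^2 (generic Tikhonov form)."""
    n = A.shape[1]
    # Normal equations of the regularized problem: (A^T A + alpha I) x = A^T b
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ b)

# Hypothetical ill-conditioned map, standing in for one local linear piece
# of an articulatory-to-acoustic mapping.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
x_true = np.array([1.0, 1.0])
b = A @ x_true + np.array([1e-3, -1e-3])  # small "measurement" error

x_naive = np.linalg.solve(A, b)           # unregularized: error is amplified
x_reg = tikhonov_solve(A, b, alpha=1e-3)  # regularized: stays close to x_true

print("naive error:      ", np.linalg.norm(x_naive - x_true))
print("regularized error:", np.linalg.norm(x_reg - x_true))
```

In this toy case a data error of order 1e-3 moves the unregularized solution far from `x_true`, while the regularized solution stays close to it. The choice of `alpha` is arbitrary here.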
Speech Communication | 1992
V. N. Sorokin
Abstract The inverse problem for vocal tract shape, area function and articulatory parameters was solved for steady-state vowels by means of an optimization procedure requiring the conditional minimum of work on the part of the articulatory organs. One to four formant frequencies were used as references. The shape of the tongue was measured with an X-ray microbeam system for male and female speakers. The shapes of the vocal tract calculated for the experiments are very similar to the measured shapes.
Acoustical Physics | 2008
V. N. Sorokin; I. S. Makarov
The efficiency of automatic recognition of male and female voices based on solving the inverse problem for the glottis area dynamics and for the waveform of the glottal airflow volume velocity pulse is studied. The inverse problem is regularized through the use of analytical models of the voice excitation pulse and of the dynamics of the glottis area, as well as a model of one-dimensional glottal airflow. Parameters of these models and spectral parameters of the volume velocity pulse are considered. The following parameters are found to be most promising: the instant of maximum glottis area, the maximum derivative of the area, the slope of the spectrum of the glottal airflow volume velocity pulse, the amplitude ratios of harmonics of this spectrum, and the pitch. On the plane of the first two principal components in the space of these parameters, an almost twofold decrease in the classification error relative to that for the pitch alone is attained. The male voice recognition probability is found to be 94.7%, and the female voice recognition probability is 95.9%.
Speech Communication | 1994
V. N. Sorokin
Abstract Articulatory parameters, vocal tract shape and cross-sectional area function were determined from fricative spectra. A model of fricative generation was used to provide acoustical constraints for an optimization procedure with muscle work as the criterion of optimality. The distance between spectra was measured using the Cauchy-Bunyakovsky inequality. A proper initial approximation of the articulatory parameters is required to obtain an accurate and stable solution of the inverse problem.
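The abstract above does not give the exact form of the spectral distance. One common measure that follows from the Cauchy-Bunyakovsky (Cauchy-Schwarz) inequality |<a, b>| <= ||a|| ||b|| is one minus the normalized inner product, sketched below. The function name `cb_distance` and the sample spectra are hypothetical, and this form is an assumption, not necessarily the one used in the paper.

```python
import numpy as np

def cb_distance(a, b):
    """Distance in [0, 1] derived from |<a, b>| <= ||a|| * ||b||.

    Returns 0 when the spectra are proportional and 1 when they are orthogonal.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("spectra must be non-zero")
    return 1.0 - abs(np.dot(a, b)) / denom

# Proportional spectra differ only in overall level, so the distance is ~0.
d_same_shape = cb_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Spectra with no overlapping energy are at the maximum distance.
d_disjoint = cb_distance([1.0, 0.0], [0.0, 1.0])
```

A measure of this kind is insensitive to overall gain, which can be convenient when the absolute level of the spectrum is not controlled.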
Speech Communication | 1998
V. N. Sorokin; Vladimir Olshansky; Leonid Kozhanov
Abstract The compensatory ability of the articulatory control system in laryngectomized patients was studied. X-rays of the vocal tract and acoustic measurements were carried out on three patients before and after the operation, which used the trachea-esophagus bypass. Within two weeks of the operation, the patients produced the Russian vowels /a, u, i/ with formant frequencies closer to the phonetic norm than before the operation. After two years, two patients produced the vowels with normal formant parameters. The acoustical characteristics of speech after the operation were measured on 14 patients. One to two years after the operation, four patients were able to make the voicing-unvoicing distinction. One patient recovered complete control of the vocal source. The results imply that the adaptation of the articulatory control system to the distorted conditions of articulation and voice generation can be governed not only by acoustical parameters such as formant frequencies, but also by such a complex phonetic element as the voicing cue. The control system demonstrated its ability to reorganize the activity of the articulatory muscles and to transfer the functions of the excised laryngeal muscles to muscles that had never been used for voice control. The implications of the observed phenomena for the concept of an internal model are discussed.
Acoustical Physics | 2009
A. S. Leonov; I. S. Makarov; V. N. Sorokin
The paper examines the physical mechanisms of frequency modulations in the acoustics of the vocal tract and methods for estimating these modulations in the speech signal. It is found that vibrations of the tract walls have a negligibly small effect on modulations of its resonance frequencies. A model of speech production that accounts for the subglottal cavity shows that a change in boundary conditions at the open glottis produces noticeable variations in resonance frequencies. Along with this type of modulation, modulations determined by the shape of the excitation source also arise in the speech signal. They depend substantially on the ratio of the frequency of the fundamental tone to the resonance frequency and on the parameters of the methods used to estimate the modulations and to analyze the speech signal. Overall, this may sometimes cause unstable and unpredictable modulations of the estimated formant frequencies in the speech signal.
Acoustical Physics | 2016
V. N. Sorokin
The extrema of the logarithmic derivative of the mean energy of a voice signal in the frequency range of 1000–3000 Hz are used to determine the instants of opening and closure of the glottis. The inaccuracy of the analysis is estimated with the CMU ARCTIC database, which contains synchronous recordings of speech signals and electroglottograms. The estimates of the instants of opening and closure of the glottis found by the developed algorithm are compared with the instants of the maximum and minimum of the derivative of the electroglottogram signals, which are taken as the “true” instants. The mean square deviation of the glottal opening instant from the extrema of the derivative of the electroglottogram signals for different speakers is in the range of 1.03–1.64 ms. The rate of false estimates of the glottal opening instant is from 0.01 to 0.14%, and the rate of omissions is from 0.42 to 2.38%. An error-detection algorithm is developed. Relative to the period of the fundamental tone, the mean square deviation of the error in detecting the glottal opening instant is in the range of 13–18%, with the most probable error between 0 and +5%.
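The quantity named in the abstract above, the logarithmic derivative of the mean energy in the 1000–3000 Hz band, can be sketched as follows. The abstract does not specify the filter, the averaging window, or how extrema are selected, so the brick-wall FFT filter, the 1 ms window, the function name `band_energy_log_derivative`, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np

def band_energy_log_derivative(x, fs, f_lo=1000.0, f_hi=3000.0, win_ms=1.0):
    """Time derivative of log mean energy of x restricted to [f_lo, f_hi] Hz."""
    x = np.asarray(x, dtype=float)
    # Crude band-pass: zero all FFT bins outside the band (assumed filter).
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    band = np.fft.irfft(spec, n=len(x))
    # Mean energy over a short sliding window (assumed length win_ms).
    win = max(1, int(round(win_ms * 1e-3 * fs)))
    energy = np.convolve(band ** 2, np.ones(win) / win, mode="same")
    # d/dt log E(t); the small constant avoids log(0).
    return np.gradient(np.log(energy + 1e-12)) * fs

# Synthetic stand-in for voiced speech: a 100 Hz pulse train exciting
# a damped 2 kHz resonance.
fs = 16000
t = np.arange(int(0.05 * fs)) / fs
excitation = np.zeros_like(t)
excitation[::160] = 1.0
h = np.exp(-800.0 * t[:160]) * np.sin(2 * np.pi * 2000.0 * t[:160])
x = np.convolve(excitation, h)[: len(t)]

d = band_energy_log_derivative(x, fs)
```

Positive and negative extrema of `d` within each pitch period would then be candidate instants. Mapping them to glottal opening and closure, and rejecting false detections, is the part of the algorithm that the abstract does not detail.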
Acoustical Physics | 2014
A. S. Leonov; V. N. Sorokin
The paper studies the asymptotic behavior of the glottal area function near the moments of glottal opening and closing for two mathematical voice source models. It is shown that in the first model, the asymptotics of the area function obeys a power law with an exponent of no less than 1. Detailed analysis makes it possible to refine these limits depending on the relative sizes of the intervals of a closed and open glottis. This work also studies another parametric model of the area of the glottis, which is based on a simplified physical-geometrical representation of vocal-fold vibration processes. This is a special variant of the well-known two-mass model and contains five parameters: the period of the fundamental tone, the equivalent masses on the lower and upper edges of the vocal folds, the coefficient of elastic resistance of the lower vocal fold, and the delay time between openings of the upper and lower folds. It is established that the asymptotics of the resulting glottal area function obeys a power law with an exponent of 1 for both opening and closing.
Acoustical Physics | 2005
Pierre Badin; I. S. Makarov; V. N. Sorokin
It is found that the articulation process is accompanied by active variations of the pharynx width. To describe the latter, a linear combination of two width eigenvectors with varying coefficients is proposed. A new algorithm is constructed for calculating the cross-sectional areas of the vocal tract. The algorithm takes into account not only the anatomic parameters and the shape of the tract in the sagittal plane but also the parameters in the lateral and axial planes.
Acoustical Physics | 2004
I. S. Makarov; V. N. Sorokin
The calculation of the resonance frequencies from experimental cross-sectional areas of a vocal tract under the assumption that its walls are perfectly rigid provides values that differ noticeably from the measured resonance frequencies. The compliance of the walls affects the first resonance and has almost no effect on the higher-order resonances. The presence of branching in the tract at the level of the larynx affects the second and third resonances more strongly than the first resonance. The parameters of the wall impedance (the loss, mass, and elasticity) and the length and cross-sectional area of the branchings are determined by minimizing the rms discrepancy between the measured and calculated resonance frequencies. The error in the frequency calculation with allowance for the wall compliance and branching in the tract proves to be within the accuracy of the formant estimation.
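As a point of reference for the rigid-wall assumption mentioned above, the simplest idealization is a uniform lossless tube closed at the glottis and open at the lips, whose resonances follow the quarter-wavelength formula F_n = (2n - 1) c / (4 L). The sketch below is only this textbook case, not the calculation from measured area functions described in the abstract. The tube length of 0.175 m, the sound speed of 350 m/s, and the function name `uniform_tube_resonances` are illustrative assumptions.

```python
import numpy as np

def uniform_tube_resonances(length_m, n_resonances=3, c=350.0):
    """Resonances (Hz) of a rigid uniform tube, closed at one end and open at the other."""
    n = np.arange(1, n_resonances + 1)
    # Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L)
    return (2 * n - 1) * c / (4.0 * length_m)

# Assumed values: 17.5 cm tube, sound speed 350 m/s.
resonances = uniform_tube_resonances(0.175)
print(resonances)  # approximately 500, 1500 and 2500 Hz
```

Wall compliance and side branches, which the abstract identifies as the main sources of discrepancy, are absent from this idealization.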