Dimitrios Ververidis
Aristotle University of Thessaloniki
Publication
Featured research published by Dimitrios Ververidis.
Speech Communication | 2006
Dimitrios Ververidis; Constantine Kotropoulos
In this paper we overview emotional speech recognition with three goals in mind. The first goal is to provide an up-to-date record of the available emotional speech data collections. The number of emotional states, the language, the number of speakers, and the kind of speech are briefly addressed. The second goal is to present the most frequent acoustic features used for emotional speech recognition and to assess how the emotion affects them. Typical features are the pitch, the formants, the vocal tract cross-section areas, the mel-frequency cepstral coefficients, the Teager energy operator-based features, the intensity of the speech signal, and the speech rate. The third goal is to review appropriate techniques in order to classify speech into emotional states. We examine classification techniques that exploit timing information separately from those that ignore it. Classification techniques based on hidden Markov models, artificial neural networks, linear discriminant analysis, k-nearest neighbors, and support vector machines are reviewed.
International Conference on Acoustics, Speech, and Signal Processing | 2004
Dimitrios Ververidis; Constantine Kotropoulos; Ioannis Pitas
Our purpose is to design a useful tool which can be used in psychology to automatically classify utterances into five emotional states: anger, happiness, neutral, sadness, and surprise. The major contribution of the paper is to rate the discriminating capability of a set of features for emotional speech recognition. A total of 87 features has been calculated over 500 utterances from the Danish Emotional Speech database. The sequential forward selection method (SFS) has been used in order to discover a set of 5 to 10 features which are able to classify the utterances in the best way. The criterion used in SFS is the cross-validated correct classification score of one of the following classifiers: nearest mean and Bayes classifier where class pdfs are approximated via Parzen windows or modelled as Gaussians. After selecting the 5 best features, we reduce the dimensionality to two by applying principal component analysis. The result is a 51.6% ± 3% correct classification rate at 95% confidence interval for the five aforementioned emotions, whereas a random classification would give a correct classification rate of 20%. Furthermore, we identify those two-class emotion recognition problems whose error rates contribute heavily to the average error and we indicate that a possible reduction of the error rates reported in this paper could be achieved by employing two-class classifiers and combining them.
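The wrapper idea in this abstract, sequential forward selection driven by a cross-validated classification score, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a nearest-mean classifier, leave-one-out cross-validation, and a synthetic three-feature data set, and the function names (`nearest_mean_predict`, `loo_score`, `sequential_forward_selection`) are hypothetical.

```python
import random


def nearest_mean_predict(train_X, train_y, x, feats):
    """Classify x by the closest class mean, using only the features in feats."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += row[f]
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((x[f] - acc[j] / counts[label]) ** 2 for j, f in enumerate(feats))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_score(X, y, feats):
    """Leave-one-out correct classification rate for a feature subset."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_mean_predict(train_X, train_y, X[i], feats) == y[i]
    return hits / len(X)


def sequential_forward_selection(X, y, n_select):
    """Greedily add the feature that gives the best cross-validated score."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: loo_score(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data (an assumption): feature 0 separates the two classes,
# features 1 and 2 are pure noise.
rng = random.Random(0)
X, y = [], []
for label, centre in (("anger", -3.0), ("sadness", 3.0)):
    for _ in range(20):
        X.append([centre + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)

chosen = sequential_forward_selection(X, y, n_select=2)
print("selected feature indices:", chosen)
```

On this toy data the informative feature is picked first. The number of features tried per step falls as the subset grows, but each evaluation needs a full cross-validation run, which is why wrapper methods are costly.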
Signal Processing | 2008
Dimitrios Ververidis; Constantine Kotropoulos
This paper addresses subset feature selection performed by the sequential floating forward selection (SFFS). The criterion employed in SFFS is the correct classification rate of the Bayes classifier assuming that the features obey the multivariate Gaussian distribution. A theoretical analysis that models the number of correctly classified utterances as a hypergeometric random variable enables the derivation of an accurate estimate of the variance of the correct classification rate during cross-validation. By employing this variance estimate, we propose a fast SFFS variant. Experimental findings on the Danish emotional speech (DES) and speech under simulated and actual stress (SUSAS) databases demonstrate that the SFFS computational time is reduced by 50% and that the correct classification rate for classifying speech into emotional states for the selected subset of features varies less than the correct classification rate found by the standard SFFS. Although the proposed SFFS variant is tested in the framework of speech emotion recognition, the theoretical results are valid for any classifier in the context of any wrapper algorithm.
IEEE Transactions on Signal Processing | 2008
Dimitrios Ververidis; Constantine Kotropoulos
In this paper, the expectation-maximization (EM) algorithm for Gaussian mixture modeling is improved via three statistical tests. The first test is a multivariate normality criterion based on the Mahalanobis distance of a sample measurement vector from a certain Gaussian component center. It is used to decide whether or not to split a component into two. The second test is a central tendency criterion based on the observation that multivariate kurtosis becomes large if the component to be split is a mixture of two or more underlying Gaussian sources with common centers. If the common center hypothesis is true, the component is split into two new components and their centers are initialized by the center of the (old) component candidate for splitting. Otherwise, the splitting is accomplished by a discriminant derived by the third test. This test is based on marginal cumulative distribution functions. Experimental results are presented against seven other EM variants both on artificially generated data sets and real ones. The experimental results demonstrate that the proposed EM variant has an increased capability to find the underlying model, while maintaining a low execution time.
International Conference on Multimedia and Expo | 2005
Dimitrios Ververidis; Constantine Kotropoulos
Emotional speech classification can be treated as a supervised learning task where the statistical properties of emotional speech segments are the features and the emotional styles form the labels. The Akaike criterion is used for estimating automatically the number of Gaussian densities that model the probability density function of the emotional speech features. A procedure for reducing the computational burden of cross-validation in the sequential floating forward selection algorithm is proposed that applies the t-test on the probability of correct classification for the Bayes classifier designed for various feature sets. For the Bayes classifier, the sequential floating forward selection algorithm is found to yield a higher probability of correct classification by 3% than that of the sequential forward selection algorithm, either taking into account the gender information or ignoring it. The experimental results indicate that the utterances from isolated words and sentences are more emotionally colored than those from paragraphs. Without taking into account the gender information, the probability of correct classification for the Bayes classifier admits a maximum when the probability density function of emotional speech features extracted from the aforementioned utterances is modeled as a mixture of 2 Gaussian densities.
Iberoamerican Congress on Pattern Recognition | 2006
Michal Haindl; Petr Somol; Dimitrios Ververidis; Constantine Kotropoulos
Feature selection is a critical procedure in many pattern recognition applications. There are two distinct mechanisms for feature selection, namely the wrapper methods and the filter methods. The filter methods are generally considered inferior to wrapper methods; however, wrapper methods are computationally more demanding than filter methods. A novel filter feature selection method based on mutual correlation is proposed. We assess the classification performance of the proposed filter method by feeding the selected features to the Bayes classifier. Alternative filter feature selection methods that optimize either the Bhattacharyya distance or the divergence are also tested. Furthermore, wrapper feature selection techniques employing several search strategies such as the sequential forward search, the oscillating search, and the sequential floating forward search are also included in the comparative study. A trade-off between the classification accuracy and the feature set dimensionality is demonstrated on two benchmark datasets from the UCI repository and two emotional speech data collections.
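A filter based on mutual correlation can be illustrated as follows. The abstract does not spell out the procedure, so this sketch assumes one plausible reading: repeatedly discard the feature whose average absolute Pearson correlation with the remaining features is highest, until the desired number is left. The function names and the toy columns are hypothetical, and constant (zero-variance) features are assumed not to occur.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (non-constant inputs assumed)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mutual_correlation_filter(columns, n_keep):
    """Drop, one at a time, the feature with the highest mean |correlation| to the rest."""
    kept = list(range(len(columns)))
    while len(kept) > n_keep:
        def mean_abs_corr(i):
            others = [j for j in kept if j != i]
            return sum(abs(pearson(columns[i], columns[j])) for j in others) / len(others)
        kept.remove(max(kept, key=mean_abs_corr))
    return kept


# Toy feature columns (an assumption): features 0 and 1 are nearly redundant,
# feature 2 carries different information.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.1, 3.9, 6.2, 7.8, 10.1, 11.9],
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
]
kept = mutual_correlation_filter(columns, n_keep=2)
print("kept feature indices:", kept)
```

No classifier is trained during selection, which is what makes a filter cheaper than a wrapper. The class labels play no part in this particular criterion.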
International Symposium on Circuits and Systems | 2005
Dimitrios Ververidis; Constantine Kotropoulos
The classification of utterances into five basic emotional states is studied. A total of 87 statistical characteristics of pitch, energy, and formants is extracted from 500 utterances of the Danish emotional speech database. An evaluation of the classification capability of each feature is performed with respect to the probability of correct classification achieved by the Bayes classifier that models the feature probability density function as a mixture of Gaussian densities. Next, the feature subset that yields the highest probability of correct classification is found using the sequential floating forward selection algorithm. The probability of correct classification is estimated via cross-validation and the probability density functions are modelled as mixtures of 2 or 3 Gaussian densities. The results demonstrate that the Bayes classifier which employs mixtures of 2 Gaussian densities can achieve a probability of correct classification equal to 0.55, whereas the human classification score is 0.67 for the database considered and random classification would give a probability of correct classification equal to 0.20.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2009
Dimitrios Ververidis; Constantine Kotropoulos
When an infinite training set is used, the Mahalanobis distance between a pattern measurement vector of dimensionality D and the center of the class it belongs to is distributed as χ² with D degrees of freedom. However, the distribution of the Mahalanobis distance becomes either Fisher or Beta depending on whether cross-validation or resubstitution is used for parameter estimation in finite training sets. The total variation between χ² and Fisher, as well as between χ² and Beta, allows us to measure the information loss in high dimensions. The information loss is then exploited to set a lower limit for the correct classification rate achieved by the Bayes classifier that is used in subset feature selection.
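The infinite-training-set case can be checked numerically: when the class mean and covariance are known exactly, the squared Mahalanobis distance of Gaussian samples should have mean D and variance 2D, as a chi-square variable with D degrees of freedom does. The sketch below is only a sanity check under simplifying assumptions: a diagonal covariance (so no matrix inversion is needed), arbitrary illustrative parameter values, and a hypothetical helper name.

```python
import random


def squared_mahalanobis_diag(x, mean, var):
    """Squared Mahalanobis distance for a diagonal covariance matrix."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))


rng = random.Random(42)
D = 5
mean = [1.0, -2.0, 0.5, 3.0, 0.0]   # true class center (illustrative values)
var = [1.0, 4.0, 0.25, 9.0, 2.0]    # true per-feature variances (illustrative values)

# Draw samples from the class and measure their distance to the true center.
samples = [[rng.gauss(m, v ** 0.5) for m, v in zip(mean, var)] for _ in range(20000)]
d2 = [squared_mahalanobis_diag(x, mean, var) for x in samples]

avg = sum(d2) / len(d2)
spread = sum((d - avg) ** 2 for d in d2) / (len(d2) - 1)
print(f"sample mean {avg:.2f} (chi-square mean is {D})")
print(f"sample variance {spread:.2f} (chi-square variance is {2 * D})")
```

Replacing the true mean and variances with estimates from a small training set is what moves the distribution away from chi-square, which is the effect the paper quantifies.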
Neurocomputing | 2007
Vassiliki Moschou; Dimitrios Ververidis; Constantine Kotropoulos
Two well-known variants of the self-organizing map (SOM) that are based on multivariate order statistics are the marginal median SOM and the vector median SOM. In the past, their efficiency was demonstrated for color image quantization. We employ the well-known IRIS and VOWEL data sets and we assess the SOM variants' performance with respect to the accuracy, the mean squared error between the patterns assigned to a neuron and the neuron's weight vector averaged over all neurons, the Rand index, the Γ statistic, and the overall entropy. All figures of merit favor the marginal median SOM and the vector median SOM against the standard SOM. Based on the aforementioned findings, the marginal median SOM and the vector median SOM are used to redistribute emotional speech patterns from the Danish Emotional Speech database, which were originally classified as being neutral, to the emotional states of hot anger, happiness, sadness, and surprise.
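The two order statistics behind these SOM variants differ in a simple way, shown below on a handful of 2-D patterns. This is a sketch of the median operators only, not of a full SOM training loop; the function names and the sample patterns are assumptions, and Euclidean distance is assumed for the vector median.

```python
from statistics import median


def marginal_median(vectors):
    """Component-wise median; the result need not be one of the inputs."""
    return [median(col) for col in zip(*vectors)]


def vector_median(vectors):
    """The input vector with the smallest summed Euclidean distance to all inputs."""
    def total_distance(v):
        return sum(sum((a - b) ** 2 for a, b in zip(v, w)) ** 0.5 for w in vectors)
    return min(vectors, key=total_distance)


# Patterns assumed to be assigned to one neuron (illustrative values).
patterns = [[0.0, 0.0], [2.0, 1.0], [1.0, 5.0], [9.0, 2.0], [3.0, 9.0]]

print("marginal median:", marginal_median(patterns))  # [2.0, 2.0]
print("vector median:  ", vector_median(patterns))    # [2.0, 1.0]
```

The marginal median yields a point that is not among the patterns, whereas the vector median always returns one of them. Either can serve as a neuron's weight vector in place of the mean that the standard SOM update drifts towards, which makes the update less sensitive to outlying patterns.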
IEEE Transactions on Biomedical Engineering | 2011
Dimitrios Ververidis; M. van Gils; Christina Passath; Jukka Takala; Lukas Brander
Neurally adjusted ventilatory assist (NAVA) delivers airway pressure (P_aw) in proportion to the electrical activity of the diaphragm (EAdi) using an adjustable proportionality constant (NAVA level, cm·H₂O/μV). During systematic increases in the NAVA level, feedback-controlled down-regulation of the EAdi results in a characteristic two-phased response in P_aw and tidal volume (Vt). The transition from the 1st to the 2nd response phase allows identification of adequate unloading of the respiratory muscles with NAVA (NAVA_AL). We aimed to develop and validate a mathematical algorithm to identify NAVA_AL. P_aw, Vt, and EAdi were recorded while systematically increasing the NAVA level in 19 adult patients. In a multistep approach, inspiratory P_aw peaks were first identified by dividing the EAdi into inspiratory portions using Gaussian mixture modeling. Two polynomials were then fitted onto the curves of both P_aw peaks and Vt. The beginning of the P_aw and Vt plateaus, and thus NAVA_AL, was identified at the minimum of squared polynomial derivative and polynomial fitting errors. A graphical user interface was developed in the Matlab computing environment. Median NAVA_AL visually estimated by 18 independent physicians was 2.7 (range 0.4 to 5.8) cm·H₂O/μV and identified by our model was 2.6 (range 0.6 to 5.0) cm·H₂O/μV. NAVA_AL identified by our model was below the range of visually estimated NAVA_AL in two instances and was above in one instance. We conclude that our model identifies NAVA_AL in most instances with acceptable accuracy for application in clinical routine and research.