Alexander I. Iliev
University of Wisconsin–Stevens Point
Publications
Featured research published by Alexander I. Iliev.
Computer Speech & Language | 2010
Alexander I. Iliev; Michael S. Scordilis; João Paulo Papa; Alexandre X. Falcão
A new method for the recognition of spoken emotions is presented based on features of the glottal airflow signal. Its effectiveness is tested on the recently proposed optimum-path forest (OPF) classifier as well as on six other previously established classification methods: the Gaussian mixture model (GMM), support vector machine (SVM), artificial neural network multilayer perceptron (ANN-MLP), k-nearest neighbor rule (k-NN), Bayesian classifier (BC), and the C4.5 decision tree. The speech database used in this work was collected in an anechoic environment with ten speakers (5M and 5F), each speaking ten sentences in four different emotions: Happy, Angry, Sad, and Neutral. The glottal waveform was extracted from fluent speech via inverse filtering. The investigated features included the glottal symmetry and MFCC vectors of various lengths for both the glottal and the corresponding speech signals. Experimental results indicate that the best performance is obtained for the glottal-only features, with SVM and OPF generally providing the highest recognition rates, while the GMM and the combination of glottal and speech features performed relatively worse. For this text-dependent, multi-speaker task the top-performing classifiers achieved perfect recognition rates for the case of 6th-order glottal MFCCs.
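A minimal sketch of the feature pipeline described above, assuming LPC-based inverse filtering as the glottal estimation step and 6th-order MFCCs; the file names, LPC order, and SVM kernel are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: estimate the glottal source by LPC inverse filtering, take
# 6th-order MFCCs of the estimate, and classify emotions with an SVM.
import numpy as np
import librosa
from scipy.signal import lfilter
from sklearn.svm import SVC

def glottal_mfcc(path, sr=16000, lpc_order=16, n_mfcc=6):
    y, sr = librosa.load(path, sr=sr)
    a = librosa.lpc(y, order=lpc_order)        # vocal-tract model A(z)
    glottal = lfilter(a, [1.0], y)             # inverse filter -> glottal residual
    mfcc = librosa.feature.mfcc(y=glottal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                   # one utterance-level vector

# Hypothetical corpus of (path, emotion) pairs: Happy/Angry/Sad/Neutral.
train = [("happy_01.wav", "happy"), ("angry_01.wav", "angry")]  # ...
X = np.array([glottal_mfcc(p) for p, _ in train])
y = np.array([label for _, label in train])
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([glottal_mfcc("test_01.wav")]))
```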
international conference on systems, signals and image processing | 2007
Alexander I. Iliev; Yongxin Zhang; Michael S. Scordilis
This study investigated the usefulness of ToBI marks in determining the emotional state conveyed in speech. The Gaussian mixture model (GMM) was used as the classifier structure. A total of three different classification systems were developed, based on three different feature vectors: (a) the classical approach using signal pitch and energy features; (b) a ToBI-only feature set based on tone and break tiers; and (c) a system that used the features of both (a) and (b). In ToBI, tone-tier elements were automatically determined using pitch information. Three emotional states were investigated: happiness, anger, and sadness. The overall success rate achieved for the combined system was between 75% and 100%. This work indicated that the ToBI features alone were very useful for the classification of emotion, and that detection improved when classical features were used in conjunction with ToBI.
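A minimal sketch of the GMM classifier structure used here: one mixture per emotion, trained on per-frame feature vectors, with classification by maximum log-likelihood. The feature matrices below are toy stand-ins for the pitch/energy and ToBI features, and the mixture size is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(features_by_emotion, n_components=8):
    """features_by_emotion maps emotion -> (n_frames, n_dims) array."""
    return {emo: GaussianMixture(n_components, covariance_type="diag",
                                 random_state=0).fit(X)
            for emo, X in features_by_emotion.items()}

def classify(gmms, X):
    # Pick the emotion whose model scores the utterance's frames highest.
    return max(gmms, key=lambda emo: gmms[emo].score(X))

rng = np.random.default_rng(0)  # toy stand-in for real feature extraction
gmms = train_gmms({e: rng.normal(i, 1, (200, 4))
                   for i, e in enumerate(["happiness", "anger", "sadness"])})
print(classify(gmms, rng.normal(1, 1, (50, 4))))  # -> "anger"
```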
international conference on systems, signals and image processing | 2008
Alexander I. Iliev; Michael S. Scordilis
This study deals with the recognition of three emotional states in speech, namely happiness, anger, and sadness. The corpus included speech from six subjects (3M and 3F) speaking ten sentences. Glottal inverse filtering was first performed on the spoken utterances. Parameters for computing the glottal symmetry were then collected and used to create a final feature matrix. A combined set containing all emotions across the different subjects was formed and used to train a Gaussian mixture model (GMM) classifier. Training was performed on 80% of the combined utterances for each emotion, and testing was administered on the remaining 20%. The results suggest that glottal information can be used to determine the emotion conveyed in speech. The recognition performance varied between 48.96% and 82.29%.
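A minimal sketch of one plausible glottal symmetry feature: the per-cycle ratio of the opening (rise) phase to the closing (fall) phase of the estimated glottal flow. The paper's exact symmetry definition is not reproduced here, so this ratio is an illustrative assumption.

```python
import numpy as np
from scipy.signal import find_peaks

def glottal_symmetry(glottal, sr, f0_max=400):
    """Rise/fall duration ratio for each detected glottal cycle."""
    min_period = int(sr / f0_max)                           # shortest expected cycle
    peaks, _ = find_peaks(glottal, distance=min_period)     # cycle maxima
    troughs, _ = find_peaks(-glottal, distance=min_period)  # cycle minima
    ratios = []
    for p in peaks:
        before, after = troughs[troughs < p], troughs[troughs > p]
        if len(before) and len(after):
            ratios.append((p - before[-1]) / (after[0] - p))  # rise / fall
    return np.array(ratios)
```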
computer systems and technologies | 2017
Alexander I. Iliev; Peter Stanchev
In an attempt to establish an improved service-oriented architecture (SOA) for interoperable and customizable access to digital cultural resources, an automatic deterministic technique can potentially improve the searching, recommendation, and personalization of content. Such a technique can be developed in many ways, using different means for data search and analysis. This paper focuses on the use of voice and emotion recognition in speech as the main vehicle for delivering an alternative way to develop novel solutions for integrating the loosely connected components that exchange information based on a common data model. The parameters used to construct the feature vectors for analysis carried pitch, temporal, and duration information. They were compared to the glottal symmetry extracted from the speech source using inverse filtering, and their first derivatives were also investigated for comparison. The speech source was a 100-minute theatrical play containing four male speakers, recorded at 8 kHz with 16-bit sample resolution. Four emotional states were targeted, namely: happy, angry, fear, and neutral. Classification was performed using the k-nearest neighbor (k-NN) method. Training and testing experiments were performed in three scenarios: 60/40, 70/30, and 80/20 minutes, respectively. A close comparison of each feature and its rate of change shows that the time-domain features perform better, at a lower computational cost, than their first-derivative counterparts. Furthermore, a correct recognition rate of up to 95% was achieved using the chosen features.
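A minimal sketch of the evaluation described above: k-NN classification comparing time-domain features against their first derivatives under a fixed train/test split. The random matrices are placeholders for the paper's pitch, duration, and glottal-symmetry features.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))        # stand-in feature vectors
y = rng.integers(0, 4, size=400)     # four emotions: happy/angry/fear/neutral
dX = np.gradient(X, axis=0)          # first-derivative (rate-of-change) features

for name, feats in [("features", X), ("derivatives", dX)]:
    # 80/20 split, mirroring the largest of the three training scenarios.
    Xtr, Xte, ytr, yte = train_test_split(feats, y, test_size=0.2,
                                          random_state=0)
    acc = KNeighborsClassifier(n_neighbors=5).fit(Xtr, ytr).score(Xte, yte)
    print(f"{name}: {acc:.2%}")
```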
asilomar conference on signals, systems and computers | 2004
Alexander I. Iliev; Michael S. Scordilis
This novel multilevel encoding technique uses psychoacoustic principles of binaural hearing to hide data in the phase spectrum of stereo signals. A frequency-dependent threshold of the interaural phase difference is computed and then used to determine locations suitable for data insertion. Once established, the threshold is divided into several sublevels, each carrying a few bits of data. The phase is perturbed in a fashion that mimics the original signal, thus introducing little noise. Listening tests have confirmed that this method yields a signal indistinguishable from the original audio. This technique provides a payload of over 105 kb/s for CD-quality audio.
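A heavily simplified sketch of the embedding idea: write bits into the interaural phase difference (IPD) of FFT bins while keeping each perturbation under a threshold. The real method uses a frequency-dependent psychoacoustic threshold divided into sublevels; the constant threshold and single-bit-per-bin scheme here are illustrative assumptions.

```python
import numpy as np

def embed(left, right, bits, ipd_threshold=0.05):
    """Hide bits in the phase of the right channel where the IPD is small."""
    L, R = np.fft.rfft(left), np.fft.rfft(right)
    ipd = np.angle(L) - np.angle(R)
    usable = np.flatnonzero(np.abs(ipd) < ipd_threshold)  # insertion points
    for k, bit in zip(usable, bits):
        delta = ipd_threshold * (0.5 if bit else -0.5)    # sub-threshold nudge
        R[k] *= np.exp(1j * delta)
    return left, np.fft.irfft(R, n=len(right))
```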
Journal of the Acoustical Society of America | 2002
Alexander I. Iliev; Michael S. Scordilis
Researchers have established that in binaural hearing the smallest detectable angular separation between two sources, commonly referred to as the minimal audible angle (MAA), for a pair of sources on the horizontal plane depends on the frequency of the emitted pure tone and the azimuth angular separation between the sources. One interesting approach is to view the sources' angular perturbation within the MAA limits as noise in the phase domain, and the listener's inability to detect this perturbation as the result of a masking process. The present discussion focuses on experimental procedures for examining the perception of the MAA and the corresponding interaural phase difference (IPD) when complex sound sources are located in the most sensitive region, directly in front of the observer (both azimuth and elevation at 0°). Sound stimuli were viewed as the linear combination of pure tones, as provided by Fourier analysis. Results indicate that masking is achieved when the IPD is disturbed with...
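A minimal sketch of the stimulus construction implied above: a complex sound built as a linear combination of pure tones, with a small IPD applied per component to the right channel. The tone frequencies and perturbation sizes are illustrative assumptions.

```python
import numpy as np

def stereo_stimulus(freqs, ipds, sr=44100, dur=1.0):
    """Sum of pure tones; the right channel gets a per-tone phase offset."""
    t = np.arange(int(sr * dur)) / sr
    left = sum(np.sin(2 * np.pi * f * t) for f in freqs)
    right = sum(np.sin(2 * np.pi * f * t + d) for f, d in zip(freqs, ipds))
    return left / len(freqs), right / len(freqs)

left, right = stereo_stimulus([440, 880, 1320], ipds=[0.02, 0.02, 0.02])
```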
management of emergent digital ecosystems | 2017
Peter Stanchev; Desislava Paneva-Marinova; Alexander I. Iliev
The world's digital content and media are growing at an ever-increasing rate. There are millions of digital media assets on display through mobile devices, home entertainment systems, and computers. This vast pool of visual and audio information therefore has to be grouped into different ecosystems, depending on its nature or intended audience, to simplify the problem of searching, finding, and personalizing datasets on demand. Although this is the case for digital cultural ecosystems, a number of smart methodologies still need to be introduced to narrow down the vast number of digital assets to the desired media and, essentially, to personalize and automate the approach. In this paper, we propose a method that deals with the detection, extraction, and personalization of media assets applied to the world of digital cultural ecosystems.
asilomar conference on signals, systems and computers | 2004
Wen Jin; Michael S. Scordilis; Alexander I. Iliev
An audio resampler is built to convert the sampling rate from 44.1 kHz to 16 kHz. An interpolator and a decimator are created separately to change the sampling rate by an integer factor. The low-pass filters used by the interpolator and decimator are obtained by two different approaches, namely the window method and the equiripple design. The performances of these two filters are evaluated in terms of 3-dB bandwidth and pass-band signal-to-noise ratio. The filters are implemented using the multistage interpolated FIR approach, and polyphase structures are used to further reduce the computational load. The resampler is implemented in 16-bit fixed-point simulation. The described approach drastically reduces the filter order and thus the computational cost.
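A minimal sketch of the rate conversion: 44.1 kHz to 16 kHz is the rational factor 160/441, done either in one polyphase call or in stages (2/3 · 2/3 · 40/49) to keep individual filter orders low, echoing the multistage approach above. The specific stage factorization is an assumption.

```python
import numpy as np
from scipy.signal import resample_poly   # polyphase resampler

y = np.random.randn(44100)               # one second of audio at 44.1 kHz

y16_direct = resample_poly(y, 160, 441)  # 44100 * 160/441 = 16000

y16_staged = y                           # multistage: 2/3 -> 2/3 -> 40/49
for up, down in [(2, 3), (2, 3), (40, 49)]:
    y16_staged = resample_poly(y16_staged, up, down)

print(len(y16_direct), len(y16_staged))  # both 16000 samples
```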
Archive | 2001
Alexander I. Iliev; Michael S. Scordilis
EURASIP Journal on Advances in Signal Processing | 2011
Alexander I. Iliev; Michael S. Scordilis