Publications


Featured research published by Michael S. Scordilis.


Computer Speech & Language | 2010

Spoken emotion recognition through optimum-path forest classification using glottal features

Alexander I. Iliev; Michael S. Scordilis; João Paulo Papa; Alexandre X. Falcão

A new method for the recognition of spoken emotions is presented based on features of the glottal airflow signal. Its effectiveness is tested on the new optimum-path forest (OPF) classifier as well as on six other previously established classification methods: the Gaussian mixture model (GMM), support vector machine (SVM), artificial neural network - multilayer perceptron (ANN-MLP), k-nearest neighbor rule (k-NN), Bayesian classifier (BC) and the C4.5 decision tree. The speech database used in this work was collected in an anechoic environment with ten speakers (5M and 5F), each speaking ten sentences in four different emotions: Happy, Angry, Sad, and Neutral. The glottal waveform was extracted from fluent speech via inverse filtering. The investigated features included the glottal symmetry and MFCC vectors of various lengths, both for the glottal and the corresponding speech signal. Experimental results indicate that best performance is obtained for the glottal-only features, with SVM and OPF generally providing the highest recognition rates, while performance for the GMM and for the combination of glottal and speech features was relatively inferior. For this text-dependent, multi-speaker task the top-performing classifiers achieved perfect recognition rates for the case of 6th-order glottal MFCCs.
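
As a rough illustration of the kind of pipeline the abstract describes, the sketch below approximates the glottal excitation by LPC inverse filtering and classifies short glottal MFCC vectors with an SVM. It assumes librosa, SciPy and scikit-learn are available; the function names, LPC order and feature settings are illustrative choices, not the paper's exact configuration.

```python
# A minimal sketch of a glottal-feature emotion pipeline, assuming librosa and
# scikit-learn; the paper's exact inverse-filtering and feature settings are
# not reproduced here.
import numpy as np
import librosa
from scipy.signal import lfilter
from sklearn.svm import SVC

def glottal_mfcc(y, sr, lpc_order=12, n_mfcc=6):
    """Approximate the glottal excitation by LPC inverse filtering,
    then summarize it with a short MFCC vector (6th order, as in the paper)."""
    a = librosa.lpc(y, order=lpc_order)          # all-pole vocal-tract model
    residual = lfilter(a, [1.0], y)              # inverse filter -> excitation estimate
    mfcc = librosa.feature.mfcc(y=residual, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                     # one fixed-length vector per utterance

# Hypothetical training data: a list of (waveform, sr) pairs and emotion labels.
def train_emotion_svm(utterances, labels):
    X = np.vstack([glottal_mfcc(y, sr) for y, sr in utterances])
    clf = SVC(kernel="rbf")                      # SVM was among the top performers reported
    clf.fit(X, labels)
    return clf
```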


Speech Communication | 2002

Analysis, enhancement and evaluation of five pitch determination techniques

Peter Veprek; Michael S. Scordilis

Speech classification into voiced and unvoiced (or silent) portions is important in many speech processing applications. In addition, segmentation of voiced speech into individual pitch epochs is necessary in several high quality speech synthesis and coding techniques. This paper introduces criteria for measuring the performance of automatic procedures performing this task against manually segmented and labeled data. First, five basic pitch determination algorithms (PDAs) (SIFT, comb filter energy maximization, spectrum decimation/accumulation, optimal temporal similarity and dyadic wavelet transform) are evaluated and their performance is analyzed. A set of enhancements is then developed and applied to the basic algorithms, which yields superior performance by virtually eliminating multiple and sub-multiple pitch assignment errors and reducing all other errors. Evaluation shows that the enhancements improved performance of all five PDAs with the improvement ranging from 3.5% for the comb filter energy maximization method to 8.3% for the dyadic wavelet transform method.
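
For orientation, the snippet below is a bare-bones SIFT-style estimator: the frame is spectrally flattened by LPC inverse filtering and the pitch lag is taken from the autocorrelation peak. It assumes librosa and SciPy; the lag range and LPC order are illustrative, and none of the paper's enhancements are included.

```python
# A simplified SIFT-style pitch estimator: spectrally flatten a voiced frame by
# LPC inverse filtering, then pick the autocorrelation peak in the plausible
# pitch-lag range. Framing, voicing decisions and the paper's enhancements are
# omitted.
import numpy as np
import librosa
from scipy.signal import lfilter

def sift_pitch(frame, sr, fmin=50.0, fmax=400.0, lpc_order=12):
    a = librosa.lpc(frame.astype(float), order=lpc_order)
    residual = lfilter(a, [1.0], frame)          # flattened excitation signal
    ac = np.correlate(residual, residual, mode="full")[len(residual) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)      # lag range for 50-400 Hz
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag                              # pitch estimate in Hz
```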


International Conference on Acoustics, Speech, and Signal Processing | 1994

Phonemic segmentation of fluent speech

David B. Grayden; Michael S. Scordilis

A hierarchical approach to phonemic segmentation of continuous, speaker-independent speech is presented. Each sentence is divided into distinct obstruent and sonorant regions using a Bayesian decision surface. Rules are then used to make context-specific corrections within these regions. Finally, finer segmentation is performed using a number of rules specific to obstruent and sonorant boundaries. Around 80% of the boundaries are located with an insertion rate of 12%. The developed system is suitable for use in phoneme recognition and automatic labelling of speech.
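
A toy rendition of the first stage is sketched below: frames are classified as obstruent or sonorant with a Gaussian (Bayesian) decision on log energy and zero-crossing rate, and runs of identical labels are merged into regions. The feature choice and scikit-learn classifier are assumptions for illustration; the paper's context-dependent rules and finer boundary detection are not reproduced.

```python
# A toy version of the first stage only: classify frames as obstruent or
# sonorant with a Gaussian (Bayesian) decision on two simple features,
# then merge runs of identical labels into regions.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def frame_features(y, frame_len=400, hop=160):
    feats = []
    for start in range(0, len(y) - frame_len, hop):
        f = y[start:start + frame_len]
        energy = np.log(np.sum(f ** 2) + 1e-10)
        zcr = np.mean(np.abs(np.diff(np.sign(f))) > 0)   # zero-crossing rate
        feats.append((energy, zcr))
    return np.array(feats)

def segment_regions(labels):
    """Collapse consecutive identical frame labels into (start, end, label) regions."""
    regions, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            regions.append((start, i, labels[start]))
            start = i
    return regions

# Hypothetical usage with labeled training frames:
# clf = GaussianNB().fit(frame_features(train_wave), train_frame_labels)
# regions = segment_regions(clf.predict(frame_features(test_wave)))
```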


Speech Communication | 2004

Beam search pruning in speech recognition using a posterior probability-based confidence measure

Sherif M. Abdou; Michael S. Scordilis

In this work we propose the early incorporation of confidence information in the decoding process of large vocabulary speech recognition. A confidence based pruning technique is used to guide the search to the most promising paths. We introduce a posterior probability-based confidence measure that can be estimated efficiently and synchronously from the available information during the search process. The accuracy of this measure is enhanced using a discriminative training technique whose objective is to maximize the discrimination between the correct and incorrect decoding hypotheses. For this purpose, phone-level confidence scores are combined to derive word level scores. Highly compact models that exhibit minimal degradation in performance are introduced. Experimental results using large speech corpora show that the proposed method improves both the decoding accuracy and the decoding time when compared to a baseline recognition system that uses a conventional search approach. Furthermore, the introduced confidence measures are well-suited for cross-task portability.
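
The pruning idea can be sketched in a few lines: at each frame, hypothesis scores are renormalized into posteriors over the active set, and low-confidence hypotheses are dropped alongside conventional beam pruning. The function and thresholds below are illustrative stand-ins, not the paper's decoder.

```python
# A toy illustration of confidence-based pruning: at every frame, hypothesis
# scores are normalized into posteriors over the active set, and hypotheses
# whose posterior falls below a threshold are discarded in addition to the
# usual beam pruning. A real LVCSR decoder is far more involved.
import numpy as np

def prune(hyps, scores, beam=10.0, conf_threshold=0.01):
    """hyps: list of hypothesis objects; scores: matching log-likelihoods."""
    scores = np.asarray(scores, dtype=float)
    # Conventional beam: keep hypotheses within `beam` of the best log score.
    keep = scores >= scores.max() - beam
    # Posterior-based confidence: normalize over the active set.
    posteriors = np.exp(scores - scores.max())
    posteriors /= posteriors.sum()
    keep &= posteriors >= conf_threshold
    return [h for h, k in zip(hyps, keep) if k], scores[keep]
```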


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Speech Enhancement Using Harmonic Emphasis and Adaptive Comb Filtering

Wen Jin; Xin Liu; Michael S. Scordilis; Lu Han

An enhancement method for single-channel speech degraded by additive noise is proposed. A spectral weighting function is derived by constrained optimization to suppress noise in the frequency domain. Two design parameters are included in the suppression gain, namely, the frequency-dependent noise-flooring parameter (FDNFP) and the gain factor. The FDNFP controls the level of admissible residual noise in the enhanced speech. Enhanced harmonic structures are incorporated into the FDNFP by time-domain processing of the linear prediction residuals of voiced speech. Further enhancement of the harmonics is achieved by adaptive comb filtering derived using the gain factor with a peak-picking algorithm. The performance of the enhancement method was evaluated by the modified bark spectral distance (MBSD), ITU-Perceptual Evaluation of Speech Quality (PESQ) scores, composite objective measures and listening tests. Experimental results indicate that the proposed method outperforms spectral subtraction, a signal subspace method applicable to both white and colored noise conditions, and a perceptually based enhancement method with a constant noise-flooring parameter, particularly at lower signal-to-noise ratio conditions. Our listening test indicated that 16 listeners on average preferred the proposed approach over any of the other three approaches about 73% of the time.
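
The core spectral-weighting idea can be illustrated as follows: a Wiener-like gain is computed per time-frequency bin and floored by a noise-flooring parameter so that some residual noise is admitted. A constant floor is used below purely for illustration; the paper derives a frequency-dependent floor carrying enhanced harmonic structure and adds adaptive comb filtering, none of which is shown.

```python
# An illustrative spectral-weighting enhancer with a noise-flooring parameter:
# a Wiener-like gain is floored so some residual noise is admitted. The paper's
# constrained optimization, harmonic emphasis and adaptive comb filtering are
# not shown.
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, sr, noise_frames=10, floor=0.1):
    f, t, X = stft(noisy, fs=sr, nperseg=512)
    noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    snr = np.maximum(np.abs(X) ** 2 / (noise_psd + 1e-12) - 1.0, 0.0)
    gain = snr / (snr + 1.0)                       # Wiener-like suppression gain
    fdnfp = np.full_like(noise_psd, floor)         # constant here; the paper adapts it
    gain = np.maximum(gain, fdnfp)                 # per frequency to voiced harmonics
    _, enhanced = istft(gain * X, fs=sr, nperseg=512)
    return enhanced
```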


Pattern Recognition Letters | 2008

Effective online unsupervised adaptation of Gaussian mixture models and its application to speech classification

Yongxin Zhang; Michael S. Scordilis

Online unsupervised adaptation of statistical classifiers is attractive for many speech processing applications. In this work, we describe an online unsupervised adaptation method for a four-way speech classifier which is based on modelling the universal background model (UBM)-GMM and using confidence scoring in deriving classification results. The aim of the proposed method is to automatically adapt the classifier to mismatched conditions caused by acoustically adverse backgrounds and speaker variability. Extensive analysis of the experimental learning curves shows that the new online unsupervised adaptation algorithm achieves practical convergence. When compared to batch mode adaptation the proposed technique deals effectively with data sparsity and it has significantly lower computational requirements at the expense of a slight sacrifice in classification performance. The proposed algorithm can be readily extended to other mixture families and different expectation-maximization (EM) alternatives for improved performance.
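
A minimal sketch of the adaptation mechanics, assuming a scikit-learn GaussianMixture serves as the UBM: responsibilities of incoming frames update running sufficient statistics, and component means are MAP-adapted with a relevance factor. The class and parameter names are illustrative; the paper's confidence scoring and four-way classifier structure are not reproduced.

```python
# Online unsupervised mean adaptation of a UBM-GMM: incoming frames update
# running sufficient statistics, and component means are MAP-adapted toward
# the new data with a relevance factor.
import numpy as np
from sklearn.mixture import GaussianMixture

class OnlineMAPAdapter:
    def __init__(self, ubm: GaussianMixture, relevance=16.0):
        self.ubm = ubm
        self.relevance = relevance
        self.means = ubm.means_.copy()
        self.n_k = np.zeros(ubm.n_components)        # soft frame counts
        self.sum_k = np.zeros_like(ubm.means_)       # soft first-order statistics

    def update(self, frames):
        """frames: (n_frames, n_features) array of unlabeled observations."""
        resp = self.ubm.predict_proba(frames)        # responsibilities under the UBM
        self.n_k += resp.sum(axis=0)
        self.sum_k += resp.T @ frames
        alpha = (self.n_k / (self.n_k + self.relevance))[:, None]
        data_mean = self.sum_k / np.maximum(self.n_k[:, None], 1e-8)
        self.means = alpha * data_mean + (1.0 - alpha) * self.ubm.means_
```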


Journal of the Franklin Institute - Engineering and Applied Mathematics | 2006

An enhanced psychoacoustic model based on the discrete wavelet packet transform

Xing He; Michael S. Scordilis

The perception of acoustic information by humans is based on the detailed temporal and spectral analysis provided by the auditory processing of the received signal. The incorporation of this process in psychoacoustical computational models has contributed significantly both in the development of highly efficient audio compression schemes as well as in effective audio watermarking methods. In this paper, we present an approach based on the discrete wavelet packet transform, which closely mimics the multi-resolution properties of the human ear and also includes simultaneous and temporal auditory masking. Experimental results show that the proposed technique offers better masking capabilities and it reduces the signal-to-masking ratio when compared to related approaches, without introducing audible distortion. Those results have implications that are important both for audio compression by permitting further bit rate reduction, and for watermarking by providing greater signal space for information hiding.
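
A small sketch of the analysis front end, assuming PyWavelets: the signal is decomposed with a wavelet packet transform, per-band energies are computed, and a deliberately crude masking threshold is derived by a fixed offset. The wavelet, depth and offset are illustrative; the paper's critical-band mapping, spreading and temporal masking are not modeled here.

```python
# Wavelet-packet analysis front end: decompose the signal, compute per-band
# energies, and derive a crude masking threshold by a fixed offset. This is a
# placeholder for the paper's psychoacoustic model, not a reproduction of it.
import numpy as np
import pywt

def band_energies(x, wavelet="db8", level=5):
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")        # subbands in frequency order
    return np.array([np.sum(np.asarray(n.data) ** 2) for n in nodes])

def crude_masking_threshold_db(x, offset_db=20.0):
    energies = band_energies(x)
    band_db = 10.0 * np.log10(energies + 1e-12)
    return band_db - offset_db                       # crude stand-in for a real model
```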


Research Letters in Signal Processing | 2008

Efficiently synchronized spread-spectrum audio watermarking with improved psychoacoustic model

Xing He; Michael S. Scordilis

This paper presents an audio watermarking scheme which is based on an efficiently synchronized spread-spectrum technique and a new psychoacoustic model computed using the discrete wavelet packet transform. The psychoacoustic model takes advantage of the multiresolution analysis of a wavelet transform, which closely approximates the standard critical band partition. The goal of this model is to include an accurate time-frequency analysis and to calculate both the frequency and temporal masking thresholds directly in the wavelet domain. Experimental results show that this watermarking scheme can successfully embed watermarks into digital audio without introducing audible distortion. Several common watermark attacks were applied and the results indicate that the method is very robust to those attacks.
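
The spread-spectrum principle itself is compact, as the sketch below shows: a key-derived pseudo-noise sequence is added to a range of DCT coefficients at low strength, and detection correlates the received coefficients against the same sequence. The coefficient range, strength and detection threshold are illustrative; the paper's psychoacoustic shaping and efficient synchronization are not included.

```python
# Basic spread-spectrum watermarking: a pseudo-random +/-1 sequence is added to
# mid-range DCT coefficients at low strength, and detection correlates the
# received coefficients with the same sequence. Threshold selection and the
# paper's psychoacoustic shaping and synchronization are not addressed here.
import numpy as np
from scipy.fft import dct, idct

def embed(audio, key=0, strength=0.005, lo=1000, hi=5000):
    rng = np.random.default_rng(key)
    pn = rng.choice([-1.0, 1.0], size=hi - lo)       # key-derived pseudo-noise sequence
    coeffs = dct(audio, norm="ortho")
    coeffs[lo:hi] += strength * pn
    return idct(coeffs, norm="ortho")

def detect(audio, key=0, lo=1000, hi=5000, threshold=0.0):
    rng = np.random.default_rng(key)
    pn = rng.choice([-1.0, 1.0], size=hi - lo)
    coeffs = dct(audio, norm="ortho")
    correlation = np.mean(coeffs[lo:hi] * pn)
    return correlation > threshold                   # naive presence decision
```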


Speech Communication | 2006

Speech enhancement by residual domain constrained optimization

Wen Jin; Michael S. Scordilis

A new algorithm for the enhancement of speech corrupted by additive noise is proposed. This algorithm estimates the linear prediction residuals of the clean speech using a constrained optimization criterion. The signal distortion is minimized in the residual domain subject to a constraint on the average power of the noise residuals. Enhanced speech is obtained by exciting the time-varying all-pole synthesis filter with the estimated residuals of the clean speech. The proposed method was tested with speech corrupted by both white Gaussian and colored noise. The enhancement performances were evaluated in terms of segmental signal-to-noise ratio (SNR) and ITU-PESQ scores. Experimental results indicate our method yields better enhancement results than a former residual-weighting scheme [Yegnanarayana, B., Avendano, C., Hermansky, H., Murthy P.S., 1999. Speech enhancement using linear prediction residual. Speech Commun. 28, 25–42]. The proposed method also achieves better noise reduction than the time-domain subspace method [Ephraim, Y., Van Trees, H.L., 1995. A signal subspace approach for speech enhancement. IEEE Trans. Speech Audio Process. 3, 251–266] on real world colored noise.
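
The residual-domain framing can be sketched as follows, assuming librosa and SciPy: each frame is inverse-filtered to its LP residual, the residual is modified (a plain attenuation stands in for the paper's constrained-optimization estimate), and the frame is resynthesized by exciting the all-pole filter with the modified residual.

```python
# A sketch of the residual-domain view: estimate an all-pole model per frame,
# inverse filter to obtain the LP residual, modify the residual (here a plain
# attenuation stands in for the constrained-optimization estimate), and
# resynthesize by exciting the all-pole filter with the modified residual.
import numpy as np
import librosa
from scipy.signal import lfilter

def resynthesize_frame(frame, lpc_order=12, residual_gain=0.7):
    a = librosa.lpc(frame.astype(float), order=lpc_order)
    residual = lfilter(a, [1.0], frame)              # analysis: inverse filtering
    modified = residual_gain * residual              # stand-in for the optimized residual
    return lfilter([1.0], a, modified)               # synthesis: all-pole filter
```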


Journal of Electrical and Computer Engineering | 2008

Psychoacoustic music analysis based on the discrete wavelet packet transform

Xing He; Michael S. Scordilis

Psychoacoustical computational models are necessary for the perceptual processing of acoustic signals and have contributed significantly in the development of highly efficient audio analysis and coding. In this paper, we present an approach for the psychoacoustic analysis of musical signals based on the discrete wavelet packet transform. The proposed method mimics the multiresolution properties of the human ear closer than other techniques and it includes simultaneous and temporal auditory masking. Experimental results show that this method provides better masking capabilities and it reduces the signal-to-masking ratio substantially more than other approaches, without introducing audible distortion. This model can lead to greater audio compression by permitting further bit rate reduction and more secure watermarking by providing greater signal space for information hiding.

Collaboration


Dive into Michael S. Scordilis's collaborations.

Top Co-Authors


Alexander I. Iliev

University of Wisconsin–Stevens Point


Lu Han

North Carolina State University
