Publication


Featured research published by Mark D. Skowronski.


Neural Networks | 2007

2007 Special Issue: Automatic speech recognition using a predictive echo state network classifier

Mark D. Skowronski; John G. Harris

We have combined an echo state network (ESN) with a competitive state machine framework to create a classification engine called the predictive ESN classifier. We derive the expressions for training the predictive ESN classifier and show that the model was significantly more noise robust than a hidden Markov model in noisy speech classification experiments, by 8 ± 1 dB signal-to-noise ratio. The simple training algorithm and noise robustness of the predictive ESN classifier make it an attractive classification engine for automatic speech recognition.
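
The "predictive" framing in this abstract is concrete enough to sketch. Below is a minimal illustration of the ESN idea, not the authors' implementation: a fixed random reservoir is driven by the input frames, and only a linear readout is trained, in closed form by ridge regression, to predict the next feature frame. All names and constants (reservoir size, spectral radius, ridge term) are illustrative assumptions.

```python
import numpy as np

# Minimal echo state network sketch: a fixed random reservoir plus a
# closed-form ridge-regression readout that predicts the next input frame.
rng = np.random.default_rng(0)

n_in, n_res = 13, 200                # e.g., 13 cepstral features per frame
W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
W = rng.uniform(-1.0, 1.0, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1

def reservoir_states(U):
    """Run input frames U (T x n_in) through the reservoir."""
    X = np.zeros((len(U), n_res))
    x = np.zeros(n_res)
    for t, u in enumerate(U):
        x = np.tanh(W_in @ u + W @ x)
        X[t] = x
    return X

def train_readout(U, ridge=1e-6):
    """Closed-form readout mapping reservoir states to the next frame."""
    X, Y = reservoir_states(U)[:-1], U[1:]
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y)
```

The closed-form solve is the "simple training algorithm" the abstract refers to: because the recurrent weights stay fixed, no backpropagation through time is needed.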


Journal of the Acoustical Society of America | 2004

Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition.

Mark D. Skowronski; John G. Harris

Mel frequency cepstral coefficients (MFCC) are the most widely used speech features in automatic speech recognition systems, primarily because the coefficients fit well with the assumptions used in hidden Markov models and because of the superior noise robustness of MFCC over alternative feature sets such as linear prediction-based coefficients. The authors have recently introduced human factor cepstral coefficients (HFCC), a modification of MFCC that uses the known relationship between center frequency and critical bandwidth from human psychoacoustics to decouple filter bandwidth from filter spacing. In this work, the authors introduce a variation of HFCC called HFCC-E in which filter bandwidth is linearly scaled in order to investigate the effects of wider filter bandwidth on noise robustness. Experimental results show an increase in signal-to-noise ratio of 7 dB over traditional MFCC algorithms when filter bandwidth increases in HFCC-E. An important attribute of both HFCC and HFCC-E is that the algorithms only differ from MFCC in the filter bank coefficients: increased noise robustness using wider filters is achieved with no additional computational cost.
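
As a rough illustration of the decoupling described above, the sketch below spaces triangular filters on the mel scale but takes each filter's width from Moore and Glasberg's published approximation of the auditory critical bandwidth (ERB), scaled by a factor E as in HFCC-E. The construction details (frequency range, filter count, whether the ERB sets the half-width or the full width) are assumptions for illustration, not the exact published algorithm.

```python
import numpy as np

def erb(fc_hz):
    """Moore & Glasberg critical-bandwidth approximation (Hz); the
    polynomial takes the center frequency in kHz."""
    f = fc_hz / 1000.0
    return 6.23 * f**2 + 93.39 * f + 28.52

def hfcc_filterbank(n_filters=24, fs=16000, n_fft=512, E=1.0):
    """Triangular filters with mel-spaced centers and ERB-derived,
    linearly scaled bandwidths (the HFCC-E idea, sketched)."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    centers = imel(np.linspace(mel(100), mel(fs / 2 - 500), n_filters))
    freqs = np.linspace(0, fs / 2, n_fft // 2 + 1)
    fb = np.zeros((n_filters, len(freqs)))
    for i, fc in enumerate(centers):
        half = E * erb(fc)               # half-width scaled by E (assumed)
        lo, hi = fc - half, fc + half
        tri = np.minimum((freqs - lo) / (fc - lo), (hi - freqs) / (hi - fc))
        fb[i] = np.clip(tri, 0.0, 1.0)
    return fb, centers
```

Because only the filter bank changes, features built on this bank cost the same to compute as standard MFCC, which is the point made at the end of the abstract.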


Speech Communication | 2006

Applied principles of clear and Lombard speech for automated intelligibility enhancement in noisy environments

Mark D. Skowronski; John G. Harris

Previous studies have documented phenomena involving the modification of human speech in special communication circumstances. Whether speaking to a hearing-impaired person (clear speech) or in a noisy environment (Lombard speech), speakers tend to make similar modifications to their normal, conversational speaking style in order to increase the understanding of their message by the listener. One strategy characteristic of both speech types is to increase consonant power relative to the signal power of adjacent vowels; this is referred to as consonant–vowel (CV) ratio boosting. An automated method of speech enhancement using CV ratio boosting is called energy redistribution voiced/unvoiced (ERVU). To characterize the performance of ERVU, 25 listeners responded to 500 words in a two-word, forced-choice experiment in the presence of energetic masking noise. The test material was a vocabulary of confusable monosyllabic words spoken by 8 male and 8 female speakers, and the conditions tested were a control (unmodified speech), ERVU, and a high-pass filter (HPF). Both ERVU and the HPF significantly increased recognition accuracy compared to the control. Nine of the 16 speakers were significantly more intelligible when ERVU or the HPF was used, compared to the control, while no speaker was less intelligible. The results show that ERVU successfully increased intelligibility of speech using a simple automated segmentation algorithm, applicable to a wide variety of communication systems such as cell phones and public address systems.
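
CV ratio boosting is mechanical enough to sketch. The following is a crude illustration of the energy-redistribution idea, not the paper's ERVU algorithm: frames are labeled unvoiced by a simple low-energy/high-zero-crossing rule, those frames are boosted, and the output is rescaled so total signal energy is unchanged. The segmentation rule, boost amount, and frame size are all illustrative assumptions.

```python
import numpy as np

def cv_boost_sketch(x, fs, boost_db=6.0, frame_ms=20):
    """Toy consonant-vowel ratio boosting with energy redistribution.

    Unvoiced (consonant-like) frames are detected by low energy plus a
    high zero-crossing rate, amplified, and the whole signal is then
    rescaled so that total energy matches the input.
    """
    n = int(fs * frame_ms / 1000)
    y = x.astype(float).copy()
    frames = [y[i:i + n] for i in range(0, len(y) - n + 1, n)]
    energy = np.array([np.mean(f ** 2) for f in frames])
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f)))) / 2 for f in frames])
    unvoiced = (energy < np.median(energy)) & (zcr > np.median(zcr))
    gain = 10 ** (boost_db / 20)
    for f, u in zip(frames, unvoiced):
        if u:
            f *= gain                  # frames are views into y: boosts in place
    return y * np.sqrt(np.sum(x.astype(float) ** 2) / np.sum(y ** 2))
```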


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Noise-Robust Automatic Speech Recognition Using a Predictive Echo State Network

Mark D. Skowronski; John G. Harris

Artificial neural networks have been shown to perform well in automatic speech recognition (ASR) tasks, although their complexity and excessive computational costs have limited their use. Recently, a recurrent neural network with simplified training, the echo state network (ESN), was introduced by Jaeger and shown to outperform conventional methods in time series prediction experiments. We created the predictive ESN classifier by combining the ESN with a state machine framework. In small-vocabulary ASR experiments, we compared the noise-robust performance of the predictive ESN classifier with a hidden Markov model (HMM) as a function of model size and signal-to-noise ratio (SNR). The predictive ESN classifier outperformed an HMM by 8 dB SNR, and both models achieved maximum noise-robust accuracy for architectures with more states and fewer kernels per state. Using ten trials of random sets of training/validation/test speakers, accuracy for the predictive ESN classifier, averaged between 0 and 20 dB SNR, was 81 ± 3%, compared to 61 ± 2% for an HMM. The closed-form regression training for the ESN significantly reduced the computational cost of the network, and the reservoir of the ESN created a high-dimensional representation of the input with memory, which led to more noise-robust classification.
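
The competitive part of the framework suggests the decision rule, which is worth showing: train one predictive readout per class, then label a test utterance by whichever class's readout accumulates the least prediction error. The sketch below reuses the reservoir_states and train_readout helpers from the earlier sketch and glosses over the paper's state-machine structure within each word model.

```python
def classify(U, readouts):
    """Competitive prediction: pick the class whose readout best
    predicts the next frames of utterance U.

    readouts: dict mapping class label -> W_out from train_readout().
    """
    X, Y = reservoir_states(U)[:-1], U[1:]
    errors = {label: np.sum((X @ W_out - Y) ** 2)
              for label, W_out in readouts.items()}
    return min(errors, key=errors.get)
```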


Journal of the Acoustical Society of America | 2006

Acoustic detection and classification of microchiroptera using machine learning: Lessons learned from automatic speech recognition

Mark D. Skowronski; John G. Harris

Current automatic acoustic detection and classification of microchiroptera utilize global features of individual calls (i.e., duration, bandwidth, frequency extrema), an approach that stems from expert knowledge of call sonograms. This approach parallels the acoustic phonetic paradigm of human automatic speech recognition (ASR), which relied on expert knowledge to account for variations in canonical linguistic units. ASR research eventually shifted from acoustic phonetics to machine learning, primarily because of the superior ability of machine learning to account for signal variation. To compare machine learning with conventional methods of detection and classification, nearly 3000 search-phase calls were hand labeled from recordings of five species: Pipistrellus bodenheimeri, Molossus molossus, Lasiurus borealis, L. cinereus semotus, and Tadarida brasiliensis. The hand labels were used to train two machine learning models: a Gaussian mixture model (GMM) for detection and classification and a hidden Markov model (HMM) for classification. The GMM detector produced 4% error compared to 32% error for a baseline broadband energy detector, while the GMM and HMM classifiers produced errors of 0.6 ± 0.2% compared to 16.9 ± 1.1% error for a baseline discriminant function analysis classifier. The experiments showed that machine learning algorithms produced errors an order of magnitude smaller than those for conventional methods.
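
The GMM side of this recipe is standard enough to sketch with scikit-learn: fit one mixture per species on its labeled call features, then assign a new call to the species whose mixture gives the highest log-likelihood. Feature extraction, the detector, and the HMM variant are omitted; all names and the component count are illustrative assumptions.

```python
from sklearn.mixture import GaussianMixture

def train_species_gmms(features_by_species, n_components=8):
    """Fit one GMM per species; each value is a (frames x dims) array."""
    return {species: GaussianMixture(n_components=n_components).fit(X)
            for species, X in features_by_species.items()}

def classify_call(call_features, gmms):
    """Label a call by the species whose GMM yields the highest total
    log-likelihood over the call's frames."""
    scores = {s: g.score_samples(call_features).sum()
              for s, g in gmms.items()}
    return max(scores, key=scores.get)
```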


international conference on acoustics, speech, and signal processing | 2002

Increased mfcc filter bandwidth for noise-robust phoneme recognition

Mark D. Skowronski; John G. Harris

Many speech recognition systems use mel-frequency cepstral coefficient (mfcc) feature extraction as a front end. In the algorithm, a speech spectrum passes through a filter bank of mel-spaced triangular filters, and the filter output energies are log-compressed and transformed to the cepstral domain by the DCT. The spacing of filter bank center frequencies mimics the known warped-frequency characteristics of the human auditory system, yet the bandwidths of these filters are not chosen through biological inspiration. Instead, they are set by aligning the endpoints of each triangle with the center frequencies of neighboring filters, and the triangle is itself an arbitrary shape. It is surprising that, for such a popular speech recognition front end, proper analysis or optimization of the filter bandwidths has not been performed. Complex cochlear models use realistic filter shapes that more closely approximate critical bands; compared to the filters used in mfcc, these are considerably wider and overlap more with neighboring filters. We have extended this filter characteristic to the mfcc algorithm and found that the increased filter bandwidth improves recognition performance in clean speech and provides added noise robustness as well.
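
The coupling criticized here is visible directly in code: in the standard construction, each triangle's endpoints sit at its neighbors' center frequencies, so bandwidth is fixed by spacing. The sketch below adds a widen factor that stretches each triangle about its center, a simplified stand-in for the paper's wider filters, not its exact scheme.

```python
import numpy as np

def mel_filterbank(n_filters, fs, n_fft, widen=1.0):
    """Standard mel triangles, with an optional widening factor.

    Endpoints of filter i are the center frequencies of filters i-1 and
    i+1, which is exactly the bandwidth/spacing coupling at issue;
    widen > 1 stretches the triangles about their centers.
    """
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    edges = imel(np.linspace(mel(0), mel(fs / 2), n_filters + 2))
    freqs = np.linspace(0, fs / 2, n_fft // 2 + 1)
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, fc, hi = edges[i], edges[i + 1], edges[i + 2]
        lo, hi = fc - widen * (fc - lo), fc + widen * (hi - fc)
        tri = np.minimum((freqs - lo) / (fc - lo), (hi - freqs) / (hi - fc))
        fb[i] = np.clip(tri, 0.0, 1.0)
    return fb
```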


Journal of the Acoustical Society of America | 2008

Model-based automated detection of echolocation calls using the link detector

Mark D. Skowronski; M. Brock Fenton

The link detector combines a model-based spectral peak tracker with an echo filter to detect echolocation calls of bats. By processing calls in the spectrogram domain, the link detector separates calls that overlap in time, including call harmonics and echoes. The link detector was validated using an artificial recording environment, including synthetic calls, atmospheric absorption, and echoes, which provided control of signal-to-noise ratio and an absolute ground truth. Maximum hit rate (2% false positive rate) for the link detector was 87%, compared to 1.5% for a spectral peak detector. The difference in performance was due to the ability of the link detector to filter out echoes. Detection range varied across species from 13 to more than 20 m due to call bandwidth and frequency range. Global features of calls detected by the link detector were compared to those of synthetic calls. The error in all estimates increased with range, and estimates of minimum frequency and frequency of most energy were more accurate than estimates of maximum frequency. The link detector combines local and global features to automatically detect calls within the machine learning paradigm and detects overlapping calls and call harmonics in a unified framework.
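
The name suggests the core mechanism: track spectral peaks frame to frame and link nearby ones into call candidates. Below is a heavily simplified sketch of peak linking on a magnitude spectrogram; the published detector's model-based tracker and echo filter are not reproduced, and the thresholds are arbitrary.

```python
import numpy as np

def link_peaks(spec, freqs, max_jump_hz=2000.0, min_len=5):
    """Chain per-frame spectral peaks into candidate call tracks.

    spec: magnitude spectrogram (freq bins x frames); freqs: bin
    frequencies in Hz. Starts a new track whenever the peak frequency
    jumps by more than max_jump_hz between frames, and keeps tracks
    spanning at least min_len frames.
    """
    peak_freqs = freqs[spec.argmax(axis=0)]      # strongest bin per frame
    tracks, current = [], [0]
    for t in range(1, len(peak_freqs)):
        if abs(peak_freqs[t] - peak_freqs[t - 1]) <= max_jump_hz:
            current.append(t)
        else:
            tracks.append(current)
            current = [t]
    tracks.append(current)
    return [tr for tr in tracks if len(tr) >= min_len]
```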


IEEE Transactions on Biomedical Engineering | 2006

Prediction of Intrauterine Pressure From Electrohysterography Using Optimal Linear Filtering

Mark D. Skowronski; John G. Harris; Dorothee Marossero; Rodney K. Edwards; Tammy Y. Euliano

We propose a method of predicting intrauterine pressure (IUP) from external electrohysterograms (EHG) using a causal FIR Wiener filter. IUP and 8-channel EHG data were collected simultaneously from 14 laboring patients at term, and prediction models were trained and tested using 10-min windows for each patient and channel. RMS prediction error varied between 5 and 14 mmHg across all patients. We performed a 4-way analysis of variance on the RMS error, which varied across patients, channels, time (test window), and model (train window). The patient-channel interaction was the most significant factor, while channel alone was not significant, indicating that different channels produced significantly different RMS errors depending on the patient. The channel-time factor was significant due to single-channel bursty noise, while time was a significant factor due to multichannel bursty noise. The time-model interaction was not significant, supporting the assumption that the random process generating the IUP and EHG signals was stationary. The results demonstrate the capabilities of optimal linear filtering in predicting IUP from external EHG and offer insight into the factors that affect prediction error of IUP from multichannel EHG recordings.
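
A causal FIR Wiener filter of this kind reduces to a linear least-squares fit over lagged input samples, which is compact enough to sketch. The version below is single-channel with an arbitrary filter order; the study used 8 EHG channels and per-window training.

```python
import numpy as np

def fit_fir_wiener(ehg, iup, order=64):
    """Least-squares FIR predictor of IUP from lagged EHG samples.

    Builds a causal lag matrix (columns are lags 0..order-1) and solves
    the least-squares problem; for stationary signals this is the
    finite-length Wiener solution.
    """
    X = np.column_stack([ehg[order - 1 - k : len(ehg) - k]
                         for k in range(order)])
    y = iup[order - 1:]
    h, *_ = np.linalg.lstsq(X, y, rcond=None)
    return h

def predict_iup(ehg, h):
    """Apply the filter causally: y[n] = sum_k h[k] * ehg[n - k]."""
    return np.convolve(ehg, h)[: len(ehg)]
```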


international symposium on circuits and systems | 2003

Improving the filter bank of a classic speech feature extraction algorithm

Mark D. Skowronski; John G. Harris

The most popular speech feature extractor used in automatic speech recognition (ASR) systems today is the mel frequency cepstral coefficient (MFCC) algorithm. Introduced in 1980, the filter bank-based algorithm eventually replaced linear prediction cepstral coefficients (LPCC) as the premier front end, primarily because of MFCC's superior robustness to additive noise. However, MFCC does not approximate the critical bandwidth of the human auditory system. We propose a novel scheme for decoupling filter bandwidth from other filter bank parameters, and we demonstrate improved noise robustness over three versions of MFCC through HMM-based experiments with the English digits in various noise environments.


Journal of the Acoustical Society of America | 2002

Human factor cepstral coefficients

Mark D. Skowronski; John G. Harris

Automatic speech recognition (ASR) is an emerging field with the goal of creating a more natural man/machine interface. The single largest obstacle to widespread use of ASR technology is robustness to noise. Since human speech recognition greatly outperforms current ASR systems in noisy environments, ASR systems seek to improve noise robustness by drawing on biological inspiration. Most ASR front ends employ mel frequency cepstral coefficients (mfcc), a filter bank‐based algorithm whose filters are spaced on a linear‐log frequency scale. Although center frequency is based on a perceptually motivated frequency scale, filter bandwidth is set by filter spacing and not through biological motivation. The coupling of filter bandwidth to other filter bank parameters (frequency range, number of filters) has led to variations of the original algorithm with different filter bandwidths. In this work, a novel extension to mfcc is introduced which decouples filter bandwidth from the rest of the filter bank parameters.

Collaboration


Dive into Mark D. Skowronski's collaborations.

Top Co-Authors

David A. Eddins, University of South Florida
Lisa M. Kopf, University of Northern Iowa
M. Brock Fenton, University of Western Ontario