Publications


Featured research published by Michael T. Johnson.


IEEE Transactions on Knowledge and Data Engineering | 2004

Time series classification using Gaussian mixture models of reconstructed phase spaces

Richard J. Povinelli; Michael T. Johnson; Andrew C. Lindgren; Jinjin Ye

A new signal classification approach is presented that is based upon modeling the dynamics of a system as they are captured in a reconstructed phase space. The modeling is done using full covariance Gaussian mixture models of time domain signatures, in contrast with current and previous work in signal classification that is typically focused on either linear systems analysis using frequency content or simple nonlinear machine learning models such as artificial neural networks. The proposed approach has strong theoretical foundations based on dynamical systems and topological theorems, resulting in a signal reconstruction that is asymptotically guaranteed to be a complete representation of the underlying system, given properly chosen parameters. The algorithm automatically calculates these parameters to form appropriate reconstructed phase spaces, requiring only the number of mixtures, the signals, and their class labels as input. Three separate data sets are used for validation, including motor current simulations, electrocardiogram recordings, and speech waveforms. The results show that the proposed method is robust across these diverse domains, significantly outperforming the time delay neural network used as a baseline.
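
As a rough sketch of the core recipe (time-delay embedding followed by one full-covariance GMM per class), the snippet below uses scikit-learn's GaussianMixture; the fixed lag, dimension, and mixture count are illustrative placeholders, whereas the paper estimates the reconstruction parameters automatically.

```python
# Minimal sketch: fit one full-covariance GMM per class over the points of a
# reconstructed phase space, then classify by average log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

def embed(x, dim=3, lag=5):
    """Time-delay embedding: each row is a point in the reconstructed phase space."""
    n = len(x) - (dim - 1) * lag
    return np.column_stack([x[i * lag : i * lag + n] for i in range(dim)])

def fit_class_models(signals_by_class, n_mix=8):
    models = {}
    for label, signals in signals_by_class.items():
        points = np.vstack([embed(np.asarray(s)) for s in signals])
        models[label] = GaussianMixture(n_components=n_mix,
                                        covariance_type="full").fit(points)
    return models

def classify(x, models):
    pts = embed(np.asarray(x))
    # score() returns the average per-point log-likelihood under each class model.
    return max(models, key=lambda label: models[label].score(pts))
```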


Speech Communication | 2007

Speech signal enhancement through adaptive wavelet thresholding

Michael T. Johnson; Xiaolong Yuan; Yao Ren

This paper demonstrates the application of the Bionic Wavelet Transform (BWT), an adaptive wavelet transform derived from a non-linear auditory model of the cochlea, to the task of speech signal enhancement. Results, measured objectively by signal-to-noise ratio (SNR) and segmental SNR (SSNR) and subjectively by mean opinion score (MOS), are given for additive white Gaussian noise as well as four different types of realistic noise environments. Enhancement is accomplished through the use of thresholding on the adapted BWT coefficients, and the results are compared to a variety of speech enhancement techniques, including Ephraim-Malah filtering, iterative Wiener filtering, and spectral subtraction, as well as to wavelet denoising based on a perceptually scaled wavelet packet transform decomposition. Overall results indicate that SNR and SSNR improvements for the proposed approach are comparable to those of the Ephraim-Malah filter, with BWT enhancement giving the best results of all methods for the noisiest (-10 dB and -5 dB input SNR) conditions. Subjective measurements using MOS surveys across a variety of 0 dB SNR noise conditions indicate enhancement quality competitive with but still lower than results for Ephraim-Malah filtering and iterative Wiener filtering, but higher than the perceptually scaled wavelet method.
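
The enhancement step reduces to thresholding transform coefficients. Since the Bionic Wavelet Transform is not available in common libraries, the sketch below substitutes an ordinary discrete wavelet transform with soft universal thresholding via PyWavelets; it illustrates the thresholding mechanics, not the adaptive BWT itself, and the wavelet and level are arbitrary choices.

```python
# Sketch: denoise a 1-D signal by soft-thresholding detail coefficients of an
# ordinary DWT (a stand-in for the adapted BWT coefficients in the paper).
import numpy as np
import pywt

def wavelet_denoise(x, wavelet="db8", level=5):
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # Noise level estimated from the finest detail band (median absolute deviation).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2.0 * np.log(len(x)))   # universal threshold
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)
```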


International Conference on Acoustics, Speech, and Signal Processing | 2007

Stress and Emotion Classification using Jitter and Shimmer Features

Xi Li; Jidong Tao; Michael T. Johnson; Joseph Soltis; Anne Savage; Kirsten M. Leong; John D. Newman

In this paper, we evaluate the use of appended jitter and shimmer speech features for the classification of human speaking styles and of animal vocalization arousal levels. Jitter and shimmer features are extracted from the fundamental frequency contour and added to baseline spectral features, specifically Mel-frequency cepstral coefficients (MFCCs) for human speech and Greenwood function cepstral coefficients (GFCCs) for animal vocalizations. Hidden Markov models (HMMs) with Gaussian mixture model (GMM) state distributions are used for classification. The appended jitter and shimmer features result in an increase in classification accuracy for several illustrative datasets, including the SUSAS dataset for human speaking styles as well as vocalizations labeled by arousal level for African elephant and rhesus monkey species.
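
Jitter and shimmer have simple classical definitions: the mean relative cycle-to-cycle variation of the pitch period and of the peak amplitude, respectively. The sketch below computes the common "local" variants from per-cycle values, which would come from any upstream F0 tracker; the exact variants used in the paper may differ.

```python
# Sketch of the classical "local" jitter and shimmer measures, given per-cycle
# pitch periods (seconds) and peak amplitudes from an upstream F0 tracker.
import numpy as np

def jitter(periods):
    p = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(p))) / np.mean(p)

def shimmer(amplitudes):
    a = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)
```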


Biological Reviews | 2016

Acoustic sequences in non-human animals: a tutorial review and prospectus

Arik Kershenbaum; Daniel T. Blumstein; Marie A. Roch; Çağlar Akçay; Gregory A. Backus; Mark A. Bee; Kirsten Bohn; Yan Cao; Gerald G. Carter; Cristiane Cäsar; Michael H. Coen; Stacy L. DeRuiter; Laurance R. Doyle; Shimon Edelman; Ramon Ferrer-i-Cancho; Todd M. Freeberg; Ellen C. Garland; Morgan L. Gustison; Heidi E. Harley; Chloé Huetz; Melissa Hughes; Julia Hyland Bruno; Amiyaal Ilany; Dezhe Z. Jin; Michael T. Johnson; Chenghui Ju; Jeremy Karnowski; Bernard Lohr; Marta B. Manser; Brenda McCowan

Animal acoustic communication often takes the form of complex sequences, made up of multiple distinct acoustic units. Apart from the well-known example of birdsong, other animals such as insects, amphibians, and mammals (including bats, rodents, primates, and cetaceans) also generate complex acoustic sequences. Occasionally, such as with birdsong, the adaptive role of these sequences seems clear (e.g. mate attraction and territorial defence). More often, however, researchers have only begun to characterise – let alone understand – the significance and meaning of acoustic sequences. Hypotheses abound, but there is little agreement as to how sequences should be defined and analysed. Our review aims to outline suitable methods for testing these hypotheses, and to describe the major limitations to our current and near-future knowledge on questions of acoustic sequences. This review and prospectus is the result of a collaborative effort between 43 scientists from the fields of animal behaviour, ecology and evolution, signal processing, machine learning, quantitative linguistics, and information theory, who gathered for a 2013 workshop entitled 'Analysing vocal sequences in animals'. Our goal is to present not just a review of the state of the art, but to propose a methodological framework that summarises what we suggest are the best practices for research in this field, across taxa and across disciplines. We also provide a tutorial-style introduction to some of the most promising algorithmic approaches for analysing sequences. We divide our review into three sections: identifying the distinct units of an acoustic sequence, describing the different ways that information can be contained within a sequence, and analysing the structure of that sequence. Each of these sections is further subdivided to address the key questions and approaches in that area. We propose a uniform, systematic, and comprehensive approach to studying sequences, with the goal of clarifying research terms used in different fields, and facilitating collaboration and comparative studies. Allowing greater interdisciplinary collaboration will facilitate the investigation of many important questions in the evolution of communication and sociality.
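
As a taste of the algorithmic approaches such a tutorial covers, the sketch below estimates a first-order Markov transition matrix from a sequence of discrete acoustic unit labels, one of the simplest sequence-structure analyses in this literature; the two-unit alphabet is invented for illustration.

```python
# Sketch: first-order Markov transition matrix over discrete acoustic units.
import numpy as np

def transition_matrix(sequence, alphabet):
    idx = {u: i for i, u in enumerate(alphabet)}
    counts = np.zeros((len(alphabet), len(alphabet)))
    for a, b in zip(sequence, sequence[1:]):
        counts[idx[a], idx[b]] += 1
    rows = counts.sum(axis=1, keepdims=True)
    # Row-normalize; rows with no observations stay all-zero.
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

print(transition_matrix(list("ABBABAABBA"), alphabet=["A", "B"]))
```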


IEEE Signal Processing Letters | 2005

Capacity and complexity of HMM duration modeling techniques

Michael T. Johnson

The ability of a standard hidden Markov model (HMM) or expanded state HMM (ESHMM) to accurately model duration distributions of phonemes is compared with specific duration-focused approaches such as semi-Markov models or variable transition probabilities. It is demonstrated that either a three-state ESHMM or a standard HMM with an increased number of states is capable of closely matching both Gamma distributions and duration distributions of phonemes from the TIMIT corpus, as measured by Bhattacharyya distance to the true distributions. Standard HMMs are easily implemented with off-the-shelf tools, whereas duration models require substantial algorithmic development and have higher computational costs when implemented, suggesting that a simple adjustment to HMM topologies is perhaps a more efficient solution to the problem of duration modeling than more complex approaches.
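
The capacity argument can be made concrete: a single HMM state with self-loop probability p has a geometric duration distribution, while a left-to-right chain of n copies of that state has a negative-binomial one, which can take on Gamma-like shapes. A small SciPy sketch, with an illustrative p:

```python
# Duration pmfs implied by HMM topology: 1 state vs. a 3-state chain,
# each state with self-loop probability p_stay.
import numpy as np
from scipy.stats import geom, nbinom

p_stay = 0.8
d = np.arange(1, 60)                       # duration in frames
single = geom.pmf(d, 1 - p_stay)           # monotone, mode at d = 1
chain3 = nbinom.pmf(d - 3, 3, 1 - p_stay)  # peaked, Gamma-like shape
```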


Journal of the Acoustical Society of America | 2003

Automatic classification and speaker identification of African elephant (Loxodonta africana) vocalizations

Patrick J. Clemins; Michael T. Johnson; Kirsten M. Leong; Anne Savage

A hidden Markov model (HMM) system is presented for automatically classifying African elephant vocalizations. The development of the system is motivated by successful models from human speech analysis and recognition. Classification features include frequency-shifted Mel-frequency cepstral coefficients (MFCCs) and log energy, spectrally motivated features which are commonly used in human speech processing. Experiments, including vocalization type classification and speaker identification, are performed on vocalizations collected from captive elephants in a naturalistic environment. The system classified vocalizations with accuracies of 94.3% and 82.5% for the type classification and speaker identification experiments, respectively. Classification accuracy, statistical significance tests on the model parameters, and qualitative analysis support the effectiveness and robustness of this approach for vocalization analysis in nonhuman species.
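
A bare-bones version of this recipe, one Gaussian HMM per call type over cepstral frames, can be sketched with librosa and hmmlearn; the state counts, feature sizes, and the paper's frequency-shifted MFCC variant are not reproduced here, so treat the parameters as placeholders.

```python
# Sketch: per-call-type Gaussian HMMs over MFCC frames, classification by
# maximum log-likelihood. Paths and model sizes are illustrative.
import numpy as np
import librosa
from hmmlearn import hmm

def mfcc_frames(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coeffs)

def train_call_type(paths, n_states=5):
    feats = [mfcc_frames(p) for p in paths]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
    model.fit(np.vstack(feats), lengths=[len(f) for f in feats])
    return model

def classify(path, models):  # models: dict mapping call type -> trained HMM
    x = mfcc_frames(path)
    return max(models, key=lambda t: models[t].score(x))
```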


IEEE Transactions on Signal Processing | 2006

Statistical models of reconstructed phase spaces for signal classification

Richard J. Povinelli; Michael T. Johnson; Andrew C. Lindgren; Felice M. Roberts; Jinjin Ye

This paper introduces a novel approach to the analysis and classification of time series signals using statistical models of reconstructed phase spaces. With sufficient dimension, such reconstructed phase spaces are, with probability one, guaranteed to be topologically equivalent to the state dynamics of the generating system, and, therefore, may contain information that is absent in analysis and classification methods rooted in linear assumptions. Parametric and nonparametric distributions are introduced as statistical representations over the multidimensional reconstructed phase space, with classification accomplished through methods such as Bayes maximum likelihood and artificial neural networks (ANNs). The technique is demonstrated on heart arrhythmia classification and speech recognition. This new approach is shown to be a viable and effective alternative to traditional signal classification approaches, particularly for signals with strong nonlinear characteristics.
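
For the nonparametric side, a kernel density estimate can stand in for the class-conditional distribution over the reconstructed phase space, with Bayes maximum-likelihood classification on top. A sketch with SciPy follows, reusing illustrative embedding parameters:

```python
# Sketch: nonparametric class models over a reconstructed phase space via
# Gaussian KDE, classified by summed log-likelihood of the embedded points.
import numpy as np
from scipy.stats import gaussian_kde

def embed(x, dim=3, lag=5):
    n = len(x) - (dim - 1) * lag
    return np.column_stack([x[i * lag : i * lag + n] for i in range(dim)])

def fit_kdes(signals_by_class):
    # gaussian_kde expects data with shape (dimension, n_points).
    return {label: gaussian_kde(np.vstack([embed(np.asarray(s)) for s in sigs]).T)
            for label, sigs in signals_by_class.items()}

def classify(x, kdes):
    pts = embed(np.asarray(x)).T
    return max(kdes, key=lambda label: kdes[label].logpdf(pts).sum())
```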


International Conference on Acoustics, Speech, and Signal Processing | 2003

Speech recognition using reconstructed phase space features

Andrew C. Lindgren; Michael T. Johnson; Richard J. Povinelli

The paper presents a novel method for speech recognition by utilizing nonlinear/chaotic signal processing techniques to extract time-domain based phase space features. By exploiting the theoretical results derived in nonlinear dynamics, a processing space called a reconstructed phase space can be generated where a salient model (the natural distribution of the attractor) can be extracted for speech recognition. To assess the discriminatory power of these features, isolated phoneme classification experiments were performed using the TIMIT corpus and compared to a baseline classifier that uses MFCC (Mel-frequency cepstral coefficient) features. The results demonstrate that phase space features contain substantial discriminatory power, even though MFCC features outperformed the phase space features in direct comparisons. The authors conjecture that phase space and MFCC features used in combination within a classifier may yield increased accuracy for various speech recognition tasks.


IEEE Transactions on Speech and Audio Processing | 2005

Time-domain isolated phoneme classification using reconstructed phase spaces

Michael T. Johnson; Richard J. Povinelli; Andrew C. Lindgren; Jinjin Ye; Xiaolin Liu; Kevin M. Indrebo

This paper introduces a novel time-domain approach to modeling and classifying speech phoneme waveforms. The approach is based on statistical models of reconstructed phase spaces, which offer significant theoretical benefits as representations that are known to be topologically equivalent to the state dynamics of the underlying production system. The lag and dimension parameters of the reconstruction process for speech are examined in detail, comparing common estimation heuristics for these parameters with corresponding maximum likelihood recognition accuracy over the TIMIT data set. Overall accuracies are compared with a Mel-frequency cepstral baseline system across five different phonetic classes within TIMIT, and a composite classifier using both cepstral and phase space features is developed. Results indicate that although the accuracy of the phase space approach by itself is still currently below that of baseline cepstral methods, a combined approach is capable of increasing speaker independent phoneme accuracy.
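
One of the common lag heuristics examined in this line of work is the first minimum of the auto-mutual information between x(t) and x(t - lag). The sketch below estimates it with a 2-D histogram; the bin count and lag range are arbitrary choices, not the paper's settings.

```python
# Sketch: choose the embedding lag as the first local minimum of the
# auto-mutual information, estimated from a 2-D histogram.
import numpy as np

def mutual_information(x, lag, bins=32):
    pxy, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))

def first_minimum_lag(x, max_lag=50):
    x = np.asarray(x)
    mi = [mutual_information(x, lag) for lag in range(1, max_lag + 1)]
    for i in range(1, len(mi) - 1):
        if mi[i] < mi[i - 1] and mi[i] < mi[i + 1]:
            return i + 1                 # lag values are 1-indexed
    return int(np.argmin(mi)) + 1        # fall back to the global minimum
```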


Journal of the Acoustical Society of America | 2006

Generalized perceptual linear prediction features for animal vocalization analysis

Patrick J. Clemins; Michael T. Johnson

A new feature extraction model, generalized perceptual linear prediction (gPLP), is developed to calculate a set of perceptually relevant features for digital signal analysis of animal vocalizations. The gPLP model is a generalized adaptation of the perceptual linear prediction model, popular in human speech processing, which incorporates perceptual information such as frequency warping and equal loudness normalization into the feature extraction process. Since such perceptual information is available for a number of animal species, this new approach integrates that information into a generalized model to extract perceptually relevant features for a particular species. To illustrate, qualitative and quantitative comparisons are made between the species-specific gPLP model and the original PLP model using a set of vocalizations collected from captive African elephants (Loxodonta africana) and wild beluga whales (Delphinapterus leucas). The models that incorporate perceptual information outperform the original human-based models in both visualization and classification tasks.
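
The species-specific piece is the frequency warping. The Greenwood function f(x) = A(10^(ax) - k) maps normalized cochlear position x in [0, 1] to frequency in Hz, and its inverse supplies the warping; the constants below are the commonly cited human values, used purely for illustration, where gPLP would fit them to the species at hand.

```python
# Sketch: Greenwood-function frequency warping, the species-specific ingredient
# of gPLP. Constants are the standard human values, for illustration only.
import numpy as np

A, a, k = 165.4, 2.1, 0.88

def greenwood(x):
    """Normalized cochlear position x in [0, 1] -> frequency in Hz."""
    return A * (10.0 ** (a * x) - k)

def greenwood_inverse(f):
    """Frequency in Hz -> normalized cochlear position in [0, 1]."""
    return np.log10(f / A + k) / a

# Perceptually spaced filterbank center frequencies for this "species":
centers = greenwood(np.linspace(0.0, 1.0, 20))
```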

Collaboration


Dive into Michael T. Johnson's collaboration.

Top Co-Authors

An Ji

Marquette University
