Publication


Featured research published by Steven Sandoval.


Journal of the Acoustical Society of America | 2013

Automatic assessment of vowel space area.

Steven Sandoval; Visar Berisha; Rene L. Utianski; Julie M. Liss; Andreas Spanias

Vowel space area (VSA) is an attractive metric for the study of speech production deficits and reductions in intelligibility, in addition to the traditional study of vowel distinctiveness. Traditional VSA estimates are not currently sufficiently sensitive to map to production deficits. The present report describes an automated algorithm that uses healthy, connected speech rather than single syllables and estimates the entire vowel working space rather than just the corner vowels. Analyses reveal a strong correlation between the traditional VSA and the automated estimates. When the two methods diverge, the automated method seems to provide a more accurate area since it accounts for all vowels.
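
To make the geometry concrete, here is a minimal sketch of computing a vowel space area as the convex hull of (F1, F2) formant pairs, in the spirit of the automated method; the formant values and the Python/SciPy tooling are illustrative assumptions, not the paper's implementation.

import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical (F1, F2) formant measurements in Hz, one row per vowel
# token. Real use would extract these from connected speech with a
# formant tracker.
formants = np.array([
    [300, 2300],  # /i/-like token
    [350,  800],  # /u/-like token
    [750, 1200],  # /a/-like token
    [650, 1700],  # /ae/-like token
    [500, 1500],  # interior token; does not change the hull area
])

hull = ConvexHull(formants)
# For 2-D points, ConvexHull.volume is the enclosed area (here in Hz^2).
print(f"Vowel space area: {hull.volume:.0f} Hz^2")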


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Low Bit-Rate Speech Coding Through Quantization of Mel-Frequency Cepstral Coefficients

Laura E. Boucheron; P.L. De Leon; Steven Sandoval

In this paper, we propose a low bit-rate speech codec based on vector quantization (VQ) of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech reconstruction is possible from the MFCCs despite the lack of phase information. By evaluating the contribution toward speech quality that individual MFCCs make and applying appropriate quantization, our results show that the MFCC-based codec exceeds the state-of-the-art MELPe codec across the entire range of 600-2400 bps, when evaluated with the perceptual evaluation of speech quality (PESQ) (ITU-T recommendation P.862). The main advantage of the proposed codec is in distributed speech recognition (DSR), since the MFCCs can be applied directly, eliminating the additional decode and feature-extraction stages; furthermore, the proposed codec better preserves the fidelity of the MFCCs and yields better word-accuracy rates than the CELP and MELPe codecs.
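
As a rough illustration of the vector-quantization step, the sketch below trains a small MFCC codebook with k-means and encodes each frame as a single index; the synthetic signal, feature settings, and codebook size are made-up stand-ins, not the codec's actual design.

import numpy as np
import librosa
from scipy.cluster.vq import kmeans2, vq

# Stand-in "speech": a decaying tone, so the example runs offline.
sr = 8000
t = np.arange(0, 4.0, 1 / sr)
y = np.sin(2 * np.pi * 200 * t) * np.exp(-0.5 * t)

# High-resolution mel-frequency cepstrum; n_mfcc is illustrative.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40).T  # (frames, coeffs)

# Train a VQ codebook on the MFCC frames (a real codec would train on
# hours of speech and use a much larger codebook).
codebook, _ = kmeans2(mfcc.astype(np.float64), k=32, minit='++', seed=0)

indices, _ = vq(mfcc, codebook)   # encode: one codebook index per frame
mfcc_hat = codebook[indices]      # decode: table lookup
print("mean squared quantization error:", np.mean((mfcc - mfcc_hat) ** 2))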


international conference on acoustics, speech, and signal processing | 2014

Modeling pathological speech perception from data with similarity labels

Visar Berisha; Julie M. Liss; Steven Sandoval; Rene L. Utianski; Andreas Spanias

The current state of the art in judging pathological speech intelligibility is subjective assessment performed by trained speech-language pathologists (SLPs). These tests, however, are inconsistent, costly, and oftentimes suffer from poor intra- and inter-judge reliability. As such, consistent, reliable, and perceptually relevant objective evaluations of pathological speech are critical. Here, we propose a data-driven approach to this problem. We propose new cost functions for examining data from a series of experiments, whereby we ask certified SLPs to rate pathological speech along the perceptual dimensions that contribute to decreased intelligibility. We consider qualitative feedback from SLPs in the form of comparisons similar to the statement "Is Speaker A's rhythm more similar to Speaker B or Speaker C?" Data of this form are common in behavioral research, but differ from the traditional data structures expected in supervised (data matrix + class labels) or unsupervised (data matrix) machine learning. The proposed method identifies relevant acoustic features that correlate with the ordinal data collected during the experiment. Using these features, we show that we are able to develop objective measures of speech signal degradation that correlate well with SLP responses.
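
A hedged sketch of how such triplet comparisons can drive learning: below, a hinge-style loss scores a nonnegative feature weighting against judgments of the form "speaker a is more similar to b than to c". The loss, features, and data are invented for illustration; the paper derives its own cost functions.

import numpy as np

def triplet_loss(w, X, triplets, margin=1.0):
    """X: one row of acoustic features per speaker; w: feature weights.
    Each triplet (a, b, c) encodes an SLP judgment that speaker a's
    speech is more similar to speaker b's than to speaker c's."""
    W = np.abs(w)                                # nonnegative metric
    loss = 0.0
    for a, b, c in triplets:
        d_ab = np.sum(W * (X[a] - X[b]) ** 2)    # weighted distances
        d_ac = np.sum(W * (X[a] - X[c]) ** 2)
        loss += max(0.0, margin + d_ab - d_ac)   # want d_ab << d_ac
    return loss / len(triplets)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))                     # 10 speakers, 5 mock features
triplets = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]     # mock SLP judgments
print(triplet_loss(np.ones(5), X, triplets))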


data compression conference | 2011

Hybrid Scalar/Vector Quantization of Mel-Frequency Cepstral Coefficients for Low Bit-Rate Coding of Speech

Laura E. Boucheron; Phillip L. De Leon; Steven Sandoval

In this paper, we propose a low bit-rate speech codec based on a hybrid scalar/vector quantization of the mel-frequency cepstral coefficients (MFCCs). We begin by showing that if a high-resolution mel-frequency cepstrum (MFC) is computed, good-quality speech reconstruction is possible from the MFCCs despite the lack of explicit phase information. By evaluating the contribution toward speech quality that individual MFCCs make and applying appropriate quantization, our results show that the perceptual evaluation of speech quality (PESQ) score of the MFCC-based codec matches the state-of-the-art MELPe codec at 600 bps and exceeds the CELP codec at 2000-4000 bps coding rates. The main advantage of the proposed codec is in distributed speech recognition (DSR), since speech features based on MFCCs can be obtained directly from code words, eliminating the additional decode and feature-extraction stages.
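
The hybrid split can be sketched as follows: scalar-quantize the low-order coefficients and vector-quantize the rest. The split point, bit allocations, and random stand-in data below are assumptions for illustration only.

import numpy as np
from scipy.cluster.vq import kmeans2, vq

def uniform_scalar_quantize(x, bits, lo, hi):
    # Midrise uniform quantizer on [lo, hi] with 2**bits levels.
    levels = 2 ** bits
    step = (hi - lo) / levels
    idx = np.clip(np.floor((x - lo) / step), 0, levels - 1)
    return lo + (idx + 0.5) * step

rng = np.random.default_rng(0)
mfcc = rng.normal(size=(500, 40))   # mock MFCC frames (frames x coeffs)

# Illustrative hybrid split: scalar-quantize coefficients 0-3, which
# carry most of the envelope energy, and vector-quantize the rest.
low_hat = uniform_scalar_quantize(mfcc[:, :4], bits=6, lo=-4.0, hi=4.0)
codebook, _ = kmeans2(mfcc[:, 4:], k=128, minit='++', seed=0)
idx, _ = vq(mfcc[:, 4:], codebook)

mfcc_hat = np.hstack([low_hat, codebook[idx]])
print("mean squared quantization error:", np.mean((mfcc - mfcc_hat) ** 2))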


Journal of the Acoustical Society of America | 2014

Characterizing the distribution of the quadrilateral vowel space area.

Visar Berisha; Steven Sandoval; Rene L. Utianski; Julie M. Liss; Andreas Spanias

The vowel space area (VSA) has been studied as a quantitative index of intelligibility to the extent it captures articulatory working space and reductions therein. The majority of such studies have been empirical, wherein measures of VSA are correlated with perceptual measures of intelligibility. However, the literature contains minimal mathematical analysis of the properties of this metric. This paper further develops the theoretical underpinnings of this metric by presenting a detailed analysis of the statistical properties of the VSA and characterizing its distribution through the moment generating function. The theoretical analysis is confirmed by a series of experiments in which empirically estimated and theoretically predicted statistics of this function are compared. The results show that, on the Hillenbrand and TIMIT data, the theoretically predicted values of the higher-order statistics of the VSA closely match the empirical estimates.
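
For reference, the quadrilateral VSA is conventionally the shoelace-formula area over the four corner vowels' (F1, F2) coordinates; when the formant estimates are modeled as random variables, the area below is itself random, which is what the paper's moment analysis characterizes. (The corner-vowel ordering here is the standard convention, stated as an assumption rather than taken from the paper.)

\[
\mathrm{VSA} \;=\; \frac{1}{2}\,\Bigl|\sum_{k=1}^{4}\bigl(F1_k\,F2_{k+1} - F1_{k+1}\,F2_k\bigr)\Bigr|,
\qquad (F1_5, F2_5) \equiv (F1_1, F2_1),
\]

where the vertices k = 1, ..., 4 are the corner vowels /i/, /ae/, /a/, /u/ taken in order around the quadrilateral.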


international conference on acoustics, speech, and signal processing | 2013

Selecting disorder-specific features for speech pathology fingerprinting

Visar Berisha; Steven Sandoval; Rene L. Utianski; Julie M. Liss; Andreas Spanias

The general aim of this work is to learn a unique statistical signature for the state of a particular speech pathology. We pose this as a speaker identification problem for dysarthric individuals. To that end, we propose a novel algorithm for feature selection that aims to minimize the effects of speaker-specific features (e.g., fundamental frequency) and maximize the effects of pathology-specific features (e.g., vocal tract distortions and speech rhythm). We derive a cost function for optimizing feature selection that simultaneously trades off between these two competing criteria. Furthermore, we develop an efficient algorithm that optimizes this cost function and test the algorithm on a set of 34 dysarthric and 13 healthy speakers. Results show that the proposed method yields a set of features related to the speech disorder rather than an individual's speaking style. When compared to other feature-selection algorithms, the proposed approach improves on a disorder-fingerprinting task by selecting features that are specific to the disorder.
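
As a toy stand-in for the trade-off (not the paper's derived cost function), one can score each feature by how informative it is about the disorder label minus how informative it is about speaker identity, using mutual-information estimates:

import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_features(X, y_disorder, y_speaker, n_select=5, lam=1.0):
    # Prefer features informative about the disorder but not about
    # speaker identity; lam trades off the two competing terms.
    rel_disorder = mutual_info_classif(X, y_disorder, random_state=0)
    rel_speaker = mutual_info_classif(X, y_speaker, random_state=0)
    score = rel_disorder - lam * rel_speaker
    return np.argsort(score)[::-1][:n_select]

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))               # mock features: 60 recordings
y_speaker = np.repeat(np.arange(10), 6)     # 10 speakers, 6 recordings each
y_disorder = (y_speaker < 7).astype(int)    # 7 dysarthric, 3 healthy
print(select_features(X, y_disorder, y_speaker))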


ieee automatic speech recognition and understanding workshop | 2015

Hilbert spectral analysis of vowels using intrinsic mode functions

Steven Sandoval; Phillip L. De Leon; Julie M. Liss

In recent work, we presented mathematical theory and algorithms for time-frequency analysis of non-stationary signals. In that work, we generalized the definition of the Hilbert spectrum by using a superposition of complex AM-FM components parameterized by the Instantaneous Amplitude (IA) and Instantaneous Frequency (IF). Using our Hilbert Spectral Analysis (HSA) approach, the IA and IF estimates can be far more accurate at revealing underlying signal structure than prior approaches to time-frequency analysis. In this paper, we have applied HSA to speech and compared it to both narrowband and wideband spectrograms. We demonstrate how the AM-FM components, assumed to be intrinsic mode functions, align well with the energy concentrations of the spectrograms and highlight fine structure present in the Hilbert spectrum. As an example, we show never-before-seen intra-glottal pulse phenomena that are not readily apparent in other analyses. Such fine-scale analyses may have application in speech-based medical diagnosis and automatic speech recognition (ASR) for pathological speakers.
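
The basic IA/IF demodulation step can be sketched with the analytic signal; the paper's full HSA first decomposes the signal into AM-FM components, a step this single-component example skips.

import numpy as np
from scipy.signal import hilbert

fs = 8000
t = np.arange(0, 1.0, 1 / fs)
# Synthetic AM-FM component: 100 Hz carrier, slow AM, vibrato-like FM.
x = (1 + 0.5 * np.cos(2 * np.pi * 3 * t)) * np.sin(
    2 * np.pi * 100 * t + 2 * np.sin(2 * np.pi * 5 * t))

z = hilbert(x)                             # analytic signal
ia = np.abs(z)                             # instantaneous amplitude (IA)
phase = np.unwrap(np.angle(z))
if_hz = np.diff(phase) * fs / (2 * np.pi)  # instantaneous frequency (IF)
print(ia.max(), if_hz.mean())              # IF averages near 100 Hz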


Journal of the Acoustical Society of America | 2013

Speech assist: An augmentative tool for practice in speech-language pathology

Rene L. Utianski; Steven Sandoval; Nicole Lehrer; Visar Berisha; Julie M. Liss

Dysarthria affects approximately 46 million people worldwide, with three million individuals residing in the US. Clinical intervention by speech-language pathologists (SLPs) in the United States is supplemented by high quality research, clinical expertise, and state of the art technology, supporting the overarching goal of improved communication. Unfortunately, many individuals do not have access to such care, leaving them with a persisting inability to communicate. Telemedicine, along with the growing use of mobile devices to augment clinical practice, provides the impetus for the development of remote, mobile applications to augment the work of SLPs. The proposed application will record speech samples and provide a variety of derived calculations, novel and traditional, to assess the integrity of speech production, including: vowel space area, assessment of an individual’s pathology fingerprint, and identification of which parameters of the intelligibility disorder are most disrupted (e.g., prosody, voc...


international conference on acoustics, speech, and signal processing | 2017

Advances in Empirical Mode Decomposition for computing Instantaneous Amplitudes and Instantaneous Frequencies

Steven Sandoval; Phillip L. De Leon

In this paper, we propose improvements to the Complete Ensemble Empirical Mode Decomposition (CEEMD) aimed at the resolution of closely spaced Intrinsic Mode Functions (IMFs), reproducible and consistent decompositions, reduction in estimation error, numerical stability, and faster decompositions through fewer ensemble trials. We focus on three areas to achieve these goals: 1) use of complementary masking signals applied at the IMF level, 2) use of narrowband tones instead of white noise for masking signals, and 3) ensuring a true IMF is obtained after ensemble averaging. We propose a numerically stable Instantaneous Frequency (IF) demodulation approach that, together with a previously reported Instantaneous Amplitude (IA) demodulation, allows estimation of the IA/IF parameters of the IMFs and hence a time-frequency representation. Using biomedical signal examples, we compare our results with CEEMD and Improved CEEMD (ICEEMD).
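
The complementary-masking idea can be sketched with the PyEMD package (an assumption; the authors' implementation differs): decompose s + m and s - m with a narrowband masking tone m, then average so the mask cancels.

import numpy as np
from PyEMD import EMD  # pip install EMD-signal

fs = 1000
t = np.arange(0, 1.0, 1 / fs)
# Two closely spaced tones that plain EMD tends to mix into one IMF.
s = np.sin(2 * np.pi * 60 * t) + np.sin(2 * np.pi * 75 * t)

# Complementary narrowband masking; the tone's frequency and amplitude
# are illustrative choices, not a tuning recipe from the paper.
m = 0.4 * np.sin(2 * np.pi * 200 * t)
imf_pos = EMD().emd(s + m)
imf_neg = EMD().emd(s - m)
n = min(len(imf_pos), len(imf_neg))
imfs = 0.5 * (imf_pos[:n] + imf_neg[:n])  # masks cancel in the average
print("recovered IMFs:", imfs.shape)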


Journal of the Acoustical Society of America | 2013

Feature divergence of pathological speech

Steven Sandoval; Rene L. Utianski; Visar Berisha; Julie M. Liss; Andreas Spanias

Many state-of-the-art speaker verification systems are implemented by modeling the probability distribution of a feature set using Gaussian mixture models. In these systems, a decision is made by comparing the likelihood of an observation under both a Gaussian mixture model corresponding to an individual and a universal background model, itself a Gaussian mixture model. In this study, we propose to use a similar framework to instead characterize the divergence of the feature-set distribution between healthy and pathological speech. We accomplish this by determining the difference between a universal background model trained on healthy speech and a model of an individual's pathological speech. There are several known methods to evaluate the difference between two probability distributions, one example being the Kullback-Leibler divergence. By building a universal background model using healthy speech, we hope to capture the expected distribution of our feature space. Then, by computing a difference between a dysarthric i...
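
Since the KL divergence between two Gaussian mixture models has no closed form, it is commonly estimated by Monte Carlo; the sketch below does exactly that, with scikit-learn mixtures standing in for the healthy UBM and the individual's model (all data here are synthetic stand-ins).

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(2000, 12))  # mock healthy features
patient = rng.normal(0.6, 1.3, size=(400, 12))   # mock pathological features

ubm = GaussianMixture(n_components=8, random_state=0).fit(healthy)
spk = GaussianMixture(n_components=8, random_state=0).fit(patient)

# Monte Carlo estimate of KL(spk || ubm) = E_spk[log spk(x) - log ubm(x)].
x, _ = spk.sample(5000)
kl = np.mean(spk.score_samples(x) - ubm.score_samples(x))
print(f"estimated KL divergence: {kl:.2f} nats")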

Collaboration


Dive into Steven Sandoval's collaboration.

Top Co-Authors

Julie M. Liss, Arizona State University
Visar Berisha, Arizona State University
Phillip L. De Leon, New Mexico State University
Laura E. Boucheron, New Mexico State University