Umit H. Yapanel
University of Colorado Boulder
Publications
Featured research published by Umit H. Yapanel.
Pediatrics | 2009
Frederick J. Zimmerman; Jill Gilkerson; Jeffrey A. Richards; Dimitri A. Christakis; Dongxin Xu; Sharmistha Gray; Umit H. Yapanel
OBJECTIVE: To test the independent association of adult language input, television viewing, and adult-child conversations with language acquisition among infants and toddlers. METHODS: Two hundred seventy-five families of children aged 2 to 48 months who were representative of the US census were enrolled in a cross-sectional study of the home language environment and child language development (phase 1). Of these, a representative sample of 71 families continued for a longitudinal assessment over 18 months (phase 2). In the cross-sectional sample, language development scores were regressed on adult word count, television viewing, and adult-child conversations, controlling for socioeconomic attributes. In the longitudinal sample, phase 2 language development scores were regressed on phase 1 language development, as well as phase 1 adult word count, television viewing, and adult-child conversations, controlling for socioeconomic attributes. RESULTS: In fully adjusted regressions, the effects of adult word count were significant when included alone but were partially mediated by adult-child conversations. Television viewing when included alone was significant and negative but was fully mediated by the inclusion of adult-child conversations. Adult-child conversations were significant when included alone and retained both significance and magnitude when adult word count and television exposure were included. CONCLUSIONS: Television exposure is not independently associated with child language development when adult-child conversations are controlled. Adult-child conversations are robustly associated with healthy language development. Parents should be encouraged not merely to provide language input to their children through reading or storytelling, but also to engage their children in two-sided conversations.
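As a rough illustration of the regression-and-mediation logic described in the abstract (not the study's actual data or model specification), the sketch below regresses a synthetic language score on adult word count and television hours, then adds adult-child conversational turns to show how a mediator can absorb the other effects. All variable names, distributions, and coefficients are assumptions made only for this example.

```python
# Illustrative mediation-style regression on synthetic data; not the study's dataset.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 275
adult_words = rng.normal(13000, 4000, n)          # adult word count per day (synthetic)
tv_hours = rng.gamma(2.0, 1.0, n)                 # audible television, hours/day (synthetic)
# Conversational turns are partly driven by adult words and reduced by TV.
turns = 0.02 * adult_words - 30 * tv_hours + rng.normal(0, 60, n)
lang_score = 0.5 * turns + rng.normal(0, 40, n)   # language development score (synthetic)

def fit(columns):
    X = sm.add_constant(np.column_stack(columns))
    return sm.OLS(lang_score, X).fit()

print(fit([adult_words]).params)                   # adult words alone
print(fit([tv_hours]).params)                      # TV alone
print(fit([adult_words, tv_hours, turns]).params)  # full model: turns act as the mediator
```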
Proceedings of the National Academy of Sciences of the United States of America | 2010
D. K. Oller; P. Niyogi; Sharmistha Gray; Jeffrey A. Richards; Jill Gilkerson; Dongxin Xu; Umit H. Yapanel; Steven F. Warren
For generations the study of vocal development and its role in language has been conducted laboriously, with human transcribers and analysts coding and taking measurements from small recorded samples. Our research illustrates a method to obtain measures of early speech development through automated analysis of massive quantities of day-long audio recordings collected naturalistically in children's homes. A primary goal is to provide insights into the development of infant control over infrastructural characteristics of speech through large-scale statistical analysis of strategically selected acoustic parameters. In pursuit of this goal we have discovered that the first automated approach we implemented is not only able to track children's development on acoustic parameters known to play key roles in speech, but also is able to differentiate vocalizations from typically developing children and children with autism or language delay. The method is totally automated, with no human intervention, allowing efficient sampling and analysis at unprecedented scales. The work shows the potential to fundamentally enhance research in vocal development and to add a fully objective measure to the battery used to detect speech-related disorders in early childhood. Thus, automated analysis should soon be able to contribute to screening and diagnosis procedures for early disorders, and more generally, the findings suggest fundamental methods for the study of language in natural environments.
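A minimal sketch of the kind of automated acoustic-parameter extraction this line of work relies on is shown below: framing a vocalization segment and summarizing a few simple spectral and temporal parameters per segment. The specific parameters, frame settings, and sampling rate are illustrative assumptions, not the exact parameter set used in the paper.

```python
# Hedged sketch: per-vocalization summary of a few simple acoustic parameters.
import numpy as np

def frame_signal(x, sr=16000, frame_ms=25, hop_ms=10):
    """Slice a mono signal into overlapping frames."""
    flen, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    n = 1 + max(0, (len(x) - flen) // hop)
    return np.stack([x[i * hop:i * hop + flen] for i in range(n)])

def vocalization_parameters(x, sr=16000):
    """Duration, mean log energy, spectral centroid, and sign-change rate per segment."""
    frames = frame_signal(x, sr)
    win = frames * np.hanning(frames.shape[1])
    spec = np.abs(np.fft.rfft(win, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frames.shape[1], 1.0 / sr)
    energy = spec.sum(axis=1)
    centroid = (spec * freqs).sum(axis=1) / np.maximum(energy, 1e-12)
    sign_changes = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return {
        "duration_s": len(x) / sr,
        "mean_log_energy": float(np.mean(np.log(energy + 1e-12))),
        "mean_spectral_centroid_hz": float(np.mean(centroid)),
        "mean_sign_change_rate": float(np.mean(sign_changes)),
    }

# Example on a synthetic 300 ms vowel-like segment.
t = np.arange(int(0.3 * 16000)) / 16000
print(vocalization_parameters(0.1 * np.sin(2 * np.pi * 220 * t)))
```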
JAMA Pediatrics | 2009
Dimitri A. Christakis; Jill Gilkerson; Jeffrey A. Richards; Frederick J. Zimmerman; Michelle M. Garrison; Dongxin Xu; Sharmistha Gray; Umit H. Yapanel
OBJECTIVE To test the hypothesis that audible television is associated with decreased parent and child interactions. DESIGN Prospective, population-based observational study. SETTING Community. PARTICIPANTS Three hundred twenty-nine 2- to 48-month-old children. MAIN EXPOSURES Audible television. Children wore a digital recorder on random days for up to 24 months. A software program incorporating automatic speech-identification technology processed the recorded file to analyze the sounds the children were exposed to and the sounds they made. Conditional linear regression was used to determine the association between audible television and the outcomes of interest. OUTCOME MEASURES Adult word counts, child vocalizations, and child conversational turns. RESULTS Each hour of audible television was associated with significant reductions in age-adjusted z scores for child vocalizations (linear regression coefficient, -0.26; 95% confidence interval [CI], -0.29 to -0.22), vocalization duration (linear regression coefficient, -0.24; 95% CI, -0.27 to -0.20), and conversational turns (linear regression coefficient, -0.22; 95% CI, -0.25 to -0.19). There were also significant reductions in adult female (linear regression coefficient, -636; 95% CI, -812 to -460) and adult male (linear regression coefficient, -134; 95% CI, -263 to -5) word count. CONCLUSIONS Audible television is associated with decreased exposure to discernible human adult speech and decreased child vocalizations. These results may explain the association between infant television exposure and delayed language development.
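The sketch below illustrates, on synthetic data, the two analysis ingredients named in the abstract: age-adjusted z scores for a child outcome and a within-child ("conditional") linear regression of that outcome on hours of audible television. The column names, age bins, and data-generating process are assumptions made only for the example.

```python
# Hedged sketch of age-adjusted z scores plus a within-child regression; synthetic data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
rows = []
for child in range(100):
    base = rng.normal(0, 1)
    for visit in range(4):
        age_mo = int(rng.integers(2, 49))
        tv = rng.gamma(2.0, 1.0)
        vocs = 800 + 30 * age_mo - 60 * tv + 100 * base + rng.normal(0, 80)
        rows.append((child, age_mo, tv, vocs))
df = pd.DataFrame(rows, columns=["child", "age_mo", "tv_hours", "vocalizations"])

# Age-adjusted z score: standardize the outcome within coarse age bins.
df["age_bin"] = pd.cut(df["age_mo"], bins=[0, 12, 24, 36, 48])
grp = df.groupby("age_bin", observed=True)["vocalizations"]
df["voc_z"] = (df["vocalizations"] - grp.transform("mean")) / grp.transform("std")

# Conditional (within-child) regression: de-mean outcome and exposure per child.
df["z_dm"] = df["voc_z"] - df.groupby("child")["voc_z"].transform("mean")
df["tv_dm"] = df["tv_hours"] - df.groupby("child")["tv_hours"].transform("mean")
print(sm.OLS(df["z_dm"], sm.add_constant(df["tv_dm"])).fit().params)
```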
Journal of Autism and Developmental Disorders | 2010
Steven F. Warren; Jill Gilkerson; Jeffrey A. Richards; D. Kimbrough Oller; Dongxin Xu; Umit H. Yapanel; Sharmistha Gray
The study compared the vocal production and language learning environments of 26 young children with autism spectrum disorder (ASD) to 78 typically developing children using measures derived from automated vocal analysis. A digital language processor and audio-processing algorithms measured the number of adult words spoken to the children and the number of vocalizations the children produced during 12-h recording periods in their natural environments. The results indicated significant differences between typically developing children and children with ASD in the characteristics of conversations, the number of conversational turns, and in child vocalizations that correlated with parent measures of various child characteristics. Automated measurement of the language learning environment of young children with ASD reveals important differences from the environments experienced by typically developing children.
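As a hedged illustration of how conversational turns might be counted from automatically labeled audio, the snippet below scans a list of diarized segments and counts adult-child alternations separated by a short pause. The label set and the 5-second pause threshold are assumptions for the example, not the LENA system's actual definitions.

```python
# Illustrative turn counting over diarized (label, start_s, end_s) segments.
def count_turns(segments, max_pause=5.0):
    """Count adult<->child alternations separated by at most max_pause seconds."""
    turns = 0
    prev_label, prev_end = None, None
    for label, start, end in sorted(segments, key=lambda s: s[1]):
        if prev_label is not None and label != prev_label and start - prev_end <= max_pause:
            if {label, prev_label} == {"ADULT", "CHILD"}:
                turns += 1
        prev_label, prev_end = label, end
    return turns

segments = [("ADULT", 0.0, 1.8), ("CHILD", 2.2, 3.0), ("ADULT", 3.4, 5.1),
            ("TV", 6.0, 30.0), ("CHILD", 40.0, 41.0)]
print(count_turns(segments))  # -> 2
```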
Speech Communication | 2008
Umit H. Yapanel; John H. L. Hansen
Acoustic feature extraction from speech constitutes a fundamental component of automatic speech recognition (ASR) systems. In this paper, we propose a novel feature extraction algorithm, perceptual-MVDR (PMVDR), which computes cepstral coefficients from the speech signal. This new feature representation is shown to better model the speech spectrum compared to traditional feature extraction approaches. Experimental results for small (40-word digits) to medium (5k-word dictation) size vocabulary tasks show varying degrees of consistent improvement across different experiments; however, the new front-end is most effective in noisy car environments. The PMVDR front-end uses the minimum variance distortionless response (MVDR) spectral estimator to represent the upper envelope of the speech signal. Unlike Mel frequency cepstral coefficients (MFCCs), the proposed front-end does not utilize a filterbank. The effectiveness of the PMVDR approach is demonstrated by comparing speech recognition accuracies with the traditional MFCC front-end and the recently proposed PMCC front-end in both noise-free and real adverse environments. For speech recognition in noisy car environments, a 40-word vocabulary task, the PMVDR front-end provides a 36% relative decrease in word error rate (WER) over the MFCC front-end. Under simulated speaker stress conditions, a 35-word vocabulary task, the PMVDR front-end yields a 27% relative decrease in the WER. For a noise-free dictation task with a 5k-word vocabulary, a further 8% relative reduction in the WER is reported. Finally, a novel analysis technique is proposed to quantify the noise robustness of an acoustic front-end. This analysis is conducted for the acoustic front-ends considered in the paper and results are presented.
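The sketch below outlines, under stated assumptions, a PMVDR-style front end for a single frame: the FFT power spectrum is warped directly onto a Mel-like perceptual axis (no filterbank), "perceptual" autocorrelations are obtained by inverse FFT, linear-prediction coefficients are computed with Levinson-Durbin, the MVDR envelope is formed from those coefficients, and log plus DCT yields cepstra. The model order, FFT size, warping resolution, and normalizations are illustrative choices rather than the paper's exact recipe.

```python
# Hedged, simplified PMVDR-style cepstra for one frame; not the published recipe.
import numpy as np
from scipy.fft import dct

def levinson(r, order):
    """Levinson-Durbin: autocorrelation r[0..order] -> LP coefficients a (a[0]=1), error."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

def mvdr_envelope(a, err, n_bins):
    """MVDR spectral envelope from LP coefficients via the standard mu-parameter form."""
    order = len(a) - 1
    mu = np.zeros(order + 1)
    for k in range(order + 1):
        i = np.arange(order - k + 1)
        mu[k] = np.sum((order + 1 - k - 2 * i) * a[i] * a[i + k]) / err
    omega = np.linspace(0, np.pi, n_bins)
    denom = mu[0] + 2.0 * sum(mu[k] * np.cos(k * omega) for k in range(1, order + 1))
    return 1.0 / np.maximum(denom, 1e-12)

def pmvdr_cepstra(frame, sr=16000, lp_order=24, n_warp=257, n_ceps=13):
    """Warp the power spectrum onto a Mel axis, then LP -> MVDR envelope -> log -> DCT."""
    power = np.abs(np.fft.rfft(frame, n=512)) ** 2
    freqs = np.linspace(0, sr / 2, len(power))
    mel = 2595.0 * np.log10(1.0 + freqs / 700.0)       # direct perceptual warp, no filterbank
    warped = np.interp(np.linspace(0, mel[-1], n_warp), mel, power)
    r = np.fft.irfft(warped)[:lp_order + 1]            # "perceptual" autocorrelations
    a, err = levinson(r, lp_order)
    env = mvdr_envelope(a, err, n_warp)
    return dct(np.log(env + 1e-12), type=2, norm="ortho")[:n_ceps]

# Example: a synthetic vowel-like frame with a little noise.
rng = np.random.default_rng(0)
t = np.arange(400) / 16000.0
frame = np.hanning(400) * (np.sin(2 * np.pi * 150 * t) + 0.01 * rng.normal(size=400))
print(np.round(pmvdr_cepstra(frame)[:5], 3))
```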
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Satya Dharanipragada; Umit H. Yapanel; Bhaskar D. Rao
This paper describes a robust feature extraction technique for continuous speech recognition. Central to the technique is the minimum variance distortionless response (MVDR) method of spectrum estimation. We consider incorporating perceptual information in two ways: 1) after the MVDR power spectrum is computed and 2) directly during the MVDR spectrum estimation. We show that incorporating perceptual information directly into the spectrum estimation improves both robustness and computational efficiency significantly. We analyze the class separability and speaker variability properties of the features using a Fisher linear discriminant measure and show that these features provide better class separability and better suppression of speaker-dependent information than the widely used mel frequency cepstral coefficient (MFCC) features. We evaluate the technique on four different tasks: an in-car speech recognition task, the Aurora-2 matched task, the Wall Street Journal (WSJ) task, and the Switchboard task. The new feature extraction technique gives lower word error rates than the MFCC and perceptual linear prediction (PLP) feature extraction techniques in most cases. Statistical significance tests reveal that the improvement is most significant in high noise conditions. The technique thus provides improved robustness to noise without sacrificing performance in clean conditions.
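As a small, hedged illustration of the class-separability analysis mentioned above, the sketch below computes a Fisher discriminant ratio, trace(Sw^-1 Sb), over labeled feature vectors so that two front ends can be compared on the same data. The classes, feature dimensions, and data are synthetic; the paper's exact measure and phone classes are not reproduced here.

```python
# Fisher discriminant ratio as a class-separability score; synthetic features.
import numpy as np

def fisher_ratio(X, y):
    """trace(Sw^{-1} Sb) for feature matrix X (n, d) and integer class labels y."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    return float(np.trace(np.linalg.solve(Sw, Sb)))

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 200)
# Two toy "front ends": the second separates the two classes more cleanly.
feats_a = rng.normal(0, 1, (400, 13)) + 0.3 * y[:, None]
feats_b = rng.normal(0, 1, (400, 13)) + 1.0 * y[:, None]
print(fisher_ratio(feats_a, y), fisher_ratio(feats_b, y))
```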
International Conference of the IEEE Engineering in Medicine and Biology Society | 2009
Dongxin Xu; Jill Gilkerson; Jeffrey A. Richards; Umit H. Yapanel; Sharmi Gray
Early identification is crucial for young children with autism to access early intervention. Existing screens require a parent-report questionnaire and/or direct observation by a trained practitioner. Although an automatic tool would benefit parents, clinicians, and children, there is no automatic screening tool in clinical use. This study reports a fully automatic mechanism for autism detection/screening in young children. It is a direct extension of the LENA (Language ENvironment Analysis) system, which uses speech signal processing technology to analyze and monitor a child's natural language environment and the vocalizations/speech of the child. Child vocalization composition was found to contain rich discriminant information for autism detection. By applying pattern recognition and machine learning approaches to child vocalization composition data, accuracy rates of 85% to 90% in cross-validation tests for autism detection were achieved at the equal-error-rate (EER) point on a data set with 34 children with autism, 30 language-delayed children, and 76 typically developing children. Because the procedure is simple and fully automatic, it is believed that this new tool can play a significant role in childhood autism screening, especially for population-based or universal screening.
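The snippet below sketches the general shape of such a screening pipeline under explicit assumptions: per-child vocalization-composition features (here random Dirichlet proportions over hypothetical categories), a simple classifier evaluated with cross-validation, and an equal-error-rate readout. None of the feature categories, the classifier, or the data correspond to the LENA system's actual internals.

```python
# Hedged sketch: composition features -> cross-validated classifier -> approximate EER.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
n_asd, n_td = 34, 76
# 12 hypothetical vocalization categories; compositions drawn from Dirichlet distributions.
X_asd = rng.dirichlet(np.r_[np.full(6, 2.0), np.full(6, 5.0)], n_asd)
X_td = rng.dirichlet(np.r_[np.full(6, 5.0), np.full(6, 2.0)], n_td)
X = np.vstack([X_asd, X_td])
y = np.r_[np.ones(n_asd), np.zeros(n_td)]

clf = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]

fpr, tpr, _ = roc_curve(y, scores)
eer_idx = np.argmin(np.abs(fpr - (1.0 - tpr)))
print(f"approx. EER: {(fpr[eer_idx] + 1 - tpr[eer_idx]) / 2:.3f}")
```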
Archive | 2005
John H. L. Hansen; Xianxian Zhang; Murat Akbacak; Umit H. Yapanel; Bryan L. Pellom; Wayne H. Ward; Pongtep Angkititrakul
In this chapter, we present our recent advances in the formulation and development of an in-vehicle hands-free route navigation system. The system comprises a multi-microphone array processing front-end, an environmental sniffer (for noise analysis), a robust speech recognition system, and a dialog manager with information servers. We also present our recently completed speech corpus for in-vehicle interactive speech systems for route planning and navigation. The corpus consists of five domains: digit strings, route navigation expressions, street and location sentences, phonetically balanced sentences, and a route navigation dialog in a human Wizard-of-Oz-like scenario. Speech from a total of 500 speakers was collected across the United States of America over a six-month period from April to September 2001. While previous attempts at in-vehicle speech systems have generally focused on isolated command words to set radio frequencies, temperature controls, etc., the CU-Move system is focused on natural conversational interaction between the user and the in-vehicle system. After presenting our proposed in-vehicle speech system, we consider advances in multi-channel array processing, environmental noise sniffing and tracking, new and more robust acoustic front-end representations, and built-in speaker normalization for robust ASR, as well as our back-end dialog navigation information retrieval subsystem connected to the WWW. Results are presented in each subsection, with a discussion at the end of the chapter.
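The multi-microphone array front end is described above only at a high level; purely as an illustration of the simplest array-processing idea, the sketch below implements a delay-and-sum beamformer with known integer sample delays and shows the noise reduction obtained by averaging aligned channels. The CU-Move system's actual array algorithms are more sophisticated than this toy example, and the signals and delays here are synthetic assumptions.

```python
# Toy delay-and-sum beamformer on synthetic multi-microphone data.
import numpy as np

def delay_and_sum(channels, sample_delays):
    """Align each channel by its known integer sample delay and average them."""
    n = min(len(ch) for ch in channels)
    aligned = []
    for ch, d in zip(channels, sample_delays):
        shifted = np.roll(ch[:n], -d)       # advance the channel by d samples
        if d > 0:
            shifted[-d:] = 0.0              # zero the samples that wrapped around
        aligned.append(shifted)
    return np.mean(aligned, axis=0)

rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
target = np.sin(2 * np.pi * 300 * t)        # stand-in for the desired speech signal
delays = [0, 3, 6, 9]                       # integer sample delays, assumed known from geometry
mics = [np.roll(target, d) + 0.5 * rng.normal(size=sr) for d in delays]

out = delay_and_sum(mics, delays)
noise_in = np.mean([np.var(m - np.roll(target, d)) for m, d in zip(mics, delays)])
print("single-mic noise variance ~", round(float(noise_in), 3))
print("beamformed residual variance ~", round(float(np.var(out - target)), 3))
```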
International Conference on Acoustics, Speech, and Signal Processing | 2003
Umit H. Yapanel; Satya Dharanipragada
This paper describes a robust feature extraction technique for continuous speech recognition. Central to the technique is the minimum variance distortionless response (MVDR) method of spectrum estimation. We incorporate perceptual information directly into the spectrum estimation. This provides improved robustness and computational efficiency compared with the previously proposed MVDR-MFCC technique. On an in-car speech recognition task, this method, which we refer to as PMCC, yields a 15% relative improvement in WER and requires approximately four times less computation than the MVDR-MFCC technique. On the same task, PMCC yields a 20% relative improvement over the MFCC front-end and an 11% relative improvement over the PLP front-end. Similar improvements are observed on the Aurora 2 database.
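Since this abstract (and the PMVDR abstract above) quotes several relative improvements, a tiny worked example of that arithmetic may help; the baseline and improved WER values below are made up purely to illustrate how a relative reduction is computed and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction: the fraction of the baseline error rate that was removed."""
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical numbers: a baseline WER of 10.0% improved to 8.5% is a 15% relative gain.
print(f"{relative_wer_reduction(10.0, 8.5):.0%}")  # -> 15%
```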
EURASIP Journal on Audio, Speech, and Music Processing | 2008
Umit H. Yapanel; John H. L. Hansen
A proven method for improving automatic speech recognition (ASR) in the presence of speaker differences is to perform acoustic feature speaker normalization. More effective speaker normalization methods that require only limited computing resources are needed for real-time performance. The most popular speaker normalization technique is vocal-tract length normalization (VTLN), despite the fact that it is computationally expensive. In this study, we propose a novel online VTLN algorithm entitled built-in speaker normalization (BISN), where normalization is performed on-the-fly within a newly proposed PMVDR acoustic front end. The novel aspect of the algorithm is that conventional front-end processing with PMVDR and VTLN requires two separate warping phases, whereas the proposed BISN method uses only a single speaker-dependent warp to achieve both the PMVDR perceptual warp and the VTLN warp simultaneously. This improved integration unifies the nonlinear warping performed in the front end and reduces computational requirements, thereby offering advantages for real-time ASR systems. Evaluations are performed on (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER) by 24%, and (ii) a diverse noisy speech task (SPINE 2), where the relative WER improvement was 9%, both relative to the baseline speaker normalization method.
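To make the "single warp" idea concrete, the sketch below folds a speaker-dependent VTLN factor alpha into one composite mapping of the frequency axis together with a Mel-style perceptual warp, instead of applying the two warps in separate passes; in practice alpha would be chosen per speaker, for example by maximizing acoustic likelihood over a small grid. The linear warping family and the numbers here are illustrative assumptions, not the exact BISN formulation.

```python
# Hedged sketch: composing a VTLN factor with a perceptual warp in a single pass.
import numpy as np

def mel(f_hz):
    """Mel-style perceptual frequency mapping."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def composite_warp(f_hz, alpha, f_nyq=8000.0):
    """Single-pass warp: scale frequencies by the speaker-dependent VTLN factor alpha,
    clip to the Nyquist range, then map onto the Mel axis."""
    return mel(np.clip(alpha * f_hz, 0.0, f_nyq))

freqs = np.linspace(0, 8000, 5)
for alpha in (0.88, 1.00, 1.12):   # a typical VTLN search range is roughly 0.88-1.12
    print(alpha, np.round(composite_warp(freqs, alpha), 1))
```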