Daniel Bone
University of Southern California
Publications
Featured research published by Daniel Bone.
IEEE Transactions on Affective Computing | 2014
Daniel Bone; Chi-Chun Lee; Shrikanth Narayanan
Studies in classifying affect from vocal cues have produced exceptional within-corpus results, especially for arousal (activation or stress); yet cross-corpora affect recognition has only recently garnered attention. An essential requirement of many behavioral studies is affect scoring that generalizes across different social contexts and data conditions. We present a robust, unsupervised (rule-based) method for providing a scale-continuous, bounded arousal rating operating on the vocal signal. The method incorporates just three knowledge-inspired features chosen based on empirical and theoretical evidence. It constructs a speaker's baseline model for each feature separately, and then computes single-feature arousal scores. Lastly, it advantageously fuses the single-feature arousal scores into a final rating without knowledge of the true affect. The baseline data is preferably labeled as neutral, but some initial evidence is provided to suggest that no labeled data is required in certain cases. The proposed method is compared to a state-of-the-art supervised technique which employs a high-dimensional feature set. The proposed framework achieves highly competitive performance with additional benefits. The measure is interpretable, scale-continuous as opposed to discrete, and can operate without any affective labeling. An accompanying Matlab tool is made available with the paper.
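A minimal sketch of the kind of rule-based rating described above, written in Python rather than the paper's accompanying Matlab tool: each feature is scored against a per-speaker (preferably neutral) baseline and the single-feature scores are fused into one bounded rating. The feature names, the sigmoid bounding, and the unweighted-average fusion are illustrative assumptions, not the paper's exact rules.

```python
import numpy as np

def arousal_rating(features, baseline):
    """Bounded arousal score for one utterance against a speaker baseline."""
    scores = []
    for name, value in features.items():
        ref = np.asarray(baseline[name], dtype=float)
        mu, sigma = ref.mean(), ref.std() + 1e-8   # speaker baseline model
        z = (value - mu) / sigma                   # deviation from baseline
        scores.append(1.0 / (1.0 + np.exp(-z)))    # bound score to (0, 1)
    return float(np.mean(scores))                  # unweighted fusion

# Hypothetical knowledge-inspired features (names and values are illustrative only).
utterance = {"median_f0": 210.0, "intensity_db": 68.0, "hf_energy_ratio": 0.42}
speaker_baseline = {"median_f0": [180.0, 175.0, 185.0],
                    "intensity_db": [62.0, 63.0, 61.0],
                    "hf_energy_ratio": [0.30, 0.28, 0.33]}
print(arousal_rating(utterance, speaker_baseline))
```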
Journal of Speech Language and Hearing Research | 2014
Daniel Bone; Chi-Chun Lee; Matthew P. Black; Marian E. Williams; Sungbok Lee; Pat Levitt; Shrikanth Narayanan
PURPOSE The purpose of this study was to examine relationships between prosodic speech cues and autism spectrum disorder (ASD) severity, hypothesizing a mutually interactive relationship between the speech characteristics of the psychologist and the child. The authors objectively quantified acoustic-prosodic cues of the psychologist and of the child with ASD during spontaneous interaction, establishing a methodology for future large-sample analysis. METHOD Speech acoustic-prosodic features were semiautomatically derived from segments of semistructured interviews (Autism Diagnostic Observation Schedule, ADOS; Lord, Rutter, DiLavore, & Risi, 1999; Lord et al., 2012) with 28 children who had previously been diagnosed with ASD. Prosody was quantified in terms of intonation, volume, rate, and voice quality. Research hypotheses were tested via correlation as well as hierarchical and predictive regression between ADOS severity and prosodic cues. RESULTS Automatically extracted speech features demonstrated prosodic characteristics of dyadic interactions. As rated ASD severity increased, both the psychologist and the child demonstrated effects for turn-end pitch slope, and both spoke with atypical voice quality. The psychologist's acoustic cues predicted the child's symptom severity better than did the child's acoustic cues. CONCLUSION The psychologist, acting as evaluator and interlocutor, was shown to adjust his or her behavior in predictable ways based on the child's social-communicative impairments. The results support future study of speech prosody of both interaction partners during spontaneous conversation, while using automatic computational methods that allow for scalable analysis on much larger corpora.
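The core analysis pattern here, correlating an automatically extracted prosodic cue with ADOS severity, can be shown in a few lines. The sketch below uses invented stand-in numbers purely to illustrate the procedure; they are not data or results from the study.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-child measurements: psychologist turn-end pitch slope
# (arbitrary units) paired with ADOS-rated symptom severity. Values are
# fabricated stand-ins to show the analysis pattern only.
psych_pitch_slope = np.array([-1.2, -0.8, -1.5, -0.3, -2.0, -0.9])
ados_severity     = np.array([ 6.0,  4.0,  8.0,  3.0,  9.0,  5.0])

r, p = pearsonr(psych_pitch_slope, ados_severity)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```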
Journal of Autism and Developmental Disorders | 2015
Daniel Bone; Matthew S. Goodwin; Matthew P. Black; Chi-Chun Lee; Kartik Audhkhasi; Shrikanth Narayanan
Machine learning has immense potential to enhance diagnostic and intervention research in the behavioral sciences, and may be especially useful in investigations involving the highly prevalent and heterogeneous syndrome of autism spectrum disorder. However, use of machine learning in the absence of clinical domain expertise can be tenuous and lead to misinformed conclusions. To illustrate this concern, the current paper critically evaluates and attempts to reproduce results from two studies (Wall et al. in Transl Psychiatry 2(4):e100, 2012a; PloS One 7(8), 2012b) that claim to drastically reduce time to diagnose autism using machine learning. Our failure to generate comparable findings to those reported by Wall and colleagues using larger and more balanced data underscores several conceptual and methodological problems associated with these studies. We conclude with proposed best-practices when using machine learning in autism research, and highlight some especially promising areas for collaborative work at the intersection of computational and behavioral science.
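One of the methodological concerns raised above, drawing strong conclusions from small or imbalanced samples, is commonly mitigated with stratified cross-validation and a class-balanced metric. The sketch below is a generic illustration with synthetic stand-in data, not a reproduction of either the criticized studies or the authors' reanalysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data: rows are children, columns are instrument item scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = rng.integers(0, 2, size=200)   # 1 = ASD, 0 = non-ASD (toy labels)

clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="balanced_accuracy")
print(scores.mean())   # chance-corrected view of performance on balanced classes
```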
international conference on acoustics, speech, and signal processing | 2012
Ming Li; Angeliki Metallinou; Daniel Bone; Shrikanth Narayanan
This paper presents an automatic speaker state recognition approach which models the factor vectors in the latent factor analysis framework improving upon the Gaussian Mixture Model (GMM) baseline performance. We investigate both intoxicated and affective speaker states. We consider the affective speech signal as the original normal average speech signal being corrupted by the affective channel effects. Rather than reducing the channel variability to enhance the robustness as in the speaker verification task, we directly model the speaker state on the channel factors under the factor analysis framework. In this work, the speaker state factor vectors are extracted and modeled by the latent factor analysis approach in the GMM modeling framework and support vector machine classification method. Experimental results show that the proposed speaker state factor vector modeling system achieved 5.34% and 1.49% unweighted accuracy improvement over the GMM baseline on the intoxicated speech detection task (Alcohol Language Corpus) and the emotion recognition task (IEMOCAP database), respectively.
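A rough sketch of this pipeline in spirit: a GMM universal background model, per-utterance supervectors, a low-dimensional latent "state factor" projection, and an SVM classifier. Using scikit-learn's FactorAnalysis on posterior-weighted mean supervectors is an assumption made for illustration and is not the paper's exact latent factor analysis formulation; all data below are synthetic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import FactorAnalysis
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Synthetic "utterances": variable-length sequences of 13-dim frame features.
utterances = [rng.normal(size=(rng.integers(80, 120), 13)) for _ in range(40)]
labels = rng.integers(0, 2, size=40)            # e.g. sober vs. intoxicated (toy)

ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=1)
ubm.fit(np.vstack(utterances))                  # universal background model

def supervector(frames):
    post = ubm.predict_proba(frames)            # frame-level component posteriors
    counts = post.sum(axis=0)[:, None] + 1e-8
    means = post.T @ frames / counts            # posterior-weighted component means
    return means.ravel()

X = np.array([supervector(u) for u in utterances])
Z = FactorAnalysis(n_components=5, random_state=1).fit_transform(X)  # latent factors
clf = SVC(kernel="linear").fit(Z, labels)
print(clf.score(Z, labels))
```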
Computer Speech & Language | 2014
Daniel Bone; Ming Li; Matthew P. Black; Shrikanth Narayanan
Segmental and suprasegmental speech signal modulations offer information about paralinguistic content such as affect, age and gender, pathology, and speaker state. Speaker state encompasses medium-term, temporary physiological phenomena influenced by internal or external biochemical actions (e.g., sleepiness, alcohol intoxication). Perceptual and computational research indicates that detecting speaker state from speech is a challenging task. In this paper, we present a system constructed with multiple representations of prosodic and spectral features that provided the best result at the Intoxication Subchallenge of Interspeech 2011 on the Alcohol Language Corpus. We discuss the details of each classifier and show that fusion improves performance. We additionally address the question of how best to construct a speaker state detection system in terms of robust and practical marginalization of associated variability such as through modeling speakers, utterance type, gender, and utterance length. As is the case in human perception, speaker normalization provides significant improvements to our system. We show that a held-out set of baseline (sober) data can be used to achieve comparable gains to other speaker normalization techniques. Our fused frame-level statistic-functional systems, fused GMM systems, and final combined system achieve unweighted average recalls (UARs) of 69.7%, 65.1%, and 68.8%, respectively, on the test set. More consistent numbers compared to development set results occur with matched-prompt training, where the UARs are 70.4%, 66.2%, and 71.4%, respectively. The combined system improves over the Challenge baseline by 5.5% absolute (8.4% relative), also improving upon our previously best result.
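Two ingredients discussed above, normalizing a speaker's features against held-out sober (baseline) data and scoring with unweighted average recall (UAR), are easy to sketch. The function name and the toy values below are illustrative only, not the system's actual normalization or results.

```python
import numpy as np
from sklearn.metrics import recall_score

def baseline_znorm(features, sober_features):
    """Z-normalize a speaker's features using statistics of held-out sober data."""
    mu = sober_features.mean(axis=0)
    sigma = sober_features.std(axis=0) + 1e-8
    return (features - mu) / sigma

sober = np.array([[200.0, 60.0], [190.0, 61.0], [195.0, 59.0]])  # toy baseline
test = np.array([[230.0, 66.0]])                                  # toy utterance
print(baseline_znorm(test, sober))

# UAR is the mean of per-class recalls (macro-averaged recall), the metric
# reported for the challenge systems above; labels here are toy values.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
print(recall_score(y_true, y_pred, average="macro"))
```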
international conference on acoustics, speech, and signal processing | 2013
Theodora Chaspari; Daniel Bone; James Gibson; Chi-Chun Lee; Shrikanth Narayanan
Signal-derived measures can provide effective ways towards quantifying human behavior. Verbal Response Latencies (VRLs) of children with Autism Spectrum Disorders (ASD) during conversational interactions are able to convey valuable information about their cognitive and social skills. Motivated by the inherent gap between the external behavior and inner affective state of children with ASD, we study their VRLs in relation to their explicit but also implicit behavioral cues. Explicit cues include the children's language use, while implicit cues are based on physiological signals. Using these cues, we perform classification and regression tasks to predict the duration type (short/long) and value of VRLs of children with ASD while they interacted with an Embodied Conversational Agent (ECA) and their parents. Since parents are active participants in these triadic interactions, we also take into account their linguistic and physiological behaviors. Our results suggest an association between VRLs and these externalized and internalized signal information streams, providing complementary views of the same problem.
international conference on acoustics, speech, and signal processing | 2016
Rahul Gupta; Theodora Chaspari; Jangwon Kim; Naveen Kumar; Daniel Bone; Shrikanth Narayanan
The study of speech pathology involves evaluation and treatment of speech production related disorders affecting phonation, fluency, intonation and aeromechanical components of respiration. Recently, speech pathology has garnered special interest amongst machine learning and signal processing (ML-SP) scientists. This growth in interest is led by advances in novel data collection technology, data science, speech processing and computational modeling. These in turn have enabled scientists to better understand both the causes and effects of pathological speech conditions. In this paper, we review the application of machine learning and signal processing techniques to speech pathology and specifically focus on three different aspects. First, we list challenges such as controlling subjectivity in pathological speech assessments and patient variability in the application of ML-SP tools to the domain. Second, we discuss feature design methods and machine learning algorithms using a combination of domain knowledge and data driven methods. Finally, we present some case studies related to analysis of pathological speech and discuss their design.
conference of the international speech communication association | 2016
Manoj Kumar; Rahul Gupta; Daniel Bone; Nikolaos Malandrakis; Somer L. Bishop; Shrikanth Narayanan
Lexical planning is an important part of communication and is reflective of a speaker’s internal state that includes aspects of affect, mood, as well as mental health. Within the study of developmental disorders such as autism spectrum disorder (ASD), language acquisition and language use have been studied to assess disorder severity and expressive capability as well as to support diagnosis. In this paper, we perform a language analysis of children focusing on word usage, social and cognitive linguistic word counts, and a few recently proposed psycholinguistic norms. We use data from conversational samples of verbally fluent children obtained during Autism Diagnostic Observation Schedule (ADOS) sessions. We extract the aforementioned lexical cues from transcripts of session recordings and demonstrate their role in differentiating children diagnosed with Autism Spectrum Disorder from the rest. Further, we perform a correlation analysis between the lexical norms and ASD symptom severity. The analysis reveals an increased affinity by the interlocutor towards use of words with greater feminine association and negative valence.
conference of the international speech communication association | 2016
Daniel Bone; Somer L. Bishop; Rahul Gupta; Sungbok Lee; Shrikanth Narayanan
Atypical speech prosody is a hallmark feature of autism spectrum disorder (ASD) that presents across the lifespan, but is difficult to reliably characterize qualitatively. Given the great heterogeneity of symptoms in ASD, an acoustic-based objective measure would be vital for clinical assessment and interventions. In this study, we investigate speech features in child-psychologist conversational samples, including: segmental and suprasegmental pitch dynamics, speech rate, coordination of prosodic attributes, and turn-taking. Data consist of 95 children with ASD as well as 81 controls with non-ASD developmental disorders. We demonstrate significant predictive performance using these features as well as interpret feature correlations of both interlocutors. The most robust finding is that segmental and suprasegmental prosodic variability increases for both participants in interactions with children having higher ASD severity. Recommendations for future research towards a fully automatic quantitative measure of speech prosody in neurodevelopmental disorders are discussed.
Computer Speech & Language | 2016
Rahul Gupta; Daniel Bone; Sungbok Lee; Shrikanth Narayanan
Child engagement is defined as the interaction of a child with his/her environment in a contextually appropriate manner. Engagement behavior in children is linked to socio-emotional and cognitive state assessment with enhanced engagement identified with improved skills. A vast majority of studies however rely solely, and often implicitly, on subjective perceptual measures of engagement. Access to automatic quantification could assist researchers/clinicians to objectively interpret engagement with respect to a target behavior or condition, and furthermore inform mechanisms for improving engagement in various settings. In this paper, we present an engagement prediction system based exclusively on vocal cues observed during structured interaction between a child and a psychologist involving several tasks. Specifically, we derive prosodic cues that capture engagement levels across the various tasks. Our experiments suggest that a child's engagement is reflected not only in the vocalizations, but also in the speech of the interacting psychologist. Moreover, we show that prosodic cues are informative of the engagement phenomena not only as characterized over the entire task (i.e., global cues), but also in short term patterns (i.e., local cues). We perform a classification experiment assigning the engagement of a child into three discrete levels achieving an unweighted average recall of 55.8% (chance is 33.3%). While the systems using global cues and local level cues are each statistically significant in predicting engagement, we obtain the best results after fusing these two components. We perform further analysis of the cues at local and global levels to achieve insights linking specific prosodic patterns to the engagement phenomenon. We observe that while the performance of our model varies with task setting and interacting psychologist, there exist universal prosodic patterns reflective of engagement.
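A hypothetical sketch of the fusion idea described above: one model trained on global (task-level) prosodic cues, one on local (short-term) cues, with their class posteriors averaged for three-level engagement prediction. The logistic-regression models, the posterior-averaging fusion, and the synthetic features are assumptions for illustration, not the paper's exact system.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X_global = rng.normal(size=(150, 6))    # task-level prosodic statistics (synthetic)
X_local = rng.normal(size=(150, 10))    # short-term prosodic pattern features (synthetic)
y = rng.integers(0, 3, size=150)        # low / medium / high engagement (toy labels)

m_global = LogisticRegression(max_iter=1000).fit(X_global, y)
m_local = LogisticRegression(max_iter=1000).fit(X_local, y)

# Late fusion: average the two models' class posteriors, then pick the best class.
proba = (m_global.predict_proba(X_global) + m_local.predict_proba(X_local)) / 2
y_hat = proba.argmax(axis=1)

# Unweighted average recall (UAR): mean of per-class recalls.
uar = np.mean([np.mean(y_hat[y == c] == c) for c in np.unique(y)])
print(f"UAR = {uar:.3f}")
```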