Classification of Manifest Huntington Disease using Vowel Distortion Measures
Amrit Romana, John Bandon, Noelle Carlozzi, Angela Roberts, Emily Mower Provost
Computer Science and Engineering, University of Michigan, Ann Arbor, Michigan, USA
Physical Medicine & Rehabilitation, University of Michigan, Ann Arbor, Michigan, USA
Communication Sciences and Disorders, Northwestern University, Evanston, Illinois, USA
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract
Huntington disease (HD) is a fatal autosomal dominant neurocognitive disorder that causes cognitive disturbances, neuropsychiatric symptoms, and impaired motor abilities (e.g., gait, speech, voice). Due to its progressive nature, HD treatment requires ongoing clinical monitoring of symptoms. Individuals with the gene mutation which causes HD may exhibit a range of speech symptoms as they progress from premanifest to manifest HD. Differentiating between premanifest and manifest HD is an important yet understudied problem, as this distinction marks the need for increased treatment. Speech-based passive monitoring has the potential to augment clinical assessments by continuously tracking manifestation symptoms. In this work we present the first demonstration of how changes in connected speech can be measured to differentiate between premanifest and manifest HD. To do so, we focus on a key speech symptom of HD: vowel distortion. We introduce a set of vowel features which we extract from connected speech. We show that our vowel features can differentiate between premanifest and manifest HD with 87% accuracy.
Index Terms: Huntington disease, disordered speech, speech feature extraction, vowel distortion
1. Introduction
Huntington disease (HD) is a fatal autosomal dominant neurocognitive disorder that causes cognitive disturbances, neuropsychiatric symptoms, and impaired motor abilities (e.g., gait, speech, voice) [1–4]. Individuals who have a family history of HD can undergo a gene test to learn if they carry the gene mutation that causes HD (i.e., are gene-positive). Individuals who are gene-positive will develop clinically significant symptoms of HD, resulting in an HD diagnosis, typically in their mid-40s [5]. These individuals are considered premanifest before the onset of these symptoms, and manifest after. No cure exists, but timely diagnosis of HD (i.e., manifestation) coupled with treatment allows individuals to manage their symptoms.

At-home passive symptom monitoring captures patient health as it relates to real-world functioning [6]. Providing clinicians with this information can allow for a more timely diagnosis of HD and a better understanding of its progression for treatment planning. Disordered speech is one symptom of HD, and previous work has demonstrated that changes in speech occur before an HD diagnosis and become more noticeable as HD progresses [7–10]. This suggests the potential of passively tracking speech symptoms to better understand HD progression.

Vowel distortion is one speech symptom of HD [11–13], and tracking this symptom may augment passive monitoring. However, methods of automatically quantifying vowel distortion from connected speech have not been extensively explored. Kaploun et al. extracted jitter and shimmer from a sustained vowel task to characterize vowel distortion, and they demonstrated its prevalence as an HD symptom [7]. Works differentiating between healthy and disordered speech (a range of conditions in the Kay Elemetrics Disordered Voice Dataset [14]) have extracted measures of system stability from sustained vowel tasks, suggesting the potential of stability measures for capturing vowel distortion [14–17].
However, measures extracted from sustained vowel tasks may not have the same information about speech or voice disorders when extracted from vowels in connected speech. Vowels in connected speech differ because they are 1) modified and often nonstationary (i.e., changing mean, variance, and frequency properties) due to coarticulation and 2) shorter, which may pose problems for distortion measures that rely on lengthy signals. Thus, to incorporate tracking of vowel distortion into passive speech monitoring, we must assess how these measures relate to HD when extracted from connected speech. In this work we analyze read speech, which is one type of connected speech and includes short vowel samples that are modified due to coarticulation.

The novelty of this work is a new set of features which account for the characteristics of connected speech and reliably measure vowel distortion as it relates to HD. We present the first system to classify premanifest versus manifest HD using features from connected speech, doing so with 87% accuracy.
2. Related work
Previous works have demonstrated the potential of passively monitoring speech to assist in managing neurocognitive disorders such as Parkinson's [18, 19] and Alzheimer's [20–22]. Individuals with HD exhibit similar speech symptoms, suggesting the potential of monitoring speech to aid in managing HD.

The majority of work in classifying HD stages using speech has not studied how to differentiate between premanifest and manifest HD, but has instead differentiated between healthy controls and individuals who are gene-positive [7, 10]. Kaploun et al. used speaking rate from a reading passage and jitter, shimmer, and the noise-to-harmonics ratio of a sustained vowel to classify individuals as healthy controls or premanifest [7]. This work illustrates that subtle speech symptoms occur even in the premanifest population. Perez et al. used speaking rate, filler frequency, pause information, and goodness of pronunciation features from a reading passage to classify individuals as healthy controls or gene-positive with 87% accuracy [10]. In doing so, Perez et al. demonstrated the difficulty of differentiating between the premanifest and manifest subcategories: half of premanifest individuals were classified as healthy controls, and the other half as gene-positive.

2.2. Vowel stability

Prior works have applied additional measures from nonlinear dynamical systems to quantify vowel stability from sustained vowels. These measures have been used to classify a range of disorders and may be applicable in classifying manifest HD. Vaziri et al. extracted the correlation dimension (CD) and the maximal Lyapunov exponent (MLE) from sustained vowels, and used these measures to classify voice disorders in the Kay Dataset [17]. Little et al. highlight nonstationarity as an issue when extracting the CD and MLE from speech. To address this, Little et al. use the detrended fluctuation analysis (DFA) exponent.
The DFA exponent characterizes the roughness of noise or fluctuations around vocal cord vibrations and is intended for use with nonstationary data. They extract the DFA exponent from sustained vowels to classify voice disorders in the Kay Dataset [15]. In later work, Little et al. also demonstrated how the DFA exponent could be used to measure the severity of Parkinson's disease [23].

Bryce et al. describe how DFA, although intended to work with nonstationary data, can still fail to detrend in many cases [24]. When a series has strong underlying periodicity, DFA introduces artifacts that distort the DFA exponent. Bryce et al. suggest explicitly detrending the signal before analyzing fluctuations. In this paper, we analyze whether DFA effectively detrends vowels, and explore explicitly removing trends before analyzing fluctuations.
3. Data description
We use data collected as part of a study on acoustic biomarkers for HD at the University of Michigan. The study participants provided speech samples that were recorded at 44.1 kHz with a Hosa XVM-102M XLR microphone. We use two tasks: the sustained vowel, in which participants were instructed to hold the vowel /a/ for as long as possible, and the Grandfather Passage (GFP). The GFP contains nearly all of the phonemes of American English and is a standard reading passage used in assessing motor speech and voice disorders [25].

The data contains speech from 62 individuals, of whom 31 are healthy controls and 31 are gene-positive. Gene-positive individuals are assigned to specific HD stages (premanifest, manifest early-stage, and manifest late-stage) using the Unified Huntington's Disease Rating Scale (UHDRS) [26]. First, the premanifest versus manifest labels are determined based on the clinician-determined Diagnostic Confidence Level (DCL) within the Total Motor Score (TMS) portion of the UHDRS. DCL ranges from 0 (no symptoms) to 4 (symptoms of HD with >99% confidence). We label participants with a DCL of less than 4 as premanifest, and participants with a DCL of 4 as manifest, as in [27]. Within the manifest group, we label participants as early- or late-stage based on their Total Functional Capacity (TFC) scores [28]. TFC scores provide a rating of functional capacity, and range from 0 (low functioning) to 13 (high functioning). We label participants with a TFC score of 7–13 as early-stage and those with a TFC score of 0–6 as late-stage, as in [29].

This paper focuses on analyzing the speech of gene-positive individuals. Of these participants, one was unable to hold the sustained vowel. To provide a consistent comparison across experiments, we exclude this participant from our analysis. Thus, in this paper, we use data collected from 30 individuals: 12 premanifest, 11 early-stage HD, and 7 late-stage HD. We focus on the binary problem of differentiating between individuals with premanifest HD and manifest HD.
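The staging rules above can be sketched as a small helper (the function name and signature are ours, not from the study):

```python
def hd_stage(dcl, tfc):
    """Map UHDRS scores to the study's labels.

    dcl: Diagnostic Confidence Level (0-4); 4 indicates manifest HD.
    tfc: Total Functional Capacity (0-13); 7-13 early-stage, 0-6 late-stage.
    """
    if dcl < 4:
        return "premanifest"
    return "early-stage" if tfc >= 7 else "late-stage"
```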
4. Methods
We segment three vowel sample types: sustained vowels, shortened sustained vowels, and vowels extracted from the GFP. Table 1 summarizes these samples.

We analyze the sustained vowel recordings in which participants were instructed to hold the vowel /a/ for as long as possible after the interviewer provided an example. Recordings varied in length (12.8 s on average), and we refer to each recording as a sustained vowel (SV) sample.

To analyze vowel changes within read speech, we manually segment the vowels from GFP recordings. We choose to focus on a single phone to limit potential variation due to sound. We focus specifically on the phone [ɔ], as it closely resembled the sounds in the SV samples and, according to the Carnegie Mellon University Pronouncing Dictionary [30] phonetic translation of the GFP, [ɔ] has 12 occurrences within the passage, such as the /a/ in the words "all" and "walk". We listen to each GFP recording for the occurrences of [ɔ] identified in the phonetic transcript, and then determine phone endpoints by assessing changes in the sound and associated spectrogram. We verify that the sound of the resulting sample minimally contained surrounding phones. The number and length of samples vary slightly by participant, as there were variations in speaking rate and pronunciation, making some occurrences of the phones difficult to segment. Ultimately, we extract 11.3 samples per participant on average (GFPV), each roughly 105 ± 49 ms in length.

To understand the impact of vowel length versus vowel changes within read speech, we sample an intermediate set of vowels from SV, but of a length representative of GFPV. We sample 10 shortened sustained vowels (SSV) from each SV. The start positions and lengths of the SSV are randomly chosen. The lengths are chosen from a normal distribution with a mean of 105 ms and standard deviation of 49 ms, to resemble the lengths of the GFPV.

Table 1: Three types of vowel samples for each individual

Sample type  Description
SV           1 sample holding the vowel /a/
SSV          10 randomly selected segments from the SV, each 105 ± 49 ms in length
GFPV         11.3 (mean) samples of [ɔ] manually segmented from the Grandfather Passage reading, each 105 ± 49 ms in length
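The SSV sampling procedure can be sketched as follows, assuming the SV recording is a NumPy array at 44.1 kHz (the function name and the clipping of lengths to the recording are our assumptions; the paper does not specify how out-of-range draws are handled):

```python
import numpy as np

def sample_ssv(sv, sr=44100, n=10, mean_ms=105.0, sd_ms=49.0, rng=None):
    """Draw n shortened sustained vowels (SSV) from one SV recording.

    Segment lengths are drawn from N(mean_ms, sd_ms) and clipped to fit
    the recording; start positions are uniform over valid offsets.
    """
    rng = rng or np.random.default_rng(0)
    segments = []
    for _ in range(n):
        length = int(round(rng.normal(mean_ms, sd_ms) / 1000.0 * sr))
        length = int(np.clip(length, 1, len(sv) - 1))
        start = int(rng.integers(0, len(sv) - length))
        segments.append(sv[start:start + length])
    return segments
```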
We develop a set of baseline features for the task of classifying premanifest versus manifest HD. In prior work, Perez et al. extracted 358 features relating to speaking rate, pauses, goodness of pronunciation, and filler usage from the GFP [10]. These features were extracted by force-aligning audio with manually-created transcripts. Using these features, Perez et al. differentiated between healthy controls and gene-positive individuals with 87% accuracy, but did not focus on separating the premanifest and manifest populations.

Vowel features overview.
In the remainder of this section, we describe the extraction of the vowel-specific features we explore: vowel length, the original implementation of the DFA exponent, and our proposed trend and fluctuation features. In our preliminary analysis we also compared the use of jitter, shimmer, and the noise-to-harmonics ratio from vowels within connected speech, but found that these features were not correlated with HD manifestation. Thus in this paper we focus on vowel length, the original implementation of the DFA exponent, and our proposed features.

We extract these features from SV, SSV, and GFPV. When extracting features from SV, we have one value per individual. When extracting features from SSV and GFPV, we have one value per vowel. For each individual we aggregate the features of all vowels with six statistics: minimum, median, maximum, range, mean, and standard deviation.
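The per-speaker aggregation can be sketched as (helper name ours):

```python
import numpy as np

def aggregate(values):
    """Aggregate per-vowel feature values into six per-speaker statistics."""
    v = np.asarray(values, dtype=float)
    return {
        "min": v.min(), "med": np.median(v), "max": v.max(),
        "range": v.max() - v.min(), "mean": v.mean(), "sd": v.std(ddof=1),
    }
```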
Vowel length.
We extract the length of vowels in milliseconds for SV and GFPV. We do not do so for SSV, because the lengths of these samples were randomly determined. We hypothesize that within SV, individuals with manifest HD have shorter vowel lengths. We hypothesize that within GFPV, individuals with manifest HD have longer vowel lengths due to the HD symptom of prolonged sounds [13].
Detrended fluctuation analysis.
Detrended fluctuation analysis (DFA) is a method for analyzing the stability of fluctuations. A DFA exponent describes how deviations from a trend increase as we change the scale (or the size of the window) at which we view them. We follow [15] to extract the DFA exponent from our vowel samples, and describe the potential pitfalls of DFA. Little et al. extracted DFA exponents from sustained phonation recorded at 25 kHz [15]. They used linear detrending within each window, and these windows ranged in size from 50 samples (2 ms) to 100 samples (4 ms). In extracting DFA exponents, rather than downsampling our data, which may obscure some of the fluctuations we are aiming to characterize, we scale the window sizes so that our windows capture the same temporal information as captured in [15]. Our smallest windows are 88 samples and our largest windows are 176 samples.

Bryce et al. describe how DFA, although intended to work with nonstationary data, can still fail to detrend in many cases [24]. In particular, when a series has a strong periodic component, as speech does, DFA may not be able to effectively detrend within each window. Attempting to detrend using DFA may introduce artifacts that distort the DFA exponent. We explore this possibility by looking at speech within various windows, and find that, especially as the windows increase in length, the underlying trends are not linear. This is illustrated with an example in Figure 1. To prevent DFA from introducing artifacts as it tries to detrend, Bryce et al. suggest explicitly detrending data before performing fluctuation analysis. This fluctuation analysis will return an estimate of the Hurst exponent (HE), which corresponds with the DFA exponent although the HE is intended for detrended series. We explore this option next.
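A minimal sketch of the DFA computation described above, with linear detrending inside non-overlapping windows of 88-176 samples (the exact window-size grid is our assumption; [15] does not fix a step size):

```python
import numpy as np

def dfa_exponent(x, window_sizes=range(88, 177, 8)):
    """Estimate the DFA exponent of a 1-D signal.

    Integrates the mean-subtracted signal, removes a least-squares linear
    trend inside non-overlapping windows of each size, and fits the slope
    of log F(n) vs. log n, where F(n) is the RMS fluctuation at size n.
    """
    y = np.cumsum(np.asarray(x, dtype=float) - np.mean(x))
    sizes, flucts = [], []
    for n in window_sizes:
        m = len(y) // n
        if m < 2:
            continue
        windows = y[:m * n].reshape(m, n).T   # shape (n, m): one window per column
        t = np.arange(n)
        coeffs = np.polyfit(t, windows, 1)    # linear fit per window, shape (2, m)
        trend = np.outer(t, coeffs[0]) + coeffs[1]
        flucts.append(np.sqrt(np.mean((windows - trend) ** 2)))
        sizes.append(n)
    slope, _ = np.polyfit(np.log(sizes), np.log(flucts), 1)
    return slope
```

For reference, uncorrelated noise yields an exponent near 0.5, while its running sum (a random walk) yields a markedly larger exponent.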
Trend and fluctuation measures.
We propose a pipeline for explicitly detrending a vowel before performing fluctuation analysis. This method relies on empirical mode decomposition (EMD), which is an adaptive sifting method that works in the time domain and separates a signal into intrinsic mode functions (IMFs) that have varying frequency content. The first IMF contains the highest frequency content, and thus contains the noise-like fluctuations in the signal. As we remove the trends, we also quantify them to explore their relevance to HD manifestation. Previous work has suggested that nonstationarities introduced in connected speech result from movement of articulators, such as the tongue and lips [15]. Because HD limits these movements, we suspect individuals with manifest HD may exhibit different nonstationarities than individuals with premanifest HD. This approach, summarized in Figure 2, provides three measures for each vowel: the standard deviation of the multiplicative trend, the standard deviation of the additive trend, and an estimate of the HE of the noise-like fluctuations. The remainder of this section will describe these steps.

Figure 1: An example of a vowel and the first step of DFA, which includes mean-adjusting and integrating the vowel. We highlight the signal within windows of 88 and 176 samples of the integrated vowel. While the smallest window (88 samples) has a clear linear trend, the larger window (176 samples) does not. This suggests that linear detrending may not be effective.

Quantifying and removing the multiplicative trend.
A multiplicative trend in speech indicates changes in volume. Previous work has highlighted monoloudness as a symptom of HD [13]. However, individuals with manifest HD also exhibit prolonged sounds, which may result in an increase in volume changes around vowels. We address this trend by first calculating the average decibels relative to full scale (dBFS, a measure of amplitude) of the vowel. We then apply a filter, with a window of 25 ms and a shift of 10 ms, to calculate the average dBFS within each window. We calculate the standard deviation of these dBFS values and save it as a feature: the standard deviation of the multiplicative trend. Finally, we correct for this trend by applying the necessary gain or decay to each window so that it matches the average dBFS of the entire vowel. The goal of removing this trend is to work toward a detrended signal from which we can analyze fluctuations.
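The multiplicative-trend feature can be sketched as below. The gain correction is simplified to non-overlapping windows; exactly inverting the overlapping 25 ms / 10 ms analysis would require overlap-add, so this is an approximation, not the paper's implementation:

```python
import numpy as np

def multiplicative_trend(x, sr=44100, win_ms=25, hop_ms=10):
    """Frame-level dBFS of a signal scaled to [-1, 1], plus its std.

    The returned std is the 'standard deviation of the multiplicative
    trend' feature; frames use a 25 ms window with a 10 ms shift.
    """
    win, hop = int(sr * win_ms / 1000), int(sr * hop_ms / 1000)
    frames = [x[i:i + win] for i in range(0, len(x) - win + 1, hop)]
    dbfs = np.array([20 * np.log10(np.sqrt(np.mean(f ** 2)) + 1e-12)
                     for f in frames])
    return dbfs.std(), dbfs

def flatten_volume(x, sr=44100, win_ms=25):
    """Approximate gain correction: scale non-overlapping windows to the
    signal's overall RMS so each window matches the average level."""
    win = int(sr * win_ms / 1000)
    target = np.sqrt(np.mean(x ** 2))
    out = x.astype(float).copy()
    for i in range(0, len(x) - win + 1, win):
        rms = np.sqrt(np.mean(out[i:i + win] ** 2))
        if rms > 0:
            out[i:i + win] *= target / rms
    return out
```

A steady vowel should yield a near-zero std, while an amplitude ramp yields a large one that shrinks after `flatten_volume`.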
Separating the signal into IMFs.
In order to avoid conflating volume information in the rest of our analysis, we normalize the amplitude of each signal to [-1, 1]. We then apply EMD to separate the signal into IMFs [31]. The first IMF will have the highest frequency content, and later IMFs will have lower frequency content. The main advantage of this filtering approach is that it does not make assumptions about the type of periodicity.
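A minimal EMD sifting sketch in pure NumPy. Standard EMD uses cubic-spline envelopes; this sketch uses linear interpolation through the extrema for brevity, so it only approximates the decomposition used in the paper:

```python
import numpy as np

def _envelope_mean(x):
    """Mean of the upper and lower envelopes through local extrema.

    Linear interpolation (np.interp) stands in for the cubic splines of
    standard EMD; sentinel slopes let the endpoints anchor the envelopes.
    """
    t = np.arange(len(x))
    d = np.diff(x)
    maxima = np.where((np.hstack([d, -1.0]) < 0) & (np.hstack([1.0, d]) > 0))[0]
    minima = np.where((np.hstack([d, 1.0]) > 0) & (np.hstack([-1.0, d]) < 0))[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema: what remains is the residual trend
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return (upper + lower) / 2

def emd(x, max_imfs=8, sift_iters=10):
    """Separate a signal into IMFs (highest frequency first) plus a residual."""
    imfs, residual = [], np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        if _envelope_mean(residual) is None:
            break
        h = residual.copy()
        for _ in range(sift_iters):
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m  # sift: subtract the envelope mean until near-symmetric
        imfs.append(h)
        residual = residual - h
    return imfs, residual
```

On a two-tone signal, the first IMF should recover the high-frequency component.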
Quantifying the additive trend.
An additive trend in speech is a potential artifact of coarticulation due to movement of the articulators. We hypothesize that individuals with manifest HD exercise less articulator movement, which may be evident in fewer changes in this trend. Chatlani et al. explore IMF characteristics of voiced sounds, and they provide methods to associate certain IMFs with signal information and certain IMFs with low-frequency trend information [32]. They demonstrated how the variance of each IMF drops after the fourth IMF, and suggest that the first four IMFs contain relevant signal information. We repeat this analysis using 50 random vowel samples from our dataset. Figure 3 illustrates that within our data the first five IMFs have higher variance, after which variance drops. This difference in which IMFs contain signal (the first four versus the first five) may be due to different recording conditions or the fact that our dataset includes disordered speech. Based on this analysis, we sum IMFs higher than five as the additive trend. We calculate the standard deviation of this trend and save it as a feature: the standard deviation of the additive trend.

Code available at https://github.com/amritkromana/FVDM

Figure 2: Pipeline for extracting three vowel measures: 1) standard deviation of the multiplicative trend; 2) standard deviation of the additive trend; and 3) HE of the first IMF. EMD = empirical mode decomposition, IMF = intrinsic mode function, HE = Hurst exponent

Figure 3: IMF variance for random vowel samples

Analyzing fluctuations.
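Given a list of IMFs from an EMD implementation, the additive-trend feature reduces to (helper name ours; the zero fallback for shallow decompositions is our assumption):

```python
import numpy as np

def additive_trend_feature(imfs):
    """Sum IMFs beyond the fifth as the additive trend; return its std.

    imfs: list of equal-length arrays, highest-frequency IMF first.
    Returns 0.0 when the decomposition produced five or fewer IMFs.
    """
    if len(imfs) <= 5:
        return 0.0
    trend = np.sum(np.asarray(imfs[5:]), axis=0)
    return float(trend.std())
```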
The first IMF contains the highest frequency content, while the higher IMFs are more likely to contain periodic components. Thus we focus on capturing fluctuations within the first IMF. The third vowel feature we introduce is the Hurst exponent (HE) of the first IMF. A HE closer to 1 indicates more smoothness, whereas a HE closer to 0 indicates more roughness. We hypothesize that the HE of the first IMF will be lower for individuals with manifest HD compared to individuals with premanifest HD due to vowel distortion.

To improve the robustness of our measure, we make the following three modifications to our fluctuation analysis compared to that in [15]. First, we look for deviations from the mean within each window, rather than applying any linear detrending. Next, we expand the maximum window size as recommended in [24]. We make the assumption that individuals have a fundamental frequency of at least 100 Hz, and we set the largest window to 441 samples (10 ms) to capture fluctuations across each individual's vocal cord vibration cycle. Finally, we include a filtering step to ensure we are reliably estimating the HE: we exclude vowel samples whose log-log plots do not exhibit linear behavior, indicated by an R of less than 0.99. Table 2 displays the number of vowel samples used in the HE estimates for each individual before and after this reduction, and illustrates that the majority of vowels satisfy these requirements. However, this reduction in samples may limit the usefulness of certain statistics, such as the range and standard deviation. In future work we will continue to evaluate what vowel characteristics lead to certain vowels not producing linear behavior within log-log plots.

Table 2: Number of vowel samples per individual before and after reduction for R > 0.99

             Premanifest HD  Manifest HD
SSV Before   10 ± …          …
SSV After    …               …
GFPV Before  …               …
GFPV After   …               …

In summary, we propose three features:
1. Standard deviation of the multiplicative trend, which we expect will be higher for individuals with manifest HD due to prolonged sounds
2. Standard deviation of the additive trend, which we expect may be lower for individuals with manifest HD due to less articulator movement
3. HE of the first IMF, which we expect will be lower for individuals with manifest HD due to distortion in the vowel

Table 3: Spearman correlation coefficients between vowel features and manifest label for each sample type. Significant correlations (p < …) are in bold.

Sample  Stat   Vowel length  DFA exponent  HE of first IMF  SD of mult. trend  SD of add. trend
SV      -      -0.48         -0.23         -0.08            …                  …
SSV     min    -             -0.30         -0.24            0.08               -0.31
SSV     med    -             -0.27         -0.26            0.21               -0.45
SSV     max    -             -0.04         -0.13            0.20               -0.19
SSV     range  -             0.31          0.11             0.20               -0.12
SSV     mean   -             -0.28         -0.20            0.21               -0.36
SSV     SD     -             0.35          0.06             0.20               -0.13
GFPV    min    …             …             …                …                  …
GFPV    med    …             …             -0.53            …                  …
GFPV    max    …             …             …                …                  …
GFPV    range  …             …             …                …                  …
GFPV    mean   …             …             -0.53            …                  …
GFPV    SD     …             …             …                …                  …
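The modified fluctuation analysis, with mean removal instead of linear detrending, a 441-sample maximum window, and the R-based filtering step, can be sketched as follows (the window-size grid and the None-on-failure convention are our choices):

```python
import numpy as np

def hurst_exponent(x, window_sizes=range(88, 442, 32), min_r=0.99):
    """Fluctuation analysis with mean removal instead of linear detrending.

    Returns an HE estimate, or None when the log-log plot is not
    sufficiently linear (Pearson R below min_r), mirroring the
    filtering step described in the text.
    """
    y = np.cumsum(np.asarray(x, dtype=float) - np.mean(x))
    sizes, flucts = [], []
    for n in window_sizes:
        m = len(y) // n
        if m < 2:
            continue
        w = y[:m * n].reshape(m, n)
        # Deviations from each window's own mean (no linear detrend).
        f = np.sqrt(np.mean((w - w.mean(axis=1, keepdims=True)) ** 2))
        sizes.append(n)
        flucts.append(f)
    log_n, log_f = np.log(sizes), np.log(flucts)
    if np.corrcoef(log_n, log_f)[0, 1] < min_r:
        return None
    slope, _ = np.polyfit(log_n, log_f, 1)
    return slope
```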
5. Results
Table 3 lists the correlations between the vowel measures extracted from each sample type and the binary manifest labels. Note that we do not extract vowel length from SSV, as those lengths were randomly chosen. We highlight the correlations greater than 0.5 between several vowel features (specifically vowel length, the HE of the first IMF, and the standard deviation of the multiplicative trend) and the manifest label when these features are extracted from vowels in connected speech (GFPV). We further analyze these findings in this section.

Vowel length is negatively correlated with HD manifestation when extracted from SV but positively correlated with HD manifestation when extracted from GFPV. This is consistent with HD symptoms, including prolonged sounds [13].

The DFA exponent, extracted using the method outlined in [15], is not significantly correlated with HD manifestation when extracted from any of the vowel samples. This suggests that this measure may not be measuring fluctuations accurately in our dataset, potentially due to detrending problems.

The HE of the first IMF is significantly correlated with HD manifestation when extracted from GFPV, but not SSV or SV. For longer samples, we find that EMD separates the signal into a much larger number of IMFs. For SV, the noise is potentially contained within higher IMFs as opposed to just the first IMF, and future work will explore this possibility. For SSV, we do not find significant correlations, but we find that the direction of these correlations is consistent with GFPV. This suggests that vowel distortion is more pronounced within read speech. Within the GFPV we find that the median and mean of this feature are correlated with HD manifestation with a coefficient of -0.53.

The standard deviation of the multiplicative trend is positively correlated with HD manifestation when measured from all the samples, and significantly so when extracted from SV and GFPV.
We suspect the higher correlations with GFPV samples may be related to the HD symptoms of prolonged sounds and an increase in pauses, which imply phones are more likely to exhibit volume variations.

Finally, we find significant correlations between the standard deviation of the additive trend and HD manifestation. The negative correlation between the minimum of this feature from GFPV and manifest HD is consistent with less articulator movement due to disease manifestation. However, we find this negative correlation within SSV, where we do not expect articulator movement. Future work will aim to analyze the factors driving these correlations. Finally, we note the high correlation of this measure when extracted from SV. Again, we believe this is due to differences in IMF characteristics across shorter and longer samples. The SV samples were decomposed into a higher number of IMFs, so IMF five and higher likely contain different frequencies in SV compared to SSV and GFPV.
We explore the feasibility of detecting HD manifestation from speech by training a logistic regression model to classify a speaker as having premanifest or manifest HD. We train the model using a leave-one-subject-out paradigm, meaning that for each participant, we train a model using data from all of the other participants, and use this model to classify the held-out participant. We implement the logistic regression model using scikit-learn [33]. We use the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) solver with L2 regularization (C=1.0). We perform z-score normalization on each of the features, using the mean and standard deviation of the training set. We also perform feature selection using the training set. We limit the model to using ten features, and choose those features to maximize relevance to the label while minimizing multicollinearity. More specifically, we begin with a list of potential features which are correlated with the manifest label, indicated by a Spearman correlation p-value of less than 0.1. We then include the feature with the strongest correlation coefficient in our confirmed feature set. We then calculate the variance inflation factor (VIF) between the confirmed feature set and each feature in the potential feature list. We remove features from the potential feature list if their VIF is greater than 5, as this indicates multicollinearity [34]. With our reduced potential feature list, we move the one feature with the strongest correlation coefficient to our confirmed feature set. We repeat this process until the confirmed feature set includes ten features or the list of potential features is empty.

We compare the classification accuracy within and across different feature sets: baseline, vowel, and both combined. We analyze the relevance of baseline features which have been demonstrated to be relevant to separating healthy controls from gene-positive individuals [10]. We then assess the accuracy of a model using the vowel features extracted from the GFP. We use vowel features extracted from the GFP because the baseline features were extracted from the GFP, and this provides the most insight into how to passively predict manifestation from connected speech. We do not present results here for the DFA exponent, as these features were not significantly correlated with the manifest label and as a result were not selected by the feature selector. Finally, we evaluate the accuracy of a model that combines the baseline features with the vowel features. The results for each experiment are in Table 4.

Table 4: Accuracy and F1 scores for classifying premanifest vs manifest HD using the baseline features, vowel features, and combined feature sets. We use our feature selection method to choose 10 features for each experiment, and for the experiment that combines baseline features with vowel features we enforce 5 features from the baseline set and 5 from the vowel set. Best scores for each feature set tested are in bold.

Features                            Accuracy  F1 score
Pauses                              0.67      0.71
Rate                                0.63      0.67
GOP                                 0.63      0.65
Fillers                             0.63      0.62
All Baseline                        0.73      0.76
Vowel length                        0.73      0.76
Trend + Fluctuation                 0.67      0.71
All Vowel Features                  0.87      0.88
Baseline Features + Vowel Features  0.80      0.83

The speaking rate, goodness of pronunciation, and filler feature sets each perform comparably with 63% accuracy, and the pause features perform slightly better with 67% accuracy. We find combining all of the baseline feature sets improves performance, reaching 73% accuracy. Evaluating the vowel features individually, we find that vowel length performs the best, also with 73% accuracy. Adding trend and fluctuation features increases this accuracy to 87%. In combining baseline features with vowel features, we limited our feature selection process to choose up to five features from the baseline set and five features from the vowel set. However, we find that this does not perform as well as the vowel features on their own.
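The greedy VIF-based selection described above can be sketched as follows (function names are ours; the per-fold Spearman correlations, already filtered at p < 0.1, are taken here as a given input):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j of X against the other columns."""
    y = X[:, j]
    A = np.column_stack([np.delete(X, j, axis=1), np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = 1 - (y - A @ coef).var() / y.var()
    return 1 / max(1 - r2, 1e-12)

def select_features(X, corrs, max_features=10, vif_cap=5.0):
    """Greedy selection: repeatedly confirm the remaining feature with the
    strongest |correlation| to the label, then drop candidates whose VIF
    against the confirmed set exceeds vif_cap."""
    candidates = list(np.argsort(-np.abs(corrs)))
    confirmed = []
    while candidates and len(confirmed) < max_features:
        confirmed.append(candidates.pop(0))
        kept = []
        for c in candidates:
            cols = X[:, confirmed + [c]]
            if vif(cols, cols.shape[1] - 1) <= vif_cap:
                kept.append(c)
        candidates = kept
    return [int(i) for i in confirmed]
```

A near-duplicate of an already-confirmed feature is discarded by the VIF cap, while an independent feature survives.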
Overall, our results suggest that the baseline features, which separated healthy controls from gene-positive individuals with high accuracy [10], are not as relevant to classifying HD manifestation.

We further analyze the features selected and their beta coefficients to understand the relative importance of features and ensure that the model is interpretable. We perform this analysis with our best-performing experiment, which included all vowel features. Table 5 lists the features that were selected in the majority of training folds, as well as the mean and standard deviation of the derived coefficients. The range of vowel length …

Table 5: Most common features selected from the manifest HD classification experiment with all vowel features. Each of these features was selected in the majority of training folds with significant beta coefficients (p < …).

Feature               β
Mean of vowel length  0.79 ± …
…                     …
6. Conclusions and Future Work
In this paper, we present a small and interpretable feature set to capture changes in vowels with HD manifestation. We show that these features can classify HD manifestation with 87% accuracy. These results bring us closer to being able to passively detect HD manifestation. These features also provide an avenue for understanding the changes in vowels within connected speech as they relate to other neurocognitive disorders.

In future work, we will focus on coupling these techniques with vowel detection, so we can automatically extract these features. After automating the extraction of these measures, we plan to evaluate their relevance to additional datasets, including understanding vowel characteristics for healthy controls, classifying manifest HD from spontaneous speech, and understanding other neurocognitive conditions.

Our future work will also explore additional vowel features. Riad et al. recently analyzed characteristics of sustained vowels for predicting HD severity [35]. Several of our findings are consistent with theirs, namely that vowel length is a significant indicator of disease manifestation while DFA is not. Riad et al. also find a number of voice-break and MFCC-related features to be related to disease manifestation and severity. In our future work we will explore whether these features are similarly related to HD severity when extracted from vowels in connected speech.

7. Acknowledgements
We thank the investigators and coordinators of this study, the study participants, the Huntington Study Group, and the Huntington's Disease Society of America. This work was supported by the National Institutes of Health (NIH), National Center for Advancing Translational Sciences (UL1TR000433), the Heinz C Prechter Bipolar Research Fund, and the Richard Tam Foundation at the University of Michigan. A portion of this study sample was collected in conjunction with NIH, National Institute of Neurological Disorders and Stroke (R01BS077946) and/or Enroll-HD (funded by the CHDI Foundation).
8. References

[1] J. P. G. Vonsattel and M. DiFiglia, "Huntington disease," Journal of Neuropathology and Experimental Neurology, vol. 57, no. 5, p. 369, 1998.
[2] J. Snowden, D. Craufurd, H. Griffiths, J. Thompson, and D. Neary, "Longitudinal evaluation of cognitive disorder in huntington's disease," Journal of the International Neuropsychological Society, vol. 7, no. 1, pp. 33–44, 2001.
[3] J. S. Paulsen, R. Ready, J. Hamilton, M. Mega, and J. Cummings, "Neuropsychiatric aspects of huntington's disease," Journal of Neurology, Neurosurgery & Psychiatry, vol. 71, no. 3, pp. 310–314, 2001.
[4] J. D. Long, J. S. Paulsen, K. Marder, Y. Zhang, J.-I. Kim, J. A. Mills, and R. of the PREDICT-HD Huntington's Study Group, "Tracking motor impairments in the progression of huntington's disease," Movement Disorders, vol. 29, no. 3, pp. 311–319, 2014.
[5] M. Duyao, C. Ambrose, R. Myers, A. Novelletto, F. Persichetti, M. Frontali, S. Folstein, C. Ross, M. Franz, M. Abbott et al., "Trinucleotide repeat length instability and age of onset in huntington's disease," Nature Genetics, vol. 4, no. 4, pp. 387–392, 1993.
[6] Z. Kabelac, C. G. Tarolli, C. Snyder, B. Feldman, A. Glidden, C.-Y. Hsu, R. Hristov, E. R. Dorsey, and D. Katabi, "Passive monitoring at home: A pilot study in parkinson disease," Digital Biomarkers, vol. 3, no. 1, pp. 22–30, 2019.
[7] L. R. Kaploun, J. H. Saxman, P. Wasserman, and K. Marder, "Acoustic analysis of voice and speech characteristics in presymptomatic gene carriers of huntington's disease: biomarkers for preclinical sign onset?" Journal of Medical Speech-Language Pathology, vol. 19, no. 2, pp. 49–65, 2011.
[8] A. P. Vogel, C. Shirbin, A. J. Churchyard, and J. C. Stout, "Speech acoustic markers of early stage and prodromal huntington's disease: a marker of disease onset?" Neuropsychologia, vol. 50, no. 14, pp. 3273–3278, 2012.
[9] W. Hinzen, J. Rosselló, C. Morey, E. Camara, C. Garcia-Gorro, R. Salvador, and R. de Diego-Balaguer, "A systematic linguistic profile of spontaneous narrative speech in pre-symptomatic and early stage huntington's disease," Cortex, vol. 100, pp. 71–83, 2018.
[10] M. Perez, W. Jin, D. Le, N. Carlozzi, P. Dayalu, A. Roberts, and E. M. Provost, "Classification of huntington disease using acoustic and lexical features," in Interspeech, 2018, pp. 1898–1902.
[11] F. L. Darley, A. E. Aronson, and J. R. Brown, "Differential diagnostic patterns of dysarthria," Journal of Speech and Hearing Research, vol. 12, no. 2, pp. 246–269, 1969.
[12] L. Hartelius, A. Carlstedt, M. Ytterberg, M. Lillvik, and K. Laakso, "Speech disorders in mild and moderate huntington disease: Results of dysarthria assessments of 19 individuals," Journal of Medical Speech-Language Pathology, vol. 11, no. 1, pp. 1–15, 2003.
[13] S. K. Diehl, A. S. Mefferd, Y.-C. Lin, J. Sellers, K. E. McDonell, M. de Riesthal, and D. O. Claassen, "Motor speech patterns in huntington disease," Neurology, vol. 93, no. 22, pp. e2042–e2052, 2019.
[14] K. Elemetrics, "Disordered voice database," 1994.
[15] M. A. Little, P. E. McSharry, S. J. Roberts, D. A. Costello, and I. M. Moroz, "Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection," Biomedical Engineering Online, vol. 6, no. 1, p. 23, 2007.
[16] P. Henríquez, J. B. Alonso, M. A. Ferrer, C. M. Travieso, J. I. Godino-Llorente, and F. Díaz-de María, "Characterization of healthy and pathological voice through measures based on nonlinear dynamics," IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 6, pp. 1186–1195, 2009.
[17] G. Vaziri, F. Almasganj, and R. Behroozmand, "Pathological assessment of patients' speech signals using nonlinear dynamical analysis," Computers in Biology and Medicine, vol. 40, no. 1, pp. 54–63, 2010.
[18] A. Tsanas, M. A. Little, P. E. McSharry, J. Spielman, and L. O. Ramig, "Novel speech signal processing algorithms for high-accuracy classification of parkinson's disease," IEEE Transactions on Biomedical Engineering, vol. 59, no. 5, pp. 1264–1271, 2012.
[19] J. Orozco-Arroyave, F. Hönig, J. Arias-Londoño, J. Vargas-Bonilla, K. Daqrouq, S. Skodda, J. Rusz, and E. Nöth, "Automatic detection of parkinson's disease in running speech spoken in three different languages," The Journal of the Acoustical Society of America, vol. 139, no. 1, pp. 481–500, 2016.
[20] A. Satt, R. Hoory, A. König, P. Aalten, and P. H. Robert, "Speech-based automatic and robust detection of very early dementia," in Fifteenth Annual Conference of the International Speech Communication Association, 2014.
[21] A. König, A. Satt, A. Sorin, R. Hoory, O. Toledo-Ronen, A. Derreumaux, V. Manera, F. Verhey, P. Aalten, P. H. Robert et al., "Automatic speech analysis for the assessment of patients with predementia and alzheimer's disease," Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring, vol. 1, no. 1, pp. 112–124, 2015.
[22] L. Zhou, K. C. Fraser, and F. Rudzicz, "Speech recognition in alzheimer's disease and in its assessment," in Interspeech, 2016, pp. 1948–1952.
[23] M. Little, P. McSharry, E. Hunter, J. Spielman, and L. Ramig, "Suitability of dysphonia measurements for telemonitoring of parkinson's disease," Nature Precedings, pp. 1–1, 2008.
[24] R. Bryce and K. Sprague, "Revisiting detrended fluctuation analysis," Scientific Reports, vol. 2, p. 315, 2012.
[25] F. L. Darley, A. E. Aronson, and J. R. Brown, Motor Speech Disorders. W.B. Saunders Company, 1975.
[26] K. Kieburtz, J. B. Penney, P. Corno, N. Ranen, I. Shoulson, A. Feigin, D. Abwender, J. T. Greenamyre, D. Higgins, F. J. Marshall et al., "Unified huntington's disease rating scale: reliability and consistency," Neurology, vol. 11, no. 2, pp. 136–142, 2001.
[27] D. Liu, J. D. Long, Y. Zhang, L. A. Raymond, K. Marder, A. Rosser, E. A. McCusker, J. A. Mills, J. S. Paulsen, P.-H. Investigators et al., "Motor onset and diagnosis in huntington disease using the diagnostic confidence level," Journal of Neurology, vol. 262, no. 12, pp. 2691–2698, 2015.
[28] I. Shoulson, R. Kurlan, A. J. Rubin, D. Goldblatt, J. Behr, C. Miller, J. Kennedy, K. A. Bamford, E. D. Caine, D. K. Kido et al., "Assessment of functional capacity in neurodegenerative movement disorders: Huntington's disease as a prototype," Quantification of Neurologic Deficit. Boston: Butterworths, pp. 271–283, 1989.
[29] K. Marder, H. Zhao, R. Myers, M. Cudkowicz, E. Kayson, K. Kieburtz, C. Orme, J. Paulsen, J. Penney, E. Siemers et al., "Rate of functional decline in huntington's disease," Neurology
[32] IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1158–1166, 2011.
[33] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[34] J. H. Kim, "Multicollinearity and misleading statistical results," Korean Journal of Anesthesiology, vol. 72, no. 6, p. 558, 2019.
[35] R. Riad, H. Titeux, L. Lemoine, J. Montillot, J. H. Bagnou, X. N. Cao, E. Dupoux, and A.-C. Bachoud-Lévi, "Vocal markers from sustained phonation in huntington's disease," arXiv preprint arXiv:2006.05365, 2020.