Objective Assessment of Social Skills Using Automated Language Analysis for Identification of Schizophrenia and Bipolar Disorder
Rohit Voleti, Stephanie Woolridge, Julie M. Liss, Melissa Milanovic, Christopher R. Bowie, Visar Berisha
School of Electrical, Computer, & Energy Engineering and Department of Speech & Hearing Science, Arizona State University, Tempe, AZ, USA; Department of Psychology, Queen's University, Kingston, ON, Canada
Abstract
Several studies have shown that speech and language features, automatically extracted from clinical interviews or spontaneous discourse, have diagnostic value for mental disorders such as schizophrenia and bipolar disorder. They typically make use of a large feature set to train a classifier for distinguishing between two groups of interest, i.e., a clinical and control group. However, a purely data-driven approach runs the risk of overfitting to a particular data set, especially when sample sizes are limited. Here, we first down-select the set of language features to a small subset that is related to a well-validated test of functional ability, the Social Skills Performance Assessment (SSPA). This helps establish the concurrent validity of the selected features. We use only these features to train a simple classifier to distinguish between groups of interest. Linear regression reveals that a subset of language features can effectively model the SSPA. Furthermore, the same feature set can be used to build a strong binary classifier to distinguish between healthy controls and a clinical group (AUC = 0.960) and also between patients within the clinical group with schizophrenia and bipolar I disorder (AUC = 0.826).

Index Terms: computational linguistics, schizophrenia, bipolar disorder, semantic coherence, natural language processing
1. Introduction & Previous Work
In the United States alone, the National Institute of Mental Health (NIMH) in 2016 estimated that ∼10.4 million individuals live with a form of severe mental illness, approximately 4.2% of the adult population [1]. Among these are schizophrenia and bipolar disorder, for which diagnosis is difficult and treatment costs are disproportionately high [2]. Additionally, differential diagnosis of bipolar I disorder and schizophrenia is often difficult, with some estimating that a substantial share of bipolar I patients are misdiagnosed [3]. Therefore, there is a demand for effective methods with which we can classify and track the progress of treatment in these conditions. Language impairments are a well-known component of schizophrenia and bipolar disorder, including symptoms like alogia (poverty of speech) or development of formal thought disorder (FTD), including schizophasia ("word salad," or semantically incoherent utterances) [4]. These impairments are typically assessed by clinical interviews, but few quantitative measures exist for measuring them objectively. Recent work in computational linguistics and natural language processing (NLP) has paved the way for research into computational psychiatry to objectively assess the degree of language impairment [5]. Several recent studies have made use of these tools for psychiatric evaluation, but their presence in clinical practice is still largely absent [6]. In this paper, we aim to bridge this gap by presenting an objective and interpretable panel of language features, anchored to a well-validated clinical assessment of social skills, for the assessment of patients with schizophrenia and bipolar disorder.

(Accepted to INTERSPEECH 2019 in Graz, Austria.)

Most existing work in this area takes a largely data-driven approach to language analysis, considering a host of semantic and lexical complexity measures over a large variety of language elicitation tasks [7, 8, 9, 10, 11, 12].
Semantic features are often captured with numerical word and sentence embeddings, in which words, sentences, phrases, etc. are represented in a high-dimensional vector space; typically, words that are semantically similar are embedded close together in this vector space, e.g., with latent semantic analysis (LSA) [13], word2vec [14], and several others. Another measure of semantics can be achieved by topic modeling, such as with latent Dirichlet allocation (LDA) [15]. Semantic features are often combined with other lexical measures of language complexity to improve classification performance. Some examples are "surface features" (i.e., words per sentence, speaking rate, etc.) with tools like Linguistic Inquiry and Word Count (LIWC) [16], statistical language features (n-gram word likelihoods) [7], part-of-speech tag statistics [8, 9, 17], and sentiment analysis [18, 19, 20].

Despite promising early results, these tools are not currently used in clinical practice. We posit that this is because the large and varied feature space, the variability associated with the speech elicitation tasks, and the small sample sizes make it difficult to develop reliable and interpretable algorithms that generalize. As patient data is scarce, the identification of a standard set of important, interpretable, and easy-to-compute language features that clinicians can use is a significant hurdle to overcome. We address this by evaluating the language of patients with schizophrenia, patients with bipolar I disorder, and healthy control subjects on the Social Skills Performance Assessment (SSPA) [21], a well-validated test of social functional competence (described in Section 2). Our approach is motivated by our previous work in interpretable clinical-speech analytics [22]. First, we identify a subset of language measures that reliably model clinical SSPA scores.
Next, we use only this reduced feature set to perform two classification tasks: (1) distinguishing between healthy controls and clinical subjects and (2) distinguishing patients with schizophrenia/schizoaffective disorder (Sz/Sza) from bipolar I patients within the clinical group. To the best of our knowledge, this is the first study to establish a set of language measures that jointly assess social skills and use those features to accurately classify all groups of interest.
2. SSPA Data Collection
Our study involves the analysis of interview transcripts collected from a total of 87 clinical subjects and 22 healthy controls that participated in the SSPA task described by Patterson et al. [21]. Of the clinical population, 44 had been diagnosed with bipolar I disorder and 43 had been diagnosed with schizophrenia or schizoaffective disorder (considered together in this analysis). The SSPA interviews are described by Bowie et al. in [23]. The transcriptions used in our analysis were completed at Queen's University in Kingston, ON, Canada.

The task consists of three role-playing scenes: (1) a practice scene of making plans with a friend (not scored), (2) greeting a new neighbor, and (3) negotiating with a recalcitrant landlord over fixing an unrepaired leak. Each session was recorded and scored by trained research assistants upon reviewing the recording. Scene 2 (new neighbor) and Scene 3 (negotiation with landlord) were scored on a scale of 1 (low) to 5 (high) on several categories, i.e., interest/disinterest, fluency, clarity, social appropriateness, negotiation ability, etc. A composite score for each scene and an overall score is computed by averaging the Scene 2 and Scene 3 scores. Bowie et al. identified group differences between the scores of both clinical populations and healthy control subjects in [23] by evaluation on the SSPA task and several other clinical measures. In this work, we aim to automate this task with a subset of language metrics from the SSPA transcripts. As stated in Section 1, our first goal is to identify semantic and lexical features from which we can reliably predict SSPA performance. Then, we test the ability of these features to differentiate between healthy control and clinical populations, and we also test their ability to differentiate within the distinct groups in the clinical population.
3. Computed Language Features
In our work, we attempt to identify a comprehensive set of objective language measures from which we can model and predict SSPA performance and classify individuals. Inspired by much of the previous work described in Section 1, we theorized that it is critical to consider language features that model semantic coherence through the use of word and sentence embeddings. We focused on a few pre-trained neural embedding models that are publicly available and known to model semantic similarity accurately. Additionally, we consider a set of features that measure lexical and syntactic complexity, described below.
Many of the previously described studies in this area involve computing a notion of semantic coherence in language with the use of word embeddings in a high-dimensional vector space, either with LSA or neural word embedding techniques [7, 8, 9, 10]. In nearly all cases, word and sentence/phrase embedding pairs, denoted by vectors a and b, are evaluated with the notion of cosine similarity, cos(θ_{a,b}) = (a · b) / (‖a‖ ‖b‖), a measure of the cosine of the angle θ_{a,b} between the two vectors. We also use cosine similarity as a measure of pairwise sentence similarity, but with some modifications in implementation due to the difference in the nature of the SSPA task and data collection.

Our work differs from several of the previously discussed studies in that we are interested in conversational semantic similarity between the subject and clinical assessor in each of the three scenes of the SSPA task. Therefore, we sought to utilize some of the latest sentence/phrase embedding methods to compute a vector representation for each assessor and subject speaking turn. Then, we used the cosine similarity to compute the similarity score between each consecutive assessor + subject speaking turn, generating a distribution of similarity scores for each embedding method for each subject in each transcribed scene. The following sentence embedding representations are used in our analysis: (1) an unweighted bag-of-words (BoW) average of all word vectors based on the pre-trained skip-gram implementation of word2vec trained on the Google News corpus [14], (2) Smooth Inverse Frequency (SIF) with pre-trained skip-gram word2vec vectors [24], and (3) InferSent (INF) sentence encodings based on pre-trained FastText vectors [25]. The BoW average of vectors and SIF embeddings showed good baseline performance in [10], and we additionally included InferSent, a deep neural network sentence encoder, due to its strong performance on semantic similarity tasks. Then, basic statistics for the similarity score distribution were computed for each subject and transcribed scene. These included the minimum, maximum, mean, median, 10th percentile, and 90th percentile coherence.

While semantic coherence measures are often the most effective at classifying patients with schizophrenia and bipolar disorder, several other linguistic complexity measures are used for a more holistic analysis. We consider a subset of these features, computed for the entire set of subject responses across all three scene transcripts.
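As a concrete illustration of the turn-level coherence computation described above, the following is a minimal sketch, not the authors' implementation: it assumes pre-trained word vectors are supplied as a plain Python dict of token-to-NumPy-array mappings, uses an unweighted BoW average per speaking turn, and summarizes the cosine similarities of consecutive turns with the same distribution statistics.

```python
import numpy as np

def embed_bow(tokens, vectors):
    """Unweighted bag-of-words average of word vectors (OOV tokens skipped)."""
    vecs = [vectors[t] for t in tokens if t in vectors]
    return np.mean(vecs, axis=0) if vecs else None

def cosine_similarity(a, b):
    """cos(theta) between vectors a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def turn_coherence_stats(turns, vectors):
    """Cosine similarity between each consecutive pair of speaking turns,
    summarized by the distribution statistics used in this analysis."""
    embs = [embed_bow(t.lower().split(), vectors) for t in turns]
    sims = [cosine_similarity(a, b) for a, b in zip(embs, embs[1:])
            if a is not None and b is not None]
    s = np.array(sims)
    return {"min": s.min(), "max": s.max(), "mean": s.mean(),
            "median": np.median(s), "p10": np.percentile(s, 10),
            "p90": np.percentile(s, 90)}
```

In practice, `vectors` would be loaded from a pre-trained model (e.g., word2vec or FastText); the dict interface here is only for illustration.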
Lexical diversity refers to unique vocabulary usage for a particular subject, for which several measurement techniques exist. The type-to-token ratio (TTR) is a well-known measure of lexical diversity, in which the number of unique words (word types, V) is compared against the total number of words (word tokens, N): TTR = V/N. However, TTR is known to be negatively impacted for longer utterances, as the diversity of unique words plateaus as the total number of words increases. Hence, we consider a small selection of modified measures of lexical diversity in our work. The moving-average type-to-token ratio (MATTR) [26] is one such method, which aims to reduce the dependence on text length by considering TTR over a sliding window of the text. Brunét's Index (BI) [27], defined in Equation (1), is another measure of lexical diversity that has a weaker dependence on text length; a smaller value indicates a greater degree of lexical diversity:

BI = N^(V^(−0.165)).   (1)

An alternative is also provided by Honoré's Statistic (HS) [28], defined in Equation (2), which emphasizes the use of words that are spoken only once (denoted by V₁):

HS = 100 · log(N) / (1 − V₁/V).   (2)

MATTR, BI, and HS have been used successfully in computational linguistics studies for patients with Alzheimer's disease [17, 29] and may prove to be similarly useful in our task.

Because we expect schizophrenia and bipolar patients to sometimes exhibit poverty of speech, we considered a few measures of lexical and syntactic complexity in our work.
Figure 1: A linear regression model was fit using the 25 selected semantic coherence and linguistic complexity features from the subject responses to predict the SSPA scores. Predicted vs. actual mean SSPA score (both scenes), with Sz/Sza, bipolar, and control subjects marked separately.

Lexical density, which quantifies the degree of information packaging in a given text, is defined as the proportion of content words (i.e., nouns, verbs, adjectives, adverbs) [30]. Typically, these words convey more information than function words, e.g., prepositions, conjunctions, interjections, etc. We make use of the Stanford tagger [31] to compute POS tags to determine the number of function words (FUNC) and total words (W) and measure FUNC/W, which represents an inverse of the lexical density. A related, more granular measure is the proportion of interjections (UH) to the total words, which is given by UH/W. The mean length of sentence (MLS) is another easily computed measure, which we expect to be lower for clinical subjects when compared with healthy controls. Finally, we considered parse tree statistics, computed using the Stanford Parser [32]. These include the parse tree height and Yngve depth scores (mean, total, and maximum), a measure of embedded clause usage [33].
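The lexical diversity and density measures above can be sketched directly from their definitions. This is an illustrative sketch only: the hard-coded function-word list stands in for the Stanford POS tagger used in the paper, and the window size for MATTR is an arbitrary default.

```python
import math
from collections import Counter

# Toy stand-in for POS-tag-based function-word detection (an assumption,
# not the paper's method, which uses the Stanford tagger).
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to", "is"}

def ttr(tokens):
    """Plain type-to-token ratio V/N (known to be length-sensitive)."""
    return len(set(tokens)) / len(tokens)

def mattr(tokens, window=50):
    """Moving-average TTR over a sliding window [26], reducing length dependence."""
    if len(tokens) <= window:
        return ttr(tokens)
    ratios = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

def brunet_index(tokens):
    """Brunet's index BI = N ** (V ** -0.165); smaller means more diverse (Eq. 1)."""
    n, v = len(tokens), len(set(tokens))
    return n ** (v ** -0.165)

def honore_statistic(tokens):
    """Honore's statistic HS = 100 * log(N) / (1 - V1/V) (Eq. 2), where V1 is
    the number of word types used exactly once. Undefined when V1 == V."""
    n, v = len(tokens), len(set(tokens))
    v1 = sum(1 for c in Counter(tokens).values() if c == 1)
    return 100 * math.log(n) / (1 - v1 / v)

def inverse_lexical_density(tokens, function_words=FUNCTION_WORDS):
    """FUNC/W: proportion of function words, an inverse of lexical density."""
    return sum(1 for t in tokens if t in function_words) / len(tokens)
```

For example, for the six-token text "the cat sat on the mat" there are five types, so TTR = 5/6, and four types occur exactly once, so HS = 100·log(6)/(1 − 4/5).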
4. Results & Discussion
We first sought to determine a subset of language features (described in Section 3) from which we can accurately model the clinical SSPA scores. A total of 64 features were considered: 54 semantic features (6 statistical features × 3 sentence embedding types × 3 scenes) and 10 linguistic complexity features computed over all three scenes concatenated. Next, we aim to determine the predictive power of the selected subset of these features in separating the groups of interest (i.e., Sz/Sza, bipolar I disorder, and healthy control subjects). The regression and classification models built with these features were designed and tested using WEKA [34]. It is important to note that the SSPA itself is correlated with the clinical diagnosis and has been effective in differentiating groups of interest [23]. As a result, we note that using it to select features may result in overly optimistic classification performance for the clinical vs. healthy control and Sz/Sza vs. bipolar disorder classification problems. However, due to the relative dearth of available data in this area, we performed this analysis on the same dataset.

Table 1: Selected features to model SSPA scores with a linear regression model, including ranking of overall importance for each feature. Italicized features were included in both the 25-feature and 15-feature classification problems.
Category              Features (in rank order)
Semantic coherence    BoW mean (scene 3); INF minimum (scene 3); SIF 90th percentile (scene 3); INF maximum (scene 2); INF median (scene 3); BoW median (scene 3); BoW minimum (scene 2); BoW st. dev. (scene 2); BoW maximum (scene 3); INF st. dev. (scene 3); BoW st. dev. (scene 3); BoW 90th percentile (scene 3); INF mean (scene 3); INF 10th percentile (scene 3); BoW 10th percentile (scene 2)
Lexical diversity     MATTR; Brunét's index
Lexical density       FUNC/W; UH/W
Syntactic complexity  Maximum Yngve depth
We use a greedy stepwise search (with linear regression) through the feature space to determine the optimal subset of features which accurately model the SSPA scores for all subjects without considering the group variable. We down-selected to a set of 25 computed features out of the original set. These are briefly summarized in Table 1, and the resulting regression model (evaluated using leave-one-out) is shown in Figure 1. We notice that several of the coherence statistics for Scene 3 (negotiation with landlord) are particularly influential when tracking the assigned SSPA score with this model. Interestingly, the top three coherence statistics include a bag-of-words average of word2vec vectors (BoW mean, scene 3), an InferSent sentence encoding (INF minimum, scene 3), and a SIF embedding (SIF 90th percentile, scene 3), indicating that a variety of embeddings and a range of statistics all provide useful information in predicting SSPA performance. We also note that a variety of lexical diversity (MATTR, Brunét's index), lexical density (FUNC/W, UH/W), and syntactic complexity (maximum Yngve depth) measures are among the most influential, confirming the benefit of a complementary set of language measures.

Next, we aim to determine the ability of this subset of language features to correctly predict which subjects fall into the groups of interest. We performed two separate classification tasks: (1) separation of the clinical and healthy control groups, and (2) separation within the clinical group between Sz/Sza subjects and bipolar I subjects.

Figure 2: Selected receiver operating characteristic (ROC) curves for both binary classification tasks (clinical vs. control: logistic regression with 25 features, AUC = 0.960; Sz/Sza vs. bipolar: naïve Bayes with 25 features, AUC = 0.826; random guess, AUC = 0.500). For clinical vs. control classification, TPR indicates correctly classifying a clinical subject and FPR indicates falsely classifying a control subject as clinical. For Sz/Sza vs. bipolar classification, TPR is correctly classifying an Sz/Sza subject and FPR is falsely classifying a bipolar subject as Sz/Sza.

Both a logistic regression (LR) and a naïve Bayes (NB) classifier were trained in each case using leave-one-out cross-validation to determine model parameters and performance. Then, we further down-selected this set to a group of 15 features and re-evaluated the performance of both classifiers.

The confusion matrices for the clinical and control group classification task are shown in Table 2a. As we can see, LR with all 25 selected features works best, with an area under the ROC curve (AUC) of 0.960 (see Figure 2). In this case, 78 of 87 (89.7%) clinical subjects and 19 of 22 (86.4%) healthy controls were correctly identified in our leave-one-out evaluation. We also see comparable performance for the NB and LR models when the feature set is reduced to only the top 15 features that model SSPA scores, though the AUC is lower than for both models with 25 features.

Next, we consider a classification problem within the group of clinical subjects, of which 43 are diagnosed with Sz/Sza and 44 are diagnosed with bipolar I disorder. We use the same feature subsets and the same binary classifier models as in the previous task, trained and evaluated using leave-one-out cross-validation. From the confusion matrices in Table 2b, we see that NB performs better than LR when either the 25-feature or 15-feature subset is used, with the best AUC = 0.826 for NB with 25 features. The ROC curve for the 25-feature NB classifier is shown in Figure 2. Interestingly, LR with 25 features had the lowest performance on this task (AUC = 0.700).

LR typically performs better than NB when more data is available for training [35]; however, in clinical applications data set size is often limited. This makes sense with respect to our study, as the dataset used in the Sz/Sza vs. bipolar I classification problem is smaller than the dataset used in the clinical vs. control group classification problem.
In this case, the LR model is prone to overfitting, as is evident from the fact that performance improves when the feature dimension is reduced. As expected, the classifier performance is considerably worse than in the clinical and control group classification problem, as the language differences between schizophrenia and bipolar patients are more difficult to distinguish, even for experienced clinicians. Considering this fact, we still see reasonable performance with only computed language measures and no additional clinical assessment.

Table 2: Confusion matrices for binary classification results with logistic regression (LR) and naïve Bayes (NB) classifiers with a 25-feature and a 15-feature subset. (a) For clinical vs. control classification, LR with 25 features works best at differentiating groups. (b) For Sz/Sza vs. bipolar classification, LR using the 25-feature subset works poorly; NB provides more consistent results, even when the feature set is reduced.

(a) Clinical vs. Control (columns: true group)

Log. Reg., 25 feat.:      Clinical  Control
  Predicted Clinical         78        3
  Predicted Control           9       19
  AUC = 0.960

Log. Reg., 15 feat.:      Clinical  Control
  Predicted Clinical         79       10
  Predicted Control           8       12
  AUC = 0.882

N. Bayes, 25 feat.:       Clinical  Control
  Predicted Clinical         73        2
  Predicted Control          14       20
  AUC = 0.908

N. Bayes, 15 feat.:       Clinical  Control
  Predicted Clinical         76        5
  Predicted Control          11       17
  AUC = 0.873

(b) Sz/Sza vs. Bipolar (columns: true group)

Log. Reg., 25 feat.:      Sz/Sza  Bipolar
  Predicted Sz/Sza           30      14
  Predicted Bipolar          13      30
  AUC = 0.700

Log. Reg., 15 feat.:      Sz/Sza  Bipolar
  Predicted Sz/Sza           30      10
  Predicted Bipolar          13      34
  AUC = 0.796

N. Bayes, 25 feat.:       Sz/Sza  Bipolar
  Predicted Sz/Sza           30      11
  Predicted Bipolar          13      33
  AUC = 0.826

N. Bayes, 15 feat.:       Sz/Sza  Bipolar
  Predicted Sz/Sza           31      11
  Predicted Bipolar          12      33
  AUC = 0.803
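The leave-one-out evaluation described above can be reproduced in outline with scikit-learn; this is a sketch under that assumption (the paper itself used WEKA [34]), and the synthetic two-group data below is purely illustrative, not the SSPA features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

def loo_auc(model, X, y):
    """AUC computed from leave-one-out predicted probabilities for a binary task."""
    probs = cross_val_predict(model, X, y, cv=LeaveOneOut(),
                              method="predict_proba")[:, 1]
    return roc_auc_score(y, probs)

if __name__ == "__main__":
    # Illustrative synthetic features for two groups (hypothetical data).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(1.5, 1.0, (20, 5))])
    y = np.array([0] * 20 + [1] * 20)
    print("LR AUC:", round(loo_auc(LogisticRegression(max_iter=1000), X, y), 3))
    print("NB AUC:", round(loo_auc(GaussianNB(), X, y), 3))
```

Because each leave-one-out fold predicts a single held-out subject, pooling the per-subject probabilities before computing AUC (as above) is the natural way to score this design.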
5. Conclusion
This paper demonstrates the potential of computational linguistics to aid neuropsychiatric practice in the clinic. We believe it is critically important to tie computational methods to established clinical practice in order to bridge the gap between the latest developments in NLP and the clinic, which motivated our feature selection using the SSPA. Still, there are many directions in which we can take future work. The sentence embedding and coherence metrics computed in this study are by no means an exhaustive list of potential methods, and it is likely that a more optimal, easily computable feature set exists to model SSPA performance and classify groups of interest. In particular, we are interested in finding a more concise group of clinically relevant language features with which we can perform this analysis. Additionally, we can look at more language metrics within each subject group to further subtype and cluster individuals based on language metrics. These methods can also be applied to clinical assessments beyond the SSPA task and to a wider variety of psychiatric conditions. Lastly, we would like to examine how classification and modeling of clinical test scores change when computed features are used in conjunction with other clinical tests to model task performance and classify groups.

6. Acknowledgment
This work was partially funded by a grant from Boehringer Ingelheim International GmbH to ASU (PI: Berisha).
7. References

[1] Center for Behavioral Health Statistics and Quality, "2016 National Survey on Drug Use and Health: Methodological summary and definitions," Substance Abuse and Mental Health Services Administration, Rockville, MD, 2017.
[2] P. R. Desai, K. A. Lawson, J. C. Barner, and K. L. Rascati, "Estimating the direct and indirect costs for community-dwelling patients with schizophrenia: Schizophrenia-related costs for community-dwellers," Journal of Pharmaceutical Health Services Research, vol. 4, no. 4, pp. 187–194, Dec. 2013.
[3] A. C. Altamura and J. M. Goikolea, "Differential diagnoses and management strategies in patients with schizophrenia and bipolar disorder," Neuropsychiatric Disease and Treatment, vol. 4, no. 1, pp. 311–317, Feb. 2008.
[4] American Psychiatric Association, Diagnostic and Statistical Manual of Mental Disorders: DSM-5, 5th ed. Arlington, VA: American Psychiatric Publishing, 2013.
[5] P. R. Montague, R. J. Dolan, K. J. Friston, and P. Dayan, "Computational psychiatry," Trends in Cognitive Sciences, vol. 16, no. 1, pp. 72–80, Jan. 2012.
[6] G. A. Cecchi, V. Gurev, S. J. Heisig, R. Norel, I. Rish, and S. R. Schrecke, "Computing the structure of language for neuropsychiatric evaluation," IBM Journal of Research and Development, vol. 61, no. 2/3, pp. 1:1–1:10, Mar. 2017.
[7] B. Elvevåg, P. W. Foltz, D. R. Weinberger, and T. E. Goldberg, "Quantifying incoherence in speech: An automated methodology and novel application to schizophrenia," Schizophrenia Research, vol. 93, no. 1–3, pp. 304–316, Jul. 2007.
[8] G. Bedi, F. Carrillo, G. A. Cecchi, D. F. Slezak, M. Sigman, N. B. Mota, S. Ribeiro, D. C. Javitt, M. Copelli, and C. M. Corcoran, "Automated analysis of free speech predicts psychosis onset in high-risk youths," npj Schizophrenia, vol. 1, p. 15030, 2015.
[9] C. M. Corcoran, F. Carrillo, D. Fernández-Slezak, G. Bedi, C. Klim, D. C. Javitt, C. E. Bearden, and G. A. Cecchi, "Prediction of psychosis across protocols and risk cohorts using automated language analysis," World Psychiatry, vol. 17, no. 1, pp. 67–75, Feb. 2018.
[10] D. Iter, J. Yoon, and D. Jurafsky, "Automatic detection of incoherent speech for diagnosing schizophrenia," in Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, 2018, pp. 136–146.
[11] I. Sekulic, M. Gjurković, and J. Šnajder, "Not just depressed: Bipolar disorder prediction on Reddit," in Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Association for Computational Linguistics, Oct. 2018, pp. 72–78.
[12] N. B. Mota, M. Copelli, and S. Ribeiro, "Thought disorder measured as random speech structure classifies negative symptoms and schizophrenia diagnosis 6 months in advance," npj Schizophrenia, vol. 3, no. 1, Dec. 2017.
[13] T. K. Landauer and S. T. Dumais, "A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge," Psychological Review, vol. 104, no. 2, pp. 211–240, 1997.
[14] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
[15] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
[16] Y. R. Tausczik and J. W. Pennebaker, "The psychological meaning of words: LIWC and computerized text analysis methods," Journal of Language and Social Psychology, vol. 29, no. 1, pp. 24–54, 2010.
[17] K. C. Fraser, J. A. Meltzer, and F. Rudzicz, "Linguistic features identify Alzheimer's disease in narrative speech," Journal of Alzheimer's Disease, vol. 49, no. 2, pp. 407–422, Oct. 2015.
[18] R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Ng, and C. Potts, "Recursive deep models for semantic compositionality over a sentiment treebank," in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013, pp. 1631–1642.
[19] E. S. Kayi, M. Diab, L. Pauselli, M. Compton, and G. Coppersmith, "Predictive linguistic features of schizophrenia," in Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), 2017, pp. 241–250.
[20] M. Mitchell, K. Hollingshead, and G. Coppersmith, "Quantifying the language of schizophrenia in social media," in Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2015, pp. 11–20.
[21] T. L. Patterson, S. Moscona, C. L. McKibbin, K. Davidson, and D. V. Jeste, "Social skills performance assessment among older patients with schizophrenia," Schizophrenia Research, vol. 48, no. 2–3, pp. 351–360, Mar. 2001.
[22] M. Tu, V. Berisha, and J. Liss, "Interpretable objective assessment of dysarthric speech based on deep neural networks," in Interspeech 2017. ISCA, Aug. 2017, pp. 1849–1853.
[23] C. R. Bowie, C. Depp, J. A. McGrath, P. Wolyniec, B. T. Mausbach, M. H. Thornquist, J. Luke, T. L. Patterson, P. D. Harvey, and A. E. Pulver, "Prediction of real-world functional disability in chronic mental disorders: A comparison of schizophrenia and bipolar disorder," American Journal of Psychiatry, vol. 167, no. 9, pp. 1116–1124, 2010.
[24] S. Arora, Y. Liang, and T. Ma, "A simple but tough-to-beat baseline for sentence embeddings," in Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 2017.
[25] A. Conneau, D. Kiela, H. Schwenk, L. Barrault, and A. Bordes, "Supervised learning of universal sentence representations from natural language inference data," arXiv:1705.02364 [cs], May 2017.
[26] M. A. Covington and J. D. McFall, "Cutting the Gordian knot: The moving-average type–token ratio (MATTR)," Journal of Quantitative Linguistics, vol. 17, no. 2, pp. 94–100, May 2010.
[27] E. Brunét, Le Vocabulaire de Jean Giraudoux. Structure et Évolution. Slatkine, 1978, no. 1.
[28] A. Honoré, "Some simple measures of richness of vocabulary," Association for Literary and Linguistic Computing Bulletin, vol. 7, no. 2, pp. 172–177, 1979.
[29] R. S. Bucks, S. Singh, J. M. Cuerden, and G. K. Wilcock, "Analysis of spontaneous, conversational speech in dementia of Alzheimer type: Evaluation of an objective technique for analysing lexical performance," Aphasiology, vol. 14, no. 1, pp. 71–91, Jan. 2000.
[30] V. Johansson, "Lexical diversity and lexical density in speech and writing: A developmental perspective," Working Papers in Linguistics, vol. 53, pp. 61–79, 2009.
[31] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, "Feature-rich part-of-speech tagging with a cyclic dependency network," in Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (NAACL '03), vol. 1. Edmonton, Canada: Association for Computational Linguistics, 2003, pp. 173–180.
[32] R. Socher, J. Bauer, C. D. Manning, and A. Y. Ng, "Parsing with compositional vector grammars," in Proceedings of the ACL Conference, 2013.
[33] V. H. Yngve, "A model and an hypothesis for language structure," Proceedings of the American Philosophical Society, vol. 104, no. 5, 1960.
[34] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: An update," ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.
[35] A. Y. Ng and M. I. Jordan, "On discriminative vs. generative classifiers: A comparison of logistic regression and naïve Bayes," in Advances in Neural Information Processing Systems, 2002.