Publication


Featured research published by H. Andrew Schwartz.


Journal of Personality and Social Psychology | 2015

Automatic personality assessment through social media language.

Gregory Park; H. Andrew Schwartz; Johannes C. Eichstaedt; Margaret L. Kern; Michal Kosinski; David Stillwell; Lyle H. Ungar; Martin E. P. Seligman

Language use is a psychologically rich, stable individual difference with well-established correlations to personality. We describe a method for assessing personality using an open-vocabulary analysis of language from social media. We compiled the written language from 66,732 Facebook users and their questionnaire-based self-reported Big Five personality traits, and then we built a predictive model of personality based on their language. We used this model to predict the 5 personality factors in a separate sample of 4,824 Facebook users, examining (a) convergence with self-reports of personality at the domain- and facet-level; (b) discriminant validity between predictions of distinct traits; (c) agreement with informant reports of personality; (d) patterns of correlations with external criteria (e.g., number of friends, political attitudes, impulsiveness); and (e) test-retest reliability over 6-month intervals. Results indicated that language-based assessments can constitute valid personality measures: they agreed with self-reports and informant reports of personality, added incremental validity over informant reports, adequately discriminated between traits, exhibited patterns of correlations with external criteria similar to those found with self-reported personality, and were stable over 6-month intervals. Analysis of predictive language can provide rich portraits of the mental life associated with traits. This approach can complement and extend traditional methods, providing researchers with an additional measure that can quickly and cheaply assess large groups of participants with minimal burden.
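The core pipeline described above (language features from posts, then a regularized linear model predicting trait scores) can be sketched minimally. This is an illustration under stated assumptions, not the authors' actual open-vocabulary system: `featurize` and the tiny vocabulary are hypothetical, and the real model used far richer features and proper cross-validation.

```python
import numpy as np

def featurize(texts, vocab):
    """Bag-of-words relative-frequency matrix: one row per text, one column per vocab word."""
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for i, text in enumerate(texts):
        tokens = text.lower().split()
        for t in tokens:
            if t in index:
                X[i, index[t]] += 1.0
        if tokens:
            X[i] /= len(tokens)  # counts -> relative frequencies
    return X

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X'X + alpha*I)^(-1) X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

# Toy example: predict an extraversion-like score from word use.
train_texts = ["party tonight friends", "quiet book evening"]
train_scores = np.array([1.0, -1.0])
vocab = ["party", "friends", "quiet", "book"]
w = fit_ridge(featurize(train_texts, vocab), train_scores, alpha=0.1)
pred = featurize(["party with friends"], vocab) @ w
```

Applying the fitted weights to held-out users' language, as in the paper's separate sample of 4,824 users, is just another `featurize(...) @ w` step.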


Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality | 2014

Towards Assessing Changes in Degree of Depression through Facebook

H. Andrew Schwartz; Johannes C. Eichstaedt; Margaret L. Kern; Gregory Park; Maarten Sap; David Stillwell; Michal Kosinski; Lyle H. Ungar

Depression is typically diagnosed as being present or absent. However, depression severity is believed to be continuously distributed rather than dichotomous. Severity may vary for a given patient daily and seasonally as a function of many variables ranging from life events to environmental factors. Repeated population-scale assessment of depression through questionnaires is expensive. In this paper we use survey responses and status updates from 28,749 Facebook users to develop a regression model that predicts users’ degree of depression based on their Facebook status updates. Our user-level predictive accuracy is modest, significantly outperforming a baseline of average user sentiment. We use our model to estimate user changes in depression across seasons, and find, consistent with literature, users’ degree of depression most often increases from summer to winter. We then show the potential to study factors driving individuals’ level of depression by looking at its most highly correlated language features.


Assessment | 2014

The Online Social Self: An Open Vocabulary Approach to Personality

Margaret L. Kern; Johannes C. Eichstaedt; H. Andrew Schwartz; Lukasz Dziurzynski; Lyle H. Ungar; David Stillwell; Michal Kosinski; Stephanie M. Ramones; Martin E. P. Seligman

Objective: We present a new open language analysis approach that identifies and visually summarizes the dominant naturally occurring words and phrases that most distinguished each Big Five personality trait. Method: Using millions of posts from 69,792 Facebook users, we examined the correlation of personality traits with online word usage. Our analysis method consists of feature extraction, correlational analysis, and visualization. Results: The distinguishing words and phrases were face valid and provide insight into processes that underlie the Big Five traits. Conclusion: Open-ended data driven exploration of large datasets combined with established psychological theory and measures offers new tools to further understand the human psyche.
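The feature-extraction and correlational-analysis steps of the method can be sketched as follows. `word_trait_correlations` is a hypothetical helper, not the authors' pipeline; a real analysis at this scale would also correct for multiple comparisons across the vocabulary.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # no variance: correlation undefined, report 0 in this sketch
    return cov / (sx * sy)

def word_trait_correlations(user_tokens, trait_scores, vocab):
    """Correlate each word's per-user relative frequency with a per-user trait score."""
    return {
        word: pearson_r(
            [tokens.count(word) / len(tokens) for tokens in user_tokens],
            trait_scores,
        )
        for word in vocab
    }
```

Sorting the resulting dictionary by correlation strength yields the most distinguishing words for a trait, which is the input the visualization step summarizes.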


The Annals of the American Academy of Political and Social Science | 2015

Data-Driven Content Analysis of Social Media: A Systematic Overview of Automated Methods

H. Andrew Schwartz; Lyle H. Ungar

Researchers have long measured people’s thoughts, feelings, and personalities using carefully designed survey questions, which are often given to a relatively small number of volunteers. The proliferation of social media, such as Twitter and Facebook, offers alternative measurement approaches: automatic content coding at unprecedented scales and the statistical power to do open-vocabulary exploratory analysis. We describe a range of automatic and partially automatic content analysis techniques and illustrate how their use on social media generates insights into subjective well-being, health, gender differences, and personality.


Developmental Psychology | 2014

From "Sooo Excited!!!" to "So Proud": Using Language to Study Development.

Margaret L. Kern; Johannes C. Eichstaedt; H. Andrew Schwartz; Gregory Park; Lyle H. Ungar; David Stillwell; Michal Kosinski; Lukasz Dziurzynski; Martin E. P. Seligman

We introduce a new method, differential language analysis (DLA), for studying human development in which computational linguistic techniques are used to analyze the big data available through online social media in light of psychological theory. Our open-vocabulary DLA approach finds words, phrases, and topics that distinguish groups of people based on 1 or more characteristics. Using a data set of over 70,000 Facebook users, we identify how word and topic use vary as a function of age and compile cohort-specific words and phrases into visual summaries that are face valid and intuitively meaningful. We demonstrate how this methodology can be used to test developmental hypotheses, using the aging positivity effect (Carstensen & Mikels, 2005) as an example. While in this study we focused primarily on common trends across age-related cohorts, the same methodology can be used to explore heterogeneity within developmental stages or to explore other characteristics that differentiate groups of people. Our comprehensive list of words and topics is available on our web site for deeper exploration by the research community.


North American Chapter of the Association for Computational Linguistics | 2015

The role of personality, age, and gender in tweeting about mental illness

Daniel Preoţiuc-Pietro; Johannes C. Eichstaedt; Gregory Park; Maarten Sap; Laura Smith; Victoria Tobolsky; H. Andrew Schwartz; Lyle H. Ungar

Mental illnesses, such as depression and post-traumatic stress disorder (PTSD), are highly underdiagnosed globally. Populations sharing similar demographics and personality traits are known to be more at risk than others. In this study, we characterise the language use of users disclosing their mental illness on Twitter. Language-derived personality and demographic estimates show surprisingly strong performance in distinguishing users who tweet a diagnosis of depression or PTSD from random controls, reaching an area under the receiver operating characteristic curve (AUC) of around .8 in all our binary classification tasks. In fact, when distinguishing users disclosing depression from those disclosing PTSD, the single feature of estimated age performs nearly as well (AUC = .806) as thousands of topics (AUC = .819) or tens of thousands of n-grams (AUC = .812). We also find that differential language analyses, controlled for demographics, recover many symptoms associated with these mental illnesses in the clinical literature.
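The AUC figures reported above have a simple probabilistic reading: the chance that a randomly chosen positive user is scored higher than a randomly chosen control. A minimal sketch using the Mann-Whitney formulation (quadratic in sample size, so for illustration only, not the paper's evaluation code):

```python
def auc(pos_scores, neg_scores):
    """AUC as P(random positive outranks random negative), with ties counting half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# A single feature, such as estimated age, can be passed directly as the score,
# which is how a one-feature "classifier" gets an AUC comparable to a full model.
```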


Journal of Medical Internet Research | 2016

Seeing the “Big” Picture: Big Data Methods for Exploring Relationships Between Usage, Language, and Outcome in Internet Intervention Data

Jordan Carpenter; Patrick Crutchley; Ran D Zilca; H. Andrew Schwartz; Laura Smith; Angela M Cobb; Acacia Parks

Background: Assessing the efficacy of Internet interventions that are already in the market introduces both challenges and opportunities. While vast, often unprecedented amounts of data may be available (hundreds of thousands, and sometimes millions, of participants with high dimensions of assessed variables), the data are observational in nature, are partly unstructured (eg, free text, images, sensor data), do not include a natural control group to be used for comparison, and typically exhibit high attrition rates. New approaches are therefore needed to use these existing data and derive new insights that can augment traditional smaller-group randomized controlled trials.

Objective: Our objective was to demonstrate how emerging big data approaches can help explore questions about the effectiveness and process of an Internet well-being intervention.

Methods: We drew data from the user base of a well-being website and app called Happify. To explore effectiveness, multilevel models focusing on within-person variation explored whether greater usage predicted higher well-being in a sample of 152,747 users. In addition, to explore the underlying processes that accompany improvement, we analyzed language for 10,818 users who had a sufficient volume of free-text responses and timespan of platform usage. A topic model constructed from this free text provided language-based correlates of individual user improvement in outcome measures, providing insights into the beneficial underlying processes experienced by users.

Results: On a measure of positive emotion, the average user improved 1.38 points per week (SE 0.01, t122,455=113.60, P<.001, 95% CI 1.36–1.41), about a 27% increase over 8 weeks. Within a given individual user, more usage predicted more positive emotion and less usage predicted less positive emotion (estimate 0.09, SE 0.01, t6047=9.15, P=.001, 95% CI .07–.12). This estimate predicted that a given user would report positive emotion 1.26 points higher after a 2-week period when they used Happify daily than during a week when they didn’t use it at all. Among highly engaged users, 200 automatically clustered topics showed a significant (corrected P<.001) effect on change in well-being over time, illustrating which topics may be more beneficial than others when engaging with the interventions. In particular, topics related to addressing negative thoughts and feelings were correlated with improvement over time.

Conclusions: Using observational analyses on naturalistic big data, we can explore the relationship between usage and well-being among people using an Internet well-being intervention and provide new insights into the underlying mechanisms that accompany it. By leveraging big data to power these new types of analyses, we can explore the workings of an intervention from new angles and harness the insights that surface to feed back into the intervention and improve it further in the future.
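The within-person logic behind those multilevel estimates can be sketched as person-mean centering followed by a pooled slope. This is an illustration of the idea only, not the authors' model, which also handles the nesting of observations in users and the passage of time.

```python
def within_person_center(series_by_user):
    """Subtract each user's own mean so the remaining variation is within-person."""
    return {
        user: [v - sum(values) / len(values) for v in values]
        for user, values in series_by_user.items()
    }

def pooled_slope(xs, ys):
    """OLS slope through the origin for already-centered data: sum(xy) / sum(x^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Toy example: weekly app usage and positive-emotion scores for two users.
usage = within_person_center({"a": [0, 2, 4], "b": [1, 1, 4]})
emotion = within_person_center({"a": [3, 4, 5], "b": [2, 2, 3]})
x = [v for u in usage for v in usage[u]]
y = [v for u in emotion for v in emotion[u]]
slope = pooled_slope(x, y)  # positive: weeks above one's own usage norm pair with more positive emotion
```

Centering on each user's own mean is what lets the estimate speak to "a given user using the app more than usual" rather than to differences between heavy and light users.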


North American Chapter of the Association for Computational Linguistics | 2015

Mental Illness Detection at the World Well-Being Project for the CLPsych 2015 Shared Task

Daniel Preoţiuc-Pietro; Maarten Sap; H. Andrew Schwartz; Lyle H. Ungar

This article is a system description and report on the submission of the World Well-Being Project from the University of Pennsylvania to the CLPsych 2015 shared task. The goal of the shared task was to automatically identify Twitter users who self-reported having one of two mental illnesses: post-traumatic stress disorder (PTSD) and depression. Our system employs user metadata and textual features derived from Twitter posts. To reduce the feature space and avoid data sparsity, we consider several word clustering approaches. We explore the use of linear classifiers based on different feature sets, as well as their combination using a linear ensemble. This method is agnostic of illness-specific features, such as lists of medicines, making it readily applicable in other scenarios. Our approach ranked second in all tasks on average precision and showed the best results at .1 false-positive rates.


Psychological Science | 2017

Birds of a Feather Do Flock Together

Wu Youyou; David Stillwell; H. Andrew Schwartz; Michal Kosinski

Friends and spouses tend to be similar in a broad range of characteristics, such as age, educational level, race, religion, attitudes, and general intelligence. Surprisingly, little evidence has been found for similarity in personality—one of the most fundamental psychological constructs. We argue that the lack of evidence for personality similarity stems from the tendency of individuals to make personality judgments relative to a salient comparison group, rather than in absolute terms (i.e., the reference-group effect), when responding to the self-report and peer-report questionnaires commonly used in personality research. We employed two behavior-based personality measures to circumvent the reference-group effect. The results based on large samples provide evidence for personality similarity between romantic partners (n = 1,101; rs = .20–.47) and between friends (n = 46,483; rs = .12–.31). We discuss the practical and methodological implications of the findings.


North American Chapter of the Association for Computational Linguistics | 2016

Modelling Valence and Arousal in Facebook posts

Daniel Preoţiuc-Pietro; H. Andrew Schwartz; Gregory Park; Johannes C. Eichstaedt; Margaret L. Kern; Lyle H. Ungar; Elisabeth Shulman

Access to expressions of subjective, personal experience has increased with the popularity of social media. However, most work in sentiment analysis focuses on predicting only valence from text, and is usually targeted at products rather than affective states. In this paper, we introduce a new data set of 2,895 social media posts rated by two psychologically trained annotators on two separate ordinal nine-point scales. These scales represent valence (or sentiment) and arousal (or intensity), which together define each post’s position on the circumplex model of affect, a well-established system for describing emotional states (Russell, 1980; Posner et al., 2005). The data set is used to train prediction models for each of the two dimensions from text, which achieve high predictive accuracy, correlated at r = .65 with valence and r = .85 with arousal annotations. Our data set offers a building block for a deeper study of personal affect as expressed in social media, with applications such as mental illness detection and automated large-scale psychological studies.
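Given the two predicted dimensions, a post's rough position on the circumplex can be read off by quadrant. The sketch below follows the standard circumplex description and the paper's nine-point scales; treating 5 as the midpoint is an assumption of this illustration, not something the paper specifies.

```python
def circumplex_quadrant(valence, arousal, midpoint=5.0):
    """Map 1-9 valence/arousal ratings to a coarse circumplex-of-affect quadrant."""
    positive = valence >= midpoint
    activated = arousal >= midpoint
    if positive and activated:
        return "high-arousal positive (e.g., excited)"
    if positive:
        return "low-arousal positive (e.g., calm)"
    if activated:
        return "high-arousal negative (e.g., distressed)"
    return "low-arousal negative (e.g., depressed)"
```

This is why modeling arousal alongside valence matters: a valence-only sentiment model cannot separate, say, calm contentment from excitement.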

Collaboration


Top co-authors of H. Andrew Schwartz:

Lyle H. Ungar, University of Pennsylvania
Gregory Park, University of Pennsylvania
Maarten Sap, University of Pennsylvania
Anneke Buffone, University of Pennsylvania
Laura Smith, University of Pennsylvania