
Publication


Featured research published by Maarten Sap.


Psychological Science | 2015

Psychological Language on Twitter Predicts County-Level Heart Disease Mortality

Johannes C. Eichstaedt; Hansen Andrew Schwartz; Margaret L. Kern; Gregory Park; Darwin R. Labarthe; Raina M. Merchant; Sneha Jha; Megha Agrawal; Lukasz Dziurzynski; Maarten Sap; Christopher Weeg; Emily E. Larson; Lyle H. Ungar; Martin E. P. Seligman

Hostility and chronic stress are known risk factors for heart disease, but they are costly to assess on a large scale. We used language expressed on Twitter to characterize community-level psychological correlates of age-adjusted mortality from atherosclerotic heart disease (AHD). Language patterns reflecting negative social relationships, disengagement, and negative emotions—especially anger—emerged as risk factors; positive emotions and psychological engagement emerged as protective factors. Most correlations remained significant after controlling for income and education. A cross-sectional regression model based only on Twitter language predicted AHD mortality significantly better than did a model that combined 10 common demographic, socioeconomic, and health risk factors, including smoking, diabetes, hypertension, and obesity. Capturing community psychological characteristics through social media is feasible, and these characteristics are strong markers of cardiovascular mortality at the community level.


Empirical Methods in Natural Language Processing | 2014

Developing Age and Gender Predictive Lexica over Social Media

Maarten Sap; Gregory Park; Johannes C. Eichstaedt; Margaret L. Kern; David Stillwell; Michal Kosinski; Lyle H. Ungar; Hansen Andrew Schwartz

Demographic lexica have potential for widespread use in social science, economics, and business applications. We derive predictive lexica (words and weights) for age and gender using regression and classification models from word usage in Facebook, blog, and Twitter data with associated demographic labels. The lexica, made publicly available, achieved state-of-the-art accuracy in language-based age and gender prediction over Facebook and Twitter, and were evaluated for generalization across social media genres as well as in limited message situations.
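A predictive lexicon in this "words and weights" format is essentially a table of regression weights plus an intercept applied to relative word frequencies. A minimal sketch of applying one, with invented entries rather than the released weights:

```python
from collections import Counter

# Hypothetical lexicon entries and intercept, for illustration only;
# the released lexica contain thousands of weighted terms.
AGE_LEXICON = {"homework": -2.1, "mortgage": 3.4, "lol": -1.5, "grandkids": 4.8}
INTERCEPT = 23.0

def estimate_age(text: str) -> float:
    # score = intercept + sum over lexicon words of weight * relative frequency
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return INTERCEPT + sum(
        weight * counts.get(word, 0) / total
        for word, weight in AGE_LEXICON.items()
    )

print(round(estimate_age("lol so much homework tonight"), 2))  # → 22.28
```

Because the lexicon is just a lookup table, scoring scales to millions of users with a single pass over their word counts.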


Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality | 2014

Towards Assessing Changes in Degree of Depression through Facebook

H. Andrew Schwartz; Johannes C. Eichstaedt; Margaret L. Kern; Gregory Park; Maarten Sap; David Stillwell; Michal Kosinski; Lyle H. Ungar

Depression is typically diagnosed as being present or absent. However, depression severity is believed to be continuously distributed rather than dichotomous. Severity may vary for a given patient daily and seasonally as a function of many variables ranging from life events to environmental factors. Repeated population-scale assessment of depression through questionnaires is expensive. In this paper we use survey responses and status updates from 28,749 Facebook users to develop a regression model that predicts users’ degree of depression based on their Facebook status updates. Our user-level predictive accuracy is modest, significantly outperforming a baseline of average user sentiment. We use our model to estimate user changes in depression across seasons, and find, consistent with literature, users’ degree of depression most often increases from summer to winter. We then show the potential to study factors driving individuals’ level of depression by looking at its most highly correlated language features.


North American Chapter of the Association for Computational Linguistics | 2015

The role of personality, age, and gender in tweeting about mental illness

Daniel Preoţiuc-Pietro; Johannes C. Eichstaedt; Gregory Park; Maarten Sap; Laura Smith; Victoria Tobolsky; H. Andrew Schwartz; Lyle H. Ungar

Mental illnesses, such as depression and post-traumatic stress disorder (PTSD), are highly underdiagnosed globally. Populations sharing similar demographics and personality traits are known to be more at risk than others. In this study, we characterise the language use of users disclosing their mental illness on Twitter. Language-derived personality and demographic estimates show surprisingly strong performance in distinguishing users that tweet a diagnosis of depression or PTSD from random controls, reaching an area under the receiver operating characteristic curve (AUC) of around .8 in all our binary classification tasks. In fact, when distinguishing users disclosing depression from those disclosing PTSD, the single feature of estimated age shows nearly as strong performance (AUC = .806) as using thousands of topics (AUC = .819) or tens of thousands of n-grams (AUC = .812). We also find that differential language analyses, controlled for demographics, recover many symptoms associated with the mental illnesses in the clinical literature.
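The AUC figures above can be read as pairwise ranking accuracy: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. A small self-contained sketch of that computation, with toy scores rather than the paper's data:

```python
def auc(scores_pos, scores_neg):
    # AUC as the fraction of (positive, negative) pairs in which the
    # positive example scores higher, counting ties as half a win
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy scores for a single scalar feature (e.g. an estimated-age feature)
print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))  # ≈ 0.89
```

This is why a single well-chosen feature can approach the AUC of thousands of features: AUC measures ranking quality only, not how the score was produced.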


Journal of Medical Internet Research | 2015

Twitter sentiment predicts Affordable Care Act marketplace enrollment.

Charlene A. Wong; Maarten Sap; Andrew Schwartz; Robert J. Town; Tom Baker; Lyle H. Ungar; Raina M. Merchant

Background: Traditional metrics of the impact of the Affordable Care Act (ACA) and health insurance marketplaces in the United States include public opinion polls and marketplace enrollment, which are published with a lag of weeks to months. In this rapidly changing environment, a real-time barometer of public opinion with a mechanism to identify emerging issues would be valuable. Objective: We sought to evaluate Twitter’s role as a real-time barometer of public sentiment on the ACA and to determine if Twitter sentiment (the positivity or negativity of tweets) could be predictive of state-level marketplace enrollment. Methods: We retrospectively collected 977,303 ACA-related tweets in March 2014 and then tested a correlation of Twitter sentiment with marketplace enrollment by state. Results: A 0.10 increase in the sentiment score was associated with an 8.7% increase in enrollment at the state level (95% CI 1.32-16.13; P=.02), a correlation that remained significant when adjusting for state Medicaid expansion (P=.02) or use of a state-based marketplace (P=.03). Conclusions: This correlation indicates Twitter’s potential as a real-time monitoring strategy for future marketplace enrollment periods; marketplaces could systematically track Twitter sentiment to more rapidly identify enrollment changes and potentially emerging issues. Drawing on a repository of free and accessible consumer-generated opinions, this study reveals a novel role for Twitter in the health policy landscape.
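As a back-of-the-envelope reading of the reported effect size (assuming the association scales linearly across sentiment changes, which the abstract does not state):

```python
# Reported association: an 8.7% enrollment increase per 0.10 gain in
# the state-level sentiment score (95% CI 1.32-16.13).
EFFECT_PER_TENTH = 8.7

def enrollment_change_pct(delta_sentiment: float) -> float:
    # Linear extrapolation of the reported state-level association;
    # illustrative only, not the study's regression model.
    return EFFECT_PER_TENTH * (delta_sentiment / 0.10)

print(enrollment_change_pct(0.05))  # a 0.05 sentiment rise → 4.35 (%)
```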


Pacific Symposium on Biocomputing | 2016

Predicting Individual Well-Being Through the Language of Social Media.

Hansen Andrew Schwartz; Maarten Sap; Margaret L. Kern; Johannes C. Eichstaedt; Adam Kapelner; Megha Agrawal; Eduardo Blanco; Lukasz Dziurzynski; Gregory Park; David Stillwell; Michal Kosinski; Martin E. P. Seligman; Lyle H. Ungar

We present the task of predicting individual well-being, as measured by a life satisfaction scale, through the language people use on social media. Well-being, which encompasses much more than emotion and mood, is linked with good mental and physical health. The ability to quickly and accurately assess it can supplement multi-million dollar national surveys as well as promote whole body health. Through crowd-sourced ratings of tweets and Facebook status updates, we create message-level predictive models for multiple components of well-being. However, well-being is ultimately attributed to people, so we perform an additional evaluation at the user level, finding that a multi-level cascaded model, using both message-level predictions and user-level features, performs best and outperforms popular lexicon-based happiness models. Finally, we suggest that analyses of language go beyond prediction by identifying the language that characterizes well-being.


North American Chapter of the Association for Computational Linguistics | 2015

Mental Illness Detection at the World Well-Being Project for the CLPsych 2015 Shared Task

Daniel Preoţiuc-Pietro; Maarten Sap; H. Andrew Schwartz; Lyle H. Ungar

This article is a system description and report on the submission of the World Well-Being Project from the University of Pennsylvania to the CLPsych 2015 shared task. The goal of the shared task was to automatically determine Twitter users who self-reported having one of two mental illnesses: post-traumatic stress disorder (PTSD) and depression. Our system employs user metadata and textual features derived from Twitter posts. To reduce the feature space and avoid data sparsity, we consider several word clustering approaches. We explore the use of linear classifiers based on different feature sets, as well as their combination in a linear ensemble. This method is agnostic of illness-specific features, such as lists of medicines, thus making it readily applicable in other scenarios. Our approach ranked second in all tasks on average precision and showed the best results at a .1 false positive rate.


Conference on Computational Natural Language Learning | 2017

The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task

Roy Schwartz; Maarten Sap; Ioannis Konstas; Leila Zilles; Yejin Choi; Noah A. Smith

A writer's style depends not just on personal traits but also on her intent and mental state. In this paper, we show how variants of the same writing task can lead to measurable differences in writing style. We present a case study based on the story cloze task (Mostafazadeh et al., 2016a), where annotators were assigned similar writing tasks with different constraints: (1) writing an entire story, (2) adding a story ending for a given story context, and (3) adding an incoherent ending to a story. We show that a simple linear classifier informed by stylistic features is able to successfully distinguish among the three cases, without even looking at the story context. In addition, combining our stylistic features with language model predictions reaches state-of-the-art performance on the story cloze challenge. Our results demonstrate that different task framings can dramatically affect the way people write.


Journal of Personality | 2017

Living in the Past, Present, and Future: Measuring Temporal Orientation with Language.

Gregory Park; H. Andrew Schwartz; Maarten Sap; Margaret L. Kern; Evan Weingarten; Johannes C. Eichstaedt; Jonah Berger; David Stillwell; Michal Kosinski; Lyle H. Ungar; Martin E. P. Seligman

Temporal orientation refers to individual differences in the relative emphasis one places on the past, present, or future, and it is related to academic, financial, and health outcomes. We propose and evaluate a method for automatically measuring temporal orientation through language expressed on social media. Judges rated the temporal orientation of 4,302 social media messages. We trained a classifier based on these ratings, which could accurately predict the temporal orientation of new messages in a separate validation set (accuracy/mean sensitivity = .72; mean specificity = .77). We used the classifier to automatically classify 1.3 million messages written by 5,372 participants (50% female; ages 13-48). Finally, we tested whether individual differences in past, present, and future orientation differentially related to gender, age, Big Five personality, satisfaction with life, and depressive symptoms. Temporal orientations exhibit several expected correlations with age, gender, and Big Five personality. More future-oriented people were older, more likely to be female, more conscientious, less impulsive, less depressed, and more satisfied with life; present orientation showed the opposite pattern. Language-based assessments can complement and extend existing measures of temporal orientation, providing an alternative approach and additional insights into language and personality relationships.


North American Chapter of the Association for Computational Linguistics | 2015

Extracting Human Temporal Orientation from Facebook Language.

H. Andrew Schwartz; Gregory Park; Maarten Sap; Evan Weingarten; Johannes C. Eichstaedt; Margaret L. Kern; David Stillwell; Michal Kosinski; Jonah Berger; Martin E. P. Seligman; Lyle H. Ungar

People vary widely in their temporal orientation—how often they emphasize the past, present, and future—and this affects their finances, health, and happiness. Traditionally, temporal orientation has been assessed by self-report questionnaires. In this paper, we develop a novel behavior-based assessment using human language on Facebook. We first create a past, present, and future message classifier, engineering features and evaluating a variety of classification techniques. Our message classifier achieves an accuracy of 71.8%, compared with 52.8% from the most frequent class and 58.6% from a model based entirely on time expression features. We quantify a user's overall temporal orientation based on their distribution of messages and validate it against known human correlates: conscientiousness, age, and gender. We then explore social scientific questions, finding novel associations with the factors openness to experience, satisfaction with life, depression, IQ, and one’s number of friends. Further, demonstrating how one can track orientation over time, we find differences in future orientation around birthdays.
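A toy rule-based sketch in the spirit of the time-expression baseline the paper compares against (the cue lists below are invented for illustration; the paper's classifier uses richer engineered features and supervised learning):

```python
# Invented cue word lists, illustrative only
PAST_CUES = {"yesterday", "ago", "was", "were", "last"}
FUTURE_CUES = {"tomorrow", "will", "gonna", "next", "soon"}

def temporal_orientation(message: str) -> str:
    # classify a message by counting past vs. future cue words;
    # default to "present" when neither dominates
    tokens = set(message.lower().split())
    past = len(tokens & PAST_CUES)
    future = len(tokens & FUTURE_CUES)
    if future > past:
        return "future"
    if past > future:
        return "past"
    return "present"

print(temporal_orientation("so excited for the concert tomorrow"))  # → future
```

A user's overall orientation can then be estimated, as in the paper's approach, from the distribution of such per-message labels over their history.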

Collaboration


Dive into Maarten Sap's collaborations.

Top Co-Authors

Lyle H. Ungar (University of Pennsylvania)

Gregory Park (University of Pennsylvania)

Yejin Choi (University of Washington)

Noah A. Smith (University of Washington)