Social media mining for identification and exploration of health-related information from pregnant women

Abstract

Widespread use of social media has led to the generation of substantial amounts of information about individuals, including health-related information. Social media provides the opportunity to study health-related information about selected population groups who may be of interest for a particular study. In this paper, we explore the possibility of utilizing social media to perform targeted data collection and analysis from a particular population group: pregnant women. We hypothesize that we can use social media to identify cohorts of pregnant women and follow them over time to analyze crucial health-related information. To identify potentially pregnant women, we employ simple rule-based searches that attempt to detect pregnancy announcements with moderate precision. To further filter out false positives and noise, we employ a supervised classifier trained on a small amount of hand-annotated data. We then collect their posts over time to create longitudinal health timelines and attempt to divide the timelines into different pregnancy trimesters. Finally, we assess the usefulness of the timelines by performing a preliminary analysis to estimate drug intake patterns of our cohort at different trimesters. Our rule-based cohort identification technique collected 53,820 users over thirty months from Twitter. Our pregnancy announcement classification technique achieved an F-measure of 0.81 for the pregnancy class, resulting in 34,895 user timelines. Analysis of the timelines revealed that pertinent health-related information, such as drug intake and adverse reactions, can be mined from the data. Our approach to using user timelines in this fashion has produced very encouraging results and can be employed for other important tasks where cohorts, for which health-related information may not be available from other sources, are required to be followed over time to derive population-based estimates.



Pramod Chandrashekar

Department of Biomedical Informatics, Arizona State University, Tempe, AZ, USA

Arjun Magge

Department of Biomedical Informatics, Arizona State University, Tempe, AZ, USA

Abeed Sarker

Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, USA

Graciela Gonzalez

Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, USA


1. INTRODUCTION

Pre-market clinical trials assess the safety of drugs/medications (we use the terms interchangeably in this paper) in limited settings, and so the effects of those drugs on particular cohorts (e.g., pregnant women, children, or people suffering from specific conditions) cannot be assessed. Spontaneous reporting systems, such as the FDA Adverse Event Reporting System (FAERS), are used for post-marketing drug safety surveillance, and they provide a mechanism for reporting adverse events associated with medication consumption. Although these sources may accumulate drug safety knowledge about specific population groups, studies have shown that they suffer from various problems, such as under-reporting [15]. To overcome these problems, additional sources of information are being actively utilized for pharmacovigilance tasks. Studies have shown that 26% of online adults discuss health information using social media [7], with approximately 90% of women using online media for health-care information, and 60% using pregnancy-related apps for support. These statistics suggest that social media sources may contain key information regarding specific cohorts, such as pregnant women, and their drug usage habits.

Although consuming drugs during pregnancy is not recommended by doctors worldwide, their usage is commonplace. For example, during pregnancy, women continue taking prescription drugs for ailments which preceded the pregnancy. For common health problems like heartburn, common cold and body pains, women tend to take over-the-counter medicines which may cause harm to the fetus. Past research has also indicated that 50% of the pregnancies in the United States are unintended [11]. In such cases, the fetus may be exposed to drugs without the mother's explicit knowledge. Currently, the U.S. FDA maintains a list of pregnancy exposure registries that collect health information on exposure to medical products during pregnancy. Such registries require pregnant women to voluntarily sign up, and hence, they suffer from low enrollment and follow-up rates [35]. Considering the fact that infant mortality rates are estimated to be at 5.96 deaths per 1,000 live births [1], and that the causes of 50% of these birth defects are unknown [22], identifying and utilizing additional sources for monitoring health information of pregnant women, such as social

Despite the noisy nature of data on Twitter, because of the high volume and frequency, it is an attractive resource for big data mining tasks. In addition to widely used social networks like Twitter, there are also online health communities, which facilitate health-related information sharing over the Internet. One such online health community is DailyStrength, which has over 400,000 members engaging in discussions among its 500+ groups. In contrast to tweets, the posts in online health forums like DailyStrength have no strict constraints on word counts. The language used is more formal, and the availability of domain-specific discussion forums increases the chances of finding relevant medical information from discussions [28]. Thus, Twitter and DailyStrength present quite different types of social media chatter, with the data from the latter being significantly lower in terms of both volume and noise. Both these data sources carry health-related knowledge expressed by various cohorts but require customized techniques for mining the knowledge encapsulated.

Given the limited amount of information that is available about pregnant women during pre-market clinical trials, there is a need to explore additional resources of information. The presence of large amounts of social media data, which hold crucial health-related information, presents a strong motivation for developing frameworks for mining longitudinal information from this resource. Based on these motivations, the goals of this paper are as follows:

• Develop natural language processing (NLP), machine learning, and information retrieval (IR) methods for accurately identifying a cohort of pregnant women and collecting their social media timelines.
• Perform preliminary analyses of the extracted health timelines to assess their usefulness, identify limitations, and establish future research goals.

The main contributions of the paper are as follows:

• We present a framework by which social media data can be used to identify and collect information about pregnant women.
• We show that health timelines collected from social media contain crucial health-related information, which may be used in longitudinal studies.
• We discuss techniques for further dividing the timelines into pregnancy trimesters and verify that trimester-specific information can also be mined from the timelines.
• We discuss the current limitations of our novel idea and outline future directions.

https://about.twitter.com/company. Accessed on: 03/03/2016.

2. RELATED WORK

Research work most closely related to ours is in the domain of pharmacovigilance from social media, although, to the best of our knowledge, no past research has attempted to identify and follow longitudinal cohort information from this domain. Most of the research in pharmacovigilance and drug safety surveillance has focused on identifying adverse reactions associated with medications. Some past research has attempted to employ classification techniques to determine adverse drug reaction (ADR) assertive posts. For these tasks, two primary techniques have been attempted: lexicon-based classification and supervised classification. In lexicon-based classification [25, 19], a given text is classified as having an ADR if it meets a set of specified lexical rules. Supervised classification techniques [33, 6] involve training classifiers using features from annotated data (used as training data) to automatically make classification decisions on test data based on observed probabilities in the training data.

Due to the advances in natural language processing (NLP) and data science techniques, social media has recently been used for a variety of public health monitoring tasks in addition to pharmacovigilance [30]. These include monitoring the patterns of influenza [4], tracking tropical diseases like dengue fever [13], and analyzing disease outbreaks such as E. coli [10] and Ebola [27]. In behavioral medicine research, social media has been used to study users' lifestyles and analyze the health-related choices they make. Researchers have used social media to study nutrition [34] and obesity patterns [23]. Applications also include analyzing alcohol [3], nicotine [31], and drug abuse [12]. There has been some research in timeline creation [20] and event extraction from timelines [36, 21, 8] for specific events. However, little effort has been invested in generating summaries of health-related data.

Only a handful of studies have attempted to predict pregnancy outcomes using quantitative data. Ines Banjari et al. [5] used clustering on a collection of questionnaire results accompanied by blood samples of 222 pregnant women who were in the first trimester. The authors performed hierarchical clustering considering three main features, namely pre-pregnancy BMI, age, and hemoglobin content. Via cluster analysis, the authors found that women with higher pre-pregnancy BMI and age have higher risks of complications during pregnancy. Laopaiboon et al. [18] studied the relationship between maternal age and pregnancy outcome using health records of 308,149 singleton pregnant women. They used a multilevel, multivariate logistic regression with clustering technique to perform the study and found that 12.3% of these women had advanced maternal age (AMA), which varied across countries. Von Mandach et al. [37] studied 202 fetal disorders from the Swiss ADR database using records classified by regional pharmacovigilance centers as having ADRs. They performed a likelihood ratio and t-test and found that fetal disorders were closely associated with the ADRs of the drugs consumed.

All these pregnancy-related studies have involved data sources such as clinical records, reports, and hospital patient data, which are often expensive to obtain. Also, little information is available on lifestyle habits and drug usage after patients exit the medical facilities. Hence, social media and health forums are potentially attractive sources for extracting health information, drug usage patterns and their effects. Small samples of social media data have been used for performing pregnancy-related studies, such as the work by De Choudhury et al. [9], where 376 women were monitored to predict postpartum changes. Automatically collecting and processing large samples of social media data, however, presents significant challenges due to the lack of structure and use of informal language [32].

3. METHOD

Figure 1 gives a detailed illustration of our proposed system, which is broadly divided into three main steps: Data Collection and Classification, User Health Timeline Extraction, and Timeline Analysis. For data collection, we discuss how cohort timelines can be collected from the differing interfaces of Twitter and DailyStrength. For the last step of the analysis, we show how the timelines can be divided into pregnancy trimesters so that trimester-specific information, such as drug usage, can be further analyzed. Each of these steps is detailed in the following subsections.

Twitter and DailyStrength are the sources of our data for this study. We collected tweets originating from women announcing their pregnancy during a thirty-month time period, from January 2014 to September 2016. To identify our initial set of potentially pregnant women, we applied simple search expressions of the forms "i'm * weeks/months pregnant" and "i am * weeks/months pregnant", with minor variations, for a total of 18 queries. DailyStrength has a very different structure compared to Twitter, and the website is divided into individual forums for specific cohorts. We obtained our data from five forums on DailyStrength (Pregnancy, Pregnancy After Loss Or Infertility, Pregnancy Teens, Stillbirth, and Miscarriage). We collected all the posts from all the users from these forums.

Due to the usage of relatively formal language and low noise, posts from DailyStrength are not subjected to preprocessing. In contrast, tweets contain an approximately equal share of useful information and noise. Hence, the tweets are pre-processed by removing URLs, user handles, emoticons, and stopwords. In the case of DailyStrength, because we collect posts from pregnancy-related forums, we make the safe assumption that all users posting in the forums are currently pregnant or have been pregnant in the past. However, for Twitter data, manual inspection of a small sample of tweets revealed that approximately 35-40% of them were false positives (i.e., posts that did not present personal admissions of pregnancy). Therefore, prior to collecting the timelines of the users making the announcement, we employ an automatic text classification technique to further filter out noisy tweets. We manually annotated 1200 randomly selected tweets (approximately 2% of all the collected tweets) mentioning pregnancy announcements into isPreg (legitimate) and notPreg (not legitimate) classes. Some examples of pregnancy announcements and their annotations are shown in Table 1. The annotations were performed by two annotators, and the inter-annotator agreement (IAA) was κ = 0.79, which is regarded as substantial agreement [17]. Disagreements were resolved by the third author of this paper, who reviewed the disagreement cases and performed the annotations independently. Among the 1200 tweets, 753 tweets were classified as isPreg and 447 were classified as notPreg.
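The rule-based search and the tweet preprocessing described above can be sketched roughly as follows. This is a minimal illustration rather than the queries or code used in the study: we assume the "*" wildcard stands for one to three whitespace-delimited tokens, and the emoticon pattern, the stopword list, and the function names `is_candidate_announcement` and `preprocess` are hypothetical.

```python
import re

# Assumption: "*" in the search expressions is read as one to three tokens;
# the semantics of the actual Twitter queries may differ.
ANNOUNCEMENT_RE = re.compile(
    r"\b(?:i['’]m|i am)\s+(?:\S+\s+){1,3}?(?:weeks?|months?)\s+pregnant\b",
    re.IGNORECASE,
)

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HANDLE_RE = re.compile(r"@\w+")
# Illustrative ASCII emoticon pattern, not an exhaustive list.
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][-']?[)(\]\[DPp/\\](?!\w)")
# Tiny hypothetical stopword list, for demonstration only.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "it", "of", "to", "my"}


def is_candidate_announcement(tweet: str) -> bool:
    """Return True if the tweet matches the announcement pattern."""
    return ANNOUNCEMENT_RE.search(tweet) is not None


def preprocess(tweet: str) -> str:
    """Remove URLs, user handles, emoticons, and stopwords from a tweet."""
    text = URL_RE.sub(" ", tweet)
    text = HANDLE_RE.sub(" ", text)
    text = EMOTICON_RE.sub(" ", text)
    tokens = [t for t in text.split() if t.lower() not in STOPWORDS]
    return " ".join(tokens)
```

A pattern of this kind also matches non-announcements such as "like I look like I'm 3 months pregnant", which is why a supervised filtering step is still needed after the search.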

Table 1: Sample tweets showing annotations for pregnancy announcements.

Tweet | Classification
"I honestly still can't believe I'm almost 5 months pregnant. Like wut." | isPreg
"I can't do this anymore. I work my ass off, and I'm eight months pregnant." | isPreg
"I'm 18 weeks pregnant today and my 21st birthday is tomorrow. It's a good day" | isPreg
"I'm 21 weeks, 5 days pregnant. Or, as I like to think of it: 128 days out from reclaiming my spot as my liquor store's favorite customer." | isPreg
"It's like just yesterday I was was at the doctor being told I was 14 weeks & 5 days pregnant .. now I'm 27 weeks & 3 days" | isPreg
"I hate how bloated I get when I'm on my period, like I look like I'm 3 months pregnant" | notPreg
"NOW. WAIT. ONE. MINUTE! I'm catching up on | notPreg
"I hate that I look at least 4 months pregnant every time I eat something wtf" | notPreg
"Girls will be two days pregnant already posting pictures talking bout "I'm getting big."" | notPreg
"My sister is five weeks and three days pregnant. I'm going to be an auntie oh my god" | notPreg

Using the annotated data, we perform supervised classification of the tweets. We employ a variant of an existing social media text classification system [33], which was originally designed for adverse drug reaction detection. Past research on social media text classification suggests that an effective mechanism for classifying short Twitter posts is to generate large numbers of semantic features to balance the sparse word n-gram vectors. Therefore, for the classifier, we primarily remove adverse drug reaction specific features, keep the domain-independent features, and add some additional features. We briefly discuss some of the features in the following paragraphs.

The dataset will be made available with the final version of the paper.

Source code for the classification system is available at: https://bitbucket.org/asarker/adrbinaryclassifier. Accessed on: 10/10/2016.

Figure 1: System architecture depicting our social media mining pipeline collecting and analyzing cohorts from social media.

N-grams and synsets

Word n-grams are the most common text classification features, consisting of sequences of contiguous n words in a text segment. We preprocess the texts by performing stemming and lowercasing, and use 1-, 2-, and 3-grams as features.

In addition to the words themselves, we use their synonyms in some cases to increase vocabulary coverage. For each adjective, noun or verb in a tweet, we use WordNet to identify the synonyms of that term and add the synonymous terms as features.

Sentiment representing features

Our inspections of the announcements suggest that users generally express strong sentiments when making pregnancy announcements. So, we add features that express the sentiments of the users in various scales. We assign three sets of scores to sentences based on three different measures of sentiment. The first set of scores is derived from lists of positive and negative terms [16], the second set of scores is dependent on the prior polarities of terms present in a post [14], and the third set of scores is derived from a subjectivity lexicon that presents both polarity and subjectivity [38].
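As a rough illustration of the first kind of score, a sentence can be scored by counting matches against positive and negative term lists. The sketch below is only one plausible scoring scheme: the two term lists are tiny hypothetical stand-ins for the lexicon in [16], and the normalization by sentence length and the name `lexicon_score` are our assumptions, not details given in the text.

```python
import re

# Hypothetical toy lexicons; the real term lists come from [16].
POSITIVE_TERMS = {"good", "happy", "love", "excited"}
NEGATIVE_TERMS = {"hate", "bad", "sad", "disappointing"}


def lexicon_score(sentence: str) -> float:
    """Return (positive matches - negative matches) / token count."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not tokens:
        return 0.0
    positive = sum(token in POSITIVE_TERMS for token in tokens)
    negative = sum(token in NEGATIVE_TERMS for token in tokens)
    return (positive - negative) / len(tokens)
```

Under this scheme a sentence such as "It's a good day" receives a positive score and "I hate how bloated I get" a negative one; the resulting value would be appended to the feature vector alongside the other two sentiment scores.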

Word clusters

Recent research on social media based text classification suggests that using generalized representations of words, such as clusters of similar words, may improve performance [26]. In our work, we use the clusters generated by Owoputi et al. [29]. The authors generate the clusters by first learning vector representations of words [24] from over 56 million tweets, and then employing a Hidden Markov Model-based algorithm that partitions words into a base set of 1000 clusters, and induces a hierarchy among those 1000 clusters.

To generate features from these clusters, for each tweet, we identify the cluster number of each token, and use all the cluster numbers associated with a tweet in a bag-of-words manner. Thus, every tweet is represented by a set of cluster numbers, with semantically similar tokens having the same cluster number.

http://wordnet.princeton.edu/. Accessed on: 01/05/2016.

Classification

Using these features, we trained Support Vector Machine (SVM) classifiers for the classification task. We used an RBF kernel, and we optimized the value of the cost parameter via 10-fold cross-validation over the 1200 annotated posts. We obtained the best results with cost=64.0, and we used this setting to classify all the identified tweets in our collection. Results of the classification are presented in the next section, including classification performance and the number of posts (Table 2).

After the classification step, all the handles of the users classified to be legitimately pregnant are identified, and we attempt to collect their other posts using the Twitter API. We index all the posts into Apache Lucene for further analysis. Using the API, we collect all the user posts that are available from the past (i.e., up to the limit allowed by Twitter) and sort them in chronological order, and we continue collecting tweets over time to monitor future health-related events.

For DailyStrength, however, since the forums we chose are all pregnancy-related, it is assumed that all users posting in these forums are/have been pregnant at some point during their membership to the website. The users can post across different forums on the website, which can include interesting information such as drug intake admissions and adverse events. Hence, for each user posting a comment in one of the pregnancy-related forums, we collect the user's posts in all available forums to construct their timelines. Finally, we index each individual timeline into Apache Lucene with the following fields: userid, time, text, and trimester (if available) for further processing.

We used the LibSVM implementation packaged with the Python scikit-learn implementation: http://scikit-learn.org/. Accessed on: 10/11/2016.

https://lucene.apache.org/. Accessed on: 11/23/2015.

Using the collected timelines, we attempt to explore if and how health-related events can be clustered into coarse-grained temporal windows. The duration of a pregnancy may be divided into three trimesters: first, week 1 through week 12; second, week 13 through week 27; and third, week 28 through birth. To successfully identify the trimester associated with a posted health-related event, information about the pregnancy start date is required. Via our manual inspections of the timelines, we discovered that pregnant mothers who announce their pregnancies over Twitter also often provide clues about the progress of the pregnancies. Consider the tweets below:

Oh well managed 8 out of 10 combat tracks, not bad at 28 weeks pregnant with the flu but still disappointing

The first tweet was posted during the third trimester and the second tweet was posted during the second trimester of pregnancy. Using this information, and the timestamp of the tweets, all the posts within a timeline can be grouped into the three trimesters. The key NLP challenge in this problem is to detect the statements regarding the progress of the pregnancies.

We use a combination of term and pattern matching algorithms to detect these trimester identifiers in each timeline. However, for Twitter, due to the 3200 tweet limitation enforced by the API, not all timelines that are extracted have all the tweets posted during the pregnancy time period. In our current algorithm, we first attempt to identify all tweets that mention the terms 'pregnant' and 'pregnancy' (seed words). Next, terms within a specified context window of the seed word are collected. Based on empirical assessment, we settled on a symmetric context window of 6 terms. Within the context window, the algorithm then searches for key temporal terms such as 'week' and 'month', along with the presence of a number mention (e.g., six, 12, eighteen and so on). The number, along with the other mentioned terms, is extracted and compared to the timestamp of the associated tweet to identify the trimester.

Following the organization of the timelines into trimesters, we assessed, in a preliminary fashion, if trimester-specific health events can be collected for further analysis. Depending on the intent of a study, the type of information that requires mining may vary, and detailed trimester-based health-related event analysis is outside the scope of this paper. Therefore, we simply focused on generating frequencies of the drugs that are mentioned at each trimester to make rough estimates about the drug usage patterns of the cohort at each phase. We perform a keyword search for each drug in Apache Lucene to obtain the drug mentions by users. Here, we make an assumption that all drug mentions are admissions of drug intake by the user. We query our Lucene index, and, for each drug, compute the number of users who have consumed it. The goal was to ascertain whether drug-usage information is available, rather than to perform a thorough analysis, which we leave as future work. Distributions of the drug mentions are presented in the next section.
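The trimester detection steps described in this section (seed word, symmetric 6-term context window, temporal term plus number mention, comparison with the tweet timestamp) can be sketched as follows. This is a simplified illustration under stated assumptions, not the rule set used in the study: we take the number immediately preceding 'week(s)' or 'month(s)', convert months to weeks with an average of 4.345 weeks per month, cap a pregnancy at 42 weeks, and use a partial number-word table. All function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

SEED_WORDS = {"pregnant", "pregnancy"}
WINDOW = 6  # symmetric context window size reported in the text
WEEKS_PER_MONTH = 4.345  # assumption: average weeks per calendar month
MAX_WEEKS = 42  # assumption: upper bound, since the birth date is unknown
UNITS = {"week": 1.0, "weeks": 1.0,
         "month": WEEKS_PER_MONTH, "months": WEEKS_PER_MONTH}
# Partial, illustrative number-word table (one through twenty).
NUMBER_WORDS = {word: value for value, word in enumerate(
    "one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen "
    "twenty".split(), start=1)}


def parse_number(token):
    """Return the integer value of a digit string or number word, else None."""
    if token.isdigit():
        return int(token)
    return NUMBER_WORDS.get(token)


def gestational_week(tweet):
    """Return the pregnancy week stated near a seed word, or None."""
    tokens = re.findall(r"[a-z0-9]+", tweet.lower())
    for i, token in enumerate(tokens):
        if token not in SEED_WORDS:
            continue
        # Scan the context window for "<number> week(s)/month(s)".
        for j in range(max(1, i - WINDOW), min(len(tokens), i + WINDOW + 1)):
            if tokens[j] in UNITS:
                number = parse_number(tokens[j - 1])
                if number is not None:
                    return round(number * UNITS[tokens[j]])
    return None


def trimester_from_week(week):
    """Map a pregnancy week to a trimester using the ranges given above."""
    if week <= 12:
        return 1
    return 2 if week <= 27 else 3


def estimate_start(tweet_time, week):
    """Week 1 begins on the start date, so week w is (w - 1) weeks later."""
    return tweet_time - timedelta(weeks=week - 1)


def trimester_of_post(post_time, start):
    """Return the trimester of a post, or None if outside the pregnancy."""
    week = (post_time - start).days // 7 + 1
    if week < 1 or week > MAX_WEEKS:
        return None
    return trimester_from_week(week)
```

Given one progress statement and its timestamp, `estimate_start` anchors the timeline, and every other post by the same user can then be assigned a trimester with `trimester_of_post`. Anchoring on the number adjacent to the temporal term is a design choice of this sketch; statements that spell out the progress differently would still be missed.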

4. RESULTS AND DISCUSSIONS

The performance of our classifier was evaluated via 10-fold cross-validation, and the best results obtained are presented in Table 2. We compared the performance of the SVM to that of a Naïve Bayes baseline, which obtained an F-measure of 0.70 for the isPreg class. Figure 2 shows the ROC curves for each of the 10 folds of cross-validation, including the mean ROC for the positive class. The area under the mean ROC curve is 0.82. Running the SVM classifiers on our collected data resulted in the discovery of 34,895 legitimate pregnant women from a total of 53,820 users.

Figure 2: ROC curves for each fold of the 10-fold cross-validation, and the mean.
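As a quick consistency check on the reported figures, the F-measure is the harmonic mean of precision and recall, so the F-measure column of Table 2 can be recomputed from the other two columns. The short sketch below does this and also computes the share of users retained by the classifier; the function name `f_measure` is ours.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in Table 2.
is_preg_f = round(f_measure(0.83, 0.79), 2)   # 0.81
not_preg_f = round(f_measure(0.84, 0.77), 2)  # 0.8

# Share of the 53,820 collected users kept after classification.
retained_share = round(34895 / 53820, 3)      # 0.648
```

The retained share of roughly 65% is in line with the 35-40% false-positive rate estimated during manual inspection of the search results.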

Table 2: Results from tweet classification for legitimate pregnancy announcements using SVM

Classification | Precision | Recall | F-measure
isPreg | 0.83 | 0.79 | 0.81
notPreg | 0.84 | 0.77 | 0.80

We applied our pregnancy trimester extraction algorithm on the 34,895 user timelines classified as legitimate pregnant users. Our algorithm detected the pregnancy time period for 15,523 (approximately 45%) users and was able to further categorize each tweet belonging to these timelines into one of the three trimesters. The remaining user handles were discarded from the analysis performed in the rest of this paper. We were able to collect over 30 million tweets from these 15,523 users. Table 3 showcases a user timeline with the pregnancy trimester details and health-related tweets in each of the three trimesters.

We observe that the timeline contains health-related information such as drug intakes (in rows 1, 12, 20, 21, 22) and conditions/events (e.g., rows 1, 2, 4, etc.). We also notice that a large proportion of drug and condition mentions happen to be first-hand experiences. However, not all mentions of drugs are intakes (rows 24 and 25) and not all drug intakes are by the user (rows 11 and 14). Similarly, not all conditions mentioned in the tweets are experienced by the user (rows 5 and 11). Mining drug intake and events is important in pharmacovigilance research for tracking ADRs. Hence, accurately distinguishing personal drug intake and events from mentions is an important NLP challenge that we intend to address in the future.

Trimester detection adds a very interesting NLP challenge. While some tweets are relatively easy to detect and were successfully processed by our rule-based algorithm, we found some that were missed or mis-classified. Consider the following tweets, for example:

I b getting so much pressure next week is gone b my last week pregnant who want to make a bet lol

It is crazy to me that I am only 3 days past 13 weeks pregnant.

Figure 3: Distributions of top 10 drug mentions within the timelines across the three trimesters.

Our approach currently fails to detect the first tweet and mis-classifies the second tweet as first trimester instead of second. We leave the optimization of our detection algorithm as future work.

Figure 3 shows the distribution of popular drug mentions across the pregnancy trimesters for Twitter users. Even the most common drugs were mentioned by less than 0.5% of the users, and the proportion of actual intakes may be lower. For instance, ibuprofen was one of the most common drugs mentioned in user timelines, and it was mentioned by 76 unique users in their first trimester, 72 in their second, and 90 in their third. For a collection of more than 15,000 user timelines, we find this proportion of mentions to be low for extensive analysis, and hence we intend to expand our search terms and algorithms for cohort selection in the future.

For DailyStrength, our timeline collection approach retrieved a total of 257,531 posts from 11,435 users. In contrast to tweets, which are restricted to 140 characters, DailyStrength posts are longer. Thus, tracking the progress of pregnancies from their announcements to derive trimester information requires further NLP research. Discovering drug intake, however, is similar, and we find that common drug mentions within the user timelines in DailyStrength include drugs such as folic acid, aspirin, zoloft and tylenol.
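The per-trimester counts of unique users mentioning each drug, as reported above for ibuprofen, can be computed along the following lines. This sketch uses plain Python in place of the Lucene keyword search, matches single-word drug names only, and treats every mention as a count regardless of whether it is an intake; the record layout `(userid, trimester, text)` and the function name are assumptions made for illustration.

```python
import re
from collections import defaultdict


def count_users_per_drug(posts, drug_names):
    """Count unique users mentioning each drug in each trimester.

    posts: iterable of (userid, trimester, text) tuples.
    drug_names: set of lowercase single-word drug names.
    Returns a dict mapping (drug, trimester) to a unique-user count.
    """
    users = defaultdict(set)
    for userid, trimester, text in posts:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for drug in drug_names & tokens:
            users[(drug, trimester)].add(userid)
    return {key: len(userids) for key, userids in users.items()}
```

Because users are stored in sets, repeated mentions by the same user within a trimester are counted once, which matches the unique-user figures quoted for ibuprofen. Multi-word names such as folic acid would need phrase matching, which this sketch does not attempt.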

5. LIMITATIONS AND FUTURE WORK

We intend to build on this preliminary work in several key areas. Employing more sophisticated outcome detection techniques is an important future goal of this study. From the NLP perspective, our technique does not take into account tense (e.g., past/present), and so the chronological order of posts may not represent the chronological ordering of events. Also, no mechanism is applied for gender detection among pregnancy announcement tweets, although our self-admission classifier does attempt to ensure that users included in the cohort are genuinely pregnant themselves.

Among other things, we intend to expand the drug list by including misspellings, spelling variations, phonetic variations and abbreviations of each drug. Similar to drug usage pattern extraction, we could use a disease and disorder extraction method to classify mentions of diseases, which would explain why certain individuals consume a particular drug. As mentioned already in the paper, our trimester detection technique is currently not optimal, and we will improve it via the addition of more rules. Since the performance and effectiveness of the downstream applications and analyses depend heavily on the data collection and classification steps, our immediate focus will be to improve these. We will employ more queries to significantly increase the size of the cohort, and improve the performance of the classification step via the annotation of a much larger data set and the application of more sophisticated classification techniques. Ensembles of classifiers have been shown to perform particularly well for complex text classification tasks [2], and we will attempt to develop such systems with the view of maximizing recall while maintaining high precision.

6. CONCLUSION

In this paper, we presented the novel idea of collecting longitudinal health-related information about targeted cohorts from social media. We focused on the cohort of pregnant women in this study, a group that is not included in pre-market clinical trials. We presented a pipeline which includes three stages: identification of cohort, collection, and analysis. We discovered that large numbers of pregnant women can be identified with high precision via a combination of rule-based and machine learning techniques. We discussed how health-related timelines can be gathered from two different social networks. Finally, we showed how temporal categorization of the timeline may be performed, and we verified that trimester-specific health-related information can be mined from the pre-processed timelines.

We discussed several limitations of our work, which will be addressed in future research. Crucially, while we focused solely on one cohort, our pipeline can be generalized for other population groups as well. This form of analysis may be particularly useful for population groups about whom data may not be available from other sources. In addition, social media may reveal information that people may not generally share via other means (e.g., drug abuse/usage of illicit drugs). The results obtained by our current work are very promising and warrant future research.

7. REFERENCES

[1] CDC deaths in 2013. 2013.
[2] Automatic evidence quality prediction to support evidence-based decision making. Artificial Intelligence in Medicine, 64(2):89–103, 2015.
[3] Y. Aphinyanaphongs, B. Ray, A. Statnikov, and P. Krebs. Text classification for automatic detection of alcohol use-related tweets. In International Workshop on Issues and Challenges in Social Computing, 2014.
[4] E. Aramaki, S. Maskawa, and M. Morita. Twitter catches the flu: detecting influenza epidemics using twitter. In Proceedings of the conference on empirical methods in natural language processing, pages 1568–1576. Association for Computational Linguistics, 2011.
[5] I. Banjari, D. Kenjerić, K. Šolić, and M. L. Mandić. Cluster analysis as a prediction tool for pregnancy outcomes. Collegium Antropologicum, 39(1), 2015.
[6] J. Bian, U. Topaloglu, and F. Yu. Towards large-scale twitter mining for drug-related adverse events. In Proceedings of the 2012 international workshop on Smart health and wellbeing, pages 25–32. ACM, 2012.
[7] BusinessWire. Twenty six percent of online adults discuss health information online. 2012.
[8] S. Choudhury and H. Alani. Personal life event detection from social media. 2014.
[9] M. De Choudhury, S. Counts, and E. Horvitz. Predicting postpartum changes in emotion and behavior via social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 3267–3276. ACM, 2013.
[10] E. Diaz-Aviles and A. Stewart. Tracking twitter for epidemic intelligence: case study: Ehec/hus outbreak in germany, 2011. In Proceedings of the 4th Annual ACM Web Science Conference, pages 82–85. ACM, 2012.
[11] L. B. Finer and S. K. Henshaw. Disparities in rates of unintended pregnancy in the united states, 1994 and 2001. Perspectives on sexual and reproductive health, pages 90–96, 2006.
[12] N. Genes. Twitter discussions of nonmedical prescription drug use correlate with federal survey data. In Medicine 2.0 Conference. JMIR Publications Inc., Toronto, Canada, 2014.
[13] J. Gomide, A. Veloso, W. Meira Jr, V. Almeida, F. Benevenuto, F. Ferraz, and M. Teixeira. Dengue surveillance based on a computational model of spatio-temporal locality of twitter. In Proceedings of the 3rd international web science conference, page 3. ACM, 2011.
[14] M. Guerini, L. Gatti, and M. Turchi. Sentiment analysis: How to derive prior polarities from sentiwordnet. arXiv preprint arXiv:1309.5843, 2013.
[15] R. Harpaz, W. DuMouchel, N. H. Shah, D. Madigan, P. Ryan, and C. Friedman. Novel data-mining methodologies for adverse drug event discovery and analysis. Clinical Pharmacology & Therapeutics, 91(6):1010–1021, 2012.
[16] M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168–177. ACM, 2004.
[17] J. R. Landis and G. G. Koch. The measurement of observer agreement for categorical data. Biometrics, pages 159–174, 1977.
[18] M. Laopaiboon, P. Lumbiganon, N. Intarut, R. Mori, T. Ganchimeg, J. Vogel, J. Souza, and A. Gülmezoglu. Advanced maternal age and pregnancy outcomes: a multicountry assessment. BJOG: An International Journal of Obstetrics & Gynaecology, 121(s1):49–56, 2014.
[19] R. Leaman, L. Wojtulewicz, R. Sullivan, A. Skariah, J. Yang, and G. Gonzalez. Towards internet-age pharmacovigilance: extracting adverse drug reactions from user posts to health-related social networks. In Proceedings of the 2010 workshop on biomedical natural language processing, pages 117–125. Association for Computational Linguistics, 2010.
[20] J. Li and C. Cardie. Timeline generation: Tracking individuals on twitter. In Proceedings of the 23rd international conference on World wide web, pages 643–652. ACM, 2014.
[21] J. Li, A. Ritter, C. Cardie, and E. H. Hovy. Major life event extraction from twitter based on congratulations/condolences speech acts. In EMNLP, pages 1997–2007, 2014.
[22] I. Lobo and K. Zhaurova. Birth defects: causes and statistics. Nature Education, 1(1):18, 2008.
[23] Y. Mejova, H. Haddadi, A. Noulas, and I. Weber. Proceedings of the 5th International Conference on Digital Health 2015, pages 51–58. ACM, 2015.
[24] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[25] A. Nikfarjam and G. H. Gonzalez. Pattern mining for extraction of mentions of adverse drug reactions from user comments. In AMIA Annual Symposium Proceedings, volume 2011, page 1019. American Medical Informatics Association, 2011.
[26] A. Nikfarjam, A. Sarker, K. O’Connor, R. Ginn, and G. Gonzalez. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. Journal of the American Medical Informatics Association, 22(3):671–681, 2015.
[27] M. Odlum. How twitter can support early warning systems in ebola outbreak surveillance. In . APHA, 2015.
[28] A. C. O’Higgins. A survey of the use of social media by women for pregnancy. In Medicine 2.0 Conference. JMIR Publications Inc., Toronto, Canada, 2013.
[29] O. Owoputi, B. O’Connor, C. Dyer, K. Gimpel, and N. Schneider. Part-of-speech tagging for twitter: Word clusters and other advances. School of Computer Science, 2012.
[30] M. J. Paul, A. Sarker, J. S. Brownstein, A. Nikfarjam, M. Scotch, K. L. Smith, and G. Gonzalez. Social mining for public health monitoring and surveillance. Pacific Symposium of Biocomputing, 21:468–479, 2016.
[31] K. W. Prier, M. S. Smith, C. Giraud-Carrier, and C. L. Hanson. Identifying health-related topics on twitter. In Social computing, behavioral-cultural modeling and prediction, pages 18–25. Springer, 2011.
[32] A. Sarker, R. Ginn, A. Nikfarjam, K. O’Connor, K. Smith, S. Jayaraman, T. Upadhaya, and G. Gonzalez. Utilizing social media data for pharmacovigilance: A review. Journal of biomedical informatics, 54:202–212, 2015.
[33] A. Sarker and G. Gonzalez. Portable automatic text classification for adverse drug reaction detection via multi-corpus training. Journal of biomedical informatics, 53:196–207, 2015.
[34] S. Sharma and M. De Choudhury. Detecting and characterizing nutritional information of food and ingestion content in instagram. Proc. WWW Companion, 2015.
[35] S. Sinclair, M. Cunnington, J. Messenheimer, J. Weil, J. Cragan, R. Lowensohn, M. Yerby, and P. Tennis. Advantages and problems with pregnancy registries: observations and surprises throughout the life of the international lamotrigine pregnancy registry. Pharmacoepidemiology and drug safety, 23(8):779–786, 2014.
[36] M. Wen, Z. Zheng, H. Jang, G. Xiang, and C. P. Rosé. Extracting events with informal temporal references in personal histories in online communities. In ACL (2), pages 836–842, 2013.
[37] C. Wettach, J. Thomann, C. Lambrigger-Steiner, T. Buclin, J. Desmeules, and U. von Mandach. Pharmacovigilance in pregnancy: adverse drug reactions associated with fetal disorders. Journal of perinatal medicine, 41(3):301–307, 2013.
[38] J. Wiebe, T. Wilson, and C. Cardie. Annotating expressions of opinions and emotions in language. Language resources and evaluation, 39(2-3):165–210, 2005.

Table 3: Excerpts from a Twitter user’s health timeline in reverse chronological order

| No. | Tweet | Trimester | Category |
|---|---|---|---|
| 1 | “God bless Zantac for helping me not want to throw up from the insane heartburn I’ve been having.” | third | drug, condition |
| 2 | “Ugh. Awful dreams; was obviously clenching my teeth all night - woke up with sore jaw; a headache.” | third | condition |
| 3 | “Went to the chiropractor today and my lower back actually feels worse right now.” | third | condition |
| 4 | “If I don’t drink Powerade before bed, I get awful leg cramps. If I do, I get awful heartburn.” | third | condition |
| 5 | “Betting I’ll be back at Dr. AnonX’s again real soon. Anonymized has a 101.8 degree fever. Why do my kids get sick so much?!” | third | condition |
| 6 | “W/ the leg pain from the INSANE cramp I had this AM; the pelvic pain from how baby’s laying, the waddle is strong today.” | third | condition |
| 7 | “Fell asleep around 10:30. Just woke up sweating and uncomfortable. It’s too hot in here to sleep now.” | third | condition |
| 8 | “Lunch was absolutely AMAZING (Tuscan chicken; artichoke soup from @anonymized) but now I have the worst heartburn.” | third | condition |
| 9 | “Baby’s moving around like crazy; the pain in my side/back is FINALLY letting up. RELIEF!” | third | condition |
| 10 | “My pelvis hurts so freaking bad right now and I still have 14 weeks to go til my due date.” | second | condition |
| 11 | “At the doctor with Anon again. Fever was 102 this AM, gave her Tylenol, then up to 104 after her nap. Her only complaint - being cold.” | second | drug, condition |
| 12 | “@anonymized Taking multivitamin gummy in AM, calcium + D in afternoon, prenatal at bedtime....just like I always have.” | second | drug |
| 13 | “Well this is a new one - my Vitamin D level is actually too HIGH. Now OB wants me to see bariatric doc/nutritionist again.” | second | drug |
| 14 | “It’ll be another sleepless night checking on Anon periodically. 104.8 degree fever earlier, Motrin brought it down to 100.1” | second | drug |
| 15 | “So help me if I’m getting ANOTHER cold I’m gonna be pissed. Scratchy sore throat and runny nose all of a sudden.” | second | condition |
| 16 | “Slim chance it’s the start of appendicitis. If I get a fever, nausea/vomiting/diarrhea,;pain is worse, I need to come back ASAP.” | second | condition |
| 17 | “Weak gag reflex + coughing + snot = disaster waiting to happen. I just want to go home, crawl in bed, and sleep until this cold is gone.” | second | condition |
| 18 | “Not bring able to take anything for this stupid cold is awful. So miserable.” | second | condition |
| 19 | “It’s amazing how a headache, raging hormones; lack of sleep make you want to stuff a pillow in the face of a snoring husband.” | second | condition |
| 20 | “For the second time in a row, Tylenol PM has left me wide awake at 3AM after passing out for a whopping 5 hours.” | second | drug |
| 21 | “I love the Olympics! Too bad I just took some Tylenol PM that’s starting to kick in so I won’t be watching much longer tonight.” | second | drug |
| 22 | “AnonT stayed home from work today on daddy duty so I just popped some Tylenol PM and I’m sleeping this cold away.” | first | drug |
| 23 | “My throat is dry; sore from breathing through my mouth but breathing through my nose isn’t possible right now.” | first | condition |
| 24 | “Woke up to anonymized crying around 5:30; a skull-knocker headache. And guess what...no Tylenol except for PM stuff.” | first | drug |
| 25 | “Been sleeping awful so AnonT said to take Tylenol PM; get a good nights sleep. Come up to bed; discover we’re out of Tylenol PM.” | first | drug, condition |
| 26 | “Finally start to feel drowsy so I try to sleep, but then I get all twitchy and can’t lay still.” | first | condition |
| 27 | “Pseudo gallbladder attacks in the middle of the night are awesome, said no one ever.” | first | condition |
| 28 | “Just turned my head and something popped in my neck the wrong way. Big ouch. Need to hit up the chiropractor on my way home.” | first | condition |
| 29 | “Gotta pack up the kids; head to MQT. Woke up with a skull knocker headache so methinks it’s time for a back; neck cracking.” | | |
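
Some posts in the timeline carry an explicit temporal cue; tweet 10, for example, states that 14 weeks remain until the due date. The Python sketch below shows how one such cue could be turned into a trimester label. It is a hedged illustration, not the trimester detection rules used in this study: the regular expression, the 40-week full-term length, the week boundaries (first trimester before week 14, third from week 28) and the function names are all assumptions.

```python
import re
from typing import Optional

# Assumption: the due date falls at 40 weeks of gestation.
FULL_TERM_WEEKS = 40

# Hypothetical rule: matches phrases such as "14 weeks to go" or "1 week left".
WEEKS_REMAINING = re.compile(r"(\d{1,2})\s+weeks?\s+(?:to\s+go|left)", re.IGNORECASE)


def gestational_week(text: str) -> Optional[int]:
    """Estimate the gestational week from a 'N weeks to go' style statement."""
    match = WEEKS_REMAINING.search(text)
    if match is None:
        return None
    remaining = int(match.group(1))
    if remaining > FULL_TERM_WEEKS:
        return None  # implausible value; leave the post unlabeled
    return FULL_TERM_WEEKS - remaining


def trimester(week: int) -> str:
    """Map a gestational week to a trimester using the assumed boundaries."""
    if week < 14:
        return "first"
    if week < 28:
        return "second"
    return "third"


tweet_10 = ("My pelvis hurts so freaking bad right now and I still have "
            "14 weeks to go til my due date.")
week = gestational_week(tweet_10)
label = trimester(week) if week is not None else None
```

Under these assumptions tweet 10 maps to week 26 and therefore to the second trimester, which agrees with the label shown in Table 3. Posts without such a cue would need other anchors, such as the date of the pregnancy announcement.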