Causal Feature Selection for Individual Characteristics Prediction
Tao Ding ∗, Cheng Zhang †, and Maarten Bos †
∗ Department of Information Systems, University of Maryland, Baltimore County, [email protected]
† Disney Research, Pittsburgh, USA, {cheng.zhang,mbos}@disneyresearch.com
Abstract
People can be characterized by their demographic information and personality traits. Characterizing people accurately can help predict their preferences, and aid recommendations and advertising. A growing number of studies infer people's characteristics from behavioral data. However, context factors make behavioral data noisy, making these data harder to use for predictive analytics. In this paper, we demonstrate how to employ causal identification for feature selection and how to predict individuals' characteristics based on these selected features. We use visitors' choice data from a large theme park, combined with personality measurements, to investigate the causal relationship between visitors' characteristics and their choices in the park. We demonstrate the benefit of feature selection based on causal identification in a supervised prediction task for individual characteristics. In our evaluation, models trained with features selected by causal identification outperformed existing methods.
Introduction
Understanding an individual's characteristics is useful for many real-life applications. The term individual characteristics refers to individual differences in characteristic patterns of thinking (Widiger 2011), and includes both demographics and personality. Knowing someone's individual characteristics can help to understand that person's preferences, which has important applications ranging from health care (Smith and Spiro III 2002; Giota and Kleftaras 2013) to marketing (Spence et al.
∗ This work was accomplished while the first author was a research intern at Disney Research.
Figure 1: An example of individual characteristics prediction using theme park activities. The activities are caused by both the target individual's preferences and the contextual situation, which in this case is the accompanying child. To separate the signal from noise for the individual characteristics prediction task, we propose to use a causal identification method to determine the informative activities for this task.

a person's behavior and preferences from individual characteristics.

Personality information can be arduous to obtain. A traditional approach to measuring personality requires participants to take a psychological test (e.g. filling out questionnaires), which is time-consuming and difficult to scale up. It is therefore desirable to circumvent this testing process and instead directly predict individual characteristics based on readily available observational data.

A growing proportion of human activities, such as consuming entertainment, shopping, and social interactions, are now mediated by digital services and devices. These digital footprints of consumer behaviors can be recorded and analyzed. Understandably, there is an interest in automatically predicting individual characteristics including age, gender, income, and personality traits from these digital footprints. In this work, we focus on predicting individual characteristics from real-life experiences in theme parks.

This task is challenging because behavioral data are noisy. Noise can arise from different factors; for example, people move in groups, while we are trying to predict personality at an individual level (Elliot et al.

Background and Related Works
We aim to predict individual characteristics from real-life theme park visit data, using causal identification for feature selection. We therefore first explain why individual characteristics are interesting and important. Then we summarize related work on individual characteristics prediction using various types of data. Finally, we discuss research on causal identification and causal inference.
Individual characteristics
We define individual characteristics as both demographic information and personality traits. In the context of our application, demographic information includes age, income, and the number of kids included in a theme park visit; personality traits are described using the well-established Big5 personality model (Goldberg 1993), which represents personality with scores on five traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism. Table 1 describes the traits. The Big5 model is widely used to represent a person's personality.

Table 1: Big Five personality model (Ding and Pan 2016)
Personality: Description
Openness: A person's level of intellectual curiosity, creativity and preference for novelty and variety.
Conscientiousness: A person's tendency to be organized and dependable, show self-discipline, act dutifully, and prefer planned rather than spontaneous behavior.
Extraversion: A person's energy, positivity, assertiveness, sociability, talkativeness, and tendency to seek stimulation from the company of others.
Agreeableness: A person's tendency to be compassionate and cooperative. Also a measure of one's trustingness, helpfulness, and well-tempered nature.
Neuroticism: A person's tendency to experience unpleasant emotions easily, and have low emotional stability and impulse control.

There is a rich body of work in behavioral science on the relationship between humans' individual characteristics and their real-world behaviors. One example study showed that age, level of education, and income among 250 hotel restaurant customers were correlated with complaint behavior (Sujithamrak and Lam 2005). Another study revealed a relationship between smoking behavior and demographic variables (Moody 1980). Besides demographics, personality traits also predict behavior. One study found that smokers score higher on openness to experience and lower on conscientiousness, a personality trait related to a tendency to show self-discipline, act dutifully, and aim for achievement (Campbell et al.
Machine Learning for Individual Characteristics Prediction
Using machine learning methods to predict individual characteristics from behaviors has increasingly gained attention. Ideally, various types of real-life behavior data would be used to predict individual characteristics. However, existing research is mainly limited to the use of social network data.

Several studies have shown that users' digital footprints on social networks can be used to infer their demographics and personalities (Farnadi et al.
Figure 2: The distribution of Big5 personality traits. (a) Agreeableness (b) Conscientiousness (c) Extraversion (d) Neuroticism (e) Openness
Mislove et al. $\alpha \sum_{i=1}^{n} |w_i|$ to the loss function (L1-norm), which forces weak features to have zero coefficients. This is also called LASSO regression (Tibshirani 1996), and it inherently performs feature selection. Another method is correlation-based feature selection (CFS), which filters features based on correlation analysis (Yu and Liu 2003). However, neither method provides interpretable results, and both lack principled justification.

Causal Identification
A fundamental assumption for research on individual characteristics prediction is the causal relationship between individual characteristics and behaviors. Additionally, prediction is only possible if the target is the cause of the data (effect) (Schölkopf et al.
Item            Data Type     Representation
Gender          Categorical   0: male, 1: female
Age             Categorical   17-78
Total visits    Numerical     0-99
Vacation days   Numerical     0-99
Income          Numerical     1-14
Trip cost       Numerical     0-9999
Dataset
The data for our study were collected from visitors of a large theme park resort. The resort has over 30 hotels, more than 100 restaurants, and hundreds of attractions. We assume individual characteristics affect the choices visitors make. We asked visitors' permission to use their choice and location history. From visitors who agreed, we retrieved the locations they visited, including restaurants, stands and kiosks, attractions and rides, stores, and other entertainment. Participating visitors also allowed us access to data such as their park entry time, hotel check-in time, purchases, and other metadata such as length of stay, trip cost, and which of the parks they visited first (see Table 2).

Visitors who participated in our study were asked to fill out a questionnaire, in which they were asked about the previously mentioned experiences at the resort, as well as demographic questions and the Ten-Item Personality Inventory (TIPI). The TIPI was developed as a shortened version of a 50-item personality scale and is often used to measure Big5 personality traits. The TIPI has 10 items, with two items measuring each of the personality traits. The TIPI uses a 7-point Likert scale from 1 ("Strongly Agree") to 7 ("Strongly Disagree"). For every participant, we average the scores of the two items associated with each trait to obtain five personality scores.

Figure 3: The distribution of demographics. (a) Gender (b) Age (c)

A total of 3997 visitors participated. After filtering out the responses with missing values, we retained the individual characteristics of 3293 participants. Figure 3 shows the distribution of demographics. In the remaining data, 31% of participants are male and 69% are female. The average age is 44 years. The average number of children in the group is 1.2. Figure 2 shows the distribution of the Big5 personality scores, which range from 1 to 7. Our participants experienced a total of 505 different things in the park.

The objective of the study is to measure individual differences based on visitors' experiences and choices at the theme park resort. As mentioned, studies that analyze people's digital footprints (such as Facebook likes, publishing, and commenting) generally have an advantage in that these footprints are more likely to reflect the choices of the individual filling out the personality survey. Most people do not visit the theme parks alone (the average group size is larger than 3 in our dataset). We have to assume that in many cases parents make decisions based on their children's preferences. That makes it difficult to link individual characteristics to the choices made by the parents. In the following section, we employ causal inference to capture effects caused by individual characteristics and thereby reduce the impact of other group members on decision making.
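The personality scoring step in this section (averaging the two questionnaire items that belong to each trait) can be sketched as follows. This is a minimal illustration, not the code used in the study: the item-to-trait mapping `ITEM_TO_TRAIT` and the function name `trait_scores` are assumptions, and the sketch assumes that responses are already aligned in direction, so it does no reverse-keying.

```python
from statistics import mean

# Hypothetical mapping of the 10 questionnaire items to the five traits,
# two items per trait. The actual item order is an assumption.
ITEM_TO_TRAIT = {
    1: "extraversion", 6: "extraversion",
    2: "agreeableness", 7: "agreeableness",
    3: "conscientiousness", 8: "conscientiousness",
    4: "neuroticism", 9: "neuroticism",
    5: "openness", 10: "openness",
}


def trait_scores(responses):
    """Average the two 7-point Likert responses that belong to each trait.

    `responses` maps item number (1-10) to a response in the range 1-7.
    """
    if set(responses) != set(ITEM_TO_TRAIT):
        raise ValueError("expected responses for items 1 through 10")
    grouped = {}
    for item, value in responses.items():
        if not 1 <= value <= 7:
            raise ValueError(f"item {item}: response {value} is outside 1-7")
        grouped.setdefault(ITEM_TO_TRAIT[item], []).append(value)
    return {trait: mean(values) for trait, values in grouped.items()}


if __name__ == "__main__":
    example = {1: 7, 6: 5, 2: 4, 7: 4, 3: 2, 8: 3, 4: 1, 9: 1, 5: 6, 10: 7}
    for trait, score in trait_scores(example).items():
        print(trait, score)
```

Participants with missing items are rejected rather than imputed, which mirrors the filtering of incomplete responses described above.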
Causal Identification
Causal identification aims at inferring causal relationships from observational data (Pearl 2009; Imbens and Rubin 2015). Many observed correlations in observational data are mediated through unobserved confounding variables. The goal of causal identification is to distinguish between such mediated correlations and truly causal relationships.

The basis of our causal identification analysis is a Bayesian network. This is a directed acyclic graph (DAG) in which nodes represent variables and arrows between the nodes represent the direction of causation. The problem is that we only observe the nodes; these are our observations (location data, metadata, and the Big5 scores). We want to infer the arrows that best explain the observed correlations. This is done in such a way that the number of causal arrows between the nodes is, in some sense, minimal while remaining consistent with the observed correlations. Having identified the causal structure, we exclude all predictor variables that are not causally associated with the target variables of interest. This is the central idea of how we propose to select features.

There are two main classes of algorithms to learn causal DAGs from observational data: constraint-based and score-based. Constraint-based methods use independence and dependence constraints obtained from statistical tests to narrow down the candidate graphs that explain the data. Our first proposed method uses the PC algorithm (Spirtes and Meek 1995) for feature selection; it belongs to the first class. The algorithm begins by learning an undirected graph that explains the variations by running conditional independence tests. The second phase consists of orienting the edges of this graph without introducing cycles or new v-structures. We use a significance level of 0.05 in the independence tests.

Score-based methods, on the other hand, provide a metric of confidence in the entire output model. The algorithm we use in this paper is Fast Greedy Equivalence Search (FGES), which greedily searches over Bayesian network structures (Ramsey 2015) and outputs the highest-scoring model it finds. The score the algorithm uses is the Bayesian Information Criterion (BIC) (Raftery 1995):

$$\mathrm{BIC} = 2 \cdot \ln P(\mathrm{data} \mid \theta, M) - c \cdot k \cdot \ln(n).$$

Above, M denotes the Bayesian network, θ denotes its parameters, k is the total number of parameters, and n is the number of data points. The constant c is a free parameter that we can tune. It scales the penalty on the number of parameters and thus determines the complexity of the network; we chose c = 0.1 to obtain a comparable number of features between PC and FGES. FGES performs a forward stepping search in which edges are added between nodes in order to increase the BIC until no single edge addition increases the score. Then, it performs a backward stepping search in which unnecessary edges are removed.

Both the PC and FGES algorithms are used in this paper for feature selection and compared against the LASSO approach in our experimental section. The L1 term is set to 0.01 to obtain a comparable number of features.

Predictive Evaluation
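The evaluation in this section starts from a binary encoding of each visitor's location history. As a minimal sketch of that encoding, assuming hypothetical location identifiers (`loc_001` and so on) and a hypothetical helper name `encode_choices`, neither of which comes from the study itself:

```python
def encode_choices(visited, all_locations):
    """Return a fixed-size 0/1 vector over `all_locations`.

    Position i is 1 if the visitor visited all_locations[i], else 0.
    """
    visited_set = set(visited)
    unknown = visited_set - set(all_locations)
    if unknown:
        raise ValueError(f"unknown locations: {sorted(unknown)}")
    return [1 if loc in visited_set else 0 for loc in all_locations]


if __name__ == "__main__":
    # Illustrative identifiers only; the dataset has 505 locations.
    locations = ["loc_001", "loc_002", "loc_003", "loc_004"]
    print(encode_choices(["loc_003", "loc_001"], locations))
```

Keeping the order of `all_locations` fixed across visitors is what makes the vectors comparable, so each position can serve as one candidate feature for LASSO, PC, or FGES.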
We implemented different prediction models to infer individual characteristics from location history and metadata. The experiments are designed to answer the following questions: (a) Can visitors' choices be used to infer individual characteristics? (b) Is there any benefit to performing feature selection according to causal identification? (c) Is metadata informative for predicting individual characteristics? To answer (a), we represent each visitor's choice history as a fixed-size binary vector, in which 1 indicates that the visitor visited a place. We use LASSO (Least Absolute Shrinkage and Selection Operator) (Tibshirani 1996) linear regression to perform predictive tasks for continuous outcomes like age, income, the number of children, and personality. LASSO is a regression method that automatically performs both feature selection and regression. All results are based on 10-fold cross-validation to avoid overfitting. To answer (b), we run PC and FGES to search for causal explanations on the training data of each fold. We train predictive models using the features identified as effects of a specific characteristic and predict individual characteristics on the test set. To answer (c), we add metadata to the models trained in (a) and (b) to see whether the performance improves.

Footnote: We tried different values of c = 1, ., ., without noticing a significant change in predictive performance.

Figure 4: Stability of feature selection for Big5 personality. (a) Agreeableness (b) Conscientiousness (c) Extraversion (d) Neuroticism (e) Openness

Table 3: Root mean square error (RMSE) and coefficient of determination (R²) results for the personality prediction using visitors' choices and metadata
Approach   Agr R²   Agr RMSE   Cons R²   Cons RMSE   Extr R²   Extr RMSE   Neu R²   Neu RMSE   Open R²   Open RMSE
Visitors' choices
LASSO      -0.090   0.171      -0.085    0.175       -0.087    0.242       -0.121   0.192      -0.065    0.171
PC         1.143
FGES
LASSO      5.942    0.166      1.152     0.174       -0.338    0.242       2.109
FGES
The five personality traits are Extraversion (Extr), Agreeableness (Agr), Conscientiousness (Cons), Neuroticism (Neu), and Openness (Open). LASSO: α = 0.01. PC: p = 0.05. FGES: c = 0.1.

To identify causal relationships, we employ a Tetrad (Spirtes et al. analysis to manipulate and individually study the different individual characteristics. Tetrad provides different causal search algorithms that can be applied when there may be unobserved confounders of measured variables and that output graphical representations (Scheines et al. c in the BIC formula, is set to 0.1 in order to get a similar number of features as the PC algorithm obtains from the training data.

Stability Test
To compare the stability of the feature selection methods, we calculated the selection probability of each feature in the Big5 personality prediction task under bootstrapping, following (Mandt et al.

Predictive Performance
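The two metrics used in this evaluation, RMSE and R² reported as a percentage, can be sketched in a few lines. This is a minimal illustration rather than the evaluation code behind the tables; the function names `rmse` and `r2_percent` are assumptions.

```python
import math


def _check(y_obs, y_pred):
    """Both sequences must be non-empty and of equal length."""
    if len(y_obs) == 0 or len(y_obs) != len(y_pred):
        raise ValueError("y_obs and y_pred must be non-empty and equally long")


def rmse(y_obs, y_pred):
    """Root mean squared error between observed and predicted scores."""
    _check(y_obs, y_pred)
    squared_error = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    return math.sqrt(squared_error / len(y_obs))


def r2_percent(y_obs, y_pred):
    """Coefficient of determination scaled by 100.

    0 means the model is no better than always predicting the mean of
    the observed scores; negative values mean it is worse.
    """
    _check(y_obs, y_pred)
    mean_obs = sum(y_obs) / len(y_obs)
    residual = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    total = sum((o - mean_obs) ** 2 for o in y_obs)
    if total == 0:
        raise ValueError("observed scores have zero variance")
    return 100.0 * (1.0 - residual / total)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    mean_baseline = [2.5, 2.5, 2.5, 2.5]
    print(rmse(observed, mean_baseline), r2_percent(observed, mean_baseline))
```

Predicting the mean for every instance yields an R² of exactly 0 under this definition, which is why the tables can be read against a constant average baseline.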
To evaluate performance for continuous outcomes, we use root mean squared error (RMSE) and the coefficient of determination (R²). RMSE measures the difference between the values predicted by the model and the observed values. RMSE is given by the following formula:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} \left( y_{\mathrm{obs}}^{t} - y_{\mathrm{pred}}^{t} \right)^{2}}{n}}$$

where $y_{\mathrm{obs}}^{t}$ and $y_{\mathrm{pred}}^{t}$ are the observed and predicted scores for instance t, and n is the sample size. R² compares the model's squared error with that of a baseline that always predicts the mean observed score. It is expressed as:

$$R^{2} = 100 \times \left( 1 - \frac{\sum_{t=1}^{n} \left( y_{\mathrm{obs}}^{t} - y_{\mathrm{pred}}^{t} \right)^{2}}{\sum_{t=1}^{n} \left( y_{\mathrm{obs}}^{t} - \bar{y}_{\mathrm{obs}} \right)^{2}} \right)$$

Above, $\bar{y}_{\mathrm{obs}} = \frac{1}{n} \sum_{t=1}^{n} y_{\mathrm{obs}}^{t}$ is the mean of the observed scores. R² contains the ratio of the squared prediction error to the empirical variance of the scores. If this ratio is small, the prediction is accurate and R² is large. If the ratio approaches 1, then R² approaches 0. Negative values indicate that the model predicts worse than the constant mean baseline.

Table 4: Root mean square error (RMSE) and coefficient of determination (R²) results for the demographics using visitors' choices and metadata
Approach   Age R²   Age RMSE   Children R²   Children RMSE   Income R²   Income RMSE
Visitors' choices
LASSO      2.294    0.146      -16.53        0.153           -0.352      0.256
PC         13.675
FGES
LASSO      8.432    0.141      -2.386        0.143           1.160       0.255
PC         21.684
FGES
LASSO: α = 0.01. PC: p = 0.05. FGES: c = 0.1.

Usually, the metadata are informative for individual characteristics; e.g., the trip cost is correlated with income and the number of children, and gender is correlated with personality. To examine the effect of metadata on prediction performance, we use two datasets in the experiments: one includes only the visitors' choices, while the other includes both the visitors' choices and metadata. Table 3 shows the prediction performance for the Big5. The LASSO model without metadata shows that visitors' choices alone did not capture enough predictive information, because all R² scores are lower than 0. We also compare the prediction performance of the three feature selection methods. After employing causal identification in feature selection, the performance is higher than with LASSO. The highest R² score is obtained from FGES in predicting extraversion (R² = 2.). The lowest RMSE is obtained from the PC and FGES models in openness prediction. When the metadata are added to the models, the overall performance of all models improves; e.g., the performance of predicting agreeableness increased from R² = 1. to R² = 6., which is the best performance for the Big5.

Table 4 shows the prediction performance for demographic information. Among the three outcomes (age, the number of children, and income), visitors' choices are more helpful for predicting age (R² = 2.294) than for predicting the number of children (R² = −16.53) and income (R² = −0.352) in the LASSO models. This means the visitors' choices are more informative for inferring age. Meanwhile, after metadata are included in the model, the overall performance improves. In general, the R² results show that the models based on causal identification outperform the constant average baseline. The FGES model achieved the best performance for predicting age and the number of children, with R² = 13. and R² = 0. The PC model achieved the best performance for predicting income, with R² = 8.

In summary, there are two findings from the predictive evaluation: 1) the results support the effectiveness of metadata in predicting individual characteristics; 2) features selected based on causal identification can improve the prediction performance for individual characteristics.

Causal Relationship Analysis
In addition to building models that predict individual characteristics, we are also interested in understanding the causal relationship between a visitor's characteristics and their choices. The Tetrad method specifies causal relations among the variables via a directed graph. The edge X → Y can be interpreted as X having a direct causal effect on Y. We extract causal relationships between personality traits and visitors' choices with metadata from the causal Bayesian network (CBN), which are shown in Table 5. Within the 505 locations, agreeableness and neuroticism each have direct causal effects on only 5 of the guests' choices. Agreeableness links to some specific quick food restaurants and some gift shops (we omit actual names of these locations). People who score higher on agreeableness tend to visit popular parks (Park A and Park B), which are visited by most people. People who score higher on neuroticism are linked to some family-style restaurants and dining events, which suggests they tend to enjoy the comfort associated with such environments. Conscientiousness has direct causal effects on 10 locations, most of which are facilities located in a specific hotel area of the theme park resort, which suggests that conscientious people may be more likely to spend time near their hotel. Openness has direct causal effects on 10 location visits including thrill rides, quick food restaurants, and indoor theatres. Compared with other personality traits, people who score higher on openness are linked to most thrill rides with big drops, which suggests they enjoy new experiences and seek out adventure. Extraversion has direct effects on visits to many different places (13 places, located in 6 different parks).

Table 5: Identified items that are caused by personality, found by the PC algorithm. Each [.] represents a facility or a meta feature used as a feature. Apart from the personal metadata, only the facility IDs are used in our method. Due to confidentiality, we do not list the names of the facilities; we only show the metadata of the facilities to interpret the results. The metadata of the facilities themselves are not used in the analysis. The blue-colored features indicate that the causal relationships are mutual.
Trait: Services/Metadata
Agr: [Park A: Restaurants, Quick Service], [Park B: Toys, Apparel, Accessories], [Park A: Housewares, Food], [Park A: Gift shop], [Park B: Housewares, Apparel, Accessories], [age], [gender]
Cons: [Park A: Camera, Media, Apparel, Accessories], [Resort A], [Resort B: Spa, Pool Bars], [Resort B: Restaurants, Quick Service], [Resort B: Restaurants, Table Service], [Park: Restaurants, Quick Service], [Resort C: Apparel, Accessories], [Park F: Camera, Media, Apparel, Accessories], [Park C: Gift shop], [Park D: Apparel, Accessories], [income]
Neu: [Park A: Restaurants, Character dining], [Park A: Restaurants, Table service], [Resort D: Restaurants, Buffet/Family Style], [Park A: Restaurants, Table service], [Park A: Gift shop], [age], [income]
Extr: [Park C: Restaurant, Quick Service], [Park A: Character showcase, Preschool, Kids], [Park C: Gift shop, Apparel, Accessories], [Park C: Indoor theater], [Park D: Gift shop, Camera, Media], [Resort A], [Resort D: Lounges], [Park A: Apparel, Accessories], [Park D: Restaurant, Character Dining, Family Style], [Park E: Gift shop], [Park A: Art Collection, Gift shop], [Resort E: Health beauty], [Park F: Gift shop, Apparel, Accessories], [gender], [
Discussion
We demonstrated predictive improvements by using causal identification. Personality inference is complex and can be influenced by factors such as age (Caspi and Silva 1995), income (Raadal et al.
Conclusion
In our study, we focused on three main tasks: (1) we employed causal identification to select the features that are most informative of individual characteristics; (2) we built personal traits prediction models based on guests' choices and metadata; and (3) we employed causal relationship analysis to obtain human-interpretable results. Our investigation has shown that models using causal identification significantly outperform the baseline models in predicting all individual characteristics, and it has demonstrated causal relationships between guests' choices and their personality traits.
Acknowledgement
We thank Michelle Ma for providing the illustration of an example in our paper. We are also grateful to Kun Zhang for his valuable advice and guidance on identifying causal relationships using Tetrad.
References
Stephanie Booth-Kewley and Ross R Vickers. Associations between major domains of personality and health behavior. Journal of personality, 62(3):281–298, 1994.
Stephen Campbell, Lyndsay Henry, Jackie Hammelman, and Maya Pignatore. Personality and smoking behaviour of non-smokers, previous smokers, and habitual smokers. J Addict Research & Therapy, 5:191, 2014.
Gian Vittorio Caprara, Shalom Schwartz, Cristina Capanna, Michele Vecchione, and Claudio Barbaranelli. Personality and politics: Values, traits, and political choice. Political psychology, 27(1):1–28, 2006.
Avshalom Caspi and Phil A Silva. Temperamental qualities at age three predict personality traits in young adulthood: Longitudinal evidence from a birth cohort. Child development, 66(2):486–498, 1995.
Avshalom Caspi, Brent W Roberts, and Rebecca L Shiner. Personality development: Stability and change. Annu. Rev. Psychol., 56:453–484, 2005.
Antonio Chirumbolo and Luigi Leone. Personality and politics: The role of the hexaco model of personality in predicting ideology and voting. Personality and Individual Differences, 49(1):43–48, 2010.
Mark Cook, Alison Young, Dean Taylor, and Anthony P Bedford. Personality correlates of alcohol consumption. Personality and Individual Differences, 24(5):641–647, 1998.
Tao Ding and Shimei Pan. Personalized emphasis framing for persuasive message generation. arXiv preprint arXiv:1607.08898, 2016.
Aronson Elliot, Wilson Timothy D., and Robin M. Akert. Social psychology. Pearson Education Canada, 2012.
Golnoosh Farnadi, Susana Zoghbi, Marie-Francine Moens, and Martine De Cock. Recognising personality traits using facebook status updates. In ICWSM13. AAAI, 2013.
Bob M Fennis and Ad Th H Pruyn. You are what you wear: Brand personality influences on consumer impression formation. Journal of Business Research, 60(6):634–639, 2007.
Kyriaki G Giota and George Kleftaras. The role of personality and depression in problematic use of social networking sites in greece. Cyberpsychology: Journal of Psychosocial Research on Cyberspace, 7(3), 2013.
Jennifer Golbeck, Cristina Robles, and Karen Turner. Predicting personality with social media. In CHI'11 extended abstracts on human factors in computing systems, pages 253–262. ACM, 2011.
Lewis R Goldberg. The structure of phenotypic personality traits. American psychologist, 48(1):26, 1993.
Samuel D Gosling, Sei Jin Ko, Thomas Mannarelli, and Margaret E Morris. A room with a cue: personality judgments based on offices and bedrooms. Journal of personality and social psychology, 82(3):379, 2002.
Fred I Greenstein. Can personality and politics be studied systematically? Political Psychology, pages 105–128, 1992.
Isabelle Guyon, Constantin Aliferis, and André Elisseeff. Causal feature selection. Computational methods of feature selection, pages 63–82, 2007.
Lois W Hoffman. The influence of the family environment on personality: Accounting for sibling differences. Psychological Bulletin, 110(2):187, 1991.
Geert Hofstede and Robert R McCrae. Personality and culture revisited: Linking traits and dimensions of culture. Cross-cultural research, 38(1):52–88, 2004.
Guido W Imbens and Donald B Rubin. Causal inference in statistics, social, and biomedical sciences. Cambridge University Press, 2015.
Ahmad Jamal and Mark MH Goode. Consumers and brands: a study of the impact of self-image congruence on brand preference and satisfaction. Marketing Intelligence & Planning, 19(7):482–492, 2001.
Michal Kosinski, David Stillwell, and Thore Graepel. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15):5802–5805, 2013.
Michal Kosinski, Sandra C Matz, Samuel D Gosling, Vesselin Popov, and David Stillwell. Facebook as a research tool for the social sciences: Opportunities, challenges, ethical considerations, and practical guidelines. American Psychologist, 70(6):543, 2015.
Chin-Feng Lin. Segmenting customer brand preference: demographic or psychographic. Journal of Product & Brand Management, 11(4):249–268, 2002.
Stephan Mandt, Florian Wenzel, Shinichi Nakajima, John Cunningham, Christoph Lippert, and Marius Kloft. Sparse probit linear mixed model. Machine Learning, pages 1–22, 2017.
Alan Mislove, Sune Lehmann, Yong-Yeol Ahn, Jukka-Pekka Onnela, and J Niels Rosenquist. Understanding the demographics of twitter users. ICWSM, 11:5th, 2011.
Philip Moody. The relationships of quantified human smoking behavior and demographic variables. Social Science & Medicine. Part A: Medical Psychology & Medical Sociology, 14(1):49–54, 1980.
Gaby Odekerken-Schröder, Kristof De Wulf, and Patrick Schumacher. Strengthening outcomes of retailer–consumer relationships: The dual impact of relationship marketing tactics and consumer personality. Journal of business research, 56(3):177–190, 2003.
Judea Pearl. Causality. Cambridge university press, 2009.
James W Pennebaker, Roger J Booth, and Martha E Francis. Linguistic inquiry and word count: Liwc [computer software]. Austin, TX: liwc. net, 2007.
M Raadal, P Milgrom, P Weinstein, L Mancl, and AM Cauce. The prevalence of dental anxiety in children from low-income families and its relationship to personality traits. Journal of dental research, 74(8):1439–1443, 1995.
Adrian E Raftery. Bayesian model selection in social research. Sociological methodology, pages 111–163, 1995.
Joseph D Ramsey. Scaling up greedy equivalence search for continuous variables. CoRR, abs/1507.07749, 2015.
Richard Scheines, Peter Spirtes, Clark Glymour, Christopher Meek, and Thomas Richardson. The tetrad project: Constraint based aids to causal model specification. Multivariate Behavioral Research, 33(1):65–117, 1998.
Bernhard Schölkopf, Dominik Janzing, Jonas Peters, Eleni Sgouritsa, Kun Zhang, and Joris Mooij. On causal and anticausal learning. arXiv preprint arXiv:1206.6471, 2012.
H Andrew Schwartz, Johannes C Eichstaedt, Margaret L Kern, Lukasz Dziurzynski, Stephanie M Ramones, Megha Agrawal, Achal Shah, Michal Kosinski, David Stillwell, Martin EP Seligman, et al. Personality, gender, and age in the language of social media: The open-vocabulary approach. PloS one, 8(9):e73791, 2013.
Timothy W Smith and Avron Spiro III. Personality, health, and aging: Prolegomenon for the next generation. Journal of Research in Personality, 36(4):363–394, 2002.
Jacqui Spence, Russell Abratt, and Bobby Amos Malabie. Use of psychographics in consumer market segmentation: the south african experience. South African Journal of Business Management, 28(2):33–41, 1997.
Peter Spirtes and Christopher Meek. Learning bayesian networks with discrete variables from data. In KDD, volume 1, pages 294–299, 1995.
Peter Spirtes, Clark N Glymour, and Richard Scheines. Causation, prediction, and search. MIT press, 2000.
Siriporn Sujithamrak and Terry Lam. Relationship between customer complaint behavior and demographic characteristics: A study of hotel restaurants' patrons. Asia Pacific Journal of Tourism Research, 10(3):289–307, 2005.
Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
William Yang Wang, Edward Lin, and John Kominek. This text has the scent of starbucks: A laplacian structured sparsity model for computational branding analytics. In EMNLP, Seattle, WA, USA, 2013.
Thomas A Widiger. Personality and psychopathology. World Psychiatry, 10(2):103–106, 2011.
Chao Yang, Shimei Pan, Jalal Mahmud, Huahai Yang, and Padmini Srinivasan. Using personal traits for brand preference prediction. In EMNLP, pages 86–96, 2015.
Lei Yu and Huan Liu. Feature selection for high-dimensional data: A fast correlation-based filter solution. In