Using Machine Learning to Guide Cognitive Modeling: A Case Study in Moral Reasoning
Mayank Agrawal ([email protected])
Department of Psychology, Princeton University
Joshua C. Peterson ([email protected])
Department of Computer Science, Princeton University
Thomas L. Griffiths ([email protected])
Departments of Psychology and Computer Science, Princeton University
Abstract
Large-scale behavioral datasets enable researchers to use complex machine learning algorithms to better predict human behavior, yet this increased predictive power does not always lead to a better understanding of the behavior in question. In this paper, we outline a data-driven, iterative procedure that allows cognitive scientists to use machine learning to generate models that are both interpretable and accurate. We demonstrate this method in the domain of moral decision-making, where standard experimental approaches often identify relevant principles that influence human judgments, but fail to generalize these findings to "real world" situations that place these principles in conflict. The recently released Moral Machine dataset allows us to build a powerful model that can predict the outcomes of these conflicts while remaining simple enough to explain the basis behind human decisions.
Keywords: machine learning; moral psychology
Introduction
Explanatory and predictive power are hallmarks of any useful scientific theory. However, in practice, psychology tends to focus more on explanation (Yarkoni & Westfall, 2017), whereas machine learning is almost exclusively aimed at prediction. The necessarily restrictive nature of laboratory experiments often leads psychologists to test competing hypotheses by running highly controlled studies on tens or hundreds of subjects. Although this procedure gives a better understanding of the specific phenomenon, it can be difficult to generalize the findings and predict behavior in the "real world," where multiple factors interact with one another. Conversely, machine learning takes full advantage of complex, nonlinear models that excel in tasks ranging from image classification (Krizhevsky et al., 2012) to video game playing (Mnih et al., 2015). The performance of these models scales with their level of expressiveness (Huang et al., 2018), which results in millions of parameters that are difficult to interpret.

Interestingly, machine learning has long utilized insights from cognitive psychology and neuroscience (Rosenblatt, 1958; Sutton & Barto, 1981; Ackley et al., 1985; Elman, 1990), a trend that continues to this day (Banino et al., 2018; Lázaro-Gredilla et al., 2019). We believe that the reverse direction has been underutilized, but could be just as fruitful. In particular, psychology could leverage machine learning to improve both the predictive and explanatory power of cognitive models. We propose a method (summarized in Figure 1) that enables cognitive scientists to use large-scale behavioral datasets to construct interpretable models that rival the performance of complex, black-box algorithms.

Figure 1: A systematic, data-driven procedure for building interpretable models that rival the predictive power of complex machine learning models.

This methodology is inspired by Box's loop (Box & Hunter, 1962; Blei, 2014; Linderman & Gershman, 2017), a systematic process of integrating the scientific method with exploratory data analysis. Our key insight is that training a black-box algorithm gives a sense of how much variance in a certain type of behavior can be predicted. This predictive power provides a standard for improvement in explicit cognitive models (Khajah et al., 2016). By continuously critiquing an interpretable cognitive model with respect to these black-box algorithms, we can identify and incorporate new features until its performance converges, thereby jointly maximizing our two objectives of explanatory and predictive power.

In this paper, we demonstrate this methodology by building a statistical model of moral decision-making. Philosophers and psychologists have historically conducted thought experiments and laboratory studies isolating individual principles responsible for human moral judgment (e.g., consequentialist ones such as harm aversion, or deontological ones such as not using others as a means to an end). However, it can be difficult to predict the outcomes of situations in which these principles conflict (Cushman et al., 2010). The recently released Moral Machine dataset (Awad et al., 2018) allows us to build a predictive model of how humans navigate these conflicts over a large problem space. We start with a basic rational choice model and iteratively add features until its accuracy rivals that of a neural network, resulting in a model that is both predictive and interpretable.

Background
Theories of Moral Decision-Making
The two main families of moral philosophy often used to describe human behavior are consequentialism and deontology. Consequentialist theories posit that moral permissibility is evaluated solely with respect to outcomes, and that one should choose the outcome with the highest value (Greene, 2007). On the other hand, deontological theories evaluate moral permissibility with respect to actions and whether they correspond to specific rules or rights.

The trolley car dilemma (Foot, 2002; Thomson, 1984) highlights how these two families differ when making moral judgments. Here, participants must determine whether it is morally permissible to sacrifice an innocent bystander in order to prevent a trolley car from killing five railway workers. The "switch" scenario gives the participant the option to redirect the car to a track with one railway worker, whereas the "push" scenario requires the participant to push a large man directly in front of the car to stop it, killing the large man in the process. Given that the outcomes are the same for the "switch" and "push" scenarios (i.e., intervening results in one death, while not intervening results in five deaths), consequentialism prescribes intervention in both scenarios. Deontological theories allow for intervening in the "switch" scenario but not the "push" scenario, because pushing a man to his death violates a moral principle, whereas switching the direction of a train does not.

Empirical studies have found that people are much more willing to "switch" than to "push" (Greene et al., 2001; Cushman et al., 2006), suggesting deontological principles factor heavily in human moral decision-making. Yet a deontological theory's lack of systematicity makes it difficult to evaluate as a model of moral judgment (Greene, 2017). What are the rules that people invoke, and how do they interact with one another when in conflict? Furthermore, how do they interact with consequentialist concerns? Would people who refuse to push a man to his death to save five railway workers still make the same decision, and with the same level of confidence, when there are a million railway workers? Any theory of human moral cognition needs to be able to model how participants trade off different consequentialist and deontological factors.
Moral Machine Paradigm
As society anticipates autonomous cars roaming its streets in the near future, the trolley car dilemma has left the moral philosophy classroom and entered national policy conversations. A group of researchers aiming to gauge public opinion created "Moral Machine," an online game that presents users with moral dilemmas (see Figure 2) centered around autonomous cars (Awad et al., 2018). Comprising roughly forty million decisions from users in over two hundred countries, the Moral Machine experiment is the largest publicly collected dataset on human moral judgment.

Figure 2: Two sample dilemmas in the Moral Machine dataset. In every scenario, the participant is asked to choose whether to stay or swerve (Awad et al., 2018). (a) An autonomous car is headed towards a group of three pedestrians who are illegally crossing the street. The car can either stay and kill these pedestrians or swerve and kill three other pedestrians crossing legally. (b) An autonomous car with five human passengers is headed towards a group of pedestrians who are illegally crossing the street. Staying on course will kill the pedestrians but save the passengers, while swerving will kill the passengers but save the pedestrians.

In addition to the large number of decisions, the experiment operated over a rich problem space. Twenty unique agent types (e.g., man, girl, dog) along with contextual information (e.g., crossing signals) enabled researchers to measure the outcomes of nine manipulations: action versus inaction, passengers versus pedestrians, males versus females, fat versus fit, low status versus high status, lawful versus unlawful, elderly versus young, more lives saved versus fewer, and humans versus pets. The coverage and density of this problem space provide the opportunity to build a model that predicts how humans make moral judgments when a variety of different principles are at play.

Predicting Moral Decisions

As described earlier, the iterative refinement method we propose begins with both an initial, interpretable model and a more predictive black-box algorithm. In this section, we do exactly this by contrasting rational choice models derived from moral philosophy with multilayer feedforward neural networks.
Model Descriptions
We restricted our analysis to a subset of the dataset (N = …).

Interpretable Models
Choice models (CM) are ubiquitous in both psychology and economics, and they form the basis of our interpretable model in this paper (Luce, 1959; McFadden et al., 1973). In particular, we assume that participants construct the values for both sides, i.e., v_left and v_right, and choose to save the left side when v_left > v_right, and vice versa. The value of each side is determined by aggregating the utilities of all its agents:

    v_{\text{side}} = \sum_i u_i l_i \qquad (1)

where u_i is the utility given to agent i and l_i is a binary indicator of agent i's presence on the given side.

McFadden et al. (1973) proved that if individual variation around this aggregate utility follows a Weibull distribution, the probability that v_left is optimal is consistent with the exponentiated Luce choice rule used in psychology, i.e.,

    P(v_{\text{left}} > v_{\text{right}}) = P(c = \text{left} \mid v_{\text{left}}, v_{\text{right}}) = \frac{e^{v_{\text{left}}}}{e^{v_{\text{left}}} + e^{v_{\text{right}}}} \qquad (2)

In practice, we can implement this formalization by using logistic regression to infer the utility vector u. We built three models, each of which placed different top-down constraints on the utility vector. Our first model, "Equal Weight," required each agent to be equally weighted. At the other extreme, our "Utilitarian" model had no restriction. A third model, "Animals vs. People," was a hybrid: all humans were weighted equally and all animals were weighted equally, but humans and animals could be weighted differently.

Research in moral psychology and philosophy has found that humans use moral principles in addition to standard utilitarian reasoning when choosing between options (Quinn, 1989; Spranca et al., 1991; Mikhail, 2002; Royzman & Baron, 2002; Baron & Ritov, 2004; Cushman et al., 2006). For example, one principle may be that allowing harm is more permissible than doing harm (Woollard & Howard-Snyder, 2016). In order to incorporate these principles, we moved beyond utilitarian-based choice models by expanding the definition of a side's value:

    v_{\text{side}} = \sum_i u_i l_i + \sum_m \lambda_m f_m \qquad (3)

where f_m is an indicator variable of whether principle m is present on the side and λ_m represents the importance of principle m. We built an "Expanded" model that introduces two principles potentially relevant in the Moral Machine dataset. The first is a preference for allowing harm over doing harm, thus penalizing sides that require the car to swerve in order to save them. Another potentially relevant principle is that it is more justified to punish unlawful pedestrians than lawful ones, because they knowingly waived their rights when crossing illegally (Nino, 1983). This model was trained on the dataset to infer the values of u and λ.
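To make the estimation procedure concrete, the following is a minimal sketch of fitting the "Utilitarian" variant of Equations 1 and 2 with off-the-shelf logistic regression; the data here are random placeholders, and names such as `left_counts` are our own rather than from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data: counts of the twenty agent types on each side of
# each dilemma, plus a binary "saved the left side" response.
rng = np.random.default_rng(0)
left_counts = rng.integers(0, 3, size=(1000, 20)).astype(float)
right_counts = rng.integers(0, 3, size=(1000, 20)).astype(float)
saved_left = rng.integers(0, 2, size=1000)

# Equation 2 implies P(left) = sigmoid(v_left - v_right), so regressing
# the choice on the difference in agent counts recovers the utility
# vector u. No intercept, since the two sides are interchangeable.
X = left_counts - right_counts
model = LogisticRegression(fit_intercept=False).fit(X, saved_left)
utilities = model.coef_[0]  # one inferred utility per agent type
```

In this framing, the "Equal Weight" and "Animals vs. People" variants correspond to constraining the regression to one or two shared coefficients, and the "Expanded" model adds the principle indicators f_m as extra columns of X.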
Neural Networks

We use relatively expressive multilayer feedforward neural networks (NN) to provide an estimate of the level of performance that statistical models can achieve in this domain. These networks were given as inputs the forty-two variables that uniquely defined a dilemma to each participant: twenty for the characters on the left side, twenty for the characters on the right side, one for the side of the car, and one for the crossing signal status. These are the same inputs as for the "Expanded" choice model. However, the "Expanded" model had the added restriction that the side did not change an agent's utility (e.g., a girl on the left side has the same utility as a girl on the right side), while the neural network had no such restriction.

The networks were trained to minimize the cross-entropy between the model's output and human binary decisions. The final layer of the neural networks is similar to the choice model in that it constructs the value of each side by weighting different features. However, in these networks, the principles are learned from the nonlinear interactions of multiple layers and the indicators are probabilistic rather than deterministic.

To find the optimal hyperparameters, we conducted a grid search, varying the number of hidden layers, the number of hidden neurons, and the batch size. All networks used the same ReLU activation function and no dropout. Given that most of these models both performed similarly and showed a clear improvement over simple choice models, we did not conduct a more extensive hyperparameter search. A neural network with three 32-unit hidden layers was used for all the analyses in this paper.
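As a rough sketch of a comparable network (the paper does not specify its implementation framework, so scikit-learn is an assumption here), with random placeholder data standing in for the encoded dilemmas:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder encoding: 20 left-side agent counts, 20 right-side
# counts, the side of the car, and the crossing-signal status.
rng = np.random.default_rng(0)
X = rng.random((5000, 42))
y = rng.integers(0, 2, size=5000)

# Three 32-unit ReLU hidden layers; MLPClassifier minimizes log-loss
# (cross-entropy) by default, matching the training objective above.
net = MLPClassifier(hidden_layer_sizes=(32, 32, 32), activation="relu",
                    batch_size=512, max_iter=200, random_state=0)
net.fit(X, y)
print(net.predict_proba(X[:5]))  # predicted choice probabilities
```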
Model Comparisons
Standard Metrics
Table 1 displays the results of the four rational choice models and the best performing neural network. All models were trained on eighty percent of the dataset, and the reported results reflect performance on the held-out twenty percent. We report accuracy and area under the curve (AUC), two standard metrics for evaluating classification models. We also calculate the normalized Akaike information criterion (AIC), a metric for model comparison that integrates a model's predictive power and simplicity. All metrics resulted in the same expected ranking of models: Neural Network, Expanded, Utilitarian, Animals vs. People, Equal Weight.

Table 1: Comparison of Standard Metrics
Model Type           Accuracy   AUC     AIC
Equal Weight         0.571      0.616   1.…
Animals vs. People   0.630      0.702   1.…
Utilitarian          0.732      0.780   1.…
Expanded             0.763      0.826   1.…
Neural Network       0.774      0.845   0.…
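For concreteness, the three metrics reported in Table 1 can be computed as in the sketch below. The paper's exact AIC normalization is not specified, so the function returns the standard AIC, and its name and arguments are our own.

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss, roc_auc_score

def evaluate(y_true, p_left, n_params):
    """Accuracy, AUC, and AIC for predicted save-left probabilities."""
    acc = accuracy_score(y_true, p_left > 0.5)
    auc = roc_auc_score(y_true, p_left)
    # AIC = 2k - 2 log L, with log L the total log-likelihood of the
    # binary choices under the model's predicted probabilities.
    total_log_lik = -log_loss(y_true, p_left, normalize=False)
    aic = 2 * n_params - 2 * total_log_lik
    return acc, auc, aic
```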
Performance as a Function of Dataset Size

Table 1 demonstrates that our cognitive models are not as predictive as a powerful learning algorithm. This result, however, is only observable with larger datasets. Figure 3 plots each metric for each model over a large range of dataset sizes. Choice models performed very well at dataset sizes comparable to that of a large laboratory experiment. Conversely, neural networks improved with larger dataset sizes until reaching an asymptote at large N.
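An analysis of this kind can be sketched as follows, assuming `make_model` constructs a fresh classifier and `X`, `y` hold the encoded dilemmas and responses (all names here are ours, not the paper's):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def dataset_size_curve(make_model, X, y, sizes, n_splits=5, seed=0):
    """Mean held-out AUC at each dataset size, over five 80/20 splits."""
    curve = {}
    for n in sizes:
        scores = []
        for s in range(n_splits):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X[:n], y[:n], test_size=0.2, random_state=seed + s)
            model = make_model().fit(X_tr, y_tr)
            p_left = model.predict_proba(X_te)[:, 1]
            scores.append(roc_auc_score(y_te, p_left))
        curve[n] = float(np.mean(scores))
    return curve
```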
Identifying Explanatory Principles

The neural network gives us an aspirational standard for how well our simpler model could perform. Next, our task is to identify the emergent features it constructs and incorporate them into our simple choice model.
Calculating Residuals in Problem Aggregates
By aggregating decisions for each dilemma, we can determine the empirical "difficulty" of each dilemma and whether our models predict this difficulty. For example, assume dilemmas A and B have each been posed to one hundred participants. If ninety participants exposed to dilemma A chose to save the left side and sixty participants exposed to dilemma B did, the empirical percentages for A and B would be 0.90 and 0.60, respectively. An accurate model of moral judgment should not only reflect the binary responses but also the confidence behind those responses.

We identified the specific problems where the neural network excelled compared to the "Expanded" rational choice model. Manually inspecting these problems and clustering them into groups revealed useful features, beyond those employed in the choice model, that the neural network is constructing. We formalized these features as principles and incorporated them into the choice model to improve prediction. Two examples are represented in Table 2.

Table 2a describes a set of scenarios where one human is crossing illegally and one pet is crossing legally. Empirically, users overwhelmingly prefer saving the human, while the choice model predicts the opposite. Our choice model's inferred utilities and importance values reveal a strong penalty (i.e., a large negative coefficient) for (1) humans crossing illegally and (2) requiring the car to swerve. However, the empirical data suggest that these penalties are outweighed by the fact that this is a humans-versus-animals dilemma, and that the human should be preferred regardless of crossing or intervention status. Thus, the next iteration of our model should incorporate a binary variable signifying whether this is an explicit humans-versus-animals dilemma.

We can conduct a similar analysis for the set of scenarios in Table 2b. Both models output significantly different decision probabilities, the neural network being the more accurate of the two. Most salient to us was an effect of age. Specifically, when the principal difference between the two sides is age, both boys and girls should be saved at a much higher rate, and information about their crossing and intervention status is less relevant. To capture this fact, we can incorporate another binary variable signifying whether the only difference between the agents on each side is age.
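A minimal pandas sketch of this residual analysis follows, assuming a DataFrame with one row per response, a `dilemma_id` identifying each unique scenario, and per-row model probabilities; all column names are our own.

```python
import pandas as pd

def flag_residual_problems(df, top=50):
    """Rank dilemmas by how badly the choice model (CM) misses the
    empirical save-left rate relative to the neural network (NN)."""
    agg = df.groupby("dilemma_id").agg(
        empirical=("saved_left", "mean"),  # empirical difficulty
        cm=("p_choice_model", "mean"),
        nn=("p_neural_net", "mean"),
    )
    agg["cm_error"] = (agg["empirical"] - agg["cm"]).abs()
    agg["nn_error"] = (agg["empirical"] - agg["nn"]).abs()
    # Keep dilemmas where the NN clearly outpredicts the CM; these are
    # the ones to inspect and cluster by hand for candidate principles.
    return (agg[agg["nn_error"] < agg["cm_error"]]
            .sort_values("cm_error", ascending=False)
            .head(top))
```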
Incorporating New Features
The two features we identified are a subset of six "problem types" the Moral Machine researchers used in their experiment: humans versus animals, old versus young, more versus less, fat versus fit, male versus female, and high status versus low status. These types were not revealed to the participants, but the residuals we inspected suggest that participants were constructing them from the raw features and then factoring them into their decisions.

Incorporating these six new features as principles resulted in 77.1% accuracy, nearly closing the gap entirely between our choice model and the neural network performance reported in Table 1. Figure 4 illustrates the effects of incorporating the problem types into both the choice model and the neural network in detail. Importantly, we observe that "Neural Network + Types" outperforms "Neural Network" at smaller dataset sizes, but performs identically at larger dataset sizes. This result suggests that the regular "Neural Network" is constructing the problem types we identified as emergent features, given sufficient data to learn them from. More importantly, our augmented choice model now rivals the neural network's predictive power. And yet, by virtue of being a rational choice model with only a few more parameters than our "Expanded" (and even our "Utilitarian") model, it remains conceptually simple. Thus, we have arrived at an interpretable statistical model that can both quantify the effects of utilitarian calculations and moral principles and predict human moral judgment over a large problem space.

Figure 4b still displays a gap between the AUC curves, suggesting there is more to be gained by repeating the process and potentially identifying even more principles. For example, the last iteration found that when there was a humans-versus-animals problem, humans should be strongly favored. However, the residuals suggest that participants do not honor this principle when all the humans are criminals. Rather, in these cases, participants may favor the animals, or prefer the criminals by only a small margin. Thus, our next iteration will include a feature corresponding to whether all the humans are criminals. Our model also underperforms by overweighting the effects of intervention. In problem types such as male versus female and fat versus fit, the intervention variable is weighted much differently than in young-versus-old dilemmas. The next iteration of the model should also include this interaction. Thus, this methodology allows us to continuously build on top of the new features we identify.
Figure 3: Test-set performance metrics of choice models and neural network as a function of dataset size. Models were trainedon five 80/20 training/test splits. Error bars indicate ± Left Side Agents Right Side Agents Car Side Empirical CM NN
Table 2: Aggregated dilemmas where the neural network (NN) predicted empirical responses better than the "Expanded" choice model (CM). Entries give the probability of choosing to save the left side.

Left Side Agents                    Right Side Agents                Car Side   Empirical   CM      NN
Pregnant Woman Crossing Illegally   Cat Crossing Legally             Left       0.779       0.411   0.797
Stroller Crossing Illegally         Cat Crossing Legally             Left       0.826       0.425   0.801
Dog Crossing Legally                Male Doctor Crossing Illegally   Right      0.312       0.693   0.293
Cat Crossing Legally                Man Crossing Illegally           Right      0.308       0.692   0.266
Old Woman Crossing Illegally        Cat Crossing Legally             Left       0.670       0.306   0.622

(a) Problems indicating Human vs. Animals Principle
Left Side Agents             Right Side Agents         Car Side   Empirical   CM      NN
Old Man Crossing Legally     Boy Crossing Illegally    Right      0.350       0.647   0.341
Old Woman Crossing Legally   Girl Crossing Illegally   Right      0.337       0.642   0.321
Man                          Boy                       Left       0.113       0.417   0.097
Old Woman Crossing Legally   Girl Crossing Illegally   Left       0.268       0.570   0.269
Old Woman                    Woman                     Right      0.256       0.475   0.269

(b) Problems indicating Old vs. Young Principle
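The problem-type features above enter the choice model as additional indicator terms f_m in Equation 3. Below is one way such an indicator could be computed; the split of agent-type indices into humans and animals is hypothetical, not the dataset's actual encoding.

```python
import numpy as np

# Hypothetical index assignment for the twenty agent types.
HUMAN_IDX = np.arange(0, 18)
ANIMAL_IDX = np.arange(18, 20)

def humans_vs_animals(left_counts, right_counts):
    """Binary f_m per side for the humans-vs-animals problem type.

    Returns (f_left, f_right): 1 marks an all-human side facing an
    all-animal side, so its weight lambda_m can capture the strong
    preference for saving humans in these dilemmas.
    """
    left_human = (left_counts[HUMAN_IDX].sum() > 0
                  and left_counts[ANIMAL_IDX].sum() == 0)
    right_human = (right_counts[HUMAN_IDX].sum() > 0
                   and right_counts[ANIMAL_IDX].sum() == 0)
    if (left_human and right_counts[HUMAN_IDX].sum() == 0
            and right_counts[ANIMAL_IDX].sum() > 0):
        return 1, 0
    if (right_human and left_counts[HUMAN_IDX].sum() == 0
            and left_counts[ANIMAL_IDX].sum() > 0):
        return 0, 1
    return 0, 0
```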
Conclusion
Large-scale behavioral datasets have the potential to revolutionize cognitive science (Griffiths, 2015), and while data science approaches have traditionally used them to predict behavior, they can additionally help cognitive scientists construct explanations of the given behavior.

Black-box machine learning algorithms give us a sense of the predictive capabilities of our scientific theories, and we outline a methodology that uses them to help cognitive models reach these capabilities:

1. Amass a large-scale behavioral dataset that encompasses a large problem space.
2. Formalize interpretable theories into parameterizable psychological models whose predictions can be evaluated.
3. Compare these models to more accurate, but less interpretable, black-box models (e.g., deep neural networks or random forests).
4. Identify types of problems where the black-box models outperform the simpler models.
5. Formalize these problem types into features and incorporate them into both the simple and complex models.
6. Return to Step 4 and repeat (a schematic sketch of this loop is given below).

We applied this procedure to moral decision-making, starting off with a rational choice model and iteratively adding principles until its predictive power was comparable to that of black-box algorithms. This model allowed us to quantitatively predict the interactions between different utilitarian concerns and moral principles. Furthermore, our results regarding problem types suggest that moral judgment can be better predicted by incorporating alignable differences in similarity judgments (Tversky & Simonson, 1993), such as whether the dilemma is humans-versus-animals or old-versus-young.

The present case study, while successful, is only a limited application of the methodology we espouse, and further demonstrations are required to illustrate its utility.
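As a schematic summary of the loop above (not the authors' code), with `fit_simple`, `fit_black_box`, and `critique` standing in for the domain-specific steps:

```python
from typing import Callable, List, Tuple

Data = List[dict]  # placeholder type for the encoded behavioral dataset

def refine(fit_simple: Callable[[Data], Tuple[object, float]],
           fit_black_box: Callable[[Data], Tuple[object, float]],
           critique: Callable[[object, object, Data], List[dict]],
           data: Data, tol: float = 0.005):
    """Steps 2-6 of the methodology as a loop (schematic sketch only).

    The fit_* callables return (model, held-out score) pairs; critique
    returns one dict of new feature values per data point (Steps 4-5).
    """
    simple, s_score = fit_simple(data)
    black_box, b_score = fit_black_box(data)
    while b_score - s_score > tol:
        new_feats = critique(simple, black_box, data)
        data = [{**row, **feats} for row, feats in zip(data, new_feats)]
        simple, s_score = fit_simple(data)         # Step 6: repeat
        black_box, b_score = fit_black_box(data)
    return simple
```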
Figure 4: Test-set performance metrics before and after incorporating new principles: (a) AIC, (b) AUC, (c) accuracy. Models were trained on five 80/20 training/test splits. Error bars indicate ±…

Acknowledgments
We thank Edmond Awad for providing guidance on navigating the Moral Machine dataset.
References
Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1), 147–169.
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., ... Rahwan, I. (2018). The Moral Machine experiment. Nature, 563(7729), 59–64.
Banino, A., Barry, C., Uria, B., Blundell, C., Lillicrap, T., Mirowski, P., ... others (2018). Vector-based navigation using grid-like representations in artificial agents. Nature, 557(7705), 429–433.
Baron, J., & Ritov, I. (2004). Omission bias, individual differences, and normality. Organizational Behavior and Human Decision Processes, 94(2), 74–85.
Blei, D. M. (2014). Build, compute, critique, repeat: Data analysis with latent variable models. Annual Review of Statistics and Its Application, 1, 203–232.
Box, G. E., & Hunter, W. G. (1962). A useful method for model-building. Technometrics, 4(3), 301–318.
Cushman, F., Young, L., & Greene, J. D. (2010). Our multi-system moral psychology: Towards a consensus view. The Oxford Handbook of Moral Psychology, 47–71.
Cushman, F., Young, L., & Hauser, M. (2006). The role of conscious reasoning and intuition in moral judgment: Testing three principles of harm. Psychological Science, 17(12), 1082–1089.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211.
Foot, P. (2002). The problem of abortion and the doctrine of the double effect. In Virtues and Vices and Other Essays in Moral Philosophy (pp. 19–32).
Greene, J. D. (2007). The secret joke of Kant's soul. In W. Sinnott-Armstrong (Ed.), Moral psychology: The neuroscience of morality: Emotion, brain disorders, and development (Vol. 3, chap. 2). MIT Press.
Greene, J. D. (2017). The rat-a-gorical imperative: Moral intuition and the limits of affective learning. Cognition, 167, 66–77.
Greene, J. D., Sommerville, R. B., Nystrom, L. E., Darley, J. M., & Cohen, J. D. (2001). An fMRI investigation of emotional engagement in moral judgment. Science, 293(5537), 2105–2108.
Griffiths, T. L. (2015). Manifesto for a new (computational) cognitive revolution. Cognition, 135, 21–23.
Huang, Y., Cheng, Y., Chen, D., Lee, H., Ngiam, J., Le, Q. V., & Chen, Z. (2018). GPipe: Efficient training of giant neural networks using pipeline parallelism. arXiv preprint arXiv:1811.06965.
Khajah, M., Lindsey, R. V., & Mozer, M. C. (2016). How deep is knowledge tracing? arXiv preprint arXiv:1604.02416.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097–1105).
Linderman, S. W., & Gershman, S. J. (2017). Using computational theory to constrain statistical models of neural data. Current Opinion in Neurobiology, 46, 14–24.
Luce, R. D. (1959). Individual choice behavior: A theoretical analysis.
Lázaro-Gredilla, M., Lin, D., Guntupalli, J. S., & George, D. (2019). Beyond imitation: Zero-shot task transfer on robots by learning concepts as cognitive programs. Science Robotics, 4(26).
McFadden, D., et al. (1973). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in Econometrics (pp. 105–142). Academic Press.
Mikhail, J. (2002). Aspects of the theory of moral cognition: Investigating intuitive knowledge of the prohibition of intentional battery and the principle of double effect.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... others (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.
Nino, C. S. (1983). A consensual theory of punishment. Philosophy & Public Affairs, 12(4), 289–306.
Quinn, W. S. (1989). Actions, intentions, and consequences: The doctrine of doing and allowing. The Philosophical Review, 98(3), 287–312.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408.
Royzman, E. B., & Baron, J. (2002). The preference for indirect harm. Social Justice Research, 15(2), 165–184.
Spranca, M., Minsk, E., & Baron, J. (1991). Omission and commission in judgment and choice. Journal of Experimental Social Psychology, 27(1), 76–105.
Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88(2), 135–170.
Thomson, J. J. (1984). The trolley problem. Yale Law Journal, 94, 1395–1415.
Tversky, A., & Simonson, I. (1993). Context-dependent preferences. Management Science, 39(10), 1179–1189.
Woollard, F., & Howard-Snyder, F. (2016). Doing vs. allowing harm. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Winter 2016 ed.). Metaphysics Research Lab, Stanford University.
Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100–1122.