Integrating data science ethics into an undergraduate major
Benjamin S. Baumer, Randi L. Garcia, Albert Y. Kim, Katherine M. Kinnaird, Miles Q. Ott
Benjamin S. Baumer∗ (Statistical & Data Sciences, Smith College), Randi L. Garcia (Psychology and Statistical & Data Sciences, Smith College), Albert Y. Kim (Statistical & Data Sciences, Smith College), Katherine M. Kinnaird (Computer Science and Statistical & Data Sciences, Smith College), Miles Q. Ott (Statistical & Data Sciences, Smith College)

August 4, 2020
Abstract
We present a programmatic approach to incorporating ethics into an undergraduate major in statistical and data sciences. We discuss departmental-level initiatives designed to meet the National Academy of Sciences recommendation for weaving ethics into the curriculum from top-to-bottom as our majors progress from our introductory courses to our senior capstone course, as well as from side-to-side through co-curricular programming. We also provide six examples of data science ethics modules used in five different courses at our liberal arts college, each focusing on a different ethical consideration. The modules are designed to be portable such that they can be flexibly incorporated into existing courses at different levels of instruction with minimal disruption to syllabi. We conclude with next steps and preliminary assessments.
Keywords: data ethics, education, case studies, undergraduate curriculum

∗Benjamin S. Baumer is Associate Professor, Statistical & Data Sciences, Smith College, Northampton, MA 01063 (e-mail: [email protected]). This work was not supported by any grant. The authors thank numerous colleagues and students for their support.

“The potential consequences of the ethical implications of data science cannot be overstated.” —National Academies of Sciences, Engineering, and Medicine (2018)
Data ethics is a rapidly-developing yet inchoate subfield of research within the discipline of data science, which is itself rapidly-developing (Wender & Kloefkorn 2017). Within the past two years, awareness that ethical concerns are of paramount importance has grown. In the public sphere, the Cambridge Analytica episode revealed how the large-scale harvesting of Facebook user data without user consent was not only possible, but permissible—and weaponized for political advantage (Davies 2015). Facebook CEO Mark Zuckerberg initially characterized “the idea that fake news on Facebook influenced the [2016 United States Presidential] election in any way” as “pretty crazy”—comments he later regretted (Levin 2017). Nevertheless, the subsequent tongue-lashing and hand-wringing has led to substantive changes in the policies of several large social media platforms, including several prominent public figures being banned. Popular books like O’Neil (2016) and Eubanks (2018) have highlighted how algorithmic bias can steer even well-intentioned data science products into profoundly destructive forces. These incidents have revived a sense among tech professionals and the public at large that ethical considerations are of vital importance.

As academics, it is our responsibility to educate our students about ethical considerations in statistics and data science before they graduate. To that end, recent work by Elliott et al. (2018) addresses how to teach data science ethics. The machine learning community convenes the ACM Conference on Fairness, Accountability, and Transparency (which includes Twitter as a sponsor), which focuses on ethical considerations in machine learning research and development. Some of the first wave of data science textbooks include chapters on ethics (Baumer et al. 2017). Most specifically, the National Academies of Sciences, Engineering, and Medicine Roundtable on Data Science Postsecondary Education devoted one of its twelve discussions to “Integrating Ethics and Privacy Concerns into Data Science Education” (Wender & Kloefkorn 2017). National Academies of Sciences, Engineering, and Medicine (2018) includes the following recommendations for undergraduate programs in data science:

Ethics is a topic that, given the nature of data science, students should learn and practice throughout their education. Academic institutions should ensure that ethics is woven into the data science curriculum from the beginning and throughout.

The data science community should adopt a code of ethics; such a code should be affirmed by members of professional societies, included in professional development programs and curricula, and conveyed through educational programs. The code should be reevaluated often in light of new developments.

In light of this, it seems clear that indifference to ethics in data science is not an informed position; in fact, the default position of indifference prevalent in the tech community is exactly the problem we are trying to help our students solve. In this sense, indifference to ethics in data science is counter to the mission of our program, and in a larger sense to our profession.

In the major in statistical and data sciences at Smith College, we have incorporated discussions of ethics (in one form or another) into all of our classes, including the senior capstone, in which about 25% of the content concerns data science ethics. Especially in light of concerns about academic freedom, we wish to stress that this treatment is not about indoctrinating students about what to think, but rather about forcing students to grapple with the often not-so-obvious ramifications of their data science work and to develop their own compasses for navigating these waters (Heggeseth 2019).
It is not a political stance—it is an educational imperative, as stressed by recommendations 2.4 and 2.5 in National Academies of Sciences, Engineering, and Medicine (2018).

In this paper, we present a programmatic approach to incorporating ethics into an undergraduate major in statistical and data sciences. In Section 2, we review and delineate notions of ethics in data science. We discuss departmental-level initiatives designed to meet the NAS recommendation for weaving ethics into the curriculum from top to bottom, as well as from side to side through co-curricular programming, in Section 3. In Section 4 we provide six different modules that focus on data science ethics and that have been incorporated into five different courses. The modules are designed for portability and are publicly available at our website. We review evidence of our progress in Section 5.

Ethical considerations in statistics have been taught for decades, going back to the classic treatment of misleading data visualization techniques in Huff (1954). In this section, we review the literature on teaching data science ethics with an eye towards explicating different notions of what falls under that umbrella.
From a legal perspective, the General Data Protection Regulation (European Parliament 2018)—which became enforceable in 2018—provides Europeans with greater legal protection for personal data stored online than is present in the United States. This discrepancy highlights the distinction between ethical and legal considerations—the former should be universal, but the latter are patently local. At some level, laws reflect the ethical values of a country, but a profession cannot abdicate its ethical responsibilities to lawmakers. As O’Neil notes: “it is unreasonable to expect the legal system to keep pace with advances in data science” (Wender & Kloefkorn 2017).

Within statistics, a major ethical focus has been on human subjects research.
The Belmont Report is still required reading for institutional review boards (IRBs) (National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research 1978). It posits three major ethical principles (respect for persons, beneficence, and justice) and outlines three major applications (informed consent, assessment of risks and benefits, and selection of subjects). Yet just as we reject the argument that all legal data science projects are ethical, we question the supposition that all IRB-approved data science projects are ethical. IRBs have not been able to keep pace with the rapid development of data science. In one prominent case (https://bit.ly/2v2cf8n), IRB approval was granted only after the data had been collected, meaning that the approval covered the analysis of the data, not the collection or the design of the experiment. This example illustrates how university IRBs are ill-equipped to regulate “big data” studies (Meyer 2014).

Major professional societies, including the American Statistical Association (ASA) (Committee on Professional Ethics 2018b), the Association for Computing Machinery (ACM) (Committee on Professional Ethics 2018a), and the National Academy of Sciences (NAS) (Committee on Science, Engineering, and Public Policy 2009), publish guidelines for conducting research. These documents focus on topics like professionalism, proper treatment of data, negligence, and conflicts of interest. Similarly, Tractenberg (2019a), Tractenberg (2019b), and Gunaratna & Tractenberg (2016) explore ethics in statistical practice but don’t mention newer concepts like algorithmic bias. Loukides et al. (2018) focuses on industry and identifies five framing guidelines for building data products: consent, clarity, consistency, control, and consequences. For oversight, Germany is considering recommendations for a data science ethics review board (Tarran 2019).
Canney & Bielefeldt (2015) present a framework for evaluating ethical development in engineers.

A broader discussion of professional ethics in statistics and data science would include issues surrounding reproducibility and replicability, which would include concepts like transparency, version control, and p-hacking (Wasserstein et al. 2016, 2019). Inappropriate analysis remains a problem in many fields, including biostatistics (Wang et al. 2018). The machine learning community is having intense debates about the extent to which data or algorithms are ultimately most responsible for bias in facial recognition and other AI-driven products (Cai 2020).

While these areas remain crucially important—and continue to play a role in our curriculum—we focus here on more modern manifestations of data science ethics brought on by “big data.” These include primarily algorithmic bias, but also ethical concerns when scraping data from the web, storing personal data online, de-identifying and re-identifying personal data, and large-scale experimentation by internet companies in what Zuboff (2018) terms “surveillance capitalism.” These ethical areas are obviously informed by longstanding ethical principles, but are distinct in the way that computers, the Internet, and databases have transformed the way we live (Hand 2018).

Our focus areas mostly intersect with those identified by National Academies of Sciences, Engineering, and Medicine (2018) as needed by data scientists:

• Ethical precepts for data science and codes of conduct,
• Privacy and confidentiality,
• Responsible conduct of research,
• Ability to identify “junk” science, and
• Ability to detect algorithmic bias.

This paper offers examples for implementing these focus areas. For example, Section 4.6 contains a module that has students apply ethical codes in context. The modules in Sections 4.1 and 4.3 explore notions of privacy and confidentiality.
Sections 4.5 and 4.3 provide modules that illuminate notions of responsibility when conducting research. Sections 4.2 and 4.6 present modules that encourage students to detect algorithmic bias in action. Yet we also go beyond these key areas. The module in Section 4.4 explores boundaries between legal and ethical considerations. In other activities not presented here, we engage students in our senior capstone and machine learning courses with deep questions about the impact that actions by large-scale Internet companies have on our lives.
While discussion about data science ethics abounds, there are few successful models for how statisticians and data scientists can teach it. Indeed, relevant work on teaching data science by Donoho (2017), Hicks & Irizarry (2018), Baumer (2015), Hardin et al. (2015), and Kaplan (2018) either barely mentions ethics or doesn’t mention it at all. Despite recommending the inclusion of ethics into data science curricula, even the National Academies of Sciences, Engineering, and Medicine (2018) report does not include explicit recommendations for how to do so. One of the primary challenges is that while educators are typically well-trained in the ethics of human subjects research, few have explicit training in, say, algorithmic bias, or even general ethical philosophy. But why should a lack of training prevent us from teaching our students? As Bruce (2018) points out, ethical issues are not really a technical problem, but rather “a general issue with the impact of technology on society,” to which we all belong. We might make up for our lack of training by partnering with philosophers and ethicists to develop a robust ethical curriculum (Bruce 2018).

Echoing Bruce (2018) that “there is a long history of scholars and practitioners becoming interested in ethics when faced with new technologies,” Gotterbarn et al. (2018) argue forcefully that the recent uptick in interest in “computing ethics” is merely the most recent star turn for a longstanding and valued component of the computer science curriculum. While this is surely true at some level and important to keep in mind, it hardly seems like the renewed attention on ethics is unwarranted. Moreover, Gotterbarn et al. (2018)’s focus is on artificial intelligence driven systems like self-driving cars, whereas our focus is on data.

Several examples of how to teach ethics in statistics, data science, and (mostly) computer science exist. Neff et al.
(2017) takes a broad view of data science ethics, bringing tools from critical data studies to bear on the practice of actually doing data science. Burton et al. (2018) outlines a strategy for teaching computer science ethics through the use of science fiction literature. Elliott et al. (2018) provides a framework for reasoning about ethical questions through the dual prisms of Eastern (mainly Confucianism) and Western ethical philosophies. We found this inclusive approach to be particularly valuable given the large presence of international (particularly Chinese) students in our classes. Perhaps presaging many recent scandals, Zimmer (2010) analyzes a Facebook data release through an ethical lens. Fiesler analyzes ethical topics in a variety of computer science courses (Saltz et al. 2019, Fiesler et al. 2020, Skirpan et al. 2018). Grosz et al. (2019) describes how ethics education is integrated into the computer science curriculum at Harvard. Barocas teaches an undergraduate elective course on data science ethics at Cornell (Wender & Kloefkorn 2017).

These articles offer guidance on how to teach ethics in data science, but leave many stones unturned. In this paper, we present six additional concrete modules for teaching data science ethics, as well as outline departmental initiatives for fully integrating ethics into a data science curriculum and culture.
At Smith, every department periodically reviews and updates a list of learning goals for their major. The major in statistical and data sciences (SDS) is designed to cover a broad range of topics to produce versatile future statisticians and data scientists. Our learning goals include skills like: fitting and interpreting statistical models, programming in a high-level language, working with a wide variety of data types, understanding the role of uncertainty in inference, and communicating quantitative information in written, oral, and graphical forms. Most recently, we added the following learning goal:

Assess the ethical implications to society of data-based research, analyses, and technology in an informed manner. Use resources, such as professional guidelines, institutional review boards, and published research, to inform ethical responsibilities.

In support of this learning goal, we have taken measures to:

• incorporate ethics into all of our classes, culminating in a thorough treatment in the senior capstone course.
• support student engagement in extra-curricular and co-curricular events that touch on data science ethics.
• bring a diverse group of speakers to campus to give public lectures that often focus on ethical questions.
• include a candidate’s ability to engage with data science ethics as a criterion in hiring.
• increase inclusion at every level of our program.

We discuss six specific modules for courses in Section 4. In this section we discuss approaches for the other measures. We recognize that not every institution has the curricular flexibility and resources that we have at Smith, nor is our student body representative of those at different types of institutions (e.g., R1s or two-year colleges).
Nevertheless, most of the modules we present can fit into a single class period, which should provide instructors at any institution with a reasonable opportunity to incorporate some of this material.

Our students are very interested in ethical questions in data science (see Section 5.2). As digital natives, they bring an importantly different perspective to questions about, for example, sharing one’s personal data online. Many of them have never seriously considered the ramifications of this. The notion that “if you’re not paying for the product, then you are the product” is new, scary, challenging, relevant, personal, and engaging to them in a way that helps them see data science as more than just a battery of technical skills (Fitzpatrick 2010). Thus, teaching ethics in data science is another way to foster student interest in the discipline. Framing ethical questions in data science as unsolved problems helps students imagine themselves making meaningful contributions to the field in a way that may seem too remote a possibility in, say, estimation theory.

In particular, algorithmic bias intersects with questions about inclusion and diversity with which students are already grappling on a daily basis. During the past two years, we have applied for (and received) funds from the community engagement center and the Provost’s office to support student engagement with the Data for Black Lives conference (Milner 2019). In 2018, the first year of the Data for Black Lives conference, we hosted a remote viewing party on campus. In 2019, one of us attended the conference with five students. This experience led to a student inviting Data for Black Lives founder Yeshimabeit Milner to campus for a public lecture entitled “Abolish Big Data”.
These experiences help students connect what they are learning in the classroom to larger movements in the real world, and give them the sense that their skills might be used to effect positive change in the world—a powerful motivator.

The SDS major at Smith includes an “application domain” requirement. One of the purposes of this requirement is to ensure that students understand that all data and analyses have a context. Conducting ethical data analysis requires knowledge of the context in which the data is being used. For example, only through having some understanding of the history of racial/ethnic groups in the United States can data scientists hope to code and use race appropriately in their analyses (see Section 4.5).

The major at Smith requires every student to take one course that focuses explicitly on communication. Another simple initiative was to allow students to fulfill this requirement by taking the “Statistical Ethics and Institutions” course taught at nearby Amherst College by Andreas V. Georgiou, the former President of the Hellenic Statistical Authority (Langkjær-Bain 2017). Although the course did not explicitly focus on communication, we made an exception to our policy to allow students to have this unique opportunity to learn about statistical ethics from the person at the center of a world-famous episode. Moreover, ethics and communication are intertwined, in that conveying ethical subtleties requires a different skill set than, say, explaining a statistical model.

We are fortunate that our institution provides generous funding for bringing outside speakers to campus, and we have taken full advantage of their largesse over the past two years. We welcomed BlackRock data scientist Dr. Rachel Schutt to give a talk titled “A Humanist Approach to Data Science,” in which she underscored the importance of recognizing the people behind the numbers, and highlighted examples of recently published research that raised profound ethical dilemmas. Dr.
Terry-Ann Craigie of Connecticut College came to talk about the intersections of race, data science, and public policy. Dr. Emma Benn of Mount Sinai discussed how her intersectional social identity has informed her work as a biostatistician. Alumna Gina DelCorazon spoke about her experiences as Director of Data & Analytics at the National Math and Science Initiative in her talk “From Interesting to Actionable: Why good context matters as much as good code.” At the invitation of a student group, Dr. Alisa Ainbinder, an alumna working locally in data science, discussed ethical considerations in her work in non-profit accelerator programs. Hearing from professionals about the ethical considerations in their work helps reinforce the messaging we give them in class.

Finally, we take small steps to ensure that incoming faculty are capable of supporting our program in meeting this newest learning goal. They cannot be dismissive of ethical concerns in data science. In the same way that a candidate who didn’t understand correlation would not be hireable, we consider whether a candidate who seemed ignorant of data science ethics would be hireable. To assess this, we might ask a question about data science ethics during a first-round or on-campus interview. We might ask candidates to submit a separate statement on data science ethics as part of their application, or to discuss ethical considerations in their teaching and/or research statement. To be clear, we cannot and do not infringe upon the candidate’s academic freedom by assessing what they think about data science ethics. Rather, we are merely trying to assess how deeply they have thought about data science ethics and thus whether they are sufficiently prepared to help the program meet our learning goals.
In this section we present six modules for teaching ethics in data science that are used in a variety of courses. Here, we give a brief description of each module, its learning goals, and the context of the course in which it is delivered. In our supplementary materials, we provide more complete teaching materials.
OkCupid is a free online dating service whose data has been scraped on at least three known occasions. Kim & Escobedo-Land (2015) presented scraped data on nearly 60,000 OkCupid users in the greater San Francisco area in the early 2010’s for use in the classroom and subsequently released the data as the R package okcupiddata. Around that same time, Chris McKinlay created 12 fake OkCupid accounts and wrote a Python script that harvested data from around 20,000 women from all over the country (Poulsen 2014). In 2016, Kirkegaard & Bjerrekær (2016) published a paper in an open-access psychology journal investigating a variety of hypotheses about OkCupid users—along with the corresponding data from 70,000 users. From the same underlying data source, these three incidents provide fertile ground for substantive discussions about the corresponding ethical considerations. Some further detail reveals fascinating disparities:

• Kim & Escobedo-Land (2015) obtained explicit permission from OkCupid CEO Christian Rudder before publishing the data in a statistics education journal. Their goal was to illuminate statistical phenomena using data that was relevant to undergraduate students. In addition, the authors removed usernames from the data as a modest attempt at de-identifying the users. Only later were the authors alerted to the fact that even though usernames had been stripped, the full text of the essay field often contains personally-identifying information like Facebook and Instagram handles.

• McKinlay did not publish the data he collected—his goal was personal. Essentially, he trained his own models on the data he collected to find his own match. It worked—he is now engaged to the woman he met. Only after his story was published were questions raised about whether he had violated the Computer Fraud and Abuse Act.

• Kirkegaard & Bjerrekær (2016) included username, age, gender, and sexual orientation in the data set. This meant that users were easily identifiable and particularly vulnerable. While the blowback in this case was immediate, Kirkegaard insisted that the data were already public and his actions were legal.

Collectively, these episodes raise issues about informed consent, data privacy, terms of use, and the distinction between laws and ethics. One could use these incidents to motivate coverage of technical concepts such as k-anonymity (Sweeney 2002) and differential privacy (Dwork et al. 2006). In our senior capstone course (see Section 4.6), we ask students to break into three groups and discuss the relevant ethical issues involved in each case. Then, we bring students together to write a coherent response. Some students elect to use these incidents as the subject of a longer essay, as described in Section 4.6.

Discussions on the perniciousness of “algorithmic bias” in machine learning and artificial intelligence have become more prevalent of late, both in the news media as well as in academic circles (Noble 2018, Eubanks 2018, O’Neil 2016). However, few of these ideas have been incorporated into the classroom. For example, in James et al. (2013)—a popular introductory textbook on machine learning—the
Credit dataset is often used as an example (it is available in the companion ISLR R package). Readers are encouraged to apply various predictive algorithms to predict the credit card debt of 400 individuals using demographic predictors like Age, Gender (encoded as binary), and Ethnicity with levels African American, Asian, and Caucasian. While the data is simulated, one must still wonder what kind of thinking we are tacitly encouraging in students by using ethnicity to predict debt and thus perhaps credit score. This is especially fraught in light of existing inequalities in access to credit that fall on demographic lines. In other words, to quote Milner (2019), “What are we optimizing?”

In this module, we propose a hands-on in-class activity to help students question the supposed objectivity of machine learning algorithms and serve as a gateway to discussions on algorithmic bias. The activity centers around StitchFix, an online clothing subscription service that uses machine learning to predict which clothes consumers will purchase. New users are asked to complete either a men’s or women’s “Style Profile” quiz, whose responses are then used as predictor information for the company’s predictive algorithms. However, the two quizzes differ significantly in the types of questions asked, how the questions are asked, in which order they are asked, and what information and visual cues are provided.

Figure 1 (current as of December 16, 2019) presents one example relating to clothing style preferences, specifically jean cut. The prompt in the men’s quiz shows photographs of an individual actually wearing jeans, whereas the women’s quiz presents the options in a much more abstract fashion. On top of differences relating to clothing style and fit, many differences exist in how demographic information is collected. Figure 2 presents an example of a question pertaining to age. While both groups are asked the same question of “When is your birthday?”, individuals completing the women’s quiz are primed with a “We won’t tell! We need this for legal reasons!” statement, whereas those completing the men’s are not. One has to suspect such a difference was not coincidental, but rather reflects a prior belief of the quiz designers as to the manner in which one should ask about age. Other differences include questions pertaining to parenting and occupation.

Figure 1: Example difference between men’s (left) and women’s (right) StitchFix Style Quizzes: Question on jean preferences. Contrast the abstract presentation of jeans shown to women with a picture of someone actually wearing jeans shown to men.

Figure 2: Example difference between men’s (left) and women’s (right) StitchFix Style Quizzes: Question about age. We note the disclaimer present for women is omitted for men.

Many of these differences can be attributed to prevailing biases and beliefs on the nature of gender and thus can serve as fertile ground for student discussions on algorithmic bias. Specifically, this module can satisfy three goals. First, it provides students with an example of algorithmic bias to which they can directly relate. This stands in contrast to more abstract and much less accessible examples discussed in academic readings and news media, such as facial recognition software. Second, it asks students to view the statistical, mathematical, and machine learning topics covered in class through a sociological lens, in particular relating to the nature of gender. Third, it gives students the opportunity to think about statistical models in a rich, real, and realistic setting, in particular what predictor variables are being collected and what modeling method/technique is being used.
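The concern about demographic predictors can be made concrete in code. The following sketch is our own illustration (in Python rather than the R used by the textbook, and on simulated data that merely mimics the flavor of the Credit example): a demographic variable that is correlated with another predictor leaves a group-level gap in a model's predictions even when the outcome has no direct dependence on group membership, and simply dropping the demographic column does not remove the gap.

```python
# Illustrative sketch only: simulated data, NOT the ISLR Credit data.
# A demographic variable correlated with income leaves a group-level gap
# in predicted debt, with or without the demographic column in the model.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n)                 # hypothetical binary demographic
income = rng.normal(50, 15, n) - 5 * group    # group is correlated with income
debt = 0.5 * income + rng.normal(0, 5, n)     # debt depends only on income

def fit_predict(X, y):
    """Ordinary least squares via lstsq; returns fitted values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

X_with = np.column_stack([np.ones(n), income, group])
X_without = np.column_stack([np.ones(n), income])

pred_with = fit_predict(X_with, debt)
pred_without = fit_predict(X_without, debt)

gap_with = pred_with[group == 1].mean() - pred_with[group == 0].mean()
gap_without = pred_without[group == 1].mean() - pred_without[group == 0].mean()
print(f"gap with group predictor:    {gap_with:.2f}")
print(f"gap without group predictor: {gap_without:.2f}")
```

Both gaps are roughly the same size, because income acts as a proxy for group membership; this is one way to let students discover that "removing the sensitive variable" is not a fix, and to ask Milner's question, "What are we optimizing?"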
Perhaps in part thanks to the aptly-named Facebook movie (The Social Network), social networks are intuitive to students. The relatively simple mathematical formulation of networks (i.e., graphs) makes them easy to understand, but the complex relationships and behaviors in such networks lead to profound research problems. Moreover, analyzing social network data leads to thorny ethical questions.

A 300-level course on statistical analysis of social network data has as its primary objective for students to “learn how to answer questions by manipulating, summarizing, visualizing, and modeling network data while being vigilant to protect the people who are represented in that data.” Thus, ethical concerns surrounding privacy and confidentiality are woven directly into the main course objective.

The primary textbook is Kolaczyk & Csárdi (2014), which provides a thorough treatment of both the theoretical and applied aspects of social network analysis. However, supplementary readings are especially important, since Kolaczyk & Csárdi (2014) fails to address the many complex ethical issues that arise for these data. We employ supplemental readings to address data ethics on topics including:

• collecting social network data
• informed consent for social network surveys
• data identifiability and privacy in social networks
• link prediction
• data ethics specific to social networks

In our supplementary materials we present a module applied during the first week of class in which we use an example from popular culture (the television show Grey’s Anatomy) to motivate ethical issues in social network analysis. It has several goals:

• Prime students to always think about how the data were collected
• Prime students to think about the benefits and risks of each data collection / analysis / visualization, etc.
• Encourage students to create their own understanding of how data ethics pertain to social network data, as opposed to being provided with data ethics rules. This encourages critical thinking which can then be transferred to other topics and types of data.

It is especially important to introduce ethical considerations on the first day of the course to set the tone and give students the message that data ethics is inextricable from the rest of the content of the course.
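The identifiability concern can be demonstrated in a few lines. The following is our own toy sketch (not drawn from the course's actual materials): removing names from a social network does not anonymize it, because a node with a unique degree can be re-identified from the edge structure alone.

```python
# Sketch (toy example): stripping names from a social network does not
# anonymize it when a node's degree is unique within the released graph.
from collections import Counter

# Toy friendship network with names attached
edges = [("Ana", "Bo"), ("Ana", "Cy"), ("Ana", "Dee"),
         ("Bo", "Cy"), ("Dee", "Eli")]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# "De-identified" release: names replaced by arbitrary IDs, edges kept
ids = {name: i for i, name in enumerate(sorted(degree))}
released = [(ids[u], ids[v]) for u, v in edges]

released_degree = Counter()
for u, v in released:
    released_degree[u] += 1
    released_degree[v] += 1

# An attacker who knows only that Ana has exactly three friends can
# pick her out of the "anonymized" data:
candidates = [node for node, d in released_degree.items() if d == 3]
print(candidates)  # → [0], the ID assigned to Ana
```

The same logic scales: in real networks, combinations of degree, attributes, and neighborhood structure are frequently unique, which is why supplementary readings on identifiability are essential.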
Ethical usage of data can come into conflict with copyright law. Music usage, for example, is heavily protected by copyright laws. The field of Music Information Retrieval (MIR) seeks to address questions about music, such as finding all covers of a particular song or detecting the genre of a song. In MIR, access to music is critical to conducting research, and that access is governed by copyright laws.

Music is also a medium that has a fraught history navigating the line between sharing and violating copyright. This history is complicated by the power dynamics at play between recording companies and artists, and recording companies and listeners. Today, music is often consumed through streaming services, distorting our understanding of music ownership. Since music is heavily protected by copyright but remains omnipresent in our lives, conversations about data access require nuance about ownership, sharing, and the subtleties of ethical vs. legal considerations.

Understanding that the goal of copyright is to protect artists, and contrasting this with students’ experiences of accessing and digesting music, this debate’s overarching goal is to have students navigate legal considerations (i.e., copyright) and ethical considerations (i.e., when to share or not share data) in the contexts of pushing research forward and of the capitalist motivations of the music industry. The legal restrictions of copyright and the ethical responsibilities of a researcher to protect and appropriately use (and share) their data provide a fascinating grey area for this debate. The generational experience of our current students informs their notions of morality and access, which in turn leads them to confront legal restrictions in an interesting way.

To explore data access and copyright, we provide a module in which students have a debate about whether music copyright laws should be softened for those conducting MIR research.
This debate is not as simple as deciding whether to relax these laws: one side defends the role and purpose of copyright laws for music, while the other side not only advocates for relaxing these laws but must also propose how to accomplish this. The requirement of proposing a solution forced students to weigh the responsibilities of a researcher who has broad access to data against the ease with which we can share music (and data). This debate activity was originally part of a senior seminar introducing students to MIR, but it could be done in any course where data provenance, data usage, or data access is discussed. For this activity, students were randomly assigned to one side of the debate. In preparation for the debate, students were required to submit a position paper (due just before the debate) that presented a coherent argument well supported by the literature. Students were also barred from sharing arguments with each other (even if assigned to the same side of the debate). However, they could share resources with each other (just not their opinions of those resources).

This structure of a preparatory paper followed by a debate required students to engage with the research process at a deep level. During the debate itself, each side was given opportunities to present its ideas and offer rebuttals to the other side. This meant that students not only had to find resources and digest them, but also had to discuss the ideas both in written text and orally in a debate setting.

4.5 Teaching about race and ethnicity data
In an upper-level research seminar on intergroup relationships cross-listed in the psychology department, students learn the psychology of close relationships between people who have differing social group identities (e.g., racial/ethnic and gender group identities). In addition, students learn to analyze dyadic data through multilevel modeling (i.e., mixed linear modeling), and to write reproducible research reports in APA format with the R package papaja (Aust & Barth 2018). This course attracts a diverse group of students in terms of majors, professional goals, interests, statistical preparation, and personal identities. In this ethics module, we describe a discussion and data cleaning activity used to get students thinking in a more careful and nuanced way about the use of race and ethnicity data in their analyses.

The instructor provides psychological data from her own research program, and the overarching focus of the course is to form research questions answerable through the analysis of data that has already been collected. Since the focus is on analyzing existing data (in addition to talking about race), we also discuss:

• how to communicate transparently one's use of confirmatory versus exploratory analyses
• the philosophical differences between inductive and deductive reasoning
• the prevention of p-hacking (Wasserstein et al. 2016, 2019) and HARKing (Hypothesizing After the Results are Known; Kerr 1998)

On the first day of this course, we have a class discussion about how we will try to create a climate of psychological safety (Edmondson 1999) together. This initial discussion helps to set the tone of respect and generosity that we will need in order to have fruitful discussions about race and ethnicity data. In the first half of the course, class sessions alternate between discussions about assigned readings (from psychology) and the statistical and data science instruction students need to complete their projects.
In the second half of the course, class sessions are mainly used for active work on student projects. The two parts of this ethics module (discussion and data cleaning) might be split across two class sessions.

The activity described in this module consists of a class discussion about race and a race/ethnicity data cleaning activity in the context of a psychology article about interracial roommate contact (Shook & Fazio 2008). The structure of this activity invites students to discuss the article first in small groups, and then as a class. The larger class discussion portion of this activity is designed to evolve into a broader discussion about the coding and use of race and ethnicity data in quantitative research. Some important revelations that might be pulled from the discussion include:

• Researchers studying interracial interactions make choices about whom to focus on, and, in the past, this choice has often been to focus on white participants only. An acknowledgement of white privilege and of who, historically, has been asking the research questions might come out as well.
• A person's personal racial/ethnic identity may be different from how they are perceived by another person (e.g., a roommate).
• The choice to use a person's own racial/ethnic identity data or someone's perception of their race depends, in part, on the research question. When is identity or perception more important for the specific research context?
• Race is not as clear-cut a categorical variable as we think it is. Can we think of other instances of this, for example, with gender categorization?
• Are there times when it could serve a social good to use race in our analyses and, in contrast, are there ways in which using race and ethnicity data in analyses might reify socially constructed racial categories?
• If you decide to use race in your analyses, what might you do in smaller samples if there are very small numbers of ethnic minority groups relative to White/European-Americans? Is it ever OK to collapse racial/ethnic categories? What immediate consequences do these choices have for the interpretation of your analysis, and what broader consequences might these choices have when your results are consumed by your intended audience?

The second part of this activity asks students to code raw race/ethnicity data into a new categorical variable called race_clean. They do this part in pairs. Then, in small groups, they discuss the decisions they made when completing this task and also any feelings they had during the task, as those feelings reflect the hard realities that researchers must confront in their work. The raw data comes in check-all-that-apply and free response formats. Students will find this task quite difficult, and perhaps uncomfortable. The goal is not to have them finish, but to get them to recognize the ambiguity inherent in the construction of categorical race/ethnicity variables. They may have used clean versions of race/ethnicity variables in the past without thinking much of it.

Lastly, the module also contains notes on closing thoughts the instructor might offer their students after this activity. It is very important not to skip the wrap-up for this activity. Let students know that this is not the end of the discussion. As future data scientists, they can play an active role in creating ethical guidelines for moving towards more appropriate use of race and ethnicity data.
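To give a concrete sense of the data cleaning task described above: the course itself uses R, but the flavor of the exercise can be sketched in a few lines of Python. The responses, column values, and the `clean_race` function below are all invented for illustration; the point is that every branch of such a function embodies a judgment call of the kind students must confront.

```python
# Hypothetical raw race/ethnicity responses, mimicking the
# check-all-that-apply and free-response formats (invented data).
raw = [
    "White",
    "Black; White",   # respondent checked more than one box
    "Latina",         # free response
    "human",          # free response that resists categorization
    "",               # item left blank
]

def clean_race(response):
    """One possible coding scheme -- every branch is a judgment call."""
    if not response:
        return "Missing"
    if ";" in response:
        # Collapsing all multiracial respondents into one category
        # erases distinctions; keeping every combination fragments
        # small samples. Neither choice is neutral.
        return "Multiracial"
    # Free responses may still defy the scheme entirely.
    return response

race_clean = [clean_race(r) for r in raw]
print(race_clean)
# → ['White', 'Multiracial', 'Latina', 'human', 'Missing']
```

Even this toy version surfaces the questions raised in the discussion: what to do with blanks, with multiple selections, and with responses like "human" that reject the categorization task altogether.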
4.6 Weapons of Math Destruction in the senior capstone
In the senior capstone course, roughly 25% of the course is devoted to learning about data science ethics. During the first half of the semester, we spend every other class period discussing ethical considerations that arise from weekly readings of O'Neil (2016). These readings introduce students to episodes in which often well-intentioned data science products (e.g., criminal sentencing algorithms, public school teacher evaluations, the US News and World Report college rankings, etc.) have had harmful effects on society. These episodes are accessible to students and provide many opportunities to engage students in thoughtful conversation.

The material in O'Neil (2016) also intersects with a wide variety of statistical topics, such as modeling, validation, optimization, Bayesian statistics, A/B testing, Type I/II errors, sensitivity and specificity, reliability and accuracy, Simpson's paradox, multicollinearity, confounding, and decision trees. A clever instructor could probably build a successful course entirely around these topics.

Moreover, the ethical considerations that O'Neil (2016) raises about algorithmic bias, informed consent, transparency, and privacy also touch on hot-button social questions surrounding structural racism, gender equity, software licensing, cheating, income inequality, propaganda, fake news, scams, fraud, pseudoscience, and policing bias. Situated in the fallout from the 2008 global financial crisis, but presaging Cambridge Analytica and fake news, the book feels simultaneously dated and relevant. Our students lived through the global financial crisis but most were too young to understand it; for many of them the book allows them to grapple with these events for the first time as adults.

The first major goal of the module is to raise awareness about the manifold ethical considerations in data science. Reading O'Neil (2016) and having class discussions about the material will help accomplish this learning goal.
We employ a variety of techniques, including think-pair-share, breakout groups, student-led discussions, and even lecturing, to keep students engaged in class.

However, the second major goal is to have students write something constructive about data ethics. To this end, more structured readings are needed. We present students with two frameworks for thinking critically about data science ethics: Data Values and Principles (Gershkoff et al. 2019) and the Hippocratic Oath for Data Science (National Academies of Sciences, Engineering, and Medicine 2018). The former defines four values (inclusion, experimentation, accountability, and impact) and twelve principles that, "taken together, describe the most effective, ethical, and modern approach to data teamwork." The latter provides a data science analog to the oath that medical doctors have taken for centuries. We then ask students to write an essay in which they analyze a data science episode—perhaps drawn from O'Neil (2016)—in the context of one of these frameworks.

During the course, students write four papers of varying length on data science ethics. Together, these assignments not only impress upon students the importance of ethics in data science, but also give them tools and experience to reason constructively about data science ethics in the future. The goal is to produce students who have fully integrated ethical questions into their understanding of statistics and data science.
Early returns suggest that our emphasis on teaching data science ethics is having an impact. To support this claim, we relate two concrete anecdotes, analyze results from an anonymous student survey, and provide several free responses from students.

5.1 Data science ethics in action
One student used her experience with data science ethics directly in a summer internship with an anonymous company to help draft the company's heretofore non-existent policies around ethical data use (Conway Center for Innovation and Entrepreneurship 2019).

"[She] was also the first data scientist to work in the [company] space. Until her arrival, [company]'s businesses lacked clear guidelines for collecting data and ways for using that data to generate insights. Surprised by this, [she] first initiated conversations with the [company] team around ethical concerns in data collection.
Drawing on lessons from her academic work, and discussions with her Smith mentors, she helped to develop policies for [company] businesses to ethically collect, manage, and act on customer data moving forward."

While this is clearly just one example, we note that the connection between data science ethics in practice and her academic coursework was made explicit by the student.

Another anecdote involves a student group supporting students in our major that held their annual "Data Science Day" on November 9th, 2019. At the open house portion of the event, in addition to operating booths on data visualization and machine learning, students set up a "data ethics" booth with handouts posing ethical and philosophical questions about the use of data (see Figure 3). While this event was sponsored by the program, programming for the event was entirely determined by students. The inclusion of the booth suggests that students see ethics as an integral component of data science, on par with data visualization and machine learning. We interpret this as an early sign of our program's success at emphasizing the importance of ethical thinking in data science.

Furthermore, in the wake of discussions on racism and white supremacy spurred by the death of George Floyd in May 2020, two students created a Data Science Resources for Change website. They state: "In order to be thoughtful, effective, and inclusive data scientists, we believe it is important to understand the ways in which bias can play a dangerous role within our field, to understand the ways in which data can be used to either reinforce/exacerbate or fight oppression, and to support the inclusion of voices of color within the community." To this end, the website includes numerous resources such as reading lists, videos and podcasts, organizations to support, and notable people to follow.

Figure 3: The SDS student group chose to staff a 'data ethics' booth at Data Science Day 2019.
We conducted an anonymous online survey during the summer of 2019, in which 23 students participated. (This survey was approved by the Smith College IRB, protocol 18-111.) The results in Figure 4 reveal that students are interested in learning more about data science ethics and feel that it is an important part of their education. However, they are less certain that they have achieved our stated learning goal. Unfortunately, none of the respondents had taken the capstone course (see Section 4.6), and so these results almost certainly undersell the effectiveness of our ethical curriculum.

The first panel in Figure 4 reflects self-assessments from students about two aspects of our major learning goal. The questions reflect both the ability of a student to assess the ethical implications of data science work, as well as their ability to draw on published materials to inform their thinking. These ideas are most explicitly and thoroughly tackled in the senior capstone, and so the lack of respondents with that course under their belt renders this picture incomplete.

Figure 4: Student self-assessment of their ethical capabilities, and the importance of data science ethics in their education, from an anonymous survey of 23 students. We note that nearly all respondents saw the inclusion of data science ethics as an important enhancement to their education, although they were less certain of their own capabilities in analyzing ethical concerns.

The second panel addresses the importance of ethics to a student's data science education.
Here, students universally believe that data science ethics is important to them in their education, with most responding that it is "very important." This finding supports the recommendation of National Academies of Sciences, Engineering, and Medicine (2018).

Finally, the third panel in Figure 4 makes plain that no students feel that the inclusion of data science ethics detracts from their data science education, with most students seeing the inclusion as an enhancement. We encourage data science programs contemplating adding ethical content to consider this point particularly. That is, the respondents to this survey did not see the inclusion of data science ethics as a distraction from more important, interesting, technical, or valuable content. Rather, learning about data science ethics enhances that curriculum.
In the Appendix we present selected quotes in their full context. Here, we highlight a few of the most relevant thoughts and connect them to broader themes.

First, faculty should not assume that students know about data science ethics just because it is often in the news. To the contrary, learning about data science ethics can be revelatory for students.

"It was the first time I had ever thought that data science had ethical implications and it really changed the way I thought about the work that I do."

Second, as noted by National Academies of Sciences, Engineering, and Medicine (2018), teaching data science ethics as a one-off topic is not likely to be sufficient, and students notice the tangential nature of this approach.

"I'd like to see more data ethics integrated with in-class work. In my experience, data ethics has been presented as an additional topic as opposed to something that is an intrinsic part of data science work itself."

Third, far from being off-putting, these students found ethics in data science to be a topic likely to engage a broader set of students with data science.

"I also think that providing. . . a push for students outside of the major [to] have access to more resources about data ethics, (i.e. data talks which include ethics being more widely broadcast to the rest of the student body) should be seriously considered."

The long-term health of data science as a discipline relies on public trust. Ethical lapses, or gross indifference to ethics, have resulted in the deployment of data science products that are harmful to society, due to biases that we now recognize. Our students are part of the generation of data scientists that will address these issues and restore faith in data-driven applications. In order to do this, they need to see weighing ethical considerations as an integral part of the process of doing data science. We present our approach to achieving this in the hopes that others will emulate and refine what we have started.
References
Aust, F. & Barth, M. (2018), papaja: Create APA manuscripts with R Markdown. R package version 0.1.0.9842. URL: https://github.com/crsh/papaja

Baumer, B. S. (2015), 'A data science course for undergraduates: Thinking with data', The American Statistician (4), 334–342. URL: http://dx.doi.org/10.1080/00031305.2015.1081105

Baumer, B. S., Kaplan, D. T. & Horton, N. J. (2017), Modern Data Science with R, Chapman and Hall/CRC Press: Boca Raton.

Bruce, K. B. (2018), 'Five big open questions in computing education', ACM Inroads (4), 77–80. URL: https://dl.acm.org/citation.cfm?id=3230697
Communications of the ACM (8), 54–64. URL: https://dl.acm.org/citation.cfm?id=3154485
Cai, F. (2020), 'Yann LeCun quits Twitter amid acrimonious exchanges on AI bias', Synced: AI Technology & Industry Review. URL: https://syncedreview.com/2020/06/30/yann-lecun-quits-twitter-amid-acrimonious-exchanges-on-ai-bias/

Canney, N. & Bielefeldt, A. (2015), 'A framework for the development of social responsibility in engineers', International Journal of Engineering Education (1B), 414–424.

Committee on Professional Ethics (2018a), ACM Code of Ethics and Professional Conduct, Association for Computing Machinery, Inc.

Committee on Professional Ethics (2018b), Ethical guidelines for statistical practice, Technical report, American Statistical Association.

Committee on Science, Engineering, and Public Policy (2009), On being a scientist: a guide to responsible conduct in research, 3 edn, Washington, DC: National Academies Press.

Conway Center for Innovation and Entrepreneurship (2019), 'One data scientist's experience innovating at [company]', The Jill Ker Conway Innovation and Entrepreneurship Center. Article is no longer available online.

Davies, H. (2015), 'Ted Cruz campaign using firm that harvested data on millions of unwitting Facebook users', The Guardian.
Donoho, D. (2017), '50 years of data science', Journal of Computational and Graphical Statistics (4). URL: https://amstat.tandfonline.com/doi/full/10.1080/10618600.2017.1384734
Dwork, C., McSherry, F., Nissim, K. & Smith, A. (2006), Calibrating noise to sensitivity in private data analysis, in S. Halevi & T. Rabin, eds, 'Theory of Cryptography', Springer, pp. 265–284. URL: https://link.springer.com/chapter/10.1007/11681878_14

Edmondson, A. (1999), 'Psychological safety and learning behavior in work teams', Administrative Science Quarterly (2), 350–383.

Elliott, A. C., Stokes, S. L. & Cao, J. (2018), 'Teaching ethics in a statistics curriculum with a cross-cultural emphasis', The American Statistician (4), 359–367.

Eubanks, V. (2018), Automating inequality: How high-tech tools profile, police, and punish the poor, St. Martin's Press.

European Parliament (2018), Regulation on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95/46/EC (data protection directive), Technical report, European Union. URL: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679
Fiesler, C., Garrett, N. & Beard, N. (2020), What do we teach when we teach tech ethics? A syllabi analysis, in 'Proceedings of the 51st ACM Technical Symposium on Computer Science Education', pp. 289–295. URL: https://dl.acm.org/doi/10.1145/3328778.3366825

Fitzpatrick, J. (2010), 'If you're not paying for it; you're the product'. URL: https://lifehacker.com/if-youre-not-paying-for-it-youre-the-product-5697167
Gershkoff, A., Therriault, A., Satyanarayan, A., Jones, B., Burg, B., Hurt, B., Granger, B., Jacob, B., Doig, C., Fryar, C., Ramanan, D., Bhargava, D., Perez, F., Greenleigh, I., Feng, J., Loyens, J., Morgan, J., Ram, K., Green, L., Barba, L., Colaco, M., Rocklin, M., Jamei, M., Horn, M., Harris, N. E., Elprin, N., Kaldero, N., Chopra, N., McGarry, P., Todkar, R., Jurney, R., Brener, S., Couture, T., Thibodeaux, T. & McKinney, W. (2019), 'Data values and principles'. URL: https://datapractices.org/manifesto/

Gotterbarn, D., Wolf, M. J., Flick, C. & Miller, K. (2018), 'Thinking professionally: The continual evolution of interest in computing ethics', ACM Inroads (2), 10–12. URL: https://dl.acm.org/citation.cfm?id=3204466

Grosz, B. J., Grant, D. G., Vredenburgh, K., Behrends, J., Hu, L., Simmons, A. & Waldo, J. (2019), 'Embedded EthiCS: Integrating ethics across CS education', Commun. ACM (8), 54–61. URL: https://doi.org/10.1145/3330794
Gunaratna, N. S. & Tractenberg, R. E. (2016), Ethical reasoning with the 2016 revised ASA ethical guidelines for statistical practice, American Statistical Association.

Hand, D. J. (2018), 'Aspects of data ethics in a changing world: where are we now?', Big Data (3), 176–190. URL: https://doi.org/10.1089/big.2018.0083

Hardin, J., Hoerl, R., Horton, N. J., Nolan, D., Baumer, B. S., Hall-Holt, O., Murrell, P., Peng, R., Roback, P., Temple Lang, D. & Ward, M. D. (2015), 'Data science in statistics curricula: Preparing students to "think with data"', The American Statistician (4), 343–353. URL: https://doi.org/10.1080/00031305.2015.1077729

Heggeseth, B. (2019), 'Intertwining data ethics in intro stats', Symposium on Data Science and Statistics. URL: https://drive.google.com/file/d/1GXzVMpb6GVNfWPS6bd9jggtqq1C77Wsc/view

Hicks, S. C. & Irizarry, R. A. (2018), 'A guide to teaching data science', The American Statistician (4), 382–391. URL: https://amstat.tandfonline.com/doi/abs/10.1080/00031305.2017.1356747
Huff, D. (1954), How to lie with statistics, WW Norton & Company, Inc.

James, G., Witten, D., Hastie, T. & Tibshirani, R. (2013), An Introduction to Statistical Learning: with Applications in R, Springer. URL: https://faculty.marshall.usc.edu/gareth-james/ISL/
Kaplan, D. (2018), 'Teaching stats for data science', The American Statistician (1), 89–96. URL: https://amstat.tandfonline.com/doi/full/10.1080/00031305.2017.1398107

Kerr, N. L. (1998), 'HARKing: Hypothesizing after the results are known', Personality and Social Psychology Review (3), 196–217.

Kim, A. Y. & Escobedo-Land, A. (2015), 'OKCupid data for introductory statistics and data science courses', Journal of Statistics Education (2). URL: https://amstat.tandfonline.com/doi/abs/10.1080/10691898.2015.11889737

Kirkegaard, E. O. & Bjerrekær, J. D. (2016), 'The OKCupid dataset: A very large public dataset of dating site users', Open Differential Psychology.

Kolaczyk, E. D. & Csárdi, G. (2014), Statistical analysis of network data with R, Vol. 65, Springer.

Kramer, A. D. I., Guillory, J. E. & Hancock, J. T. (2014), 'Experimental evidence of massive-scale emotional contagion through social networks', Proceedings of the National Academy of Sciences (24), 8788–8790.
Langkjær-Bain, R. (2017), 'Trials of a statistician', Significance (4), 14–19. URL: https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2017.01052.x

Levin, S. (2017), 'Mark Zuckerberg: I regret ridiculing fears over Facebook's effect on election', The Guardian.
Loukides, M., Mason, H. & Patil, D. J. (2018), Ethics and data science, Sebastopol, CA: O'Reilly Media.
Meyer, R. (2014), 'Everything we know about Facebook's secret mood manipulation experiment', The Atlantic.

Milner, Y. (2019), Data for Black Lives II, Data for Black Lives, MIT Media Lab, 75 Amherst St., Cambridge, MA 02139. URL: http://d4bl.org/conference.html

National Academies of Sciences, Engineering, and Medicine (2018), Data science for undergraduates: opportunities and options, National Academies Press. URL: http://sites.nationalacademies.org/cstb/currentprojects/cstb_175246

National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research (1978), The Belmont report: Ethical principles and guidelines for the protection of human subjects of research, Technical Report 0012. URL: https://videocast.nih.gov/pdf/ohrp_belmont_report.pdf

Neff, G., Tanweer, A., Fiore-Gartland, B. & Osburn, L. (2017), 'Critique and contribute: A practice-based framework for improving critical data studies and data science', Big Data (2), 85–97.

Noble, S. U. (2018), Algorithms of Oppression: How Search Engines Reinforce Racism, NYU Press.

O'Neil, C. (2016), Weapons of math destruction: How big data increases inequality and threatens democracy, New York: Crown. URL: https://weaponsofmathdestructionbook.com/
Saltz, J., Skirpan, M., Fiesler, C., Gorelick, M., Yeh, T., Heckman, R., Dewar, N. & Beard, N. (2019), 'Integrating ethics within machine learning courses', ACM Transactions on Computing Education (TOCE) (4), 1–26. URL: https://dl.acm.org/doi/10.1145/3341164

Shook, N. J. & Fazio, R. H. (2008), 'Interracial roommate relationships: An experimental field test of the contact hypothesis', Psychological Science (7), 717–723.

Skirpan, M., Beard, N., Bhaduri, S., Fiesler, C. & Yeh, T. (2018), Ethics education in context: A case study of novel ethics activities for the CS classroom, in 'Proceedings of the 49th ACM Technical Symposium on Computer Science Education', pp. 940–945. URL: https://dl.acm.org/doi/10.1145/3159450.3159573

Sweeney, L. (2002), 'k-anonymity: A model for protecting privacy', International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (5), 557–570.

Tarran, B. (2019), 'German commission calls for risk-based regulation of algorithmic systems', Significance (6), 4–5. URL: https://doi.org/10.1111/j.1740-9713.2019.01329.x
Tractenberg, R. E. (2019a), 'Strengthening the practice and profession of statistics and data science using ethical guidelines'. URL: https://osf.io/preprints/socarxiv/93wuk

Tractenberg, R. E. (2019b), 'Teaching and learning about ethical practice: The case analysis'. URL: https://osf.io/preprints/socarxiv/58umw/download
Wang, M. Q., Yan, A. F. & Katz, R. V. (2018), 'Researcher requests for inappropriate analysis and reporting: A US survey of consulting biostatisticians', Annals of Internal Medicine (8), 554–558. URL: https://doi.org/10.7326/M18-1230

Wasserstein, R. L., Lazar, N. A. et al. (2016), 'The ASA's statement on p-values: context, process, and purpose', The American Statistician (2), 129–133. URL: https://doi.org/10.1080/00031305.2016.1154108

Wasserstein, R. L., Schirm, A. L. & Lazar, N. A. (2019), 'Moving to a world beyond "p < 0.05"', The American Statistician (sup1), 1–19. URL: https://doi.org/10.1080/00031305.2019.1583913
Wender, B. & Kloefkorn, T. (2017), 'Roundtable on data science postsecondary education, Meeting

Zimmer, M. (2010), '"But the data is already public": on the ethics of research in Facebook', Ethics and Information Technology (4), 313–325. URL: https://link.springer.com/article/10.1007/s10676-010-9227-5
Zuboff, S. (2018),