Abusive Advertising: Scrutinizing socially relevant algorithms in a black box analysis to examine their impact on vulnerable patient groups in the health sector
AAbusive Advertising: Scrutinizing sociallyrelevant algorithms in a Black Box analysis toexamine their impact on vulnerable patientgroups in the health sector
Master Thesis by Martin Reber
March 2, 2020Technische Universit¨at Kaiserslautern,Department of Computer Science,67653 Kaiserslautern,GermanyExaminer: Prof. Dr. Katharina ZweigTobias Krafft a r X i v : . [ c s . C Y ] J a n igenst¨andigkeitserkl¨arung Hiermit versichere ich, dass ich die von mir vorgelegte Arbeit mit dem ThemaAbusive Advertising: Scrutinizing socially relevant algorithms in a Black Boxanalysis to examine their impact on vulnerable patient groups in the healthsectorselbstst¨andig verfasst habe, dass ich die verwendeten Quellen und Hilf-smittel vollst¨andig angegeben habe und dass ich die Stellen der Arbeit - ein-schließlich Tabellen und Abbildungen -, die anderen Werken oder dem Internetim Wortlaut oder dem Sinn nach entnommen sind unter Angabe der Quelleals Entlehnung kenntlich gemacht habe.Kaiserslautern, den 2.3.2020Martin Reberii bstract
The targeted direct-to-customer marketing of unap-proved stem cell treatments by a questionable onlineindustry is directed at vulnerable users who search theInternet in the hope of a cure. This behavior espe-cially poses a threat to individuals who find themselvesin hopeless and desperate phases in their lives. Theymight show low reluctance to try therapies that solelypromise a cure but are not scientifically proven to doso. In the worst case, they suffer serious side-effects.Therefore, this thesis examines the display of advertise-ments of unapproved stem cell treatments for Parkin-son’s Disease, Multiple Sclerosis, Diabetes on Google’sresults page. The company announced a policy changein September 2019 that was meant to prohibit and banthe practices in question. However, there was evidencethat those ads were still being delivered.A browser extension for Firefox and Chrome was devel-oped and distributed to conduct a crowdsourced BlackBox analysis. It was delivered to volunteers and vir-tual machines in Australia, Canada, the USA and theUK. Data on search results, advertisements and topstories was collected and analyzed. The results showedthat there still is questionable advertising even thoughGoogle announced to purge it from its platform. iii usammenfassung
Die Direktvermarktung von nicht zugelassenenStammzellbehandlungen von der fragw¨urdigen Online-Industrie dahinter zielt auf Patienten, die das Internetin der Hoffnung auf Heilung durchsuchen. DiesesVerhalten stellt eine besondere Gefahr f¨ur Menschendar, die sich in verzweifelten Phasen in ihrem Lebenbefinden. Sie k¨onnten wenig Zur¨uckhaltung zeigen, diebeworbenen Therapien auszuprobieren, die zwar eineHeilung versprechen, diese aber nicht durch anerkann-te klinische Tests belegen k¨onnen. Im schlimmstenFall erwarten die Patienten schwerwiegende Neben-wirkungen.Daher untersucht diese Theis die oben genanntenWerbeanzeigen auf der Ergebnisseite der Online-Suchmaschine Google nach einer ¨Anderung der Plat-formrichtlinien im September 2019. Besonders ging esdabei um Anzeigen bez¨uglich Behandlugen von Parkin-son, Multipler Sklerose und Diabetes, die in den Ver-haltensregeln ausdr¨ucklich verboten wurden.Browsererweiterungen f¨ur Firefox und Chrome wur-den entwickelt und verteilt, um damit eine crowdsour-ced
Black Box Analyse durchzuf¨uhren. Freiwillige Teil-nehmer und virtuelle Maschinen in Australien, Kana-da, den USA und Großbritannien wurden rekrutiert.Es wurden Daten zu Suchergebnissen, Werbung undSchlagzeilen auf der Ergebnisseite von Google gesam-melt. Die Analyse derer ergab, dass es trotz des explizi-ten Verbots dieser Praktiken noch immer fragw¨urdigeWerbung gab. ontents
List of Figures vii1. Introduction 1
2. Fundamentals 5
3. Related Work 41 ontents
4. EuroStemCell Data Donation 2019 / 2020 (EDD) 69
5. Conclusion 976. Future Work 99Bibliography 103A. EuroStemCell Data Donation: Development 133
A.1. My Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133A.2. User Story . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133A.3. Product Backlog . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134A.4. Participant survey . . . . . . . . . . . . . . . . . . . . . . . . . . . 136A.5. Query composition and crawled HTML elements . . . . . . . . . 138A.5.1. Query composition . . . . . . . . . . . . . . . . . . . . . . 138A.5.2. Crawled HTML elements . . . . . . . . . . . . . . . . . . . 139
B. EuroStemCell Data Donation: Data Analysis and Visualizations 141
B.1. Downloads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141B.2. Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143B.3. Advertisements and Advertisers . . . . . . . . . . . . . . . . . . . 143
C. Functionality of a Search Engine 145 vi ist of Figures ist of Figures A.2. Description of crawled elements . . . . . . . . . . . . . . . . . . . 140B.1. Firefox plugin statistics . . . . . . . . . . . . . . . . . . . . . . . . 141B.2. Daily chrome users . . . . . . . . . . . . . . . . . . . . . . . . . . . 142B.3. Cumulative Chrome registrations . . . . . . . . . . . . . . . . . . 142B.4. Donations by group . . . . . . . . . . . . . . . . . . . . . . . . . . 143B.5. Absolute Number of advertisements per group . . . . . . . . . . 143B.6. Fraction of Prescription Treatment Advertisements per group . 144viii . Introduction
Digitalization has changed mankind in many ways. One technology that hasgrown to be indispensable is the search engine. They serve as an entry pointto the WWW separating the websites most relevant to a user from noise.Most search engines provide their service to Internet users at no monetarycost. They finance their operation through advertisements (or short: ads)displayed along with search results on the search engine’s result page (SERP).They call this sponsored or affiliated search. These integrated search engines(ISEs) combine search utility with the capabilities of an advertising exchangeand thus connect advertisers, content providers and searchers. However, mostISEs are privately operated Internet platforms. They rose to be powerfulintermediaries that take the role of algorithmic gatekeepers. Not only dothey control the flow of communication between users and content providers.On their platform, they also organize ad distribution and direct attention.Concurrently, they get to retain transaction data of all involved participants,e.g. user data, website content, ad efficacy, conversion costs of businesses.In some domains, misconfiguration of algorithms has only minor conse-quences, like irrelevant search results or dysfunctional technical components.When it comes to medicine and health though, digitalization is probably go-ing to have the “most immediate and profound personal and social conse-quences” (Petersen, Tanner, et al., 2019, p.368). ISEs and their advertisingpartners combine various personal data to explore “the most intimate aspectsof our selves” (Petersen, Tanner, et al., 2019, p.368). In the realm of health,they can have immediate effect on the well-being of citizens and their sur-roundings.Although society is heavily affected by privately-operated Internet-basedplatforms it cannot assess the functionality and safety of those software sys-tems. A Guardian journalist describes this situation as “operating on blind,ignorant, misplaced trust” (Goldacre, 2014) and adds that choices in algorithmdesign are generally being made without citizens noticing. To counter-balanceproblematic and business-driven development of algorithms, the concept of“ algorithmic accountability ” arose. It describes the aspiration to scrutinizethe mechanisms of opaque algorithms and understand how and why they pro-duce a certain output. It also demands for institutions to be held responsiblefor the algorithms they produce (USACM, 2017). This can be achieved withmeans like the Black Box analysis portrayed in this thesis. In this context, a
Black Box denotes an “opaque technical device of which only the inputs andoutputs are known” (Bucher, 2016, p. 83). 1 hapter 1: Introduction
The motivation of this project has come from the work of Anna Couturier whoholds the dual role of PhD researcher in Science, Technology and InnovationStudies at the University of Edinburgh and Digital Manager at EuroStemCell .In this secondary role, she has observed the impact of targeted advertisementand Google as an intermediary on inquiries made to the EuroStemCell projectby patients and carers looking for information about stem cell treatments andserious conditions and diseases online. This master’s thesis contributes toa deeper analysis of stem cell treatments and digital health information aspart of a collaboration between the University of Edinburgh and the Algo-rithm Accountability Lab(AALAB) . At the project’s completion, the findingswill be handed over to a number of patient organizations including the AnneRowling Clinic , Parkinson’s UK , the Centre for Regenerative Medicine andthe Australian Stem Cell Network . EuroStemCell fosters an interdisciplinarynetwork of scientists and patient groups to research and communicate the sub-jects surrounding stem cells. They fill the role as a professional medical or-ganization to “counteract the un-controlled and premature commercializationof stem cell interventions.” (Weiss et al., 2018). From these tight partnerships(and academic literature alike), evidence arose that patients diagnosed withParkinson’s Disease or Multiple Sclerosis were exposed to questionable ad-vertisement when searching the web on Google. They were questionable andproblematic in a sense that they advertised scientifically unproven stem celltreatments (SCT) (Enserink, 2006; ISSCR, 2019) to affected Internet usersthat might be looking for a cure to their disease. Figure 1.2 shows examplesof problematic ads. The motivation of this thesis was to examine whethervulnerable user groups (patients of severe diseases) were specifically targetedby advertisement on the Google search engine result page (SERP). This re-search would have been especially concerned with the promotion of unprovenstem cell treatments for Parkinson’s disease, Multiple Sclerosis and Diabetes(Type I and II) on Google’s web search platform. This is important as thepresumably targeted users represent a vulnerable group whose exploitationcan have severe consequences. We picked Google because it is a popular in-tegrated search engine (ISE) with large market share (Ratcliff, 2019). Most The AALAB strives to establish ethics in programming, especially in socially sensitiveapplications, automated decision making systems (ADMs) and artificial intelligence (AI) To my knowledge, stem cell-related treatments are only clinically tested and empiricallyproved to be helpful for diseases concerning the blood and immune system with advancesin the area of skin and cornea (eye) (Eurostemcell, 2020b; Food and Drug Administration,2019). Thus, in this thesis, by “questionable SCT” I denote those treatments that usestem cell-related practices that are
NOT yet approved by medical authorities. Thesequestionable procedures are not yet approved and possibly dangerous. .1. Motivation Figure 1.1.:
Sample screenshots of advertisements presented at a typical Googlesearch result page (SERP) (30.9.2019) before the policy change, cour-tesy of Anna Couturier
Figure 1.2.:
Single ad of questionable stem cell treatment provider (30.9.2019) be-fore the policy change, courtesy of Anna Couturier hapter 1: Introduction importantly, anecdotal evidence suggested that the problematic phenomenaappeared on Google, which supposedly handles promotion of unproven treat-ments very strictly according to their advertising policy. They claim to ban allads concerned with speculative and experimental medical treatments, specifi-cally including stem cell therapy (Biddings, 2019b; Google, 2019n).This thesis presents a browser plugin associated with a client-server soft-ware system to crawl the Google SERP and store results, ads and top storiesin a database for further analysis. The goal was to find out whether the prob-lematic advertisements were still being delivered over Google’s ISE. Google’sannouncement to implement adaptive measures from the beginning of Octo-ber 2019 pressured this project to engineer a lightweight, flexible and practicalsolution on the fast track. Thus, many potentials for improvement could notbe considered (see Section 4.3).A qualitative analysis of advertisements on Google’s SERP is presented anda general assessment of the Black Box approach is conducted. The analysesshowed that there still is questionable advertising of unapproved SCT prac-tices. Furthermore, they showed that a variety of actors compete for attentionin the advertising ecosystem surrounding SCT.The assessment of our Black Box approach produced interesting insights forfuture work concerning methodology and requirements of such analyses. Chapter 2 introduces fundamentals that are required to follow the reasoningin this work. In Section 2.1 and 2.2 the notions of information and modellingare explored. Section 2.3.1 discusses algorithms in general to deduce howISEs operate and whether a programming artifact can be responsible for itsoutcomes. Systems theory will be explored to draw models of communicationand the socio-technical system of web-search in Section 2.4 and Section 2.5,respectively. Because this thesis elaborates on the workings of IntegratedSearch Engines and web-advertisement, Section 2.6 and 2.7 elaborate on therespective topics. Then the socio-technical system will be applied to websearch to explain the interactions of ISEs and their users in Section 2.8.In the second part, Chapter 3 describes the context in which this workwas embedded and examines the notion of digitalized health in Section 3.1.Section 3.3 further discusses how algorithms can be assessed regarding theiraccountability. The chapter closes with a closer look at Black Box analysis asmeans to analyze undisclosed algorithms in information systems (Section 3.4)and approaches to their governance (Section 3.2). Finally, Chapter 4 describesthe
EuroStemCell Data Donation that took place in 2019. Its goal was to verifythe impact of policy changes Google initiated after anecdotal evidence aboutads on unproved therapies arose . see (Biddings, 2019b) for Google’s announcement. . Fundamentals This chapter describes the required key concepts on which the following partsof this thesis are based on. Because the conceptions of these terms differgreatly depending on the domain, there is a need to define and contrast someof them. First, this thesis describes multidimensionality of information in Sec-tion 2.1. Then, it will be situated in the communication process in Section 2.4.This lays foundations for the introduction of socially relevant algorithms inSection 2.3.1. It makes use of several examples to show the considerable im-pact of these artifacts of information technology on our society. Based onthis, technical and social systems and ultimately, the socio-technical systemare referenced in Section 2.5. The derivation of the concept of socio-technicalsystem is based on (Kienle & Kunau, 2014). To illustrate how the theoreticalmodel of socio-technical systems finds its real-world application in Section 2.8,the advertising ecosystem as well as integrated search engines are reviewed inSection 2.6 and Section 2.7, respectively.
Information is defined as meaningful part of a message or a set of sym-bols (Meadow & Yuan, 1997). This distinguishes it from data , which has“little or no meaning to a recipient” (Meadow & Yuan, 1997, p.701) and un-derlines how the notion of information is strongly dependent on the recipient’scontext. In (Meadow & Yuan, 1997), Meadow and Yuan claim that there can-not be information overload through too much data. They argue that data isonly considered informative if it was received and comprehended. They alsorequire it to ultimately change the knowledge state of the recipient.This reflects a search engine’s capability to crawl the web (composed ofdata) and extract only those webpages that it deems worthy to present to auser (information). The users on the other hand perceive the results as po-tential information and subjectively judge which data gets their attention.Meadow assumes recipients determine relevance regarding understandability,redundancy and alignment with subjective beliefs (Meadow & Yuan, 1997).Madden lists geographical, cultural and social as well as educational and pro-fessional (area of interest and level of experience) factors(Madden, 2000) that From German:“Derjenige Anteil einer Nachricht, der f¨ur den Empf¨anger einen Wert be-sitzt” (Siepermann, Markus, Lackes, Richard et al., 2019) “Knowledge is the accumulation and integration of information received and processed bya recipient” (Meadow & Yuan, 1997, p. 701) Meadow offers three interpretations of data, one of which explains it as “set of symbols inwhich the individual symbols have potential for meaning but may not be meaningful toa given recipient” (Meadow & Yuan, 1997, p. 704). hapter 2: Fundamentals contribute to perceived relevance.However, this thesis follows Belkin’s argumentation in (Belkin, 1978) con-cerning the usefulness of a definition of information. He argues that by drop-ping the compulsion to define, one is enabled to choose a useful interpretationthat caters ones needs. Consequently, he suggests accepting diverse conceptsas a way of looking at a phenomenon rather than squeezing all applicationsinto one definition. Hence, the following paragraph depicts how scholars sum-marize the conceptualizations of information. According to McCreadie andRice (McCreadi & Rice, 1999), information can be: • Representation of knowledge:
Information stored on a medium (e.g.a website or a database), • Data in an environment: signals obtained from the environment,including unintentional communication, • Resource or commodity: “A message, a commodity, something thatcan be produced, purchased, replicated, distributed, sold, traded, ma-nipulated, passed along, controlled” (McCreadi & Rice, 1999, p.47), • Part of process of communication:
Assumes meaning originatesfrom people, not from words, hence context plays an important role.The concepts above help to describe the manifold manifestations of trans-mitted information in the communication processes of the socio-technical sys-tem. This thesis is mainly concerned with information as part of a process ofcommunication . The suggestive nature of SCT-related advertising only un-folds in the context of patients or carers desperately searching for support. Theinformational character of the promotional message arises from the subjectiverelevance for users affected by a medical condition. The following section dealswith the specifics of communication and Section 2.8 relates the methodologyof information concepts to the web search and advertising context. In the following, Weisberg’s elaborations on modeling in (Weisberg, 2013) aredescribed. This is required to understand the premises on which the follow-ing models are constructed on. He distinguishes physical, mathematical andcomputational models. They are “potential representations of a target sys-tem” (Weisberg, 2013, p. 171) that differ in their representational capability.Each model consists of a structure and its construal (interpretation). Thelatter defines the assignments of real entities to structural elements and theintended scope. The scope limits the model’s expressiveness to some specific However, all other forms are present as well. The creative of ads constitutes a repre-sentation of knowledge, be it legitimate or questionable. Data in the environment ofthe human-computer interaction are constantly being extracted through tracking andexploited by analysis services. Usage data and user profiles are regularly traded as acommodity at low per-unit prices(see Section 2.6) .3. Socially Relevant Algorithms aspects of a phenomenon. Finally, the fidelity criteria describe the standardsby which a model’s representational qualities can be judged. However, thiswork only presents descriptions of models . They are distinct from the modelsthemselves and from the target system. The target system is constructed bythe modeler through abstraction of a real-world phenomenon. This abstrac-tion intentionally reduces complexity while preserving similarity with respectto a certain subject of interest. It does so by reducing it to the most relevantaspects.Due to vagueness or ignorance, these descriptions may specify more thanone distinct model or a family of models (Weisberg, 2013, p. 172). This isimportant to acknowledge, because the web search and advertising ecosystemis a highly complex and opaque agglomeration of a multitude of actors. Thus,a modeler must find a balance between simplification and explanatory powerof a model. This thesis highlights the importance to scrutinize
Socially Relevant Algo-rithms (SRAs) like the ones deployed in web search and advertising systems.It is required to understand the basic categories of algorithms and how theycan express bias.
Integrated search engines like Google’s platform are operated by algorithms.By integrated search engine we denote an information system that combinessearch engine and ad exchange An information system consists of humansand machines that create information and who are interrelated through com-munication processes which generally describes a computer-assisted systemdesigned for a special purpose (Gabriel, 2016). Zwass for example describesit as “an integrated set of components for collecting, storing, and processingdata and for providing information, knowledge, and digital products” (Zwass,2016).An algorithm is a finite set of rules that yield a sequence of well-defined in-structions which need to be followed to solve a class of problems or produce adistinct outcome from an input in finite time(Introna, 2016; Knuth, 1968) . Itconsists of a logic component that describes the domain-specific problem and Even though I present the “models” of communication and socio-technical systems I amaware that they are rather models’ descriptions than a model themselves. However, I usethe term “model” to refer to their respective descriptions for the sake of readability. For a detailed description, see Section 2.7.2 for web advertisement and Appendix C forsearch engines, respectively.) Knuth also lists effectiveness as an equally important future. However, he deems an algo-rithm effective when it can be fulfilled by a human using pen and paper only. A definitionthat does not realistically hold with today’s advanced algorithms. Another notable fact isthat Knuth means reasonable duration , when he speaks of finite time. Acceptable runningtimes for algorithms naturally are a moving target due to technological advancement. hapter 2: Fundamentals data structures and a control component dedicated to the problem-solvingstrategy (Kowalski, 1979). This allows to separate the efficiency-centered con-trol from the functional logic. The latter is solely concerned with functionalaspects. For example, to ask the right question, modeling a suitable repre-sentation, use appropriate data and find an adequate solution. This workaddresses the logic components of ISEs and examines it all along the develop-ment axis. This is where companies and developers make conscious decisionsabout how an algorithms is designed and how outcomes are computed.Algorithms can be arbitrarily complicated. They range from simple al-gebraic calculations via computational heuristics to applications of artificialintelligence (AI) . Trivial algorithms dedicated to simple algebraic calcula-tions, sorting or other unsophisticated operations are not deemed socially rel-evant. Only if their outcomes have repercussions on individual humans orsociety as a whole, their actions must be evaluated from a societal perspec-tive. Admittedly, this is a fuzzy distinction as socially relevant algorithms canbe composed of other trivial algorithms. Additionally, the above descriptionstrongly depends on the deployment context. Nevertheless, several algorithmclasses are at risk to discriminate those affected (even unintentionally throughtheir respective choice of criteria, training data, semantics, and interpreta-tion (Diakopoulos, 2013a)). Discrimination can occur through an advertiser’smalicious intent, the targeting process or the targeted audience (the even-tual outcome) (Speicher et al., 2018). 
This can make users subject to bias,manipulation, constrained freedom, surveillance, discrimination, commercialor political influence, or loss of sovereignty (Gillespie, 2014; Saurwein et al.,2017). They are distinguished by (World WIde Web Foundation, 2017) ac-cording to the way they process information. Below, the categories are listedalong with the respective pitfalls. Prioritization
Rank or score entities based on certain characteristics. Thechoice of these characteristics and the underlying values and norms haveimmediate impact on the order of results which could falsify the originalintention.
Classification
Categorize an entity and assign it to a group due to its fea-tures. A faulty classifier might wrongfully label an entity with severeconsequences.
Association
Establish relationships between entities. They are deduced se-mantically, through similarity or connotation, thus not necessarily rea-sonable or real.
Filtering
Exercising choice about what to consider relevant, possibly withoutrevealing the criteria this decision is based on and applying possiblybiased filters. The interest of AI is in “the synthesis and analysis of computational agents that act intel-ligently” (D. L. Poole & Mackworth, 2010). Some scholars prefer the term computationalintelligence to emphasize that the agency is based on computation(D. Poole et al., 1998). .3. Socially Relevant Algorithms According to (D. L. Poole & Mackworth, 2010), algorithms are agents be-cause they act in an environment. Going by Max Weber’s definition, to actmeans internal or external “doing” that is premised on subjective purpose ordeliberate intention (Arbeitsgruppe Soziologie, 1978) .Algorithms that power Internet-based platforms like Google’s platform ful-fill Poole’s requirements to be intelligent agents (D. L. Poole & Mackworth,2010) which are derived from Turing’s approach to intelligence in (Turing,2009). His notion explains intelligence by behavior. Following Skinner, behav-ior is any externally observable action (or doing ) by an organism if it happenswith reference to its environment (Skinner, 1938) . Baum notes how behavioris generally aimed at a goal and a result of deliberate choice of actions thatconsiders future consequences (Baum, 2013). Because actions are intended be-havior (Kienle & Kunau, 2014), inanimate entities like algorithms are capableto behave within the boundaries of their defined actions. On top of that, tech-nical components and the repercussions of their actions affect both social andtechnical entities in the socio technical system. Thus, they have a strong rela-tional aspect when they facilitate communication processes (see Section 2.5).Thus, they can be seen as agents of collective agency (in Section 2.3.1).This explains why Google’s platform qualifies as intelligent agent by fulfillingPoole’s requirements (D. L. Poole & Mackworth, 2010). It emphasizes thecapability of their algorithms to act appropriately with respect to circumstanceand goal. Furthermore, the intelligent algorithmic actors is flexible pertainingto resources (computational space and time) and learning experiences.The paragraphs above explained how ISE’s algorithms construct an informa-tion system that is designed for a certain purpose and to interact with humansthrough distinct technical components or computational artifacts, namely al-gorithms. Introna claims that “[a]lgorithmic action has become a significantform of action (actor) in contemporary society” (Introna, 2016, p. 37).In Section 3.3 algorithms and particularly those that act intelligently aredescribed as agents that are able to perform self-sufficiently in their environ-ment, Nonetheless, they cannot be perceived as self-sufficient moral agentsof their doing. They can be judged by the decisions that were made alongthe chain of instructions and the output they generate which constitutes theiractions. Socially Relevant Algorithms constitute the technical components in socio-technical systems (STS, see Section 2.5). They have a significant impact ona social system and can mostly be found in human-computer interaction, forexample, when a human searches the World Wide Web (WWW) using a search Herein, the terms agent and actor are used interchangeably to describe subjects or entitiesthat act. Unfortunately, this does not allow to observe and scrutinize technical components by theirbehavior as they are not alive in the original sense. 
Nonetheless, a technical systemcan be seen as an (non-biological) organism that is comprised of different organs (itscomponents) and pursues a certain goal. hapter 2: Fundamentals engine and is computationally targeted with advertising. Here, human andcomputer engage in mutual communication. The idea to evaluate algorithmsas part of a greater system is a perspective that expands the boundaries ofcomputer science beyond the realms of bare construction of computers andalgorithms design. It addresses accountability and responsibility concerningthe development, implementation and use of algorithms that play a significantrole in socio-technical systems. It addresses long-term effects and emergentbehavior as well as a wider scope of stakeholders. Sometimes, the outcomesof these algorithms are accompanied by discrimination, induce manipulationor express other unwanted side-effects. Basically, all classes of algorithms asdenoted in Section 2.3.1 can suffer from biases. Below, SRAs are listed thatshowed significant impact on either individuals or society, some problematicor at least questionable others merely thought-provoking . • Scoring credit risk (Citron & Pasquale, 2014), recidivism (Larson et al.,2016) and social behavior (K¨uhnreich, 2017; Stanley, 2015) • Nation-wide face recognition (Chen, 2017) and predictive policing (Pe-teranderl, 2017) • Fake News (Albright, 2017) and emotional manipulation in social net-works (Kramer et al., 2014) • Racial ad delivery (Angwin & Parris Jr., 2016; Sweeney, 2013) and sexistrecruitment (Reuters, 2018) • Home automation (Peterson, 2020) and automotive software (Gelles etal., 2015; Koscher et al., 2010) • Dubious autoplay feeds (Maheshwari, 2017; Max Fisher & AmandaTaub, 2019a, 2019b) • Art performances (Weckert, 2020) Another critical application domain is the search engine. Search engines actas the entry portal to the WWW, creating comprehensiveness in humongousmass of websites out there and satisfy users’ information need. Users confi-dently trust a search engine to answer their query and rank results by truerelevance (Pan et al., 2007). They allow an algorithm to deem some infor-mation more worthy than other. Thus, researchers claim that search engineshave the power to shape public opinion (Zittrain, 2014), disseminate conspir-acy theories (Ballatore, 2015), redefine history (Grimmelmann, 2008), perpet-uate negative stereotypes (Baker & Potts, 2013; Kay et al., 2015), manipulate This list intends to give a rough overview of SRAs to demonstrate their widespread ap-plication in all sorts of domains of everyday live. The compilation includes scientific aswell as journalistic sources and is in no means exhaustive. This study was widely criticized by popular media (Chambers, 2014; Grohol, 2018) andacademia alike (Jouhki et al., 2016; Shaw, 2016) for including uninformed participantson a large scale and basing marginal findings on antique research methods. Weckert created a virtual traffic jam on
Google Maps by pulling a wagon full of cell phonesdown an empty road. .4. Communication individual users (Epstein & Robertson, 2013, 2015) and discriminate basedon race (Angwin & Parris Jr., 2016; Sweeney, 2013) and gender (Adam Gale,2015; Kay et al., 2015; Otterbacher et al., 2017).An attempt to describe these effects is made by Gillespie in (Gillespie, 2014).He distinguishes six dimensions of algorithmic impact on society. Patterns ofinclusion , the evaluation of relevance and the promise of algorithmic objectiv-ity all relate to the functionality of search engines as an unbiased informationprovider that delivers relevant answers from an objective selection of knowl-edge to users. The cycles of anticipation , entanglement with practice and production of calculated publics describes how algorithms analyze and targetusers and how these inferences and the users’ respective expectations reboundto society. Further down, we will see how the practices of integrated searchengines like Google are subject to all of them.These alarming consequences are not necessarily intended by their develop-ers but usually emerge as unwanted side effects, unexpectedly and throughinteraction with society. Algorithms may indirectly disadvantage users inways that are not necessarily illegal or intended by their developers. Oncethey exercise socially problematic behavior, they should be scrutinized by thepublic (Sandvig et al., 2014).This thesis defines Socially Relevant Algorithms as algorithms that have animmediate effect on a social system through their close coupling with socialprocesses (communication). They are part of a socio-technical system , wherethey constitute the technical part. To examine the characteristics of interaction between humans and computers,this chapter contemplates different models of communication.In the course of years, several models have gained popularity. This chapterdiscusses three popular models of interaction to describe the interactions inthe process of communication between a human and a technical agent. First,both extremes of the human-computer spectrum will be explored. Shannon’smodel of tech-focused communication in the context of electrical communi-cation engineering is to be contrasted with Watzlawick’s approach of humanpsychology. Lastly, a context-conscious model by Kienle will be evaluated.The comparison should illustrate why the subject of human-computer inter-action present in web search and advertising requires a specific approach tocommunication. In order to fully explain the nature of the web search and ad-vertising ecosystem, any arbitrary model might be insufficient. Consequently,an appropriate model must be able to reflect the system’s properties and Although (Epstein & Robertson, 2013, 2015) are widely referenced, the studies are equallyharsh criticized due to their miscalculations, exaggeration and sensational claims, forexample in (Algorithm Watch, 2017). See Section 2.5 Watzlawick et al. define interaction as mutual exchange of messages between two or morepersons (Watzlawick et al., 2007). hapter 2: Fundamentals means of interaction. The following critique is mainly based on (Kienle &Kunau, 2014) with specific examples by the author of this thesis to illustratethe inapplicability or fitness of the respective model’s characteristics to websearch. Figure 2.1.:
Schematic diagram of a general communication system, illustrationfrom (Shannon, 1948, p. 381)
Figure 2.1 shows a communication model dedicated to describing the ex-change of information between two partners via telegraph or any wired con-nection. A source submits a message to a transmitter that encodes the messageand sends a signal on a channel. During transmission, it may be affected bya noise source. The possibly corrupted message is then received and decodedby a receiver, which typically applies the inverse function of that done by thetransmitter. After reconstructing the original message, it is delivered to itsdestination.(Shannon, 1948)This model falls short of many aspects that are essential for human-computerinteraction in web search. (Kienle & Kunau, 2014) enumerates the followingshortcomings that are then adapted to the web search context. First, Shannonreduces the content of the message to its syntax only. The value of received in-formation only depends on not-yet transmitted signals. The more of a messagehas been received, the lower is the informational value of residual signals. Thisis a wrong assumption in the context of web search. Even though searchersmay have reviewed numerous results, the single most relevant result or adver-tisement that they eventually accept has higher informational value than thepreceding signals. Shannon further assumes that all messages are equivalentlyimportant for all destinations. This falls short of describing a web searchscenario, where searchers have a unique background or context and expect acustom answer for a specific question. Users only consider the subjective valueof advertisements and search results. Nonetheless, in this thesis the notion ofsender, message and receiver is retained to describe the agent who initiatesthe communication, the transferred information and the addressed person. Herein, the partners denote agents involved in mutual communication. .4. Communication Other models emphasize inter-human communication and add an empathicaspect. Watzlawick et al. present their psychological approach in (Watzlawicket al., 2007). His model is concerned with two human communication partnersthat are situated in vicinity to each other (possibly in one room). He interpretsthe entirety of behavior as means to transmit a multipartite message. Themodel is strictly restricted to observable actions and its trajectory depends onthe subjective interpretation of the course of actions.Watzlawick et al. formulated five axioms based on their experience as thera-pists in (Watzlawick et al., 2007, pp. 53–70). Below, they are enumerated andsubject to discussion with respect to their applicability to the human-computerinteraction of web search.
1. Axiom
Non-communication is impossible.
2. Axiom
All communication includes a content and a relational aspect suchthat the latter determines the former, which forms a meta-communication.
3. Axiom
The nature of a relationship is determined by the succession ofcommunication perceived by the parties or their interpunction thereof
4. Axiom
Human communication utilizes analogue and digital modes.
5. Axiom
Course of inter-human communication is either symmetrical or com-plementary. For the first axiom to hold, Watzlawick presumes the analogous humancommunicators to be in one room. Obviously, this cannot be guaranteedwith Internet-based service. Furthermore, remote communication offers manyways to not communicate, most of which pertain to not initiating a com-munication process online (not sending a message, not clicking a button).Through the technical communication channel (the Internet), the intention ofnon-communication remains shrouded and cannot be evaluated unlike otherthan with a passive agent in human-to-human interaction.As to the second axiom , Determining a relationship between conversa-tional partners over Internet-based services is difficult. Internet intermediariessuch as platforms, search engines and ad exchanges complicate finding the truesource of information. Imbalance of power over the communication channel(which is dictated by the platform) and a disparate state of knowledge aboutthe respective partner usually leave users in the dark about the workings ofthe communication and the intentions of their counterpart.With respect to static websites like most search engine result pages, the third axiom cannot be applied, too. Usually Internet-based services respondto user queries, the interchange accurately logged in files of the web server.There is no ambiguity to the course of communication. Users often perceive theInternet-based service’s response to a user query, as a direct answer. However, Translated from German from (Watzlawick et al., 2007, pp. 53–70) hapter 2: Fundamentals the user query alone is not the only input to a search engine for example. Itsalgorithm organizes a plethora of information about the user and leveragesbackground information in a way that the users can never be sure, when theircommunication with the platform provider actually started. As most usersare unaware of the unobtrusive and constant tracking, testing and adaptingof web-services, they are also ignorant about the entirety of exchanges in acommunicationThe depiction of analogue modes of communication as non-verbal can be sus-tained as claimed in the fourth axiom . However, gestures and facial expres-sions do no (yet) play a role in web search. Instead, background informationin the form of data about a user and knowledge about context influence thecommunication. Kienle and Kunau note that this can explain reduced com-municative capability of interacting with and via technical systems(Kienle &Kunau, 2014, p. 59). Watzlawick argues that especially in human-computerinteraction, it is important to provide meta-information along with a mes-sage so the communication partners can negotiate their relationship and theinterpretation of the message(Watzlawick et al., 2007, p. 55).Interestingly, the last axiom allows a two tiered interpretation. At first,the role of interrogator (users) and respondent (search engine) are very clearand fulfill the requirements of complementary communication with mutualreinforcements of this distinct relationship. Nonetheless, one could see thesearch engine providers’ learning strategies as a symmetrical approach to thequestion-answer-dialog. 
By learning more about the users, their intentionsand the context in which a query is formulated, there is a notion of reciprocallearning, though on different levels and orchestrated with a distinct intention.The users’ learning is objective-oriented with respect to their information need,the focus of the search engine’s learning however is subject-based and on theusers and its own means to serve them. In conclusion, even though this modelis well suited to illustrate direct human communication, yet again it cannotbe applied to human-computer communication without flaws. The context-oriented communication model by Kienle (Kienle, 2003, pp. 22–27) depicts communication differently. It is no longer an unidirectional auto-matic process pushing a message from a sender to a receiver. Now, all involvedparties are responsible for a common understanding (H. H. Clark & Brennan,1991). Kienle adds that, the involved parties mutually refer to or react to eachother’s messages (Kienle, 2003, p. 17). It is based on the notion of social action by Luhmann. He describes it as action whose intention includes the supposedor expected attitudes of other people who are involved in the communica-tion (Arbeitsgruppe Soziologie, 1978, p. 129). Kienle calls these assumptions context and assigns to it the part of an environment that affects individuals’actions during interaction and facilitates mutual understanding (Kienle, 2003,p. 22). Kienle’s model interchangeably assigns the roles of sender and receiverto the communication partners. Her model allows for switched positions andfor technical entities to participate as long as they can fulfill the tasks involved14 .4. Communication in the process.
Figure 2.2.:
Luhmann’s triple selection in social actions, from (Kienle & Kunau,2014, p. 71)
Luhmann derives his model from the idea that humans can only cope withcomplexity through selection (Luhmann, 1984, p. 48). Thus, he proposes acommunication process that passes through several selections (see Figure 2.2.First, the information to be communicated is selected among many alter-natives, secondly the form of transmission is chosen (the kind of message),then the recipient evaluates how to understand the message. Eventually, thepersons addressed select how the new “information difference” affects theirbehavior (Luhmann, 1984, 194ff). We see these steps in web search as well. Asearch engine selects only a fraction of available information and specificallychoses a personalized subset thereof to answer the users’ queries. Then, theresults are presented in the most meaningful way. Based on their subjectiveassessment, users accept a relevant result, reformulate queries or reject theoutput, exhibiting a degree of satisfaction. Finally, they may or may not actby clicking on an organic or paid search result.In Kienle’s model in (Kienle, 2003) and (Kienle & Kunau, 2014), theseselections also takes place, though they are influenced by internal and externalcontext of the agents. According to her, internal context includes knowledge,emotions and assumptions (especially about the partner).While the internal context is invisible for the counterpart, the external con-text is shared. It is based on common perceptions and experiences. as wellas mutual beliefs. Extra-communicational behavior is adapted to the contextand enriches the verbal (direct) communication. In this model, context has asignificant function.First, shared context supports success monitoring with respect to the intendedoutcome of the communication.Second, the explicit message can omit information that can be inferred fromcontext. In the end, a consistent inner context about the counterpart’s atti-tude and common belief or shared assumptions about an outer context are thepremises for successful communication.The development of search engine capabilities featured in Appendix C ex-15 hapter 2: Fundamentals
Figure 2.3.:
Context-oriented communication model (Kontext-orientiertes Kom-munikationsmodell), from (Kienle, 2003, S. 35) .4. Communication (a) Face-to-face situa-tion (Kienle, 2003,p. 37) (b)
Computer-mediated situa-tion (Kienle, 2003, p. 44)
Figure 2.4.:
Sender activities in the context-oriented communication model, from(Kienle, 2003) hibit a tendency of to concentrate on user intent and context. Apparently,the ambiguity of textual queries degrades the result quality like verbal-onlycommunication without contextual knowledge. Thus, it is vital for a technicalagent to identify the respective human’s context, attitudes and intentions tofully grasp the nature of the communication and answer accordingly. Boththe inner and the outer context are explored through data-supported usermodeling and predictive analysis.Kienle’s consideration of the sender’s activities is especially interesting re-garding web search. In the face-to-face situation depicted in Figure 2.4a manyof the activities listed can be effortlessly applied to ISEs. They try to evalu-ate and estimate a searcher’s background, intentions and knowledge throughprofiling and computational models (see Appendix C and Section 2.6). ISEsalso exclude irrelevant advertisements and search results through selectionand personalized ranking on the ad exchange. Then the algorithms determineappropriate descriptions and provide different forms of presentation through
Knowledge Graphs and infoboxes . After that, they steer attention througha structured search result page and ads on the bottom or top of the SERP.Finally, an ISE validates success with click-through analysis (L. Granka etal., 2004; Joachims et al., 2007). They only fail at making context deducible.The context-attributes used in the selection and delivery process remain dis-closed. Thus, a receiver might find a message relevant and useful. But userscan never fully grasp why a subset of ads or results is shown. Nowadays, ISEsmake this context explicit, at least pertaining to advertisements when theygive reasons as to why an advertisement was shown. Google gives users someinformation on why they see a certain ad. Naturally, these explanations areonly vague (Google, 2020d).With a computational agent as an intermediary, the communication process See Appendix C for an introduction to both hapter 2: Fundamentals changes. A technical system transmits the message, reducing the choices avail-able in the selection of medium (see the second step in Luhmann’s three-foldselection, Figure 2.2). The communication situation cannot be immediatelyexperienced since there is usually a significant distance between sender andreceiver. Context blurs or perishes making interaction more tedious. Now,communication partners must consider the limited means of expression. Extra-communicative behavior can no longer be directly observed, and the partnerscannot necessarily assume a shared context. Thus, context information has tobe made explicit if it contains useful information for a recipient. This entailsa change in senders’ activities, as seen in the transformation from Figure 2.4ato Figure 2.4b. In computer-mediated communication instead of implicitly re-ferring to context, context has to be made explicit to a degree that it supportsthe sender’s intentions and the receiver’s ability to understand and accept in-formation. Kienle supposes to use different illustrations and cues to facilitatecomprehension.(Kienle, 2003) (Kienle & Kunau, 2014)Based on the argumentation above, this thesis understands communicationas Kienle defines it and illustrates it in her model: Interaction encoded in sym-bols, regarding the mutual context . We need this extended perspective oncommunication to comprehend the interaction between users, advertisers andintegrated search engines . Below, the nature of the communication’s con-tent is discussed, and Section 2.7 explains how search engines achieve contextawareness without engaging in face-to-face communication. The notion of socio-technical systems represents the idea that social, psy-chological and technical factors can be tightly connected in a way that theycan only be understood in combination as an integrated whole (Kienle & Ku-nau, 2014, p.81). It originated from a very analogue mining context (Trist &Bamforth K.W., 1954) and was applied to modern software engineering (Som-merville, 2016) using the systems theory below . In the discipline of infor-matics or computer science this change of mind emphasizes, that not only thedesign and implementation of algorithms should be of concern, but also theirimpact on individuals and society as a whole. It is reflected by the constantlychanging efforts of the discipline to self-define. Coy shows how the trajec-tory of definitions changes over the years and shows how there is a growingconscience for applications and implications of algorithms (Coy, 2013). 
Coyquotes Wilfried Brauer twice over a decade, showing the scope of informaticsgrew from data processing by means of digital computers towards “theory, From German: “Durch Zeichen vermittelte Interaktion, wobeauf gemeinsamen KontextBezug genommen wird.” (Kienle, 2003, p. 20) There are different ways to apply the the model of context-oriented communication tothe web search ecosystem. In alternative approaches technical agents take the role of amediator. Although this would better represent the actual flow of information, it wouldnot significantly support this work’s analysis. The line of argumentation is drawn from (Kienle & Kunau, 2014) .5. Socio-technical Systems methodology, analysis and construction, application (and) consequences ofdeployment” (Coy, 2013, p.489) .Kneer and Nassehi define a system as “the entirety of a set of entities andtheir mutual relations” (Kneer & Nassehi, 1993, p.25) . Anything not in-cluded in a system’s definition is called environment (Kienle & Kunau, 2014).Sommerville extends this definition with a purpose the system is dedicated to.From his Software Engineering perspective, he adds that the components ofa system cooperate to deliver a set of services to a user (Sommerville, 2016,p.556). Following the definition above, technical systems consist of interrelated tech-nical components. They constitute the entities. Luhmann describes thosecomponents as coupling of causal elements (Luhmann, 2000, p. 370) whichmay include human behavior, if it happens in an automatic and determinedmanner and not through arbitrary decisions (Luhmann, 2000, p.370). This re-flects the connectedness of the discrete computational instructions that drivean algorithm. Further, he argues that technical systems are allopoetic . Thismeans, they were constructed by an external force and are not self-sufficient.Thus, they cannot reproduce or renew themselves which means they are au-tonomous but not autarkic. They rely on external resources (like energy, re-placement parts or activation though signals) what makes them non-autarkic.However, they autonomously carry out their operations in a self-determinedmanner. They halt operation when they receive no further input from theirenvironment. Thus, Luhmann concludes that technical systems are exter-nally controlled and organized (Luhmann, 2000). This applies to algorithmsin so far as they are created from the outside through programmers and theyrely on hardware and energy to operate. They perform their predeterminedactions according to their instructions. They do not compute for the sakeof computation but to enact their creator’s intentions through performativity (Section 3.3) which denotes the outcomes that emerge from an algorithm’sdeployment rather than the written code.Following Kienle and Kunau, technical systems are deemed faulty, if notthey do not behave as intended by its constructors (Kienle & Kunau, 2014).Here, we can observe a possible discrepancy between the purpose-directedactions and the eventual outcomes of an algorithm. The latter can deviatefrom the expected results even though a technical agent only performs asintended by its developers. This opens the space for discussion about whatseparates intentional functionality from undesired side-effects that algorithmscan produce in a socio-technical system. Translated from German (Coy, 2013, p.489) Alongside entities , this thesis interchangeably refers to the constituent parts of a systemas elements . 
allo , Greek for “different, other” and poiesis for “An act or process of creation”, seehttps://en.wiktionary.org/wiki/allopoiesis hapter 2: Fundamentals According to Luhmann, not humans but communications constitute a socialsystem (Kneer & Nassehi, 1993, p. 65). Thus, Kneer and Nassehi define thesocial system as systems that recursively generate communication from com-munication in a continuous manner until the system perishes. Its constituentelements are communications, that reference each other. The relations de-scribe the kind of dependence between them(Kneer & Nassehi, 1993, p.80).In contrast to technical systems, social systems are autopoietic (Luhmann,2000) . They are self-sufficient as they proliferate through succeeding op-erations from the elements within. Only if newly created communicationcan reasonably connect to existing communication, the social system liveson (Klymenko, 2012). Additionally, they are self-describing in a sense thatthey constitute themselves through differentiation from their respective envi-ronment. This operational closedness ensures that social systems develop theirown structure based on intrinsic operations alone. These operations are notdetermined by the system’s environment, but by a selective choice of environ-mental influences, at the system’s discretion. Hereby, the social system cancompose its own structure by selectively reacting to an arbitrarily complexenvironment. It observes the environment and creates its identity by distin-guishing between inside and outside in its communications (Luhmann, 1998;Mayr, 2012) By determining what communication is acceptable within the sys-tem, it can differentiate between system, other systems and environment. Thisemergent behavior is a result of the three-fold selection process in the creationof communication by Luhmann (Luhmann, 1984)(see Section 2.4). Throughthis self-description, the system can be observed, described and analyzed fromthe outside (Kienle & Kunau, 2014; Kunau, 2006). This way, subsystem canarrive at functional differentiation (Mayr, 2012). Similar to its technical coun-terpart, social systems are autonomous but not autarkic. Even though theysustain themselves through recursive communication (which makes them au-tonomous), they are not immune to impulses from the outside (their environ-ment) and are subject to boundary conditions. Nonetheless, the social systemsovereignly decides on how to incorporate impulses from the outside (Kienle& Kunau, 2014).Hence, Luhmann deduces that society itself must be the ultimate socialsystem, including the entirety of all social communication (Luhmann, 1984,p.555). Klymenko points out that, according to Luhmann, this super-systemcan be partitioned into subsystems with their respective environments. Throughself-description these fragments can distinguish themselves from other subsys-tems. Consequently, systems can recursively consist of interrelated systems.This allows us to treat society as an amalgamation of multiple social subsys-tems, each with its own communications. Today, this separation happens on afunctional basis, so those sub-societies are shaped by their specific form of com-munication (Klymenko, 2012). Drawing from this distinction, we can makeout the social system of online advertising that is comprised of the subsystems Greek for “self-produced, self-organized”, see https://en.wiktionary.org/wiki/autopoiesis .5. Socio-technical Systems of users, advertiser and search engine providers. 
To describe and analyze social systems that sustain a tight relationship with a technical system, Kienle and Kunau devised a definition that merges both. According to them (Kienle & Kunau, 2014, p. 97), a social system constitutes a socio-technical system (STS) if:

1. The technical system supports the social system's communication processes,
2. There is mutual influence,
   a) The technical system influences the social system,
   b) The social system shapes the technical system,
3. The technical system becomes part of the social system's self-description.

This underlines the interrelation of both. The social system actively designs and constructs the technical system, which in turn is woven into the communication processes that sustain the social system. Eventually, it becomes indispensable, so the social system integrates it into its self-description. The model characteristics with respect to Weisberg's "model of models" can be described as follows. The structure of the STS is composed of a technical and a social system. Furthermore, it comprises communication processes of the social system that are affected by influences of the technical system. Additionally, there are creative and manipulating actions towards the technical system. Its intended scope is to explain a specific phenomenon that requires involving both technical and social agents. It allows analyzing the mutual interferences and the technical adoptions that are integrated into a social system's self-description. The (unspoken) fidelity criterion is the capability of the modeler to somehow restrict the boundaries of said systems and narrow down the significant variables. In Section 2.8, this model will be applied to the web advertising ecosystem. It will describe how the social system of advertisers, users, engineers and society as a whole interacts through the technical system and integrates it into its self-description.

2.6. Data Economy and Advertising

"The predominant economic model behind most Internet services is to offer the service for free, attract users, collect information about and monitor these users, and monetize this information." (Mikians et al., 2012)

Some Internet platforms exploit basic human needs like socializing with others, information seeking and communication to hoard personal data and capitalize on the analysis of this information (Petersen, Tanner, et al., 2019). As soon as customers are profiled and recognized online, they can be targeted with personalized advertising and search results (Google, 2019q) in real time (Steel & Angwin, 2010). "[I]f an ad network is able to accurately target users, we can deduce that the ad network is able to determine user characteristics" (Guha et al., 2010, p. 1), Guha concludes.

The sections below describe the methods and merits of data collection and targeted advertising, as well as a critique of both. In the context of this work, it is important to understand them as the foundation of modern online advertising; some problems that emerge from the technical systems in web search and advertising have their roots here.
Today, information that seemed meaningless on its own is enriched through the amalgamation of data from different sources; there seems to be no such thing as useless data. According to the Federal Trade Commission (2014), some data broker companies hold around 3,000 data segments for almost every U.S. consumer. Data brokers buy and sell information in packages that include overhead which was not ordered in the first place but is part of the deal (Federal Trade Commission, 2014). Some user data segments are sold off for less than $ . (The underlying study was conducted in 2013, so prices may have changed. Nonetheless, as there are more networked entities today, the amount of data has most likely increased, with the price of an individual piece of information consequently decreasing.)

This data is collected through web tracking, of which there are two kinds. Stateful technologies use cookies, cache, HTML5 properties and session IDs to identify users. Stateless technologies, or fingerprinting, combine properties of hardware, operating system, browser and their configuration to identify a user (Laperdrix et al., 2019). While active fingerprinting is performed by scripts and plugins and can therefore be inhibited by prohibiting their execution, passive fingerprinting can be derived from network traffic and thus remains unseen and untouched by the user (Mayer & Mitchell, 2012, p. 421).

Web tracking enables companies to reveal users' demographics (Hu et al., 2007), location, purchasing decisions and interests, as well as sensitive information about them like health conditions, political or religious views (Bi et al., 2013), sexual orientation (Mistree, 2009) and relationship status (Backstrom & Kleinberg, 2014) through their online activities (Mayer & Mitchell, 2012). This amalgamation of data from various sources allows companies specialized in data collection, analysis and fusion to derive PII from initially user-neutral data (Krishnamurthy & Wills, 2009b). Sparse individualized data like browsing histories or product ratings are sufficient to de-anonymize users in an approach presented in (Narayanan & Shmatikov, 2008). Browsing behavior also suffices to learn about a user's demographics (Goel et al., 2012). This even extends to offline behavior such as movement, speech or geolocation (Lane et al., 2011; H. Lu et al., 2012). Technological progress benefits this development: social media entices users to unveil intimate details about themselves, digital communication can be crawled, and mobile technology reveals geospatial data (Yuan et al., 2012). Machine learning renders manual or explicit classification superfluous. Consequently, user profiles are no longer composed via query similarity and classified into groups of equal interests and purchase decisions; modern approaches derive clusters and semantic relationships from user behavior (X. Wu et al., 2009).

Profiling can generate problematic categories and unwanted side-effects that allow discrimination or questionable targeting of users. Angwin et al. showed how Facebook allowed advertisers to target "jew haters" (Angwin et al., 2017) or exclude users by race (Angwin & Parris Jr., 2016). Speicher et al. scrutinized different targeting methods used by advertising-based platforms and found three major ones: attribute-based targeting (determining the target audience by selecting attributes users must express), PII-based (custom) audience targeting (specifying distinct users by their PII) and look-alike audience targeting (targeting an audience similar to an existing sample customer base, also known as remarketing) (Speicher et al., 2018). These methods can discriminate against users or groups of users.
Speicher et al. further showed that selectable categories on Facebook correlate with sensitive attributes of users, such as ethnicity (Speicher et al., 2018). Moreover, through the use of look-alike audiences, bias is propagated to the selection of new subjects. Lastly, the wide availability of personal data and the efficacy of its combination and analysis facilitate discrimination by PII. (Tufekci distinguishes profiling from modeling: according to him, profiling only aggregates data about individuals and categorizes them, whereas modeling infers attributes and intentions beyond that knowledge with the help of data and computational methods (Tufekci, 2014b). As this distinction is not at the heart of this work, the terms are used interchangeably.)

Because data brokers and the industries that tap into their resources are generally customer-oriented but not consumer-oriented, it remains laborious for individuals to inquire about their data, its origin, sourcing techniques and usage (Federal Trade Commission, 2014; Marwick, 2014). In 2015, Datta et al. found that users could not review all the data Google used to create their profiles. Furthermore, protected attributes carrying sensitive personal information were used in the profiling process. This potentially exposes users to discrimination and impairs their ability to comprehend the reasons behind ad choices (Datta et al., 2015). The opacity may lead to distrust of unaccountable data sources (Pasquale, 2008) that leaves users in the dark about the origin of a computation, and ultimately to users losing confidence in an algorithmic system.

Today, agency is passed onto privacy policies, terms and conditions and the like, to a degree that they deteriorate into "defaults" (Introna, 2016). They are usually skipped or skimmed and only read under coercion (Steinfeld, 2016). This calls the concepts of fair conduct and informed consent in this interconnected socio-technical system into question. Ambiguous and misleading privacy policies further the collection, as they are incomprehensible to an average Internet user and grant vaguely defined rights to first and third parties (Reidenberg et al., 2015). Giving truly informed consent to data collection may ultimately be hampered by the design of the decision process, as people's capabilities with respect to memory load and concentration are challenged (Veltri & Ivchenko, 2017). The Federal Trade Commission (2014) voices recommendations that would allow citizens to easily identify data brokers that trade their data and would require those businesses to disclose if and how they draw inferences from raw data. Specifically, the categories or profiles attached to a consumer should be revealed, so concerned users can scrutinize and correct this information. In this sense, the EU's General Data Protection Regulation (GDPR) (European Parliament & EU Council, 2016) allows subjects of data collection, at least in theory, to demand details about the data stored on them.

Through third-party tracking technologies embedded into websites, personally identifiable information (PII) is transferred to entities other than the first-party website a user originally intended to visit (Krishnamurthy & Wills, 2009a).
Tracking providers' services span a wide variety of first-party websites: Krishnamurthy and Wills (2009b) showed that 70% of first-party websites were already supported by the top-10 tracking providers in 2008. Hence, trackers are able to aggregate usage data from multiple sources to create a user profile (Krishnamurthy & Wills, 2006) that allows inferences about the personality of a user (Lambiotte & Kosinski, 2014). Acquisitions and technological advances realize a "potential of significant growth in aggregate data" (Krishnamurthy et al., 2007, p. 548), for example when Google acquired DoubleClick in 2007 (Google, 2007). This diffusion or leakage of PII (Krishnamurthy & Wills, 2009b) leads to an imbalance of power, as users cannot easily examine the usage of their data. As researchers note: "Aggregator nodes in possession of information that can be tracked to individual users could potentially use it in a manner that violates the legitimate privacy expectations of users" (Krishnamurthy & Wills, 2006, p. 1).

This shows how intermediary platforms agglomerate data sources and collection utilities to enhance their services and horizontally integrate technologies that allow them to analyze and target specific users. Through complex tracking networks and advanced analysis methods, an information asymmetry arises (Tufekci, 2014a). Users are unaware of, or resigned towards, the collection and have little understanding of the tracking imposed on them; online companies, however, can construct a rich representation of users. As a consequence, some users try to protect themselves from tracking.

Whenever a citizen leaves a digital footprint, it can be added to their trail. Avoidance is practically impossible due to the high degree of digitalization and the technological divide that separates tech companies from the average user's capabilities to fend off tracking attempts. On top of that, organizational structures in advertising ecosystems are hard to decipher, which makes blocking malicious content cumbersome (Krishnamurthy & Wills, 2006). Research suggests that the efficacy of tracking protection techniques is inversely correlated with page quality and browsing experience: the better the protection, the more features are unavailable and the less comfortable the web browsing experience (Krishnamurthy et al., 2007). A study from 2010 showed that the vast majority of tested browsers could be uniquely identified, even after a fingerprint had changed (Eckersley, 2010). Ironically, adding protection measures to a browser can itself help to identify an individual client (Eckersley, 2010).

Still, there are technical and behavioral measures that can reduce the dissemination of personally identifying information, for example using the TOR browser or the NoScript browser extension (https://addons.mozilla.org/de/firefox/addon/noscript/) as well as various tools to block ads (Eckersley, 2010; Krishnamurthy et al., 2007). Ultimately, some scholars discuss obfuscation and misleading actions like entering ambiguous and false data as a last resort for privacy (Brunton & Nissenbaum, 2011). Because they see the free web's business model at stake, other scholars suggest tools like MyAdChoices, a browser extension that detects behavioral advertising and allows fine-grained control over what information is shared with advertisers (Parra-Arnau et al., 2017).
Toch et al. summarize different approaches to preserve both privacy and online advertising, including but not limited to aggregated profiles, client-side distribution of PII and supply-side user controls (Toch et al., 2012).

The paragraphs above showed how tracking protection can actually facilitate tracking. One way or another, some users can be identified through associated data and a profile is compiled. Then, they can be subjected to targeted or personalized advertising.
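To make the stateless tracking described above concrete, the following minimal Python sketch derives an identifier solely from configuration properties a browser exposes anyway. It illustrates the idea behind fingerprinting, not the code of any real tracking script; all attribute names and values are hypothetical.

    import hashlib

    def fingerprint(attributes):
        """Derive a stable identifier from browser and device properties.

        No cookie or other client-side state is stored: the identifier is
        recomputed on every visit from properties the browser reveals anyway.
        """
        # Canonicalize: sort keys so the same configuration always hashes alike.
        canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

    # Hypothetical values of the kind a fingerprinting script can read.
    visitor = {
        "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/72.0",
        "screen": "1920x1080x24",
        "timezone": "Europe/Berlin",
        "language": "de-DE",
        "fonts": "Arial,DejaVu Sans,Liberation Serif",
    }
    print(fingerprint(visitor))  # same configuration -> same identifier on every visit

Deleting cookies does not change such an identifier; only altering the configuration does, and even that may not suffice, as Eckersley (2010) showed that browsers can be re-identified after their fingerprint has changed.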
Marketing no longer serves one large audience but can be tailored to individuals by deducing knowledge about them that they were not necessarily willing to expose (Tufekci, 2014b). Behavioral targeting, which addresses similar users of a distinct audience, significantly increases click-through rates compared to non-targeted controls (Yan et al., 2009). It also enhances persuasion and motivates purchases (Matz et al., 2017).

As early as 2010, Gauzente suggested that most Internet users are aware of sponsored ads on SERPs, with an increasing tendency, and found that a positive attitude towards them improves click-through rates (Gauzente, 2010). Users' feelings towards targeted behavioral advertising and the heavy use of their data to identify customers and audiences remain manifold, undecided and ambiguous: they oppose persistent tracking, intrusive analysis and overly personal advertising, yet expect time-relevant and interest-based advertisements (Ur et al., 2012; Ruckenstein & Granroth, 2019). Schumann et al. suggest that users may accept targeted advertising due to either the perceived utility of a website or an act of reciprocity with respect to the free service they receive. In doing so, they balance the loss of sensitive information against the benefits of the transaction (Schumann et al., 2014). Users may have different mental models of the Internet and its threats to privacy; however, more literate users do not express an increased effort to protect themselves against privacy invasion (Kang et al., 2015). This is already worrisome, given that the study was conducted in a university setting, mostly with participants in their twenties.

This underlines how citizens are generally aware of data collection and targeting but mostly resign themselves to those practices (Hargittai & Marwick, 2016). In accordance with that, Kim et al. (2019) claim that transparency about data collection practices increases user acceptance if the practices themselves are deemed acceptable.

Google disallows misconduct on its platform and enumerates prohibited practices on its Advertising Policy Help website (Google, 2019o). Forbidden practices include omitting relevant information (payment model, legal or financial details), promoting unavailable offers (products not in stock, inactive deals), misleading content, unclear relevance (ads unrelated to the search keyword) and unacceptable business practices (fraud, undue conduct of business) (Google, 2019o). In the context of this work, the prohibition of misleading content is most interesting: Google outlaws false statements about qualifications and claims that promise unrealistic results. These two rules prohibit most of the practices documented in Section 3.1.

2.7. Integrated Search Engines: Google as an Advertisement Enabler

"Our mission is to organize the world's information and make it universally accessible and useful." – Google in 2020 (Google, 2019a)

"[W]e expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers." – Sergey Brin in 1999 (Brin & Page, 1999)

Mission statements like the one above show the aspiration of ISE operators to make sense of the world wide web and put the chaos in order.
Market leaders in this field have reached monopolistic scale, with the capability to serve billions of users at once and satisfy an inexhaustible thirst for knowledge (Ratcliff, 2019; statcounter, 2019b). According to comscore (2019), Google had a market share of 98.3% in the German and 62.5% in the US search engine market in October 2019, serving 60 million unique users in Germany and processing 10,718 million search queries in the US. Other sources claim Google received at least two trillion queries in 2016 (Sullivan, 2016b). Thus, some observers appoint Google the default search engine (Editorial, 2010).

The following paragraphs deal with the functionality of ISEs and the role they play in a modern society. Users of web search engines are people who query the information system to return online content (websites, facts, media) relevant to their question. This chapter further models the online advertising ecosystem and describes the different ways advertisers connect to users in Section 2.7.2 and Section 2.7.3. The selection choices of this information selection process are the subject of an academic discussion concerned with the power of intermediary platforms to act as editors and the demand for accountability thereof (Bracha & Pasquale, 2008; Edelman, 2011; L. A. Granka, 2010; Grimmelmann, 2010; Introna, 2016). An excerpt of these works is discussed in Section 2.7.4. This helps to understand how the decisions of an intermediary like Google have significant impact on advertisers and users alike.

According to Battelle, "[⋯] a search engine connects words you enter (queries) to a database it has created of Web pages (an index) [⋯] [and] then produces a list of URLs (and summaries of content) it believes are most relevant for your query" (Battelle, 2005). Appendix C dives deeper into the mechanisms of collection, indexing, ranking and serving that a search engine (SE) provides and gives a detailed overview of SE capabilities. In this context, relevance (or being relevant) is defined as being able to satisfy the needs of the user (Merriam-Webster, 2019) or being related to an event or subject (Cambridge Dictionary, 2019). This leads to a four-step model of search composed of formulation, action (search), review and refinement, established by Shneiderman et al. in 1997 and applicable to many web search engines today (Shneiderman et al., 1997).

Broder categorizes intentions to search the web into navigational, informational and transactional approaches (Broder, 2002). Additionally, he includes query refinement, which enables users to iteratively enhance their satisfaction with results through reformulation and modification of the original query (Broder, 2002); similar to Battelle and other researchers, he assumes that users take search engines as an aid to satisfy their information need. Navigational queries describe the urge to access a specific site.
Informational search includes directed and undirected questions, advice-seeking and requests pertaining to listing and locating on- and offline entities (Rose & Levinson, 2004). Transactional search is concerned with interaction and the intent to "perform some web-mediated activity" (Broder, 2002, p. 5). According to a study from 2007 reported in (Jansen et al., 2008), over 80% of queries are informational. In (Rose & Levinson, 2004), Rose and Levinson suggest that navigational queries represent a minority of web searches. Furthermore, they introduced the resource category to replace the transactional one; it contains all intentions to find non-informational content online (downloads, recipes, entertainment, aids to offline tasks such as purchases). This is reflected in Ashkan et al.'s introduction of horizontal categories distinguishing commercial from non-commercial query interests (Ashkan et al., 2009), after scholars learned that frequent queries often originate from the intention to purchase something (Dai et al., 2006). Later research suggests that search intentions and strategies vary significantly between demographic groups and regional affiliations (Weber & Jaimes, 2011) or by gender and task (Lorigo et al., 2006). This shows how users mainly engage with search engines when they perceive an information need or require resources to base their decisions on. On top of that, if they consistently search a topic, they are likely pondering a purchase decision, which can be interpreted as a willingness to spend money.

Google continuously advances its search engine capabilities through a plethora of updates, features and patents, all in order to improve its algorithms and thus user satisfaction (Slawski, 2019). According to market observers, Google rolls out updates multiple times a day to enhance its service and adapt to changes in search behavior (Dodd, 2017; Moz Resources, 2019). Moz, a Search Engine Optimization provider, writes without indicating a source: "Each year, Google makes hundreds of changes to search. In 2018, they reported an incredible 3,234 updates — an average of almost 9 per day, and more than 8 times the number of updates in 2009." (Moz Resources, 2019). Observers note that updates are usually dedicated to optimizing the search engine for user-oriented quality content, fending off malicious attempts at SEO, understanding a searcher's context and intentions, and expanding the variety of queries that can be processed (Vinoth, 2017); the origin of this information is unclear, as it is available verbatim on various sites.

Throughout this evolution, a paradigm shift has been and still is observable. The search engine matured from working with bare keyword association to processing conversational queries (Slawski, 2018; Sullivan, 2013). Semantic analysis and context now play an important role (Broder, 2002; Halevy et al., 2018; A. M. Pasca & van Durme, 2014). Furthermore, an intricate knowledge repository, fueled by ontologies (Menzel, 2010; Semturs et al., 2015) and enriched by the users themselves, is employed to make sense of initially incomprehensible queries. Additionally, personalization of search results based on a user's background, search history and interaction with results seems to play an important role (Balog & Kenter, 2019; Brukman et al., 2013; Lawrence, 2010; Zamir et al., 2010), up to the point where some people express their fear of a closed-in effect, figuratively named "The Filter Bubble" (Pariser, 2011).

To fund their operations, search engines often display promotional results along with their organic search results. They are styled similarly but marked as advertisements. Here, "organic" denotes unpaid results on the search engine result page that are listed due to their relevance to the search query (Google, 2019t).
Figure 2.5: Examples of ads on Google: above, an organic search result; below, a promotional one, denoted by the green marker at the top left
Advertising, in its basic sense, refers to "drawing attention to something" (Dyer, 1982, p. 2). This does not necessarily mean a product but can also address an idea, value, belief or opinion, for example the claim of a therapy's superior efficacy (Dyer, 1982).

The field of online advertising makes use of "Information Retrieval, Machine Learning, Data Mining and Analytic, Statistics, Economics, and even Psychology to predict and understand user behavior" (Yuan et al., 2012, p. 1). It quickly matured from merely displaying static promotional web banners in the mid-1990s to integrated networks that automatically deliver personalized multi-media advertisements today (Rashtchy et al., 2007). The advantages of online advertising over traditional formats are clear: pricing (cost control through different pricing models (Yuan et al., 2012)), optimization (variety of media, real-time display and measurability), reach (virtually unlimited advertising space, no geographical borders, access to arbitrary demographics) and precisely targeted ads (targeting customers based on arbitrary attributes). Search marketing also allows advertisers to "brand" search terms: "Search enables advertisers to associate a brand with a term, even a term that is traditionally associated with other companies or industries" (Rashtchy et al., 2007, p. 184). This can enable advertisers in the health sector to establish legitimacy through the association of their brand with popular search terms (e.g., the name of a clinic appears in the top results after searching for stem cell treatments).

The advancement of the Internet as a medium for communication, e-commerce and information allows ISEs to seize a strategic role in connecting advertisers with customers. They guide searchers to their goals and, along the way, place promotions preferably associated with the information need, search intent, product or service that is being searched. Similar to the results, the selected advertisements are meant to be as relevant to the specific user as possible; users who express an information need are more likely to engage with advertising relevant to their cause (Yuan et al., 2012).

Online ads usually consist of a title, a creative (text or media), a URL and a landing page (Yuan et al., 2012), whereby the latter two do not necessarily have to match exactly. Scholars and professionals alike speak of push and pull, or search and display, advertising (Rashtchy et al., 2007). The former targets searchers and ushers them to a specific webpage that addresses their information need; the latter is displayed alongside the web experience and may interrupt browsing and thus annoy a user (Rashtchy et al., 2007).

According to Mayer, six business models compose the online advertising landscape: advertising companies, hosting platforms, frontend services, analytics services, social networks and content providers cooperate in arbitrary combinations to deliver promotional messages to Internet users (Mayer & Mitchell, 2012). Hosting services provide utility to easily set up websites, while frontend services publish content or support extended functionality via JavaScript libraries and APIs; content providers and social networks publish content, media or widgets to increase user engagement and collect usage data through tracking, which is usually monetized in targeted advertising. With respect to web search, Yuan et al. boil this down to a four-party model that simplifies the workings and reduces the system to its most relevant functional entities: ad exchanges, advertisers, publishers and users (Figure 2.6). The descriptions below are drawn from (Yuan et al., 2012).

Publishers offer advertisement space (the inventory slots) on the service they operate to gain revenue.
A search engine may opt to show promotional results in a designated space on its SERP. Thus, Yuan et al. argue that, conceptually, ISEs qualify as publishers in this model.
Ad exchanges handle the negotiation of ad delivery and the auctioning of inventory slots. As brokers balancing supply and demand, they compute the matching based on keywords and query terms, website content and user data, respectively (Google, 2019i). Since these networking agents act as intermediaries, systematic targeting of ads is possible, based either on website specifics (target group, topic, location) or on user characteristics (Mayer & Mitchell, 2012). Yuan et al. distinguish between supply-side networks, demand-side networks, combinations of both, and data exchanges, though they note that the lines between them blur as the all-in-one approach gains popularity. Nonetheless, data exchanges play a distinct role in delivering user data for behavioral targeting (Yuan et al., 2012). The more an ad exchange can make sense of the relations between keywords in terms of similarity and relevance, and the more it learns about users' search context, the more valuable the service it can provide. Google is a strong player in this field, with a 70% market share in the online advertising business (Graham, 2019).

Advertisers are eager to promote their service or product. They bid on inventory slots through the ad exchange. The efficacy of their ads strongly varies with position, context and the number of other ads on the website. Hence, the price varies with these metrics and with the fit of bid phrase and query term or the popularity of the keyword. Advertisers choose which promotional content to deliver, set up campaign goals, select a billing method and review their ads' performance.
Users access websites to satisfy their information need. The results they receive from search engines are individually tailored and based on relevance. Which ad they receive, however, depends on multiple factors: the quality of the match between advertisements and query keywords, bid prices and the expected revenue ratios computed by the ad exchange all influence the choice (Yuan et al., 2012). Advertisement delivery can also be steered via signals emitted by a search user (Shah, 2019).
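As a toy illustration of the keyword matching an ad exchange performs between query terms and advertisers' bid phrases, the sketch below scores candidates by simple term overlap. The advertiser names and bid phrases are fabricated, and production systems additionally factor in bids, ad quality and user data, as described above.

    def keyword_match(query, bid_phrases):
        """Toy relevance score: best Jaccard overlap between the query's
        terms and any phrase the advertiser bid on."""
        query_terms = set(query.lower().split())
        best = 0.0
        for phrase in bid_phrases:
            phrase_terms = set(phrase.lower().split())
            if query_terms | phrase_terms:
                overlap = len(query_terms & phrase_terms) / len(query_terms | phrase_terms)
                best = max(best, overlap)
        return best

    # Hypothetical advertisers and the phrases they bid on.
    ads = {
        "clinic-a": ["stem cell treatment", "stem cell therapy parkinson"],
        "travel-b": ["cheap flights", "medical travel"],
    }
    query = "stem cell treatment for parkinson"
    ranked = sorted(ads, key=lambda name: keyword_match(query, ads[name]), reverse=True)
    print(ranked)  # -> ['clinic-a', 'travel-b']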
Figure 2.6: Online Advertisement Ecosystem, by Yuan et al. (Yuan et al., 2012)
Figure 2.6 by Yuan et al. shows how the four participants are related. There is a flow of cash between the commercial players in exchange for inventory slots. Users are generally compensated with value with respect to their information need, as they receive online services which usually are free and funded by ad revenue (compared to traditional paper advertising, where magazines must be purchased). They in turn approach the promoted services or products with a commercial interest or even a purchase intention. The interactions between users, search engine and ad network providers, and advertisers constitute the communication in the social system of the STS of web search. The different perspectives on Google's role therein are discussed in the next part.

Figure 2.7: Ad paths, by Muthukrishnan (Muthukrishnan, 2009a, p. 2), extended by the author (addition of the ad exchange as a new intermediary)

There are several different methods to place advertisements on a website (see Figure 2.7). In the traditional Direct Buy pricing model, advertisers buy a distinct slot on the first-party website of a publisher to place their promotion (Mayer & Mitchell, 2012). Usually, these ads categorize as Branding Ads, with long-term contracts for distinct slots and no targeting differentiation (Yuan et al., 2012). Yuan et al. describe other business models in web advertising. For example, publisher networks and ad agencies or advertiser networks operate supply- or demand-side platforms. They act as intermediaries and facilitate their members' or customers' advertising business by organizing the entirety of their customers' inventory slots or ads (Muthukrishnan, 2009a; Yuan et al., 2012). From this duality, ad exchanges like Google AdSense emerged. They manage different kinds of ads, including sponsored search ads and contextual ads. In the first case, ads are matched to users based on keywords (e.g., query terms or content on the relevant websites) and user PII, and are displayed among the search results. The latter describes ads that are targeted based on context (e.g., domain or user intent) and PII, with flexible localization on a publisher's website. These types of ads can further be differentiated by delivery method, trading place, competition method, pricing model and automation (Yuan et al., 2012).

The principal goal of the Internet-based advertising system is to find, through computation, "the best match" in terms of both relevance and revenue between a specific user in a given context and the set of available ads. A summary of the differentiations found in (Yuan et al., 2012):
• Delivery method: forward contract or on the spot
• Trading place: over the counter or on a transparent market (auction)
• Competition method: for example, the type of auction used (such as the generalized second-price auction described below)
• Pricing model: flat-rated (per time) or cost-based (per click, mille, action or conversion)
• Automation: manual (mostly in negotiation and campaign planning) or automated (real-time bidding)

Muthukrishnan describes the business model of ad exchanges like Google AdSense as follows (Muthukrishnan, 2009a, p. 2). A user u visits a website w that allots space to ads. The publisher p(w) requests an ad from the ad exchange E and also specifies a minimum price p for the inventory slot. In this model, it is assumed that p(w) knows u's characteristics and shares this information with E. The ad exchange provider furthermore knows the ad configuration on the target page, i.e., the localization, dimensions and media type of an inventory slot and other conditions determined by the publisher, guided by presumptions about ad efficacy and user engagement drawn from empirical data (Muthukrishnan, 2009b). Additionally, it can crawl content on w to make inferences. Then, E requests ads from ad networks a_1, ..., a_m. It may disclose some information E(u) and E(w) about u and w, along with the minimum price, to each of them. This could include PII of u or topics of w. An ad network may return a bid b_i > p and an ad d_i of one of its customers to display in the slot. In a competition method determined by the exchange, the inventory slot is sold to the winner, who can now serve its ad on the publisher's website to the user if it fits the configuration. This is called an impression. The winners are notified of their success (and possibly the losers, too). All of this happens in a matter of milliseconds (Muthukrishnan, 2009a).

Google extends the above model with AdRank, a measure that influences the position an advertisement can attain. It is influenced by the respective bid, ad content and landing page quality, competing ads, search context, relevance and performance. In an auction scenario, the AdRank determines an ad's success in the auction and its position on the SERP (Google, 2019b). The dynamically computed AdRank threshold is a score set by Google to determine the minimum price of a specific inventory slot and the rejection level for ads competing for the slot (Google, 2019e).
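A simplified sketch of this auction can make the mechanics tangible. It models AdRank as bid times a quality score and implements the standard weighted generalized second-price rule, in which each winner pays just enough to keep its rank. This weighting is an assumption consistent with the public description above, not Google's actual formula, and all advertiser names and numbers are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class Bid:
        advertiser: str
        amount: float   # bid per click
        quality: float  # stand-in for ad and landing-page quality

    def gsp_auction(bids, slots, reserve_score=0.0):
        """Quality-weighted generalized second-price auction (sketch).

        The rank score plays the role of AdRank here: bid * quality.
        Each winner pays the smallest bid that still beats the next-ranked
        score; bids below the reserve (the 'AdRank threshold') are rejected.
        """
        ranked = [b for b in bids if b.amount * b.quality >= reserve_score]
        ranked.sort(key=lambda b: b.amount * b.quality, reverse=True)
        results = []
        for i, winner in enumerate(ranked[:slots]):
            next_score = (ranked[i + 1].amount * ranked[i + 1].quality
                          if i + 1 < len(ranked) else reserve_score)
            price = next_score / winner.quality  # cost per click for this slot
            results.append((winner.advertiser, round(price, 2)))
        return results

    bids = [Bid("clinic-a", 4.00, 0.4), Bid("pharma-b", 2.50, 0.9), Bid("travel-c", 1.00, 0.5)]
    print(gsp_auction(bids, slots=2, reserve_score=0.5))
    # -> [('pharma-b', 1.78), ('clinic-a', 1.25)]

In this toy run, the advertiser with the lower bid but higher quality wins the top slot and pays less per click, illustrating why a measure like AdRank rewards ad and landing-page quality rather than the raw bid alone.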
Google serves both sides of the market.
The platform AdSense enables publishers to sell inventory slots on their respective sites via Google's ad exchange. Google Ads (https://ads.google.com/), on the other side, allows advertisers to bid on advertising space on websites and on the search engine to display their creatives, i.e., the visual appearance of ads, including but not limited to text, images, media and the respective styling (Google, 2019t). Furthermore, Google hosts a tracking and analysis service that enables customers to gather information about website visitors. To use Google Analytics, they only have to include a JavaScript snippet or "Google Tag". Then they have access to a rich set of analysis tools and the opportunity to link insights and statistics to their respective advertising campaigns in Google Ads. Both Search Engine Advertising (SEA; listing ads as promotional results on the SERPs of Google's search engine or its partners') and display ads (delivered over the AdSense program to a network of publishers, the Google Display Network) can be purchased (Google, 2019f, 2019i). Google Ads offers both sponsored search and contextual ads in a generalized second-price auction (GSP; see (Edelman et al., 2007) for an elaboration) (Google, 2019i). It allows automating ad delivery based on specified goals (e.g., clicks on the ad, conversions (some intended user action like a purchase or phone call) or impressions (mere display)) and automated bidding on advertisement slots (Google, 2019c, 2019g). When serving an ad via Google Ads, advertisers have multiple ways of targeting users. They can pick a specific audience (by demographics, affinity, purchase interests, specific behavior, similarity with another audience or by reconnaissance (remarketing)). Besides, they can address searchers by the topics and content of the sites they search for or the keywords they type in. On top of that, users in a defined situation can be approached, for example at a distinct life event (marriage) or in a distinct situation (time, place, mobile) (Google, 2020a, 2020c). Herein, ads can be published automatically in an arbitrary fashion or deliberately on specific sites, apps or media (Google, 2019s). Ultimately, Google allows advertisers to address individual users via "Customer Match", provided it is compliant with privacy policies (Google, 2019d).

Nevertheless, Google prohibits advertisers from implying knowledge of PII in their ads and from marketing to a very narrow audience only. In fact, it also specifically prohibits promotion in sensitive categories such as clinical trials, personal hardships and health (Google, 2019p). Furthermore, there are numerous institutions that are meant to guide advertisers towards an ethical conduct of business.

The auction process depicted above shows how an ad exchange acts as an intermediary between advertisers and users. Furthermore, based on the insights from Section 2.6, we can conclude that Google qualifies as both an advertising and a data exchange. With its tracking services and analysis capabilities, it leverages data collected about users and their online interactions to enable behavioral targeting. With this technique, advertisers are able to directly address specific users whom they assume to be in their target group. Unfortunately, this may include sensitive categories such as medical conditions. Even though such categories cannot be targeted directly and their use is prohibited, they can still be reached through a sophisticated combination and computation of user attributes.
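The following toy sketch illustrates that last point: even when no sensitive category is selectable, a combination of individually innocuous interest attributes can act as a proxy for one. All profiles and interest labels below are fabricated for illustration and do not correspond to any real targeting taxonomy.

    # Hypothetical user profiles holding only "permitted" interest attributes.
    users = [
        {"id": 1, "interests": {"mobility aids", "clinical trials news", "tremor forums"}},
        {"id": 2, "interests": {"hiking", "cooking"}},
        {"id": 3, "interests": {"tremor forums", "caregiver support"}},
    ]

    # A proxy audience: no health condition is named, yet the combination of
    # innocuous interests correlates strongly with one.
    proxy_interests = {"tremor forums", "caregiver support",
                       "mobility aids", "clinical trials news"}

    def build_audience(users, proxy, min_overlap=2):
        """Select users whose permitted attributes overlap the proxy set."""
        return [u["id"] for u in users if len(u["interests"] & proxy) >= min_overlap]

    print(build_audience(users, proxy_interests))  # -> [1, 3]

The selected audience effectively consists of users likely affected by a medical condition, although no prohibited attribute was ever queried; this is precisely the loophole the preceding paragraph describes.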
With their decisions on how to collect, index, rank and present results and advertisements, integrated search engines exercise great power. People turn towards them in search of all sorts of information and confidently trust a search engine to objectively rank the results of a query by true relevance (Pan et al., 2007). Search engines shape searchers' perceptions of the web and intervene in their behavior online. This can have significant social and commercial implications, as it assigns visibility and directs attention (Goldman, 2006). Grimmelmann enumerates three different scholarly views of Google's role in society (Grimmelmann, 2013, 2014), contrasting the role of a conduit (Chandler, 2007) with those of an editor (Goldman, 2006; Volokh & Falk, 2012) and an advisor (Grimmelmann, 2014). These roles are described and discussed below.

As conduits, ISEs appear as gatekeepers or bottlenecks that mediate between content providers, advertisers and consumers. Thus, a conduit can exercise power by blocking websites or neglecting certain advertising customers (Grimmelmann, 2010). They could refuse to index content, manipulate auctions or introduce bias. In (Chandler, 2007), Chandler juxtaposes speakers and listeners to stress how intermediaries can shape the communication between those parties. Comparing this communication to a form of verbal exchange relates it to the question of free speech as a foundation of fair use. She raises the question of how free speech can be guaranteed if gatekeepers like ISEs have the opportunity to deliberately interfere with the interactions conducted on their platforms and automate their business with undisclosed algorithms. Chandler links this to net neutrality, a principle by which selection intermediaries such as search engines and ad exchanges should not discriminate against content or exercise bias (Chandler, 2007). This idealistic approach is meant to maintain free speech online and is based on the idea of a functional similarity between search engines on one hand and Internet service providers (ISPs) and network providers on the other: they all act as bottlenecks in data transmission, the argument goes, and thus need to be treated accordingly. It is vigorously contested by Grimmelmann, who argues that fulfilling all the principles he derived from the idea of net neutrality is simply unrealistic and renders search useless (Grimmelmann, 2010, p. 436f). He enumerates eight principles that characterize neutral search as a radical conduit would perform it (clarifications added in parentheses): equality (no differentiation among websites), objectivity (distinguishing between correct and incorrect results), (no) bias, (sufficient) traffic, relevance (maximize user satisfaction), (no) self-interest, transparency (full disclosure of algorithms) and (no) manipulation (Grimmelmann, 2010). Nonetheless, he adds that giving search engine operators free rein is not an option either. Granka agrees and elaborates in (L. A. Granka, 2010) how following these principles would hamper the quality of search results and competition, through malicious manipulation and less market differentiation. In addition, she notes that the most widespread components of search engine algorithms are already widely known and well researched. Pasquale summarizes that net neutrality should be imposed on search engines only with regard to transparency concerning business relations, promotional content and paid results (Pasquale, 2008).

The editor describes another perspective on intermediaries. Selection lies in the very nature of an ISE, and all of its practices constitute a form of editorial judgment.
Goldman points out how search engine providers decide what data to index, how to rank it and which part of it to eventually present. Even though most of these operations are performed automatically under a seemingly objective computational rationale, the inner workings of these procedures, their weights and factors, parameters and inputs are clearly defined. These decisions amount to an editorial act, along with the manual adjustments that are made in response to certain issues, he argues (Goldman, 2006). Herein, the latter may reflect a company's values and willingness to self-regulate, though the actual criteria the algorithms ought to comply with usually remain unknown (Diakopoulos, 2013b). In a TechCrunch article, an interviewee points out how "[t]here are things Google has deemed relevant to the public interest that they're willing to kind of intervene and guard against, but there really is not a great understanding of how they're assessing that" (Dickey, 2017, Robyn Kaplan). The misuse of editorial power, though, can mislead users (Grimmelmann, 2010). Nonetheless, Grimmelmann demands that platforms take responsibility and moderate content, even manually, in order to cope with the "disturbing demand-driven dynamics" (Grimmelmann, 2018, p. 1) that scourge Internet platforms. He deems these measures necessary, as algorithms cannot be conscious or self-aware of the entirety of consequences that their actions entail.

Grimmelmann notes how there is space left for another form of intermediary between the objective conduit and the subjective editor. While a conduit's job is to "deliver to each website the user traffic to which it is properly entitled" (Grimmelmann, 2014, p. 873), the editor only cares to satisfy the audience and keep it from switching to competitors. In (Grimmelmann, 2014), users are introduced as the subject of interest, actively educating themselves on a certain topic. This underlines how the two approaches above are combined. Instead of being a passive audience, users formulate their goals and expect a specific mix of websites that cater to their needs. According to Grimmelmann, the advisory search engine answers a user's query in a personalized way that is "uniquely relevant to the user's unique interests" (Grimmelmann, 2014, p. 874).

The choices Google makes pertaining to ranking are relevant because research suggests that results higher up on the search result page receive more attention and generate higher click-through rates (L. Granka et al., 2004; Lorigo et al., 2006); both studies were small-scale eye-tracking experiments with only 26 and 23 validly reporting participants, respectively, but are widely cited and accepted. Scholars assume that this observation can be attributed to two different factors. Firstly, search engines by design try to return the most relevant results at the top of the list. This is perceived as an indication of quality, which they call trust bias: users trust the algorithm to deliver the truly significant result first. Secondly, the relevance of an advertisement is assessed in comparison to other results on the page, leading to a "quality-of-context bias" (Joachims et al., 2007). This has ramifications for ad delivery as well. If businesses in the stem cell tourism industry manage to get listed among approved clinics, governmental agencies and medical authorities in the health sector, they benefit from the quality-of-context bias.
A slot on the SERP among those entities could be interpreted as a token of legitimacy (see Section 3.1). Such businesses can also leverage the trust bias, as ISEs seemingly convey objective importance. On top of that, keyword-based advertising campaigns might claim an association with a topic like emergent stem cell treatments or a form of therapy. They might try to "brand" a specific search term with their name and solidify their popularity among searchers in this field. All of the above is equally relevant for advertising displayed on the SERP: ads are located at the top and bottom of the result page and are thus perceived as significant results deliberately chosen by an intermediary. No privately operated ISE can grant full disclosure of its workings. Nevertheless, its operators have to be aware of the ramifications that ensue from their editorial choices.

2.8. Application to Web Search

From a constructionist perspective, one can model the socio-technical system of (sponsored) web search and affiliated online advertising using the elaborations in Section 2.5. Below, this model is constructed from the insights above. The social system is represented by the fraction of society that is concerned with web search and online advertising; herein, this subsystem is denoted the "Web Search Society" (WSS). In this analysis, the WSS consists of communications between four kinds of participants. It is influenced by (1) consumers or users who search the web, (2) the companies developing ISEs (platforms that combine search engine and ad exchange) and running ad exchanges and search engines, and (3) advertisers promoting their products, services and ideas. These influencers embody the WSS's environment and can stimulate the communication within the social system. Content providers or publishers (website hosts) and the governing institutions that regulate the WSS could be included as well; however, this thesis concentrates on the interactions of the first three and covers the latter only to a small extent. (This thesis refers to users when it considers humans involved in a human-computer interaction, here: searching the web via a search engine. They conduct searches, review results and act based on the information they retrieve. In contrast, citizens are concerned with their role and relationships within a society; their perspective includes policy and governance issues and how the socio-technical system can be shaped.)

Through the open-minded approach to information in Section 2.1, it is possible to identify communication processes induced by those agents. Below, the four-fold approach is reviewed with respect to web search.

• Representation of knowledge: website content, Knowledge Graph, algorithms
• Data in an environment: user data, implicit user feedback, WWW structural data, semantic ontologies
• Part of a process of communication: query semantics, advertisements, editorial selection, online behavior
• Resource or commodity: ads, websites, user data, attention

Along these assignments, the communication processes in the WSS can be sketched. Figure 2.8 shows them schematically, connecting the agents in the environment of the WSS through their mutual communication. The direction indicates sender and receiver; the arrows are labeled according to the information the respective communication carries.
Figure 2.8: Communication processes in the Web Search Society (illustration by the author)
The technical system (TS) manifests in an integrated search engine (ISE), which supports the above communication processes. The ISE comprises algorithms that enable web search, collect and analyze data and organize online advertising. These algorithms constitute the entities or components of the technical system. The communication processes it supports are evaluated and fed back into the system to re-calibrate its workings. Herein, users seek to satisfy an information need and inquire about a subject: they want to find an informational resource on the WWW. They do so by querying the search engine provider via the search engine's web interface. The company running the search engine executes algorithms to find relevant search results. First, it crawls the web and collects websites by publishers. Then, it indexes the collection. Eventually, it displays a ranked list of findings on the search engine result page to answer the user. Through the selective presentation of publishers' content along with ads to users in a comprehensible way, Google Web Search supports the communication between publishers (producers), advertisers and users (consumers). This allows users to satisfy their information need, advertisers to target consumers and publishers to reach their audience. Concurrently, it facilitates negotiations and auctions over inventory slots on Google Ads' ad exchange. This also enables advertisers to place their promotional messages on the SERPs or on third-party sites, where they are eventually displayed to users through Google AdSense. Furthermore, the ISE collects and merges data from different sources: with Google Analytics, it computationally draws conclusions about the outer context of the web search ecosystem and the inner context of users. This influences the capability of the communication partners to bridge the digital gap and base their interaction on more or less mutual context.

The technical system strongly influences the WSS. Through its editorial choices, it determines what people perceive as relevant and shapes users' ways of formulating their questions. It furthermore dictates the code of conduct with respect to the form and function of ads and websites. Through its dominant position in the market as a major search engine that accumulates a plethora of data, its actions have significant repercussions on the online experience of society. It also impacts individuals' sense of privacy, since the dissemination of user data and targeted advertising are part of the technical mechanisms and social communications alike (Hargittai & Marwick, 2016).

As shown above, the WSS includes the technical system in various aspects of its communication, as required by Kunau (Kunau, 2006). Additionally, the WSS incorporates the mechanics of the ISE in its self-description. These include, but are not limited to, characteristics like instant answers, targeted advertising and real-time bidding on ads, as seen in Figure 2.9. Hence, without the traditional ISEs there would be no online search as we know it. The emergence of the verb "to google" (Duden, 2020; Merriam Webster, 2020) reflects this, as do the rise of an online advertising industry over the last decades (Evans, 2009) and the comprehensive research in the fields of search engine technology and online behavior (see Section 2.7 and Appendix C).
Figure 2.9: A model of the socio-technical system of web search and advertising (own illustration, adapted from Kienle & Kunau, 2014)
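The crawl-index-rank pipeline named in the description above can be illustrated with a minimal inverted index. The pages and texts below are fabricated, and real ISEs layer link analysis, semantic matching and personalization on top of such a core (see Appendix C).

    from collections import defaultdict

    # A toy corpus standing in for crawled publisher pages.
    pages = {
        "example.org/stem-cells": "stem cell research overview and clinical evidence",
        "example.org/parkinson": "parkinson disease symptoms treatment and therapy",
        "example.org/recipes": "quick pasta recipes for busy evenings",
    }

    # Indexing: map each term to the set of pages containing it.
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)

    def search(query):
        """Rank pages by how many query terms they contain (toy relevance)."""
        hits = defaultdict(int)
        for term in query.lower().split():
            for url in index.get(term, ()):
                hits[url] += 1
        return sorted(hits, key=hits.get, reverse=True)

    print(search("stem cell treatment"))
    # -> ['example.org/stem-cells', 'example.org/parkinson']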
In this work, the focus lies on the communication between the ISE and the user, as it is the only immediately observable interaction. However, the analysis below tries to make inferences about the inner context of the intermediary and the motives of advertisers (the selection of ads). ((Hargittai & Marwick, 2016) dates back to 2014 but remains relevant, with respect to the widespread use of social networks among young users and networked devices on one side and the evolution of tracking techniques on the other.)

3. Related Work

In academia, accusations of discriminatory and biased algorithms are not uncommon (Sweeney, 2013). Consequently, there have been numerous attempts by scholars of various disciplines to reverse engineer or scrutinize privately operated information systems that have social impact. For example, researchers investigated:

• web search (Hannak et al., 2013; Willis & Tatar, 2012) and advertising (Guha et al., 2010; Speicher et al., 2018; Sweeney, 2013),
• e-commerce (Hannak et al., 2014; Mikians et al., 2012; Valentino-DeVries et al., 2012) and reviews (Mukherjee et al., 2013),
• text completion (Diakopoulos, 2013b), correction (Keller, 2013) and detection (Sap et al., 2019),
• perception of online environments (Hannak et al., 2014; Krafft et al., 2017; Larson & Shaw, 2012),
• finance (Lazer et al., 2014; Pulliam & Barry, 2012).

This chapter discusses literature related to my analysis of the direct-to-customer marketing of stem cell-related services on integrated search engines. First, Section 3.1 explores the realm of stem cell tourism and its implications for patients and caretakers. This helps to understand how our investigations can contribute to the protection of vulnerable user groups like patients. Then, Section 3.2 presents different approaches to the regulation of Internet-based services, which is meant to emphasize the role that society plays in technology assessment. Next, the concepts of algorithmic accountability with respect to transparency and responsibility are explored in Section 3.3. This is important, as it enables us to designate the moral agency of SRAs. Lastly, Black Box analyses are described in Section 3.4; in this thesis, they are the method of choice to scrutinize opaque SRAs.
3.1. Digitalized Health in the Realm of Stem-Cell-Tourism

The health sector offers a growing number of opportunities to deploy digital technologies. Health-related online research (specialized vertical search engines or websites for both novices and experts) is widely accessible to Internet users. Social networking platforms host research communities, patient discussion forums, crowdfunding campaigns and lobby groups that strive for medical progress in one way or another. Digitalization includes wearable devices that allow individuals to monitor or track the functions of their body (Lupton, 2012); Lupton mainly discusses the implications of constant surveillance through mobile health trackers on the individual subject and society. It also encompasses gadgets and technical devices that record and analyze usage (e.g., "smart" toothbrushes). Digital technologies allow stakeholders to actively engage in open discussion, lobby for progress, be involved in patient groups and steer public opinion as well as raise awareness or political attention (Petersen, Tanner, et al., 2019). The improved access to health-related resources and to social networks of people affected by a condition is especially important to individuals who are in any way "incapacitated, immobile and socially isolated through illness or disability" (Petersen, Tanner, et al., 2019, p. 3). It allows patients to engage actively in periods of near-hopelessness, when survival itself may be at stake (Novas, 2006). This active stance reflects patients' desire to take control and achieve subjectively significant improvements through SCTs, though most of them do not expect miraculous recoveries but merely slight improvements of their conditions (Petersen et al., 2013).

These technologies eventually provide a commodity to an increasingly large market that trades personal data and draws far-reaching conclusions from it (Petersen, Tanner, et al., 2019; A. Tanner, 2018). Unfortunately, this also includes an emerging black market, with medical data being a casualty of data breaches (V. Liu et al., 2015; Tindera, 2018). Research suggests that health care providers were the most frequently breached institutions in the US from 2010 until 2017, accounting for 70% of incidents, and that the frequency of incidents increased almost every year in this period (McCoy & Perlis, 2018). Recent regulations have pushed for commercial access to health records through a questionable "empowerment" of patients, which will further the dissemination of health-related data. This reinforces the imbalance between institutions with commercial interests and individuals concerned with their health (Ebeling, 2019). Concurrently, major Internet-based corporations tap into the market of health-related products to expand their portfolios. We see platforms like Amazon and Google acquire businesses that grant access to millions of people's health data (Farr, 2019; Scott, 2019) or provide web services to health care institutions (Royal Free London, 2017). Observers predict that operations like these will likely reshape the health care landscape (C. Tanner et al., 2019). Ultimately, online marketing is an important aspect of the digitalization of the health sector. It allows offerors of health care services to directly identify potential consumers and approach them in a personalized fashion, using data amalgamated from different sources.

The development of health-related online activities is fueled by a new form of patient activism (Petersen, Schermuly, et al., 2019) and the right to try, which had already been passed as a law in 36 US states by 2017.
It gave patients whosuffered from “intractable or incurable conditions the opportunity to samplealmost any last-gasp therapy without interference from government regula-tors” (Hiltzik, 2017). It is assumed that patients turn towards the Internetin search of information and counseling about SCT due to the Internet-based Lupton mainly discusses the implications of constant surveillance through mobile healthtrackers on the individual subject and society. .1. Digitalized Health in the Realm of Stem-Cell-Tourism nature of the stem cell tourism industry (Master et al., 2014). Patients maynot be aware of the risks involved in the advertised treatments and ignorantof the information they need to gather (Connolly et al., 2014).Unfortunately, the Internet is the place, where the “politics of evidence”enfold, as Tanner puts it (C. Tanner et al., 2019). He means, that it is hard toobtain reliable information and find credible advice among all the hype storiesand anecdotal evidence (in crowdfunding abstracts, patient blogs e.g. ). Arecent study found that there is a need for comprehensive information and ac-tive campaigning of medical authorities and professional organizations to meetthe expectations that patients have when they conduct research on stem celltreatments (Zarzeczny et al., 2019). Some institutions issued advice on thistopic to guide patients seeking to try experimental treatments (Eurostemcell,2020a; International Society of Stem Cell Research, 2019) Observers see a risein crowdfunding campaigns concerned with unproven stem cell treatments (Pe-tersen, Tanner, et al., 2019; C. Tanner et al., 2019) due to insurers refusal tocover expenses for experimental treatments (Snyder & Turner, 2019). Thisway, they can circumvent scrutiny by professional medical institutions.Stem cell clinics and affiliated businesses also list their treatments on popularplatforms that register clinical trials to promote their unapproved therapies.Researchers found out that most of these studies are lacking scientific, ethi-cal or regulatory review, charge patients for participation and are conductedwith an unjustifiable risk (Turner, 2017). These phenomena amplify the nar-rative of stem cells treatments being a novel and universal cure and falselygrant them a scientific character. On top of that, it enables marketers andproviders of unproven treatments to advertise directly to consumers, circum-venting regulation, expert review and professional oversight(Petersen, Tanner,et al., 2019).For patients with severe conditions this poses a threat as they may be luredtowards unproven treatments in the best case and fake medicine or dubiouspractices in the worst case. All of which come with possibly disastrous con-sequences such as physical harm, psychological distress and financial loss forthe patients themselves or their caretakers (Amariglio et al., 2009; Lysaghtet al., 2017; Nagy & Quaggin, 2010; L. O’Donnell et al., 2016). Furthermore,this leaves responsibility in the hands of a layman as patients must judge thevalidity of cutting-edge technology and emerging medical therapies with theirlimited understanding of the subject (Petersen, Tanner, et al., 2019).Providers of questionable SCT argue that freedom of choice and patientautonomy can be achieved through direct-to-customer marketing. 
However,they disregard the idea of informed-consent if they assume patients to makedecisions based on unreliable and implausible claims or tokens (Turner, 2018).Online communication of SCTs mostly lacks medical information and truthfuldisclosure about a treatments details and efficacy (Connolly et al., 2014).Clinics and agencies concerned with either travel, advertisement, marketing,health or all of the aforesaid capitalize on tokens of legitimacy (Sipp et al., whose creation is encouraged by clinics themselves according to (Ryan et al., 2010) hapter 3: Related Work .The direct-to-customer marketing of SCT is seen as problematic as it lever-ages a narrative of hope, rides the hype of regenerative medicine and is mostlybased on anecdotal success stories (Enserink, 2006). Medical travel meanwhilearrives at a new scale since it became a competitive online-based market withwilling customers that seek health services abroad. Some of these servicesare experimental procedures in less regulated environments as well as treat-ments exclusive to only a group of patients (Hiltzik, 2017; Whittaker et al.,2010). This industry has already flourished in the last years, with marketingand clinic networks spanning around the globe featuring hundreds of clinicsworldwide (Munsie et al., 2017). The global market for stem cell therapies(SCTs) is expected to grow by almost 28% in the next ten years making ita multi-billion dollar business (BIS Research, 2019). It comprises institutionsfrom travel, advertisement, marketing, health and government (Turner, 2007)This “stem cell tourism” is described as an online, direct-to-consumer adver-tised Internet-based industry where patients and carers cross geographical orjurisdictional boundaries to receive stem cell treatments for which there existslittle to no clinical evidence of safety or benefit (Master et al., 2014; Petersen etal., 2017). While the mainstream research community assumed the providersof SCTs to operate from Asia, Mexico and the Caribbean, there is evidencethat the market is increasingly served by US firms and other middle menalike who strongly advertise their services online (Turner & Knoepfler, 2016) . (Mackey et al., 2015) shows that marketing expenditure for Internet direct-to-customermarketing doubled from 2005 to 2009 Turner identified 351 businesses engaged in direct-to-customer advertising and 570 clinicsoffering stem cell interventions in (Turner & Knoepfler, 2016) and 432 businesses and716 clinics in (Turner, 2018), respectively.) .1. Digitalized Health in the Realm of Stem-Cell-Tourism Many of those US companies advertise a plethora of unlicensed interventionsfor sundry conditions, some promote SCTs for more than 30 different dis-eases (Taylor-Weiner & Graff Zivin, 2015; Turner, 2018; Turner & Knoepfler,2016). Unfortunately, local businesses involved with stem cell tourism are notyet subject to regulation concerning the therapies they promote the facilitatingservices they provide (Turner, 2015).Research suggests that people with poor health are not also newcomers tothe web but also use it more frequently (Houston & Allison, 2002; Li et al.,2016). Over the years, this correlations remained stable but overall onlineinformation seeking decreased, possibly due to concerns about false informa-tion (Li et al., 2016). Trust plays an important role in this activity, especiallywith elder users (Miller & Bell, 2012). 
This raises concern as researchers foundthat low digital literacy leads to online behavior that entails potential harm.In detail, Gangadharan worries that marginal users struggle with adoption ofonline activities. They could be discriminated and exploited because they areunable to identify malicious actors and distinguish promotional from organiccontent (Gangadharan, 2017).Already, we see how algorithms in health care endanger parts of the popu-lation. In New York, for instance, black patients were deterred from higher-quality health-care thanks to a biased algorithm that falsely inferred goodhealth from low health-care spending. Contrary to that interpretation, it wasbad access and distrust in institutions that made the discriminated groups tospend less on health care and treatments (Akhtar, 2019; Obermeyer et al.,2019). With respect to SCT and medical travel, Turner criticizes in (Turner,2018) how neither government nor professional authorities (like the FDA) canoversee and regulate the market. Sipp et al. conclude that stem cell tourismfurther grows even though scientific communities, media and governmentalauthorities issue warning (Sipp et al., 2017). Additionally, scholars point outthat the financial and social implications are unpredictable, hence not includedin today’s discussions on how digital technologies should advance(Petersen,Tanner, et al., 2019). The direct-to-customer marketing seems to be a cru-cial aspect of this industry. It leverages the insights from data collection andanalysis that is enabled by digital technologies like web tracking and compu-tational modeling. It allows SCT-providers to individually address potentialcandidates for stem cell treatments without publicly exposing their marketingefforts. The increasingly personalized nature of these Internet services mightundermine the notion of a public opinion on the subject of SCT as users canbe individually targeted with prseudo-informational content (Tufekci, 2014b).Thus, some organizations demand to discuss algorithmic accountability (aware-ness of an algorithm’s potential risks) and algorithmic justice (compensationfor harm done by an algorithm) (World WIde Web Foundation, 2017) beforedeveloping socially relevant algorithms in a sensitive field like health care.Below, in Section 3.3, these ideas will be discussed more thoroughly. By marginal users , the author means members of historically marginalized or discriminatedgroups, like poor people, ethnic minorities or other groups at the fringe of society. hapter 3: Related Work This section elaborates on different approaches to regulation of technologieslike stem cell treatments of web advertisement. They are gathered from schol-ars of various domains. However, their universal applicability can supportthe analysis of the forming effects that actors in the socio-technical systemexpress.
A mix of laissez-faire attitude, unwillingness and wide-eyed astonishment hasallowed tech companies to impose their algorithms, packaged in business mod-els onto the world and its populations (Mager, 2012). The history of searchengine related cases shows that the interests of stakeholders are not alignedwith policies and legislation, yet (Gasser, 2006). Some authorities responded“perfunctory” (Gasser, 2006) to technological progress and deferred policiesuntil the market has already created precedents. Others embraced the tech-nological advances and implement governance-supporting algorithms, bettersooner than later (Kubota, 2019) .Some might argue that companies acting as intermediaries should not beheld liable for content they republish or host. In the USA, this was inte-grated into legislation, so firms do not fear prosecution there (US Code 47 § . Supporters advocate this as being the sole way to protectfree speech (Ammori, 2014). The freedom of expression, they argue, is thefoundation that allows platforms to operate on user-generated content, enablebloggers to communicate with their readers and to sustain life of communi-ties that discuss controversial topics (EFF, 2019). In U.S. court rooms, casesinvolving the editorial characteristic of search engines were generally ruled intheir favor, referencing the U.S. constitution’s first Amendment and the rightto free speech (Volokh & Falk, 2012).In the meantime, the European Union has taken up a different stance. Avoluntary agreement titled “code of practice on disinformation” was signedby big tech companies and the Union (Schulze, 2019). Along with these self-commitments, the EU wants to “upgrade liability and safety rules for digitalplatforms, services and products” (Schulze, 2019). They are willing to forceregulation onto technology companies to protect citizens in their member na-tions (Ungku, 2019). Germany, for example, imposed significant fines of upto 50 million euros on misconduct or hosting of “criminal” content (Faiola & “One significant highlight of these new rules is that the era of algorithm regulation is offi-cially coming [ ⋯ ] [A]lgorithms should have values, and they must have the right values.At the same time, algorithms should be rule- and law-abiding”, Zhu Wei, associate pro-fessor at China University of Political Science and Law, deputy director of the university’sResearch Center of Communication Law (Kubota, 2019) Communications Decency Act ,47 U.S.C. § § .2. Governance Kirchner, 2017). Furthermore, the EU crafted the far-reaching General DataProtection Regulation (GDPR) to theoretically grant the right be informedabout data collection to users (European Parliament & EU Council, 2016).Observers notice in the press how regional legislation (here: the aforemen-tioned European advances) have an international influence on how servicesare provided in other countries. Thus, a reevaluation of an algorithm that wasinitiated due to local regulation often disseminates. Local adjustments leadsto global adoptions. This way, a public discussion about the suggestive natureof Google’s autocomplete feature had ramifications for the global applicationof the algorithm, for example (Dickey, 2017). The public was not informed ofwhether this was a matter of precaution or simply a measure to avoid multi-ple versions of code. 
This shows that it is worth to scrutinize and questionalgorithms, as beneficial effects are contagious.Scholars claim, it first takes a scandal pertaining to data privacy or dis-crimination for public to take notice, media to report or government to act(O’Neil, 2017a). But instead of precipitant legislation, scholars demand anopen discourse and common understanding of values and policy objectives.Accordingly, this should steer discussions on regulatory strategies and yieldsound policies that govern in agreement with all stakeholders (Gasser, 2006).In Section 3.1, it came clear that the realm of proprietary SRAs needs somesort of governance. The paragraph above showed how legislation can attemptto regulate Internet-based companies (in the case of Germany and the EU)or how they fail to do so due to conflicts of interest (free speech and contentcontrol).Goldman argues to let intermediaries fix the problems themselves as anyregulatory interventions reduced their freedom to improve service quality andadapt to their environment (Goldman, 2006). This might pose a problemas they are profit-driven companies that are possibly more concerned withcustomers than consumers. Some algorithms disrupt and transform socialsystems and impose new rules of engagement (Kitchin, 2017). This promptssome scholars to propose that this kind of technological advancement is due toa technological determinism (Habermas, 1968; Mensch, 1980; Schelsky, 1961) that imposes its reign on a society and shapes it accordingly to fit its functionalrequirements (Grunwald, 2002). It infers that humans are doomed to “say cer-tain words, click certain sequences, and move in predictable ways” (Ananny,2016, p. 104) so an algorithm can anticipate their actions. Accordingly, advo-cates argue that in this sense, technological progress would strive to a singleoptimal solution in an almost Darwinian sense (Ropohl, 2013). Ropohl imme-diately rejects the idea and points to the multiplicity of stakeholders and theirvariety of motivations and goals when it comes to technology (Ropohl, 2013).This infers that there are diverse agents interested in shaping technological The idea of technological determinism was heavily criticized by scholars like Ropohl, whoargued that technological progress can be controlled with appropriate methods. It re-quires a systemic approach though, as an individual cannot face the challenge alone.Furthermore, Ropohl rejects the idea of the “best solution” that technology supposedlystrives to achieve. He argues that in the multiplicity of stakeholders, this is a naivesimplifications (Ropohl, 1983, 2013). hapter 3: Related Work progress.In consequence, society has to appoint agents to enforce governance if it doesnot want to surrender to technological progress that is both uncontrollable andunstoppable (or enforced and dictated by a single actor) as it is destined inthe dystopia of technological determinism (Grunwald, 2002). Grunwald addsthat society has to consistently reflect on its norms and regulations once it hasa learning experience regarding emerging technologies. It must question themotives and intentions of stakeholders and the basis of their decisions. Thesereevaluations must be premised on the new insight that entail technologicalprogress (Grunwald, 2002). Black Box analysis are a possible tool to sourcethese insights and fuel discussions on the subject of technology evaluation.To sum up, herein this idea is rejected due to two reasons. 
First, as de-scribed in Section 2.5 and Section 2.8 a social system uses components of atechnical system to facilitate its communication. It negotiates what technolog-ical advances it considers necessary to support its communication and freelydecides what to include in its self-description. Thus, it has capability to shapethe human-computer interaction that it integrates in its communication pro-cesses. Second, as shown in Section 3.3, the emergent behavior of algorithmscan be accounted for by an agentic swarm. Its constituent actors make dis-crete design decisions concerning the technical system based among others onlaws and norms. Moreover, as pictured in this chapter, these decisions canbe subject to a variety of governance forces that have a forming impact. Inconclusion, society has the power to form technology in its respective socio-technical system by leveraging the various forces that are capable to shape anobject to govern. Due to an increasingly complex and interconnected world, scholars developeda new perspective on governance, that is no longer state-centered and monop-olized by institutional authorities. The “new” governance is concerned withthe collective creation of rule through mechanisms that are not uniquely con-trolled by governmental agents by an autonomous network of interdependentactors (Stoker, 1995). Kooiman points out how it is more of a process thanan entity. From now on, well-being, progress and security can no longer beachieved by one central agent alone. In contemporary societies, he argues, suc-cessful governance is a matter of interaction and cooperation between state,private, NGOs and hybrid actors (Kooiman, 2008). There is no longer onesingle authority that dictates and decides but a networked plurality of inter-disciplinary stakeholders that engage in cooperation and confrontation andcollectively come to a conclusion. The boundaries between traditional in-stitutional actors, private sector companies, citizens and bystanders blur asthey are more and more interconnected (Introna, 2016). Nevertheless, Grun-wald notes that governance in a democracy has to be legitimized by stateactors (Grunwald, 2000). However, a government alone cannot achieve this. Non-governmental organizations .2. Governance In (Grunwald, 2000) he elaborates on four aspects that hamper state actorsin meeting expectations as serious regulator. The factors are as follows:
Knowledge:
In a decentralized and functional diversified society, a state ac-tor cannot assemble all required knowledge to properly govern complextechnology
Orientation:
The state itself cannot represent its citizens’ concerns anymore.Instead of for the common good it acts on behalf of its own interests.
Implementation
In a differentiated society and political landscape, there isno central body of planning, implementing and controlling change thatcould consistently carry out the transformations.
Acceptance
Due to the first two problems, explicit and enforced measureswill not be accepted by societyThe bottom line is that governmental agents cannot solve this issue satisfac-tory due to the complex nature of the interconnected society and the multitudeof stakeholders with contradicting interests. It needs some other sort of gov-ernance that is capable to act effectively, legitimately and extensive in bothspace and time in order to make claims relevant to society without the limitsof national laws.Ananny recommends a multivariate approach to algorithmic accountability.Code transparency, state regulation and user education on their own do notgrasp the scope of a socio-technical system, he says (Ananny, 2016). Donzelot,who calls this emerging social tendency that arises in absence of conflict, op-pression and poverty “mobilization of society”, suggests that problems mustbe solved by society in a bottom-up manner instead of the state implementingsolutions top-down. He sees social partners to self-manage and resolve issues ina decentralized manner. In this approach, he expects society to accept sharedresponsibility and find answers in the mutual fruitful conflict that used to beextinguished by states in the past (Donzelot, 1991). The actual governmenttakes the role of a meta-government , coordinating and stimulating discourse.This perspective allows us to think of the entirety of society as an active bodyof citizens that engages in molding its future because it is aware of its ownneeds. It seeks confrontation with other agents and is willing to negotiate theprocesses that affect them.In (Lessig, 2006), Lawrence Lessig labels these stakeholders and draws aframework of four “regulators” shaping governance of Internet-based agents.Although they are distinct forces, they are highly interdependent. Not onlycan they shape the object of regulation, but they also affect how other forcesbehave through their interdependence.
Law is the state-driven regulator. It is equipped with the most immanentconsequences. Misdemeanor entails prosecution and conviction mightbe severe (see the EU case above). Taxation and benefits incentivizedecent behavior. 49 hapter 3: Related Work
Figure 3.1.:
Lessig’s four regulating forces, from (Lessig, 2006, p. 123)
Norms steer behavior through community-imposed punishment. Disregardingthese rules (both explicit and implicit) might get an offender expelledfrom a social group or a company to fall into disgrace.
Markets enact their force through supply and the nature of the services andproducts they provide. They steer through pricing, accessibility andmarketing, for example. In doing so, they can restrict access, shapetheir supply and advertise their positions.
Architecture constitutes the last pillar. Technical infrastructure, protocolsand code create a space for communication that is constrained by thelimits of hardware and software. Behavior is limited to what is techni-cally feasible and allowed by the programming(Lessig, 2006, 120ff).Lessig adds that the regulators above can act indirectly and enact theirpower via another force. For example, Google as a market agent, investing intoacademia (Google Transparency Project, 2017, 2018) (HIIG, 2020) (Readie,2020) , in order to influence public discourse an thus norms or engaging withpolitical organizations to shape law (Corporate Europe Observatory, 2016; Vo-gel, 2017) . Google also gives incentives to agents who play by the rule and The Alexander von Humboldt Institute for Internet and Society (HIIG) was founded in2012 by the Humboldt University of Berlin (HU), the Berlin University of the Arts (UdK)and the WZB Berlin Social Science Center, together with the Hans Bredow Institute forMedia Research (HBI) in Hamburg as a partner through an initial donation from Googlein the amount of (cid:164) Readie “promotes digital policies that benefit society and drive economic growth” (Readie,2020) and sees itself as a network of organizations within the digital economy. The
Google Transparency Project and
Corporate Europe Observatory are two investigativetransparency organizations concerned with the entanglement of Internet corporationsand policy makers or political authorities. They use publicly available data like meetingrecords or business reports to find and analyze the interlockings. .2. Governance fear demotion for “gaming” the system (Rashtchy et al., 2007; Yuan et al.,2012). This allows the platform to shape the Norms and architecture of theweb advertisement ecosystem and push customers or other affiliated agentsto adapt or adopt a practice (Edelman, 2011). Ultimately, platforms mightalso engage in politics, specifically concerning regulation of the world wideweb and Internet-based services (Google, 2012). Journalists as well as schol-ars raise awareness of big platforms’ capability to influence offline behaviorof citizens. In (Bond et al., 2012) more than 60 million Facebook users weremobilized to vote This urges scholars to speak against what they call “digitalgerrymandering” (Zittrain, 2014, p.335). Lessig warns that this heavily un-dermines credibility and acceptance if done non-transparent. Analogously, ifmarkets enact their power through opaque code and infrastructure, it createsan imbalance that is perceived as unfair (Lessig, 2006).Drawing from this fourfold forcefield of regulation allows us to put a labelon some of the entities in the agentic swarm influencing the socio-technicalsystem of web-advertisement. Law is enforced by the body of government. Inlegislation, politicians determine fair conduct in the online advertising businessby a set of commands and threats. By this, they sketch the values of therespective community and impose punishment on those who disregard themby a centralized authority. Social norms on the other hand are enacted ina decentralized manner through entities of a social systems like professionalassociations, advertisers, net activists, citizens, users or cultural distinct partsof the population. They are enforced through societal sanctions followingviolations. The market forces are shaped by Internet-based companies andISEs like Google and their business partners, in this case advertisers. However,most importantly, the regulation imposed by architecture (code and technicalinfrastructure) can be attributed to the “architects of our society” (Glaser,2009), namely informaticians and computer scientists. They shape cyberspacewith the values and norms they embed in code. Their structural perspective ismolded into technical infrastructure whose performativity or emergence definesthe means of communication in a socio-technical system. Therefore, they playan important part in governing Internet-based services, as explained above inSection 3.3.Due to the interdependent nature of these forces, it is hard to assess a netimpact of single measures or one regulator as a whole. Nonetheless, it is suf-ficient to show that computer scientist play a significant role in establishinggovernance. They are required to contribute their expertise, both domain-specific but also interdisciplinary, as Glaser argued in (Glaser, 2009). Onlythrough their participation, a balance of power can be established and main-tained (Lessig, 2006).According to (Grunwald, 2000), the social partners need to determine fiveaspects, in order to jointly shape technology:1. An object to shape The researchers found only marginal effects on increased voting willingness but concludethat the experiment only consisted of one message displayed to each user, so it might beextensible. 
hapter 3: Related Work
2. Involved actors that are willing to design3. Goals and intentions (non-discrimination, privacy boundaries)4. Means to influence the formation5. Reasonable expectation of successAll requirements can be satisfied to a certain extent with respect to BlackBox testing of SRAs as it has been presented in related work and this thesis.The object to shape is either an algorithm (albeit unknown in its specifics)or the whole web-advertising ecosystem. The actors willing to do so are re-searchers that lay their finger on unwanted side effects of those objects, citizensthat demand change and politicians who invite the collaborative efforts of allparties to craft a socially acceptable system through legislation. Herein, mem-bers of the society could be integrated as auditors, enacting governance viaan auditing platform or participating in a“bug bounty” (Eslami et al., 2019).Removing discrimination, harmful bias or ensuring safe conduct on Internetplatforms are the common goals of the actors. The tools and measures used toscrutinize the technical systems and justify change include software and meth-ods like the ones mentioned at the beginning of Chapter 3 and the outcomeof this thesis as presented in Chapter 4. For the last item on the list, onecan only hope to make a valid and convincing case to persuade all involvedactors to accept a regulatory measure. The past shows, that this is possible.Apparently, Google is generally willing to make their services safe and sane,as seen in the examples above.Lessigs model enables us to understand how technical systems can be reg-ulated. It supports to find actors that engage in governance and opens newperspectives on the challenge of algorithm accountability. With this in mind,the academic body, media and citizens can scrutinize SRAs and punish mis-behavior and ignorance of common norms accordingly.
Scholars saw a rise in attempts to govern search engines over the years. Ac-cording to Gasser, 2006 future debates will have to consider a wide array ofsubjects. Discussions will include • infrastructure (physical and logical characteristics of search engines), • content (free speech and limitations on it, cultural bias), • ownership (proprietary code, indexed content), • security (fraud, safe conduct), • identity and privacy (governmental access, commercial exploitation), • participation (impact on political and cultural processes), • ethics (tension between localized laws and morality of conduct).52 .3. Algorithm Accountability Furthermore, in Gasser, 2006 Gasser highlights how the high variety oftopics poses a challenge for regulators and identifies some key aspects. So-cial partners must prioritize issues, reconcile policy goals, find appropriatestrategies and most importantly find timely solutions that are internationallyand interculturally acceptable. To meet these challenges, Gasser derives threedemocratic key principles that are generally consistent with ethical concept likehuman right and agreed upon across cultural boundaries. He suggests guidingpolicies with informational autonomy, diversity and information quality. Thefirst comprises free speech, freedom of choice and possibility to participate.Diversity is concerned with variety of information and source thereof. Anenvironment with high-quality information encompasses functional and cogni-tive as well as aesthetic and ethical dimensions (Gasser, 2006). All of theseaspects support sound decision-making, for example with health-related issues(see Section 3.1) and should guide a technology assessment like the Black Boxanalysis. We can benefit from the insights in this chapter to develop methodsof collaborative examination later in this work. “ Explainability is a social agreement. We decided in the past itmattered. We’ve decided now it doesn’t matter. ” (Heaven, 2013,p.35) The purpose of Algorithm Accountability is to assess “power structures, bi-ases, and influences that computational artifacts play in society” (Diakopoulos,2015, p.3). In recent years the field has developed in an interdisciplinary dis-cussion spanning the domains of law, tech, business, sociology and psychology.In 2017, the
ACM US Public Policy council came up with the following sevenprinciples to foster algorithmic accountability (USACM, 2017).1. Awareness2. Access3. Accountability4. Explanation5. Data Provenance6. Auditability7. Validation and Testing (USACM, 2017)The council aimed to encourage algorithm designers to act responsibly, know-ing how their choices in algorithmic design can introduce bias and entail harm. Nello Cristianini in (Heaven, 2013). He is with the University of Bristol, UK and writesabout the evolution of AI research hapter 3: Related Work They demand them to provide interfaces for public scrutiny, explanations of al-gorithmic decisions and documentation of data and procedures used in testingand training (USACM, 2017). In this context, an explanation is a “comprehen-sible representation of a decision model associated with a black box, acting asan interface between the model and the human” (Pedreschi et al., 2018, p. 6).Thus, the goal is to communicate an algorithm’s functionality and purposeso that humans can understand it. The explanation needs to be interpretableby stakeholders at their respective level of domain-specific literacy . Hence,Algorithmic Accountability strives to establish transparency of algorithmicdecisions for the sake of public scrutiny and responsible development that isaware of potential bias. Lessig argues that “in at least some critical contexts, the kind of code thatregulates is critically important” (Lessig, 2006, p.139). By “kind of code” hedistinguishes between open and closed code. Herein, he is concerned with thetransparency of its functionality. Transparency, he argues, depends on thekind of architecture and code a computer scientist choses. It creates credibil-ity and legitimacy because users are aware of how the architecture componentregulates them (Lessig, 2006). Moreover, transparency enables informed deci-sions (Diakopoulos, 2013a). Hence, critical scholars see the urge to reestablishtransparency in domains that require consumers to exercise information liter-acy . This ability allows consumers to “recognize when information is neededand have the ability to locate, evaluate, and use [it]” (“Presidential Com-mittee on Information Literacy: Final Report”, 1989). Opaque technologies,they argue, hamper this ability and thus harm the credibility and trust thatorganizations rely on to provide their services (Albright, 2017) . Naturally,there are limits to open code, especially with respect to proprietary code ofprivate companies. It usually constitutes a trade secret and loss thereof woulddiminish competitive advantage and put the company at risk (Diakopoulos,2013a). On top of that, disclosure would open the gates to malicious actorswho arbitrarily manipulate or “game” an algorithm which would degrade thequality of search or advertising (Bracha & Pasquale, 2008; L. A. Granka,2010).People might turn against algorithms that do not perform correctly in theireyes (Dietvorst et al., 2015). However, research suggests that users defend orchallenge an opaque algorithm, even if they perceive it as biased, dependingon whether they benefit from it (Eslami et al., 2019). On another platformof similar dominance (Facebook), researchers found that ad explanations canbe incomplete and misleading. Moreover, they allow malicious advertisers to Interpretability is defined as the “ability to explain or to present in understandable termsto a human” (Doshi-Velez & Kim, 2017, p. 2). The perspective on interpretability fromresearchers in the field of Machine Learning (
Explainable AI ) is used here because it isequally complex with similarly far-reaching consequences for society. Though this was meant to apply to journalism, reporting and the dissemination of news,we can clearly see how this can be generalized to search engines and SRAs alike. .3. Algorithm Accountability obfuscate their intention to target sensitive attributes (Andreou et al., 2018).Dietvorst also showed that if participants observed forecasting algorithms per-form, they showed less confidence in its performance. This could have impli-cations about advertising algorithms as well. Irrelevant ads after targetingcould disappoint users but transparency about choice of inputs might churntrust in a SRA.Of course, the more complex an algorithm, the more complicated an in-formational description gets. To bridge the gap between complexity and ex-plainability scholars suggest a standardized label like the “Nutrition Label forprivacy” (Kelley et al., 2009) that allows quick and easy understanding of analgorithm’s “ingredients”.On this basis, some scholars demand a standardized disclosure of an al-gorithm’s basic aspect. Diakopoulos, for example, suggests the following in(Diakopoulos, 2013a):1. Criteria of prioritizations, classifications, rankings and associations in-cluding their definitions, implementations, thresholds and possible alter-natives.2. Input and other relevant parameters3. False positives and false negatives as well as the method of balancingthose two4. Training data, potential bias and the ensuing evolutionNonetheless, there is more to an algorithm’s performativity than code. Inthe case of Google, company values, hiring procedures, hidden labor of qualityraters and culture play an important role (Bili´c, 2016). The question of responsibility concerning SRAs in complex socio-technical sys-tems is not trivial. Letour comes up with the notion of an actant describingan artificial actor (like an algorithm or any arbitrary technical entity) thatrequires a human actor to enact agency (Latour, 2005). But once they collec-tively act, they can only be held accountable together . Consequently, thisperspective holds all entities accountable that fall in line with the algorithm’spurpose. Design decisions, emergent effects as well as interpretation of outputsand ensuing actions are all interdependent and rely on each other.Introna points out that only through their execution, algorithms have theability to “enact objects of knowledge and subjects of practice in more or lesssignificant ways” (Introna, 2016, p.27). Introna uses Law’s idea of “empiricalpractice with ontological contours” (Law & Lien, 2013) to stress how algo-rithms perform in the real-world, and have the capability to create entities, Like a human firing a gun. In his sense, both are to be held accountable. The human forpulling the trigger, the gun for shooting the bullet. hapter 3: Related Work rules, norms and social measures. What they call performativity stresseshow the code is not an end in itself, but it exerts agency through empirical,ontological and normative artifacts that emerge from its execution. These ar-tifacts may have a significant impact on society. This effect is concerning ina sense that the inscrutable instructions and how they produce their outputsoften remain obscure Black Boxes to those who are affected (Heaven, 2013).From the definition of algorithms in Section 2.3.1, it can be inferred thatall computational steps as well as inputs and outputs are well-defined. 
In-trona argues how algorithms express a nature of flow, inheriting from priorand imparting to subsequent actions (Introna, 2016). Thus, the specifics ofall actions are significant regarding its following practices. Because the op-erations are interrelated, an algorithm’s outcome can never be accounted foror associated with a single act or actor alone. All involved actors partakein design, development, execution and interpretation of the algorithm. Espe-cially in large and complex SRAs, a heterogeneous “agentic swarm” (Bennett,2010, p.32) collaborates to creatively construct distributed, sophisticated al-gorithms. This collective authorship is motivated by various goals at differenttimes (Seaver, 2014). This creates a complicated and ever-changing structure.Seaver concludes: “once these systems reach a certain level of complexity,their outputs can be difficult to predict precisely, even for those with technicalknow-how” (Seaver, 2014, p. 418).As a consequence, they cannot be judged separately from their developmentor deployment (Geiger, 2014). However, design of code is not self-sufficient,but it is deliberately determined by programmers. Observers assume that al-gorithms of large software systems incorporate values and attitudes of theircreators and users through criteria choices, training data, semantics, inter-pretation and possibly feedback (Diakopoulos, 2013a; Grimmelmann, 2017).Even the notion of relevance with respect to search results and personalizedadvertising is highly subjective (van Couvering, 2007). Seaver understandsthese properties as intrinsic parts of culture that will find their representationas technical details in an algorithm’s code (Seaver, 2017). He further pointsout that one should especially pay attention to the logic that guides the deci-sions on algorithmic workings, data structures and methods. Seaver expectsthem to be more persistent than the technical details. Thus, assessment ofalgorithms has to consider their respective “relational, contingent [and] con-textual” (Kitchin, 2017, p. 18) features and the socio-technical system theyperform in (Kitchin, 2017). This suggests, that no one involved can fully graspthe multitude of purposes, intentions and motivations that a piece of softwarewas built on.Conclusively, we need to understand algorithms in their respective contextand how they are embedded in the social system. Thus, one should not as-sign agency to the algorithmic actor or the developer of single instructions Introna describes performativity as an “ontology of becoming” (Introna, 2016). In thissense, an algorithm does not exist solely for the sake of execution of its step-wise instruc-tions but for to be “enacted as such by a heterogeneous assemblage of actors, impartingto it the very action we assume it to be doing” (Introna, 2016, p.23) .3. Algorithm Accountability alone, but rather to the entirety of participants in the flow of actions alongits development, deployment and usage (Introna, 2016). In this sense, Dattanotes that online advertising is a result of complicated mechanisms and inter-actions between data collection, user profiling, keyword bidding and inventoryauctions. Thus they admit that it is unrealistic to assign blame for a spe-cific ad delivery to a single actor only from external observation (Datta et al.,2015). Ananny even holds the users accountable since they contribute to thealgorithms output through their interaction . 
Bilic also notes how their “freelabor” and commodified transactions are an integral part of the STS of websearch (Bili´c, 2016).In Section 3.2 different approaches of governance to shape SRA were dis-cussed. Lessig’s proposal described four forces, one of which was concernedwith architecture. This perspective is concerned with algorithms than sustaina socio-technical system. Above, this thesis argues that the collective of cre-ators has to ensure the correct behavior of algorithms. Glaser points out howinformaticians partly carry responsibility for the radical changes that trans-form our society today. They encode laws and norms into software and provideinfrastructure for society to operate on. Thus, he concludes, they are indeedarchitects of tomorrow’s society and are therefore accountable for the reper-cussions of information technology on society (Glaser, 2009). Though in thisthesis, informaticians is used equivalently with computer scientists , the lattersuggests that professionals and academics in this field are merely concernedwith the design and development of hard and software and the networking ofcomputers alone. This reduces the role of informaticians, computer scientistsand all IT-professionals to that of technical suppliers. Unfortunately, this re-sembles the public opinion, argues Glaser in (Glaser, 2009). He points outhow the portrayal of computer scientists as only being occupied with technicalaspects of computation deprives them of their qualification or authorizationto evaluate the social or systemic ramifications of their actions due to theirsupposedly techo-centric world view. Glaser’s insists on repositioning the dis-cipline as a science concerned with structure and communication of technicalsystem. He claims that informaticians’ have the ability to identify and analyzestructures and mechanisms of technical and non-technical systems (organiza-tional and social) and transform them into computational processes. Thiscompetence can be applied interdisciplinary to evaluate and improve socio-technical systems.In the view of this, the Chain of Responsibilities is introduced to describepitfalls throughout the lifecycle of an algorithm from development to deploy-ment including evaluation. The concept is drawn from (Zweig, 2016; Zweig,Fischer, et al., 2018; Zweig, Wenzelburger, et al., 2018). It is adapted to shiftthe focus from Automated Decision Making towards SRAs in general becauseboth domains face similar challenges, such as a high degree of complexity,an unknown array of (confounding) variables and high significance for those He asks :“[W]ho is the maker and who is its target when algorithms dynamically adaptto the users they encounter? Should users be held partly accountable for an algorithm’soutput if they knowingly provided it with data?” (Ananny, 2016, 108f.) hapter 3: Related Work Figure 3.2.:
Chain of Responsibility on the left, possible pitfalls in the develop-ment and deployment process in the respective phases on the right,adapted from (Zweig, Fischer, et al., 2018) and altered with respectto orientation of the graphic and wording of the pitfalls affected by its outcomes. The metaphor of a chain underlines, how an al-gorithm can only live up to expectations if all links hold (or can only be asreliable as its weakest link). The similarity to the waterfall model of softwaredevelopment is not a coincidence. Errors early in the process are propagatedthroughout the progress of the development, as subsequent steps are based ontheir predecessors. It emphasizes how every actor involved in the developmentand deployment process is responsible for the algorithm as a whole due itsinterrelated creation. On the left side of Figure 3.2, the responsibilities of therespective phases in software development and deployment of SRAs are listed.On the right hand challenges and risks are enumerated. These pitfalls needspecial attentions in the process of creating SRAs and releasing them into thewild. Below I elaborate on the distinct phases’ most important tasks thatare introduced in (Zweig, Wenzelburger, et al., 2018), (Zweig, Fischer, et al.,2018) and (Zweig, 2016).1.
Problem definition:
First, the problem to be solved has to be clearlydefined. Here, misinterpretation of requirements or wrong assumptionscan lead to misconceptions about the purpose of an algorithm. Espe-cially in multi-causal and interdisciplinary problem spaces, this is a greatchallenge that requires the cooperation of domain experts from differentfields.2.
Algorithm development: i Algorithm selection:
Failures in problem analysis can lead tomisinformed choices of methods and algorithms. Some algorithmsmay be more suitable to solve the problem than others. Detecting58 .3. Algorithm Accountability these problems is facilitated by access to code, concise specificationwith respect to purpose and function and a large user base. Forexample, what existing code to reuse or which class of algorithmsmight be appropriate for a certain problem?ii
Algorithm implementation:
The transformation of algorithmsinto machine-readable code bears the risk of wrong translation orusage of programming language with limited or inappropriate ap-plicability.3.
Data and method selection: i Data collection:
Availability, purpose and origin of data can havean impact on data quality, bias and relevance. Data needs to beaccessible and usable. On top of that, the method might require acertain sample size to work properly.ii
Data selection:
Developers must determine which subset of datathey assume to be a meaningful input to the algorithm. Here, noiseor irrelevant data might hamper an algorithm. The choice has tobe made regarding the specific problems nature.iii
Operationalization:
The translation of data into informationalmeasures (like relevance) can lead to errors due to misconceptionsabout certain causations and interpretations.iv
Method selection:
Developers have to come up with an idea ofhow to solve the problem. Here, misconceptions about a model, itsstructure and construal, can lead to errors. This includes parameterspace, fidelity criteria and intended scope of a method.4.
Design, training, testing:
Intelligent software systems must be trainedon training data that can include biases. Developers ought to determineadequate training parameters and decide whether the data sufficient inquality and quantity to find patterns and draw conclusions. When toend training and testing and how to define success or correctness is an-other important decision. In this phase it is vital to explora all possibleusage scenarios.5.
Deployment:
Deployment to a social context entails a learning expe-rience for all its user. It requires them to interact with the system asintended. This requires the system to be explainable. Naturally, somecannot or want not to comply with these demands. Moreover, unwantedeffects can emerge from the human-computer interaction. Furthermore,the system could be used in an improper manner or its results could bemisinterpreted.6.
Re-Evaluation:
The behavior of a software systems and the quality ofits output are compared with the expectations it has to satisfy. Thisfeedback can be used to improve the system or detect issues. Feedbackloops that reinforce negative effects due to asymmetric feedback (O’Neil,59 hapter 3: Related Work . Furthermore, input monitoring andBlack Box experiments should ensure unbiased foundations and correct per-formance of an algorithmic system. One method to establish Algorithmic Accountability is to conduct a
Black Boxanalysis . Black Box analysis is a form of reverse engineering where an opaquesystem is scrutinized by analyzing observable in- and outputs, deducing theinner mechanics that transform the former into the latter and approximatingthe inner workings with models. This can be achieved by manipulation andobservation of the box (Ashby, 1957). The insights are usually juxtaposed toexpectations with respect to certain statistics, norms or standards of stake-holders about how the system is intended to work (Diakopoulos, 2013a). Thiskind of analysis tries to produce a model (computational or mathematical) ofan algorithm.To analyze SRAs, scholars have made up different approaches. In Mikianset al., 2012, for example, crowdsourced user requests were rerouted over theresearcher’s proxy and captured in a man-in-the middle fashion to examineprice and search discrimination. Other work has spawned various software so-lutions to run Black Box experiments on web search and targeted advertising.Below, some programs to scrutinize the workings of ISE are listed withoutintention to be exhaustive. XRay leverages differential correlation to examine targeted advertising andeducate users how their input (email, web search, shopping behavior)translates into certain outputs (personalized ads, prices, product recom-mendations) (L´ecuyer et al., 2014),
AdScape examined user interest based personalization on 175k display adsfrom 180 websites (Barford et al., 2014),
AdReveal analyzes targeting mechanisms for ad delivery (B. Liu et al., 2013), Data Scientists extract knowledge from data. They “[require] an integrated skill set span-ning mathematics, machine learning, artificial intelligence, statistics, databases, and opti-mization, along with a deep understanding of the craft of problem formulation to engineereffective solutions” (Dhar, 2013, p. 1). Diakopoulus denotes
Reverse Engineering as “the process of articulating the specificationsof a system through a rigorous examination drawing on domain knowledge, observation,and deduction to unearth a model of how that system works” (Diakopoulos, 2013a, p. 16) .4. Black Box Analysis AdFisher examines the the relationship between behavioral tracking and Google Adsand the impact of Google’s Ad Settings (Datta et al., 2015).
AdAnalyst reviews Facebook’s ad explanations and collects data on ads andexplanations to give users an understanding of the advertising algorithmsand data sources (Andreou et al., 2018). It also examines the advertisersbehind promotions and “measures” the ad ecosystem (Andreou et al.,2019).
Datenspende Project crowdsourced data collection for an analysis of SERPpersonalization during the last German election (Bundestagswahl) witha Browser extension (Krafft et al., 2017).Most models in this field can be classified in either reverse engineering (BlackBox explanation) or design (transparent box design) approaches. While thefirst is concerned with the general logic of mechanics within the Black Boxand an explanation thereof and how outputs correlate with inputs, the lattertries to re-create the outputs of an algorithm with a given set of trainingdata (Diakopoulos, 2013a).This work elaborates on the Black Box explanation problem, specificallyoutcome explanation (Guidotti et al., 2018) (Pedreschi et al., 2018). It at-tempts to reconstruct an explanation of an algorithm from only the output.In our case, just a fraction of the input was available. We could only col-lect the information that participants submitted via the surveys they filledout when they downloaded the plugin. Contrary to this, a fully observableIn-Out-Relationship would require an API that serves as single source of in-put (Diakopoulos, 2013a). But even then, an opaque system may use morethan that input. Thus, in our study, the variety of input variables that thealgorithms takes into account remain mostly unknown and uncontrollable.It has no be noted that reverse engineering SRA is a highly complex en-deavor, as there is constant feedback from the social system and the workingsof the technical system usually are in an ever-changing state. Thus, Seaverargues that in analyzing them, the Black Box algorithm is more of a socialconstruction created by outsiders that differs for each observer as it is influ-enced by cultural background (Seaver, 2014, pp. 413 & 419). Herein, Seaver’snotion is adopted as he accepts a variety of interpretations to exist. The at-tempts to analyze the technical system in this thesis are part of the socialsystem’s communication processes and thus can yield different descriptions ofthe same algorithm depending on which communication processes the BlackBox analysis observes . The detailed specifics of an algorithm cannot bedetermined by observers outside of the Black Box. Eventually, they do notneed to be known in their completeness to infer about an algorithm’s work-ings and effects in practice (Diakopoulos, 2013a). It is sufficient to “developa critical understanding of the mechanisms and operational logic” (Bucher,2016, p. 86). Rather, the examination should be conducted with focus on Theoretically, the best we could do is create an isomorph representation of the algo-rithm (Ashby, 1957). hapter 3: Related Work relevant aspects only and consider those conditions that are required to un-derstand a phenomenon (Grunwald, 2002). Hence, the Black Box analysisof the web-advertisement algorithms of Google conducted in Chapter 4 canbe restricted to the question of whether there still are questionable advertise-ments delivered via Google Ads after the announced policy changed that areharmful to patients. In this sense, it is irrelevant to examine the technicalsystems of Google’s ad exchange and search engine as an integrated Internet-service. Rather, the implications for the distinct social system of patients ofParkinson’s Disease, Multiple Sclerosis and Diabetes are of interest.Nevertheless, the results and interpretations of the analysis can have conse-quences for the socio-technical system. Ideally, it facilitates understanding ofthe technical system. 
It might influence the use and perception thereof amongthe entities of the social system. This can spark new motivations and com-munication and a changed behavior of interactions with the algorithm. Fora responsible society, methods of algorithm accountability like the Black Boxanalysis are integrated in their respective self-description, thus into the STS.Hence, this thesis claims that the Black Box analysis itself is an SRA.In this thesis, we strive for empirical quantifiable evidence of the phe-nomenon and do not try to create an accurate representation of the algorithm. Ashby (Ashby, 1957) points to the three central questions below that re-searchers have to consider in a Black Box analysis.1. What is the analysis process?2. Which properties can be uncovered, which remain disclosed?3. What methods should be used?In the paragraphs below, these questions will be discussed in more detail.
Analysis process
As argued at the beginning of Section 3.4, a Black Box analysis can be denoteda socially relevant algorithm as it has repercussions on both the technical andsocial system. It furthermore facilitates the discourse about the adequacy ofalgorithmic decisions. Thus, the Chain of Responsibility from Section 3.3.2can be used to design the analysis process along its axis. Again, Figure 3.3illustrates how each phase of the development process should receive attentionaccording to its specific concern. To the right of each phase, the Black Box-specific pitfalls are listed. Below, a list of exemplary questions was compiledthat can support the execution of a Black Box analysis. They guide the design,development and deployment of a Black Box analysis study and assist in thepost-analysis process as well. They are mostly based on lessons learned in theprocess of conducting the EuroStemCell Data Donation.1. • What phenomena emerge from the SRA’s deployment?62 .4. Black Box Analysis
Figure 3.3.:
Translation of the Chain of Responsibilities to the Black Box analysisprocess, own illustration, adapted and altered from (Zweig, Fischer,et al., 2018) • How are they interrelated and what dependencies exist? • How can the scope of interest be determined and limited? • In this scenario, what is the real impact of the SRA in question? • How can this translate into a testable hypothesis? • Who are the stakeholders that need to be considered? • How should the study be sized in time and space? • How are they affected by the SRA, the Black Box analysis and itsoutcome? • What are their motives and attitudes towards the analysis? • How can they contribute? • Which resources can be used (crowdsource labor and hardware)? • How to design the study to analyze the Black Box? • What is the ideal scientific approach in terms of efficacy, effective-ness, efficiency and validity?2. • How can the phenomenon be analyzed? • Are there reliable (software-) solutions available? • Are there accessible APIs? • Which hardware is required to conduct the analysis? • What programming approach is adequate in functionality and sus-tainability? 63 hapter 3: Related Work • Which variables are of interest? • Which inputs to the Black Box are observable? • What are the limitations of data collection? • How to clean the data and remove noise? • Which methods are most suitable to approach the problem withrespect to data collection and analysis? • How are participants recruited?4. • Can the application be tested in a realistic environment?5. • How can the change of the target system be controlled for? • Could there be countermeasures by the target system? • Does the study need to be adapted? • How to evaluate the quality of the results? • What repercussions and side effects can the analysis produce? • Is the approach explainable and reliable? • How can the analytic process can be guaranteed to be consistentacross time and space?6. • How to interpret the results? • What implication do they have? • How can the results be presented in a comprehensive and unbiasedway? • Are the results actionable?To scrutinize the crucial steps of an analysis, Krafft introduces a conceptualpipeline of generic Black Box analyses in (Krafft et al., 2020, forthcoming).He emphasizes the crucial steps in the process and points out possible sourcesof errors and misconceptions. His ideas will be used to assess the EDD alongthe pipeline (seen in Figure 3.4) in Section 4.3.2.
Figure 3.4.:
Conceptualized process of a black box analysis. The numbers repre-sent the different steps in which errors can occur, from (Krafft et al.,2020, forthcoming)
According to this, errors can by introduced in probing the system (1) witheither a Scraping Audit (1A), a Sock Puppet Audit (1B) or a CrowdsourcedAudit (1C). Then, central data collection (2) can fail and data cleaning (3)can degrade quality. The choice of data analysis methods (4) is crucial aswell. Eventually, the presentation of the results also needs careful attention(5) (Krafft et al., 2020, forthcoming).64 .4. Black Box Analysis
Properties
The nature of the discoverable properties is mainly dependent on the appliedmethod and how the challenges reviewed in Section 3.4.2 can be met. Mosttimes when dealing with proprietary systems, researchers can just assumethe inputs and manipulate only a fraction of these variables. Consequently,inferences from the output are mainly based on informed statistics and subjectto noise and methodological limits.
Methods
Kitchin proposes six different methods of algorithm examination in (Kitchin,2017). They are reviewed below to show alternative approaches and why theywere not applied . Below they are assessed with respect to their applicabilityin the EDD. • Examining pseudo-code / source code • Reflexively producing code from task formulation and design ideas • Interviewing designers or conducting an ethnography of a coding team • Unpacking the full socio-technical assemblage of algorithms • Examining how algorithms do work in the world • Reverse EngineeringApproach (1) fails at the
Access challenge as well as the second method (2),which is also impracticable due to the networked nature of the algorithm. In-terviewing designer could possibly yield interesting insight in design decisions,constraints and implementation details, but again it breaks down due to access.Reviewing the entire social impact poses a complex problem due to the algo-rithm being “performative” (see above) having emergent effects. (Diakopoulos,2013a). As we have not had immediate contact with Google’s algorithm de-signers and unpacking the full socio-technical system of web advertising wouldexceed the scope of this work, we dropped the first four alternatives. Never-theless, the real-world effects of algorithms (5) were examined in Section 3.1,possible implementations (1,2 and partly 3) were derived from academic lit-erature and patents in Appendix C and conducted a small-scale study (6) inChapter 4. To expand the approaches to reverse engineering, five different
Al-gorithmic Audits of opaque Internet-platforms are proposed in (Sandvig et al.,2014). All come with distinct advantages and drawbacks.
Code Audit
Code review of proprietary code by expert third parties (Pasquale,2010)
Noninvasive User Audit
Self-reported measures of users’ normal interactions He notes, however, that “[e]ach approach has its strengths and drawbacks and their use isnot mutually exclusive” (Kitchin, 2017, p. 22). hapter 3: Related Work Scraping Audit
Observing the results of repeated scripted queries to a plat-form or requests to an API
Sock Puppet Audit
Programmatically impersonate specific user behavior ortraffic
Crowsdsourced or Collaborative Audit
Recruit real users to collect dataFrom the approaches above, we merged
Scraping Audit with
CrowdsourcedAudit . This way, we were not forced to find affected individuals in person toobserve for a noninvasive user audit or construct reliable and authentic butartificial user profiles. As Google does not provide an API or discloses code ordata for this cause, we had to discard these approaches, too. The benefits ofthe methods we applied are natural interaction with the web service by partic-ipants with real profiles and the opportunity to get a broad selection of inputconfigurations through a variety of users. The disadvantages of procedurallycollecting data from a platform are the risk of detection (and subsequentlyblocking requests or adapting outputs to them), the possibility of violationof the service’s terms of service and the lack of fully controlled real-userdata as regular Internet user would produce it . It was the most cost- andtime-efficient approach that allowed us to quickly distribute our software andgather data via data donations.Data donations are an emerging topic in the scientific community and sparkinterdisciplinary discussions. Scholars make a case for donations as an act ofsovereignty that can “generate social bonds, convey recognition and open upnew options in social space” (Hummel et al., 2019, p. 48). This way, patientscan be involved in scientific progress and be invited to take an active standenacting their autonomy on behalf of solidarity (Prainsack, 2019). We alsofaced the challenges of data donations with respect to trust, future use, in-vasiveness, affected people and voluntariness pointed out in (Hummel et al.,2019). To do so, we collaborated with a trustworthy institution (EuroStem-Cell), declared the possibility of future accessibility of the data (Couturier,2019). We further minimized invasiveness through reduced data collectionand a non-obtrusive software implementation. As to voluntariness, the studywas proposed to affected and non-affected individuals alike. In contrast to do-nations in the purely medical field, the participants were not subject to moralpressure or an alluring expectation of direct reciprocity. After all, this studywas not concerned with researching curative therapies but misconduct in on-line advertising. Concerning the affected people, we did not check whetheronly the people who decided to contribute donated but also other users of therespective browser. As consent was given at installation, anyone using thebrowser took part in the study. Sandvig presumed that under the US Computer Fraud and Abuse Act (CFAA) (US Code18 § For privacy reasons, we only collected self-reported demographic and statistical data atregistration and the eventual submissions .4. Black Box Analysis The author further admits and accepts the dissonance between the notion oftransparency and trust that we established through publication of the collecteddata and the possibility of uncertain and possibly problematic future use.
There are numerous challenges to the Black Box analysis of an algorithmsthough. They range from the most trivial pitfalls to sophisticated technicalrestrictions and from adversarial efforts to systematic complications.As scholars have noted in Section 3.3.2, the algorithm cannot be divorcedfrom the conditions it was developed under or the contexts it is applied in.Thus, a wholesome analysis of an algorithm and the effects thereof requirean interdisciplinary team of examiners (Zweig, Fischer, et al., 2018). Theyneed to understand not only the technical aspects, the mathematical mod-els or methods but also the domain-specific preconditions and ramifications.Furthermore, interviews with designers and programmers of an algorithm canbe helpful, as Sandvig suggested. After all, their motivations, beliefs, ideas,visions and corporate culture may be weaved into the code.The inability to analyze clear code is due to the following challenges de-scribed in (Kitchin, 2017). They were enriched with examples and relatedproblems below:
Access
Proprietary algorithms of large Internet-based companies are simplynot meant to be analyzed from the outside (Obermeyer et al., 2019).It often is a trade secret and disclosure would allow gaming the algo-rithm (Diakopoulos, 2013a). “[The algorithms] are designed to workwithout human intervention, they are deliberately obfuscated, and theywork with information on a scale that is hard to comprehend” (Gillespie,2014, p. 26), concludes Gillespie. Eventually, there might be inputs thatthe algorithm considers but that are not observable, thus not measur-able (Pedreschi et al., 2018). This relates to the problem of correlationvs. causation, because statistical significance cannot guarantee a causalrelation or design intention (Diakopoulos, 2013a). The origin of an out-put can remain undetected and an effect might be misattributed to anon-causal source.
Heterogeneous and embedded
The algorithms of complex software systemsare highly interdependent networked algorithmic systems . They areembedded in socio-technical assemblages of various types of entitiesthat all may feedback into the system. Their constituent parts are thework of collective authorship, created “with different goals at differenttimes” (Seaver, 2014, p. 418). A plethora of distinguished configurationsof an Internet-based service can be A/B-tested and the variety of actorsengaging in networked systems make it hard to determine the reason “In fact, what we might refer to as an algorithm is often not one algorithm butmany” (Gillespie, 2014, 12f.), says Gillespie. This reflects the capability of an algorithmto be flexible and adapt to context due to its nature as intelligent agent (see Section 2.3.1) hapter 3: Related Work behind marginally different outputs (Diakopoulos, 2013a). Further-more, Internet-based services are delivered over a network of multiplemiddle-men. Routing and load-balancing of traffic make the route ofrequests and the actual source of an answer opaque (Guha et al., 2010).Thus, it is hard to establish a truly identical experimental setup for twoexperiments. Ontogenetic, performative and contingent
Algorithms of large-scale softwaresystems are constantly changing, either being updated or adapting tocontext. They are fluid in their manifestations in code and need to be as-sessed with respect to their “contextual, contingent unfolding across situ-ation, time and space” (Kitchin, 2017, p. 21). Moreover, they are highlyadaptive to the user as they are personalizing their outputs (Bucher,2016). It was early acknowledged that in a Black Box experiment, theexaminer and the subject of interest form a system with feedback. Thus,the process of examination may affect the Black Box and thus alter its in-ner workings, making it harder to reproduce experiments (Ashby, 1957).Gillespie concludes that the entanglement of algorithms with its audi-ence creates a moving target meaning the relationships are constantlychanging (Gillespie, 2014). On top of that, the emergent effects of analgorithm can only be assessed with respect to the context it performsin (Introna, 2016). Drawing from the fact that inputs are unknown,countless and arbitrary and outputs are fluid, contextual and subject topersonalization, the real challenge is to find a stable representation of asystem and its environment to analyze .After all, it seems like it is impossible to fully “unbox” complex Black Boxsystems. Nevertheless, scholars like Hilgers argue that even with the epis-temological limits of the method and the sheer impossibility to deduce allspecifics, the analysis still yields insights and allows knowledge acquisition.Even if all we learn is that we need new methods and practices to analyzeBlack Boxes (von Hilgers, 2011). A/B-Testing is widely applied in web development to assess the efficacy of design changesin either processes or presentation on the feedback of uninformed users by providingslightly different versions of a service (Christian, 2012)(Journalistic source, but gives aconcise and comprehensible description of the technique). Bucher puts it in a nutshell: “If the Black Box by definition is a device of which only theinputs and outputs are known, what remains of the metaphor when we can no longereven be certain about the inputs or outputs?” (Bucher, 2016, p. 94) . EuroStemCell Data Donation 2019 / 2020 (EDD) As introduced before, EuroStemCell is concerned with educating the pub-lic and patients especially about stem cells. They work closely with patientgroups, educators, policy makers and regulators to develop material that catersto their respective needs (Eurostemcell, 2019). They produce material onforms of treatments, specific therapies, scientific works and commercial as-pects. One of their major concerns is to inform the public about questionableapplications of stem cells pertaining to Parkinson’s disease, Multiple Sclerosisand Diabetes and the respective clinics or providers. Anna Couturier, Digi-tal Manager at EuroStemCell and PhD candidate in Science, Technology andInnovation Studies at the University of Edinburgh found, along with manyother scholars that these agents make use of online advertising, possibly in atargeted manner (behavioral advertising) to market directly to affected indi-viduals. She contacted us with the intention to scrutinize these practices withrespect to the underlying algorithms.She initiated the EuroStemCell Data Donation project (EDD) to examineonline advertising pertaining to unapproved stem cell treatments. As a partof that project this thesis intends to answer whether;1. 
There is no more evidence of questionable advertising on Google’s searchengine result page concerning unproven stem cell treatments of Parkin-son’s Disease, Multiple Sclerosis or Diabetes (I and II), e.g..2. Users affected by any of the diseases (Parkinson’s disease, Multiple Scle-rosis, Diabetes) receive more critical advertisement than members of acontrol group.The cooperation between Couturier and the AALAB started in summer2019. In a two-day workshop, the project’s keystones were discussed. Follow-ing these agreements, a plugin for both Firefox and Chrome browser wasdeveloped along with a Django server that received and stored data. While theserver and backend were constructed by a fellow student on AALAB’s payroll,the plugin was part of this thesis.The server counterpart was developed by a fellow student (Roman Krafft )andis not part of this thesis. The data collection and study design were adminis- Herein, the terms plugin, extension and addon are used interchangebly https://addons.mozilla.org/en-US/firefox/addon/EuroStemCell-data-collection https://chrome.google.com/webstore/detail/EuroStemCell-data-collect/mdlalccnlkekigohghfbifkibgphaick r kraff[email protected] hapter 4: EuroStemCell Data Donation 2019 / 2020 (EDD) tered by researchers of EuroStemCell, Anna Couturier and AALAB, TobiasKrafft .After development, just before go-live, Google announced in a new health-care and medicines policy to “prohibit advertising for unproven or experimen-tal medical techniques such as most stem cell therapy” (Biddings, 2019a) ina blog post which lets assume that the company was aware of the issue (Bid-dings, 2019a). We expected this change to degrade the quality and quantityof the collected material with respect to our research question (see below).However, data collection ran for about 3 months. Despite the high attentionthe subject received and the wide reach of EuroStemCell’s partnership net-work, installation numbers stagnated at a low two-digit range. As a result,this thesis’ focus was shifted from a quantitative to a qualitative analysis. Because of the small scale of the development project, the manageable amountof expected requirements, the time constraint imposed by Google’s policychange and the proof-of-concept nature of the plugin, we omitted an exten-sive documentation of requirements and project planning and in turn useda SCRUM-like approach to development. Below, explanations of the pluginsworkings are enhanced with screenshots and UML diagrams . Couturier acted as product owner, the initial product backlog was compiledduring aforementioned workshop (see Appendix A.3). The software shouldregularly search Google for keywords, collect content from the SERP and sendthis to a server dedicated to storing the results. Its goal was to imitate a userwho repeatedly queries Google for specific search terms (see Appendix A.2 forthe
User Story of a typical user). User should experience easy installationand on-boarding and only little disruption in their browsing experience. Uponregistration, a survey should provide statistical background information aboutparticipants. The infrastructure should be scalable and maintainable withrespect to updates.
To allow a crowdsourced audit (see Section 3.4.1), we decided to collect datavia a browser plugin. This way, the study gets easily scalable on the client side.Moreover we could capitalize on the realistic nature of participant’s requestsas they would engage with Google using their natural browsing profile andbehavior. [email protected] kraff[email protected] The
Unified Modelling Language is a popular language to model software systems. Amongothers, it comprises graphic notations to express structure, activity and flow of software. .1. The Donation Plugin The Plugin was designed to operate on the current versions of Mozilla Fire-fox and Google Chrome. They were picked because they cover a majority ofusers as they are among the most popular web browsers (statcounter, 2019a).By using two major platforms, we could benefit from their infrastructure thatallowed easy distribution, download and install, uncomplicated updates andpossibly gave us an air of legitimacy as being hosted from the official storesite. The usage process was derived from the requirements compiled in theproduct backlog (see Appendix A.3). It is illustrated in Figure 4.1.As most participants / donors were assumed to be patients of the afore-mentioned diseases, thus elderly people with limited technological literacy andwillingness to cope with complicated software, we needed to provide an unso-phisticated piece of software. It required a seamless onboarding process andautomatic execution with minimal user involvement. Hence, we minimized thenumber of steps in the registration process and provided FAQs. Additionally,it should not interfere with everyday browsing and operate in an unobtru-sive manner. That is why the collection runs in non-active tabs in the currentbrowser window. Nevertheless we provided transparency through a utility thatshowed the recent submissions to give users an idea of how their contributionlooked like.Upon downloading, participants should be walked through a gapless on-boarding process. First, they were to accept a privacy statement , then theyshould be redirected to a survey. Here, we wanted to request informationabout participants for statistical reasons and to assign them to a study group.We furthermore planned to gather information to control for frequency of use,domain-specific results (in the case of academic researchers) Groups should beallocated server-side in a country-by-disease manner plus an additional controlgroup each. Users impacted by a disease were to be allotted to the respectivegroup, unaffected people were to be used as control. Controls should suc-cessively fill the control groups. This would ensure that the users were notscattered among the groups and we could guarantee to provide at least onecomparative study. Their donations should be assigned to a participant andgroup identifier.From then on, the plugin should automatically crawl the SERP of Google atbrowser startup and every 4 hours (starting at midnight). Upon completion,it submitted the collection to the server along with participant- and plugin-related administrative and statistical data (IDs, version, time, language). Theplugin queried the Google search engine with terms according to the studygroup a participant was associated with. We denote the results of the indi-vidual queries (searches for keywords) donations . The wrapped up collectionsthat were sent to the server were called submissions . Every four hours, theterms were subsequently sent to Google in a randomized order. The plugin re-quested the website [top level] /search?q= [term] , Herein, the notions of participants and donors are distinct. Participants describe usersthat only downloaded, installed the plugin and registered, whereas donors are activecontributors who submitted their respective collected data. hapter 4: EuroStemCell Data Donation 2019 / 2020 (EDD) Figure 4.1.:
Sketch of the plugin-server-communication of the EuroStemCell DataDonation, by author where [top level] corresponds to the respective top level domain of a partic-ipant group’s region and [term] relates to the search terms or query. Thequeries were composed of either a [disease] prefix (“parksinson’s”, “multi-ple sclerosis”, “diabetes”) followed by clinical terms or “stem cells” in a moregeneral wording (see Appendix A.5.1 for details).
Sprints lasted about two weeks and were loaded with about three work pack-ages each. The software then evolved in a planned manner, as prioritizedby the product owner. Each sprint concluded with a working prototype ofthe plugin that was critically reviewed by the product owner. Versioning wasensured on an university-based github repository .Development was guided by Mozilla’s online documentation of browser ex-tensions (MDN contributors, 2019). According to Mozilla’s documentation,a browser extension consists of a manifest file , a background page , contentscripts , an options page , browser actions and others.Additionally, to ensure browser interoperability, the webextension-polyfill li-brary was included in my code . This allowed me to development the Firefoxversion only. If ported, the library checks the environment it runs in andadapts the code to use callbacks on Chrome and promise -based APIs on Fire-fox. This pertains to all functions of the chrome and browser namespaces,respectively.Furthermore, the uploaded package included HTML files for on-boarding, https://git.cs.uni-kl.de/m reber16/EuroStemCell https://github.com/mozilla/webextension-polyfill, licensed under Mozilla Public License2.0 .1. The Donation Plugin off-boarding and overview over submitted results, a privacy statement, a con-figurations file, their respective CSS and JavaScript files as well as icons forthe addon’s button.On- and off-boarding sites comprised informational content whereas theoptions page contained the mandatory survey. They were plain HTML pagesstyled with CSS. We used a design similar to the EuroStemCell corporatedesign to create a feeling of familiarity and leverage the legitimacy of saidorganization. After all, trust is deemed an important success factor in datadonations (Prainsack, 2019).The manifest file declared version number, extension name and other specificsthat are required for upload to the browser addon stores in a JSON file for-mat. Moreover, the file details the scripts to run and the required permissions.We minimized the amount of permissions to increase the acceptance rate forprivacy-sensitive users through explicitly stating the domains we intended tocrawl.The background page incorporated the main script, the background script ,that runs once the browser starts if the addon is active. It administers reg-istration, manages communication with the server, keeps track of the studyschedule and initiates the crawls. In addition, it checks for updates and loadsthe configuration file that comprises all parameters for data extraction andserver communication. Eventually it contains handlers to process browser-actions that are triggered after a click on the addon’s button on the browserinterface.First, the page-crawl.js script was developed. It extracts information fromHTML elements on the Google SERP according to the respective parameters(see Appendix A.5.2 for detailed descriptions). It receives them upon invo-cation through the parameters passed by the background script. Finally, itreturns the donation to the background-script.Then the registration process was implemented as illustrated in Figure 4.2.At installation users were prompted to read, understand and accept a privacystatement, see (Figure A.1a). Users were directed to an options page, wherethey filled out a survey, see Figure A.1b. They were interrogated with respectto health condition, demographics and stem cell-related experiences (see Ap-pendix A.4 for details) . 
The client submits this information to the serverand registers as a new user there. The server answers with a participant ID, astudy identifier and a list of keywords associated with the study (for a detailedlist of query compositions, see Appendix A.5.1).After registration, scheduling was initiated by the background-script . Thescheduler is also started at each browser startup. First, it executes a crawl,then it uses the browser.alarms API to fire every 4 hours (or more specificallyat 0, 4, 8, 12, 16 and 20 o’clock). Java Script Object Notation Here, we included a question concerned with the participants being contacted by direct-to-marking practitioners of stem cell therapies. We hoped that this would encouragecontribution, underline the high topicality of the issue and give patients the feeling thatthey are seen and their problems are acknowledged. Research suggests that this canassist reconciliation from harm (Prainsack, 2019). hapter 4: EuroStemCell Data Donation 2019 / 2020 (EDD) Figure 4.2.:
UML activity diagram of the registration process, by author
To start a crawl, the background page opens a new tab for each search termin the background and injected the page-crawl script. After each keyword-crawl, the respective tab was closed, the results were collected and returned.Then, the next result page was requested according to the randomized keywordlist. We decided to run the collection in the background to provide a lessintrusive experience. The crawl code was injected directly into the newlyopened tabs to circumvent the implementation of content scripts which wouldhave applied to all requests to Google. That could have been seen as privacyinvasion by participants, thus we reduced the scope of the crawl to only thosetabs that the extension itself opened.The page-crawl script returned the results as listed in Appendix A.5.2 tothe background script, which added context information like user and studyidentifiers, packaged them and submitted them to the server. As we intendedto deliver the plugin to different time zones, we decided to include a time zoneoffset identifier with the submissions. For a sketch of the collection process,see Figure 4.3The browser action button was added to increase both engagement of usersand transparency of the addon. While users where not actively participat-ing, the button was styled with an attention-grabbing exclamation mark.Upon click or after the first automated crawl (about 30 seconds after browserstartup), the button would resolve to a clean EuroStemCell symbol, if a userwas ready to participate. If not, the registration procedure was imitated. Ifa user was signed up and actively donating, the click on the browser buttonrevealed a page showing recent donations. This was implemented for trans-parency reasons and to consider the relationality of the donated data. Thisway, we could honor the participants’ work through a display of their contri-butions (Prainsack, 2019).74 .1. The Donation Plugin
Figure 4.3.:
UML activity diagram of the data collection process, by author
The first release candidate was uploaded to the addon / extension stores ofthe respective browsers after testing. The upload consisted of packaged code,privacy statements, explanatory screenshots and a
Readme file. The raw codewas also published and updated on a public repository under GNU GPL v3for the sake of transparency and reproducibility . Following feedback fromstakeholders and to resolve issues that came up during production, the pluginwas continuously improved in terms of usability, recognizability and stability.Subsequent updates versions were distributed via the stores update mecha-nisms.Then, 13 Virtual Private Servers (VPS) were used to provide region-specificbaseline data. Three servers were set up in each of the regions in scope .The machines operated on a clean-slate Ubuntu 18.04 LTS and ran Firefoxand Google Chrome browsers which would only access Google search websitesof various domains (.com,. ca, .co.uk, respectively). The VPS providers wereeach based in and offered services from one of the respective countries, so wecould accommodate for regional effects. The machines were regularly moni-tored and updated to assure duly operations. A server-side logging processwas established to give a rough overview of VPS performance. IP loggingof only virtual clients was rejected by project partners due to privacy con-cerns. Though running on the same specifications, some servers suffered fromunexplainable loss of performance while others operated flawlessly. Hence, op-eration of the Firefox browsers was switched to headless mode to decrease the Repository: https://github.com/AALAB-TUKL/EuroStemCell-data-donation Australia, Canada, United Kingdom, United States of America. In the course of the study,one VPS was added in Florida as our partners noted a large density of firms practicingstem cell therapy there. hapter 4: EuroStemCell Data Donation 2019 / 2020 (EDD) processing load. As the reiteration of the plugin process strained the workingmemory of the servers which caused the browsers to crash, cron-jobs wereisntalled to schedule reboots for the machines and restarts for the browsers.The automatized behavior of our plugin allowed us to initiate the donationscomputationally.The overall study period lasted from September, 30th 2019 until March 2020.After that, an offboarding prompt was delivered via an update of the respectiveplugins informing the participants of the end of the study and inviting them tofill out an offboarding survey. Finally, they were asked to uninstall the plugin. The donation data was downloaded at the beginning of February. Thus,the study period in scope ranges from September, 30th 2019 until February,11th 2020. The data was compiled to a CSV file and analyzed with Pythonafter a first evaluation in Microsoft Excel. The illustrations were created using Jupyter Notebook in combination with pandas for data cleaning and formattingand matplotlib as well as bokeh for visualization.
In summary, 162 participants registered their plugins on the server. 102 ofthem were actively contributing. They are denoted donors . Of those, 24were VPS servers automatically submitting data as described above (the VPSrepresented 23.5.% of contributing participants). The VPS accounts are ad-dressed by
VPS or VPS donor and the supposedly human donors as “real”donors . Figures B.1 to B.3 in Appendix B.1 show the download statistics ofboth versions of the extension . The store statistics in Appendix B.1 showedthat download figures plateaued after the mid of November (a third of thestudy period). Participants
Figure 4.4.:
Cardinality of all study groups, grouped by region, color-coded bycondition Data is among the work’s uploaded files for the reader’s examination. A Comma-separated value (CSV) is a textfile containing data that is delimited by a distinctseparator The figures are drawn from the addon stores’ statistics dashboard. .2. Findings The scope of this thesis was limited to the Parkinson’s Disease (PD) studygroups because the numbers were too low in the groups concerned with theDiabetes and Multiple Sclerosis conditions, as seen in Figure 4.4. The chartvisualizes the respective group sizes and shows that there are as low as zeroparticipants in some groups. The study groups were encoded by numbers.Groups 3, 6, 9, 12 and 15 thus accommodated the users affected by PD fromCanada, the UK, Australia, the US and the global control respectively. Userswho indicated the absence of a relevant medical condition in the survey wereassigned to the control. In the following, this thesis will refer to them as control or control group . These participants were assigned in a fashion thatensured a certain control group size that would allow comparability. The con-trol “buckets” for each condition were subsequently filled. First, all unaffectedparticipants were assigned to the PD control bucket. After this reached a sizeof 50 participants, another condition’s bucket was going to be filled. We choseto firstly fill the PD bucket as it was Couturier’s primary concern to investigatethe situation in the realm SCT with respect to PD.Figure 4.5 illustrates the cardinality of the PD study groups. It shows howmany real participants were assigned to the respective groups. From thisanalysis we could have inferred the efficacy of our communications strategy.Because we partnered with medical institutions to promote our cause in thedifferent regions, the numbers would possibly reflect the success of the respec-tive communication strategy. Nonetheless, their numbers were too low to drawstatistically significant conclusions. Figure 4.5.:
Numbers of “real” participants (donors) in the Parkinson’s studies(without VPS participants)
Donations
In the study period, 177,756 donations were submitted to the collectionserver. The contributing participants averaged at 21,747 submissions with amedian of 105. This measure and the 80th percentile of 3270 donations showhow the distribution of donations among donors fits a long tail distribution, The terms “donations”, “submissions”, “entries” are used interchangeably to refer to thedata from an individual crawl that was submitted by actively contributing participants.A crawl denotes one request-scrape-collect cycle of the plugin with one of the study’srespective keywords. hapter 4: EuroStemCell Data Donation 2019 / 2020 (EDD) Figure 4.6.:
Total donations of real and VPS donors per day (encoded with blueand orange bars, respectively), from September, 30th 2019 until Febru-ary 2nd, 2020 thus is heavily skewed. Figure 4.6 shows that the collection server receivedregular daily donations on a stable level from mid-November on. Althoughthe VPS’ submission frequencies may vary slightly as we see in Figure 4.6,they were a reliable source of donations as they continuously submitted dataas planned.
Figure 4.7.:
Distribution of donations over hours of a day
The VPS donors worked as expected, submitting in a recurring manner, asdepicted by the regular four-hour pattern of the orange bars in Figure 4.7.The contributions in Figure 4.7 between the scheduled donations show thesubmissions at browser startup (by real participants, encoded with blue bars)or reboot / restart (by VPS donors, indicated by orange bars). As most“real” donations were submitted between the four hour spikes (the blue barsin Figure 4.7), triggering the initial donation at startup was a vital functionfor our data collection. This allowed us to capture data even when users werejust briefly browsing the web.Many real donors collected only little data, but there are some users thatconsistently submitted, as seen in Figure 4.8. The figure shows the individualsubmissions of each real donor. Each donors contributions are depicted bydata points along the x-axis which measures time. The donors are sortedtop down by contribution rank. The lower part of the illustration shows that78 .2. Findings
Figure 4.8.:
Submission events of real donors over the course of the study. Theblue markers indicate the top-20 donators. there were about 140 users that only occasionally donated (red data points).The illustration reflects the rise in donation numbers in mid-November, asvisualized in Figure 4.6. Also, we can derive usage patterns from this data thatwould allow us to validate the self-declaration of users concerning computer-/ internet usage. Figure 4.8 further visualizes that there are about 20 donorswho account for about 75% of the donations.
Figure 4.9.:
Donations by individual participant and cumulative submissions
Figure 4.9 supports the lead from above concerning the 20 most activedonors. On top of that it visualized the large contribution of VPS donorswhich amounted to 63.8% of all entries. However, if we would narrow theresearch down to the top 20 donors, we would lose many of the real donors.79 hapter 4: EuroStemCell Data Donation 2019 / 2020 (EDD)
As shown in Figure 4.10, the majority of real donors only submitted lowquantities, most of them for a very short period of time (as low as a singleday, see in Figure 4.8.
Figure 4.10.:
Histogram of donor’s donation distributions
Advertisements
Among the 177,756 donations stored at the server, only 5.7% contained ads.This number is derived by selecting only those entries that contain values inthe ads field. As some submissions included more than one advertisementper page, they were extracted, which lead to 21,188 single advertisements.According to the domain of the landing pages about 285 hosts accountedfor the paid slots on the SERPs. Figure 4.11 shows that the advertisementson the SERP originate from many small-time advertisers and only few largecompanies. This is reflected by an average ad-count per host of 74 and amedian of only 7. 80% of advertisers appeared less than 50 times in the data.This leads to the conclusion that there are many minor players in the fieldwho compete with very strong actors that have significant impact as their adsare regularly delivered and thus dominate the field of advertisements on theSERPs of SCT-related searches. Figure 4.11.:
Histogram of advertisement host distribution by ad count
Because the intricate nature of the stem cell therapy ecosystem and my A click on the advertisement directs a user to a landing page. This can happen immediatelyor via a proxy which enables an ad exchange platform to monitor click-through rates. Inthe latter case we could seldom capture the destination of a link due to obfuscation (seeSection 4.3 .2. Findings limited knowledge thereof, I consulted with Anna Couturier, PhD candidatein Science, Technology and Innovation Studies of the University of Edinburghto assess the background and validity of the ads and their respective promo-tional messages. Due to her proficiency and experience in the field of sciencecommunication and stem cell-related research, she undertook the task of la-beling the hosts . This being said, it has to be noted that the categorizationand labeling as well as the distinction of problematic ads do not reflect myeducated decision. The labels were selected by Couturier to reflect the back-ground of advertisers as it could be inferred from the contents on their website(the advertisement’s landing page). Commercial clinics were deemed to be the
Most Problematic actors, aggres-sively advertising questionable SCT as it was described in Section 3.1. The
Quite Problematic category contains mostly commercial actors that capital-ize on patients’ conditions through complementary services or are interestedin involving them in clinical trials. As described in Section 3.1, private andcommercialized clinical trials are that charge for participation are a threatfor affected people as they might exploit their dire need for a cure.
Poten-tially Problematic institutions need to be evaluated in a more detailed way.Their influence do not have immediate impact on patients, but pharmaceuticalcompanies and lobbying groups might have an interest in branding keywordsor “framing” (Kahnemann & Tversky, 1984) the search domain around stemcell research and treatments (as described in Section 2.7.3). This can be in-terpreted as the Market-Force of Lessigs regulation framework presented inSection 3.2. The entities in the
Neutral category were deemed unbiased byCouturier in a sense that they would not actively engage in advertising ques-tionable therapies.Figure 4.12 shows that the top-20 of advertisers by number of ads in the datasample comprise many different categories. A multitude of advertisers withvarying motives compete for users’ attention on the SERP. This is especiallyinteresting in the field of emerging technologies like SCT where persuasion bycommercial actors and lobby interests clash with educational efforts by NGOsand legitimate medical authorities. Among the top-20 there are 6 foundations “This masters thesis is contributing to on-going work at the University of Edinburghthrough the PhD work of Anna Couturier, PhD candidate in Science, Technology andInnovation Studies. As such, the qualitative analysis of the sources of advertisementsis still on-going and will include a detailed coding of the advertising sources accordingto a number of factors, including relationship to stem cell tourism, potential risk topatients, scientific credibility, and financial impact. The coding included here is a rough”first pass” finding for the purpose of this master’s thesis and has been designed tomark potentially problematic advertising sources. These sources have been marked asproblematic according to a number of factors derived from an overview of the landingpages. These factors include explicit referencing of scientifically unproven treatments,vague claims about medical outcomes, promotion of stem cell tourism, and targeting ofvulnerable patient communities with financial impact.”(Anna Couturier) The complete list of hosts with their respective labels and risk score can be reviewedin
Ads Data labeled by Couturier.csv and problematic mapping by Couturier.csv .The files are among the uploaded files. Both documents were filled out by Anna Cou-turier. hapter 4: EuroStemCell Data Donation 2019 / 2020 (EDD) Most Problematic commercial clinicclinical trials - privateclinical trials - commercialcomplementary treatment - commercialblood banking - commercialQuite Problematic health news - commercialpolitical lobby organizationpharmaceutical companycommercial non-health specificconference - commercialPotentially Problematic biopharma supplieshealth news - publicresearch instituteblood banking - publicclinical trials - publicconference - publicgovernmentalhealthcare provider - institutionnon-profit health organizationpatient groupssocialcrowdfundingotherNeutral newsNot to determine unknownPossibly drugs Needs review
Table 4.1.:
Advertisement host labels and categorization proposed by Anna Cou-turier .2. Findings Figure 4.12.:
Top 20 advertising domains with respective ad count and labelingby Couturier hapter 4: EuroStemCell Data Donation 2019 / 2020 (EDD) Figure 4.13.:
Proportion of critical actors in each of the host categories as proposedby Couturier dedicated with educating about PD and fostering scientific research Thesecond largest source of advertisements titled
Prescription Treatment Website accounts for promotion of PD drugs related to Carbidopa / Levodopa thatare direct-marketed to both patients and practitioners . This shows that notonly affected people are addressed but also health-care professionals. Nineproviders of drugs were identified in the obfuscated links that direct users totheir respective landing pages via an ad network. Furthermore, there are fourclinical trials being advertised among the top-20, three of which are deemedproblematic. Apparently, there is also recruitment for clinical trials via onlineadvertising, which may be a hint to the marketing strategies of providers ofunproven SCT to acquire customers through ostensible research.This categorization was further boiled down to a binary classification of crit-ical / noncritical hosts. Although the majority of critical actors were in factcommercial clinics, there were also entities labeled as commercial clinics thatwere not deemed critical, as seen in Figure 4.13. Additionally, some providersof health news, private clinical trials and complementary treatments qualifiedto be critical. False claims with respect to treatment efficacy, open promotionof stem cell tourism and claims of applicability of SCT for sports injuries,hair transplants and cosmetic treatments accounted for the categorization asa problematic actor.To investigate the differences between the three groups (affected, controland VPS) with respect to entries, advertisements and critical ads, the donors’ The hosts parkinsons.org.uk , parkinsons.org , michaeljfox.org , ukscf.org , hfsc.org , blood.ca all non-profit NGOs concerned with funding independent research in the fieldof medical application of stem cell and providing educational material about conditions,clinics, treatments and other health-related procedures. Carbidopa / Levodopa are two medications used to treat symptoms of PD. They do notalter the progression of the disease. This can be inferred from the creatives of the ads that specifically address practitioners .2. Findings entries were grouped by study ID for further analysis. Figure 4.14b showedthat users from the affected study groups (study groups 3, 6, 9, 12) receivedmore advertisements in proportion to the number of requests than participantsassigned to control or VPS groups) . The fraction of critical ads was surpris-ingly low in the affected groups, as seen in Figure 4.14c. Figure 4.14b raisesthe suspicion the there is some sort of fitting of the ad delivery algorithm torepeated probing through research as described in (Guha et al., 2010). Thiscan be inferred from our VPS servers in Figure 4.14b as they see a particularlyhigh number of ads. (a) Total number of ads received byparticipants of each group, y-axis is log-scaled (b)
Advertisements received byparticipants of each group as afraction of total entries, y-axisas a proportion . (c) Critical advertisements pergroup as a fraction of total adsreceived, y-axis as a proportion
Figure 4.14.:
Overview of donation statistics by group
Figure 4.15 contrasts the proportion of ads between VPS and real donors.Again, there is no clear sign of targeted advertising between groups or due toreal / VPS distinction. This is probably due to the small number of partic-ipants and the skewed distribution of contributions. Interestingly, the VPSservers that operated from the UK received the ads and the smallest share ofcritical ads.A further analysis could confirm this, as seen in Figure 4.16. Study group 3from Canada can be excluded from this examination as there were not enoughdonors. There was no significant difference between the proportion of critical Note that one entry can yield multiple advertisements. This explains the values higherthan 1. Those participants received more than 1 ad on average each time they queriedGoogle. hapter 4: EuroStemCell Data Donation 2019 / 2020 (EDD) (a) Real donors (b)
VPS donors
Figure 4.15.:
Proportion of critical ads delivered to real donors, grouped by study .2. Findings Figure 4.16.:
Proportion of critical ads received among all ads, by study group
Figure 4.17.: keywords ads among the study groups as Figure 4.16 shows and a Kruskal-Wallis Testconfirmed.Figure 4.17 shows that keywords associated with PD were not necessarilymore targeted as other. Critical advertisers seemed to concentrate on adver-tising stem cell treatments, therapies and cures in general.
Detailed inspection of exemplary SCT advertisers
The host named swissmedica.startstemcells.com was selected by Cou-turier to be a typical source of problematic advertisements . Figure 1.2proved an insightful example of its ads. It was consistently placing ads inthe course of the study and was the third-largest source of advertisements inthis studyTypical example for problematic ads were the ones hosted by swissmed-ica.startstemcells.com were composed of a certain number of keywords in al-ternating arrangements. The terms included but were not limited to It might be of interest that their ads were the only ones in our collection whose URLfeatures smileys and emoticons. Admittedly, this is eye-catching. hapter 4: EuroStemCell Data Donation 2019 / 2020 (EDD) Cure with the new technology. Proven results. Higher success rate.The latest treatment. Save & effective. No side effects. In details!Revitalization. Diagnostic. Post treatment. Stem cells treatment.Accommodation. Treatment for 60 diseases. Higher success rate. – swissmedica ad content on Septmeber 30th, 2019
The latest treatment. Proven results. No side effects. Cure withthe new technology. High success rate. In details! Post treatment.High success rate. Dementia. Diabetes 2. Diagnostic. Arthritis.Autism. Multiple sclerosis. Innovative treatment. Treatment for60 diseases. – swissmedica ad content on February 8th, 2020
Figure 4.18.:
Typical examples for swissmedica advertisement creatives • Proven results • Cure with the new technology • Higher success rate • No side effects • Treatment for 60 diseases • Higher success rate • Save & effective • Destinations: Switzerland, Slovenia, Serbia, Russia, Austria
These keywords relate to the narratives of the stem cell tourism industry. Theyusually advertise their treatments as safe, successful, advanced and approved.They furthermore offer international travel and claim applicability for a widearray of conditions.
As further discussed below, we could not determine the mechanics behind thetargeted advertising of questionable SCT. This is due to the limited number ofactively contributing participants and the nature of the data collection. Ourapproach refrained from large-scale data collection for the benefit of privacyand data security. We did not want to stimulate privacy concerns amongpotential participants or endanger them through uncertain future use of thepublished data. Furthermore, we cannot guarantee, that our research had notrepercussions on the target system because our intent was discovered.Since selection processes are at the heart of an ad exchange, the entiretyof advertisements in the online advertising ecosystem are subject to rigorousselection. From the plethora of available ads, only few make it to the biddingprocess due to quality or policy reasons. Then, they are subject to an intricate88 .3. Lessons learned and opaque auction. In the end, only those ads that have optimal valuewith respect to user personalization, bidding price, quality, inventory slot andcompeting content are delivered to a searcher. As a consequence, we can nevergrasp the entirety of ads related with a subject. We can only assess a subsetthereof, in a specific environmental configuration regarding user profile, timeand space of a request. In conclusions, there might be ads out there that arehighly significant to a research question but there is no way to guarantee thatthey are eventually being delivered to participants.The VPS services were locally sourced from Australia, Canada, the UnitedKingdom and the United States of America. Even though the providers werelocated there, we could not guarantee that the virtual servers really operated inthe respective ZIP codes. We found that some location declarations from theprovider deviated from our contractual agreements or details retrieved from athird party localization service. Nevertheless, we cannot expect server farmsto located in an average neighborhood, so the IP location probably reveals theartificial nature of a web request, anyway.Due to privacy concerns we were not tracking Google login status, cookies orfingerprints. To better understand targeting, the insights with respect to usertracking would have enabled an analysis through Google’s lenses and controlfor tracking protection measures possibly employed by users. Also, we couldhave examined whether users receive different ads and results depending onwehther they are logged in on Google.
In the course of the EuroStemCell Data Donation, the remarks of Section 3.4.1were implemented if feasible. Nevertheless there some learning experiencesthat are described in the following paragraphs. They originate from the re-view of literature, discussions with peers, the deployment of the plugin, datacollection and analysis and the interpretation thereof. Some were of concep-tional nature, while others just took time to review and fix. They are presentedso future research can built on top of them. First, learnings concerning thestudy design are listed. Then, the individual learnings pertaining to technicalaspects are assigned to the crucial phases of Krafft et al.’s Black Box AnalysisProcess (Krafft et al., 2020, forthcoming). He describes the critical steps ofa Black Box analysis and illustrates how practitioners can fail to conduct asound analysis. However, his model is only concerned with the actual execu-tion and evaluation of an analysis. Thus, the study design aspects are notincluding in these assignments.
The following thoughts were compiled after the data collection, when someshortcoming of the approach became evident. Herein, the learnings with re-spect to study design are described in chronological order pertaining to theanalysis process depicted in Section 3.4. They describe the two initial steps89 hapter 4: EuroStemCell Data Donation 2019 / 2020 (EDD) in the development process as depicted in Figure 3.3 and precede the actualanalysis, that will be covered in 4.3.2.
Pre-Study:
A pre-study on existing solutions in the field of Black Boxanalysis could have facilitated the development of the plugin. Section 3.4gives a brief overview of developments made so far. Most of the Black Boxsofware solutions are open source, though some of them operate on outdatedbrowser versions. They provide insights with respect to technologies of browserautomation (like
Selenium ) or other libraries concerned with web crawling.However, as the EDD project was forced to quickly deliver a working pluginafter the surprising announcement of the policy change, those alternativescould not be reviewed in-depth.
Target audience:
In the projects introductory workshop, the plugin, theusage scenario (including the search terms) and the typical users were mod-eled. Usage scenarios, search term formulation and search strategies werediscussed among young and tech-savvy academics. However, research sug-gestes that these properties vary by demographic and motivation (Lorigo etal., 2006; Weber & Jaimes, 2011). Investigating real usage scenarios, personalbackgrounds of potential users and Internet and technology literacy distribu-tions among them could have supported a more refined understanding of theplugin’s target audience.Supposedly, it would have been advisable to consult with representatives ofthe target user audience which were assumed to be elderly due to the natureof the diseases we covered. The opportunity to connect with them throughpatient groups associated with Eurostemcell was left untouched due to theconstrained time and the geographical distance. To learn more about popular
Figure 4.19.:
Keyword suggestions in the Google Ads campaign setup process,screenshot of the Google web interface of Google, by author search terms in a certain field, Google’s suggestions can be examined (see 4.19.90 .3. Lessons learned
They are presented in the process of setting up a new advertising campaign viaGoogle Ads. Researchers could infer popular keyword combinations or searchqueries from these suggestions, as they are probably compiled for advertisers(who aim to maximize reach or efficacy of their ads). In conclusion, theseapproaches could strongly facilitate a more customized development.
Organization:
VPS services were ordered and managed abroad from a German location and paid for with Scottish credentials; as a consequence, it was common to see services suspended by automatic fraud detection. Allocating resources and agency more closely together would probably streamline the organizational processes around payment and management.
Reach:
Distribution and promotion of the plugin and EDD's mission were only conducted via EuroStemCell's network of affiliated researchers and patient groups. We did not try to advertise our cause to other groups that may have been enthusiastic to join. After all, data donation is a recurring topic on social media (see https://twitter.com/search?q=datadonation) and a broadly discussed subject in medicine, sociology, law and, of course, business. There are various NGOs, interest groups and individuals that are engaged with medical data donations and their personal and societal implications. For example, the Hasso Plattner Institute recently introduced a Data Donation Pass (Hasso-Plattner-Institut, 2020; Schapranow et al., 2017). It might give research endeavors like this one an uplift to connect with like-minded projects and leverage their respective networks or advances in the field of societal data donations. Additionally, this offers the chance to take a participatory role in the development of data donation concepts and infrastructures.
Crowdsourcing Recruitment:
Since a crowdsourced audit approach was selected, this was the most critical step for the EDD project. Recruiting real-world participants offers the opportunity to probe a Black Box with real-world user profiles. However, the target audience we meant to address is apparently hard to mobilize. Though there are reportedly strong ties between EuroStemCell and patient groups, we failed to get affected people on board. "A number of patient recruitment events were held including three events with Parkinson's UK, two events with the Anne Rowling Clinic and a number of internal recruitment drives (using mailing lists and direct mailings) with the Australian Stem Cell Network, the University of Texas in Austin, Yale-New Haven Hospital, and the Edinburgh Parkinson's Research Interest Group. However, these events produced more one-to-one structured interview opportunities rather than translation to study recruitment. This may have been due to the demographic targeted as well as the difficulty in translating in-person engagement into digital engagement" (Anna Couturier).
Education:
Presumably, the target audience consisted mostly of senior citizens. This can be deduced from the fact that the project is directed at people suffering from Parkinson's disease, Multiple Sclerosis and Diabetes. It also reflects Couturier's experience in this field. She further assumed that these users have low technology literacy. Thus, educational material or onboarding guides regarding the plugin and the EDD might have supported the cause. Hands-on trainings or explanatory videos could have boosted adoption. However, these measures must be specifically designed to address the target group and convincingly engage them to join. Unfortunately, this was not in the scope of this thesis, as it requires a comprehensive analysis of the demands, expectations and motivations of the target group as well as an investigation of available methods and their respective efficacy.
Survey:
The survey was composed for statistical purposes. In fact, the correctness of the submitted data was never verified; we trusted users to truthfully fill out the survey and not falsify information. Also, this kind of information gathering is a balancing act between invading the privacy of sensitive groups and detailing a user's characteristics, which facilitates analysis. On another note, the survey could have been expanded by questions like "How did you learn about the study?". This would have enabled us to evaluate the success of our recruitment efforts across regions, partner institutions and communication channels. In consequence, it would have allowed us to strengthen some bonds and focus our efforts to promote the EDD in certain regions.
Time period:
It remained unclear whether the time period allocated for the study had any impact on the results. Some academics allotted as little time as a week to their study (Guha et al., 2010; Yan et al., 2009), while others processed data from longer intervals. As described in Section 2.7, some web services update on a daily basis, which implies highly volatile algorithms. However, the changes might be so marginal that for narrowed-down research questions an impact is unlikely to be visible. Nevertheless, the longer the study interval, the greater the effect of aggregated changes. This being said, for a snapshot-like investigation of a specific question (like in this case) it could suffice to reduce the time and broaden the search effort in exchange (e.g., expand queries, create sophisticated profiles, create more variety among participants). If there is no major advancement in the field of stem-cell-related research or a major shift in the web advertising ecosystem, the structure of results supposedly remains stable. Nonetheless, these are interesting effects that should definitely be accounted for.
Figure 4.20.:
Conceptualized process of a black box analysis. The numbers represent the different steps in which errors can occur, from (Krafft et al., 2020, forthcoming)
The lessons learned with respect to technical aspects are structured along Krafft et al.'s concept of Black Box analyses (Krafft et al., 2020, forthcoming). Figure 4.20 schematically displays the analysis process, especially the last four steps of the Chain of Responsibility described in Section 3.4 and Figure 3.3.
(1) "Fluid" Internet:
As seen in Appendix C, ISEs like Google and other modern Internet-based platforms are in constant flux. They dynamically adapt their websites to follow trends, update their algorithms daily and improve their services through A/B testing. This makes web crawling strenuous, as the website structure can change at any time, and it is hard to identify different instances of the same ad. Incomplete information due to real-time auctions among several other advertisements, load balancing and network routing may affect delivery (Guha et al., 2010). Thus, when relying on HTML tags, one has to closely monitor the online documents to register changes and tweak the respective software accordingly. A slight change in the website's (DOM) structure or naming conventions (element IDs) would have rendered our crawl useless, as no data would have been extracted. This can be countered by storing the whole website.
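A hedged sketch of this defensive approach: parse stored snapshots with a list of candidate selectors, so a stale selector degrades gracefully instead of silently yielding nothing. The selectors below are illustrative placeholders, not the classes Google actually used during the study.

# Sketch: resilient extraction from a stored SERP snapshot (assumes
# beautifulsoup4 is installed; selectors are illustrative placeholders).
from bs4 import BeautifulSoup

# Candidate selectors, newest first; extraction degrades gracefully
# instead of silently returning nothing when one goes stale.
AD_SELECTORS = ["div[data-text-ad]", "div.ads-ad", "li.ads-ad"]

def extract_ads(html):
    soup = BeautifulSoup(html, "html.parser")
    for selector in AD_SELECTORS:
        hits = soup.select(selector)
        if hits:
            return [h.get_text(" ", strip=True) for h in hits]
    return []  # all selectors stale; the raw snapshot still allows re-analysis

with open("serp_20200215T120000Z.html", encoding="utf-8") as fh:
    print(extract_ads(fh.read()))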
(1 A/B) Infrastructural Limitations:
By using VPS hosts that accommodate numerous virtual systems, there is the risk that an IP range will be blocked by Internet services. This happened at our US-based VPS server location in Dallas, where requests to Google's web search were consistently blocked. Some other locations required us to solve captchas to prove the truthful intentions and non-robotic nature of the user. As the servers were meant to automatically deliver baseline results, this turned out to be impracticable, as it required constant manual interaction.
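Automated baseline collection could at least detect such situations instead of silently submitting empty results. A possible sketch, assuming the marker strings below (they would have to be verified against the live block page):

# Sketch: detect a block/captcha interstitial and back off. Marker strings
# are assumptions and must be checked against the actual page.
import random
import time

BLOCK_MARKERS = ("unusual traffic", "/sorry/", "g-recaptcha")

def looks_blocked(html):
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)

def fetch_with_backoff(fetch, query, retries=3):
    for attempt in range(retries):
        html = fetch(query)  # fetch is any callable returning page HTML
        if not looks_blocked(html):
            return html
        # Exponential backoff with jitter; beyond that, the location has to
        # be flagged for manual captcha solving or replacement.
        time.sleep((2 ** attempt) * 60 + random.uniform(0, 30))
    return None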
(1 A/B) VPS Security:
Although the VPS' operating systems were regularly updated to the latest version, we received alerts of increased disk I/O requests during the study on one of the Australian VPS (see Figure 4.21a). As we did not perform recurring high-load operations on these machines, they possibly received malicious attention from the outside. The server logs showed the patterns of a distributed brute-force authentication attack over SSH on almost all of the servers (see Figure 4.21b for an example log). A server becoming the target of coordinated attacks disqualifies it as a reliable control for the study. However, due to the structure of the attacks, I assumed that we were dealing with an arbitrary, non-targeted online attack using either leaked or widely used "standard" credentials. As our servers were protected with strong passphrases, they were not shut down. As countermeasures, we could have used an SSH port different from the standard port 22, blocked all access from IPs other than those on a whitelist, entirely prohibited SSH remote logins, or allowed SSH login only via public/private RSA keys. These approaches were rejected because the problem occurred at the end of the machines' life cycle. For future studies that use a similar setup, it would be advisable to use a whitelisted VPN server to connect to the VPS, so all stakeholders have access through a protected tunnel. Other than that, enabling authentication via public keys is also effective, but it requires the stakeholders to collect their respective keys first and add them to every server.
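The attack pattern itself is easy to surface from the logs. A minimal sketch that counts failed SSH login attempts per source IP in a standard OpenSSH auth.log:

# Sketch: flag SSH brute-force patterns in an OpenSSH auth.log.
import re
from collections import Counter

# Matches the common "Failed password for [invalid user] NAME from IP" line.
FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")

def failed_logins(path):
    per_ip = Counter()
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            m = FAILED.search(line)
            if m:
                per_ip[m.group(2)] += 1
    return per_ip

for ip, n in failed_logins("auth.log").most_common(10):
    print(f"{ip}: {n} failed attempts")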
(1 A/B) Human-Computer Differences:
As discussed in (Diakopoulos, 2013a), an SRA might behave differently if queried by an automatic agent. Diakopoulos experienced this phenomenon when results of human-computer interaction did not line up with bare API requests. He argues that in order to conduct a truly reliable study, one has to closely imitate users and simulate the usage scenario as closely as possible. This has to be adapted to the respective target audience of a Crowdsourced Audit as well, since demographics might have an effect on Internet and media literacy.
Figure 4.21.:
Brute-force attack on one of the VPS in the Australia control group on 15.02.2020. (a) VPS provider alert after a spike of 1409 requests per second, screenshot by author. (b) auth.log of the attacked server, screenshot by author.
(1 B/C) Research Detection:
Internet-based service providers of Google's scale might have the capability to detect automated audits. There is no direct evidence, but some of our VPS were blocked because of increased traffic from the respective Internet node, which proves that there are at least some mechanisms to deal with suspicious traffic. Scholars already noted the possibility of this happening in (Datta et al., 2015). The blocked machines were all hosted by a single provider, so it appears that other virtual machines on the respective server had already produced too much traffic. This being said, organized computational approaches like Sock Puppet or Crowdsourced Audits that operate in very predictable patterns are easy to identify and might see countermeasures such as captchas, traffic thresholds, IP-range blocking and adapted responses. Consequently, relying on third-party hardware, especially virtual machines, can impede a research endeavor: actions by other clients of the respective host can arouse suspicion and entail punitive measures by the researched Black Box system against the whole IP range allocated to the virtual machines of a server.
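One partial mitigation is to make automated probes less predictable. A sketch of jittered scheduling and rotating request headers follows; the user-agent strings and the 4-hour base cadence are examples (see also the Timing Intervals lesson below), not a guarantee against detection.

# Sketch: de-synchronize automated probes so sock-puppet traffic is less
# trivially identifiable. Pools and intervals are illustrative.
import random
import time

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) AppleWebKit/537.36 Chrome/79.0",
]

def probe_forever(fetch, queries, base_interval_s=4 * 3600):
    while True:
        query = random.choice(queries)
        fetch(query, headers={"User-Agent": random.choice(USER_AGENTS)})
        # +/- 25% jitter around the nominal 4-hour cadence used in the EDD
        time.sleep(base_interval_s * random.uniform(0.75, 1.25))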
(1 B/C) Bot-Control:
A scraping audit like the one used in this thesis must be easily manageable. This requires centralized roll-out, administration and monitoring of the VPS as well as real-time information about every machine's performance. It took an unnecessarily long time to set up the VPS due to the multitude of providers, procedures and requirements. Although the rented VPS had equal specifications, the runtime behavior of the machines differed greatly, from optimal to unstable to unusable: some would perform flawlessly, others crashed at low loads. It would have saved a lot of time and effort to order VPS services from only one provider that operates globally and scales both in size and reach. This could have greatly reduced setup times, administrative overhead and configuration efforts. It would also have greatly simplified the logging of VPS performance.
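Even without a single global provider, a small centralized liveness check would have surfaced unstable machines early. A sketch using only the Python standard library; host names are placeholders:

# Sketch: centralized liveness check for the VPS fleet (stdlib only).
import socket
import time

VPS_HOSTS = ["vps-au-1.example.org", "vps-us-1.example.org"]  # placeholders

def check(host, port=22, timeout=5.0):
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start  # seconds until TCP handshake
    except OSError:
        return None  # unreachable or connection refused

for host in VPS_HOSTS:
    latency = check(host)
    status = f"{latency * 1000:.0f} ms" if latency is not None else "DOWN"
    print(f"{host}: {status}")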
Data analysis showed that the majority of submissions by real users occurred in between the 4-hour intervals. Thus, including the data donation in the startup process of the browsers was vital for the collection. Of course, the subject of interest (stem cell treatments) may show some topical advancement over time, but it is not as time-sensitive as, for example, news-related political data shortly before a major election (as in (Krafft et al., 2017)).
(2) Collected Data:
"Raw", i.e. unfiltered, data is superior. By pre-selecting the attributes to store, the chance to re-analyze the results is missed. Thus, an evaluation from a different perspective or with an alternative research question at a later point in time is basically impossible. Also, the snapshot of the real result page is lost. On top of that, future research is hampered by this limitation. After all, we decided to publish the collected data after the study concludes.
(3) Timeliness and Obfuscation of Ads:
Many ads were delivered over ad networks like Google's doubleclick or googleadservices. To enable performance tracking and billing, these referrer links contain uniquely identifying sequences and are often obfuscated with respect to their actual destination. Moreover, those links are only valid for a limited time. Therefore, the destinations of the links collected during the study period were not accessible for further examination at the time of the analysis. The source of an advertisement was inferred only from the data that was available on the SERP, namely the respective name of each ad as it was denoted in the crawled HTML element. In the process of creating ads, though, nothing prevents an advertiser from putting an arbitrary URL as the redirect destination; theoretically, an entity other than the promoted one may have created the ad. Consequently, neither their origin nor their destination could be retrieved. Further research should consider capturing the eventual landing pages (possibly after a user interaction like a click) as well, to allow a reliable association of ads and websites. Nevertheless, some links included clear-text destination URLs that could be extracted and scrutinized.
(3) Data Format:
The data was made available as a CSV file download. The collected data was very heterogeneous with respect to symbols (some entries even included smileys), and special characters like commas were not escaped in the first place. Thus, the delimiter (we used a semicolon, ";") has to be carefully picked to correctly structure the downloaded data. Moreover, the server download function initially changed all double quotes to single quotes, making parsing of the string data to JSON impracticable.
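The CSV lesson translates directly into code: choose the delimiter explicitly and let a proper CSV writer handle quoting and escaping, so free-text fields with commas, quotes or emoji survive the round trip. A sketch with illustrative field names:

# Sketch: explicit delimiter, quoting and escaping for heterogeneous
# free-text ad data. Field names are illustrative.
import csv
import json

rows = [{"ad_title": 'Cure now! "Guaranteed"', "meta": json.dumps({"pos": 1})}]

with open("donations.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=["ad_title", "meta"],
                            delimiter=";", quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)

with open("donations.csv", newline="", encoding="utf-8") as fh:
    for row in csv.DictReader(fh, delimiter=";"):
        # Double quotes survive the round trip, so JSON fields stay parseable.
        print(row["ad_title"], json.loads(row["meta"]))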
5. Conclusion
"I think our findings suggest that there are parts of the ad ecosystem where kinds of discrimination are beginning to emerge and there is a lack of transparency. [T]his is concerning from a societal standpoint." (Simonite, 2015, Anupam Datta, one of the developers of AdFisher)
In this thesis, I examined the socio-technical system of web advertising using the example of Google's integrated search engine. I developed a browser plugin to crowdsource data that was used to conduct a Black Box analysis of said system. I wanted to scrutinize whether a change in Google's advertising policy had any effect on problematic health-related ads. The data from our collection shows that Google's policy change did not eradicate questionable stem cell advertisements on its online platform. Thus, patients with severe diseases are still being targeted by providers of unproven stem cell treatments and other questionable actors. This poses a societal risk because a vulnerable user group is being discriminated against. The second research question cannot be fully answered: although there were no significant effects, this might be due to our small and possibly biased sample.
Besides, we learned that there are several competing actors that advertise in the realm of stem cell treatments. Those actors have distinct motivations with respect to either commercial or educational intentions. There is a constant struggle for attention between cautionary medical associations and questionable actors. The narrative of stem cell tourism as described in Section 2.6 could be confirmed, as there were multiple agents that openly advertised unapproved treatments. Discrimination can occur through an advertiser's questionable motivation, the targeting process or the targeted audience (the eventual outcome) (Speicher et al., 2018). Due to the high complexity and interdependency of the platform, we cannot determine which of the three causes ultimately led to this condition. In order to sustain the web search ecosystem, it is vital to guarantee users safe interaction with advertisers' content (K. O'Donnell & Cramer, 2015). Society, and especially advertisers and intermediaries in the online advertising ecosystem, need to consider users' perception of ads, including potential confusion as well as concerns regarding personalization and abuse.
To summarize, it should be possible to scrutinize socially relevant algorithms, as they have significant impacts on society. Because society decides which parts of a technical system to adopt, all involved parties have to assess technical components collaboratively to establish fair and safe communication processes. Either providers of SRE should enable examination, or society should strive to analyze, evaluate and correct these systems.
6. Future Work
On a last note, we found that socially relevant algorithms like the ones deployed in Google's ISE are nearly impossible to scrutinize from the outside. Any conventional small-scale study fails because unobservable variables, the timeliness of algorithms, the interdependence of actors, and the personalization and A/B testing of online services make it hard to retrieve a comparable snapshot of a system. Due to the opaque nature of these SRE, researchers are compelled to use Black Box analysis and demand "infrastructure and tools to study these systems at much larger scale" (Simonite, 2015, Roxana Geambasu). This would allow for a "widely applicable, systematic approach with a real impact" (Pedreschi et al., 2018, p. 5). This being said, academics concerned with the field propose two main approaches: along with a (possibly selectively) accessible API to test SRAs (possibly operated by a watchdog authority (Zweig, Fischer, et al., 2018)), it would be helpful to establish methods and infrastructures that allow for crowdsourced and publicly available data donations.
The first approach intends to probe SRAs or socio-technical systems via an interface that enables researchers to gather the output for a specified input. In our case, outputs are usually heavily personalized, so this would require computing input configurations based on the variables that the algorithm uses. As these remain undisclosed, this option falls short. Further research may come up with software to facilitate crowdsourced data collection and standardized Black Box frameworks to scrutinize online platforms. Regulatory efforts should encourage developers of algorithms to comply with principles of algorithm accountability and foster public scrutiny (USACM, 2017).
However, SRAs must be evaluated in the respective contexts or environments they are applied in to account for emergent effects. Thus, involving the affected social system is crucial for a sound analysis, and it has been suggested to establish trustworthy and honest Donation Brokers (Vaught & Lockhart, 2012). These could act as an intermediary between data donors and researchers. They could enable donors to determine the terms of usage with respect to time, research subject or involved parties. (However, this would undermine the notion of donations as gifts in the sense of "conscious, deliberate, uncoerced acts of giving, informed by beliefs about a need that is being addressed through the donation" (Hummel et al., 2019). Nonetheless, it has to be made clear to users that digital data donations are subject to an uncertain future use and an unknown degree of comprehensibility through emerging methods of gathering and analysis.) In turn, the broker would ensure proper use and conduct as well as fair licensing (Hummel et al., 2019). Crowdsourcing data collection in a privacy-preserving manner would enable society to take part in the process of algorithm accountability and support the scrutiny of algorithmic systems that affect them. Herein, future research could develop frameworks for transparent and reliable donation platforms where society can contribute to public scrutiny of private technical systems.
In addition, comprehensible information pertaining to data sources and algorithmic decisions can be a field of future research. Similar to the "Nutrition Label for privacy" (Kelley et al., 2009; Kelley et al., 2010), this may improve users' understanding of underlying mechanics and risks and improve technological literacy. This might increase user acceptance and reduce perceived discrimination, questionable advertisements and data privacy scandals. Finally, interdisciplinary research might yield interesting insights into how such systems can be governed. Kooiman draws a framework that helps to characterize interactions and mutual influences of interdependent systems. His model of governance could in the future be applied to STS to understand the governability of the systems and their respective interactions (Kooiman, 2008; Kooiman & Bavinck, 2013).
With respect to the EDD, there is still a lot to uncover. This thesis only provided a glimpse at the workings of the web-based SCT industry. The multivariate data that was collected provides new perspectives on the web advertising ecosystem that evolves around stem cell treatments. Future surveys could include questions like:
• Who is your go-to information source pertaining to stem cell treatments?
• Do you search for health-related information online?
• What are your concerns with respect to stem cell treatments?
• What are the first 3 terms that come to your mind when you think of stem cell treatments?
• Are you willing to try experimental therapies?
Such questions might shed light on the motivations of patients to search for health-related information online and whom they trust. It would also be interesting to learn whether some advertisers succeeded in "branding" a search term. If donors are willing to submit more data about themselves, researchers are able to deduce targeting mechanisms. They could investigate the relation between types of advertisement and medical condition, or sensitive attributes like religious beliefs, risk affinity and Internet literacy. Because the evaluation of online offers of SCT is probably highly dependent on familiarity with the Internet and the health sector in general, correlations between those factors could be a subject of future research. The majority of creatives was composed from a collection of terms that are used in alternating order and combinations.
It would be interesting to analyze these compositions in the future to search for patterns with respect to personalization. Another interesting field is to be found at the second largest host of ads in our study: the subset of data concerned with drugs can be analyzed with respect to the targeting behavior of advertisers. A peek into some of the advertisements revealed that they address patients and practitioners equally; future research could be concerned with the degree to which this targeting occurs. As Couturier proposed above, the classification of advertisement hosts is ongoing work and needs more scrutiny by medical professionals and people who are familiar with the field of SCT. The comparison of advertisement creatives and landing page content could reveal whether misleading lures are used to capture users' attention. To further explore the international targeting of providers of SCT, it would be interesting to collect more detailed user information with respect to their residence and examine the regional scope of the various advertiser categories.
Bibliography
Ammori, M. (2014). The "New" New York Times: Free Speech Lawyering in the Age of Google and Twitter: The First Amendment moves beyond the courts. Harvard Law Review, 2259–2296. https://harvardlawreview.org/2014/06/the-new-new-york-times-free-speech-lawyering-in-the-age-of-google-and-twitter/
Ananny, M. (2016). Toward an Ethics of Algorithms. Science, Technology, & Human Values, (1), 93–117. https://doi.org/10.1177/0162243915606523
Andreou, A., Venkatadri, G., Goga, O., Gummadi, K., Loiseau, P., & Mislove, A. (2018). Investigating Ad Transparency Mechanisms in Social Media: A Case Study of Facebook's Explanations. NDSS 2018 – Network and Distributed System Security Symposium. https://doi.org/10.14722/ndss.2018.23204
Andreou, A., Silva, M., Benevenuto, F., Goga, O., Loiseau, P., & Mislove, A. (2019). Measuring the Facebook Advertising Ecosystem. In A. Oprea & D. Xu (Eds.), Proceedings 2019 Network and Distributed System Security Symposium (pp. 1–15). Internet Society. https://doi.org/10.14722/ndss.2019.23280
Denkweisen und Grundbegriffe der Soziologie: Eine Einführung (Vol. 543). Campus.
Ashby, W. R. (1957). An Introduction to Cybernetics. Chapman & Hall.
Ashkan, A., Clarke, C. L. A., Agichtein, E., & Guo, Q. (2009). Classifying and Characterizing Query Intent. In M. Boughanem, C. Berrut, J. Mothe, & C. Soule-Dupuy (Eds.), Advances in Information Retrieval (pp. 578–586). Springer Berlin Heidelberg.
Backstrom, L., & Kleinberg, J. (2014). Romantic partnerships and the dispersion of social ties: A network analysis of relationship status on Facebook. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (pp. 831–841). https://dl.acm.org/ft_gateway.cfm?id=2531642
Baker, P., & Potts, A. (2013). 'Why do white people have thin lips?' Google and the perpetuation of stereotypes via auto-complete search forms. Critical Discourse Studies, (2), 187–204. https://doi.org/10.1080/17405904.2012.744320
Ballatore, A. (2015). Google chemtrails: A methodology to analyze topic representation in search engine results. First Monday, (7). https://doi.org/10.5210/fm.v20i7.5597
Balog, K., & Kenter, T. (2019). Personal Knowledge Graphs. In Y. Fang, Y. Zhang, J. Allan, K. Balog, B. Carterette, & J. Guo (Eds.), Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval – ICTIR '19 (pp. 217–220). ACM Press. https://doi.org/10.1145/3341981.3344241
Barford, P., Canadi, I., Krushevskaja, D., Ma, Q., & Muthukrishnan, S. (2014). Adscape: Harvesting and Analyzing Online Display Ads. http://arxiv.org/pdf/1407.0788v2
Battelle, J. (2005). The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture. Nicholas Brealey.
Baum, W. M. (2013). What counts as behavior? The molar multiscale view. The Behavior Analyst, (2), 283–293. https://doi.org/10.1007/bf03392315
Belkin, N. J. (1978). Information concepts for information science. Journal of Documentation, (1), 55–85.
Bennett, J. (2010). Vibrant Matter: A Political Ecology of Things. Duke University Press.
Bi, B., Shokouhi, M., Kosinski, M., & Graepel, T. (2013). Inferring the demographics of search users: Social data meets search queries. In Proceedings of the 22nd International Conference on World Wide Web.
Big Data & Society, (1), 205395171665215. https://doi.org/10.1177/2053951716652159
BIS Research. (2019). PR Newswire: Global Stem Cell Therapy Market to Reach $
Nature, (7415), 295–298. https://doi.org/10.1038/nature11421
Bracha, O., & Pasquale, F. (2008). Federal Search Commission – Access, Fairness, and Accountability in the Law of Search. Cornell Law Review, (6), 1149–1210.
Brin, S., & Page, L. (1999). The Anatomy of a Large-Scale Hypertextual Web Search Engine. http://infolab.stanford.edu/~backrub/google.html
Broder, A. (2002). A Taxonomy of Web Search. SIGIR Forum, (2), 3–10. https://doi.org/10.1145/792550.792552
Brukman, M. Y., Horling, B. C., & Zamir, O. E. (2013). Systems and methods for promoting search results based on personal information (No. 8,620,915). https://patentimages.storage.googleapis.com/fd/5e/c8/8e9f3bf69ac9fb/US8620915.pdf
Brunton, F., & Nissenbaum, H. (2011). Vernacular resistance to data collection and analysis: A political theory of obfuscation. First Monday, (5). https://firstmonday.org/ojs/index.php/fm/article/view/3493
Bucher, T. (2016). Neither Black Nor Box: Ways of Knowing Algorithms. In S. Kubitschko & A. Kaun (Eds.), Innovative Methods in Media and Communication Research (pp. 81–98). Springer International Publishing. https://doi.org/10.1007/978-3-319-40700-5
Washington Law Review, (89). https://digitalcommons.law.uw.edu/wlr/vol89/iss1/2
Clark, H. H., & Brennan, S. E. (1991). Grounding in Communication. Perspectives on Socially Shared Cognition.
Geschichten der Informatik: Visionen, Paradigmen, Leitmotive (pp. 473–497). Springer.
Dai, H. K., Zhao, L., Nie, Z., Wen, J.-R., Wang, L., & Li, Y. (2006). Detecting online commercial intention (OCI). In Proceedings of the 15th International Conference on World Wide Web (pp. 829–837). https://dl.acm.org/doi/10.1145/1135777.1135902
Datta, A., Tschantz, M. C., & Datta, A. (2015). Automated Experiments on Ad Privacy Settings. Proceedings on Privacy Enhancing Technologies, (1), 92–112. https://doi.org/10.1515/popets-2015-0007
Davies, D. (2017a). Patent 1 of 2: How Google learns to influence and control users. https://searchengineland.com/patent-1-2-google-learns-influence-control-users-272358
Davies, D. (2017b). Patent 2 of 2: How Google learns to guide purchasing decisions. https://searchengineland.com/patent-2-2-google-learns-guide-purchasing-decisions-273055
Dhar, V. (2013). Data science and prediction. Communications of the ACM, (12), 64–73. https://doi.org/10.1145/2500499
Diakopoulos, N. (2013a). Algorithmic Accountability Reporting: On the Investigation of Black Boxes. https://doi.org/10.7916/D8ZK5TW2
Diakopoulos, N. (2013b). Sex, Violence, and Autocomplete Algorithms: What words do Bing and Google censor from their suggestions. https://slate.com/technology/2013/08/words-banned-from-bing-and-googles-autocomplete-algorithms.html
Diakopoulos, N. (2015). Algorithmic Accountability. Journalistic investigation of computational power structures. Digital Journalism, (3), 398–415. https://doi.org/10.1080/21670811.2014.976411
Dickey, M. R. (2017). Algorithmic Accountability. https://techcrunch.com/2017/04/30/algorithmic-accountability/
Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, (1), 114–126. https://doi.org/10.1037/xge0000033
Dodd, H. (2017). Exclusive Q&A with Google's Gary Illyes at BrightonSEO 2017.
Donzelot, J. (1991). The mobility of society. In M. Foucault, G. Burchell, C. Gordon, & P. Miller (Eds.), The Foucault Effect.
Advertising as Communication. Routledge.
Ebeling, M. F. (2019). Patient disempowerment through the commercial access to digital health records. Health (London, England: 1997), (4), 385–400. https://doi.org/10.1177/1363459319848038
Eckersley, P. (2010). How Unique Is Your Web Browser? In M. J. Atallah & N. J. Hopper (Eds.), Privacy Enhancing Technologies (pp. 1–18). Springer Berlin Heidelberg. https://panopticlick.eff.org/static/browser-uniqueness.pdf
Edelman, B. (2011). Bias in Search Results?: Diagnosis and Response. The Indian Journal of Law and Technology, 16–32.
Edelman, B., Ostrovsky, M., & Schwarz, M. (2007). Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American Economic Review.
Joint Proceedings of the Posters and Demos Track of the 12th International Conference on Semantic Systems – SEMANTiCS2016 and the 1st International Workshop on Semantic Change & Evolving Semantics (SuCCESS'16), Leipzig, Germany, September 12-15, 2016 (M. Martin, M. Cuquet, & E. Folmer, Eds.). CEUR Workshop Proceedings. CEUR-WS.org. http://ceur-ws.org/Vol-1695/paper4.pdf
Enserink, M. (2006). Biomedicine. Selling the stem cell dream. Science (New York, N.Y.), (5784), 160–163. https://doi.org/10.1126/science.313.5784.160
Epstein, R., & Robertson, R. E. (2013). Democracy at risk: Manipulating search rankings can shift voters' preferences substantially without their awareness.
Epstein, R., & Robertson, R. E. (2015). The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections. Proceedings of the National Academy of Sciences of the United States of America, (33), E4512–E4521. https://doi.org/10.1073/pnas.1419828112
Eslami, M., Vaccaro, K., Lee, M. K., Elazari Bar On, A., Gilbert, E., & Karahalios, K. (2019). User Attitudes towards Algorithmic Opacity and Transparency in Online Reviewing Platforms. In S. Brewster, G. Fitzpatrick, A. Cox, & V. Kostakos (Eds.), Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems – CHI '19 (pp. 1–14). ACM Press. https://doi.org/10.1145/3290605.3300724
Detecting and correcting potential errors in User Behavior: G06N7/00.
users. New Media & Society, (4), 597–615. https://doi.org/10.1177/1461444815614053
Gasser, U. (2006). Regulating search engines: Taking stock and looking ahead. Yale Journal of Law and Technology, (1), 201. https://digitalcommons.law.yale.edu/cgi/viewcontent.cgi?article=1028&context=yjolt
Gauzente, C. (2010). The intention to click on sponsored ads: A study of the role of prior knowledge and of consumer profile. Journal of Retailing and Consumer Services, (6), 457–463. https://doi.org/10.1016/j.jretconser.2010.06.002
Geiger, R. S. (2014). Bots, bespoke, code and the materiality of software platforms. Information, Communication & Society.
Media Technologies: Essays on Communication, Materiality, and Society.
Glaser, T. (2009). Die Rolle der Informatik im gesellschaftlichen Diskurs: Eine Neupositionierung der Informatik. Informatik-Spektrum, (3), 223–227. https://doi.org/10.1007/s00287-009-0324-y
Goel, S., Hofman, J. M., & Sirer, M. I. (2012). Who Does What on the Web: A Large-Scale Study of Browsing Behavior. Sixth International AAAI Conference on Weblogs and Social Media.
Granka, L. (2010). The Politics of Search: A Decade Retrospective. The Information Society, (5), 364–374. https://doi.org/10.1080/01972243.2010.511560
Granka, L., Joachims, T., & Gay, G. (2004). Eye-Tracking Analysis of User Behaviour in WWW Search. Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Grimmelmann, J. (2009). The Google Dilemma. NYL Sch. L. Rev., 939.
Grimmelmann, J. (2010). Some Skepticism About Search Neutrality. The Next Digital Decade: Essays on the Future of the Internet, 435–459. https://digitalcommons.law.umaryland.edu/cgi/viewcontent.cgi?article=2421&context=fac_pubs
Grimmelmann, J. (2013). What to do about Google? Communications of the ACM, (9), 28–30. https://doi.org/10.1145/2500129
Grimmelmann, J. (2014). Speech engines. Minnesota Law Review, 868. https://scholarship.law.umn.edu/mlr/299
Grimmelmann, J. (2017). The Structure of Search Engine Law. Iowa Law Review, 3–63. https://digitalcommons.law.umaryland.edu/cgi/viewcontent.cgi?article=2416&context=fac_pubs
Grimmelmann, J. (2018). The Platform is the Message. Georgetown Law Technology Review, (Forthcoming), 18–30. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3132758
Grohol, J. M. (2018). Emotional Contagion on Facebook? More Like Bad Research Methods. https://psychcentral.com/blog/emotional-contagion-on-facebook-more-like-bad-research-methods/
Grunwald, A. (2000). Technik für die Gesellschaft von morgen: Möglichkeiten und Grenzen gesellschaftlicher Technikgestaltung. Campus.
Grunwald, A. (2002). Technikfolgenabschätzung – Eine Einführung. edition sigma.
Gubin, M., Sung, S., Bharat, K., & Dauber, K. W. (2016). Entity identification model training (No. 9,251,141). http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=2&f=G&l=50&co1=AND&d=PTXT&s1=9,251,141&OS=9,251,141&RS=9,251,141
Guha, S., Cheng, B., & Francis, P. (2010). Challenges in measuring online advertising systems. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (pp. 81–87).
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A Survey of Methods for Explaining Black Box Models. ACM Comput. Surv., (5). https://doi.org/10.1145/3236009
Gupta, R., Sun, S., Blitzer, J., Lin, D., & Gabrilovich, E. (2014). Question answering to populate knowledge base (No. 10,108,700). http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PTXT&s1=10,108,700&OS=10,108,700&RS=10,108,700
Habermas, J. (1968). Technik und Wissenschaft als "Ideologie"? Man and World, 483–523.
Halevy, A. Y., Wu, F., Whang, S. E., & Gupta, R. (2018). Identifying entity attributes (No. 9,864,795). http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=9,864,795.PN.&OS=PN/9,864,795&RS=PN/9,864,795
Hannak, A., Sapiezynski, P., Molavi Kakhki, A., Krishnamurthy, B., Lazer, D., Mislove, A., & Wilson, C. (2013). Measuring personalization of web search. In Proceedings of the 22nd International Conference on World Wide Web (pp. 527–538).
Hannak, A., Soeller, G., Lazer, D., Mislove, A., & Wilson, C. (2014). Measuring Price Discrimination and Steering on E-commerce Web Sites. In C. Williamson, A. Akella, & N. Taft (Eds.), Proceedings of the 2014 Conference on Internet Measurement Conference – IMC '14 (pp. 305–318). ACM Press. https://doi.org/10.1145/2663716.2663744
Hargittai, E., & Marwick, A. (2016). "What Can I Really Do?" Explaining the Privacy Paradox with Online Apathy. International Journal of Communication.
Hasso-Plattner-Institut. (2020). Data Donation Pass. https://we.analyzegenomes.com/apps/data-donation-pass/
Heaven, D. (2013). Not like us: Artificial minds we can't understand. New Scientist, (2929), 32–35. https://doi.org/10.1016/S0262-4079(13)61996-X
Henry, J. W. (2013). Providing Knowledge Panels with Search Results: G06F 7/30.
Journal of Medical Internet Research, (2), e7. https://doi.org/10.2196/jmir.4.2.e7
Hu, J., Zeng, H.-J., Li, H., Niu, C., & Chen, Z. (2007). Demographic prediction based on user's browsing behavior. In Proceedings of the 16th International Conference on World Wide Web.
Hummel, P., Braun, M., & Dabrock, P. (2019). Data Donations as Exercises of Sovereignty. In J. Krutzinna & L. Floridi (Eds.), The Ethics of Medical Data Donation (pp. 23–54).
Huynh, D. F., Chung, G., Zhou, C., Huang, Y., & Guanghua, L. (2014). Ranking Search Results based on Entity Measures: G06F 17/30 (2006.01).
Science, Technology, & Human Values, (1), 17–49. https://doi.org/10.1177/0162243915587360
Information Processing & Management, (3), 1251–1266. https://doi.org/10.1016/j.ipm.2007.07.015
Joachims, T., Granka, L., Pan, B., Hembrooke, H., Radlinski, F., & Gay, G. (2007). Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search. ACM Transactions on Information Systems (TOIS), (2).
Jouhki, J., Lauk, E., Penttinen, M., Sormanen, N., & Uskali, T. (2016). Facebook's Emotional Contagion Experiment as a Challenge to Research Ethics. Media and Communication, (4), 75. https://doi.org/10.17645/mac.v4i4.579
Kahneman, D., & Tversky, A. (1984). Choices, Values, and Frames. American Psychologist, (4), 341–350.
Kang, R., Dabbish, L., Fruchter, N., & Kiesler, S. (2015). "My Data Just Goes Everywhere:" User Mental Models of the Internet and Implications for Privacy and Security. In Eleventh Symposium On Usable Privacy and Security (SOUPS 2015) (pp. 39–52).
Kay, M., Matuszek, C., & Munson, S. A. (2015). Unequal representation and gender stereotypes in image search results for occupations. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems.
Kelley, P. G., Bresee, J., Cranor, L. F., & Reeder, R. W. (2009). A "Nutrition Label" for Privacy. In Proceedings of the 5th Symposium on Usable Privacy and Security (p. 4).
Kelley, P. G., Cesca, L., Bresee, J., & Cranor, L. F. (2010). Standardizing Privacy Notices: An Online Study of the Nutrition Label Approach. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1573–1582).
Kienle, A. (2003). Integration von Wissensmanagement und kollaborativem Lernen durch technisch unterstützte Kommunikationsprozesse.
Informatik und Gesellschaft: Eine sozio-technische Perspektive. De Gruyter Oldenbourg.
Kim, T., Barasz, K., & John, L. K. (2019). Why am I seeing this ad? The effect of ad transparency on ad effectiveness. Journal of Consumer Research, (5), 906–932. https://doi.org/10.1093/jcr/ucy039
Kitchin, R. (2017). Thinking critically about and researching algorithms. Information, Communication & Society, (1), 14–29. https://doi.org/10.1080/1369118X.2016.1154087
Klymenko, I. (2012). Autopoiesis. In O. Jahraus, A. Nassehi, M. Grizelj, I. Saake, C. Kirchmeier, & J. Müller (Eds.), Luhmann-Handbuch (pp. 69–71). J.B. Metzler.
Kneer, G., & Nassehi, A. (1993). Niklas Luhmanns Theorie sozialer Systeme: Eine Einführung (Vol. 1751). W. Fink.
Knuth, D. E. (1968). The Art of Computer Programming, Volume 1: Fundamental Algorithms. Addison-Wesley.
Kooiman, J. (2008). Exploring the Concept of Governability. Journal of Comparative Policy Analysis: Research and Practice, (2), 171–190. https://doi.org/10.1080/13876980802028107
Kooiman, J., & Bavinck, M. (2013). Theorizing Governability – The Interactive Governance Perspective. In M. Bavinck, R. Chuenpagdee, S. Jentoft, & J. Kooiman (Eds.), Governability of Fisheries and Aquaculture: Theory and Applications (pp. 9–30). Springer Netherlands. https://doi.org/10.1007/978-94-007-6107-0
Kowalski, R. (1979). Algorithm = logic + control. Communications of the ACM, (7), 424–436. https://doi.org/10.1145/359131.359136
Krafft, T. D., Gamer, M., & Zweig, K. A. (2017). What did you see? Personalization, regionalization and the question of the filter bubble in Google's search engine. Proceedings of ACM Conference, Washington, DC, USA, July 2017. http://arxiv.org/pdf/1812.10943v1
Krafft, T. D., Hauer, M. P., & Zweig, K. A. (2020). Why do we need bots? What prevents society from detecting biases in recommendation systems.
Kramer, A. D. I., Guillory, J. E., & Hancock, J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences of the United States of America, (24), 8788–8790. https://doi.org/10.1073/pnas.1320040111
Krishnamurthy, B., & Wills, C. E. (2006). Generating a Privacy Footprint on the Internet. Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement. ~cew/papers/imc06.pdf
Krishnamurthy, B., & Wills, C. E. (2009a). On the Leakage of Personally Identifiable Information Via Online Social Networks. In Proceedings of the 2nd ACM Workshop on Online Social Networks (pp. 7–12). ACM.
Krishnamurthy, B., & Wills, C. E. (2009b). In Proceedings of the 18th International Conference on World Wide Web.
Proceedings of the 3rd Symposium on Usable Privacy and Security (SOUPS '07) (pp. 52–63). ACM.
Facilitating Computer Supported Cooperative Work with Socio-Technical Self-Descriptions.
Lambiotte, R., & Kosinski, M. (2014). Tracking the Digital Footprints of Personality. Proceedings of the IEEE, (12), 1934–1939. https://doi.org/10.1109/JPROC.2014.2359054
Lane, N. D., Xu, Y., Lu, H., Hu, S., Choudhury, T., Campbell, A. T., & Zhao, F. (2011). Enabling large-scale human activity inference on smartphones using community similarity networks (CSN). In Proceedings of the 13th International Conference on Ubiquitous Computing (pp. 355–364).
Laperdrix, P., Bielova, N., Baudry, B., & Avoine, G. (2019). Browser Fingerprinting: A survey. http://arxiv.org/pdf/1905.01051v2
Larson, J., & Shaw, A. (2012). Message Machine: Reverse Engineering the 2012 Campaign. https://projects.propublica.org/emails/
Larson, J., Mattu, S., Kirchner, L., & Angwin, J. (2016). How we analyzed the COMPAS recidivism algorithm. ProPublica.
Latour, B. (2005). Reassembling the Social: An Introduction to Actor-Network-Theory. Oxford University Press.
Law, J., & Lien, M. E. (2013). Slippery: Field notes in empirical ontology. Social Studies of Science, (3), 363–378. https://doi.org/10.1177/0306312712456947
Lawrence, S. R. (2010). Personalization of web search results using term, category and link-based user profiles (No. 2010/0228715).
Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: Traps in big data analysis. Science (New York, N.Y.), (6176), 1203–1205.
Lécuyer, M., Ducoffe, G., Lan, F., Papancea, A., Petsios, T., Spahn, R., Chaintreau, A., & Geambasu, R. (2014). XRay: Enhancing the web's transparency with differential correlation, 49–64.
Lessig, L. (2006). Code.
Health Informatics Journal, (4), 804–814. https://doi.org/10.1177/1460458215595851
Liu, B., Sheth, A., Weinsberg, U., Chandrashekar, J., & Govindan, R. (2013). AdReveal: Improving Transparency Into Online Targeted Advertising. In Proceedings of the Twelfth ACM Workshop on Hot Topics in Networks.
Liu, V., Musen, M. A., & Chou, T. (2015). Data breaches of protected health information in the United States. JAMA, (14), 1471–1473.
Lorigo, L., Pan, B., Hembrooke, H., Joachims, T., Granka, L., & Gay, G. (2006). The influence of Task and Gender on Search and Evaluation Behavior using Google. Information Processing and Management.
Proceedings of the 2012 ACM Conference on Ubiquitous Computing (pp. 351–360).
Lu, W. L., Savenkov, D., Subramanya, A., Dalton, J., Gabrilovich, E., & Agichtein, E. (2019). Information extraction from question and answer websites: G06F 17/2705 (No. 10,452,694). http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=10,452,694.PN.&OS=PN/10,452,694&RS=PN/10,452,694
Luhmann, N. (1984). Soziale Systeme: Grundriß einer allgemeinen Theorie (Vol. 666). Suhrkamp.
Luhmann, N. (1998). Die Gesellschaft der Gesellschaft 1 (Vol. 1360). Suhrkamp.
Luhmann, N. (2000). Organisation und Entscheidung. Westdeutscher Verlag.
Lupton, D. (2012). M-health and health promotion: The digital cyborg and surveillance society. Social Theory & Health, (3), 229–244. https://doi.org/10.1057/sth.2012.6
Lysaght, T., Lipworth, W., Hendl, T., Kerridge, I., Lee, T.-L., Munsie, M., Waldby, C., & Stewart, C. (2017). The deadly business of an unregulated global stem cell industry. Journal of Medical Ethics, (11), 744–746. https://doi.org/10.1136/medethics-2016-104046
Lysaght, T., Munsie, M., Hendl, T., Tan, L., Kerridge, I., & Stewart, C. (2018). Selling stem cells with tokens of legitimacy: An analysis of websites in Japan and Australia. Cytotherapy, (5), S77–S78. https://doi.org/10.1016/j.jcyt.2018.02.218
Mackey, T. K., Cuomo, R. E., & Liang, B. A. (2015). The rise of digital direct-to-consumer advertising?: Comparison of direct-to-consumer advertising expenditure trends from publicly available data sources and global policy implications. BMC Health Services Research, (1), 236. https://doi.org/10.1186/s12913-015-0885-1
Madden, A. D. (2000). A definition of information. Aslib Proceedings, (9), 343–350.
Mager, A. (2012). Algorithmic Ideology. Information, Communication & Society.
Cell Stem Cell, (3), 267–270. https://doi.org/10.1016/j.stem.2014.08.009
Matz, S. C., Kosinski, M., Nave, G., & Stillwell, D. J. (2017). Psychological targeting as an effective approach to digital mass persuasion. Proceedings of the National Academy of Sciences of the United States of America.
IEEE, 2012, 413–427. https://doi.org/10.1109/SP.2012.47
Mayr, K. (2012). Geschlossenheit / Offenheit. In O. Jahraus, A. Nassehi, M. Grizelj, I. Saake, C. Kirchmeier, & J. Müller (Eds.), Luhmann-Handbuch (pp. 84–86). J.B. Metzler.
McCoy, T. H., & Perlis, R. H. (2018). Temporal trends and characteristics of reportable health data breaches, 2010–2017. JAMA, (12), 1282–1284.
McCreadie, M., & Rice, R. E. (1999). Trends in analyzing access to information. Part I: Cross-disciplinary conceptualizations of access. Information Processing and Management, (1), 45–76. https://doi.org/10.1016/S0306-4573(98)00037-5
MDN contributors. (2019). Anatomy of an extension. https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Anatomy_of_a_WebExtension
Meadow, C. T., & Yuan, W. (1997). Measuring the impact of information: Defining the concepts. Information Processing & Management, (6), 697–714. https://doi.org/10.1016/S0306-4573(97)00042-3
Mensch, G. (1980). Ist die technische Entwicklung ganz oder teilweise vorprogrammiert?
Proceedings of the 11th ACM Workshop on Hot Topics in Networks (2012) (pp. 79–84).
Miller, L. M. S., & Bell, R. A. (2012). Online health information seeking: The influence of age, information trustworthiness, and search challenges. Journal of Aging and Health, (3), 525–541. https://doi.org/10.1177/0898264311428167
Mistree, B. F. (2009). Gaydar: Facebook friendships expose sexual orientation. First Monday, (10). http://firstmonday.org/ojs/index.php/fm/rt/printerFriendly/2611/2302
Moz Resources. (2019). Google Algorithm Update History. https://moz.com/google-algorithm-change
Mukherjee, A., et al. (2013). What Yelp Fake Review Filter Might Be Doing? Chicago, IL.
Munsie, M., Lysaght, T., Hendl, T., Tan, H.-Y. L., Kerridge, I., & Stewart, C. (2017). Open for business: A comparative study of websites selling autologous stem cells in Australia and Japan. Regenerative Medicine. https://doi.org/10.2217/rme-2017-0070
Muthukrishnan, S. (2009). Ad Exchanges: Research Issues. In Proceedings of the 5th International Workshop on Internet and Network Economics (WINE '09) (pp. 1–12). Springer-Verlag. https://doi.org/10.1007/978-3-642-10841-9
Computing and Combinatorics (H. Q. Ngo, Ed.) (pp. 1–6). Springer Berlin Heidelberg, 2009.
Muthukrishnan, S. (2010). Advertisement Slot Configuration: 70.5/14.71 (No. 2010/0198694). https://patentimages.storage.googleapis.com/f1/20/ee/05f34af637acd3/US20100198694A1.pdf
Nagy, A., & Quaggin, S. E. (2010). Stem cell therapy for the kidney: A cautionary tale. Journal of the American Society of Nephrology: JASN, (7), 1070–1072. https://doi.org/10.1681/ASN.2010050559
Narayanan, A., & Shmatikov, V. (2008). Robust de-anonymization of large datasets (how to break anonymity of the Netflix prize dataset). University of Texas at Austin.
Nguyen, G. (2019). The 2019 search engine patents you need to know about. https://searchengineland.com/the-2019-search-engine-patents-you-need-to-know-about-326964
Novas, C. (2006). The Political Economy of Hope: Patients' Organizations, Science and Biovalue. BioSocieties, (3), 289–305. https://doi.org/10.1017/S1745855206003024
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, (6464), 447–453.
O'Donnell, K., & Cramer, H. (2015). People's Perceptions of Personalized Ads. In A. Gangemi, S. Leonardi, & A. Panconesi (Eds.), Proceedings of the 24th International Conference on World Wide Web – WWW '15 Companion (pp. 1293–1298). ACM Press. https://doi.org/10.1145/2740908.2742003
O'Donnell, L., Turner, L., & Levine, A. D. (2016). Part 6: The role of communication in better understanding unproven cellular therapies. Cytotherapy.
O'Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (First paperback edition). B/D/W/Y Broadway Books.
Otterbacher, J., Bates, J., & Clough, P. (2017). Competent Men and Warm Women. In G. Mark, S. Fussell, C. Lampe, m.c. schraefel, J. P. Hourcade, C. Appert, & D. Wigdor (Eds.), Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems – CHI '17 (pp. 6620–6631). ACM Press. https://doi.org/10.1145/3025453.3025727
Pan, B., Hembrooke, H., Joachims, T., Lorigo, L., Gay, G., & Granka, L. (2007). In Google We Trust: Users' Decisions on Rank, Position, and Relevance. Journal of Computer-Mediated Communication.
ACM Transactions on the Web (TWEB), (1), 1–47. https://doi.org/10.1145/2996466
Pasca, A. M., & van Durme, B. (2014). Inferring attributes from search queries (No. 8,812,509). https://patentimages.storage.googleapis.com/49/d2/99/98e0d54a1e7b45/US8812509.pdf
Pasca, M., & van Durme, B. (2013). Extracting semantic classes and instances from text: G06F 7/00; G06F 17/30 (No. 8,510,308). http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=5&f=G&l=50&co1=AND&d=PTXT&s1=8,510,308&OS=8,510,308&RS=8,510,308
Pasquale, F. (2008). Internet Nondiscrimination Principles: Commercial Ethics for Carriers and Search Engines. University of Chicago Legal Forum, (1). http://chicagounbound.uchicago.edu/uclf/vol2008/iss1/6
Pasquale, F. (2010). Beyond Innovation and Competition: The Need for Qualified Transparency in Internet Intermediaries. Northwestern University Law Review, (1). https://digitalcommons.law.umaryland.edu/cgi/viewcontent.cgi?article=2348&context=fac_pubs
Peddinti, R. V. M. K., & Dabbiru, L. K. (2017). Guided Purchasing via Smartphone.
Health (London, England: 1997), (4), 478–494. https://doi.org/10.1177/1363459318815944
Petersen, A., Tanner, C., & Munsie, M. (2019). Citizens' use of digital media to connect with health care: Socio-ethical and regulatory implications. Health (London, England: 1997), (4), 367–384. https://doi.org/10.1177/1363459319847505
Petersen, A., Seear, K., & Munsie, M. (2013). Therapeutic journeys: The hopeful travails of stem cell tourists. Sociology of Health & Illness, (5), 670–685. https://doi.org/10.1111/1467-9566.12092
Petersen, A., Munsie, M., Tanner, C., MacGregor, C., & Brophy, J. (2017). Stem Cell Tourism and the Political Economy of Hope.
Poole, D., & Mackworth, A. (2010). Artificial Intelligence: Foundations of Computational Agents. Cambridge University Press.
Poole, D., Mackworth, A., & Goebel, R. (1998). Computational Intelligence: A Logical Approach. Oxford University Press.
Prainsack, B. (2019). Data Donation: How to Resist the iLeviathan. In J. Krutzinna & L. Floridi (Eds.), The Ethics of Medical Data Donation.
Ropohl, G. (1983). A critique of technological determinism. In P. T. Durbin (Ed.), Philosophy and Technology (pp. 83–96). Springer Netherlands.
Ropohl, G. (2013). SCHELSKY Helmut. Der Mensch in der wissenschaftlichen Zivilisation, 1961. In C. Hubig, A. Huning, & G. Ropohl (Eds.), Nachdenken über Technik: Die Klassiker der Technikphilosophie und neuere Entwicklungen (3., neu bearbeitete und erweiterte Auflage, Darmstädter Ausgabe) (pp. 342–345). Nomos Verlagsgesellschaft mbH & Co. KG. https://doi.org/10.5771/9783845269238-342
Rose, D. E., & Levinson, D. (2004). Understanding user goals in web search. In Proceedings of the 13th International Conference on World Wide Web.
Journal of Cultural Economy , (1), 1–13. https://doi.org/10.1080/17530350.2019.1574866Ryan, K. A., Sanders, A. N., Wang, D. D., & Levine, A. D. (2010). Trackingthe rise of stem cell tourism. Regenerative medicine , (1), 27–33. https://doi.org/10.2217/rme.09.70Sandvig, C., Hamilton, K., Karahalios, K., & Langbort, C. Auditing Algo-rithms: Research Methods for Detecting Discrimination on InternetPlatforms. In: Data and Discrimination: Converting Critical Concernsinto Productive . 2014.Sap, M., Card, D., Gabriel, S., Choi, Y., & Smith, N. A. The risk of racial biasin hate speech detection. In:
Proceedings of the 57th Annual Meetingof the Association for Computational Linguistics . 2019, 1668–1678.Saurwein, Florian, & Natascha Just und Michael Latzer. (2017). Algorith-mische Selektion im Internet: Risiken und Governance automatisierterAuswahlprozesse. HIMS 2017 . CSREA Press.Schelsky, H. (1961). Der Mensch in der wissenschaftlichen Zivilisation. InArbeitsgemeinschaft f¨ur Forschung des Landes Nordrhein-Westfalen124 ibliography (Ed.),
Der Mensch in der wissenschaftlichen Zivilisation
Journal of Marketing , Media in Transition . Retrieved Jan-uary 22, 2020, from https://digitalsts.net/wp-content/uploads/2019/03/26 Knowing-Algorithms.pdfSeaver, N. (2017). Algorithms as culture: Some tactics for the ethnographyof algorithmic systems.
Big Data & Society , (2), 205395171773810.https://doi.org/10.1177/2053951717738104Semturs, C., Vandevenne, L., Sinopalnikov, D., Lyashuk, A., Steiger, S., Grimm,H., Scharli, N. M., & Lecomte, D. (2015). Computerized systems andmethods for extracting and storing information regarding entities: G06F17/00 (No. 10,198,491). Retrieved December 30, 2019, from http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f =G&l=50&s1=10,198,491.PN.&OS=PN/10,198,491&RS=PN/10,198,491Shah, S. (2019). Gain more insight into your bid strategy with top signals.Retrieved December 30, 2019, from https://support.google.com/google-ads/answer/9644171Shannon, C. E. (1948). A mathematical theory of communication.
Bell systemtechnical journal , (3), 379–423.Shaw, D. (2016). Facebook’s flawed emotion experiment: Antisocial researchon social network users. Research Ethics , (1), 29–34.Shneiderman, B., Byrd, D., & Croft, W. B. (1997). A User-Interface Frame-work for Text Seaches. D-Lib Magazine , ibliography Science translational medicine , , 1–5. https://doi.org/10.1126/scitranslmed.aag0426Skinner, B. F. (1938). The Behavior Of Organisms: An Experimental Analysis
Re-generative medicine , (4), 375–384. https://doi.org/10.2217/rme-2018-0007Snyder, J., & Turner, L. (2019). Crowdfunding for stem cell-based interven-tions to treat neurologic diseases and injuries. Neurology , (6), 252–258. https://doi.org/10.1212/WNL.0000000000007838Sommerville, I. (2016). Software Engineering (10th ed.). Pearson.Speicher, T., Ali, M., Venkatadri, G., Ribeiro, F., Arvanitakis, G., Benevenuto,F., Gummadi, K. P., Loiseau, P., & Mislove, A. Potential for Discrim-ination in Online Targeted Advertising. In:
FAT 2018 - Conferenceon Fairness, Accountability, and Transparency . ibliography statcounter. (2019a). Browser Market Share Worldwide. Retrieved January 21,2020, from https://gs.statcounter.com/browser-market-sharestatcounter. (2019b). Search Engine Market Share Worldwide. Retrieved Novem-ber 29, 2019, from https://gs.statcounter.com/search-engine-market-share/allSteel, E., & Angwin, J. (2010). On the web’s cutting edge, anonymity in nameonly. The Wall Street Journal , Computersin Human Behavior , , 992–1000. https://doi.org /10.1016/j.chb.2015.09.038Stoker, G. (1995). Governance as theory: five propositions. International SocialScience Journal , (155), 17–28.Sullivan, D. (2008). Google.com Finally Gets Google Suggest Feature. https://searchengineland.com/googlecom-finally-gets-google-suggest-feature-14626Sullivan, D. (2013). FAQ: All About The New Google “Hummingbird” Algo-rithm. Retrieved December 30, 2019, from https://searchengineland.com/google-hummingbird-172816Sullivan, D. (2016a). FAQ: All about the Google RankBrain algorithm. Re-trieved December 30, 2019, from https://searchengineland.com/faq-all-about-the-new-google-rankbrain-algorithm-234440Sullivan, D. (2016b). Google now handles at least 2 trillion searches per year.Retrieved December 30, 2019, from https : / / searchengineland . com /google-now-handles-2-999-trillion-searches-per-year-250247Sweeney, L. (2013). Discrimination in Online Ad Delivery. Retrieved Jan-uary 2, 2020, from http : / / dataprivacylab . org / projects / onlineads /1071-1.pdfTanner, A. (2018). Our Bodies, Our Data: How Companies Make BillionsSelling Our Medical Records . Beacon Press.Tanner, C., Munsie, M., Sipp, D., Turner, L., & Wheatland, C. (2019). Thepolitics of evidence in online illness narratives: An analysis of crowd-funding for purported stem cell treatments.
Health (London, England: 1997) , (4), 436–457. https://doi.org/10.1177/1363459319829194Taylor-Weiner, H., & Graff Zivin, J. (2015). Medicine’s Wild West–UnlicensedStem-Cell Clinics in the United States. The New England journal ofmedicine , ibliography User Modeling and User-Adapted Interaction , (1-2), 203–220. https://doi.org/10.1007/s11257-011-9110-zTrist, E. L., & Bamforth K.W. (1954). Some Social and Psychological Conse-quences of the Longwall Method of Coal-Getting: An examination ofthe psychological situation and defences of a work group in relationto the social structure and technological content of the work system. Human Relations , (1), 3–38. Retrieved January 2, 2020, from https://journals.sagepub.com/doi/pdf/10.1177/001872675100400101Tufekci, Z. (2014a). Algorithmic Harms beyond Facebook and Google: Emer-gent Challenges of COmputational Agency. Colorado Technology LawJournal , (13). Retrieved December 24, 2019, from https://heinonline.org/HOL/LandingPage?handle=hein.journals/jtelhtel13&div=18&id=&page=Tufekci, Z. (2014b). Engineering the public: Big data, surveillance and com-putational politics. (7). https://firstmonday .org /ojs/index.php/fm/article/view/4901Turing, A. M. (2009). Computing Machinery and Intelligence. In R. Epstein,G. Roberts, & G. Beber (Eds.), Parsing the Turing Test: Philosophicaland Methodological Issues in the Quest for the Thinking Computer (pp. 23–65). Springer Netherlands. https://doi.org/10.1007/978- 1-4020-6710-5 {/ textunderscore } BioSocieties , (3), 303–325.https://doi.org/10.1017/S1745855207005765Turner, L. (2015). US stem cell clinics, patient safety, and the FDA. Trendsin molecular medicine , (5), 271–273. https://doi.org /10.1016/j.molmed.2015.02.008Turner, L. (2017). ClinicalTrials.gov, stem cells and ’pay-to-participate’ clin-ical studies. Regenerative medicine , (6), 705–719. https://doi.org/10.2217/rme-2017-0015Turner, L. (2018). The US Direct-to-Consumer Marketplace for AutologousStem Cell Interventions. Perspectives in biology and medicine , (1),7–24. https://doi.org/10.1353/pbm.2018.0024Turner, L., & Knoepfler, P. (2016). Selling Stem Cells in the USA: Assessingthe Direct-to-Consumer Industry. Cell stem cell , Proceedings ofthe eigth SOUPS 2012 . ACM Press, 2012, 4.US Code 18 § ibliography US Code 47 § Journal of Computer-MediatedCommunication , (3), 866–887. https : / / doi . org / 10 . 1111 / j . 1083 -6101.2007.00354.xVaught, J., & Lockhart, N. C. (2012). The evolution of biobanking best prac-tices. Clinica chimica acta; international journal of clinical chemistry , (19-20), 1569–1575. https://doi.org/10.1016/j.cca.2012.04.030Veltri, G. A., & Ivchenko, A. (2017). The impact of different forms of cognitivescarcity on online privacy disclosure. Computers in Human Behavior , Journal of Law, Economics & Policy , ,883–889.von Hilgers, P. (2011). The History of the Black Box: The Clash of a Thingand its Concept. Cultural Politics: an International Journal , (1), 41–58. https://doi.org/10.2752/175174311X12861940861707Watzlawick, P., Beavin, J. H., & Jackson, D. D. (2007). Menschliche Kommu-nikation (11th ed.). Huber.Weber, I., & Jaimes, A. Who Uses Web Search for What: And How. In:
Pro-ceedings of the Fourth ACM International Conference on Web Searchand Data Mining . WSDM ’11. New York, NY, USA: Association forComputing Machinery, 2011, 15–24. isbn
Simulation and Similarity . Oxford University Press.Weiss, D. J., Turner, L., Levine, A. D., & Ikonomou, L. (2018). Medical so-cieties, patient education initiatives, public debate and marketing ofunproven stem cell interventions.
Cytotherapy , (2), 165–168. https://doi.org/10.1016/j.jcyt.2017.10.002 129 ibliography Whittaker, A., Manderson, L., & Cartwright, E. (2010). Patients without bor-ders: understanding medical travel.
Medical anthropology , (4), 336–343. https://doi.org/10.1080/01459740.2010.501318Willis, C. E., & Tatar, C. (2012). Understanding What They Do with WhatThey Know. https://digitalcommons.wpi.edu/computerscience-pubs/6World WIde Web Foundation. (2017). Algorithmic Accountability: Applyingthe concept to different country context. Retrieved December 11, 2019,from https://webfoundation.org/docs/2017/07/Algorithms ReportWF.pdfWu, X., Yan, J., Liu, N., Yan, S., Chen, Y., & Chen, Z. Probabilistic La-tent Semantic User Segmentation for Behavioral Targeted Advertising.In: Proceedings of the Third International Workshop on Data Miningand Audience Intelligence for Advertising . ADKDD ’09. New York,NY, USA: Association for Computing Machinery, 2009, 10–17. isbn :9781605586717. https://doi.org/10.1145/1592748.1592751.Wu, Y., Thakur, K. M., Hylton, J., & Weissman, D. (2013).
Searching Contentof Prominent Users in Social Networks: G06F 7/30 (US 2016/0246789A1). Retrieved December 30, 2019, from https://patentimages.storage.googleapis.com/d6/a5/30/a1a539a974bb93/US20160246789A1.pdfYan, J., Liu, N., Wang, G., Zhang, W., Jiang, Y., & Chen, Z. How muchcan behavioral targeting help online advertising? In:
Proceedings ofthe 18th international conference on World wide web . 2009, 261–270.Retrieved January 13, 2020, from https://dl.acm.org/doi/10.1145/1526709.1526745Yuan, S., Abidin, A. Z., Sloan, M., & Wang, J. (2012). Internet advertising: Aninterplay among advertisers, online publishers, ad exchanges and webusers. arXiv preprint arXiv:1206.1754 . Retrieved January 11, 2020,from https://arxiv.org/pdf/1206.1754.pdfZamir, O. E., Korn, J. L., Fikes, A. B., & Lawrence, S. R. (2010).
Person-lization of placed Content ordering in Search Engines (US 7.693,827B2). Retrieved December 30, 2019, from https://patentimages.storage.googleapis.com/9c/ce/41/25912234856199/US7693827.pdfZarzeczny, A., Tanner, C., Barfoot, J., Blackburn, C., Couturier, A., & Mun-sie, M. (2019). Contact us for more information: an analysis of publicenquiries about stem cells.
Regenerative medicine , (12), 1137–1150.https://doi.org/10.2217/rme-2019-0092Zittrain, J. (2014). Engineering an Election: Digital gerrymandering poses athreat to democracy. Harvard Law Review , ibliography Zweig, K. A., Fischer, S., & Lischka, K. (2018). Wo Maschinen irren k¨onnen:Verantwortlichkeiten und Fehlerquellen in Prozessen algorithmischerEntscheidungsfindung. https://doi.org/10.11586/2018006Zweig, K. A., Wenzelburger, G., & Krafft, T. D. (2018). On Chances and Risksof Security Related Algorithmic Decision Making Systems.
EuropeanJournal for Security Research , (2), 181–203. https : / / doi . org / 10 .1007/s41125-018-0031-2 131 . EuroStemCell Data Donation:Development A.1. My Code
The full source code of the plugin is available at https://github.com/AALAB-TUKL/EuroStemCell-data-donation.
A.2. User Story
A patient with Parkinson's Disease, Multiple Sclerosis or Diabetes perceives an information need. She wants to inform herself about the condition and the respective medical perspectives, especially in the field of stem cell-related medical applications. She decides to consult the Internet and uses a search engine to find the most relevant websites that answer her questions. Then she reviews advertisements, search results and top stories on the results page to gather information and educate herself as a basis for future decisions with respect to clinical treatments and therapies.
A.3. Product Backlog
Below, the requirements for the plugin are listed, ordered by priority and thus by order of implementation. A minimal sketch of the resulting client-server split follows the list.

1. Client-server architecture with a browser plugin dedicated to data collection and a web server concerned with storing the data
2. Based on popular browsers (Firefox / Chrome), allow cross-browser implementation
3. Capable of crawling websites
4. Enable straightforward installation
5. Register users on the server
6. Receive a unique identifier from the server and attach this to submissions
7. Submit data to the server
8. Enable an uncomplicated on-boarding process
9. Display privacy statement and obtain obligatory consent
10. Include an options page to capture demographics and participant details
11. Request demographics (age, gender, residence, impact of Parkinson's Disease, Multiple Sclerosis and Diabetes on participant, researcher status, frequency of computer or search engine usage, experience with paid stem cell therapy, next largest city)
12. Receive a study group identifier and attach this to submissions
13. Automate queries
14. Make automated queries unobtrusive to browsing
15. Enable updates of crawl specifications
16. Display recent submissions and informational content
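To make the client-server split of items 1 and 5-7 concrete, the following is a minimal sketch of how a cross-browser WebExtension background script might register with the collection server and submit crawled data. The endpoint paths, the payload shape and the storage key are assumptions for illustration only; the actual implementation is in the repository linked in A.1.

```typescript
// Minimal sketch of the data-collection client (WebExtension background script).
// Endpoint paths and payload shape are hypothetical; see A.1 for the real code.

// Firefox exposes `browser`, Chrome exposes `chrome`; a shim keeps this cross-browser.
const ext = (globalThis as any).browser ?? (globalThis as any).chrome;

const SERVER = "https://example.org/api"; // placeholder collection server

interface SubmissionPayload {
  userId: string;        // unique identifier received at registration (item 6)
  groupId: string;       // study group identifier (item 12)
  query: string;         // the search term that produced this results page
  ads: unknown[];        // crawled advertisement elements
  results: unknown[];    // organic search results
  topStories: unknown[]; // top-stories entries
  timestamp: string;
}

// Register once with the server and persist the returned identifier (items 5-6).
async function register(): Promise<string> {
  const stored = await ext.storage.local.get("userId");
  if (stored.userId) return stored.userId;
  const res = await fetch(`${SERVER}/register`, { method: "POST" });
  const { userId } = await res.json();
  await ext.storage.local.set({ userId });
  return userId;
}

// Attach the identifier and submit one crawled results page (item 7).
async function submit(payload: Omit<SubmissionPayload, "userId">): Promise<void> {
  const userId = await register();
  await fetch(`${SERVER}/submit`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ userId, ...payload }),
  });
}
```

Persisting the server-issued identifier locally keeps submissions pseudonymous while still allowing them to be grouped per participant (items 6 and 12).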
Figure A.1.: Screenshots of the on-boarding process, by author: (a) privacy statement, (b) segment of the user survey.

A.4. Participant survey
The survey presented in the registration process comprised the following questions and informational footnotes:

1. Are you or someone close to you impacted by Parkinson's Disease?
• I'm a patient.
• I'm a carer.
• No
2. Are you or someone close to you impacted by Multiple Sclerosis?
• I'm a patient.
• I'm a carer.
• No
3. Are you or someone close to you impacted by a form of Diabetes (Type I or Type II)?
• I'm a patient.
• I'm a carer.
• No
4. Are you a stem cell researcher or medical professional?
• Yes
• No
5. What is your country of residence?
• Australia
• Canada
• United Kingdom
• United States Of America
• Other
Note: At this point we are only studying the impact of Google advertising in the four English-speaking countries above. We will consider data from other countries to guide future research.
6. Your age range
• • • • • • •
7. Your gender
• Female
• Male
• Other
• Prefer not to say
8. How often do you use your computer, laptop, tablet and/or smartphone?
• Daily (More than 2 times a day)
• Daily (Less than 2 times a day)
• Weekly
• Monthly
9. How often do you use Google Search?
• Daily (More than 2 times a day)
• Daily (Less than 2 times a day)
• Weekly
• Monthly
10. Have you ever paid for or inquired about stem cell treatments?
• Yes
• No
Note: If Yes: We'd like to hear about your experience. Please contact us.
11. What is the next largest city near you?
• City: textfield
• Prefer not to say
Note: Please enter only letters. If you feel uncomfortable answering this, please choose "Prefer not to say".
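The answers above map naturally onto a small demographics record that the plugin could attach to a registration. The following type is an illustrative reconstruction under assumed field names and encodings, not the plugin's actual schema.

```typescript
// Illustrative model of the registration survey answers. Field names and
// encodings are assumptions for this sketch, not the plugin's actual schema.
type Impact = "patient" | "carer" | "no";
type Frequency = "daily_2plus" | "daily_under_2" | "weekly" | "monthly";

interface Demographics {
  parkinsons: Impact;                // question 1
  multipleSclerosis: Impact;         // question 2
  diabetes: Impact;                  // question 3 (Type I or Type II)
  researcherOrMedical: boolean;      // question 4
  countryOfResidence: "AU" | "CA" | "UK" | "US" | "other"; // question 5
  ageRange: string;                  // question 6: one of the offered brackets
  gender: "female" | "male" | "other" | "prefer_not_to_say"; // question 7
  deviceUsage: Frequency;            // question 8
  googleSearchUsage: Frequency;      // question 9
  paidOrInquired: boolean;           // question 10
  nextLargestCity: string | null;    // question 11; null = "Prefer not to say"
}

// Example record as it might be attached to a registration.
const example: Demographics = {
  parkinsons: "no",
  multipleSclerosis: "patient",
  diabetes: "no",
  researcherOrMedical: false,
  countryOfResidence: "UK",
  ageRange: "", // the concrete bracket options are not reproduced in A.4
  gender: "female",
  deviceUsage: "daily_2plus",
  googleSearchUsage: "daily_under_2",
  paidOrInquired: false,
  nextLargestCity: "Manchester",
};
```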
A.5. Query composition and crawled HTML elements
A.5.1. Query composition
The following search terms were composed at the project's kick-off meeting. They were meant to formulate popular queries with respect to the field we examined. Thus, we included keywords like stem cell and the names of the respective diseases (parkinsons disease, multiple sclerosis, diabetes; denoted by [disease] here). We also included natural-language questions, as we assumed that searchers query search engines with direct questions if they are not Internet-literate in the sense that they understand search engines' capabilities and mechanics. A sketch of how the templates expand into concrete queries follows the element list below.

• stem cells
• stem cells cost
• stem cells treatment
• stem cells cure
• stem cells therapy
• can stem cells help me?
• can stem cells cure [disease]?
• [disease] cure
• [disease] therapy
• [disease] treatment
• [disease] cells cost
• [disease] stem cells treatment
• [disease] stem cells cure
• [disease] stem cells therapy

A.5.2. Crawled HTML elements

• Ads
– Name
– Title
– URL
– Content
• Search results
– Title
– Content
– URL
– Position
• Top Stories
– Title
– Author
– URL
– Position
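As referenced above, the bracketed [disease] placeholder suggests a simple template expansion. The sketch below shows one plausible way the full query list could have been generated; the function name is hypothetical.

```typescript
// Expand the query templates from A.5.1 into concrete search terms,
// substituting each disease name for the [disease] placeholder.
const diseases = ["parkinsons disease", "multiple sclerosis", "diabetes"];

const generic = [
  "stem cells",
  "stem cells cost",
  "stem cells treatment",
  "stem cells cure",
  "stem cells therapy",
  "can stem cells help me?",
];

const templates = [
  "can stem cells cure [disease]?",
  "[disease] cure",
  "[disease] therapy",
  "[disease] treatment",
  "[disease] cells cost",
  "[disease] stem cells treatment",
  "[disease] stem cells cure",
  "[disease] stem cells therapy",
];

function buildQueries(): string[] {
  const expanded = diseases.flatMap((d) =>
    templates.map((t) => t.replace("[disease]", d))
  );
  return [...generic, ...expanded];
}

console.log(buildQueries().length); // 6 generic + 8 templates x 3 diseases = 30
```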
Figure A.2.: Detailed example of crawled elements (here: ad and organic results), screenshot by author

B. EuroStemCell Data Donation: Data Analysis and Visualizations
B.1. Downloads
Figure B.1.: Daily users and downloads of the Firefox addon as documented in the Mozilla Developer Hub statistics, screenshot by author
Figure B.2.: Daily users from 2019-10-01 to 2020-02-07
Figure B.3.: Cumulative registrations via the Chrome plugin from 2019-10-01 to 2020-02-07
B.2. Participants
Figure B.4.: Donations by group
B.3. Advertisements and Advertisers
Figure B.5.: Absolute number of advertisements per group
Figure B.6.: Fraction of prescription treatment advertisements per group

C. Functionality of a Search Engine
At first, the original mechanism of Google will be portrayed by reference to (Brin & Page, 1999), the initial paper of the Google founders, and the company blog at (Google, 2019r). Then, these insights will be enriched with observations of search engine researchers, tech observers and industry professionals. Later on, patents provide a possible outlook.

Web search engines operate in a three-step process of crawling the WWW, indexing web pages and serving results. Additionally, they generally display advertisements along with organic search results to fund their operations. Search engines developed from merely using on-page data, link analysis and other web-specific data such as anchor text (Brin & Page, 1999) to leveraging manifold sources to determine a searcher's intentions. The factors that contribute to a ranking are unknown to the public. Google only gives implicit advice on how to design websites and what they consider "high quality" content that leads to an appropriate ranking with respect to a user query (Google, 2019h, 2020b). Scholars criticize that, because of this, publishers might anxiously avoid anything that could possibly be seen as a black-hat SEO technique (Pasquale, 2008). While most search engines operate on well-known principles, the specific details of their algorithms remain undisclosed trade secrets, mainly to sustain search quality and remain competitive (L. A. Granka, 2010). The fundamental tasks of a search engine are as follows:

crawling: Google uses a web crawler (or robot / spider) that operates from many computers and collects publicly available web pages on the storeserver ("Automated software that crawls (fetches) pages from the web and indexes them." (Google, 2019r)). The algorithm behind it receives a list of URLs (Uniform Resource Locators, or Internet addresses) of prior crawls from the storeserver, along with sitemap data. Then, crawlers collect those websites, send them to the storeserver and follow links recursively. Eventually, newly created pages, changes and deletions are added to the index. The crawler does not visit blocked websites (a website will not be crawled if a file named robots.txt is located on the host; through inbound hyperlinks it might still be indexed, though), restricted areas or sites that are already known ("Pages that have already been crawled and are considered duplicates of another page, are crawled less frequently." (Google, 2019j)).

indexing: The indexer parses the pages it receives from the storeserver's repository and creates an index. All significant words and their positions on a website, key content tags and attributes are stored. Using an in-memory hash table (the lexicon), the words are transformed into wordIDs. Their occurrences on a website are recorded in a hit list that is stored in barrels sorted by document ID. Then, the content of the barrels is used to create an inverted index. Additionally, the indexer derives a database of linked documents from anchor files to assess the meaning of linked content (web pages and media) (Brin & Page, 1999). If a page is inaccessible due to a robots.txt file, authorization measures or another device, it is not indexed (Google, 2019j). According to Google, the index "contains hundreds of billions of webpages and is well over 100,000,000 gigabytes in size" (Google, 2019m).
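To make the indexing step concrete, the following toy sketch maps words to the documents and positions in which they occur, the core of an inverted index. It deliberately ignores the lexicon of wordIDs, the sorted barrels and the anchor-file database described above.

```typescript
// Toy inverted index: for every word, record the documents and positions
// where it occurs. Real systems add a lexicon of wordIDs, sorted barrels
// and link/anchor data on top of this structure.
type Posting = { docId: number; positions: number[] };

class InvertedIndex {
  private index = new Map<string, Posting[]>();

  addDocument(docId: number, text: string): void {
    const words = text.toLowerCase().split(/\W+/).filter(Boolean);
    words.forEach((word, position) => {
      let postings = this.index.get(word);
      if (!postings) this.index.set(word, (postings = []));
      let posting = postings.find((p) => p.docId === docId);
      if (!posting) postings.push((posting = { docId, positions: [] }));
      posting.positions.push(position);
    });
  }

  // Return the ids of documents containing every query word.
  lookup(query: string): number[] {
    const words = query.toLowerCase().split(/\W+/).filter(Boolean);
    const sets = words.map(
      (w) => new Set((this.index.get(w) ?? []).map((p) => p.docId))
    );
    if (sets.length === 0) return [];
    return [...sets[0]].filter((id) => sets.every((s) => s.has(id)));
  }
}
```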
Observers estimate that Google only indexes a marginal part of the WWW (grey literature puts the estimates between 0.004% and 4%, depending on the source; what they call the Deep Web consists of websites without inbound links, password-protected areas, databases that only respond to certain input, and subscription services (Rosen, 2014)). Grimmelmann distinguishes general from vertical search engines. While general search engines index the whole web, vertical ones specialize in a particular category (like news, travel or shopping). Over the years of its development, Google has added vertical search capabilities to its existing general search (Grimmelmann, 2014).
serving: Upon a user query, words from the parsed query are converted to wordIDs and searched for in the barrels. For the documents that include those words, a weighted rank is computed based on a multitude of parameters. Today, Google uses an unknown number of signals or variables to determine relevance. Linguistic cues (website content), user cues (feedback loop) and web structure (PageRank) all contribute to a final score (L. A. Granka, 2010). These signals range from context variables (location, time, current situation) to semantic information of the search query all the way to very personalized factors (search history, profiling) (Google, 2019l). In 2010, they amounted to about 200 (Google, 2010). Google itself provides assistance for Search Engine Optimization (SEO) and qualitatively describes how publishers should design their websites in order to receive an accurate ranking without penalties. From this advice, one could infer the nature of the signals contributing to the measurement, as in (Google, 2019r). The k highest-ranked results are presented to the user in descending order of relevance (Brin & Page, 1999). Today, hundreds of signals count towards the rank calculation (Sullivan, 2016a), with links, content and RankBrain being the most significant ones (Schwartz, 2016). A toy illustration of such a weighted combination follows.
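The weighting of many signals into a single rank can be pictured, in a highly simplified form, as a weighted sum. The signals and weights below are invented for illustration, since the real ones are, as noted, undisclosed.

```typescript
// Purely illustrative relevance scoring: combine per-document signal values
// with invented weights into one rank score. The real signal set is unknown.
interface Signals {
  contentMatch: number;  // linguistic cues
  pageRank: number;      // web structure
  userFeedback: number;  // feedback loop
}

const weights: Signals = { contentMatch: 0.5, pageRank: 0.3, userFeedback: 0.2 };

function score(s: Signals): number {
  return (
    s.contentMatch * weights.contentMatch +
    s.pageRank * weights.pageRank +
    s.userFeedback * weights.userFeedback
  );
}

// Serve the k highest-ranked documents in descending order of relevance.
function serve(docs: Map<number, Signals>, k: number): number[] {
  return [...docs.entries()]
    .sort(([, a], [, b]) => score(b) - score(a))
    .slice(0, k)
    .map(([id]) => id);
}
```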
Originally, Google used an algorithm called PageRank to assess the importance of a website by the web's link structure. The creators intended to compute the measure in accordance with people's subjective idea of importance. They argued that a source which receives many citations is probably credible, important or relevant. The more important those referrers are, the higher the PageRank of the respective site. Hence, they used the normalized links of other pages that direct users to a particular website to calculate the measure iteratively.
A damping factor was included to simulate a surfer who randomly jumps to a different website, in order to avoid dead ends. Then, search results were prioritized based on their respective weight. Along with the link structure, the link text was considered in assessing a page's relevance. The authors argue that it usually describes the webpage it points to more accurately than the page it is located on. On top of that, websites without text content (media, databases or other non-textual objects) can thus be crawled (Brin & Page, 1999). Brin and Page state in their initial paper how "[F]iguring out the right values for these parameters is something of a black art" (Brin & Page, 1999).
\[ PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \dots + \frac{PR(T_n)}{C(T_n)} \right) \tag{C.1} \]

The PageRank algorithm above calculates the PageRank $PR(A)$ of website $A$ by summing up the PageRanks of the websites $T_1, \dots, T_n$ pointing at it, each normalized by the respective site's total number of outgoing links $C(T_i)$. The damping factor $d$ is used to allow for personalization. The iterative algorithm computes a probability distribution over all websites, so $\sum_i PR(A_i) = 1$ (Brin & Page, 1999).
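Equation C.1 lends itself to simple fixed-point iteration. The sketch below implements the formula as printed, under which the ranks of n pages sum to n; dividing the (1 - d) term by n instead yields the probability-distribution variant mentioned above.

```typescript
// Fixed-point iteration of Equation C.1 on a toy link graph.
// graph[i] lists the pages that page i links to.
function pageRank(graph: number[][], d = 0.85, iterations = 50): number[] {
  const n = graph.length;
  let pr = new Array(n).fill(1); // initial guess
  for (let it = 0; it < iterations; it++) {
    const next = new Array(n).fill(1 - d);
    for (let page = 0; page < n; page++) {
      const outLinks = graph[page];
      if (outLinks.length === 0) continue; // dangling page contributes nothing here
      const share = pr[page] / outLinks.length; // PR(T) / C(T)
      for (const target of outLinks) next[target] += d * share;
    }
    pr = next;
  }
  return pr;
}

// Example: 0 -> 1, 1 -> 2, 2 -> 0 (a simple cycle; all ranks converge to 1).
console.log(pageRank([[1], [2], [0]]));
```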
Disclaimer: Below, some prominent features and the most recent developments are discussed, as covered by observers and Search Engine Land. Note that a patented feature is not necessarily part of the actual search algorithm. For most of the patents reviewed, there is no evidence of their implementation. However, a patent can "offer an interesting perspective on where [Google] is steering search and how it's thinking about evolution of search." (Nguyen, 2019) Also, the observations and argumentations below are documented by industry professionals outside of the Google universe. They stem from original interviews, research, experience and conferences and are published on their websites. Thus, it is not ensured that Google actually uses these technologies.

The search engine enhances queries using query expansion (Smarty, 2008). This allows it to broaden the search horizon and rely less on a user's distinct input. In 2008, Google introduced auto-suggestion, a feature that shows numerous possible text completions once users start to type their query (Sullivan, 2008). The introduction of the Knowledge Graph (Henry, 2013) indicates a paradigm shift from strings to things, as the Official Google Blog puts it (Singhal, 2012). Now, search on their platform is no longer about connecting keywords, but about finding semantically correct results.
(The feature discussions are mostly based on the personal opinions of the authors of SEO by the Sea and https://searchengineland.com/, especially Danny Sullivan. Gray literature: query expansion techniques include word stemming, acronyms, synonyms, translations, spelling corrections and the removal of stop words. Critics argue that the concept of a knowledge graph (KG) is not properly defined yet: research work dealing with KGs cites Google's blog even though it does not explain what constitutes a KG. In (Ehrlinger & Wöß, 2016), Ehrlinger and Wöß criticize the wide variety of interpretations of the concept and propose to define a KG as follows: "A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.")
This guessworkis made possible by the semantic network of entities and their attributes andrelationships mentioned above(Schachinger, 2017).Recent advances include predictive computing that tries to guess user in-tent and guide them through search and decision processes. Davies reviewstwo patents ((Peddinti & Dabbiru, 2017)(Foerster & Brewin, 2017))that sup-port that development in (Davies, 2017a) and (Davies, 2017b). The patents Examples from (Starr, 2015) include relatedness (co-occurence of entities), notable entitytype (multi-categorization of entities), contribution (content generated by an entity, likesocial media posts or published works) and prize (awards and prizes) The latter amounting to 15% of all searches.(Farber, 2013) ppendix C: Functionality of a Search Engine describe how a search engine can include various information to infer futurebehavior or intent. Davies points out, that “[B]asically, the patent is builton the idea that all data from virtually any source can be used to determineexpected actions a user is likely to take.”(Davies, 2017a) With this knowl-edge, the patented system tries to estimate future behavior and indicate to theuser if an action is jeopardizing the expected outcome(Peddinti & Dabbiru,2017). The second patent allows to inject suggestive steps into the purchas-ing process and enables highly targeted bidding on advertisements(Foerster &Brewin, 2017). Davies raises awareness to how these two inventions have mas-sive impact on search behavior, advertisement bidding, purchasing processes.Consequently, this allows nudging the user in a third party’s interest which iscritical in terms of the practices introduced in Section 3.1.Google assesses search quality with feedback from third-party services andusers (Google, 2019k)(Levy, 2010). Additionally they are constantly testingand reviewing algorithm prototypes through A/B testing (Levy, 2010). According to Davies, this includes social media, motion, purchase history, weather, net-work account data, data from third-party applications and services all sorts of commu-nication processed on the device.(Davies, 2017a) “Every time engineers want to test a tweak, they run the new algorithm on a tiny per-centage of random users, letting the rest of the site’s searchers serve as a massive controlgroup.”(Levy, 2010)“Every time engineers want to test a tweak, they run the new algorithm on a tiny per-centage of random users, letting the rest of the site’s searchers serve as a massive controlgroup.”(Levy, 2010)