Examining the Impact of Algorithm Awareness on Wikidata's Recommender System Recoin
Jesse Josua Benjamin
Human-Centered Computing, Freie Universität Berlin
[email protected]
Claudia Müller-Birn
Human-Centered Computing, Freie Universität Berlin
[email protected]
Simon Razniewski
Max Planck Institute for Informatics
[email protected]
ABSTRACT
The global infrastructure of the Web, designed as an open and transparent system, has a significant impact on our society. However, algorithmic systems of corporate entities that neglect those principles increasingly populate the Web. Typical representatives of these algorithmic systems are recommender systems, which influence our society both on the scale of global politics and during mundane shopping decisions. Recently, such recommender systems have come under critique for how they may strengthen existing or even generate new kinds of biases. To this end, designers and engineers are increasingly urged to make the functioning and purpose of recommender systems more transparent. Our research relates to the discourse of algorithm awareness, which reconsiders the role of algorithm visibility in interface design. We conducted online experiments with 105 participants recruited via MTurk for the recommender system Recoin, a gadget for Wikidata. In these experiments, we presented users with one of three different designs of Recoin's user interface, each exhibiting a varying degree of explainability and interactivity. Our findings include a positive correlation between comprehension of and trust in an algorithmic system in our interactive redesign. However, our results are not yet conclusive, and suggest that the measures of comprehension, fairness, accuracy and trust are not yet exhaustive for the empirical study of algorithm awareness. Our qualitative insights provide a first indication for further measures. Our study participants, for example, were less concerned with the details of understanding an algorithmic calculation than with who or what is judging the result of the algorithm.
KEYWORDS
Algorithm awareness, recommender system, transparency, peer production, Wikidata.
After three decades of continuous growth [42], the Web has become an integral part of our society. It is designed as an open and transparent system, but more recently algorithmic systems that neglect these principles populate the Web. Exemplary of these systems are recommender systems, which have different impacts ranging from societal discourses (e.g., the British EU referendum or the U.S. presidential election of 2016) to profane details of everyday life, such as when choosing a product or service, a place to spend the holidays, or consuming personalized entertainment products. Across the whole range of their usage, recommender systems are often interpreted as algorithmic decision support systems, which frequently prompts discussions on bias (e.g., [28]). One of these frequently raised issues is that of algorithmic bias, where specific groups of people, based on gender, ethnicity, class or ideology, are systematically discriminated against by algorithmic decisions [2]. These discussions indicate a growing discomfort with the algorithmic advances both used in and facilitated by the Web.

To this end, ongoing design discourses urge engineers to consider explaining the presence and function of algorithms to end users [24]. Lawmakers, too, are increasingly called upon to respond to issues such as algorithmic bias. Exemplary of the latter is the General Data Protection Regulation (GDPR) of the European Union, which seeks to curtail violations of data privacy and unregulated profitisation from user data. Selbst and Powles go so far as to say that the GDPR effectively guarantees a "right to explanation" of algorithmic processes to end users [35].

This right to explanation is understood as an explicit challenge to the discipline of Human-Computer Interaction (HCI), to be met with concrete means of representing and explaining algorithmic processes. In HCI, this challenge aligns with the discourse of algorithm awareness.
Hamilton and colleagues define algorithm awareness as the extent to which a user is aware of the existence and functioning of algorithms in a specific context of use [18]. However, the scope of the term algorithm awareness is not yet clearly defined, partially as a result of the lack of experimental results associated with the discourse. As a consequence, it is unresolved whether algorithm awareness is the result of unearthing new methods of interaction, novel forms of representation, finding means of explaining algorithmic processes, or all of these aspects taken together. Similarly, its methodological perspective is vaguely defined. Are algorithm-aware designs, for example, a result of a critical technical practice [36], or are they a new form of human-centered design? If algorithm awareness as a principle is to contribute to an understanding of web-based algorithmic systems today (and tomorrow), these methodological shortcomings need to be addressed.

In our research, we focus on one specific aspect of the discourse of algorithm awareness, the aspect of algorithm representation. We first discuss related work from the areas of HCI, Computer-Supported Cooperative Work (CSCW), and Science and Technology Studies (STS). We argue that algorithm awareness should be understood within the context of human-technology relations, since algorithmic systems increasingly impact how we see the world. We then introduce a use case which allows for studying different representations for algorithm awareness because of its open design. The use case is situated in the peer production system Wikidata, in which the completeness recommender Recoin is being used by editors to receive recommendations for their next edits. As opposed to commercial web-based systems, the design principles of Wikidata give us access to all necessary information available regarding Recoin's engineering and usage.
We can, thus, reflect on the various decisions made during Recoin's development and can suggest different modes of representing the algorithmic system by considering the dimensions of explainability and interactivity.

Our research makes the following contributions: We provide experimental results to be used in the continuing development of the discourse on algorithm awareness. This concerns insights on design measures, namely textual explanation of algorithmic logic and interactive affordances, respectively. Our results suggest that providing users with more interactive means for exploring the mechanism of the algorithm has significant potential, but that more research is needed in this area. As for conducting experiments in this context, we provide first methodological insights, which suggest that the measures of comprehension, fairness, accuracy and trustworthiness employed in the field are not yet exhaustive for the concerns of algorithm awareness. Our qualitative insights provide a first indication for further measures. The study participants, for example, were less concerned with the details of understanding an algorithmic calculation than with who or what is judging the result of the algorithm.

In the following section, we discuss existing research from three perspectives. First, we discuss the role of automation in peer production communities, which has led to an increased usage of algorithmic systems in this context. Second, we review existing approaches that attempt to make these algorithmic systems transparent. Third, we combine these insights to argue for the urgency of researching algorithm awareness. This theoretical section is followed by a detailed introduction to Recoin, our use case, including its technical design as well as how it connects specifically to the topic at hand. Subsequently, we showcase our experiment, in which we detail the setup, design, results and analyses involved in our experimental study. Then, we proceed to discuss our insights.
Finally, we conclude by outlining future work, in which we seek to undertake qualitative studies into how the evaluated modes of representation affect the relations between humans and algorithmic systems.

Automation in Peer Production Communities. In contrast to the predominant commercial platforms on the Web, peer production communities, such as Wikipedia, OpenStreetMap, or Linux, provide a valuable alternative for people to share their ideas, experiences, and their collaboratively created knowledge openly [4, 5]. In these communities, automation is an integral component for handling especially "mindless, boring and often reoccurring tasks" [27]. In Wikipedia, for example, various forms of algorithmic support exist: recommender systems, such as SuggestBot, help people find suitable tasks [8]; neural networks, such as those employed by ClueBots, help to revert obvious vandalism [7]; and semi-automated user interfaces, such as Snuggle, help editors to socialize more efficiently with newcomers [17]. Wikidata, as Wikipedia's sister project, profited from existing experiences in these automation efforts; thus, tools for vandalism detection were highly sophisticated from the beginning [34]. However, depending on how this automation is used, the outcome goes in both directions. The unreflected use of automation can suppress the participation of good-faith newcomers [16]; on the other hand, recommender systems on Wikipedia can significantly improve editor engagement and content creation [41]. Existing research shows how the openness of peer production systems, such as the various Wikimedia projects (Wikipedia, Wikidata, etc.), enables researchers to investigate the manifold facets of automation in a real-world setting, and simultaneously to support these projects in their goals of providing free, high-quality content.
Approaches to Algorithm Awareness. With regard to related discourses such as Fairness, Accountability and Transparency (FAT) [23] or Explainable Artificial Intelligence (XAI) [15], algorithm awareness is more aligned with the study of lay persons' experiences of algorithmic systems. As with FAT and XAI, the concerns of the discourse are illustrative of pressing socio-cultural, economic and political needs. However, and similarly to FAT and XAI, algorithm awareness so far suffers from the lack of a methodological definition. Both in terms of design and engineering, the implementation of algorithm-aware designs is challenged by two fundamental issues which can be derived from Hamilton and colleagues' definition [18]: (1) the perceivability of an algorithm (e.g., results, logic, data) and (2) an actionable mode of representation that leads to informed usage.

So far, the contexts of conducted algorithm awareness studies differ greatly. Studies have included, for example, both attempts at reverse-engineering web-based systems such as the Facebook newsfeed [11] and manipulations of online peer grading systems [21]. In the former, Eslami specifies that an algorithm-aware design should provide an actionable degree of transparency to algorithmic processes in order to promote a more informed and adaptive use of a specific system by its users [10]. In her study, Eslami operationalizes the approach of seamful design to display results of the Facebook newsfeed algorithm that usually do not get displayed. (The term seamful design was created as the opposite of seamless design, where the algorithmic system fades into the background of human perception.) In the latter, Kizilcec proposes another dimension of algorithm awareness [21]: the question of how much transparency of an algorithmic system is actually desirable to ensure understandability and usage. For his study, Kizilcec exposes participants in a peer-graded massive open online course to three kinds of transparency when confronted with their course grades. For each kind of transparency, he asked participants to rate their comprehension of the user interface and the extent to which they evaluate the provided information as fair, accurate and trustworthy. These measures provide a first set of measures to empirically study how humans understand algorithmic systems. His results suggest a medium degree of transparency (in this case, textually disclosing the result and logic) as most effective. A high degree of transparency (of the result, the underlying logic and the raw peer grading data), he finds, is in fact detrimental to trust in the algorithmic system, whether or not the received grade was lower than expected.

A particular focus in algorithm awareness, as well as in XAI and FAT, are the concrete means by which humans may become more informed about algorithmic systems. A frequently deployed solution across all these discourses is the use of textual explanation of algorithmic processes or outputs, featuring in contexts such as social media [30], aviation assistants [25], online advertising [12], classification algorithms in machine learning [33] and online peer grading as discussed above [21]. The prevalence of this solution may be interpreted as a clear indication of textual explanation being most suitable for establishing algorithm awareness. Within the aforementioned studies, various versions of textual explanations were studied comparatively. For example, even though Kizilcec questioned how much information a user may require, his various conditions of transparency all feature textual, explanatory solutions only [21]. This may be considered a gap in the discourse. Returning to Hamilton and colleagues, the complexities of contemporary algorithmic systems do not only pose the question of how much humans may need to understand, but also in what way [18].
This suggests, for example, that experimenters should also explore differences between textual explanation of algorithmic logic and interactive, non-declarative solutions in the same context.

Urgency for Algorithm Awareness. Due to the increase of automation on the Web, finding means for a better understanding of algorithms, both by experts and lay users, is particularly urgent. With algorithms, existing biases may become amplified substantially. In the discourse on recommender systems, bias has been observed as a challenge early on, and a major line of recommender systems research investigates how to avoid popularity bias, i.e., providing recommendations that are already known to satisfy a large number of users [13, 14]. More recently, several works investigate the explainability of recommender systems [19, 43]. Even open peer production systems such as Wikidata need to be seen in this context. That is, if there is a pre-existing bias in a knowledge base such as Wikidata, a recommender system may cause this bias to become self-perpetuating. Additionally, encoded bias may spread into the outputs of Wikidata APIs, thereby opaquely influencing standards in domains that rely on Wikidata services. In his overview of bias on the Web, Baeza-Yates concludes that an awareness of bias (whether algorithmic or cultural) is the primary precondition for designers and engineers to mitigate potentially negative effects on users [2]. The developer perspective as advanced by Baeza-Yates suggests that an engineering solution may be found with the potential to eliminate bias, whether by way of analyzing biased tendencies in the data used by a Web platform or by running extensive A/B tests on subgroups [22].

However, as repeatedly noted by Wiltse and Redström, the complexity of algorithmic systems on the modern Web troubles this suggestion. In their words, the Web is populated not by clear developer-client relations, but by fluid assemblages, i.e.,
socio-technical configurations that change in various contexts of use [32, 40]. Bias, therefore, is not necessarily a definitive phenomenon for either human or machine. Accordingly, counting on purely technical solutions to eliminate bias needs to be up for debate. Instead, as called for by various researchers from algorithm awareness, FAT and XAI, empirical studies are needed that provide insights into how algorithmic systems (and the biases encoded therein) may be made more transparent.

In the next section, we introduce the context of the open peer production system Wikidata, in which our use case, Recoin, a property recommender system, is used.
Wikidata is an open peer production system [39]. Its structured data is organized in entities, of which two types exist: items and properties. Items represent real-world objects (individuals) or abstract concepts (classes); for example, the entity Q1076962 represents the human Chris Hadfield. Each item is described by statements that follow a subject-predicate-object structure (e.g., Chris Hadfield (Q1076962) "is instance of" (P31) "human" (Q5)). Thus, a property, i.e., the predicate, describes the data value, i.e., the object, of a statement. As of October 2018, the community has more than 200K registered contributors, with 19K active on a monthly basis. They have created more than 570M statements on more than 50M entities.

Even though Wikidata was founded to serve as a structured data hub across all Wikimedia projects, today it is utilized for many other purposes; for example, researchers apply Wikidata as an authoritative source for interlinking external datasets, such as gene data [6] or digital preservation data [37], and companies such as Google or Apple use Wikidata's knowledge graph for improving their search results. A significant issue for Wikidata's community is consequently the quality of the data. Data quality is a classical problem in data management; however, in peer production settings such as Wikidata, data quality assessment is complicated by the continuous, incremental data insertions of its users, the distributed expertise and interests of the community, and the absence of a defined boundary in terms of its scope. Over the past years, the community has introduced many tools that address this challenge, ranging from visualizing constraint violations to de-duplication and translation tools. One of these tools is Recoin, which we present in more detail in the next section.

Recoin is a recommender system for understanding and improving the completeness of entities in Wikidata [1, 3]. A main motivation for implementing Recoin is Wikidata's openness, since it allows anyone to add nearly any kind of entity, both items and properties. The latter has led to a huge space of possible properties (4,859 properties as of July 2nd, 2018), with many applying only to a very specific context (e.g., "possessed by spirit" (P4292) or "doctoral advisor" (P184)). Consequently, even experienced editors in Wikidata may lose track of which properties are relevant and important to a given item, which might hinder them from improving data quality in Wikidata [31].

Recoin is a gadget, i.e., an interface element, on Wikidata. A visual indicator informs a person about the relative completeness of an item and, moreover, provides an editor with concrete recommendations about potentially missing properties for this item. Figure 1 shows the gadget on an item page on Wikidata. The visual indicator (icon on the top right) shows a color-coded progress bar with five levels, ranging from empty to complete. At the top of an item page, the recommendations are provided in an expandable list that shows up to ten of the most relevant missing properties.

Figure 1: Recoin for the astronaut Chris Hadfield.

The idea of relative completeness is motivated by the fact that, in absolute terms, measuring the completeness of an item is impossible in an open setting. Relative completeness, thus, considers completeness in relation to other, similar items. The relatedness function of Recoin considers two items as similar if they share the same class. The visual indicator of Recoin should not be understood as an absolute statement, i.e., level 5 (= complete) does not mean that all possible statements are given on the item page; it should rather be interpreted as a comparative measure, i.e., the statements on this item are more complete than on similar items.

The completeness levels in Recoin are based on thresholds that are manually determined. Recoin considers the average frequency of the 5 most frequent properties among the related items, classifying an item as most complete (0%-5% average frequency), quite complete (5%-10% average frequency), and so on. Furthermore, each user is shown at most 10 recommendations in order to avoid an overwhelming user experience.
As of September 25, 2018, Recoin is enabled by 220 editors on Wikidata, who have created, based on Recoin's recommendations, 7,062 statements on 4,750 items. Even though Recoin is a straightforward approach for improving data quality on Wikidata, editors hesitate to apply Recoin. Moreover, after persons used Recoin, they raised a number of concerns. Based on existing discussions on Recoin's community page and on the mailing list, we identified three typical issues.

Editors, for example, posed questions regarding the scope of the recommender: "Not sure if Recoin looks at qualifiers & their absence; if not, might this be something to think about?". The information provided by Recoin hindered editors from understanding which data was being used to compute the recommendations. In another case, an editor was wondering about Recoin's underlying algorithm: "Something weird going on on Jan Jonker (Q47500903), stays on least complete.". In this case, the unchanging visual indicator of Recoin caused the user to question the functionality of Recoin. Another user was concerned about the provided recommendations and their suitability for specific items: "How is Property:P1853 "Blood type" on this list, is that relevant (or even desirable) information for most people?". (An exception to the class-based comparison are items that are an instance of the class human; in this case, the class "occupation" is used.) The user was not able to include their personal preferences, i.e., world view, in Recoin's recommendations.

However, the third typical issue exemplified by Wikidata's editors raises a more genuine concern over the impact of Recoin on an already biased knowledge base (e.g., the predominance of the English language [20]). One editor stated: "This tool has its attractions but it does entrench cultural dominance even further as it stamps "quality" on items. The items with the most statements are the ones that are most likely to get good grades.
Items from Indonesia, let alone countries like Cameroon or the Gambia, are easily substandard."

On a surface reading, this quote further substantiates the misunderstood nature of Recoin's function, as it is not intended as a unilateral, absolute grading of the completeness of a particular item, but rather as a comparative tool whose recommendations depend on the activities of editors on similar items. However, and much more significantly, the concern raised about cultural dominance is a very contemporary problem in algorithmic system design. Recoin fails to address this concern in its current design and mediated function. In other words, the cultural bias in the recommended properties, even if not intended, seems to affect the usage of Recoin.

Based on these insights, we wanted to better understand how a re-design of Recoin that considers algorithm awareness by focusing on explainability and interactivity can address the aforementioned issues. As opposed to existing research in this context, for example carried out by Eslami et al. [11], we do not require methods such as reverse engineering to understand the algorithmic system we are dealing with on a technical level. This knowledge is key to understanding the intricacies of Web platforms today, as the ways in which an algorithm operates within a larger socio-technical context arguably also shape the extent to which humans can or should be aware of it. Therefore, with an openly available recommender system in an open peer production system, we can conduct experiments that are closely tied to the actual practice of Wikidata editing activities, i.e., we can reflect on the technical and the social system alike.

In the following section, we introduce our experimental setup, which helps us to examine the impact of varying degrees of explainability and interactivity of the recommender system's UI on humans. Following the concept of Recoin, our experiment featured a task of data completion.
By measuring the interactions of participants with various designs during a data completion task and by eliciting self-reports, we sought to understand which design measures increased task efficiency and were at the same time most effective in increasing understanding of the algorithmic system.
Informed by previous research, we designed two alternative UIs, each representing another degree of explainability and interactivity. Each user interface extends or replaces the previous version with specific design elements. In the following, we differentiate the original Recoin design (R1), the textual explanation design (RX), and the interactive redesign (RIX). The RX design is mainly inspired by previous work, from which we adapted the explanation design [21]. The RIX design follows an interactive approach, where the user can interact with the outcome of the algorithm and, thus, can explore how the algorithm's outcome changes based on specific settings. (The community discussions on Recoin mentioned above can be found in Wikidata's mailing list archive: https://lists.wikimedia.org/pipermail/wikidata/2017-December/011576.html.)

Figure 2: Recoin with Explanation (RX).

The three UI designs are used in five experimental conditions: C1 to C4, supplemented by a baseline where participants only used the regular Wikidata interface. We explain these conditions in more detail in one of the following paragraphs. Based on these designs, participants had to solve the same task across all conditions: adding data to a Wikidata item. We recruited 105 participants via Amazon Mechanical Turk (MTurk), whereby each participant had a minimum task approval rate of 95% and a minimum of 1,000 completed HITs. Each participant received USD 3.50 (equivalent to a USD 14.00 hourly wage) for full participation. We recruited only U.S. participants to reduce cultural confounds. We randomly and evenly distributed participants over our five conditions (i.e., 21 participants each), also ensuring that no participant could re-do the task or join another condition, by associating participants with qualifications. Each participant was given 10 minutes for task completion.

In each condition, participants went through the same general procedure during task completion. At first, we provided a brief on-boarding; then we provided a task briefing.
After the study participant carried out the task, she had to fill out an explicative self-report which contained the dimensions comprehension, fairness, accuracy and trust. Additionally, all participants obtained a task completion score, which we correlated with server activity to ensure that our final study corpus featured no invalid contributions. All data and results of our study will be made available under an open license (omitted for review).

In the following, we outline the design decisions that have led to our three designs for Recoin in more detail. Then, we describe the task and the experimental design.
Figure 3: Recoin Interactive Redesign (RIX).

In the following, we describe each UI approach of the recommender Recoin in more detail. For each user interface, we provide a corresponding visual representation.
The original design of Recoin (cp. Figure 1) was primarily informed by existing UI design practices in Wikipedia. The status indicator icon was chosen to mirror the article quality levels on Wikipedia, such as "Good article" or "Featured article". The progress bar was motivated by existing visualizations in Wikipedia projects. Some parameters representing the results of the Recoin recommender were determined without further consideration, such as the thresholds that represent the five levels of completeness.
Textual explanation of algorithmic logic is a widespread measure in the related work, and has been deployed in contexts such as social media [30], aviation assistants [25], online advertising [12] and classification algorithms in machine learning [33]. For our design of RX, we drew inspiration from Kizilcec, who tested three states of transparency to understand algorithm awareness in online peer grading [21]. Since that algorithm's function can be compared to Recoin's (i.e., a rating algorithm), we adapted the format of Kizilcec's best solution and added a textual explanation to Recoin's user interface that describes the logic behind Recoin's calculation (cp. Figure 2).
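Such an explanation can be generated directly from the class statistics the recommender already computes. The helper below is a hypothetical illustration, not the actual RX interface text; the function name, wording, and naive pluralization are our assumptions. It renders the kind of per-property statement discussed for the redesigns, e.g. for "time in space" among astronauts.

```python
def explain_relevance(prop_label, n_with, n_class, class_label):
    """Render a property's frequency within a class as a readable explanation."""
    pct = 100 * n_with / n_class  # share of similar items carrying the property
    return (f"{n_with} out of {n_class} {class_label}s ({pct:.2f}%) "
            f"have the property '{prop_label}'.")
```

Called as `explain_relevance("time in space", 549, 819, "astronaut")`, this yields the 67.03% relevance figure used as the running example in this paper.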
Our interactive user interface (RIX, cp. Figure 3) is based on insights gained from feedback from Recoin's current users (cp. Section 3.2) and from the philosophy of technology as discussed by Verbeek [38]. Concerning the latter, we posit that Recoin actively transforms the relationship an editor has with Wikidata and the entities therein. Through Recoin, Wikidata items that formerly were objects containing knowledge are now also objects that are rated. (For more information on article quality levels, we refer to https://en.wikipedia.org/wiki/Wikipedia:Good_articles and https://en.wikipedia.org/wiki/Wikipedia:Featured_articles.) Technically, this rating is not an indication of absolute qualities, but one of community-driven standards, i.e., how the Wikidata community currently views a specific class of items.

However, as illustrated in the various responses to Recoin, this mediation is not adequately communicated by Recoin's current design. Furthermore, in reflecting on Recoin with the original developers, we found that the comparative parameter of dividing the relevancy of the top five properties was arbitrarily chosen. In line with Mager [26], we consider transparency about this result of developer decision-making as essential.

Figure 4: Briefing page, with added material and resources for manually carrying out Recoin's functions, and tutorial.

We operationalized these insights for RIX by considering how the community-driven aspect of Recoin could not only be displayed, but made interactively explorable. To this end, we (1) included a reference to the class of the displayed entity (e.g., "astronaut" in our running example) in the drop-down title. This was designed to convey that this particular item is rated based on its class. Next, we augmented the drop-down itself extensively.
We (2) substituted the relevance percentage with a numerical explanation for each suggested property (e.g., a relevance for the property "time in space" of 67.03% means that 549 out of 819 astronauts have this property). In contrast to a percentage, it was our intuition that relating to the class would highlight the community-driven aspect of Recoin. To strengthen this aspect further, we (3) included a range slider which allows filtering properties based on their prominence in the class (i.e., comparing this entity based on the properties' occurrence in a minimum/maximum of n astronauts). Finally, we offered a way of directly interacting with Recoin's calculation: we (4) allowed our participants to reconfigure the relevancy comparison by (de-)selecting individual properties. Thereby, we wished to show that relevancy can be a dynamic, community-driven attribute in this algorithmic system.

For the study participants, we defined a typical editing task on Wikidata. We presented each study participant with a copy of Wikidata's user interface to provide a most realistic task setting. First, the participants received a brief on-boarding for Wikidata and, depending on the condition, for Recoin as well. Participants then proceeded to the task briefing page (cp. Figure 4). The participants were asked to add further properties and data to a Wikidata item. Additionally, we supplied participants with a short video tutorial that explained how properties can be added to an item on Wikidata. In each condition, the Wikidata item to be edited was
Chris Hadfield, a Canadian astronaut. This item was chosen because, on the one hand, it has a number of missing statements that are easily retrievable and, on the other hand, it describes an astronaut who is probably well known by our U.S.-based study participants. Additionally, the occupation of astronaut was thought to be relatively neutral, as opposed to, for example, politicians or soccer players. We provided study participants with source material for the task composed of comparatively relevant and irrelevant pieces of information about Hadfield. We also supplied a link to a very detailed Wikidata item with the same occupation, the US-American astronaut
Buzz Aldrin, and a link to a Wikidata query for the occupation "astronaut", both with the intention of allowing study participants to compare the given item with other items, i.e., we encouraged our participants to perform the functionality of Recoin manually. In addition, we provided a short video tutorial on how to add statements to Wikidata items. Following the task briefing, participants could choose to commence the task, which led to the reconstructed Wikidata page for Hadfield. Within a 10-minute limit, participants could then add statements to the item. We randomly assigned each participant to one of the conditions. Once the 10 minutes had passed, participants were alerted that time was up and that they should proceed to the self-report. Here, participants were confronted with a grade (from A–F) for their task. This grade was calculated from the difference in completeness before and after participants added information to the Wikidata item (e.g., when a participant's additions increased the relative completeness of
Hadfield by more than 20% but less than 30%, they received a "B"). In correspondence with this grade, participants were asked to rate their comprehension (5-point Likert scale) and feelings of accuracy, fairness and trust (7-point Likert scale) regarding the recommender system. Again, due to substantial methodological and contextual similarities to Kizilcec's online study [21], we adopted the aforementioned measures for our study. Participants were also asked to expand on their ratings using free-text fields. Upon submitting their ratings, participants were returned to the MTurk platform.
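As a rough illustration of the two computations described above — the per-property relevance displayed by Recoin (549 of 819 astronauts yields roughly 67.03%) and the letter grade derived from the increase in relative completeness — the following sketch shows one plausible implementation. Only the "B" band (more than 20% but less than 30%) is stated in the text; the other band boundaries are assumptions:

```python
def property_relevance(n_with_property, n_class_members):
    """Recoin-style relevance of a property: the percentage of class
    members (e.g. astronauts) that carry it. 549 of 819 astronauts
    having 'time in space' yields roughly 67.03%."""
    return 100 * n_with_property / n_class_members

def grade_from_increase(delta_pct):
    """Map an increase in relative completeness (in percent) to a letter
    grade. The paper only specifies that 20% < delta < 30% yields a "B";
    the remaining band boundaries here are assumed for illustration."""
    bands = [(30, "A"), (20, "B"), (10, "C"), (0, "D")]
    for threshold, grade in bands:
        if delta_pct > threshold:
            return grade
    return "F"
```

With these assumed bands, an increase of 25% maps to a "B", matching the example in the text.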
We conducted a between-subject study with five conditions. In the following, we define each condition and explain each measure we collected during the study. Please check http://tinyurl.com/ycnh3q37.

Relevance: Difference of the completeness value of the item before and after task completion.
Usage: Number of times the recommender Recoin was used during task completion.
Comprehension: To what extent do you understand how your task has been graded? (1) No understanding at all to (5) Excellent understanding.
Fairness: How fair or unfair is your grade? (1) Definitely unfair to (7) Definitely fair.
Accuracy: How inaccurate or accurate is the grade? (1) Very inaccurate to (7) Very accurate.
Trust: How much do you trust or distrust Wikidata to fairly grade your task? (1) Definitely distrust to (7) Definitely trust.

Table 1: Overview of measures employed in our online experiment.
The first three conditions (baseline, C1, C2) were designed to test usage and understanding of the current version of Recoin, i.e., R1. We then proceeded to test the collected baseline against a textual explanation (C3 with RX) as found in related work [15, 21, 29] and a redesign motivated by the shortcomings found therein (C4 with RIX). By comparing the results of the conditions, we aimed to gather insights on how design impacted human understanding of Recoin's function. All conditions are described in more detail next, followed by a description of the collected measures.

• Baseline: Participants can add data to a Wikidata item without Recoin being present in the user interface.
• Condition 1: Participants can add data to a Wikidata item with Recoin (R1) being present in the user interface.
• Condition 2: As C1, but Recoin is mentioned during the on-boarding process.
• Condition 3: As C2, but with Recoin's explanation interface (RX).
• Condition 4: As C2, but with Recoin's interactive interface (RIX).
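The random assignment to the five between-subject conditions can be pictured as a balanced, shuffled schedule (21 participants per condition, matching the study's 105 participants). The paper does not state its exact balancing mechanism, so this is only an assumed sketch:

```python
import random

CONDITIONS = ["baseline", "C1", "C2", "C3", "C4"]

def balanced_schedule(per_condition=21, seed=None):
    """Return a shuffled assignment list in which every condition
    appears exactly `per_condition` times (5 * 21 = 105 slots)."""
    schedule = CONDITIONS * per_condition
    random.Random(seed).shuffle(schedule)
    return schedule
```

Each arriving participant would simply receive the next entry of the schedule, guaranteeing equal cell sizes.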
Relevance: As the improvement of data quality is the primary goal of Recoin, we wanted to ensure that we understood how each condition affected the change in completeness, independent of the quantity of contributions. Thus, we defined the metric relevance as our dependent variable. Relevance is defined as the difference of the completeness values reported by Recoin before and after a participant added properties to the item.
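The relevance metric can be read as a simple difference of completeness scores, where completeness is the share of class-relevant properties an item carries. This is a minimal sketch with hypothetical property sets, not Recoin's actual weighting:

```python
def completeness(item_props, class_props):
    """Share of the properties deemed relevant for the item's class
    (e.g. 'astronaut') that the item actually has."""
    relevant = set(class_props)
    return len(relevant & set(item_props)) / len(relevant)

def relevance(props_before, props_after, class_props):
    """Dependent variable of the study: completeness after the task
    minus completeness before it."""
    return (completeness(props_after, class_props)
            - completeness(props_before, class_props))
```

For example, if an item starts with one of four class-relevant properties and ends with three, relevance is 0.75 − 0.25 = 0.5.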
Recoin Usage: As Recoin is a recommender system, it is particularly important to understand how each condition (aside from the baseline) affected the number of times Recoin was used directly to add information to an item. This is expressed by the measure usage, which serves as a dependent variable.
Time: We fixed the time in which participants could add properties to an item to ensure that our conditions are comparable. The measure time serves as our control variable.
Demographics: All study participants were recruited via MTurk. While we assume that the majority of participants are US-American, we did not further specify our demographics. Thus, as is typical, demographics were our covariates.
Figure 5: Questionnaire after adding properties that lead to an increase in relevance of 25%.

Condition | Edits | Recoin usage | SD
Baseline  | 249   | –            | –
C1        | 319   | 61           | 3.10
C2        | 382   | 91           | 8.56
C3        | 301   | 55           | 3.50
C4        | 281   | 71           | 4.25

Table 2: Number of edits, i.e., contributions in each condition, with the number of Recoin usages and standard deviation.
Upon completion of the task, participants were directed to a self-report page (cp. Figure 5). The page prominently featured a grading of the participant's task performance, which was calculated by normalizing the average comparative relevance, i.e., Recoin's assessment, of each contribution per participant. We graded the participants' performance in their task (A–F) in order to elicit a reaction to the task even if participants did not notice, use or understand Recoin. We purposefully designed this grading to encourage study participants to reflect on the task; for example, a participant may receive the grade F despite many additions to the item. Furthermore, we included ratings with the four factors from previous research on algorithm awareness [21]: comprehension (5-point Likert scale), accuracy, fairness and trustworthiness (7-point Likert scale) of the algorithmic system. These measures should be strongly correlated according to the procedural justice theory in the related work. Low ratings on all measures, for example, would stem from violated expectations of an outcome [21]. In addition, we asked all participants to expand on their ratings via text fields, to collect qualitative data as well.
For our hypotheses, we were interested in testing the impact of Recoin on data completeness. Having provided participants with equal opportunities to add relevant data to the item Hadfield, we examined whether Recoin improves the completeness of an item or not. Based on our analysis of the status quo (cp. Section 3.2), we did not expect study participants to actively use Recoin, which led to the following hypothesis: H1: Using Recoin does not lead to significantly higher relevance in terms of data completeness. Based on the discussed literature on algorithm awareness, we assume that a user interface that conveys explainability and interactivity of the underlying recommender system leads to higher usage rates: H2: The interface design of Recoin impacts the number of times participants used Recoin. Furthermore, we assumed that the effectiveness of algorithm-aware designs would be captured most succinctly by the comprehension measure, which would accordingly allow us to distinguish the impact of the RX and RIX designs. Given the results of textual explanation employed in related work (cp. Section 2), we therefore hypothesized: H3: A textual explanation of the algorithmic logic leads to higher comprehension than the interactive redesign. Finally, to gain insights on methodological procedure, we sought to test the experimental self-report measures employed by Kizilcec [21]. According to this research, the self-report measures should exhibit a high degree of correlation (Cronbach's α = 0.83). We therefore hypothesized: H4: The correlation of self-report measures for textual explanation solutions will equally hold for testing the interactive solution.
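H4 references Kizilcec's reported internal consistency of the self-report measures (Cronbach's α = 0.83). For reference, α for a participants-by-items rating matrix can be computed with the standard formula below; this is a generic sketch, not code from the study:

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of per-participant rating rows
    (one column per questionnaire item):
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(ratings[0])
    item_vars = sum(variance([row[i] for row in ratings]) for i in range(k))
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - item_vars / total_var)
```

Perfectly consistent items (every participant rates all items identically relative to each other) yield α = 1; weakly related items drive α toward 0.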
We recruited 21 participants for each condition (n = 105). Participants contributed edits in all conditions, with the C2 condition providing the most. In the C4 condition, our interactive redesign, participants used Recoin most frequently, with more than half (61.09%) of participants adding data via the Recoin interface at least once. This condition also included the most relevant contributions, with a median increase in completeness for the Hadfield item of 21%. The median values of task performance, i.e., received grade and average increase in completeness, as well as the ordinal Likert scales from the participant self-report, i.e., comprehension, fairness, accuracy and trust, can be seen in Table 3. We expected only a small amount of qualitative data. However, we found that displaying a grade in the self-report provided a highly effective trigger. Overall, 82 of our 105 participants chose to expand on their self-reported ratings via the provided text fields. This allowed us to probe participant statements for insights on specific subjective perspectives.

Condition | Grade | Rel. | Comp. | Fair. | Acc. | Trust
Baseline  | C     | 11   | 2     | 4     | 4    | 4
C1        | C     | 15   |       |       |      |
C2        | C     | 19   |       |       |      |
C4        | B     | 21   | 3     | 6     | 6    | 5

Table 3: Median values for (1) task performance: Grades dependent on the increase of Relevance; (2) self-report: Comprehension (1–5), Fairness, Accuracy and Trust (1–7).
Figure 6: Boxplot of the mean increase of completeness per condition (Baseline, C1–C4).
In the following, we show the results of our analysis for each hypothesis, using the Kruskal-Wallis test for ordinal data and ANOVA for numerical data. We report the results of the algorithm awareness measures with Spearman correlation tests. Finally, we provide findings of our qualitative analysis of participant statements.

H1: Using Recoin does not lead to significantly higher relevance in terms of data completeness. We reject this hypothesis. An increase of comparative relevance for the Hadfield item is highly dependent on using Recoin at least once (p_rel,rec < .; p_rel,numUse = 0.02). This shows that Recoin is highly efficient, as adding a majority of the ten recommended properties should lead to the highest increase in relevance.

H2: The interface design of Recoin impacts Recoin usage. Even though the redesign (C4) slightly outperformed the other conditions in terms of the goal of the set task, we could not find any significant difference in the number of additions made via Recoin between C1, C2, C3, or C4 (p = .).

H3: A textual explanation of the algorithmic logic leads to higher comprehension than the interactive redesign. We could not find statistically significant differences between ratings of comprehension in conditions C3 (RX) and C4 (RIX) (p_comp,con = 0.98). Therefore, this hypothesis is not confirmed.

H4: The correlation of self-report measures for textual explanation solutions will equally hold for testing the interactive solution. We had to reject this hypothesis as well. Reacting to the large variance
Table 4: Cronbach's α for questionnaire measures across all conditions (Baseline, C1–C4).

Factor | Comp. | Fair. | Acc. | Trust
Comp.  | –     | 0.19  | 0.15 | ∗
Fair.  |       | –     | ∗    | ∗
Acc.   | ∗     |       | –    | 0.16
Trust  | 0.33  | ∗     | ∗    | –

Table 5: Spearman correlation coefficients and p-values for C2–C4 for the self-report measures (Comp. = Comprehension, Fair. = Fairness, Acc. = Accuracy, Trust), with ∗ for p < . .

       | Acc.  | Trust | Comp.  | Fair.
Acc.   | –     | −0.22 | 0.02   | 0.04
Trust  | −0.22 | –     | 0.57∗  | 0.38
Comp.  | 0.02  | 0.57∗ | –      | 0.61∗∗
Fair.  | 0.04  | 0.38  | 0.61∗∗ | –
Figure 7: Spearman correlation coefficient matrix for C4, with ∗ for p < . and ∗∗ for p < . .

in C4 (cp. Figure 6), we tested the validity of the questionnaire measures. As opposed to previous research [21], the self-reported measures are not correlated; instead, they differ significantly (cp. Table 4). This especially concerns C4, our redesign (RIX), where variance was very high (Cronbach's α = .). Trust and fairness were correlated across all conditions (r_t,f = 0.65, p < 0.01), as well as among those wherein Recoin was introduced during onboarding (C2, C3, C4) (r_t,f = 0.60, p < 0.01) (cf. Table 5). The predominant relationship of trust and fairness was reaffirmed for C2 and C3, the original design and the additional textual explanation respectively, as the strongest and most significant relationship (C2: r_t,f = 0.73, p < 0.01; C3: r_t,f = 0.63, p < 0.01). A further relationship of fairness and accuracy was found for C2 and C3, as a medium positive relationship (C2: r_f,a = 0.52, p = 0.01; C3: r_f,a = 0.53, p = 0.01). When participants used the interactive design of Recoin (RIX in C4), two different relationships emerged (cp. Figure 7). We found that the relationship between comprehension and fairness was strongest (r_c,f = 0.61, p < 0.01), closely followed by the relationship between comprehension and trust (r_c,t = 0.57, p = 0.01). Surprisingly, the strong relationships found across all conditions were not present for the interactive design (cp. Figure 7). Due to the high variance encountered in C4, and the unexpected lack of correlation between our self-report measures, we also expanded our analysis to participant statements. Accordingly, we sampled participant statements in order to probe specific subjective viewpoints. In this section, we showcase some preliminary insights.
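The Spearman coefficients reported above (e.g., r_c,f = 0.61 for C4) are Pearson correlations computed on ranks. The following pure-Python sketch, with average ranks for ties, is our own illustration of the statistic, not the study's analysis code:

```python
def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over any run of tied values
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

In practice, library implementations such as `scipy.stats.spearmanr` also return the associated p-values reported in the tables.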
Base Trust in Open Knowledge Platforms.
A recurring theme, when participants chose to expand on their rating of trust, was a certain base trust in open knowledge platforms. This occurred even when no explanation element was provided, and also when participants received a poor grade in the condition that did not feature Recoin (Baseline): "Considering there was not a good definition of how we would be judged, it is tough to know if the judging was actually fair or unfair. However, I tend to trust Wikipedia so Wikidata is probably trustworthy."
(Baseline-P18; graded D) This base trust was also extended to the algorithm specifically, as long as it abides by platform standards: "I assume that an algorithm is used to grade the task, in which case I assume that it's free of bias, which is why I do trust Wikidata a good deal when it comes to fairness. Provided, the algorithm itself works as it's supposed to."
(C2-P15; Trust rating)

High task efficiency may not indicate algorithm awareness. The qualitative data also suggests that task efficiency in terms of the algorithm does not necessarily indicate algorithm awareness. On the contrary, the only participant who offered a fundamentally accurate account of Recoin received the second-lowest grade possible: "My only theory is that it's graded based on the relevance of entries made in regards to his occupation (astronaut) while most of my entries concerned his family, his awards and etc, rather than his activity as an astronaut."
(C2-P15; graded D) The commentary of a well-performing participant (graded B) furthermore suggests that there may be a difference between understanding the algorithmic logic and understanding one's integration into the algorithmic system: "It seems odd that I would be the one putting in the data and it is grading me considering why couldn't it just put the data in itself if it is accurate enough to grade." (C1-P17; Accuracy rating)

Finally, and in a similar fashion, a participant formulated the key question they had about the algorithmic system as follows: "I understand that the relevance is graded, I'm not sure exactly how relevance is judged." (C2-P2; Comprehension rating)

In summary, the unexpectedly high variance in the C4 condition, combined with the difference in correlative relationships across conditions, as well as our qualitative data, allows us to gather relevant insights for further research. In the next section, we will discuss limitations of our experiment, before concluding with the contributions as well as the implications for future work.

First, we found no significant differences between the conditions in terms of average increases in completeness. However, this also suggests that the solution of textual explanation found in related work is not an inherently clear choice for algorithm awareness. This indicates that the design decisions for algorithm awareness are still methodologically unrefined. Additionally, we sought to understand if our alternative to textual explanation, one taking an interactive and non-declarative approach, could be measured according to the existing self-report measures suggested by previous research [21]. We found that the measures "Comprehension", "Fairness", "Accuracy" and "Trust" were not equally distributed across our experimental conditions. On the contrary, divergent correlative relationships emerged. The status quo design (R1) as well as the textual explanation design (RX) featured the same strong relationships of trust and fairness as well as fairness and accuracy. In contrast, our redesign (RIX) did not exhibit these relationships, but rather suggested that comprehension was most influential. This was shown in the medium to strong correlations between comprehension and fairness as well as comprehension and trust. We therefore posit that expanding on these self-report measures for algorithm awareness is another, distinct area requiring further research. Moreover, the qualitative data we gathered also included insightful statements made by our participants.
The phenomenon of base trust that we encountered in participant statements is relevant for future algorithm awareness studies. If verified, it needs to be taken into account in cases where researchers may wish to abstract from platforms to look at specific problems. In a broader context, experiments on transparency in algorithmic systems, especially in recommender systems, are frequently undertaken in order to minimize or even eliminate bias. However, as also found by Ekstrand and Tian in experimenting with various recommendation algorithms [9], a complete solution to the problem of bias is improbable. That is to say: bias is inevitable, and is a result of humans and technology interacting. This position is echoed in the work of the philosopher of technology Verbeek, who argues that technology fundamentally mediates human relations to a particular "world", i.e., groups of other humans, values, practices etc. [38]. Biases, especially those we are commonly not aware of, play an instrumental role here. The solution, then, may not be finding the best measure for the elimination of bias, but rather finding the most actionable measure for making bias transparent. Our experimental results align with this assertion insofar as participants had issues with understanding the algorithmic system not on the basis of whether or not something is correctly calculated, but rather who or what has the agency to judge the result (e.g., the platform itself, the algorithm as a contained unit, peer review etc.). This, along with a lack of significant differences between conditions, indicates that our intuition to design for an interactive mediation of the community-driven basis of Recoin was useful. Therefore, we posit that promoting algorithm awareness by interactivity is a promising research area.

Our study has a number of limitations that should be considered. As opposed to other work (e.g. [15]), our research focuses on non-technical experts.
Furthermore, by recruiting our study participants over MTurk, it can easily be asserted that the demographics of the platform predispose the experiment to cultural bias. Additionally, online experiments in general are limited in two ways. On the one hand, observation of the subtleties of human-technology relations is not possible, such as the non-linguistic ways in which interaction expresses itself and decision-making occurs. On the other, by using MTurk we did not study Wikidata editors, but novices who might never before have come into contact with Wikidata. This means that, while we certainly could infer insights on algorithm awareness and human-technology relations, studying the lived practice of Wikidata editors may reveal other or even contradictory results.
Our research was motivated by a wish to deepen our understanding of existing design parameters for algorithm awareness. We used the recommender system Recoin, employed in the online peer production system Wikidata, as a use case for our online experiments. In five different conditions, we provided the study participants with varying degrees of explanation and interactivity while using the recommender system. We were able to gather experimental data on the effect of various algorithm-aware design measures, and to reflect on the validity of measures used in related work. However, our experiments alone are not yet exhaustive enough for us to reason more substantially about what human awareness means when algorithms are involved. Partly, this is due to the lack of longitudinal, qualitative data gathered from extensive and sustained use of Recoin. The participants of our experiments were predominantly unaware of Wikidata, and the task itself was both brief and controlled in terms of the knowledge that was provided. Wikidata lives and breathes through enthusiasts and domain experts who contribute extensively in their areas of interest. Thus, in future work, we seek to conduct studies that complement these results by probing individual and subjective use over time. This will allow us to understand more deeply, for example, how algorithm-aware designs impact the relation between Wikidata editors and the platform. From such studies, we plan to expand our framework to other use cases. Thereby, we hope to contribute to the urgent need to understand how increasingly ubiquitous algorithmic systems shape everyday life for and from the Web.
REFERENCES
[1] Albin Ahmeti, Simon Razniewski, and Axel Polleres. 2017. Assessing the completeness of entities in knowledge bases. In ESWC. 7–11.
[2] Ricardo Baeza-Yates. 2018. Bias on the Web. Commun. ACM.
[3] In Wiki Workshop @ The Web Conference. 1787–1792.
[4] Yochai Benkler. 2002. Coase's Penguin, or Linux and the Nature of the Firm. The Yale Law Journal (2002), 369–446.
[5] Yochai Benkler and Helen Nissenbaum. 2006. Commons-based Peer Production and Virtue. The Journal of Political Philosophy 14, 4 (2006), 394–419.
[6] Sebastian Burgstaller-Muehlbacher, Andra Waagmeester, Elvira Mitraka, Julia Turner, Tim Putman, Justin Leong, Chinmay Naik, Paul Pavlidis, Lynn Schriml, Benjamin M Good, et al. 2016. Wikidata as a semantic framework for the Gene Wiki initiative. Database.
[8] In International Conference on Intelligent User Interfaces. 32–41.
[9] Michael D. Ekstrand, Mucun Tian, Ion Madrazo Azpiazu, Jennifer D. Ekstrand, Oghenemaro Anuyah, David McNeill, and Maria Soledad Pera. 2018. All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness. 172–186.
[10] Motahhare Eslami. 2017. Understanding and Designing Around Users' Interaction with Hidden Algorithms in Sociotechnical Systems. In Conference on Computer Supported Cooperative Work and Social Computing (CSCW). 57–60.
[11] Motahhare Eslami, Amirhossein Aleyasen, Karrie Karahalios, Kevin Hamilton, and Christian Sandvig. 2015. FeedVis: A Path for Exploring News Feed Curation Algorithms. 65–68.
[12] Motahhare Eslami, Sneha R. Krishna Kumaran, Christian Sandvig, and Karrie Karahalios. 2018. Communicating Algorithmic Process in Online Behavioral Advertising. In Conference on Human Factors in Computing Systems (CHI).
[13] Daniel Fleder and Kartik Hosanagar. 2009. Blockbuster culture's next rise or fall: The impact of recommender systems on sales diversity. Management Science.
[14] In ACM Conference on Electronic Commerce. 192–199.
[15] David Gunning. 2016. Explainable Artificial Intelligence. Defense Advanced Research Projects Agency (DARPA).
[16] Aaron Halfaker, R Stuart Geiger, Jonathan T Morgan, and John Riedl. 2012. The Rise and Decline of an Open Collaboration System: How Wikipedia's Reaction to Popularity Is Causing Its Decline. American Behavioral Scientist (2012).
[17] Aaron Halfaker, R Stuart Geiger, and Loren G Terveen. 2014. Snuggle: Designing for efficient socialization and ideological critique. 311–320.
[18] Kevin Hamilton, Karrie Karahalios, Christian Sandvig, and Motahhare Eslami. 2014. A Path to Understanding the Effects of Algorithm Awareness. In Extended Abstracts on Human Factors in Computing Systems (CHI). 631–642.
[19] Xiangnan He, Tao Chen, Min-Yen Kan, and Xiao Chen. 2015. TriRank: Review-aware explainable recommendation by modeling aspects. In CIKM. 1661–1670.
[20] Lucie-Aimée Kaffee, Alessandro Piscopo, Pavlos Vougiouklis, Elena Simperl, Leslie Carr, and Lydia Pintscher. 2017. A Glimpse into Babel: An Analysis of Multilinguality in Wikidata. 1–5.
[21] René F. Kizilcec. 2016. How Much Information?: Effects of Transparency on Trust in an Algorithmic Interface. In Conference on Human Factors in Computing Systems (CHI). 2390–2395.
[22] Ron Kohavi, Roger Longbotham, Dan Sommerfield, and Randal M. Henne. 2009. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery 18, 1 (2009), 140–181.
[23] Till Kohli, Renata Barreto, and Joshua A. Kroll. 2018. Translation Tutorial: A Shared Lexicon for Research and Practice in Human-Centered Software Systems. 1–7.
[24] Cliff Kuang. 2017. The Next Great Design Challenge: Make AI Comprehensible To Humans.
[25] Lyons, Kolina S. Koltai, Nhut T. Ho, Walter B. Johnson, David E. Smith, and R. Jay Shively. 2016. Engineering Trust in Complex Automated Systems. Ergonomics in Design (2016), 13–17.
[26] Astrid Mager. 2018. Internet governance as joint effort: (Re)ordering search engines at the intersection of global and local cultures. New Media & Society (2018).
[27] Claudia Müller-Birn, Leonhard Dobusch, and James D. Herbsleb. 2013. Work-to-Rule: The Emergence of Algorithmic Governance in Wikipedia. In C&T.
[28] Cathy O'Neil. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown/Archetype.
[29] Richard Phillips, Kyu Hyun Chang, and Sorelle A. Friedler. 2018. Interpretable Active Learning. 49–61.
[30] Emilee Rader, Kelley Cotter, and Janghee Cho. 2018. Explanations As Mechanisms for Supporting Algorithmic Transparency. In Conference on Human Factors in Computing Systems (CHI). 1–13.
[31] Simon Razniewski, Vevake Balaraman, and Werner Nutt. 2017. Doctoral advisor or medical condition: Towards entity-specific rankings of knowledge base properties. In International Conference on Advanced Data Mining and Applications. 526–540.
[32] Johan Redström and Heather Wiltse. 2015. Press Play: Acts of defining (in) fluid assemblages. Nordes 1, 6 (2015).
[33] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. (2016). http://arxiv.org/abs/1602.04938
[34] Amir Sarabadani, Aaron Halfaker, and Dario Taraborelli. 2017. Building Automated Vandalism Detection Tools for Wikidata. In International Conference on World Wide Web (WWW).
[35] Andrew Selbst and Julia Powles. 2018. "Meaningful Information" and the Right to Explanation. Vol. 81.
[36] Phoebe Sengers, Kirsten Boehner, Shay David, and Joseph 'Jofish' Kaye. 2005. Reflective Design. 49–58.
[37] Katherine Thornton, Euan Cochrane, Thomas Ledoux, Bertrand Caron, and Carl Wilson. 2017. Modeling the Domain of Digital Preservation in Wikidata. iPres (2017).
[38] Peter-Paul Verbeek. 2006. What Things Do: Philosophical Reflections on Technology, Agency, and Design. Penn State Press, University Park.
[39] Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: A Free Collaborative Knowledgebase. Commun. ACM (2014).
[40] Heather Wiltse, Erik Stolterman, and Johan Redström. 2015. Wicked Interactions: (On the Necessity of) Reframing the 'Computer' in Philosophy and Design. Techné: Research in Philosophy and Technology 19, 1 (2015), 26–49.
[41] Ellery Wulczyn, Robert West, Leila Zia, and Jure Leskovec. 2016. Growing Wikipedia across languages via recommendation. In Proceedings of the 25th International Conference on World Wide Web. 975–985.
[42] Cosmas Zavazava, Rati Skhirtladze, Fredrik Eriksson, Esperanza Magpantay, Lourdes Montenegro, Daniel Pokorna, Martin Schaaper, Ivan Vallejo, and David Souter. 2017. Measuring the Information Society Report. Technical Report 1. International Telecommunication Union. 170 pages.
[43] Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, and Shaoping Ma. 2014. Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In SIGIR. 83–92.