Auditing E-Commerce Platforms for Algorithmically Curated Vaccine Misinformation
To cite: Prerna Juneja and Tanushree Mitra. 2021. Auditing E-Commerce Platforms for Algorithmically Curated Vaccine Misinformation. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI '21). Association for Computing Machinery. DOI: https://doi.org/10.1145/3411764.3445250

Auditing E-Commerce Platforms for Algorithmically Curated Vaccine Misinformation
Prerna Juneja
The Information School, University of Washington, Seattle, WA
[email protected]
Tanushree Mitra
The Information School, University of Washington, Seattle
[email protected]
ABSTRACT
There is a growing concern that e-commerce platforms are amplifying vaccine misinformation. To investigate, we conduct two sets of algorithmic audits for vaccine misinformation on the search and recommendation algorithms of Amazon, the world's leading e-retailer. First, we systematically audit search results belonging to vaccine-related search queries without logging into the platform (unpersonalized audits). We find that 10.47% of search results promote misinformative health products. We also observe a ranking bias, with Amazon ranking misinformative search results higher than debunking search results. Next, we analyze the effects of personalization due to account history, where history is built progressively by performing various real-world user actions, such as clicking a product. We find evidence of a filter-bubble effect in Amazon's recommendations; accounts performing actions on misinformative products are presented with more misinformation compared to accounts performing actions on neutral and debunking products. Interestingly, once a user clicks on a misinformative product, homepage recommendations become more contaminated compared to when the user shows an intention to buy that product.
CCS CONCEPTS
• Information systems → Personalization; Content ranking; Web crawling; • Human-centered computing → Human computer interaction (HCI).

KEYWORDS
search engines, health misinformation, vaccine misinformation, algorithmic bias, personalization, algorithmic audits, search results, recommendations, e-commerce platforms
ACM Reference Format:
Prerna Juneja and Tanushree Mitra. 2021. Auditing E-Commerce Platforms for Algorithmically Curated Vaccine Misinformation. In CHI Conference on Human Factors in Computing Systems (CHI '21), May 8–13, 2021, Yokohama, Japan. ACM, New York, NY, USA, 27 pages. https://doi.org/10.1145/3411764.3445250
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CHI '21, May 8–13, 2021, Yokohama, Japan
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-8096-6/21/05...$15.00
https://doi.org/10.1145/3411764.3445250
INTRODUCTION

The recent onset of the coronavirus pandemic has unleashed a barrage of online health misinformation [4, 22] and renewed focus on the anti-vaccine movement, with anti-vax social media accounts witnessing a 19% increase in their follower base [49]. As scientists work towards creating a vaccine for the disease, health experts worry that vaccine hesitancy could make it difficult to achieve herd immunity against the new virus [3]. Battling health misinformation, especially anti-vaccine misinformation, has never been more important.

Statistics show that people increasingly rely on the internet [53], and specifically online search engines [8], for health information, including information about medical treatments, immunizations, vaccinations and vaccine-related side effects [6, 23]. Yet, the algorithms powering search engines are not traditionally designed to take into account the credibility and trustworthiness of such information. With search platforms being the primary gateway and reportedly the most trusted source [19], persistent vaccine misinformation on them can cause serious health ramifications [38]. Thus, there has been a growing interest in empirically investigating search engine results for health misinformation. While multiple studies have performed audits on commercial search engines to investigate problematic behaviour [35, 36, 56], e-commerce platforms have received little to no attention ([11, 59] are two exceptions), despite critics calling e-commerce platforms, like Amazon, a "dystopian" store for hosting anti-vaccine books [17]. Amazon specifically has faced criticism from several technology critics for not regulating health-related products on its platform [5, 55]. Consider the most recent instance. Several medically unverified products for coronavirus treatment, like prayer healing, herbal treatments and antiviral vitamin supplements, proliferated on Amazon [18, 28], so much so that the company had to remove 1 million fake products after several instances of such treatments were reported by the media [22]. The scale of the problematic content suggests that Amazon could be a great enabler of misinformation, especially health misinformation. It not only hosts problematic health-related content, but its recommendation algorithms drive engagement by pushing potentially dubious health products to users of the system [27, 59]. Thus, in this paper we investigate Amazon, the world's leading e-retailer, for the most critical form of health misinformation: vaccine misinformation.

What is the amount of misinformation present in Amazon's search results and recommendations? How does personalization due to user history, built progressively by performing real-world user actions such as clicking or browsing certain products, impact the amount of misinformation returned in subsequent search results and recommendations? In this paper, we delve into these questions. We conduct 2 sets of systematic audit experiments:
Unpersonalized audit and Personalized audit. In the Unpersonalized audit, we adopt Information Retrieval metrics from prior work [42] to determine the amount of health misinformation users are exposed to when searching for vaccine-related queries. In particular, we examine search results of 48 search queries belonging to 10 popular vaccine-related topics like 'hpv vaccine', 'immunization', 'MMR vaccine and autism', etc. We collect search results without logging in to Amazon to eliminate the influence of personalization. To gain in-depth insights into the platform's searching and sorting algorithm, our Unpersonalized audits ran for 15 consecutive days, sorting the search results across 5 different Amazon filters each day: "featured", "price low to high", "price high to low", "average customer review" and "newest arrivals". The first audit resulted in 36,000 search results and 16,815 product page recommendations, which we later annotated for their stance on health misinformation: promoting, neutral or debunking.

In our second set of audits, the Personalized audit, we determine the impact of personalization due to user history on the amount of health misinformation returned in search results, recommendations and auto-complete suggestions. User history is built progressively over 7 days by performing several real-world actions, such as "search", "search + click", "search + click + add to cart", "search + click + mark top-rated all positive review as helpful", "follow contributor" and "search on third party website" (Google.com in our case). We collect several Amazon components in our Personalized audit, like homepages, product pages, pre-purchase pages, search results, etc. Our audits reveal that Amazon hosts a plethora of health misinformative products belonging to several categories, including Books, Kindle eBooks, Amazon Fashion (e.g. apparel, t-shirts, etc.) and Health & Personal Care items (e.g. dietary supplements). We also establish the presence of a filter-bubble effect in Amazon's recommendations, where recommendations of misinformative health products contain more health misinformation.

Below we present our formal research questions, key findings, contributions and implications of this study, along with the ethical considerations taken for conducting platform audits.
In our first set of audits, we ask,
RQ1 [Unpersonalized audit]: What is the amount of health misinformation returned in various Amazon components, given that components are not affected by user personalization?
RQ1a: How much are Amazon's search results contaminated with misinformation?
RQ1b: How much are recommendations contaminated with misinformation? Is there a filter-bubble effect in recommendations?

We find a higher percentage of products promoting health misinformation (10.47%) compared to products that debunk misinformation (8.99%) in the unpersonalized search results. We discover that Amazon returns a high number of misinformative search results when users sort their searches by the filter "featured" and a high number of debunking results when they sort results by the filter "newest arrivals". We also find Amazon ranking misinformative results higher than debunking results, especially when results are sorted by the filters "average customer reviews" and "price low to high". Overall, search results of the topics "vaccination", "andrew wakefield" and "hpv vaccine" contain the highest misinformation bias when sorted by the default filter "featured". Our analysis of product page recommendations suggests that recommendations of products promoting health misinformation contain more health misinformation when compared to recommendations of neutral and debunking products.
RQ2 [Personalized audit]: What is the effect of personalization due to user history on the amount of health misinformation returned in various Amazon components, where user history is built progressively by performing certain actions?
RQ2a: How are search results affected by various user actions?
RQ2b: How are recommendations affected by various user actions? Is there a filter-bubble effect in the recommendations?
RQ2c: How are the auto-complete suggestions affected by various user actions?

Our Personalized audit reveals that search results sorted by the filters "average customer review", "price low to high" and "newest arrivals", along with auto-complete suggestions, are not personalized. Additionally, we find that user actions involving clicking a search product lead to personalized homepages. We find evidence of a filter-bubble effect in various recommendations found in homepages, product pages and pre-purchase pages. Surprisingly, the amount of misinformation present in the homepages of accounts building their history by performing the actions "search + click" and "mark top-rated all positive review as helpful" on misinformative products was more than the amount of misinformation present in the homepages of accounts that added the same misinformative products to cart. This finding suggests that Amazon nudges users more towards misinformation once a user shows interest in a misinformative product by clicking on it but has not shown any intention of purchasing it. Overall, our audits suggest that Amazon has a severe vaccine/health misinformation problem exacerbated by its search and recommendation algorithms. Yet, the platform has not taken any steps to address this issue.
In the absence of an online regulatory body monitoring the quality of content created, sold and shared, vaccine misinformation is rampant on online platforms. Through our work, we specifically bring the focus onto e-commerce platforms since they have the power to influence the browsing as well as buying habits of millions of people. We believe our study is the first large-scale systematic audit of an e-commerce platform that investigates the role of its algorithms in surfacing and amplifying vaccine misinformation. Our work provides an elaborate understanding of how Amazon's algorithm introduces misinformation bias in the product selection stage and the ranking of search results across 5 Amazon filters for 10 impactful vaccine-related topics. We find that even the use of different search filters on Amazon can dictate what kind of content a user is exposed to. For example, use of the default filter "featured" leads users to more health misinformation, while sorting search results by the filter "newest arrivals" leads users to products debunking health-related misinformation. Ours is also the first study to empirically establish how certain real-world actions on health misinformative products on Amazon could drive users into problematic echo chambers of health misinformation. Both our audit experiments resulted in a dataset of 4,997 unique Amazon products distributed across 48 search queries, 5 search filters, 15 recommendation types, and 6 user actions, conducted over 22 (15+7) days (dataset: https://social-comp.github.io/AmazonAudit-data/). Our findings suggest that traditional recommendation algorithms should not be blindly applied to all topics equally. There is an urgent need for Amazon to treat vaccine-related searches as searches of higher importance and ensure higher quality content for them. Finally, our findings also have several design implications that we discuss in detail in Section 7.4.

We took several steps to minimize the potential harm of our experiments to retailers. For example, buying and later returning an Amazon product for the purpose of our project could be deemed unethical and thus, we avoided performing this activity. Similarly, writing a fake positive review about an Amazon product containing misinformation could negatively influence the audience. Therefore, in our
Personalized audit we explored other alternatives that could mimic similar, if not the same, influence as the aforementioned activities. For example, instead of buying a product, we performed the "add to cart" action that shows a user's intent to purchase a product. Instead of writing positive reviews for products, we marked the top rated positive review as helpful. Since the accounts did not have any purchase history, marking a review helpful did not increase the "Helpful" count for that review. Through this activity, the account shows a positive reaction towards the product while avoiding manipulation, and thus eliminates any impact on potential buyers or users. Lastly, we refrained from performing the experiments on real-world users. Performing actions on misinformative products could contaminate users' searches and recommendations. It could potentially have long-term consequences in terms of what types of products are pushed at participants. Thus, in our audit experiments, accounts were managed by bots that emulated the actions of actual users.
The current research on online health misinformation, including vaccine misinformation, spans three broad themes: 1) quantifying the characteristics of anti-vaccine discourse [12, 45, 47], 2) building machine learning models to identify users engaging with health misinformation or instances of health misinformation itself [13, 24, 25] and 3) designing and evaluating effective interventions to ensure that users think critically when presented with health (mis)information [40, 63]. Most of these studies are post-hoc investigations of health misinformation, i.e. the misinformation has already propagated. Moreover, existing scholarship rarely takes into account how the user encountered health misinformation or what role is played by the source of the misinformation. With the increasing reliance on online sources for health information, search engines have become the primary avenue of such information, with 55% of American adults relying on the web to get medical information [53]. A Pew survey reports that for 5.9M people, web search results influenced their decision to visit a doctor, and 14.7M claimed that online information affected their decision on how to treat a disease [53]. Given how medical information can directly influence one's health and well-being, it is essential that search engines return quality results in response to health-related search queries. However, online health information has been contaminated by several outlets. These sources could be conspiracy groups or websites spreading misinformation due to vested interests, or companies having commercial interests in selling herbal cures or fictitious medical treatments [58]. Moreover, online curation algorithms themselves are not built to take into account the credibility of information. Thus, it is of paramount importance that the role of search engines in harvesting health misinformation is investigated. How can we empirically and systematically probe search engines to investigate problematic behaviour like the prevalence of health misinformation? In the next section, we briefly describe the emerging research field of "algorithmic auditing" that is focused on investigating search engines to reveal problematic biases, and we situate our contribution within this growing research space.

Search engines are modern day gatekeepers and curators of information. Their black-box algorithms can shape user behaviour, alter beliefs and even affect voting behaviour, either by impeding or facilitating the flow of certain kinds of information [16, 20, 41]. Despite their importance and the power they exert, to date, search engine results and recommendations have mostly been unregulated. The information quality of a search engine's output is still measured in terms of relevance, and it is up to the user to determine the credibility of information. Thus, researchers have advocated for making algorithms more accountable. One primary method to achieve this is to perform systematic audits to empirically establish the conditions under which problematic behavior surfaces. Raji et al. provide the following definition of algorithmic audits.
An algorithmic audit involves the collection and analysis of outcomes from a fixed algorithm or defined model within a system. Through the stimulation of a mock user population, these audits can uncover problematic patterns in models of interest [54].

Previous audit studies have investigated search engines for partisan bias [48, 56], gender bias [10, 39], content diversity [52, 61, 62], and price discrimination [33]. However, only a few have systematically investigated search engines' role in surfacing misinformation ([36] is the only exception). Moreover, there is a dearth of systematic audits focusing specifically on health misinformation. The past literature mostly consists of small-scale experiments that probe search engines with a handful of search queries. For example, an analysis of the first 30 pages of search results for the query "vaccines autism" revealed that Google.com has 10% fewer anti-vaccine search results compared to other search engines, like Qwant, Swisscows and Bing [26], whereas search results present in the first 102 pages for the query "autism vaccine" on Google's Turkey version returned 20% websites with incorrect information [21]. One recently published work, closely related to this study, examined Amazon's first 10 pages of search results in response to the query "vaccine". They only collected and annotated books appearing in the searches for misinformation [59].
Figure 1: (a) Amazon homepage recommendations. (b) Pre-purchase recommendations displayed to users after adding a product to cart. (c) Product page recommendations.

The aforementioned studies probed the search engine for a single query and did the analysis on multiple pages of search results. We, on the other hand, perform our
Unpersonalized audit on a curated list of 48 search queries belonging to the 10 most searched vaccine-related topics, spanning various combinations of search filters and recommendation types, over multiple days, an aspect missing in prior work. Additionally, we are the first to experimentally quantify the prevalence of misinformation in various search queries, topics, and filters on an e-commerce platform. Furthermore, instead of just focusing on books, we analyze the platform for products belonging to different categories, resulting in an extensive all-category-inclusive coding scheme for health misinformation.

Another recent study on YouTube audited the platform for various misinformative topics, including vaccine controversies [36]. That work established the effect of personalization due to watching videos on the amount of misinformation present in search results and recommendations on YouTube. However, there are no studies investigating the impact of personalization on misinformation present in the product search engines of e-commerce platforms. Our work fills this gap by conducting a second set of audits, the Personalized audit, where we shortlist several real-world user actions and investigate their role in amplifying misinformation in Amazon's searches and recommendations.
For the audits, we collected 3 major Amazon components and numerous sub-components. We list them below.

(1) Search results: These are products present on Amazon's Search Engine Results Page (SERP) returned in response to a search query. SERP results can be sorted using five filters: "featured", "price low to high", "price high to low", "average customer review" and "newest arrivals".

(2) Auto-complete suggestions: These are the popular and trending search queries suggested by Amazon when a query is typed into the search box (see Figure 2c).

(3) Recommendations: Amazon presents several recommendations as users navigate through the platform. For the purpose of this project, we collect recommendations present on three different Amazon pages: homepage, pre-purchase page and product pages. Each page hosts several types of recommendations. Table 1 shows the 15 recommendation types collected across 3 recommendation pages. We describe all three recommendation pages below.

(a) Homepage recommendations: These recommendations are present on the homepage of a user's Amazon account. They can be of three types, namely "Related to items you've viewed", "Inspired by your shopping trends" and "Recommended items other customers often buy again" (see Figure 1a). Any of the three types, together or separately, can be present on the homepage depending on the actions performed by the user. For example, the "Inspired by your shopping trends" recommendation type appears when a user performs one of two actions: either makes a purchase or adds a product to cart.

(b) Pre-purchase recommendations: These recommendations consist of product suggestions that are presented to users after they add product(s) to cart. They can be considered a nudge to purchase other similar products. Figure 1b displays the pre-purchase page. The page has several recommendations like "Frequently bought together", "Customers also bought these highly rated items", etc. We collectively call these pre-purchase recommendations.

(c) Product recommendations: These are the recommendations present on the product page, also known as the details page (see https://sellercentral.amazon.com/gp/help/external/51). The page contains details of an Amazon product, like product title, category (e.g., Amazon Fashion, Books, Health & Personal Care, etc.), description, price, star rating, number of reviews, and other metadata. The details page is home to several different types of recommendations. We extracted five: "Frequently bought together", "What other items customers buy after viewing this item", "Customers who viewed this item also viewed", "Sponsored products related to this item" and "Customers who bought this item also bought". Figure 1c presents an example of product page recommendations.

Table 1: 15 recommendation types spread across 3 recommendation pages.
Homepage: Related to items you've viewed; Inspired by your shopping trends; Recommended items other customers often buy again.
Pre-purchase page: Customers also bought these highly rated items; Customers also shopped these items; Related to items you've viewed; Frequently bought together; Related to items; Sponsored products related; Top picks for.
Product page: Frequently bought together; Customers who bought this item also bought; Customers who viewed this item also viewed; Sponsored products related to this item; What other items customers buy after viewing this item.

Figure 2: (a) Google Trends' Related Topics list for the topic vaccine. People who searched for the vaccine topic also searched for these topics. (b) Google Trends' Related Queries list for the topic vaccine. These are the top search queries searched by people related to the vaccine topic. (c) Amazon's auto-complete suggestions displaying popular and trending search queries.

Here we present our audit methodology in detail. This section is organized as follows. We start by describing our approach to compile high-impact vaccine-related topics and associated search queries (Section 4.1). Then, we present an overview of each audit experiment followed by the details of the numerous methodological decisions we took while designing our audits (Section 4.2 and Section 4.3). Next, we describe our qualitative coding scheme for annotating Amazon products for health misinformation (Section 4.4). Finally, we discuss our approach to calculating misinformation bias in search results (Section 4.5).

Here, we present our methodology to curate high-impact vaccine-related topics and search queries.
The first step of any audit is to determine the input: a viable set of topics and associated search queries that will be used to query the platform under investigation. We leveraged Google Trends (Trends henceforth) to select and expand vaccine-related search topics. Trends is an optimal choice since it shares past search trends and popular queries searched by people across the world. Since it is not practical to audit all topics present on Trends, we designed a method to curate a reasonable number of high-impact topics and associated search queries, i.e., topics that were searched by a large number of people for the longest period of time. We started with 2 seed topics and employed a breadth-wise search to expand our topic list.

Trends allows searching for any subject matter either as a topic or as a term. Intuitively, a topic can be considered a collection of terms that share a common concept. Searching as a term returns results that include the terms present in the search query, while searching as a topic returns all search terms having the same meaning as the topic (see https://support.google.com/trends/answer/4359550?hl=en). We began our search with two seed words, namely "vaccine" and "vaccine controversies", and decided to search them as topics. Starting our topic search with these seed words ensured that the related topics would cover general vaccine-related topics as well as topics related to controversies surrounding vaccines, offering us a holistic view of search interests. We set the location to United States and the date range to 2004-Present (this step was performed in Feb; Trends data is available from 1/1/2004 onwards).

Figure 3: The breadth-wise topic discovery approach used to collect vaccine-related topics from Google Trends, starting from two seed topics: vaccine and vaccine controversies. Each node in the tree denotes a vaccine-related topic. An edge A → B indicates that topic B was discovered from the Trends' Related Topics list of topic A. For example, topics "vaccination" and "andrew wakefield" were obtained from the Trends' Related Topics list of the "vaccine controversies" topic. Then, the topic "mmr vaccine and autism" was obtained from the topic "andrew wakefield", and so on. Marked topics were discarded during filtering; similarly colored square brackets indicate similar topics that were merged together.
Table 2 (columns: Search topic, Seed query, Sample search queries): Sample search queries for each of the ten vaccine-related search topics.

We selected the category setting as "All" so as to get a holistic view of the search trends encompassing all categories together. The search service filter has options like 'web search', 'YouTube search', 'Google Shopping', etc. Although Google Shopping is an e-commerce platform like Amazon, its selection returned a handful to no results. Thus, we opted for the 'web search' service filter.

We employed Trends' Related Topics feature for breadth-wise expansion of search topics (see Figure 2a). We viewed the Related Topics using the "Top" filter, which presents popular search topics in the selected time range that are related to the topic searched. We manually went through the top 15 Related Topics and retained relevant topics using the following guidelines. All generic topics like Infant, Travel, Side-Effects, Pregnancy, CVS, etc. were discarded. Our focus was to pick only topics representing vaccine information. Thus, we discarded topics that were names of diseases but kept their corresponding vaccines. For example, we discarded the topic Influenza but kept the topic Influenza vaccine. We kept track of duplicates and discarded them from the search. To further expand the topic list, we again went through the Related Topics list of the shortlisted topics and used the aforementioned filtering strategy to shortlist relevant topics. This step allowed us to expand our topic list to a reasonable number. After two levels of breadth-wise search, we obtained a list of 16 vaccine-related search topics (see Figure 3). Next, we combined multiple similar topics into a single topic. The idea is to collect search queries for both topics separately and then combine them under one single topic. For example, the topics zoster vaccine and varicella vaccine were combined since both vaccines are used to prevent chickenpox; the search queries of both topics were later combined under the topic varicella vaccine. All topics enclosed in similarly colored boxes in Figure 3 were merged together. 11 topics remained after merging.
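The study performed this expansion manually through the Trends web interface. As a rough illustration only, a similar breadth-wise walk over Related Topics could be scripted with the third-party pytrends client; the function, seed topic ids, and DataFrame column names below are our assumptions and not part of the original audit.

```python
from collections import deque

from pytrends.request import TrendReq


def expand_topics(seed_topic_ids, geo="US", timeframe="2004-01-01 2020-02-29", depth=2):
    """Breadth-wise walk over Trends' 'Related Topics' (Top filter), two levels deep."""
    pytrends = TrendReq(hl="en-US", tz=360)
    seen = set(seed_topic_ids)
    frontier = deque((tid, 0) for tid in seed_topic_ids)
    discovered = []
    while frontier:
        topic_id, level = frontier.popleft()
        if level >= depth:
            continue
        pytrends.build_payload([topic_id], cat=0, timeframe=timeframe, geo=geo)
        top = pytrends.related_topics().get(topic_id, {}).get("top")
        if top is None:
            continue
        for _, row in top.head(15).iterrows():       # top 15 related topics, as in the audit
            title, new_id = row["topic_title"], row["topic_mid"]
            # Manual step in the study: keep vaccine-specific topics, drop generic ones
            # (Infant, Travel, ...) and bare disease names, and de-duplicate.
            if new_id not in seen:
                seen.add(new_id)
                discovered.append((title, level + 1))
                frontier.append((new_id, level + 1))
    return discovered


# Seeds would be the Trends topic ids for "vaccine" and "vaccine controversies"
# (placeholders here); retained topics were then filtered and merged by hand.
```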
After shortlisting a reasonable number of topics, we next determined the associated search queries per topic, to be later used for querying Amazon's search engine. To compile search queries, we relied on both Trends and Amazon's auto-complete suggestions; Trends, because it gives a list of popular queries that people searched on Google, the most popular search service, and Amazon, because it is the platform under investigation and it provides popular trending queries specific to the platform. Searching for a topic on Trends displays popular search queries related to the topic (see Figure 2b). We obtained the top 3 queries per topic. Next, we collected the top 3 auto-complete suggestions obtained by typing the seed query of each topic into Amazon's search box (see Figure 2c). We removed all animal or pet related search queries (e.g. "rabies vaccine for dogs") and overly specific queries (e.g. "callous disregard by andrew wakefield"), and replaced redundant and similar queries with a single search query selected at random. For example, the search queries "flu shots" and "flu shot" were replaced with the single search query "flu shot". After these filtering steps, only one query remained in the query list of the topic vaccination schedule, and thus it was removed from the topic list. Finally, we had 48 search queries corresponding to 10 vaccine-related search topics. Table 2 presents sample search queries for all 10 search topics.
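As an illustration only, the query clean-up rules above (dropping pet-related queries and collapsing near-duplicates such as "flu shot"/"flu shots") could be approximated in a few lines; the keyword list and similarity threshold here are our own choices, not the authors', and the removal of overly specific queries was a manual judgment in the study.

```python
import random
from difflib import SequenceMatcher

PET_TERMS = ("dog", "dogs", "cat", "cats", "pet")        # assumed indicator words


def clean_queries(queries, similarity=0.9):
    # Drop animal/pet-related queries, e.g. "rabies vaccine for dogs".
    kept = [q for q in queries
            if not any(t in q.lower().split() for t in PET_TERMS)]
    # Collapse near-duplicate queries into one, chosen at random.
    groups = []
    for q in kept:
        for group in groups:
            if SequenceMatcher(None, q, group[0]).ratio() >= similarity:
                group.append(q)
                break
        else:
            groups.append([q])
    return [random.choice(group) for group in groups]
```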
Figure 4: Eight steps performed in the Unpersonalized audit. The steps are described in detail in Section 4.2.4.

The aim of the Unpersonalized audit is to determine the amount of misinformation present in Amazon's search results and recommendations without the influence of personalization. We measure the amount of misinformation by determining the misinformation bias of the returned results. We explain the misinformation bias calculation in detail in Section 4.5. Intuitively, the more misinformative results that are ranked highly, the higher the overall bias. We ran the Unpersonalized audit for 15 days, from 2 May, 2020 to 16 May, 2020. We took two important methodological decisions regarding which components to audit and what sources of noise to control for. We present these decisions as well as the implementation details of the audit experiment below.
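The exact bias metrics are adapted from prior Information Retrieval work [42] and are detailed in Section 4.5; the snippet below is only a minimal sketch of the intuition stated above (misinformative results ranked higher push the score towards +1), not the paper's actual formula.

```python
def rank_weighted_bias(annotations):
    """annotations: stance labels (-1 debunking, 0 neutral, +1 promoting) for the
    top-k results, in rank order. Returns a score in [-1, 1]; higher-ranked
    results receive larger weights, so misinformation near the top raises the score."""
    weights = [1.0 / (rank + 1) for rank in range(len(annotations))]   # discount by rank
    return sum(w * a for w, a in zip(weights, annotations)) / sum(weights)


# Example: misinformative results at ranks 1-2, a debunking result at rank 5.
print(rank_weighted_bias([1, 1, 0, 0, -1]))   # > 0, i.e. a net bias towards misinformation
```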
We collected SERPs sorted by all 5 Amazon filters: "featured", "price low to high", "price high to low", "average customer review" and "newest arrivals". For analysis, we extracted the top 10 search results from each SERP. Since 70% of Amazon users never click on search results beyond the first page [2], a count of 10 is a reasonable approximation of the number of search results users are likely to engage with. Recent statistics have also shown that the first three search results receive 75% of all clicks [15]. Thus, we extracted the recommendations present on the product pages of the first three search results. We collected the following 5 types of product page recommendations: "Frequently bought together", "What other items customers buy after viewing this item", "Customers who viewed this item also viewed", "Sponsored products related to this item" and "Customers who bought this item also bought". Refer to Figure 1c for an example. We extracted the first product present in each recommendation type for analysis. Next, we annotated all collected components as promoting, neutral or debunking health misinformation. We describe our annotation scheme in Section 4.4.
We controlled for potential confounding factors that may add noise to our audit measurements. To eliminate the effect of personalization, we ran the experiment on newly created virtual machines (VMs) and freshly installed browsers with empty browsing history, cookies and cache. Additionally, we ran search queries from the same version of Google Chrome in incognito mode to ensure that no history was built during our audit runs. To avoid cookie tracking, we erased cookies and cache before and after opening the incognito window and destroyed the window after each search. In sum, we performed searches on newly created incognito windows every day. All VMs operated from the same geolocation so that any effects due to location would affect all machines equally. To prevent machine speeds from affecting the experiment, all VMs had the same architecture and configuration. To control for temporal effects, we searched every single query at one particular time every day for 15 consecutive days. Prior studies have established the presence of a carry-over effect in search engines, where previously executed queries affect the results of the current query when both queries are issued subsequently within a small time interval [32]. Since we destroyed browser windows and cleared session cookies and cache after every single search, the carry-over effect did not influence our experiment.
Figure 4 illustrates the eight steps of the Unpersonalized audit. We used Amazon Web Services (AWS) infrastructure to create all the VMs. We created Selenium bots to automate web browser actions. As a first step, each day at a particular time, the bot opened amazon.com in an incognito window. Next, the bot searched for a single query, sorted the results by an Amazon filter and saved the SERPs. The bot then extracted the top 10 URLs of the products present in the results. The sixth step is an iterative step where the bot iteratively opened the product URLs and saved the product pages. In the last two steps, the bot cleared the browser cache and killed the browser window. We repeated steps 1 to 8 to collect search results sorted by all 5 Amazon filters. We added appropriate wait times after each step to prevent Amazon from detecting the account as a bot and blocking our experiment. We repeated these steps for 15 consecutive days for each of the 48 search queries. After completion of the experiment, we parsed the saved product pages to extract product metadata, like product category, contributors' names (author, editor, etc.), star rating and number of ratings. We extracted product page recommendations for the top 3 search results only.
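A condensed sketch of one day's crawl for a single query and filter is shown below, assuming Selenium with Chrome; the element selector, the sort-parameter handling, the wait times and the SERP-parsing helper are placeholders rather than the audit's actual implementation.

```python
import re
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def extract_top_results(html, k=10):
    """Hypothetical helper: pull the first k product URLs out of a saved SERP."""
    links = re.findall(r'href="(/[^"]*/dp/[A-Z0-9]{10}[^"]*)"', html)
    urls, seen = [], set()
    for link in links:
        asin = link.split("/dp/")[1][:10]
        if asin not in seen:
            seen.add(asin)
            urls.append("https://www.amazon.com" + link)
        if len(urls) == k:
            break
    return urls


def crawl_query(query, sort_param, out_dir):
    opts = Options()
    opts.add_argument("--incognito")                 # fresh session: no history, cookies or cache
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get("https://www.amazon.com")
        time.sleep(5)                                # wait times help avoid bot detection
        box = driver.find_element("id", "twotabsearchtextbox")   # selector is a placeholder
        box.send_keys(query)
        box.submit()
        time.sleep(5)
        driver.get(driver.current_url + "&s=" + sort_param)      # apply one of the 5 sort filters
        with open(f"{out_dir}/serp_{query}_{sort_param}.html", "w") as f:
            f.write(driver.page_source)              # save SERP for offline parsing and annotation
        for url in extract_top_results(driver.page_source, k=10):
            driver.get(url)                          # step 6: open and save each product page
            time.sleep(5)
            with open(f"{out_dir}/product_{abs(hash(url))}.html", "w") as f:
                f.write(driver.page_source)
    finally:
        driver.delete_all_cookies()                  # clear cookies, then kill the window
        driver.quit()
```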
Table 3: List of user actions employed to build account history. Every action and product type (misinformative, neutral or debunking) combination was performed on two accounts. One account sorted search results by the filters "featured" and "average customer review". The other account built history in the same way but sorted the search results by the filters "price low to high" and "newest arrivals". Overall, we created 40 Amazon accounts (6 actions x 3 tested values x 2 replicates for filters + 2 control accounts + 2 twin accounts).
1. Search: search history.
2. Search + click: product search and click history.
3. Search + click + add to cart: intent to purchase history.
4. Search + click + mark "Top rated, All positive review" helpful: searching, clicking and marking reviews helpful history.
5. Following a contributor by clicking the follow button on the contributor's page: following history.
6. Search product on Google (third party application): third party search history.
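The account arithmetic in the caption can be made explicit with a small sketch; the labels below are ours, and the layout simply reproduces the 6 x 3 x 2 + 2 + 2 = 40 breakdown.

```python
from itertools import product

ACTIONS = ["search", "search + click", "search + click + add to cart",
           "search + click + mark review helpful", "follow contributor",
           "search on third party website"]
PRODUCT_TYPES = ["promoting", "neutral", "debunking"]          # the 3 tested values
FILTER_SETS = [("featured", "average customer review"),
               ("price low to high", "newest arrivals")]       # one replicate per pair

treatments = list(product(ACTIONS, PRODUCT_TYPES, FILTER_SETS))   # 6 x 3 x 2 = 36 accounts
controls = [("control", None, fs) for fs in FILTER_SETS]          # 2 control accounts
twins = [("control twin", None, fs) for fs in FILTER_SETS]        # 2 twins of the controls
assert len(treatments) + len(controls) + len(twins) == 40
```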
The goal of our personalization experiments is twofold. First, we assess whether user actions, such as clicking on a product or adding it to cart, trigger personalization on Amazon. Second, and more importantly, we determine the impact of a user's account history on the amount of misinformation presented to them in the search results page, recommendations, and auto-complete suggestions; account history is built progressively by performing a particular action for seven consecutive days. We ran our Personalized audit from 12th to 18th August, 2020. We took several methodological decisions while designing this experimental setup. We discuss each of these decisions below.
Users' click history and purchase history trigger personalization and influence the price of commodities on e-commerce websites [33]. Account history also affects the amount of misinformation present in personalized results [36]. Informed by the results of these studies, we selected six real-world user actions that could trigger personalization and thus could potentially impact the amount of misinformation in search results and recommendations. The actions are (1) "search", (2) "search + click", (3) "search + click + add to cart", (4) "search + click + mark top-rated all positive review as helpful", (5) "follow contributor" and (6) "search on third party website" (Google.com in our case). Table 3 provides an overview. The first two actions involve searching for a product and/or clicking on it. Through the third and fourth actions, a user shows a positive reaction towards a product by adding it to cart and marking its top rated positive review as helpful, respectively. The fifth action investigates the impact of following a contributor. For example, for a product in the Books category, the associated list of contributors includes the author and editor of the book. The contributors have dedicated profile pages that a user can follow. The sixth action investigates the effect of searching for an Amazon product on Google.com. The user logs into Google using the email id used to register the Amazon account. The hypothesis is that Amazon search results could be affected by third party browsing history. After selecting the actions, we determined the products on which the actions needed to be performed.

To build user history, all user actions except "follow contributor" need to be performed on products. First, we annotated all products collected in the
Unpersonalized audit run as debunking (-1), neutral (0) or promoting (1) health misinformation. We present the annotation details in Section 4.4. For each annotation value (-1, 0, 1), we selected top-rated products that had received maximum engagement and belonged to the most frequently occurring category, 'Books'. We started by filtering Books belonging to each annotation value and eliminated the ones that did not have an "Add to cart" button on their product page at the time of product selection. Since users make navigation and engagement decisions based on information cues on the web [51], we considered cues present on Amazon, such as customer ratings, as criteria to further shortlist Books. First, we sorted Books based on accumulated engagement, that is, the number of customer ratings received. Next, we sorted the top 10 Books obtained from the previous sorting based on the star ratings received by the Books, to end up with highly rated, high-impact and high-engagement products. We selected the top 7 books from the second sorting for the experiment (see Appendix, Table 9 for the shortlisted books).

The action "follow contributor" is the only action that is performed on contributors' Amazon profile pages. Contributors can be authors, editors, people writing the foreword of a book, publishers, etc. We selected contributors who contributed to the largest number of debunking (-1), neutral (0) and promoting (1) books, and retained only those who had a profile page on Amazon. Table 4 lists the selected contributors.
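A minimal sketch of the two-stage Book shortlisting described above, assuming each annotated product is a dictionary with the (hypothetical) fields used here:

```python
def shortlist_books(products, stance):
    """products: annotated product dicts; stance: -1, 0 or 1."""
    candidates = [p for p in products
                  if p["annotation"] == stance
                  and p["category"] == "Books"
                  and p["has_add_to_cart"]]
    # Stage 1: rank by engagement (number of customer ratings), keep the top 10.
    by_engagement = sorted(candidates, key=lambda p: p["num_ratings"], reverse=True)[:10]
    # Stage 2: re-rank those by star rating and keep the top 7 for the experiment.
    return sorted(by_engagement, key=lambda p: p["star_rating"], reverse=True)[:7]
```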
Table 4: List of contributors who have contributed to the largest number of books that either debunk, are neutral towards, or promote health misinformation, selected for building account history for the action "follow contributor". For example, Andrew J Wakefield and Mary Holland (both prominent vaccine deniers) have contributed to the largest number of books that promote health misinformation.

We performed all six actions explained in Section 4.3.2 and Table 3 on Books (or on contributors of the books in the case of the action "follow contributor") that are either all debunking, neutral or promoting health misinformation. Each action and product type combination was acted upon by two treatment accounts. One account built its search history by first performing searches on Amazon and then viewing search results sorted by the filters "featured" and "average customer review", while the other did the same but sorted results by "price low to high" and "newest arrivals". We call these two accounts replicates since they built their history in the same way; every account was run by a bot, and a single bot could not complete the full daily routine (building history, searching for 48 queries sorted by 4 filters, and collecting auto-complete suggestions) within 24 hours because of the wait times added after every action. We did not use the filter "price high to low" since intuitively it is less likely to be used during searches. We also created 2 control accounts corresponding to the 2 treatments that emulated the same actions as the treatments, except that they did not build account histories by performing one of the 6 user actions. Like the treatment accounts, the first control account searched for the 48 queries curated in Section 4.1.2 and sorted them by the filters "featured" and "average customer review", while the other control sorted them by the remaining two filters. Figure 5 outlines the experimental steps performed by treatment and control accounts. We also created twins for each of the control accounts. The twins performed the exact same tasks as the corresponding control. Any inconsistencies between a control account and its twin can be attributed to noise, and not personalization. Remember, Amazon's algorithms are a black box. Even after controlling for all known possible sources of noise, there could be some sources that we are not aware of, or the algorithm itself could be injecting some noise into the results. Only if the difference between the search results of control and treatment is greater than the baseline noise can it be attributed to personalization. Prior audit work has also adopted the strategy of creating a control and its twin to differentiate between the effect of noise versus personalization [33]. Overall, we created 40 Amazon accounts (6 actions x 3 tested values x 2 replicates for filters + 2 control accounts + 2 twin accounts). Next, we discuss the components collected from each account.
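One simple way to operationalize the control/twin logic described above is to treat the disagreement between a control and its twin as baseline noise, and to flag personalization only when a treatment diverges from the control by more than that baseline. The Jaccard-based sketch below is our illustration, not the paper's exact measure.

```python
def jaccard(results_a, results_b):
    """Overlap between two sets of result identifiers (e.g. ASINs)."""
    a, b = set(results_a), set(results_b)
    return len(a & b) / len(a | b) if a | b else 1.0


def is_personalized(treatment_results, control_results, control_twin_results):
    baseline_noise = 1.0 - jaccard(control_results, control_twin_results)   # noise floor
    treatment_divergence = 1.0 - jaccard(treatment_results, control_results)
    return treatment_divergence > baseline_noise
```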
We collected search results and auto-complete suggestions for treatment and control accounts to measure the extent of personalization. We collected recommendations only for the treatment accounts since they built history by clicking on product pages, pre-purchase pages, etc. Search results were sorted by the filters "featured", "average customer review", "price low to high" and "newest arrivals". Once users start building their account history, Amazon displays several recommendations to drive engagement on the platform. We collected various types of recommendations spread across three recommendation pages: homepage, product page and pre-purchase page. Pre-purchase pages were only collected for the accounts that performed the "add to cart" action. Additionally, product pages were collected for accounts that clicked on search results while creating their respective account history. Each of the aforementioned pages consists of several recommendation types, such as "Customers who bought this item also bought", etc. We collected the first product present in each of these recommendation types from both product pages and pre-purchase pages, and two products from each type from the homepages, for further analysis. Refer to Table 1 and Figures 1a, 1b and 1c for examples of these recommendation types.
Just like in our Unpersonalized audit, we first controlled for VM configuration and geolocation. Next, we controlled for demographics by setting the same gender and age for the newly created Google accounts. Recall that these Google accounts were used to sign up for the Amazon accounts. Since the VMs were newly created, the browser had no search history that could otherwise hint at users' demographics. All accounts created their histories at the same time. They also performed the searches at the same time each day, thus controlling for temporal effects. Lastly, we did not account for carry-over effects since they affected all the treatment and control accounts equally.
Figure 5 illustrates the experimental steps. We ran 40 Selenium bots on 40 VMs. Each Selenium bot operated on a single Amazon account. On day 0, we manually logged in to each of the accounts by entering login credentials and performing account verification. The next day, the experiment began at time t. All bots controlling treatment accounts started performing various actions to build history. Note that every day the bots built history
by performing actions on a single Book/contributor. We gave the bots sufficient time (90 minutes) to build history, after which they collected and saved the Amazon homepage. Later, all 40 accounts (control + treatment) searched for the 48 queries with different search filters and saved the SERPs. Next, the bots collected and saved auto-complete suggestions for all 48 queries. We included appropriate wait times between every step to prevent accounts from being recognized as bots and getting banned in the process. We repeated these steps for a week. At the end of the week, for each treatment account we had collected personalized search results, recommendations and auto-complete suggestions. Next, we annotated the collected search results and recommendations to determine their stance on misinformation so that we could later analyze them to study the effect of user actions on the amount of misinformation presented to users in each component.

Figure 5: Steps performed by treatment and control accounts in the Personalized audit, corresponding to the 6 different features.
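A skeleton of one treatment account's daily routine (Days 1-7) is sketched below; the AccountBot methods stand in for the per-action Selenium routines and only log their steps here, and the timings are illustrative rather than the exact values used in the audit.

```python
import time


class AccountBot:
    """Placeholder for the per-account Selenium bot; real routines replace the prints."""
    def __init__(self, name): self.name = name
    def login(self): print(f"[{self.name}] log in (account verified manually on Day 0)")
    def perform_action(self, action, book): print(f"[{self.name}] {action} on {book}")
    def save_homepage(self): print(f"[{self.name}] save homepage recommendations")
    def search_and_save_serp(self, q, f): print(f"[{self.name}] search '{q}' sorted by '{f}'")
    def save_autocomplete(self, q): print(f"[{self.name}] save auto-complete for '{q}'")


def run_treatment_day(bot, action, book, queries, filters, history_wait=90 * 60):
    bot.login()
    bot.perform_action(action, book)       # control accounts skip this history-building step
    time.sleep(history_wait)               # ~90 minutes for recommendations to refresh
    bot.save_homepage()
    for query in queries:                  # all 48 curated queries
        for sort_filter in filters:        # the 2 filters assigned to this replicate
            bot.search_and_save_serp(query, sort_filter)
        bot.save_autocomplete(query)
```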
Unlike partisan bias, where bias can be determined using features such as news source bias [56], labelling a product for misinformation is hard and time-consuming. There are no pre-determined
Scale value | Annotation description | Annotation heuristics
-1 | debunks vaccine misinformation | Product debunks, derides, or provides evidence against the myths/controversies surrounding vaccines OR helps understand the anti-vaccination attitude OR promotes the use of vaccination OR describes the history of a disease and details how its vaccine was developed OR describes scientific facts about vaccines that help users understand how they work OR debunks other health-related misinformation
0 | neutral health-related information | All medicines and antibodies OR medical equipment (thermometers, syringes, record-books, etc.) OR dietary supplements that do not violate Amazon’s policy OR products about animal vaccination and diseases OR health-related products not promoting any conspiratorial views about health and vaccines
1 | promotes vaccine and other health-related misinformation | Product promotes disuse of vaccines OR promotes anti-vaccine myths, controversies or conspiracy theories surrounding vaccines OR advocates alternatives to vaccines and/or western medicine (diets, pseudoscience methods like homeopathy, hypnosis, etc.) OR product is a misleading dietary supplement that violates Amazon’s policy on dietary supplements (the supplement states that it can cure, mitigate, treat, or prevent a disease in humans, but the claim is not approved by the FDA) OR it promotes other health-related misinformation
2 | unknown | Product’s description and metadata are not sufficient to annotate it as promoting, debunking or neutral information
3 | removed | Product’s URL is not accessible at the time of annotation
4 | other language | Product’s title and description are in a language other than English
5 | unrelated | Non-health-related products
Table 5: Description of the annotation scale and heuristics, along with sample products corresponding to each annotation value.
Additionally, we found that the annotation process for some categories of products, like Books, Kindle eBooks, etc., required us to consider the product image, read the book’s preview, if available, and even perform external searches about the authors. Therefore, we opted to manually annotate our data collection. We developed a qualitative coding scheme to label our Amazon data collection through an iterative process that required several rounds of discussion to reach an agreement on the annotation scale. In the first round, the first author randomly sampled 200 Amazon products across different topics and categories. After multiple iterations of analyzing and interpreting each product, the author came up with an initial 7-point annotation scale. Then, six researchers with extensive work experience on online misinformation independently annotated 32 products, randomly selected from the 200 products. We discussed every product’s annotation value and the researchers’ annotation process. We refined the scale as well as the scheme based on the feedback. This process was repeated thrice, after which all six annotators reached a consensus on the annotation scheme and process. In the fourth round, we gathered additional feedback from an external researcher from the Credibility Coalition group (https://credibilitycoalition.org/), an international organization of interdisciplinary researchers and practitioners dedicated to developing standards for news credibility and tackling the problem of online misinformation. The final result of this multi-stage iterative process (see Appendix, Figure 14) is a 7-point annotation scale comprising values ranging from -1 to 5 (see Table 5). The scale measures the scientific quality of products that users are exposed to when they make vaccine-related searches on Amazon. In order to annotate an Amazon product, the annotators were required to go through several fields present on the product’s detail page in the following order: title, description, top critical and top positive reviews about the product, and other metadata present on the detail page, such as editorial reviews, legal disclaimers, etc. If the product was a book, the annotators were also recommended to do the following three steps: (1) go through the first few pages in the book preview, (2) see other books published by the authors, and (3) perform a Google search on the book and go through the first few links to discover more information about it. Annotators were asked to see contextual information about the product from multiple sources to gain more context and perspective. This technique is grounded in lateral reading, which has proven to be a good approach for credibility assessment [60]. Below we describe each value in our annotation scale. Table 5 presents examples.
Debunking (-1):
Annotation value ‘-1’ indicates that the product debunks vaccine misinformation or derides any vaccine-related myth or conspiracy theory or promotes the use of vaccination. As an example, consider the poster titled
Immunization Poster 1979 Vintage Star Wars C-3PO R2-D2 Original (B00TFTS194), which encourages parents to vaccinate their children. Products helping users understand the anti-vaccination attitude or those that describe the history of the development of vaccines or the science behind how vaccines work were also included in this category. Promoting (1):
This category includes all products that support or substantiate any vaccine-related myth or controversy and encourage parents to raise a vaccine-free child. For example, consider the following books that promote an anti-vaccination agenda. In
A Summary of the Proofs that Vaccination Does Not Prevent Small-pox but Really Increases It (B01G5QWIFM), the author talks about the dangers of large-scale vaccination, and in
Vaccine Epidemic: How Corporate Greed, Biased Science, and Coercive Government Threaten Our Human Rights, Our Health, and Our Children (B00CWSONCE), the authors question vaccine safety and present several narratives of vaccine injuries. We also included several Amazon Fashion (B07R6PB2KP) and Amazon Home (B01HXAB7TM) merchandise items in this category since they contained anti-vaccine slogans like “Educate before you Vaccinate”, “Jesus wasn’t vaccinated”, etc. We further included in this category all products advocating any alternatives to vaccines, products that promote other health-related misinformation, and dietary supplements that claim to cure diseases in their description but are not approved by the Food and Drug Administration (FDA). Neutral (0):
We annotated all medical equipment and medicines as neutral (annotation value ‘0’). Note that it is beyond the scope of this project to determine the safety and veracity of the claims of each medicine sold on the Amazon platform. This means that the number of products that we have determined to be promoting (1) serves as a lower bound on the amount of misinformation present on the platform. This category also includes dietary supplements that do not violate Amazon’s policy, pet-related products and health-related products not advocating a conspiratorial view. Amazon has introduced a Look Inside feature that allows users to preview a few pages from a book. Note that for the dietary supplements category, Amazon asks sellers not to state on the details page that the products cure, mitigate, treat, or prevent a disease in humans, unless that statement is approved by the FDA [9].
Other annotations:
We annotated a product as ‘2’ if the product’s description and metadata were not sufficient to determine its stance. We assigned values ‘3’ and ‘4’ to all products whose URL was not accessible at the time of the annotation and whose title and description were in a language other than English, respectively. We annotated all non-health-related products (e.g. diaries, carpets, electronic products, etc.) with value ‘5’. Both our audits resulted in a dataset of 4,997 Amazon products that were annotated by the first author and Amazon Mechanical Turk workers (MTurks). The first author, being the expert, annotated the majority of the products (3,367) to determine what would be a good task representation to obtain high-quality annotations for the remaining 1,630 products from novice MTurks. We obtained three Turker ratings for each remaining product and used the majority response to assign the annotation value. Our task design worked: for 97.9% of the products, the annotation values converged. Only 34 products had diverging responses. The first author then annotated these 34 products to obtain the final set of annotation values. We describe the AMT job in detail in Appendix A.1.
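A minimal sketch of this majority aggregation (the helper name is ours, not part of the paper's pipeline) is shown below; products without a majority label were re-annotated by the first author.

```python
# Aggregate three MTurk ratings per product: majority wins, ties go to the expert.
from collections import Counter

def aggregate(turker_labels):
    """Return (label, resolved); resolved=False means no majority exists."""
    top_label, count = Counter(turker_labels).most_common(1)[0]
    return (top_label, True) if count >= 2 else (None, False)

print(aggregate([1, 1, 0]))    # (1, True)    -> majority label assigned
print(aggregate([1, 0, -1]))   # (None, False) -> diverging, sent to the expert
```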
In this section, we describe our method to determine the amount of misinformation present in search results. How do we estimate the misinformation bias present in Amazon’s SERPs? First, we used our annotation scheme to assign misinformation bias scores (s_i) to individual products present in SERPs. We converted our 7-point (-1 to 5) scale to misinformation bias scores with values -1, 0 and 1. We mapped annotation values 2, 3, 4, and 5 to bias score 0. Merging “unknown” annotations into neutral results in a conservative estimate of the misinformation bias present in the search results. Now, a product can be assigned one of three bias scores: -1 suggests that the product debunks misinformation, 0 indicates a neutral stance and 1 implies that the product promotes misinformation. Next, to quantify misinformation bias in Amazon’s SERPs, we adopt the framework and metrics proposed in prior work to quantify partisan bias in Twitter search results [42]. Below we discuss the three kinds of bias proposed by the framework and delineate how we estimate each bias with respect to misinformation. Table 6 illustrates how we calculated the bias values.
(i) The input bias (ib) of a list of Amazon products is the mean of the misinformation bias scores of the constituting products [42]. Therefore, ib = (1/n) Σ_{i=1}^{n} s_i, where n is the length of the list and s_i is the misinformation bias score of the ith product in the list. Input bias is an unweighted bias, i.e. it is not affected by the rank/ordering of the items.
(ii) The output bias (ob) of a ranked list is the overall bias present in the SERP and is the sum of the biases introduced by the input and by the ranks of the input. We first calculate the weighted bias score B(r) of every rank r, which is the average misinformation bias of the products ranked from 1 to r. Thus, B(r) = (1/r) Σ_{i=1}^{r} s_i, where s_i is the misinformation bias score of the ith product. Output bias (ob) is the average of the weighted bias scores B(r) over all ranks. Thus, by definition, ob = (1/n) Σ_{r=1}^{n} B(r).
(iii) The ranking bias (rb) is introduced by the ranking algorithm of the search engine [42]. It is calculated by subtracting the input bias from the output bias. Thus, rb = ob - ib. In our case, a high ranking bias indicates that the search algorithm ranks misinformative products higher than neutral or debunking products.
Why do we need three bias scores? Amazon’s search algorithm is not only selecting the products to be shown in the search results but is also ranking them according to its internal algorithm. Therefore, the overall bias (ob) could be introduced either at the product selection stage (ib), at the ranking stage (rb), or at both. Studying all three biases gives us an elaborate understanding of how biases are introduced by the search algorithm. All three bias values (ib, ob and rb) lie between -1 and 1. A bias score larger than 0 indicates a lean towards misinformation. Conversely, a bias score less than 0 indicates a propensity towards debunking information. We only consider the top 10 search results in each SERP. Thus, in the bias calculations, rank always varies from 1 to 10.
Rank r | Product | Bias of each product | Bias till rank r
1 | p1 | s1 | B(1) = s1
2 | p2 | s2 | B(2) = (s1 + s2)/2
3 | p3 | s3 | B(3) = (s1 + s2 + s3)/3
Input bias (ib) = (s1 + s2 + s3)/3
Output bias (ob) = [s1(1 + 1/2 + 1/3) + s2(1/2 + 1/3) + s3(1/3)]/3
Rank bias (rb) = ob - ib
Table 6: Example illustrating the bias calculations. For a given query, Amazon’s search engine presents users with the products p1, p2 and p3 in the search results, with misinformation bias scores s1, s2 and s3 respectively. The table has been adapted from previous work [42]. A bias score larger than 0 indicates a lean towards misinformation.
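To make these definitions concrete, the following is a minimal sketch (our own function names, not the authors' released code) that computes ib, ob and rb for a single SERP:

```python
# scores: misinformation bias scores (-1, 0 or 1) of the top-10 products, in rank order.

def input_bias(scores):
    """Unweighted mean of the bias scores (ib)."""
    return sum(scores) / len(scores)

def output_bias(scores):
    """Rank-weighted bias (ob): average of the cumulative means B(r) over all ranks."""
    cumulative_means = [sum(scores[:r]) / r for r in range(1, len(scores) + 1)]
    return sum(cumulative_means) / len(cumulative_means)

def rank_bias(scores):
    """Bias attributable to ranking alone (rb = ob - ib)."""
    return output_bias(scores) - input_bias(scores)

# Example: a misinformative product ranked first, then a neutral and a debunking one.
serp_scores = [1, 0, -1]
print(input_bias(serp_scores))   # 0.0 -> the selection itself is balanced
print(output_bias(serp_scores))  # 0.5 -> but the ranking pushes misinformation up
print(rank_bias(serp_scores))    # 0.5
```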
Figure 6: RQ1a: (a) Number (percentage) of search results belonging to each annotation value. While the majority of products have a neutral stance (40.81%), products promoting health misinformation (10.47%) outnumber products debunking health misinformation (8.99%). (b) Number (percentage) of recommendations belonging to each annotation value. A high percentage of product recommendations promote misinformation (12.95%) while the percentage of recommendations debunking health misinformation is very low (1.99%).
Figure 7: RQ1a: Categories of promoting, neutral and debunking Amazon products (search results). All categories occurring in less than 5% of products were combined into the “other” category. Note that misinformation exists in various forms on Amazon. Products promoting health misinformation include books (Books, Kindle eBooks, Audible Audiobooks), apparel (Amazon Fashion) and dietary supplements (Health & Personal Care). Additionally, the proportion of books promoting health misinformation is much greater than the proportion of books debunking misinformation.
The aim of the Unpersonalized audit is to determine the amount of misinformation bias in search results. Below we present the input, rank, and output bias detected by our audit in the search results of all 10 vaccine-related topics with respect to the 5 search filters.
We collected 36,000 search results from our
Unpersonalized audit run, out of which 3,180 were unique. Recall that we collected these products by searching for 48 search queries belonging to vaccine-related topics and sorting the results by each of the 5 Amazon filters. We later extracted and annotated the top 10 search results from all the collected SERPs, resulting in 3,180 annotations. Figure 6a shows the number (and percentage) of products corresponding to each annotation value. Through our audits, we find a high percentage (10.47%) of misinformative products in the search results. Moreover, the misinformative products outnumbered the debunking products.
Figure 7 illustrates the distribution of categories of Amazon products annotated as debunking (-1), neutral (0) and promoting (1). Note that the products promoting health misinformation primarily belong to the categories Books (35.43%), Kindle eBooks (28.52%), Amazon Fashion (12.61%), a category that includes t-shirts, apparel, etc., and Health & Personal Care (10.21%), a category consisting of dietary supplements. Below we discuss the misinformation bias observed across all the vaccine-related topics, the Amazon search filters and the search queries.
Figure 8: RQ1a: Input, rank and output bias for all 10 vaccine-related topics across the five search filters. The bias scores are averages of the scores obtained on each of the 15 days. Input and rank bias are positive (>0) in the search results of the majority of topics for the filters “featured” and “average customer review”. A bias value greater than 0 indicates a lean towards misinformation. Topics “andrew wakefield” and “mmr vaccine & autism” have a positive input bias across all five filters, indicating that search results for these topics contain a large number of products promoting health misinformation irrespective of the filter used to sort the search results. Topic “vaccination” has the highest overall bias (output bias) of 0.63, followed by topic “andrew wakefield”, which has an output bias of 0.53 for the filter “featured”.
We calculate the input, rank and output bias for each of the 10 search topics. All the bias scores presented are averages of the scores obtained across the 15 days of the audit. The bias score for a topic is also the average across each of its constituting search queries. Figure 8 shows the bias scores for all the topic, search filter and bias combinations.
Input bias:
We observe a high input bias (>0) for all topics except “hepatitis” for the “average customer review” filter, indicating the presence of a large number of misinformative products in the SERPs when search results are sorted by this filter. Similarly, the input bias for most topics is also positive for the “featured” filter. Note that “featured” is the default Amazon filter. Thus, by default Amazon presents more misinformative search results to users searching for vaccine-related queries. Topics “andrew wakefield”, “vaccination” and “vaccine controversies” have the highest input biases for both the “featured” and “average customer review” filters. Another noteworthy trend is the negative input bias for 7 out of 10 topics with respect to the filter “newest arrivals”, indicating that there are more debunking products present in the SERP when users look for newly appearing products on Amazon. “Andrew wakefield” and “mmr vaccine & autism” are the only two topics that have a positive input bias (>0) across all five filters. Interestingly, there is no topic that has a negative input bias across all filters. Recall, a negative (<0) bias indicates a debunking lean. Topics “mmr”, “influenza vaccine” and “hepatitis” have negative bias scores in four out of five filters.
Rank bias:
Figure 9: Input, rank and output bias for all filter types.
Figure 10: Top 20 search query-filter combinations when sorted by output bias (ob). In other words, these query-filter combinations are the most problematic ones, containing the highest amount of misinformation (highest ob).
Output bias:
Output bias is positive (>0) for most topics with respect to the filters “featured” and “average customer reviews”. Recall, a bias value greater than 0 indicates a lean towards misinformation. Topic “vaccination” has the highest output bias (0.63) for the filter “featured”. On the other hand, topic “influenza vaccine” has the lowest output bias (-0.24) for the filter “price high to low”.
Figure 9 shows the results for all 5 filters. Bias scores are averaged across all search queries. All filters except “newest arrivals” have positive input, rank, and output misinformation bias. The filter “average customer review” has the highest positive output bias, indicating that misinformative products belonging to vaccine-related topics receive higher ratings. We present the implications of these results in our discussion (Section 7).
Figure 10 shows the top 20 search query and filter combinations with the highest output bias. Predictably, the filter “newest arrivals” does not appear in any instance. Surprisingly, 9 search query-filter combinations have very high output biases (ob > 0.9). The search query “vaccination is not immunization” has an output bias of 1 for three filter types. Most of the search queries in Figure 10 have a negative connotation, i.e. the queries themselves are biased (e.g. the search queries “anti vaccine books” and “vaccination is not immunization” indicate an intent to search for misinformation). This observation reveals that if you search for anti-vaccine content, you will receive a large amount of vaccine and health misinformation. It also reflects how Information Retrieval systems currently work: they curate by relevance, with no notion of veracity. The most troublesome observation is the presence of high output bias for the generic and neutral search queries “vaccine” (ob = 0.99) and “varicella vaccine” (ob = 0.79). These results indicate that, unlike companies like Pinterest, which have altered their search engines in response to vaccine-related queries [7], Amazon has not modified its search algorithm to surface fewer anti-vaccine products to users.
We extracted the product page recommendations of the top 3 search results present in the SERPs. The product page contains various types of recommendations. For analysis, we considered the first product present in 5 types of recommendations: “Customers who bought this item also bought” (CBB), “Customers who viewed this item also viewed” (CVV), “Frequently bought together” (FBT), “Sponsored products related to this item” and “What other items customers buy after viewing this item” (CBV). The process resulted in 16,815 recommendations, out of which 1,853 were unique. Figure 6b shows the number and percentage of recommendations belonging to different annotation values. The percentage of misinformative recommendations (12.95%) is much higher than that of debunking recommendations (1.95%). The total input bias in all 16,815 recommendations is 0.417, while in the 1,853 unique recommendations it is 0.109, indicating a lean towards misinformation. Does a filter-bubble effect occur in product page recommendations? To answer, we compared the misinformation bias scores of all types of recommendations considered together (refer to Table 7). A Kruskal-Wallis ANOVA test revealed the difference to be significant (KW H(2, N=16815) = 6,927.6, p=0.0). A post-hoc Tukey HSD test showed that the product page recommendations of misinformative products contain more misinformation when compared to recommendations of neutral and debunking products. Even more concerning is that the recommendations of debunking products have more misinformation than those of neutral products. To investigate further, we qualitatively studied the recommendation graphs of each of the five recommendation types (Figure 11). Each node in a graph represents an Amazon product. An edge A → B indicates that B was recommended on the product page of A. Node size is proportional to the number of times the product was recommended.
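These recommendation graphs can be assembled directly from the scraped product pages. The following is a minimal sketch under an assumed data layout (the recs mapping and the ASINs are illustrative; this is not the authors' code):

```python
# Build a directed recommendation graph: edge A -> B means B was recommended on A's page.
import networkx as nx

recs = {
    "B07NQW27VD": ["B00E7FOA0U", "1480216895"],   # illustrative ASINs and edges
    "B00E7FOA0U": ["B07NQW27VD"],
    "1480216895": ["B00E7FOA0U"],
}

G = nx.DiGraph()
for product, recommended in recs.items():
    for rec in recommended:
        G.add_edge(product, rec)

# Node-size proxy: how often each product appears as a recommendation (in-degree).
times_recommended = {node: G.in_degree(node) for node in G.nodes}
print(times_recommended)
```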
Misinformation bias scores of CBB recommendations are significantly different for debunking, neutral, and promoting products (KW H(2, N=3133) = 2136.03, p=0.0). Post hoc tests reveal that CBB recommendations of misinformative products have more misinformation when compared to CBB recommendations of neutral and debunking products.
Figure 11: Recommendation graphs for the 5 different types of recommendations collected from the product pages of the top three search results obtained in response to 48 search queries, sorted by 5 filters, over a duration of 15 days during the Unpersonalized audit run: (a) Customers who bought this item also bought (CBB), (b) Customers who viewed this item also viewed (CVV), (c) Frequently bought together (FBT), (d) Sponsored products related to this item, and (e) What other items customers buy after viewing this item (CBV). Node colors denote the annotation of each product (misinformative nodes are red and neutral nodes are green in the figure). Node size is proportional to the number of times the product was recommended in that recommendation type. Large red nodes coupled with several interconnections between red nodes indicate a strong filter-bubble effect, where recommendations of misinformative products returned more misinformation. Note that the recommendation graph for the CBV recommendation type is indeed one figure; it consists of two disconnected components, indicating a strong filter-bubble effect.
Additionally, CBB recommendations of neutral products have more misinformation than CBB recommendations of debunking products. The findings are evident from Figure 11a too. For example, there are several instances of red nodes connected to each other. In other words, if you click on a misinformative search result, you will get misinformative products in the CBB recommendations. A few of the green nodes are attached to red ones, indicating that the CBB recommendations of a neutral product sometimes contain a misinformative product. The most recommended product present in CBB is a misinformative Kindle book titled
Miller’s Review of Critical Vaccine Studies: 400 Important Scientific Papers Summarized for Parents and Researchers (B07NQW27VD).
Type of product page recommendation | Kruskal-Wallis ANOVA test | Post hoc Tukey HSD | d | n | m
All | KW H(2, N=16815) = 6,927.6, p=0.0 | M>D & M>N & D>N | 37 | 1576 | 240
Customers who bought this item also bought (CBB) | KW H(2, N=3133) = 2136.03, p=0.0 | M>D & M>N & N>D | 11 | 225 | 66
Customers who viewed this item also viewed (CVV) | KW H(2, N=4485) = 2673.95, p=0.0 | M>D & M>N & D>N | 18 | 331 | 100
Frequently bought together (FBT) | KW H(2, N=388) = 277.08, p=6.8e-61 | M>D & M>N & D>N | 1 | 111 | 16
Sponsored products related to this item | KW H(2, N=6575) = 628.52, p=3.2e-137 | M>D & M>N & D>N | 7 | 953 | 98
What other items customers buy after viewing this item (CBV) | KW H(2, N=2234) = 1611.34, p=0.0 | M>D & M>N & D>N | 9 | 230 | 57
Table 7: RQ1b: Analyzing the echo chamber effect in product page recommendations. M, N and D are the means of the misinformation bias scores of products recommended on the product pages of misinformative, neutral and debunking Amazon products, respectively. Higher means indicate that the recommendations contain more misinformative products. For example, M>D indicates that recommendations of misinformative products have more misinformation than recommendations of debunking products. d, n and m are the numbers of unique products annotated as debunking, neutral and promoting for each recommendation type.
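The comparisons reported in Table 7 follow the same recipe for every recommendation type: a Kruskal-Wallis test across the three groups followed by a post-hoc Tukey HSD test. A minimal sketch of that analysis, with toy scores and our own variable names rather than the paper's data, is:

```python
# Compare misinformation bias scores of recommendations grouped by the stance
# of the product whose page they appeared on (toy values for illustration).
from scipy.stats import kruskal
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rec_bias = {
    "debunking": [-1, 0, 0, -1, 0],
    "neutral":   [0, 0, 1, 0, 0],
    "promoting": [1, 1, 0, 1, 1],
}

# Kruskal-Wallis: are the three score distributions different?
h_stat, p_value = kruskal(*rec_bias.values())
print(f"KW H = {h_stat:.2f}, p = {p_value:.3g}")

# Post-hoc Tukey HSD over the pooled scores to see which pairs of groups differ.
scores = [s for group in rec_bias.values() for s in group]
labels = [name for name, group in rec_bias.items() for _ in group]
print(pairwise_tukeyhsd(scores, labels))
```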
Misinformation bias scores of CVV recommendations are significantly different for debunking, neutral and promoting products (KW H(2, N=4485) = 2673.95, p=0.0). Post hoc tests indicate that CVV recommendations of misinformative products have more misinformation than CVV recommendations of debunking and neutral products. Notably, CVV recommendations of debunking products contain more misinformation than CVV recommendations of neutral products. This is troubling, since users who click on products that present scientific information are pushed more misinformation in this recommendation type. In the recommendation graph (Figure 11b), we see edges connecting multiple red nodes, supporting our finding that CVV recommendations of misinformative products mostly contain other misinformative products. The most recommended product occurring in this recommendation type is a misinformative Kindle book titled
Dissolving Illusions (B00E7FOA0U).
Misinformation bias scores of FBT recommendations are significantly different for debunking, neutral and promoting products (KW H(2, N=388) = 277.08, p=6.8e-61). Post hoc tests reveal that the amount of misinformation in FBT recommendations of misinformative products is significantly more than in the FBT recommendations of neutral and debunking products. The finding is also evident from the graph (Figure 11c). There are large red nodes attached to other red nodes and several green nodes attached together, indicating the presence of a strong filter-bubble effect. “Frequently bought together” can be considered an indicator of buying patterns on the platform. The post hoc tests indicate that people buy multiple misinformative products together. The most recommended product present in this recommendation type is a misinformative paperback book titled
Dissolving Illusions: Disease, Vaccines, and The Forgotten History (1480216895).
Most of the sponsored recommendations are either neutral or promoting (Figure 11d and Table 7). Statistical tests reveal that the misinformation bias scores of sponsored recommendations are significantly different among debunking, neutral and promoting products (KW H(2, N=6575) = 628.52, p=3.2e-137). Post hoc tests reveal the same results as for CVV recommendations. There are two most recommended sponsored books. The first is a misinformative paperback book titled
Vaccine Epidemic: How Corporate Greed, Biased Science, and Coercive Government Threaten Our Human Rights, Our Health, and Our Children (1620872129). The second is a neutral Kindle book titled
SPANISH FLU 1918: Data and Reflections on the Consequences of the Deadliest Plague, What History Teaches, How Not to Repeat the Same Mistakes (B08774MCVP).
Misinformation bias scores of CBV recommendations are significantly different for debunking, neutral and promoting products (KW H(2, N=2234) = 1611.34, p=0.0). The results of the post hoc tests are the same as those for CVV recommendations. The presence of an echo chamber is quite evident in the recommendation graph (see Figure 11e). The graph has two disconnected components, one comprising a mesh of misinformative products, indicating a cluster of misinformative products that keep getting recommended. CBV is also indicative of the buying patterns of Amazon users. The algorithm has learnt that people viewing misinformative products end up purchasing them. Thus, it pushes more misinformative items to users who click on them, creating a problematic feedback loop. The most recommended product in this recommendation type is a misinformative Kindle book titled
Miller’s Review of Critical Vaccine Studies: 400 Important Scientific Papers Summarized for Parents and Researchers (B07NQW27VD).
The aim of our Personalized audit was to determine the effect of personalization due to account history on the amount of misinformation returned in search results and various recommendations. Table 8 provides a summary. Below, we explain the effect of personalization on each component.
We measure personalization in search results for each Amazon filter using two metrics: the Jaccard index and the Kendall τ coefficient. The Jaccard index determines the similarity between two lists: an index of 1 indicates that the two lists contain the same elements, while 0 indicates that the lists are completely different. The Kendall τ coefficient, also known as the Kendall rank correlation coefficient, determines the ordinal correlation between two lists. It takes values in [-1, 1], with -1 indicating that the lists have inverse orderings, 0 signifying no correlation and 1 suggesting that the items in the two lists have the same ranks. First, we compare the search results of the control account and its twin. Recall that we created twins for our 2 control accounts in the Personalized audit to establish the baseline noise. Ideally, both should have Jaccard and Kendall rank correlation coefficients close to 1, since the accounts do not build any history, are set up in a similar manner, perform searches at the same time and are in the same geolocation. Next, we compare the search results of the control account with treatment accounts that built account histories by performing different actions. If personalization is occurring, the difference between the search results of treatment and control should be more than the baseline noise (i.e., the Jaccard index and Kendall τ should be lower). Whereas, if the baseline noise itself is large, it indicates inconsistencies and randomness in the search results.
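A minimal sketch of the two list-comparison metrics (helper names are ours, not the authors' code) is shown below; in the audit these are computed between the SERPs of a treatment account and the control:

```python
# Compare two SERPs (lists of product ASINs) for set overlap and rank agreement.
from scipy.stats import kendalltau

def jaccard_index(list_a, list_b):
    """1 = same set of products, 0 = completely disjoint SERPs."""
    a, b = set(list_a), set(list_b)
    return len(a & b) / len(a | b)

def kendall_tau(list_a, list_b):
    """Rank correlation over the products common to both SERPs."""
    common = [p for p in list_a if p in list_b]
    ranks_a = [list_a.index(p) for p in common]
    ranks_b = [list_b.index(p) for p in common]
    tau, _ = kendalltau(ranks_a, ranks_b)
    return tau

control   = ["B07NQW27VD", "B00E7FOA0U", "1480216895", "1620872129"]
treatment = ["B00E7FOA0U", "B07NQW27VD", "1480216895", "1620872129"]
print(jaccard_index(control, treatment))  # 1.0  -> same products returned
print(kendall_tau(control, treatment))    # 0.67 -> but the ordering differs
```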
Actions performed to build account history | RQ2a Search results: featured | avg. customer reviews | price low to high | newest arrivals | RQ2b Recommendations: homepage | pre-purchase | product page | RQ2c Auto-complete suggestions
Search product | IR IR IR | NP NP NP | NP NP NP | NP NP NP | - - - | X X X | X X X | NP NP NP
Search & click product | IR IR IR | NP NP NP | NP NP NP | NP NP NP | KW H(2, N=42) = 32.07, p = 1.08e-07; M>N>D | X X X | KW H(2, N=42) = 24.89, p = 3.94e-06; M>D & M>N | NP NP NP
Search + click & add to cart product | IR IR IR | NP NP NP | NP NP NP | NP NP NP | KW H(2, N=42) = 33.48, p = 5.38e-08; M>N>D | KW H(2, N=42) = 32.63, p = 8.19e-08; M>N>D | KW H(2, N=42) = 24.05, p = 5.98e-06; M>D & M>N | NP NP NP
Search + click & mark “Top rated, All positive review” as helpful | IR IR IR | NP NP NP | NP NP NP | NP NP NP | KW H(2, N=42) = 32.33, p = 9.52e-08; M>N>D | X X X | KW H(2, N=42) = 23.36, p = 8.44e-06; M>N & M>D | NP NP NP
Following contributor | IR IR IR | NP NP NP | NP NP NP | NP NP NP | - - - | X X X | X X X | NP NP NP
Search product on Google | IR IR IR | NP NP NP | NP NP NP | NP NP NP | - - - | X X X | X X X | NP NP NP
(Each cell with three entries lists, in order, the result for accounts built on debunking (D), neutral (N) and misinformative (M) products.)
Table 8: RQ2: Summary of the RQ2 results. IR suggests noise and inconclusive results, i.e. the search results of the control and its twin seldom matched; thus, the difference between treatment and control could be attributed either to noise or to personalization, making it impossible to study the impact of personalization on misinformation. NP denotes little to no personalization. - indicates that the given activity had no impact on the component. X indicates that the component was not collected for the activity. M, N and D indicate the average per-day bias in the component collected by accounts that built their history by performing actions on misinformative, neutral or debunking products, respectively. A higher mean value indicates more misinformation. For example, consider the cell corresponding to the action “search + click & add to cart product” and the “Homepage” recommendations: M>N>D indicates that accounts adding misinformative products to the cart end up with more misinformation in their homepage recommendations than accounts that add neutral or debunking products to the cart.
Figure 12: Investigating the presence and amount of personalization due to the “following contributors” action by calculating (a) the Jaccard index and (b) the Kendall τ metric between the search results of treatment and control accounts. M, N and D indicate results for accounts that follow contributors of misinformative, neutral and debunking products respectively.
Interestingly, we found significant noise in the search results of the control and its twin for the “featured” filter, with Jaccard index <0.8 and Kendall’s rank correlation coefficient <0.2; that is, the control and its twin seldom matched. The presence of noise suggests that Amazon is injecting some randomness into the “featured” search results. Unfortunately, this means that we would not be able to study the effect of personalization on the accounts for the “featured” search filter setting. For the other three search filters, “average customer review”, “price low to high” and “newest arrivals”, we see high (>0.8) Jaccard index and Kendall τ values between the control and its twin. Additionally, we do not see any personalization for these filters, since the metric values for the treatment-control comparison are similar to those of the control-twin comparison. Figure 12 shows the metric calculations for the control account and the treatments that built their search histories by following contributors of misinformative, neutral and debunking products. We see two minor inconsistencies for the filter “average customer review” in accounts building their history on debunking products, where the treatment received results more similar to the control than its twin account did. In any case, the treatment does not see more inconsistency than the control and its twin, indicating no personalization. Other user actions show similar results; hence, we omit their results for brevity.
Figure 13: (a) Input bias in the homepages of accounts performing the actions “add to cart”, “search + click” and “mark top rated all positive review helpful” over the seven days of the experiment run. (b) Input bias in the pre-purchase recommendations of accounts over the 7-day experiment run; these recommendations are only collected for accounts adding products to their carts. (c) Input bias in the product pages of accounts performing the actions “add to cart”, “search + click” and “mark top rated all positive review helpful” over the 7 days of the experiment run. M, N and D indicate that the accounts performed actions on misinformative, neutral and debunking products respectively.
We investigated the occurrence of personalization and its impact on the amount of misinformation in three different recommendation pages. We discuss them below.
Homepage recommendations:
We find that homepages are personalized only when a user performs click actions on the search results. Thus, the actions “add to cart”, “search + click” and “mark top rated most positive review helpful” led to homepage personalization. On the other hand, homepages were not personalized for the “follow contributor”, “search product” and “google search” actions. After identifying the actions leading to personalized homepages, we investigate the impact of personalization on the amount of misinformation. In other words, we investigate how the misinformation bias in homepages differs for accounts building their history by performing actions on misinformative, neutral and debunking products. For each action, we had 6 accounts, two replicates for each action and product type (misinformative, neutral and debunking). For example, for the action “add to cart”, two accounts built their history by adding misinformative products to the cart for 7 days, two added neutral products and two added debunking products to their carts. We calculate the per-day input bias (ib) in homepages by averaging the misinformation bias scores of each recommended product present in the homepage. Therefore, for every account we have seven bias values. We consider only the top two products in each recommendation type. Recall, homepages could contain three different types of recommendations: “Inspired by your shopping trends”, “Recommended items other customers often buy again” and “Related to items you’ve viewed”. All the different types are considered together for analysis. Statistical tests reveal significant differences in the amount of misinformation present in the homepages of accounts that built their histories by performing actions on misinformative, neutral and debunking products (see Table 8). This observation holds true for all three activities: “add to cart”, “search + click” and “mark top rated most positive review helpful”. Post hoc tests reveal an echo chamber effect: the amount of misinformation in the homepages of accounts performing actions on misinformative products is more than the amount of misinformation in the homepages of accounts performing actions on neutral products, which in turn is more than the misinformation present in the homepages of accounts performing actions on debunking products. Figure 13a shows the per-day input bias of the homepages of accounts performing different actions. We take an average of the replicates for plotting the graph. Surprisingly, performing the actions “mark top rated most positive review helpful” and “search + click” on a misinformative product leads to the highest amount of misinformation in the homepages, even more than in the homepages of accounts adding misinformative products to the cart. This means that the amount of misinformation present in the homepage is comparatively less once a user shows an intention to purchase a misinformative product, but high if a user shows interest in the misinformative product without an indication to buy it. Figure 13a also shows
that the amount of misinformation present in the homepages of accounts performing the actions “mark top rated most positive review helpful” and “search + click” on misinformative products gradually increases and becomes 1 on day 4 (2020-08-15). A bias value of 1 indicates that all analysed products in the homepages were misinformative. Homepage recommendations of accounts performing actions on neutral products show 0 bias constantly, indicating that all recommendations on all days were neutral. On the other hand, the average bias in the homepages of accounts building history on debunking products rose a little above 0 in the first three days but eventually fell below 0, indicating a debunking lean.
Pre-purchase recommendations:
These recommendations are only presented to users who add product(s) to their Amazon cart. Therefore, they were collected for 6 accounts, 2 of which added misinformative products to the cart, 2 added neutral products and the other 2 added debunking products. These recommendations could be of several types; see Figure 1b for an example of a pre-purchase page. For our analysis, we consider the first product present in each recommendation type. Statistical tests reveal a significant difference in the amount of misinformation present in the pre-purchase recommendations of accounts that added misinformative, neutral and debunking products to the cart (KW H(2, N=42) = 32.63, p = 8.19e-08). The pre-purchase recommendations of accounts adding misinformative products to the cart contain more misinformation than those of accounts adding neutral or debunking products to their carts. Figure 13b shows the input bias in the pre-purchase recommendations for all the accounts. There is no coherent temporal trend, indicating that the input bias in this recommendation type depends on the particular product being added to the cart. However, an echo chamber effect is evident. For example, the bias in the pre-purchase recommendations of accounts adding misinformative products to the cart is above 0 for all 7 days.
Product recommendations:
We collect product recommendations for accounts performing the “add to cart”, “search + click” and “mark top rated most positive review helpful” actions. We find a significant difference in the amount of misinformation present in product page recommendations when accounts performed these actions on misinformative, neutral, and debunking products (refer to Table 8). Post hoc analysis reveals that the product page recommendations of misinformative products contain more misinformation than those of neutral and debunking products. Figure 13c shows the input bias present in product pages across accounts. The bias for neutral products is constantly 0 across the 7 days, but for misinformative products it is constantly greater than 0 for all actions. We see an unusually high bias value on the 6th day (2020-08-17) of our experiment for accounts performing actions on the debunking product titled
Reasons to Vaccinate: Proof That Vaccines Save Lives (B086B8MM71). We checked the product page recommendations of this particular debunking book and found several misinformative recommendations on its product page.
We audited auto-complete suggestions to investigate how personalization affects the change in search query suggestions. Our initial hypothesis was that performing actions on misinformative products could increase auto-complete suggestions of anti-vaccine search queries. However, we found little to no personalization in the auto-complete suggestions, indicating that account history built by performing actions on vaccine-related misinformative, neutral or debunking products has little to no effect on how the auto-complete suggestions of accounts change. In the interest of brevity, we do not include the results and graphs for this component.
There is a growing concern that e-commerce platforms are becoming hubs of dangerous medical misinformation. Unlike search engines, where the motivation of the platform is to show relevant search results in order to sell advertisements, the goal of e-commerce platforms is to sell products. The motivation to increase sales means that relevance in recommendations and search suggestions is driven by what people purchase after conducting a search or viewing an item, irrespective of whether the product serves credible information or not. As a result, due to a lack of regulatory policies, websites like Amazon are providing a platform to people who are making money by selling misinformation: dangerous anti-vaccine ideas, pseudoscience treatments, or unproven dietary alternatives, some of which could have dangerous effects on people’s health and well-being. With a US market share of 49%, Amazon is the leading product search engine in the United States [14]. Therefore, any misinformation present in its search results and recommendations could have a far-reaching influence and can negatively shape users’ viewing and purchasing patterns. Thus, in this paper we audited Amazon for the most dangerous form of health misinformation: vaccine misinformation. Our work resulted in several critical findings with far-reaching implications. We discuss them below.
Our analysis shows that Amazon hosts a variety of health misinformative products. The largest number of such products belongs to the Books and Kindle eBooks categories (Figure 7). Despite the enormous amount of information available online, people still turn to books to gain information. A Pew Research survey revealed that 73% of Americans read at least one book in a year [50]. Books are considered to carry “intellectual heft”, have more presence than scientific journals and thus leave “a wider long lasting wake” [34]. Thus, anti-vaccine books could have a wider reach and can easily influence the audience negatively. Moreover, it does not help that a large number of anti-vaccine books are written by authors with medical degrees [59]. Beyond anti-vaccine books, there are abundant pseudoscience books on the platform, all suggesting unproven methods to cure diseases. We found diet books suggesting recipes with colloidal silver, an unsafe product, as an ingredient. Some of the books proposing cures for incurable diseases, like autism and autoimmune diseases, can have a huge appeal for people suffering from such diseases [55]. Thus, there is an urgent need to check the quality of health books presented to users. The next most prominent category of health misinformative products is Amazon Fashion. Numerous apparel items are sold on the platform with innovative anti-vaccine slogans, giving tools to anti-vaccine propagandists to advocate their anti-vaccine agenda and gain visibility, not just in the online world, but in the offline world. During our annotation process, we also found many dietary supplements claiming to treat and cure diseases, a direct violation of Amazon’s policy on dietary supplements. Overall, we find that health misinformation exists on the platform in various forms: books, t-shirts and other merchandise. Additionally, it is very easy to sell problematic content because of the lack of appropriate quality-control policies and their enforcement.
Analysis of our
Unpersonalized audit revealed that 10.47% of search results promote vaccine and other health-related misinformation. Notably, the higher percentage of products promoting misinformation compared to those debunking it suggests that anti-vaccine and problematic health-related content is churned out more, while attempts to debunk the existing misinformation are fewer. We also found that Amazon’s search algorithm places more health misinformative products than debunking products in search results, leading to a high input bias for topics like “vaccination”, “vaccine controversies”, “hpv vaccine”, etc. This is specifically true for the search filters “featured” and “average customer reviews”. Note that “featured” is the default search filter, indicating that by default users will see more misinformation when they search for the aforementioned topics. On the other hand, if users want to make a purchase decision based on product ratings, they will again be presented with more misinformation, since our analysis indicates that sorting by the filter “average customer reviews” leads to the highest misinformation bias in the search results. We also found a ranking bias in Amazon’s search algorithm, with misinformative products getting ranked higher. Past research has shown that people trust higher-ranked search results [31]. Thus, a larger number of higher-ranked misinformative products can make the problematic ideas in these products appear mainstream. The only positive finding of our analysis was the presence of more debunking products in search results sorted by the filter “newest arrivals”. This might indicate that more high-quality products are being sold on the platform in recent times. However, since there are no studies/surveys indicating which search filters are mostly used by people while making purchase decisions, it is difficult to conclude how beneficial this finding is.
Many search engines and social media platforms employ personalization to enhance users' experience by recommending items that the algorithm thinks they will like based on their past browsing or purchasing history. On the downside, if left unchecked, personalization can also lead users into a rabbit hole of problematic content. Our analysis of the Personalized audit revealed that an echo chamber exists on Amazon, where users performing real-world actions on misinformative books are presented with more misinformation in various recommendations. A single click on an anti-vaccine book can fill your homepage with several other similar anti-vaccine books. And if you proceed to add that book to your cart, Amazon again presents more anti-vaccine books, nudging you to purchase even more problematic content. The most worrying discovery is that your homepage gets filled with more misinformation if you merely show an interest in a misinformative product (by clicking on it) than when you show an intention to buy it by adding the product to your cart. Additionally, on the product page itself, you are presented with five different kinds of recommendations, each of which contains equally problematic content. In a nutshell, once you start engaging with misinformative products on the platform, you will be presented with more misinformative content at every point of your Amazon navigation route and in multiple places. These findings would not be concerning if, say, buying a milk chocolate led to recommendations of chocolates from other brands. The problem is that Amazon is blindly applying its algorithms to all products, including problematic content. Its algorithms do not differentiate or give special significance to vaccine-related topics. Amazon has learnt from users' past viewing and purchasing behaviour and has categorized all the anti-vaccine and other problematic health-cure products together. It presents this problematic content to users performing actions on any of these products, creating a dangerous recommendation loop in the process. There is an urgent need for the platform to treat vaccine and other health-related topics differently and ensure high-quality search results and recommendations. In the next section, we present a few ways, based on our findings, that could assist the platform in combating health misinformation.
Tackling online health misinformation is a complex problem and there is no easy silver-bullet solution to curb its spread. However, the first step towards addressing it is accepting that there is a problem. Many tech giants have acknowledged their social responsibility for ensuring high quality in health-related content and are actively taking steps to do so. For example, Google's “Your Money Or Your Life” policy classifies medical and health-related search pages as pages of particular importance, whose content should come from reputable websites [44]. Pinterest completely hobbled the search results of certain queries such as ‘anti-vax’ [7] and limited the search results for other vaccine-related queries to content from officially recognized health institutions [37]. Even Facebook, a platform known for questionable content-moderation policies, banned anti-vaccine advertisements and demoted anti-vaccine content in its search results to make it harder to access [43]. Therefore, given the massive reach and user base of Amazon, with 206 million website visits every month [1], it is disconcerting that Amazon has not yet joined the bandwagon. To date, it has not taken any concrete steps towards addressing the problem of anti-vaccine content on its platform. Based on our findings, we recommend several short-term and long-term strategies that the platform can adopt.
The simplest short-term solution would be to introduce design interventions. Our Unpersonalized audit revealed a high misinformation bias in search results. The platform can use interventions as an opportunity to communicate to users the quality of the data presented to them by signalling misinformation bias. The platform could introduce a bias meter or scale that signals the amount of misinformation present in the search results every time it detects a vaccine-related query in its search bar. The bias indicators could be coupled with informational interventions, such as showing Wikipedia and encyclopedia links, which have already been proven effective in reducing traffic to anti-vaccine content [40]. A second intervention strategy could be to recognise and signal source bias. During our extensive annotation process, we realized that several health misinformative books have been written by well-known anti-vaxxers like Andrew Wakefield, Jenny McCarthy, Robert S. Mendelsohn, etc. We also present a list of authors who contributed the most misinformative books in Table 4. Imagine a design where users are presented with the message “The author is a known anti-vaxxer and is known to write books that might contain health misinformation” every time they click on a book written by these authors. Another, more extreme short-term solution could be to either enforce a platform-wide ban prohibiting the sale of any anti-vaccine product or to hobble search results for anti-vaccine search queries.
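To make the bias-meter idea concrete, the sketch below is our own illustration, not a proposed Amazon implementation: it converts the share of misinformative products among the top-k results of a detected vaccine-related query into a coarse, user-facing warning level. The thresholds and wording are placeholder assumptions.

```python
def misinformation_share(stances, k=10):
    """Fraction of the top-k results annotated as promoting misinformation
    (stance label +1)."""
    top_k = stances[:k]
    return sum(1 for s in top_k if s == 1) / len(top_k) if top_k else 0.0

def bias_meter_label(share):
    """Map the misinformation share to a coarse indicator; the cut-offs are
    illustrative, not validated thresholds."""
    if share >= 0.30:
        return "High misinformation - consult official health sources"
    if share >= 0.10:
        return "Some results may contain health misinformation"
    return "No strong misinformation signal detected"

# Hypothetical annotated result page for a vaccine-related query.
page = [1, 0, 1, -1, 0, 0, 1, 0, -1, 0]
print(bias_meter_label(misinformation_share(page)))
```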
Long-term interventions would include modifying the search, ranking and recommendation algorithms. Our investigation revealed that Amazon's algorithm has learnt problematic patterns from consumers' past viewing and buying behaviour. It has categorized products with a similar stance together (see the several edges connecting red nodes, i.e., products promoting misinformation, in Figure 11). In some cases, it has also associated misinformative products with neutral and debunking products (refer to Figure 11). Amazon needs to “unlearn” this categorization. Additionally, the platform should incorporate misinformation bias into its search and recommendation algorithms to reduce exposure to misinformative content. There is also an urgent need for policy changes. First and foremost, Amazon should stop promoting health misinformative books through sponsorship. We found 98 misinformative products in the sponsored recommendations, indicating that today anti-vaccine outlets can easily promote their products by spending some money. Amazon should also introduce minimum quality requirements that must be met before a product is allowed to be sponsored or sold on its platform. It can employ search quality raters to rate the quality of search results for various health-related search queries; Google has already set an example with its extensive Search Quality Rating process and guidelines [29, 30]. In recent times, Amazon has introduced several policy and algorithmic changes, including the roll-out of a “verified purchase” feature to curb the fake-review problem on its platform [57]. Similar efforts are required to ensure product quality as well. Amazon could introduce a similar “verified quality” or “verified claims” tag for health-related products once they are evaluated by experts. Having a product base of millions of products can make any kind of review process tedious and challenging, so Amazon can start by targeting the specific health and vaccine-related topics that are most likely to be searched. Our work itself presents a list of the most popular vaccine-related topics that can be used as a starting point. Can we expect Amazon to make any changes to its current policies and algorithms without sustained pressure? We believe audit studies like ours are the way to reveal biases in the algorithms used by commercial platforms, raising awareness about the issues and in turn creating pressure on the organization to act. In the past, such audit studies have led platforms to make positive changes to their algorithms [54]. We hope our work acts as a call to action for Amazon and also inspires vaccine and health audits on other platforms.
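One concrete form the proposed misinformation-aware ranking could take, purely as a sketch under our own assumptions (Amazon's actual ranking features and scores are not public), is to demote candidate products in proportion to their annotated or predicted misinformation stance before producing the final order:

```python
def rerank_with_misinformation_penalty(products, penalty=0.5):
    """Re-rank candidates by subtracting a penalty from items flagged as
    promoting misinformation (stance +1). `penalty` is an illustrative
    parameter controlling how aggressively such items are demoted."""
    def adjusted_score(product):
        relevance, stance = product["relevance"], product["stance"]
        return relevance - penalty * max(stance, 0)  # only penalize promoting items
    return sorted(products, key=adjusted_score, reverse=True)

# Hypothetical candidates: a relevant anti-vaccine book vs. a debunking book.
candidates = [
    {"title": "anti-vax book", "relevance": 0.9, "stance": 1},
    {"title": "debunking book", "relevance": 0.7, "stance": -1},
    {"title": "neutral textbook", "relevance": 0.6, "stance": 0},
]
for item in rerank_with_misinformation_penalty(candidates):
    print(item["title"])
```

In this toy example the debunking book, despite being slightly less relevant, is surfaced above the flagged anti-vaccine book; the relevance scores, stance labels and penalty weight are all hypothetical.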
Our study is not without limitations. First, we only considered the top products in each recommendation type present on a page while determining the bias of the entire page. Annotating and determining the bias of all recommendations occurring on a page would give a much more accurate picture of the recommendation algorithms. However, past studies have shown that top results receive the highest number of clicks and thus are more likely to receive attention from users [15]. Second, search queries themselves have inherent bias. For example, the query ‘anti vaccine t-shirt’ suggests that the user is looking for anti-vax products. Higher bias in the search results of neutral queries is much worse than in those of biased queries. We did not segregate our analysis based on search-query bias, although we did notice two neutral search queries, namely ‘vaccine’ and ‘varicella vaccine’, appearing in the list of most problematic search-query and filter combinations. Third, while we audited various recommendations present on the platform, we did not analyse email recommendations, i.e., product recommendations delivered outside the platform. A journalistic report pointed out that email recommendations could be contaminated too if a user shows an interest in a misinformative product but leaves the platform without buying it [17]. We leave investigation of these recommendations to future work. Fourth, in our Personalized audit, accounts built history for only a week. Moreover, experiments were only run on Amazon.com. We plan to continue running our experiments and to explore features such as geolocation in future audits. Fifth, our audit study only targeted results returned in response to vaccine-related queries. Since Amazon is a vast platform that hosts a variety of products and sellers, we cannot claim that our results generalize to other misinformative topics or conspiracy theories. However, our methodology is generic enough to be applied to other misinformative topics. Lastly, another major limitation of the study is that in the Personalized audit, account histories were built in a very conservative setting. Accounts performed actions on only one product each day, and the actions were only performed on products with the same stance. In the real world, it would be hard to find users who only add misinformative products to their carts for seven days in a row. In spite of this limitation, our study still provides a peek into the workings of Amazon's algorithms and paves the way for future audits that could use our audit methodology and extensive qualitative coding scheme to perform experiments in more complex, realistic settings.
In this study, we conducted two sets of audit experiments on a popular e-commerce platform, Amazon, to empirically determine the amount of misinformation returned by its search and recommendation algorithms. We also investigated whether personalization due to user history plays any role in amplifying misinformation. Our audits resulted in a dataset of 4,997 Amazon products annotated for health misinformation. We found that the search results returned for many vaccine-related queries contain a large number of misinformative products, leading to high misinformation bias. Moreover, misinformative products are ranked higher than debunking products. Our study also suggests the presence of a filter-bubble effect in recommendations, where users performing actions on misinformative products are presented with more misinformation in their homepages, product-page recommendations and pre-purchase recommendations. We believe our proposed methodology to audit vaccine misinformation can be applied to other platforms to investigate health misinformation bias. Overall, our study brings attention to the need for search engines to ensure high standards and quality of results for health-related queries.
REFERENCES
Nature (London).
Human Vaccines & Immunotherapeutics.
Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI ’18). Association for Computing Machinery, New York, NY, USA, 1–14. https://doi.org/10.1145/3173574.3174225
[11] Le Chen, Alan Mislove, and Christo Wilson. 2016. An Empirical Analysis of Algorithmic Pricing on Amazon Marketplace. In Proceedings of the 25th International Conference on World Wide Web (Montréal, Québec, Canada) (WWW ’16). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1339–1349. https://doi.org/10.1145/2872427.2883089
[12] Alessandro Cossard, Gianmarco De Francisci Morales, Kyriaki Kalimeri, Yelena Mejova, Daniela Paolotti, and Michele Starnini. 2020. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. Proceedings of the International AAAI Conference on Web and Social Media 14, 1 (May 2020), 130–140. https://ojs.aaai.org/index.php/ICWSM/article/view/7285
[13] Enyan Dai, Yiwei Sun, and Suhang Wang. 2020. Ginger Cannot Cure Cancer: Battling Fake Health News with a Comprehensive Data Repository. Proceedings of the International AAAI Conference on Web and Social Media.
Proceedings of the National Academy of Sciences.
Journal of Contemporary Medicine.
Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–20.
[26] Pietro Ghezzi, Peter Bannister, Gonzalo Casino, Alessia Catalani, Michel Goldman, Jessica Morley, Marie Neunez, Andreu Prados-Bo, Pierre Smeesters, Mariarosaria Taddeo, Tania Vanzolini, and Luciano Floridi. 2020. Online Information of Vaccines: Information Quality, Not Only Privacy, Is an Ethical Responsibility of Search Engines. Frontiers in Medicine.
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’07). Association for Computing Machinery, New York, NY, USA, 417–420. https://doi.org/10.1145/1240624.1240691
[32] Aniko Hannak, Piotr Sapiezynski, Arash Molavi Kakhki, Balachander Krishnamurthy, David Lazer, Alan Mislove, and Christo Wilson. 2013. Measuring Personalization of Web Search. In Proceedings of the 22nd International Conference on World Wide Web (Rio de Janeiro, Brazil) (WWW ’13). Association for Computing Machinery, New York, NY, USA, 527–538. https://doi.org/10.1145/2488388.2488435
[33] Aniko Hannak, Gary Soeller, David Lazer, Alan Mislove, and Christo Wilson. 2014. Measuring Price Discrimination and Steering on E-Commerce Web Sites. In Proceedings of the 2014 Conference on Internet Measurement Conference (Vancouver, BC, Canada) (IMC ’14). Association for Computing Machinery, New York, NY, USA, 305–318. https://doi.org/10.1145/2663716.2663744
[34] M. Herr. 2017. Writing and Publishing Your Book: A Guide for Experts in Every Field. Greenwood, USA. https://books.google.com/books?id=r2fuswEACAAJ
[35] Desheng Hu, Shan Jiang, Ronald E. Robertson, and Christo Wilson. 2019. Auditing the Partisanship of Google Search Snippets. In The World Wide Web Conference (San Francisco, CA, USA) (WWW ’19). Association for Computing Machinery, New York, NY, USA, 693–704. https://doi.org/10.1145/3308558.3313654
[36] Eslam Hussein, Prerna Juneja, and Tanushree Mitra. 2020. Measuring Misinformation in Video Search Platforms: An Audit Study on YouTube. Proceedings of the ACM on Human-Computer Interaction.
Vaccine 28, 7 (2010), 1709–1716.
[39] Matthew Kay, Cynthia Matuszek, and Sean A. Munson. 2015. Unequal Representation and Gender Stereotypes in Image Search Results for Occupations. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (Seoul, Republic of Korea) (CHI ’15). Association for Computing Machinery, New York, NY, USA, 3819–3828. https://doi.org/10.1145/2702123.2702520
[40] Sangyeon Kim, Omer F. Yalcin, Samuel E. Bestvater, Kevin Munger, Burt L. Monroe, and Bruce A. Desmarais. 2020. The Effects of an Informational Intervention on Attention to Anti-Vaccination Content on YouTube. Proceedings of the International AAAI Conference on Web and Social Media 14, 1 (May 2020), 949–953. https://ojs.aaai.org/index.php/ICWSM/article/view/7364
[41] Silvia Knobloch-Westerwick, Benjamin K Johnson, Nathaniel A Silver, and Axel Westerwick. 2015. Science exemplars in the eye of the beholder: How exposure to online science information affects attitudes. Science Communication 37, 5 (2015), 575–601.
[42] Juhi Kulshrestha, Motahhare Eslami, Johnnatan Messias, Muhammad Bilal Zafar, Saptarshi Ghosh, Krishna P. Gummadi, and Karrie Karahalios. 2017. Quantifying Search Bias: Investigating Sources of Bias for Political Searches in Social Media. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (Portland, Oregon, USA) (CSCW ’17).
Proceedings of the Tenth International Conference on Web and Social Media, Cologne, Germany, May 17-20, 2016.
Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (Seoul, Republic of Korea) (CHI ’15). Association for Computing Machinery, New York, NY, USA, 1345–1354. https://doi.org/10.1145/2702123.2702553
[47] Bjarke Mønsted and Sune Lehmann. 2019. Algorithmic Detection and Analysis of Vaccine-Denialist Sentiment Clusters in Social Networks. arXiv:1905.12908 http://arxiv.org/abs/1905.12908
[48] Eni Mustafaraj, Emma Lurie, and Claire Devine. 2020. The Case for Voter-Centered Audits of Search Engines during Political Elections. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Barcelona, Spain) (FAT* ’20).
Cognitive Science 29, 3 (2005), 343–373. https://doi.org/10.1207/s15516709cog0000_20
[52] Cornelius Puschmann. 2019. Beyond the Bubble: Assessing the Diversity of Political Search Results. Digital Journalism.
Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (Honolulu, HI, USA) (AIES ’19).
Proceedings of the ACM on Human-Computer Interaction.
Journal of Health Communication 25, 5 (2020), 394–401. https://doi.org/10.1080/10810730.2020.1776423 PMID: 32536257.
[60] Carrie Spector. 2017. Stanford scholars observe ’experts’ to see how they evaluate the credibility of information online. https://news.stanford.edu/press-releases/2017/10/24/fact-checkers-ouline-information/
[61] Miriam Steiner, Melanie Magin, Birgit Stark, and Stefan Geiß. 2020. Seek and you shall find? A content analysis on the diversity of five search engines’ results on political queries. Information, Communication & Society 0, 0 (2020), 1–25. https://doi.org/10.1080/1369118X.2020.1776367
[62] Daniel Trielli and Nicholas Diakopoulos. 2019. Search as News Curator: The Role of Google in Shaping Attention to News Information. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–15. https://doi.org/10.1145/3290605.3300683
[63] Toni GLA van der Meer and Yan Jin. 2020. Seeking formula for misinformation treatment in public health crises: The effects of corrective information type and source. Health Communication 35, 5 (2020), 560–575.
A APPENDIX
The appendix contains a table (Table 9) of books annotated as promoting, neutral and debunking that were selected to build the history of accounts in the Personalized audit, as well as an illustration of our multi-stage iterative coding process (Figure 14). Additionally, we give details about our Amazon Mechanical Turk (AMT) task in Appendix Section A.1.
A.1 Amazon Mechanical Turk Job
A.1.1 Turk job description.
In this section, we describe how we obtained annotations for our study from Amazon Mechanical Turk workers (MTurks). Past research has shown that it is possible to get good data from crowd-sourcing platforms like Amazon Mechanical Turk (AMT) if the workers are screened and trained for the crowd-sourced task [46]. Below we describe the screening process and our annotation task briefly.
A.1.2 Screening.
To get high-quality annotations, we screened MTurks by adding three qualification requirements. First, we required MTurks to be Masters. Second, we required them to have at least a 90% approval rating. And lastly, we required them to get a full score of 100 in a Qualification Test. We introduced the test to ensure that MTurks attempting our annotation job had a good understanding of the annotation scheme. The test had one eligibility question asking them to confirm whether they are affiliated with the authors' University. The other three questions required MTurks to annotate three Amazon products (see Figure 18 for a sample question). The first author had annotated these products and thus their annotation values were known. To ensure MTurks understood the task and annotation scheme, we gave detailed instructions and described each annotation value in detail with various examples of Amazon products in the qualifying test (Figures 15, 16 and 17). Examples were added as visuals. In each example, we marked the metadata used for the annotation and explained why a particular annotation value was assigned to the product (see Figure 17). We took two steps to ensure that the instructions and test questions were easy to understand and attempt. First, we posted the test on the subreddit r/mturk, a community of MTurks, to obtain feedback. Second, we did a pilot run by posting ten tasks along with the aforementioned screening requirements. After obtaining positive feedback from the community and a successful pilot run, we released our AMT job titled “Amazon product categorization task”. We paid the Turks according to the United States federal minimum wage ($7.25/hr). Additionally, we did not disapprove any worker's responses.
A.1.3 Amazon product categorization task.
We posted 1630 annotations (tasks) in batches of 50 at a time. The job was set up to obtain three responses for each annotation value, and the majority response was selected to label the Amazon product. To avoid any MTurk bias, we did not explicitly reveal that the idea behind the task was to collect misinformation annotations; we used the term “Amazon product categorization” to describe our project and task throughout. For 34 products, all three MTurk responses differed. The first author then annotated these products to obtain their annotation values. Figure 19 shows the interface of our AMT job.
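As a minimal sketch of this aggregation step (our reconstruction; the authors do not publish their scripts), the snippet below takes the three worker responses collected for a product, keeps the majority label when one exists, and flags products where all three responses differ for expert annotation, mirroring the 34 tie cases described above.

```python
from collections import Counter

def aggregate_annotations(responses):
    """Majority vote over three MTurk responses for one product.
    Returns the winning label, or None when all three responses differ
    (such products were re-annotated by an expert)."""
    label, count = Counter(responses).most_common(1)[0]
    return label if count >= 2 else None

# Hypothetical worker responses for two products.
print(aggregate_annotations(["promoting", "promoting", "neutral"]))  # -> promoting
print(aggregate_annotations(["promoting", "neutral", "debunking"]))  # -> None (expert review)
```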
Table 9: Books corresponding to each annotation value shortlisted to build account histories in our Personalized audit. S represents the star rating of the product and R denotes the number of ratings received by the book.
Figure 14: Our multi-stage iterative qualitative coding process to obtain a coding scheme for annotating Amazon products for health misinformation.
Figure 15: Qualification Test instructions. The test included 4 questions, including one eligibility question required to be added by the authors' University. A full score of 100 was required to qualify.
Figure 16: Task description in the Qualification Test. The same instructions were provided in the actual task.
Figure 17: Example explaining to Turks how to determine the annotation value.