"It is just a flu": Assessing the Effect of Watch History on YouTube's Pseudoscientific Video Recommendations
Kostantinos Papadamou, Savvas Zannettou, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Michael Sirivianos
Cyprus University of Technology, Max Planck Institute, Binghamton University, University College London, Boston University
Abstract
YouTube has revolutionized the way people discover and consume videos, becoming one of the primary news sources for Internet users. Since content on YouTube is generated by its users, the platform is particularly vulnerable to misinformative and conspiratorial videos. Even worse, the role played by YouTube's recommendation algorithm in unwittingly promoting questionable content is not well understood and could potentially make the problem worse. This can have dire real-world consequences, especially when pseudoscientific content is promoted to users at critical times, e.g., during the COVID-19 pandemic.

In this paper, we set out to characterize and detect pseudoscientific misinformation on YouTube. We collect 6.6K videos related to COVID-19, the flat earth theory, and the anti-vaccination and anti-mask movements; using crowdsourcing, we annotate them as pseudoscience, legitimate science, or irrelevant. We then train a deep learning classifier to detect pseudoscientific videos with an accuracy of 76.1%. Next, we quantify user exposure to this content on various parts of the platform (i.e., a user's homepage, recommended videos while watching a specific video, or search results) and how this exposure changes based on the user's watch history. We find that YouTube's recommendation algorithm is more aggressive in suggesting pseudoscientific content when users are searching for specific topics, while these recommendations are less common on a user's homepage or when actively watching pseudoscientific videos. Finally, we shed light on how a user's watch history substantially affects the type of recommended videos.

Introduction

User-generated video platforms like YouTube have exploded in popularity over the course of the last decade [4]. For many users, YouTube has also become one of the most important information sources for news, world events, and various other topics [13, 45]. Alas, platforms like YouTube are often fertile ground for the spread of misleading and potentially harmful information like conspiracy theories and health-related disinformation [12, 19]. YouTube (and other social media platforms) have struggled with mitigating the harm from this type of content, in part because of the sheer scale and also because of the deployment of recommendation algorithms [63]. Pure machine learning moderation tools have thus far been insufficient to moderate content, and human moderators had to be brought back into the loop [61]. Additionally, the machine learning algorithms that YouTube relies on to recommend content to users also recommend potentially harmful content [63, 46], and their opaque nature makes them difficult to audit.

For certain types of content, e.g., health-related topics, harmful videos can have devastating effects on society, especially during crises like the COVID-19 pandemic [56]. For instance, since the beginning of the pandemic, we have witnessed an explosion in the spread of pseudoscientific conspiracy theories and disinformation, e.g., theories suggesting that COVID-19 is caused by 5G [41] or Bill Gates [62], or the notorious "Plandemic" conspiracy theory documentary [2]. Unlike the scientific process, where experts develop testable hypotheses and perform experiments to provide evidence for or against each hypothesis, conspiracy theories are built up from tenuous connections between various events, with little to no actual evidence to support them.
On user-generated video platforms like YouTube, these hypotheses are often presented as facts, regardless of whether or not they have been tested, whether any evidence exists, and whether or not they have been widely debunked.

Motivated by the pressing need to mitigate the spread of pseudoscientific content, in this paper we focus on detecting and characterizing pseudoscientific and conspiratorial content on YouTube. We aim to assess how likely it is for users with different watch histories to come across pseudoscientific content on YouTube, as well as how YouTube's recommendation algorithm contributes to the discovery of pseudoscientific content.
Research Questions.
More precisely, we set out to answer the following research questions:
RQ1
Can we effectively detect and characterize pseudoscientific content on YouTube?
RQ2
What is the proportion of pseudoscientific content on the homepage of a YouTube user and how is this affected by the user's watch history?

RQ3

What is the proportion of pseudoscientific content in search results on YouTube? How is it affected by watch history?
RQ4
What is the proportion of pseudoscientific content being suggested to users when they just randomly browse YouTube?
Methodology.
To answer these questions, we look into four pseudoscientific topics: 1) COVID-19, 2) the flat earth theory, 3) the anti-vaccination movement, and 4) the anti-mask movement. We collect 6.6K unique videos and use crowdsourcing to label each of them as one of three categories: 1) science; 2) pseudoscience; or 3) irrelevant. We then train a deep learning classifier to detect pseudoscientific content across multiple topics on YouTube. Our experimental evaluation shows that the classifier outperforms SVM, Random Forest, and a BERT [18]-based classifier, reaching 0.761 accuracy (RQ1).

The classifier allows us to design and perform experiments to address RQ2-RQ4. More specifically, we use three carefully crafted user profiles, each one with a different watch history, while all other account information remains the same. We also perform experiments using a browser without a Google Account to simulate non-logged-in users, and using exclusively the YouTube Data API. To build the watch history of the three user profiles, we devise a methodology to identify the minimum number of videos that must be watched by a user before YouTube's recommendation algorithm starts generating more personalized recommendations. We build three different profiles: 1) a user interested in scientific content; 2) a user interested in pseudoscientific content; and 3) a user interested in both scientific and pseudoscientific content. Using these profiles, we perform three experiments to quantify the user's exposure to pseudoscientific content on various parts of the platform and how this exposure changes based on a user's watch history.

Findings.
Overall, our study leads to the following findings:
1. The watch history of the user substantially affects search results and related video recommendations.
2. Pseudoscientific videos are more likely to appear in search results than in the video recommendations section or the homepage of a user.
3. In traditional pseudoscience topics (e.g., flat earth), there is a higher rate of recommended pseudoscientific content than in more recent topics like COVID-19, anti-vaccination, and anti-mask. For COVID-19, we also find an even smaller amount of pseudoscientific content being suggested, which may indicate that YouTube took partly effective measures to mitigate pseudoscientific misinformation related to the COVID-19 pandemic.
Contributions.
We present the first study focusing on multiple pseudoscientific topics on YouTube while accounting for the effect of a user's watch history. To do so, we build YouTube user profiles that are representative of users viewing pseudoscientific or scientific content. Our methodology can be re-used for other studies focusing on other topics of interest. We will also publish, along with the final version of the paper, our ground truth dataset, the classifier, and the source code/crawlers used in our experiments; we are confident that this will enable the research community to shed additional light on YouTube's recommendation algorithm and its potential influence on users' consumption patterns.
Pseudoscientific Topic   # Seed Videos   # Recommended Videos
COVID-19                 378             1,645
Flat Earth               200             1,211
Anti-vaccination         346             1,759
Anti-mask                199             912
Total                    1,123           5,527

Table 1: Overview of the collected data: number of seed videos and number of their recommended videos.
Topic              Science   Pseudoscience   Irrelevant
COVID-19           607       368             721
Flat Earth         162       375             707
Anti-vaccination   363       394             1,060
Anti-mask          65        188             724
Total              1,197     1,325           3,212

Table 2: Overview of our ground truth dataset.
Dataset

In this section, we present our data collection and crowdsourced annotation methodology. We collect a set of YouTube videos related to scientific topics and then use crowdsourcing to create a ground truth dataset of videos that are pseudoscientific or not.
Since we aim to automatically detect video content that is pseudoscientific, we collect a set of YouTube videos related to several scientific and pseudoscientific topics. To do this, we first create a list of four topics whose popularity has increased over the last years: 1) COVID-19 [22], 2) the anti-vaccination movement [11], 3) the anti-mask movement [51], and 4) the flat earth theory [55].

Next, we use the YouTube Data API [17], which provides metadata of videos uploaded on YouTube, and we perform a search query for each selected topic, obtaining the first 20 videos as returned by YouTube's Data API search functionality. We refer to those videos as the "seed" videos of our data collection methodology. Additionally, for each seed video, we collect the top 10 recommended videos associated with it, as returned by the YouTube Data API. We perform our data collection between August 1, 2020 and August 10, 2020, collecting 6.6K unique videos in total (1.1K seed videos and 5.5K videos that are recommended from the seed videos). Table 1 summarizes our dataset.

For each video in our dataset, we collect the following: 1) the transcript of the video; 2) the video snippet, which is the concatenation of the video title and description; 3) a set of tags defined by the uploader; 4) video statistics such as the number of views, likes, etc.; and 5) the top 200 comments, as defined by YouTube's relevance metric, without their replies.
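To make the collection step concrete, the following is a minimal sketch (not the authors' released crawler) using the google-api-python-client library. The API key, topic strings, and result counts are placeholders; transcripts and comments are omitted here for brevity, since they require additional endpoints or separate tooling.

```python
# Hedged sketch of seed and recommended video collection via the YouTube Data API.
# API_KEY and the topic queries are placeholders.
from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"  # placeholder
TOPICS = ["covid-19", "flat earth", "anti-vaccination", "anti-mask"]

youtube = build("youtube", "v3", developerKey=API_KEY)

def search_videos(query, max_results=20):
    """Return video IDs of the top search results for a query."""
    response = youtube.search().list(
        q=query, part="id", type="video", maxResults=max_results
    ).execute()
    return [item["id"]["videoId"] for item in response.get("items", [])]

def related_videos(video_id, max_results=10):
    """Return IDs of the top videos the API relates to a given video."""
    response = youtube.search().list(
        relatedToVideoId=video_id, part="id", type="video", maxResults=max_results
    ).execute()
    return [item["id"]["videoId"] for item in response.get("items", [])]

def video_metadata(video_id):
    """Fetch snippet (title, description, tags) and statistics for a video."""
    response = youtube.videos().list(id=video_id, part="snippet,statistics").execute()
    items = response.get("items", [])
    return items[0] if items else None

dataset = {}
for topic in TOPICS:
    for seed_id in search_videos(topic):          # "seed" videos
        dataset[seed_id] = video_metadata(seed_id)
        for rec_id in related_videos(seed_id):    # their recommended videos
            dataset.setdefault(rec_id, video_metadata(rec_id))
```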
Figure 1: Architecture of our deep learning classifier for the detection of pseudoscientific videos.
To create a ground truth dataset of scientific and pseudoscientific videos, we use the Appen [9] platform to get crowdsourced annotations for all the collected videos. Each video is presented to three annotators that inspect its content and metadata to assign one of three labels:
Science.
A video falls under the "Science" category when it contains content that is related to any scientific field that systematically studies the structure and behavior of the natural world or humanity's artifacts (e.g., Chemistry, Biology, Mathematics, Computer Science, etc.). Videos that debunk science-related conspiracy theories (e.g., explaining why 5G technology is not harmful) also fall under this category. For example, a COVID-19 video with an expert estimating the total number of cases or excess deaths falls under this category if the estimation is based on the scientific consensus and official data.
Pseudoscience.
A video falls under the "Pseudoscience" category when it contains content that meets at least one of the following criteria: (a) it holds a view of the world that goes against the scientific consensus (e.g., the anti-vaccination movement); (b) it consists of statements or beliefs that are self-fulfilling or unfalsifiable (e.g., Meditation [58], Reiki healing [65], etc.); (c) it develops hypotheses that are not evaluated following the scientific method (e.g., Astrology); or (d) it explains events as secret plots by powerful forces rather than as overt activities or accidents (e.g., the 5G-coronavirus conspiracy theory).
Irrelevant.
We consider a video "Irrelevant" when it contains content that is not relevant to any scientific field and does not fall under the Pseudoscience category. For example, movie trailers, music videos, and cartoon videos are considered irrelevant. Conspiracy theory debunking videos that are not relevant to a scientific field are also considered irrelevant (e.g., a video debunking the Pizzagate conspiracy theory).
Annotation.
The annotators were given instructions on what constitutes scientific and pseudoscientific content, using appropriate descriptions and several examples, and were compensated with a small fee (under $1) per annotation. Each video is annotated by three annotators. To ease the annotation process, we provide a clear description of the annotation task and our labels, as well as all the video information that an annotator needs to inspect to correctly annotate a video. Screenshots of the instructions are available, anonymously, at [8].

The platform provides no demographic information about the annotators, other than an assurance that they are experienced annotators with high accuracy in other tasks. To assess the quality of the annotators, before allowing them to submit annotations, we ask them to annotate 5 test videos randomly selected from a set of 54 test videos (20 science, 21 pseudoscience, and 13 irrelevant) annotated by the first author of this paper. An annotator can submit annotations only when at least 3 out of the 5 test videos are annotated correctly.

We also calculate the Fleiss' kappa score (k) [25] to assess the agreement of the annotators. In the end, we obtain a kappa value that corresponds to "slight" agreement [36]. This relatively low agreement score is not surprising due to the subjective nature of the problem. For each video, we assign a label according to the majority agreement of the annotators, except for a small percentage of videos where all three annotators disagreed with each other, which we exclude from our ground truth dataset. The final ground truth dataset includes 1,197 science, 1,325 pseudoscience, and 3,212 irrelevant videos. Table 2 shows the number of videos from each class for each of the 4 topics considered.

Ethics. We only collect publicly available data, we make no attempt to de-anonymize users, and we overall follow standard ethical guidelines [6, 20, 53]. We also note that we obtained advice and ethics approval from the first author's national ethics committee to ensure that our crowdsourced annotation process does not pose risks to the annotators, despite the occasionally harmful nature of the misinformative material.
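As a worked illustration of the aggregation step just described, the sketch below computes majority labels and Fleiss' kappa with statsmodels. The toy annotation rows and variable names are placeholders, not the authors' data or code.

```python
# Hedged sketch: aggregate three crowdsourced labels per video and compute
# inter-annotator agreement (Fleiss' kappa) with statsmodels.
from collections import Counter
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

LABELS = {"science": 0, "pseudoscience": 1, "irrelevant": 2}

# Placeholder input: one row of three annotator labels per video.
annotations = [
    ("science", "science", "irrelevant"),
    ("pseudoscience", "pseudoscience", "pseudoscience"),
    ("science", "pseudoscience", "irrelevant"),  # full disagreement -> discarded
]

def majority_label(row):
    """Return the majority label, or None when all three annotators disagree."""
    label, count = Counter(row).most_common(1)[0]
    return label if count >= 2 else None

labeled = [(row, majority_label(row)) for row in annotations]
kept = [(row, lab) for row, lab in labeled if lab is not None]

# Fleiss' kappa expects a (videos x categories) table of label counts.
codes = np.array([[LABELS[l] for l in row] for row in annotations])
table, _ = aggregate_raters(codes)
print("Fleiss' kappa:", fleiss_kappa(table, method="fleiss"))
print("Kept videos:", len(kept), "of", len(labeled))
```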
Detection of Pseudoscientific Videos

To train and test a classifier that detects pseudoscientific videos, we use our ground truth dataset of 5,734 videos. Since our aim is to train a classifier able to discern pseudoscientific videos from science and irrelevant videos, we collapse our three labels into two by combining the science and the irrelevant videos into one "Other" category, resulting in a ground truth dataset that contains 1,325 pseudoscience and 4,409 "Other" videos.

Below, we provide a description of the input features, as well as the architecture of our proposed classifier. We perform an experimental evaluation to assess the performance of the classifier and an ablation study to understand which of the input features contribute the most to the classification task.
Figure 1 depicts the architecture of our proposed deep learning classifier. The classifier consists of four different branches, where each branch processes a distinct input feature type: snippet, video tags, transcript, and the top 200 comments of a video. In the end, the outputs of all four branches are concatenated and fed to a fully-connected neural network that merges them and drives the final classification.

The classifier uses fastText [23], a library implemented by Facebook for efficient learning of word and document-level vector representations, as well as for sentence classification. We use fastText to generate vector representations (embeddings) for all the available video metadata in text. Specifically, for each input feature, we use the pre-trained fastText models released in [43] and we fine-tune them using each of our corresponding input features. This allows us to extract a 300-dimensional vector representation for each of the following input features of our dataset:
Snippet.
The snippet is the concatenation of the title and the description of a video.
Tags.
Tags are words defined by the uploader of a video to describe its content.
Transcript.
We consider the transcript of the video, which comprises the subtitles uploaded by the creator of the video or auto-generated by YouTube. The transcript is one of the most important features since it describes the content of the video. For the transcript, the classifier uses the fine-tuned model to learn a vector representation of the concatenated text of the transcript.
Classifier              Accuracy   Precision   Recall   F1 Score
SVM                     0.681      0.722       0.681    0.696
Random Forest           0.721      0.703       0.721    0.711
BERT-based Classifier   0.734      0.645       0.734    0.672
Proposed Classifier     0.761      0.736       0.761    0.742
Table 3: Performance of the evaluated baselines and of the proposed deep learning classifier.

Figure 2: ROC curves (and AUC) of all the evaluated baselines and the proposed deep learning classifier (AUC: SVM 0.667, Random Forest 0.620, BERT-based Classifier 0.557, Proposed Classifier 0.674).
Comments.
We consider the top 200 comments of the video as returned by the YouTube Data API, without their replies. We first concatenate the comments of each video and use them to fine-tune the fastText model and extract vector representations.

The second part of the classifier (the "Fusing Network" in Figure 1) is essentially a four-layer, fully-connected, dense neural network. At first, we use a Flatten utility layer to merge the outputs of the four branches of the first part of the classifier, creating a 1,200-dimensional vector. This vector is processed by the four subsequent layers comprising 256, 128, 64, and 32 units, respectively, with ReLU activation. To avoid overfitting, we regularize using the Dropout technique [57]. More specifically, at each one of the four fully-connected layers, we apply a Dropout level of d = 0.5, which means that during each iteration of training half of the units of each layer do not update their parameters. Finally, the output of the Fusing Network is fed to a last dense layer of two units with softmax activation, which yields the probabilities that a particular video is pseudoscientific or not.

We implement the deep learning classifier using Keras [15] with TensorFlow as the back-end [1]. We use ten-fold stratified cross-validation [10], training and testing the classifier for binary classification using all the aforementioned input features. To deal with data imbalance, we use the Synthetic Minority Over-sampling Technique (SMOTE) [14] and oversample only the training set at each fold. For stochastic optimization, we use the Adam algorithm.

We then compare the performance of the classifier, in terms of accuracy, precision, recall, F1 score, and area under the ROC curve (AUC), against the following three baselines: 1) a Support Vector Machine (SVM) classifier; 2) a Random Forest classifier with an entropy criterion and a minimum of 2 samples per leaf; and 3) a deep neural network with the same architecture as our proposed classifier but using Google's BERT method [18], and more specifically a pre-trained BERT model [60], to learn document-level representations from all the available input features (BERT-based). For hyper-parameter tuning of baselines (1) and (2), we use the grid search strategy, while for (3) we use the same hyper-parameters as the proposed classifier. For a fair comparison, all evaluated models use all available input features.

Table 3 reports the performance of all classifiers, while Figure 2 plots their ROC curves. We observe that our classifier outperforms all baseline models across all performance metrics. Specifically, compared to Random Forest, which has the best overall performance among the baselines, we improve accuracy, precision, recall, F1 score, and AUC by 4.0%, 3.3%, 4.0%, 3.1%, and 5.4%, respectively.
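To make the pipeline above concrete, the following is a minimal sketch, not the authors' released implementation: it extracts 300-dimensional fastText sentence vectors for the four text feature types (the fine-tuning of the pre-trained models described above is omitted) and builds a fusing network with the layer sizes, dropout level, and softmax output described in the text. The model path, epoch count, and batch size are placeholder assumptions.

```python
# Hedged sketch: fastText sentence vectors for the four feature types, fused by
# a dense network (256/128/64/32 units, ReLU, dropout 0.5, softmax output).
import fasttext
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Concatenate, Dense, Dropout

FEATURES = ["snippet", "tags", "transcript", "comments"]

# Pre-trained 300-d English fastText model (placeholder path; fine-tuning on
# each feature corpus is omitted in this sketch).
ft = fasttext.load_model("cc.en.300.bin")

def embed(text):
    """300-d fastText sentence vector for one concatenated text field."""
    return ft.get_sentence_vector(text.replace("\n", " "))

def build_fusing_network():
    inputs = [Input(shape=(300,), name=f) for f in FEATURES]
    x = Concatenate()(inputs)            # 4 x 300 = 1,200-dimensional vector
    for units in (256, 128, 64, 32):     # four fully-connected layers
        x = Dense(units, activation="relu")(x)
        x = Dropout(0.5)(x)
    output = Dense(2, activation="softmax")(x)  # pseudoscience vs. "Other"
    model = Model(inputs=inputs, outputs=output)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def train(videos, labels):
    """videos: list of dicts with the four text fields; labels: 1 = pseudoscience."""
    X = [np.array([embed(v[f]) for v in videos]) for f in FEATURES]
    model = build_fusing_network()
    model.fit(X, np.array(labels), epochs=10, batch_size=32)  # placeholder values
    return model
```

The paper additionally balances the classes with SMOTE inside each of the ten stratified cross-validation folds, which this sketch omits for brevity.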
Ablation Study.

To understand which of the input features contribute the most to the classification of pseudoscientific videos, we perform an ablation study. That is, we systematically remove each of the four input feature types (as well as their associated branch in the proposed classifier's architecture) and retrain the classifier. Again, we use ten-fold cross-validation and the oversampling technique to deal with data imbalance. Table 4 shows the performance metrics of the classifiers for each possible combination of inputs. For the single-input classifiers, we observe that the comments and the transcript of the video yield the best performance, indicating that they are the most informative input features. For the classifiers trained with combinations of three input features, we observe similar performance. However, by using all the available input features we achieve better performance, which indicates that all four input features are important for the classification task.
Remarks.
Although the proposed classifier outperforms all the baselines, its accuracy points to the subjective nature of scientific vs. pseudoscientific content on YouTube. Also, the low agreement score of our crowdsourced annotation brings out the difficulty in identifying whether a video is pseudoscientific, and it is also evidence of the hurdles in devising models that automatically discover pseudoscientific content. However, we argue that our classifier can detect pseudoscientific content with acceptable performance and can provide a meaningful signal of the behavior of YouTube's recommendation algorithm with regard to recommending pseudoscientific content (RQ1).
Input Features                  Accuracy   Precision   Recall   F1 Score
Snippet                         0.727      0.715       0.737    0.717
Tags                            0.709      0.714       0.709    0.706
Transcript                      0.759      0.737       0.759    0.743
Comments                        0.761      0.701       0.761    0.692
Snippet, Tags                   0.752      0.730       0.752    0.730
Snippet, Transcript             0.727      0.733       0.727    0.725
Snippet, Comments               0.735      0.723       0.735    0.725
Tags, Transcript                0.742      0.725       0.742    0.730
Tags, Comments                  0.723      0.711       0.723    0.715
Transcript, Comments            0.749      0.731       0.749    0.738
Snippet, Tags, Transcript       0.743      0.731       0.743    0.729
Snippet, Tags, Comments         0.749      0.726       0.749    0.733
Snippet, Transcript, Comments   0.730      0.727       0.730    0.726
Tags, Transcript, Comments      0.735      0.724       0.735    0.727
All Features                    0.761      0.736       0.761    0.742
Table 4: Performance of the proposed classifier trained with all the possible combinations of the four input feature types.
Analysis

In this section, we analyze the prominence of pseudoscientific videos on various parts of the platform (i.e., the homepage, search results, and video recommendations) using a variety of experiments.
We focus our analysis on three parts of the platform: 1) the homepage; 2) the search results page; and 3) the video recommendations section (i.e., recommendations shown while watching videos). Figure 3 shows an example of each part. In our experiments, we aim to simulate the behavior of users with varying interests that watch videos on YouTube, and measure how the watch history affects the recommendation of pseudoscientific content.

Figure 3: The three main parts of the YouTube platform that we consider in our experiments: (a) homepage; (b) search results page; and (c) video recommendations section.

To do so, we create three different Google accounts, each one with a different watch history, while all other account information is the same to avoid confounding effects caused by profile differences. Additionally, we perform experiments without a Google Account to simulate not-logged-in users, as well as using the YouTube Data API (when the API provides the required functionality) to investigate the differences between YouTube as an application and the API.
User Profile Creation.
All three Google accounts were manually created and phone verification was performed. According to Hussein et al. [30], once a user forms a watch history, user profile attributes (e.g., demographics, geolocation) affect future video recommendations. Hence, since we are only interested in the watch history, each account has the same profile: 30 years old, female. To minimize the likelihood of Google automatically detecting our user profiles, we carefully crafted each one, assigning them a unique name and surname, while we read all introductory emails and performed standard phone verification. To the best of our knowledge, none of the created user profiles were banned or flagged by Google during or after our experiments.
Watch History.
Next, we build the watch history of each profile, aiming to create the following three profiles: 1) a user interested in legitimate science videos ("Science Profile"); 2) a user interested in pseudoscientific content ("Pseudoscience Profile"); and 3) a user interested in both science and pseudoscience videos ("Science/Pseudoscience Profile").

To find the minimum number of videos that must be watched by a user for YouTube to understand the user's interests and generate more personalized recommendations, we use a newly created Google account with no watch history and perform the following experiment. First, we randomly select a video, which we refer to as the "reference" video, from the "COVID-19" pseudoscientific videos of our ground truth dataset, and we collect its top 20 recommended videos. Next, we create a list of 100 randomly selected videos from the "COVID-19" pseudoscientific videos of our ground truth dataset, and we repeat the following process iteratively (a minimal sketch of the procedure is shown after the list):
1. We start by watching a video from the list of randomly selected pseudoscientific videos;
2. We visit the reference video, collect its top 20 recommendations, store them, and compare them, using the Jaccard similarity index, with all the recommendations of the reference video collected in the previous iterations;
3. If all the recommended videos of the reference video at the current iteration have also been recommended in previous iterations, we stop the experiment; otherwise, we delete the watch history of the user, increase the number of videos we watch at Step 1 by one, and proceed to the next iteration.
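The stopping procedure above can be summarized as follows; `watch_video`, `get_top_recommendations`, and `clear_watch_history` are hypothetical callables standing in for the browser automation described under Implementation, not real API functions.

```python
# Hedged sketch of finding the minimum number of watched videos after which the
# reference video's top-20 recommendations stop changing. The helper callables
# are placeholders for Selenium-driven actions.
def min_videos_for_personalization(reference_video, candidate_videos,
                                   watch_video, get_top_recommendations,
                                   clear_watch_history):
    # Recommendations of the reference video before watching anything.
    seen_recommendations = set(get_top_recommendations(reference_video, k=20))
    num_to_watch = 1
    while True:
        clear_watch_history()
        for video in candidate_videos[:num_to_watch]:   # Step 1: watch videos
            watch_video(video)
        current = set(get_top_recommendations(reference_video, k=20))  # Step 2
        if current.issubset(seen_recommendations):       # Step 3: stable -> stop
            return num_to_watch
        seen_recommendations |= current
        num_to_watch += 1
```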
Figure 4: Percentage of unique pseudoscientific videos found in the homepage of each user profile.

We find that the minimum number of videos required to be watched by a user in order for YouTube to start generating more personalized recommendations is 22.

Finally, we select the most popular science and pseudoscience videos from our ground truth dataset, based on the number of views, likes, comments, etc., and use them to personalize the profiles of each one of the three Google Accounts. Since it is not clear how the satisfaction score on videos is measured by YouTube and how watch time affects this score, during profile training we always watch the same proportion of each video's total duration and always like the videos we watch.
Controlling for noise.
Some differences in search results and recommendations are likely due to factors other than the user's watch history and personalization in general. To diminish the possibility of such noise affecting our results, we take the following steps: 1) experiments with identical search queries for all accounts are executed in parallel, to avoid updates to search results over time for specific search queries; 2) all requests to YouTube are sent through the same US-based proxies, to avoid location-related issues (i.e., differences in localized results); 3) we perform all experiments using the same browser user-agent and operating system; 4) to avoid the carry-over effect (previous search and watch activity affecting subsequent searches and recommendations), at each repetition of our experiments we use the "Delete Watch and Search History" function [28] to delete the activity of the user on YouTube from the date after the user profiles were built; and 5) similarly to the creation of the profiles' watch history, in our experiments we always watch the same proportion of each video.
Implementation.
The experiments are written as custom scripts using Selenium [44] in Python 3.7. We use Selenium since it provides all the features we need and allows for full control of the behavior and the configuration of the browser (e.g., cookie management). The Selenium WebDriver also offers a broad range of features, including JavaScript execution, which allows for more realistic simulations. For each Google Account, we create a separate Selenium instance for which we set a custom data directory, thus being able to perform manual actions on the browser before starting our experiments, e.g., performing Google authentication, installing AdBlock Plus [49] to prevent advertisements within YouTube videos from interfering with our simulations, etc. Finally, for all our experiments, we use Chromedriver 83.0.4 with user-agent Chrome 83 running on Ubuntu 16.04. During execution, the Chromedriver runs in headless mode and each Selenium instance remains in memory and stores all received cookies.
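A minimal sketch of this browser setup follows, assuming placeholder values for the profile directory, proxy address, and user-agent string.

```python
# Hedged sketch of the Selenium/Chromedriver configuration described above.
# The user-data directory, proxy, and user-agent values are placeholders.
from selenium import webdriver

def make_driver(profile_dir, proxy="us.proxy.example:8080"):
    options = webdriver.ChromeOptions()
    options.add_argument(f"--user-data-dir={profile_dir}")   # keeps the Google login/cookies
    options.add_argument("--headless")
    options.add_argument(f"--proxy-server={proxy}")          # same US-based exit point
    options.add_argument("user-agent=Mozilla/5.0 (X11; Linux x86_64) "
                         "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0 Safari/537.36")
    return webdriver.Chrome(options=options)

driver = make_driver("/data/profiles/science_profile")  # placeholder path
driver.get("https://www.youtube.com")
```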
Figure 5: Percentage of unique pseudoscientific videos found in the search results of each user profile.
We begin by assessing the degree of the problem of pseudoscientific content on the homepage of a YouTube user. To do so, using each one of the three user profiles (Science, Pseudoscience, and Science/Pseudoscience), as well as another user with no account (No Profile) that simulates the behavior of not-logged-in users, we visit the homepage of the user and collect and classify the top 20 videos, as ranked by YouTube, on the homepage of each user. We repeat this experiment 20 times with a wait time of 10 minutes between repetitions. Note that we cannot perform this experiment using the YouTube Data API, since this functionality is not supported by the API. We repeat the experiment multiple times (20 in this case) because YouTube shows different videos on the homepage each time a user visits the platform. We perform this experiment between September 26, 2020 and September 27, 2020.

Figure 4 shows the percentage of unique pseudoscientific videos in the homepage of each user profile. We find that the Science, Pseudoscience, Science/Pseudoscience, and No Profile (browser) users all encounter a similar share of pseudoscientific videos on their homepage, roughly a quarter of all unique videos (see Table 5). Overall, all user profiles receive a similar amount of pseudoscientific content on their homepage. This indicates that YouTube recommends pseudoscientific content on users' homepages irrespective of whether they have watched such videos in the past, and even of whether they have watched a lot of benign, or even contradictory, videos (e.g., Science videos).

Next, we focus on quantifying the prevalence of pseudoscientific content when users search for videos on YouTube. For this experiment, we use the 4 pseudoscientific topics used to create our ground truth dataset and, for each topic, we perform search queries on YouTube. For each search query, we retrieve the top 10 videos and use our classifier to classify each video in the result set. We repeat this experiment 20 times for each pseudoscientific topic using all three user profiles, as well as two users with no profile (one using a browser and another one using YouTube's Data API). Recall that we delete the user's watch history between each experiment repetition, as well as between the experiments performed with different search queries, to ensure that future search results are not affected by previous activity other than our initial, controlled watch history of the user. We perform this experiment between September 27, 2020 and October 1, 2020.

Overall, we find a big variation in the results across pseudoscientific topics (see Figure 5). For more traditional pseudoscientific topics like "Flat Earth," YouTube search returns more pseudoscientific content compared to the other pseudoscientific topics. Furthermore, for more controversial and emerging topics like "Anti-vaccination" and "Anti-mask," most of the videos returned by YouTube are pseudoscientific. On the other hand, for topics like "COVID-19," the majority of the returned videos are not pseudoscientific, suggesting that YouTube's recommendation algorithm does a better job in recommending less harmful videos (at least for COVID-19). In addition, for this topic, the user profiles (i.e., the watch history) affect the amount of pseudoscientific videos returned to a user, since the users with pseudoscience and science/pseudoscience watch histories receive a higher proportion of pseudoscientific content than the user with the science watch history.
The fact that, for "COVID-19," YouTube recommends much less pseudoscientific content may be related to the fact that YouTube has made substantial efforts to tackle COVID-related misinformation [33], even publishing an official policy specifically for COVID-19 medical misinformation [27]. This is not the case, however, for other controversial pseudoscientific topics like "Anti-vaccination" or "Anti-mask." Nevertheless, YouTube has recently announced that they will also attempt to target COVID-19 vaccine misinformation [64].
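As a rough illustration of how the top results can be collected in the browser for the search experiment described earlier, the sketch below navigates to YouTube's public search URL and reads the result links; the `a#video-title` CSS selector is an assumption about the result-page markup at the time of the experiments and may need updating.

```python
# Hedged sketch: collect the top-k search results with the Selenium driver from
# the previous sketch. The CSS selector is an assumption about YouTube's markup.
import time
from urllib.parse import quote_plus
from selenium.webdriver.common.by import By

def top_search_results(driver, query, k=10):
    driver.get(f"https://www.youtube.com/results?search_query={quote_plus(query)}")
    time.sleep(3)  # crude wait for the results to render
    links = driver.find_elements(By.CSS_SELECTOR, "a#video-title")
    return [l.get_attribute("href") for l in links[:k] if l.get_attribute("href")]

# Each returned video URL can then be classified with the classifier described earlier.
```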
Finally, we set out to assess how prominent the problem of pseudoscientific content is, on a larger scale, by performing controlled, live random walks on YouTube's recommendation graph, while again measuring the effect of the user's watch history. This lets us simulate the behavior of users with different interests who search the platform for a video and then subsequently watch several videos according to recommendations.
Figure 6: Percentage of pseudoscientific videos over all unique videos that the random walker encounters at hop k, per user profile.

Note that in YouTube's recommendation graph, videos are nodes and video recommendations are directed edges connecting a video to its recommended videos. For example, a YouTube video page can be seen as a snapshot of YouTube's recommendation graph showing a single node (video) and all the directed edges to its recommended videos in the graph.

For the simulations, we use the 4 pseudoscientific topics considered for the creation of our ground truth dataset. For each pseudoscientific topic, we initially perform a search query on YouTube and randomly select one video from the top ten search results. Then, we watch the selected video, obtain its top ten recommended videos, and randomly select one. Again, we watch the randomly selected video and randomly select one of its top ten recommendations. Following this process, we simulate the behavior of a user who watches videos based on recommendations, selecting the next video at random from among the top ten recommendations of the current video, until we reach five hops (i.e., 6 total videos viewed), which constitutes the end of a single live random walk. We repeat this process for 20 random walks for each search term related to our pseudoscientific topics, while at the same time classifying each video we visit using our classifier. We perform this experiment for all three Google accounts, the user with no profile (browser), and the YouTube Data API, between October 1, 2020 and October 12, 2020. A minimal sketch of a single live random walk is shown below.
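In the sketch, `top_search_results`, `watch_video`, `get_top_recommendations`, and `classify` are hypothetical callables standing in for the browser automation and our classifier; they are passed in as parameters and are not real API functions.

```python
# Hedged sketch of one live random walk of five hops, picking uniformly at
# random among the top-10 recommendations at each step.
import random

def live_random_walk(driver, query, top_search_results, watch_video,
                     get_top_recommendations, classify, hops=5):
    labels = []
    current = random.choice(top_search_results(driver, query, k=10))
    watch_video(driver, current)                 # starting video
    labels.append(classify(current))             # pseudoscience vs. other
    for _ in range(hops):                        # five hops -> 6 videos in total
        current = random.choice(get_top_recommendations(driver, current, k=10))
        watch_video(driver, current)
        labels.append(classify(current))
    return labels
```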
Next, for the live random walks of each user profile, we calculate the percentage of pseudoscientific videos encountered over all unique videos that the random walker reaches up to the k-th hop. Figure 6 plots this percentage per hop for each of the pseudoscientific topics explored.

When observing the percentage of pseudoscientific videos encountered by each user profile in all the random walks of each pseudoscientific topic, we unveil some interesting findings. Initially, we observe that for "COVID-19," "Flat Earth," and "Anti-vaccination" the amount of pseudoscientific content being suggested to the Pseudoscience profile after five hops is higher than for the Science profile (see the individual plots in Figure 6; the per-profile percentages are summarized in Table 5). However, this is not the case for relatively new and more emerging topics like "Anti-mask," where the Science profile is suggested a higher proportion of pseudoscientific content than the Pseudoscience profile after five hops (see "Anti-mask" in Figure 6).

Interestingly, we also find that, for more traditional pseudoscientific topics like "Flat Earth," YouTube suggests more pseudoscientific content to all types of users compared to the other three more recent pseudoscientific topics, another indication that YouTube has taken measures to counter the spread of pseudoscientific misinformation related to important topics like the COVID-19 pandemic.

Overall, we find that in most cases the watch history of the user does affect user recommendations and the amount of pseudoscientific content being suggested by YouTube's algorithm. This is also observed when looking at the results of the random walks performed on the browser by the user with no profile. This profile does not maintain a watch history and is recommended less pseudoscientific content than all the other user profiles after five hops in almost all random walks. Finally, the results of the random walks performed using the YouTube Data API do not consistently follow the trends of the browser-based walks with user profiles across all topics. For example, in the random walks of "Flat Earth," "Anti-vaccination," and "Anti-mask," we observe a higher amount of pseudoscientific content being suggested after five hops than in any of the other walks.
We now summarize the proportion of pseudoscientific content found in the experiments discussed above and the main take-aways from them. Table 5 reports the percentage of unique pseudoscientific videos appearing on the YouTube homepage, in search results, and in the video recommendations section for each user profile, out of all the unique videos encountered by each user profile in each experiment.

The highest percentage of pseudoscientific videos occurs in the search results section of YouTube. When looking at the results of each individual experiment, we make some interesting observations. First, for the search results experiment, we observe that for all pseudoscientific topics the Science profile encountered more pseudoscientific content when searching for these topics than the Pseudoscience profile, except for COVID-19, where the Pseudoscience profile encountered more pseudoscientific content. When it comes to video recommendations, in all the random walks of all topics except anti-mask, the Pseudoscience profile encountered more pseudoscientific content than the Science profile.

Part of the Platform      Topic              Science   Pseudoscience   Science/Pseudoscience   No Profile (Browser)   No Profile (API)
Homepage (Top 20)         -                  .4%       26.5%           25.6%                   26.                    -
Search Results (Top 10)   COVID-19           .6%       33.8%           36.9%                   33.0%                  37.
                          Flat Earth         100.0%    100.0%          100.0%                  100.0%                 100.
                          Anti-vaccination   .5%       86.1%           92.6%                   90.0%                  90.
                          Anti-mask          .6%       69.2%           68.2%                   80.0%                  54.
                          All Topics         .8%       68.2%           71.2%                   71.0%                  69.
Video Recommendations     COVID-19           .2%       27.9%           14.8%                   15.8%                  10.
                          Flat Earth         .4%       53.3%           33.7%                   34.7%                  60.
                          Anti-vaccination   .8%       29.9%           32.9%                   31.0%                  32.
                          Anti-mask           .7%      11.5%           18.6%                   6.4%                   30.
                          All Topics         .4%       29.4%           24.7%                   22.6%                  29.

Table 5: Percentage of unique pseudoscientific videos out of all videos encountered by each user profile in each pseudoscientific topic in the three main parts of the YouTube platform.

Overall, the main take-away points from our analysis include:
1. The YouTube search results and video recommendations experiments show that the watch history of the user substantially affects what videos are suggested to the user.
2. It is more likely to encounter pseudoscientific videos in the search results page of the platform (i.e., when searching for a specific topic) than in the video recommendations section or the homepage of a user.
3. For traditional pseudoscience topics (e.g., flat earth), there is a higher rate of recommended pseudoscientific content than for more emerging/controversial topics like COVID-19, anti-vaccination, and anti-mask. Furthermore, for COVID-19, we observe an even smaller amount of pseudoscientific content being suggested, which may be a result of measures YouTube took to mitigate pseudoscientific misinformation related to the COVID-19 pandemic.
Related Work

In this section, we review prior work investigating pseudoscience and misinformation on YouTube, malicious activity on YouTube, audits of the recommendation algorithm, and user personalization across the Web.
Pseudoscience and Misinformation.
The scientific community has extensively studied the phenomenon of misinformation and the credibility issues of online content [35, 67]. The majority of prior work focuses on analyzing misinformation and pseudoscientific content on other social networks [7, 5, 50, 32], although some focuses on specific pseudoscientific, misinformative, and conspiratorial topics on YouTube. For instance, Li et al. [39] focus on misinformation related to the COVID-19 pandemic on YouTube. They search YouTube on March 21, 2020 using the terms 'coronavirus' and 'COVID-19', and they collect and analyze the top 75 viewed videos from each search term, finding a non-negligible fraction of them to be misinformation. Donzelli et al. [21] focus on misinformation related to vaccines supposedly causing autism by performing a quantitative analysis of YouTube videos. They find an annual increase in the number of such videos being available on YouTube, and they conclude that public health institutions should be more active on the Web in providing reliable information about vaccination to the general public. In another work, Loeb et al. [40] focus on the dissemination of misinformation about prostate cancer on YouTube. Landrum et al. [37] investigate how users with different science comprehension and attitudes towards conspiracies are susceptible to flat earth arguments on YouTube, finding that users with lower science intelligence and higher conspiracy mentality are more likely to be recommended flat earth-related videos. Faddoul et al. [24] develop a classifier to detect conspiratorial videos on YouTube and use it to perform a longitudinal analysis of conspiracy videos. In particular, they perform a simulation of YouTube's autoplay feature, without user personalization, and find that as the conspiracy likelihood of the source video increases, so does the conspiracy likelihood of the recommended video.

Malicious activity on YouTube.
A substantial body of work focuses on detecting and studying malicious content on YouTube. Jiang et al. [31] investigate how channel partisanship affects comment moderation on YouTube, and they find that comments are usually moderated if the channel that posted the video is ideologically extreme. Zannettou et al. [66] propose a deep learning classifier for identifying videos on YouTube that use manipulative techniques to increase their views (i.e., clickbait). Agarwal et al. [3] present a binary classifier trained with user and video features to detect videos promoting hate and extremism on YouTube, while Mariconti et al. [42] build a classifier to predict, at upload time, whether or not a YouTube video will be "raided" by hateful users. Furthermore, Hussain et al. [29] analyze disinformation and crowd manipulation tactics on YouTube. They analyze the metadata of videos promoting conspiracy theories on the platform and apply social network analysis techniques to identify malicious behaviors.
YouTube’s Recommendation Algorithm and Audits.
Covington et al. [16] provide a description of YouTube's recommendation algorithm, focusing on two models: (1) a deep candidate generation model used to retrieve a small subset of videos from a large corpus; and (2) a deep ranking model used to rank those videos based on their relevance to the user's activity. Zhao et al. [68] introduce a large-scale ranking system for YouTube recommendations that extends the Wide & Deep model architecture with Multi-gate Mixture-of-Experts for multi-task learning. The proposed model ranks the candidate recommendations of a given video taking into account user engagement (e.g., user clicks) and satisfaction objectives (e.g., video likes).

Ribeiro et al. [52] perform a large-scale audit of user radicalization on YouTube: they analyze videos from Intellectual Dark Web, Alt-lite, and Alt-right channels, showing that they increasingly share the same user base. They also analyze YouTube's recommendation algorithm, finding that Alt-right channels can be reached from both Intellectual Dark Web and Alt-lite channels. Papadamou et al. [46] focus on characterizing and detecting disturbing videos on YouTube targeting young children, and propose a classifier for detecting such videos. Using the proposed classifier, they analyze YouTube's recommendation algorithm, finding that young children are likely to encounter disturbing videos when they randomly browse the platform starting from benign videos. Papadamou et al. [47] study the Incel community on YouTube and how inappropriate and hateful content relevant to this community spreads on the platform. They also analyze how such videos are recommended to users by quantifying the probability that a user will encounter an Incel-related video by virtue of YouTube's recommendation algorithm.
User Personalization.
Most of the work on user personalization focuses on Web search engines and is motivated by the concerns around the Filter Bubble effect [48]. Hannak et al. [26] propose a methodology for measuring personalization in Web search results. Applying this methodology to Google Search, they find an 11.7% difference in search results due to personalization, and that account login status and the IP address of the user affect search results. Unlike their study, we focus on YouTube and its recommendation algorithm, and we devise a different methodology that enables us to assess the effect of a user's watch history on video recommendations in all the parts of the platform.

Kliman-Silver et al. [34] propose a methodology for exploring the impact of location-based personalization on Google search results. Robertson et al. [54] focus on the personalization and composition of politically-related search engine results, and they propose a methodology for auditing Google Search using a dynamic set of political queries. Le et al. [38] investigate whether politically oriented Google News search results are personalized based on the user's browsing history. Using a "sock puppet" audit system, they find significant personalization that tends to reinforce the presumed partisanship of a user. Stöcker et al. [59] analyze the effect of extreme recommendations on YouTube, finding that YouTube's auto-play feature is problematic. They conclude that preventing inappropriate personalized recommendations is technically infeasible due to the nature of the recommendation algorithm. Finally, Hussein et al. [30] focus on measuring misinformation on YouTube and perform audit experiments considering five popular topics, like 9/11 and chemtrail conspiracy theories, to investigate whether personalization contributes to amplifying misinformation. They audit three YouTube parts, namely, search results, the Up-Next video, and the Top 5 video recommendations. They find that, once a user develops a watch history, demographic attributes affect the extent of misinformation recommended to the user. More importantly, they also find a filter bubble effect in the video recommendations section for almost all the topics they analyze. Instead, we build a classifier and use it to characterize and detect pseudoscientific misinformation on YouTube, mostly focusing on health-related topics (e.g., COVID-19), which can have devastating effects on society. We also devise a methodology that allows us to better assess the effect of a user's watch history in all the main parts of the YouTube platform, including the homepage of the user. Unlike Hussein et al., our methodology also includes the simulation of the behavior of users with distinct watch histories who search the platform for a video and subsequently watch several videos according to recommendations.
Remarks.
Unlike previous work, we build a classifier and use it to characterize and detect pseudoscientific misinformation on YouTube, aiming to understand how a user's watch history affects YouTube's recommendations across multiple parts of the platform (i.e., the homepage, the search results page, and the video recommendations section). To do this, we devise a methodology that also includes the simulation of the behavior of users with distinct watch histories. Note that we also make our dataset and source code publicly available, hoping to enable further research on understanding the effect of personalization on YouTube, as well as studies focusing on auditing the recommendation algorithm, irrespective of the topic of interest.
Conclusion

In this work, we studied pseudoscientific content on the YouTube platform. We collected a dataset of 6.6K YouTube videos and, using crowdsourcing, we annotated them according to whether or not they include pseudoscientific content. We then trained a deep learning classifier to detect pseudoscientific videos and used it to perform experiments assessing the prevalence of pseudoscientific content on various parts of the platform, while accounting for the effects of the user's watch history. To do so, we crafted a set of accounts with different watch histories.

Overall, we found that the user's watch history indeed substantially affects future user recommendations by YouTube's algorithm. This result should be taken into consideration by communities aiming to audit the recommendation algorithm and understand how it drives users' content consumption patterns. We also found that YouTube search results are more likely to return pseudoscientific content than other parts of the platform, like the recommendation engine or a user's homepage. However, we also observed a non-negligible number of pseudoscientific videos in both the video recommendations section and the homepage of the users.

Finally, by investigating the differences across multiple pseudoscientific topics, we showed that the recommendation algorithm is more likely to recommend pseudoscientific content from traditional pseudoscience topics, e.g., flat earth, compared to more controversial topics like COVID-19. This likely indicates that YouTube takes measures to counter the spread of harmful information related to important and emerging topics like the COVID-19 pandemic. However, achieving this in a proactive and timely manner across topics remains a challenge.

In addition, the low agreement score of our crowdsourced annotation, as well as the accuracy of our binary classifier, point to the difficulty in identifying whether a video is pseudoscientific or not, and also indicate that it is not easy to automate the discovery of misinformation. Hence, we believe that the most potent way for YouTube to effectively cope with misinformation on the platform is a mitigation scheme that uses deep learning models that, in turn, provide a signal of potential pseudoscientific videos to human annotators who examine the videos and make the final decision.

Our work provides insights on pseudoscientific videos on YouTube and provides a set of resources to the research community, as we will make the dataset, the classifier, and all the source code of our experiments publicly available. In particular, the ability to run these kinds of experiments while taking into account users' watch history will arguably be particularly useful to researchers focusing on demystifying YouTube's recommendation algorithm, irrespective of the topic of interest. In other words, our methodology and codebase are generic and can be used to study other topics besides pseudoscience, e.g., other conspiracy theories.
Limitations.
Naturally, our work is not without limitations. First, we use crowdworkers, who are unlikely to have any expertise in identifying pseudoscientific content. Hence, a small percentage of the annotated videos may be misclassified. However, we mitigated this issue by not including annotators with low accuracy on a classification task performed on a test dataset, and by annotating each video based on the majority agreement. Second, our ground truth dataset is relatively small for such a subjective classification task. Nonetheless, we argue that the classifier provides a meaningful signal of the behavior of YouTube's recommendation algorithm with regard to recommending pseudoscientific content. Finally, as for user personalization, we only work with watch history, which is only a fraction of the signals YouTube uses for user personalization.
Future Work.
A more comprehensive user personalization methodology that accounts for factors outside of watch history, like account characteristics, location, and user engagement, is a clear direction for future work. We also plan to conduct studies to understand how people share and view pseudoscientific content on other social networks like Twitter and Facebook, and how people interact and engage with such content.
Acknowledgments

This project has received funding from the European Union's Horizon 2020 Research and Innovation program under the CONCORDIA project (Grant Agreement No. 830927), and from the Innovation and Networks Executive Agency (INEA) under the CYberSafety II project (Grant Agreement No. 1614254). This work reflects only the authors' views; the funding agencies are not responsible for any use that may be made of the information it contains.
References

[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. TensorFlow: A System for Large-Scale Machine Learning. USENIX OSDI, 2016.
[2] ABC/Reuters. Millions view viral Plandemic video featuring discredited medical researcher Judy Mikovits. 2020.
[3] S. Agarwal and A. Sureka. A Focused Crawler for Mining Hate and Extremism Promoting Videos on YouTube. ACM Hypertext, 2014.
[4] Alexa. The top 500 sites on the web. 2020.
[5] H. Allcott and M. Gentzkow. Social Media and Fake News in the 2016 Election. Journal of Economic Perspectives, 2017.
[6] M. Allman and V. Paxson. Issues and Etiquette Concerning Use of Shared Measurement Data. ACM SIGCOMM, 2007.
[7] G. W. Allport and L. Postman. An Analysis of Rumor. Public Opinion Quarterly, 1946.
[8] Anonymous. Annotation platform - Instructions given to the crowdsourcing annotators. https://drive.google.com/file/d/1qaoPAEzaruj5C0vBd78Kxfel8IVebZkn/view?usp=sharing, 2020.
[9] Appen. AI Solutions with confident Training Data. https://appen.com/solutions/training-data/, 2020.
[10] S. Arlot, A. Celisse, et al. A Survey of Cross-Validation Procedures for Model Selection. Statistics Surveys, 2010.
[11] P. Ball. Anti-vaccine movement could undermine efforts to end coronavirus pandemic, researchers warn. 2020.
[12] N. Carne. "Conspiracies" dominate YouTube climate modification videos. https://cosmosmagazine.com/social-sciences/conspiracies-dominate-youtube-climate-modification-videos, 2019.
[13] Pew Research Center. YouTube & News. 2012.
[14] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 2002.
[15] F. Chollet et al. Keras: The Python Deep Learning Library. ASCL, 2018.
[16] P. Covington, J. Adams, and E. Sargin. Deep Neural Networks for YouTube Recommendations. ACM RecSys, 2016.
[17] Google Developers. YouTube Data API. https://developers.google.com/youtube/v3/, 2020.
[18] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805, 2018.
[19] R. Diresta. The Complexity of Simply Searching for Medical Advice. 2018.
[20] D. Dittrich and E. Kenneally. The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research. U.S. Department of Homeland Security, 2012.
[21] G. Donzelli, G. Palomba, I. Federigi, F. Aquino, L. Cioni, M. Verani, A. Carducci, and P. Lopalco. Misinformation on Vaccination: A Quantitative Analysis of YouTube Videos. Human Vaccines & Immunotherapeutics, 2018.
[22] J. D'Urso and A. Wickham. YouTube Is Letting Millions Of People Watch Videos Promoting Misinformation About The Coronavirus. 2020.
[23] Facebook. fastText - Library for efficient text classification and representation learning. 2020.
[24] M. Faddoul, G. Chaslot, and H. Farid. A Longitudinal Analysis of YouTube's Promotion of Conspiracy Videos. arXiv:2003.03318, 2020.
[25] J. L. Fleiss. Measuring Nominal Scale Agreement Among Many Raters. Psychological Bulletin, 1971.
[26] A. Hannak, P. Sapiezynski, A. Molavi Kakhki, B. Krishnamurthy, D. Lazer, A. Mislove, and C. Wilson. Measuring Personalization of Web Search. TheWebConf, 2013.
[27] YouTube Help. COVID-19 Medical Misinformation Policy. https://support.google.com/youtube/answer/9891785?hl=en, 2020.
[28] YouTube Music Help. View, delete, or pause watch history. https://support.google.com/youtubemusic/answer/6364666?hl=en, 2020.
[29] M. N. Hussain, S. Tokdemir, N. Agarwal, and S. Al-Khateeb. Analyzing Disinformation and Crowd Manipulation Tactics on YouTube. ASONAM, 2018.
[30] E. Hussein, P. Juneja, and T. Mitra. Measuring Misinformation in Video Search Platforms: An Audit Study on YouTube. SIGCHI, 2020.
[31] S. Jiang, R. E. Robertson, and C. Wilson. Bias Misperceived: The Role of Partisanship and Misinformation in YouTube Comment Moderation. ICWSM, 2019.
[32] N. F. Johnson, N. Velásquez, N. J. Restrepo, R. Leahy, N. Gabriel, S. El Oud, M. Zheng, P. Manrique, S. Wuchty, and Y. Lupu. The Online Competition Between Pro- and Anti-vaccination Views. Nature.
[33] L. Kelion. Coronavirus: YouTube tightens rules after David Icke 5G interview. 2020.
[34] C. Kliman-Silver, A. Hannak, D. Lazer, C. Wilson, and A. Mislove. Location, Location, Location: The Impact of Geolocation on Web Search Personalization. IMC, 2015.
[35] S. Kumar and N. Shah. False Information on Web and Social Media: A Survey. arXiv:1804.08559, 2018.
[36] J. R. Landis and G. G. Koch. The Measurement of Observer Agreement for Categorical Data. Biometrics, 1977.
[37] A. R. Landrum, A. Olshansky, and O. Richards. Differential Susceptibility to Misleading Flat Earth Arguments on YouTube. Media Psychology, 2019.
[38] H. Le, R. Maragh, B. Ekdale, A. High, T. Havens, and Z. Shafiq. Measuring Political Personalization of Google News Search. TheWebConf, 2019.
[39] H. O.-Y. Li, A. Bailey, D. Huynh, and J. Chan. YouTube as a Source of Information on COVID-19: A Pandemic of Misinformation? BMJ Global Health, 2020.
[40] S. Loeb, S. Sengupta, M. Butaney, J. N. Macaluso Jr, S. W. Czarniecki, R. Robbins, R. S. Braithwaite, L. Gao, N. Byrne, D. Walter, et al. Dissemination of Misinformative and Biased Information About Prostate Cancer on YouTube.
European urol-ogy , 2019.[41] M. Lynas. 5G: What’s behind the latest COVID conspiracy the-ory? https://allianceforscience.cornell.edu/blog/2020/04/5g-whats-behind-the-latest-covid-conspiracy-theory/ , 2020.[42] E. Mariconti, G. Suarez-Tangil, J. Blackburn, E. De Cristo-faro, N. Kourtellis, I. Leontiadis, J. L. Serrano, and G. Stringh-ini. “You Know What to Do”: Proactive Detection of YouTubeVideos Targeted by Coordinated Hate Attacks.
CSCW , 2019.[43] T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin.Advances in Pre-Training Distributed Word Representations.
LREC , 2018.[44] B. Muthukadan. Selenium with Python - Official Documen-tation. https://selenium-python.readthedocs.io/ , 2018.[45] N. Newman, R. Fletcher, A. Schulz, S. Andi, and R. K.Nielsen. Reuters Institute Digital News Report. https://reutersinstitute.politics.ox.ac.uk/sites/default/files/2020-06/DNR_2020_FINAL.pdf ,2020.[46] K. Papadamou, A. Papasavva, S. Zannettou, J. Blackburn,N. Kourtellis, I. Leontiadis, G. Stringhini, and M. Sirivianos.Disturbed YouTube for Kids: Characterizing and Detecting In-appropriate Videos Targeting Young Children. 2020.[47] K. Papadamou, S. Zannettou, J. Blackburn, E. De Cristofaro,G. Stringhini, and M. Sirivianos. Understanding the Incel Com-munity on YouTube. arXiv:2001.08293 , 2020.[48] E. Pariser.
The filter bubble: How the new personalized web ischanging what we read and how we think . Penguin, 2011.[49] A. Plus. Adblock Plus - The world’s https://adblockplus.org/ , 2020.[50] M. Rajdev and K. Lee. Fake and Spam Messages: DetectingMisinformation During Natural Disasters on Social Media.
WI-IAT , 2015.[51] K. Renic. Coronavirus: Dozens show up at anti-maskrally in moncton, n.b. https://globalnews.ca/news/7391000/anti-mask-rally-moncton-new-brunswick/ , 2020.[52] M. H. Ribeiro, R. Ottoni, R. West, V. A. Almeida, andW. Meira Jr. Auditing Radicalization Pathways on YouTube. In
ACM FAT* , 2020.
53] C. M. Rivers and B. L. Lewis. Ethical research standards in aworld of big data.
F1000Research , 2014.[54] R. E. Robertson, D. Lazer, and C. Wilson. Auditing the Person-alization and Composition of Politically-Related Search EngineResults Pages.
TheWebConf , 2018.[55] E. Scott. Why people believe the Earth is flat and we shouldlisten to anti-vaxxers. , 2019.[56] M. Spring. Coronavirus: False claims viewed by mil-lions on YouTube. , 2020.[57] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, andR. Salakhutdinov. Dropout: A Simple Way to Prevent NeuralNetworks from Overfitting.
JMLR , 2014.[58] B. Stetka. Where’s the Proof that Mindfulness MeditationWorks? , 2017.[59] C. St¨ocker and M. Preuss. Riding the Wave of Misclassification:How We End up with Extreme YouTube Content.
SIGCHI , 2020.[60] I. Turc, M.-W. Chang, K. Lee, and K. Toutanova. Well-read stu-dents learn better: On the importance of pre-training compactmodels. arXiv:1908.08962v2 ,2020.[63] C. G. Weissman. Despite recent crackdown, YouTubestill promotes plenty of conspiracies. , 2019.[64] N. Westman. YouTube will remove videos with COVID-19vaccine misinformation. , 2020.[65] Wikipedia. Reiki. https://en.wikipedia.org/wiki/Reiki , 2020.[66] S. Zannettou, S. Chatzis, K. Papadamou, and M. Sirivianos.The Good, the Bad and The Bait: Detecting and CharacterizingClickbait on YouTube.
IEEE Security and Privacy Workshops(SPW) , 2018.[67] S. Zannettou, M. Sirivianos, J. Blackburn, and N. Kourtellis. TheWeb of False Information: Rumors, Fake News, Hoaxes, Click-bait, and Various Other Shenanigans.
JDIQ , 2019.[68] Z. Zhao, L. Hong, L. Wei, J. Chen, A. Nath, S. Andrews,A. Kumthekar, M. Sathiamoorthy, X. Yi, and E. Chi. Recom-mending What Video to Watch Next: A Multitask Ranking Sys-tem.
ACM RecSys , 2019., 2019.