"It is just a flu": Assessing the Effect of Watch History on YouTube's Pseudoscientific Video Recommendations
Kostantinos Papadamou, Savvas Zannettou, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Michael Sirivianos
Cyprus University of Technology, Max Planck Institute, Binghamton University, University College London, Boston University
Abstract
YouTube has revolutionized the way people discover and consume videos, becoming one of the primary news sources for Internet users. Since content on YouTube is generated by its users, the platform is particularly vulnerable to misinformative and conspiratorial videos. Even worse, the role played by YouTube's recommendation algorithm in unwittingly promoting questionable content is not well understood and could potentially make the problem worse. This can have dire real-world consequences, especially when pseudoscientific content is promoted to users at critical times, e.g., during the COVID-19 pandemic.

In this paper, we set out to characterize and detect pseudoscientific misinformation on YouTube. We collect 6.6K videos related to COVID-19, the flat earth theory, and the anti-vaccination and anti-mask movements; using crowdsourcing, we annotate them as pseudoscience, legitimate science, or irrelevant. We then train a deep learning classifier to detect pseudoscientific videos with an accuracy of 76.1%. Next, we quantify user exposure to this content on various parts of the platform (i.e., a user's homepage, recommended videos while watching a specific video, or search results) and how this exposure changes based on the user's watch history. We find that YouTube's recommendation algorithm is more aggressive in suggesting pseudoscientific content when users are searching for specific topics, while these recommendations are less common on a user's homepage or when actively watching pseudoscientific videos. Finally, we shed light on how a user's watch history substantially affects the type of recommended videos.

Introduction

User-generated video platforms like YouTube have exploded in popularity over the course of the last decade [4]. For many users, YouTube has also become one of the most important information sources for news, world events, and various other topics [13, 45]. Alas, platforms like YouTube are often fertile ground for the spread of misleading and potentially harmful information like conspiracy theories and health-related disinformation [12, 19]. YouTube (and other social media platforms) have struggled with mitigating the harm from this type of content, in part because of the sheer scale and also because of the deployment of recommendation algorithms [63]. Pure machine learning moderation tools have thus far been insufficient to moderate content, and human moderators had to be brought back into the loop [61]. Additionally, the machine learning algorithms that YouTube relies on to recommend content to users also recommend potentially harmful content [63, 46], and their opaque nature makes them difficult to audit.

For certain types of content, e.g., health-related topics, harmful videos can have devastating effects on society, especially during crises like the COVID-19 pandemic [56]. For instance, since the beginning of the pandemic, we have witnessed an explosion in the spread of pseudoscientific conspiracy theories and disinformation, e.g., theories suggesting that COVID-19 is caused by 5G [41] or Bill Gates [62], or the notorious "Plandemic" conspiracy theory documentary [2]. Unlike the scientific process, where experts develop testable hypotheses and perform experiments to provide evidence for or against each hypothesis, conspiracy theories are built up from tenuous connections between various events, with little to no actual evidence to support them.
On user-generated video platforms like YouTube, these hypotheses are often presented as facts, regardless of whether or not they have been tested, whether any evidence exists, and whether or not they have been widely debunked.

Motivated by the pressing need to mitigate the spread of pseudoscientific content, in this paper we focus on detecting and characterizing pseudoscientific and conspiratorial content on YouTube. We aim to assess how likely it is for users with different watch histories to come across pseudoscientific content on YouTube, as well as how YouTube's recommendation algorithm contributes to the discovery of pseudoscientific content.
Research Questions.
More precisely, we set out to answer the following research questions:
RQ1
Can we effectively detect and characterize pseudoscientific content on YouTube?
RQ2
What is the proportion of pseudoscientific content on the homepage of a YouTube user and how is this affected by the user's watch history?

RQ3

What is the proportion of pseudoscientific content in search results on YouTube? How is it affected by watch history?
RQ4
What is the proportion of pseudoscientific content being suggested to users when they just randomly browse YouTube?
Methodology.
To answer these questions, we look into four pseudoscientific topics: 1) COVID-19, 2) the flat earth theory, 3) the anti-vaccination movement, and 4) the anti-mask movement. We collect 6.6K unique videos and use crowdsourcing to label each of them as one of three categories: 1) science; 2) pseudoscience; or 3) irrelevant. We then train a deep learning classifier to detect pseudoscientific content across multiple topics on YouTube. Our experimental evaluation shows that the classifier outperforms SVM, Random Forest, and a BERT [18]-based classifier, reaching 0.761 accuracy (RQ1).

The classifier allows us to design and perform experiments to address RQ2-RQ4. More specifically, we use three carefully crafted user profiles, each one with a different watch history, while all other account information remains the same. We also perform experiments using a browser without a Google Account to simulate non-logged-in users, and using exclusively the YouTube Data API. To build the watch history of the three user profiles, we devise a methodology to identify the minimum number of videos that must be watched by a user before YouTube's recommendation algorithm starts generating more personalized recommendations. We build three different profiles: 1) a user interested in scientific content; 2) a user interested in pseudoscientific content; and 3) a user interested in both scientific and pseudoscientific content. Using these profiles, we perform three experiments to quantify the user's exposure to pseudoscientific content on various parts of the platform and how this exposure changes based on a user's watch history.

Findings.
Overall, our study leads to the following findings:
1. The watch history of the user substantially affects search results and related video recommendations.
2. Pseudoscientific videos are more likely to appear in search results than in the video recommendations section or the homepage of a user.
3. In traditional pseudoscience topics (e.g., flat earth), there is a higher rate of recommended pseudoscientific content than in more recent topics like COVID-19, anti-vaccination, and anti-mask. For COVID-19, we also find an even smaller amount of pseudoscientific content being suggested, which may indicate that YouTube took partly effective measures to mitigate pseudoscientific misinformation related to the COVID-19 pandemic.
Contributions.
We present the first study focusing on multiple pseudoscientific topics on YouTube while accounting for the effect of a user's watch history. To do so, we build YouTube user profiles that are representative of users viewing pseudoscientific or scientific content. Our methodology can be re-used for other studies focusing on other topics of interest. We will also publish, along with the final version of the paper, our ground truth dataset, the classifier, and the source code/crawlers used in our experiments; we are confident that this will enable the research community to shed additional light on YouTube's recommendation algorithm and its potential influence on users' consumption patterns.
Pseudoscientific Topic   # Seed Videos   # Recommended Videos
COVID-19                 378             1,645
Flat Earth               200             1,211
Anti-vaccination         346             1,759
Anti-mask                199             912
Total                    1,123           5,527

Table 1: Overview of the collected data: number of seed videos and number of their recommended videos.
Topic              Science   Pseudoscience   Irrelevant
COVID-19           607       368             721
Flat Earth         162       375             707
Anti-vaccination   363       394             1,060
Anti-mask          65        188             724
Total              1,197     1,325           3,212

Table 2: Overview of our ground truth dataset.
Dataset

In this section, we present our data collection and crowdsourced annotation methodology. We collect a set of YouTube videos related to scientific topics and then use crowdsourcing to create a ground truth dataset of videos that are pseudoscientific or not.
Since we aim to automatically detect video content that is pseudoscientific, we collect a set of YouTube videos related to several scientific and pseudoscientific topics. To do this, we first create a list of four topics whose popularity has increased over the last years: 1) COVID-19 [22], 2) the anti-vaccination movement [11], 3) the anti-mask movement [51], and 4) the flat earth theory [55].

Next, we use the YouTube Data API [17], which provides metadata of videos uploaded on YouTube, and we perform a search query for each selected topic, obtaining the first 20 videos as returned by YouTube's Data API search functionality. We refer to those videos as the "seed" videos of our data collection methodology. Additionally, for each seed video, we collect the top 10 recommended videos associated with it, as returned by the YouTube Data API. We perform our data collection between August 1, 2020 and August 10, 2020, collecting 6.6K unique videos in total (1.1K seed videos and 5.5K videos that are recommended from the seed videos). Table 1 summarizes our dataset.

For each video in our dataset, we collect the following: 1) the transcript of the video; 2) the video snippet, which is the concatenation of the video title and description; 3) a set of tags defined by the uploader; 4) video statistics such as the number of views, likes, etc.; and 5) the top 200 comments, as defined by YouTube's relevance metric, without their replies.
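To make the collection step concrete, the following is a minimal sketch (not the authors' released crawler) using the google-api-python-client library. The API key, topic strings, and result counts are placeholders; transcripts and comments are omitted here for brevity, since they require additional endpoints or separate tooling.

```python
# Hedged sketch of seed and recommended video collection via the YouTube Data API.
# API_KEY and the topic queries are placeholders.
from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"  # placeholder
TOPICS = ["covid-19", "flat earth", "anti-vaccination", "anti-mask"]

youtube = build("youtube", "v3", developerKey=API_KEY)

def search_videos(query, max_results=20):
    """Return video IDs of the top search results for a query."""
    response = youtube.search().list(
        q=query, part="id", type="video", maxResults=max_results
    ).execute()
    return [item["id"]["videoId"] for item in response.get("items", [])]

def related_videos(video_id, max_results=10):
    """Return IDs of the top videos the API relates to a given video."""
    response = youtube.search().list(
        relatedToVideoId=video_id, part="id", type="video", maxResults=max_results
    ).execute()
    return [item["id"]["videoId"] for item in response.get("items", [])]

def video_metadata(video_id):
    """Fetch snippet (title, description, tags) and statistics for a video."""
    response = youtube.videos().list(id=video_id, part="snippet,statistics").execute()
    items = response.get("items", [])
    return items[0] if items else None

dataset = {}
for topic in TOPICS:
    for seed_id in search_videos(topic):          # "seed" videos
        dataset[seed_id] = video_metadata(seed_id)
        for rec_id in related_videos(seed_id):    # their recommended videos
            dataset.setdefault(rec_id, video_metadata(rec_id))
```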
Figure 1: Architecture of our deep learning classifier for the detection of pseudoscientific videos.
To create a ground truth dataset of scientific and pseudoscientific videos, we use the Appen [9] platform to get crowdsourced annotations for all the collected videos. Each video is presented to three annotators that inspect its content and metadata to assign one of three labels:
Science.
A video falls under the "Science" category when it contains content that is related to any scientific field that systematically studies the structure and behavior of the natural world or humanity's artifacts (e.g., Chemistry, Biology, Mathematics, Computer Science, etc.). Videos that debunk science-related conspiracy theories (e.g., explaining why 5G technology is not harmful) also fall under this category. For example, a COVID-19 video with an expert estimating the total number of cases or excess deaths falls under this category if the estimation is based on the scientific consensus and official data.
Pseudoscience.
A video falls under the "Pseudoscience" category when it contains content that meets at least one of the following criteria: (a) it holds a view of the world that goes against the scientific consensus (e.g., the anti-vaccination movement); (b) it consists of statements or beliefs that are self-fulfilling or unfalsifiable (e.g., Meditation [58], Reiki healing [65], etc.); (c) it develops hypotheses that are not evaluated following the scientific method (e.g., Astrology); or (d) it explains events as secret plots by powerful forces rather than as overt activities or accidents (e.g., the 5G-coronavirus conspiracy theory).
Irrelevant.
We consider a video "Irrelevant" when it contains content that is not relevant to any scientific field and does not fall under the Pseudoscience category. For example, movie trailers, music videos, and cartoon videos are considered irrelevant. Conspiracy theory debunking videos that are not relevant to a scientific field are also considered irrelevant (e.g., a video debunking the Pizzagate conspiracy theory).
Annotation.
The annotators were given instructions on what constitutes scientific and pseudoscientific content, using appropriate descriptions and several examples, and were compensated with a small fee (under $1) per annotation. Each video is annotated by three annotators. To ease the annotation process, we provide a clear description of the annotation task and our labels, as well as all the video information that an annotator needs to inspect to correctly annotate a video. Screenshots of the instructions are available, anonymously, at [8].

The platform provides no demographic information about the annotators, other than an assurance that they are experienced annotators with high accuracy in other tasks. To assess the quality of the annotators, before allowing them to submit annotations, we ask them to annotate 5 test videos randomly selected from a set of 54 test videos (20 science, 21 pseudoscience, and 13 irrelevant) annotated by the first author of this paper. An annotator can submit annotations only when at least 3 out of the 5 test videos are annotated correctly.

We also calculate the Fleiss' kappa score (k) [25] to assess the agreement of the annotators. In the end, we obtain a kappa value that corresponds to "slight" agreement [36]. This relatively low agreement score is not surprising due to the subjective nature of the problem. For each video, we assign a label according to the majority agreement of the annotators, except for a small percentage of videos where all three annotators disagreed with each other, which we exclude from our ground truth dataset. The final ground truth dataset includes 1,197 science, 1,325 pseudoscience, and 3,212 irrelevant videos. Table 2 shows the number of videos from each class for each of the 4 topics considered.

Ethics. We only collect publicly available data, we make no attempt to de-anonymize users, and we overall follow standard ethical guidelines [6, 20, 53]. We also note that we obtained advice and ethics approval from the first author's national ethics committee to ensure that our crowdsourced annotation process does not pose risks to the annotators, despite the occasionally harmful nature of the misinformative material.
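As a worked illustration of the aggregation step just described, the sketch below computes majority labels and Fleiss' kappa with statsmodels. The toy annotation rows and variable names are placeholders, not the authors' data or code.

```python
# Hedged sketch: aggregate three crowdsourced labels per video and compute
# inter-annotator agreement (Fleiss' kappa) with statsmodels.
from collections import Counter
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

LABELS = {"science": 0, "pseudoscience": 1, "irrelevant": 2}

# Placeholder input: one row of three annotator labels per video.
annotations = [
    ("science", "science", "irrelevant"),
    ("pseudoscience", "pseudoscience", "pseudoscience"),
    ("science", "pseudoscience", "irrelevant"),  # full disagreement -> discarded
]

def majority_label(row):
    """Return the majority label, or None when all three annotators disagree."""
    label, count = Counter(row).most_common(1)[0]
    return label if count >= 2 else None

labeled = [(row, majority_label(row)) for row in annotations]
kept = [(row, lab) for row, lab in labeled if lab is not None]

# Fleiss' kappa expects a (videos x categories) table of label counts.
codes = np.array([[LABELS[l] for l in row] for row in annotations])
table, _ = aggregate_raters(codes)
print("Fleiss' kappa:", fleiss_kappa(table, method="fleiss"))
print("Kept videos:", len(kept), "of", len(labeled))
```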
Detection of Pseudoscientific Videos

To train and test a classifier that detects pseudoscientific videos, we use our ground truth dataset of 5,734 videos. Since our aim is to train a classifier able to discern pseudoscientific videos from science and irrelevant videos, we collapse our three labels into two by combining the science and the irrelevant videos into one "Other" category, resulting in a ground truth dataset that contains 1,325 pseudoscience and 4,409 "Other" videos.

Below, we provide a description of the input features, as well as the architecture of our proposed classifier. We perform an experimental evaluation to assess the performance of the classifier and an ablation study to understand which of the input features contribute the most to the classification task.
Figure 1 depicts the architecture of our proposed deep learning classifier. The classifier consists of four different branches, where each branch processes a distinct input feature type: snippet, video tags, transcript, and the top 200 comments of a video. In the end, the outputs of all four branches are concatenated and fed to a fully-connected neural network that merges them and drives the final classification.

The classifier uses fastText [23], a library implemented by Facebook for efficient learning of word and document-level vector representations, as well as for sentence classification. We use fastText to generate vector representations (embeddings) for all the available video metadata in text. Specifically, for each input feature, we use the pre-trained fastText models released in [43] and we fine-tune them using each of our corresponding input features. This allows us to extract a 300-dimensional vector representation for each of the following input features of our dataset:
Snippet.
The snippet is the concatenation of the title and the description of a video.
Tags.
Tags are words defined by the uploader of a video to describe its content.
Transcript.
We consider the transcript of the video, which comprises the subtitles uploaded by the creator of the video or auto-generated by YouTube. The transcript is one of the most important features since it describes the content of the video. For the transcript, the classifier uses the fine-tuned model to learn a vector representation of the concatenated text of the transcript.
Classifier              Accuracy   Precision   Recall   F1 Score
SVM                     0.681      0.722       0.681    0.696
Random Forest           0.721      0.703       0.721    0.711
BERT-based Classifier   0.734      0.645       0.734    0.672
Proposed Classifier     0.761      0.736       0.761    0.742
Table 3: Performance of the evaluated baselines and of the proposed deep learning classifier.

Figure 2: ROC curves (and AUC) of all the evaluated baselines and the proposed deep learning classifier (AUC: SVM 0.667, Random Forest 0.620, BERT-based Classifier 0.557, Proposed Classifier 0.674).
Comments.
We consider the top 200 comments of the video as returned by the YouTube Data API, without their replies. We first concatenate the comments of each video and use them to fine-tune the fastText model and extract vector representations.

The second part of the classifier (the "Fusing Network" in Figure 1) is essentially a four-layer, fully-connected, dense neural network. At first, we use a Flatten utility layer to merge the outputs of the four branches of the first part of the classifier, creating a 1,200-dimensional vector. This vector is processed by the four subsequent layers comprising 256, 128, 64, and 32 units, respectively, with ReLU activation. To avoid overfitting, we regularize using the Dropout technique [57]. More specifically, at each one of the four fully-connected layers, we apply a Dropout level of d = 0.5, which means that during each iteration of training half of the units of each layer do not update their parameters. Finally, the output of the Fusing Network is fed to a last dense layer of two units with softmax activation, which yields the probabilities that a particular video is pseudoscientific or not.

We implement the deep learning classifier using Keras [15] with TensorFlow as the back-end [1]. We use ten-fold stratified cross-validation [10], training and testing the classifier for binary classification using all the aforementioned input features. To deal with data imbalance, we use the Synthetic Minority Over-sampling Technique (SMOTE) [14] and oversample only the training set at each fold. For stochastic optimization, we use the Adam algorithm.

We then compare the performance of the classifier, in terms of accuracy, precision, recall, F1 score, and area under the ROC curve (AUC), against the following three baselines: 1) a Support Vector Machine (SVM) classifier; 2) a Random Forest classifier with an entropy criterion and a minimum of 2 samples per leaf; and 3) a deep neural network with the same architecture as our proposed classifier but using Google's BERT method [18], and more specifically a pre-trained BERT model [60], to learn document-level representations from all the available input features (BERT-based). For hyper-parameter tuning of baselines (1) and (2), we use the grid search strategy, while for (3) we use the same hyper-parameters as the proposed classifier. For a fair comparison, all evaluated models use all available input features.

Table 3 reports the performance of all classifiers, while Figure 2 plots their ROC curves. We observe that our classifier outperforms all baseline models across all performance metrics. Specifically, compared to Random Forest, which has the best overall performance among the baselines, we improve accuracy, precision, recall, F1 score, and AUC by 4.0%, 3.3%, 4.0%, 3.1%, and 5.4%, respectively.
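To make the pipeline above concrete, the following is a minimal sketch, not the authors' released implementation: it extracts 300-dimensional fastText sentence vectors for the four text feature types (the fine-tuning of the pre-trained models described above is omitted) and builds a fusing network with the layer sizes, dropout level, and softmax output described in the text. The model path, epoch count, and batch size are placeholder assumptions.

```python
# Hedged sketch: fastText sentence vectors for the four feature types, fused by
# a dense network (256/128/64/32 units, ReLU, dropout 0.5, softmax output).
import fasttext
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Concatenate, Dense, Dropout

FEATURES = ["snippet", "tags", "transcript", "comments"]

# Pre-trained 300-d English fastText model (placeholder path; fine-tuning on
# each feature corpus is omitted in this sketch).
ft = fasttext.load_model("cc.en.300.bin")

def embed(text):
    """300-d fastText sentence vector for one concatenated text field."""
    return ft.get_sentence_vector(text.replace("\n", " "))

def build_fusing_network():
    inputs = [Input(shape=(300,), name=f) for f in FEATURES]
    x = Concatenate()(inputs)            # 4 x 300 = 1,200-dimensional vector
    for units in (256, 128, 64, 32):     # four fully-connected layers
        x = Dense(units, activation="relu")(x)
        x = Dropout(0.5)(x)
    output = Dense(2, activation="softmax")(x)  # pseudoscience vs. "Other"
    model = Model(inputs=inputs, outputs=output)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def train(videos, labels):
    """videos: list of dicts with the four text fields; labels: 1 = pseudoscience."""
    X = [np.array([embed(v[f]) for v in videos]) for f in FEATURES]
    model = build_fusing_network()
    model.fit(X, np.array(labels), epochs=10, batch_size=32)  # placeholder values
    return model
```

The paper additionally balances the classes with SMOTE inside each of the ten stratified cross-validation folds, which this sketch omits for brevity.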
Ablation Study.

To understand which of the input features contribute the most to the classification of pseudoscientific videos, we perform an ablation study. That is, we systematically remove each of the four input feature types (as well as their associated branch in the proposed classifier's architecture) and retrain the classifier. Again, we use ten-fold cross-validation and the oversampling technique to deal with data imbalance. Table 4 shows the performance metrics of the classifiers for each possible combination of inputs. For the single-input classifiers, we observe that the comments and the transcript of the video yield the best performance, indicating that they are the most informative input features. For the classifiers trained with combinations of three input features, we observe similar performance. However, by using all the available input features we achieve better performance, which indicates that all four input features are important for the classification task.
Remarks.
Although the proposed classifier outperforms all the baselines, its accuracy points to the subjective nature of scientific vs. pseudoscientific content on YouTube. Also, the low agreement score of our crowdsourced annotation brings out the difficulty in identifying whether a video is pseudoscientific, and it is also evidence of the hurdles in devising models that automatically discover pseudoscientific content. However, we argue that our classifier can detect pseudoscientific content with acceptable performance and can provide a meaningful signal of the behavior of YouTube's recommendation algorithm with regard to recommending pseudoscientific content (RQ1).
Input Features                  Accuracy   Precision   Recall   F1 Score
Snippet                         0.727      0.715       0.737    0.717
Tags                            0.709      0.714       0.709    0.706
Transcript                      0.759      0.737       0.759    0.743
Comments                        0.761      0.701       0.761    0.692
Snippet, Tags                   0.752      0.730       0.752    0.730
Snippet, Transcript             0.727      0.733       0.727    0.725
Snippet, Comments               0.735      0.723       0.735    0.725
Tags, Transcript                0.742      0.725       0.742    0.730
Tags, Comments                  0.723      0.711       0.723    0.715
Transcript, Comments            0.749      0.731       0.749    0.738
Snippet, Tags, Transcript       0.743      0.731       0.743    0.729
Snippet, Tags, Comments         0.749      0.726       0.749    0.733
Snippet, Transcript, Comments   0.730      0.727       0.730    0.726
Tags, Transcript, Comments      0.735      0.724       0.735    0.727
All Features                    0.761      0.736       0.761    0.742
Table 4: Performance of the proposed classifier trained with all the possible combinations of the four input feature types.
Analysis

In this section, we analyze the prominence of pseudoscientific videos on various parts of the platform (i.e., the homepage, search results, and video recommendations) using a variety of experiments.
We focus our analysis on three parts of the platform: 1) the homepage; 2) the search results page; and 3) the video recommendations section (i.e., recommendations shown while watching videos). Figure 3 shows an example of each part. In our experiments, we aim to simulate the behavior of users with varying interests that watch videos on YouTube, and measure how the watch history affects the recommendation of pseudoscientific content.

Figure 3: The three main parts of the YouTube platform that we consider in our experiments: (a) homepage; (b) search results page; and (c) video recommendations section.

To do so, we create three different Google accounts, each one with a different watch history, while all other account information is the same to avoid confounding effects caused by profile differences. Additionally, we perform experiments without a Google Account to simulate not-logged-in users, as well as using the YouTube Data API (when the API provides the required functionality) to investigate the differences between YouTube as an application and the API.
User Profile Creation.
All three Google accounts were manually created and phone verification was performed. According to Hussein et al. [30], once a user forms a watch history, user profile attributes (e.g., demographics, geolocation) affect future video recommendations. Hence, since we are only interested in the watch history, each account has the same profile: 30 years old, female. To minimize the likelihood of Google automatically detecting our user profiles, we carefully crafted each one, assigning them a unique name and surname, while we read all introductory emails and performed standard phone verification. To the best of our knowledge, none of the created user profiles were banned or flagged by Google during or after our experiments.
Watch History.
Next, we build the watch history of each profile, aiming to create the following three profiles: 1) a user interested in legitimate science videos ("Science Profile"); 2) a user interested in pseudoscientific content ("Pseudoscience Profile"); and 3) a user interested in both science and pseudoscience videos ("Science/Pseudoscience Profile").

To find the minimum number of videos that must be watched by a user for YouTube to understand the user's interests and generate more personalized recommendations, we use a newly created Google account with no watch history and perform the following experiment. First, we randomly select a video, which we refer to as the "reference" video, from the "COVID-19" pseudoscientific videos of our ground truth dataset, and we collect its top 20 recommended videos. Next, we create a list of 100 randomly selected videos from the "COVID-19" pseudoscientific videos of our ground truth dataset, and we repeat the following process iteratively (a minimal sketch of the procedure is shown after the list):
1. We start by watching a video from the list of randomly selected pseudoscientific videos;
2. We visit the reference video, collect its top 20 recommendations, store them, and compare them, using the Jaccard similarity index, with all the recommendations of the reference video collected in the previous iterations;
3. If all the recommended videos of the reference video at the current iteration have also been recommended in previous iterations, we stop the experiment; otherwise, we delete the watch history of the user, increase the number of videos we watch at Step 1 by one, and proceed to the next iteration.
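The stopping procedure above can be summarized as follows; `watch_video`, `get_top_recommendations`, and `clear_watch_history` are hypothetical callables standing in for the browser automation described under Implementation, not real API functions.

```python
# Hedged sketch of finding the minimum number of watched videos after which the
# reference video's top-20 recommendations stop changing. The helper callables
# are placeholders for Selenium-driven actions.
def min_videos_for_personalization(reference_video, candidate_videos,
                                   watch_video, get_top_recommendations,
                                   clear_watch_history):
    # Recommendations of the reference video before watching anything.
    seen_recommendations = set(get_top_recommendations(reference_video, k=20))
    num_to_watch = 1
    while True:
        clear_watch_history()
        for video in candidate_videos[:num_to_watch]:   # Step 1: watch videos
            watch_video(video)
        current = set(get_top_recommendations(reference_video, k=20))  # Step 2
        if current.issubset(seen_recommendations):       # Step 3: stable -> stop
            return num_to_watch
        seen_recommendations |= current
        num_to_watch += 1
```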
Figure 4: Percentage of unique pseudoscientific videos found in the homepage of each user profile.

We find that the minimum number of videos required to be watched by a user in order for YouTube to start generating more personalized recommendations is 22.

Finally, we select the most popular science and pseudoscience videos from our ground truth dataset, based on the number of views, likes, comments, etc., and use them to personalize the profiles of each one of the three Google Accounts. Since it is not clear how the satisfaction score on videos is measured by YouTube and how watch time affects this score, during profile training we always watch the same proportion of each video's total duration and always like the videos we watch.
Controlling for noise.
Some differences in search results and recommendations are likely due to factors other than the user's watch history and personalization in general. To diminish the possibility of such noise affecting our results, we take the following steps: 1) experiments with identical search queries for all accounts are executed in parallel, to avoid updates to search results over time for specific search queries; 2) all requests to YouTube are sent through the same US-based proxies, to avoid location-related issues (i.e., differences in localized results); 3) we perform all experiments using the same browser user-agent and operating system; 4) to avoid the carry-over effect (previous search and watch activity affecting subsequent searches and recommendations), at each repetition of our experiments we use the "Delete Watch and Search History" function [28] to delete the activity of the user on YouTube from the date after the user profiles were built; and 5) similarly to the creation of the profiles' watch history, in our experiments we always watch the same proportion of each video.
Implementation.
The experiments are written as custom scripts using Selenium [44] in Python 3.7. We use Selenium since it provides all the features we need and allows for full control of the behavior and the configuration of the browser (e.g., cookie management). The Selenium WebDriver also offers a broad range of features, including JavaScript execution, which allows for more realistic simulations. For each Google Account, we create a separate Selenium instance for which we set a custom data directory, thus being able to perform manual actions on the browser before starting our experiments, e.g., performing Google authentication, installing AdBlock Plus [49] to prevent advertisements within YouTube videos from interfering with our simulations, etc. Finally, for all our experiments, we use Chromedriver 83.0.4 with user-agent Chrome 83 running on Ubuntu 16.04. During execution, the Chromedriver runs in headless mode and each Selenium instance remains in memory and stores all received cookies.
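A minimal sketch of this browser setup follows, assuming placeholder values for the profile directory, proxy address, and user-agent string.

```python
# Hedged sketch of the Selenium/Chromedriver configuration described above.
# The user-data directory, proxy, and user-agent values are placeholders.
from selenium import webdriver

def make_driver(profile_dir, proxy="us.proxy.example:8080"):
    options = webdriver.ChromeOptions()
    options.add_argument(f"--user-data-dir={profile_dir}")   # keeps the Google login/cookies
    options.add_argument("--headless")
    options.add_argument(f"--proxy-server={proxy}")          # same US-based exit point
    options.add_argument("user-agent=Mozilla/5.0 (X11; Linux x86_64) "
                         "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0 Safari/537.36")
    return webdriver.Chrome(options=options)

driver = make_driver("/data/profiles/science_profile")  # placeholder path
driver.get("https://www.youtube.com")
```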
Figure 5: Percentage of unique pseudoscientific videos found in the search results of each user profile.
We begin by assessing the degree of the problem of pseudoscientific content on the homepage of a YouTube user. To do so, using each one of the three user profiles (Science, Pseudoscience, and Science/Pseudoscience), as well as another user with no account (No Profile) that simulates the behavior of not-logged-in users, we visit the homepage of the user and collect and classify the top 20 videos, as ranked by YouTube, on the homepage of each user. We repeat this experiment 20 times with a wait time of 10 minutes between repetitions. Note that we cannot perform this experiment using the YouTube Data API, since this functionality is not supported by the API. We repeat the experiment multiple times (20 in this case) because YouTube shows different videos on the homepage each time a user visits the platform. We perform this experiment between September 26, 2020 and September 27, 2020.

Figure 4 shows the percentage of unique pseudoscientific videos in the homepage of each user profile. We find that the Science, Pseudoscience, Science/Pseudoscience, and No Profile (browser) users all encounter a similar share of pseudoscientific videos on their homepage, roughly a quarter of all unique videos (see Table 5). Overall, all user profiles receive a similar amount of pseudoscientific content on their homepage. This indicates that YouTube recommends pseudoscientific content on users' homepages irrespective of whether they have watched such videos in the past, and even of whether they have watched a lot of benign, or even contradictory, videos (e.g., Science videos).

Next, we focus on quantifying the prevalence of pseudoscientific content when users search for videos on YouTube. For this experiment, we use the 4 pseudoscientific topics used to create our ground truth dataset and, for each topic, we perform search queries on YouTube. For each search query, we retrieve the top 10 videos and use our classifier to classify each video in the result set. We repeat this experiment 20 times for each pseudoscientific topic using all three user profiles, as well as two users with no profile (one using a browser and another one using YouTube's Data API). Recall that we delete the user's watch history between each experiment repetition, as well as between the experiments performed with different search queries, to ensure that future search results are not affected by previous activity other than our initial, controlled watch history of the user. We perform this experiment between September 27, 2020 and October 1, 2020.

Overall, we find a big variation in the results across pseudoscientific topics (see Figure 5). For more traditional pseudoscientific topics like "Flat Earth," YouTube search returns more pseudoscientific content compared to the other pseudoscientific topics. Furthermore, for more controversial and emerging topics like "Anti-vaccination" and "Anti-mask," most of the videos returned by YouTube are pseudoscientific. On the other hand, for topics like "COVID-19," the majority of the returned videos are not pseudoscientific, suggesting that YouTube's recommendation algorithm does a better job in recommending less harmful videos (at least for COVID-19). In addition, for this topic, the user profiles (i.e., the watch history) affect the amount of pseudoscientific videos returned to a user, since the users with pseudoscience and science/pseudoscience watch histories receive a higher proportion of pseudoscientific content than the user with the science watch history.
The fact that, for "COVID-19," YouTube recommends much less pseudoscientific content may be related to the fact that YouTube has made substantial efforts to tackle COVID-related misinformation [33], even publishing an official policy specifically for COVID-19 medical misinformation [27]. This is not the case, however, for other controversial pseudoscientific topics like "Anti-vaccination" or "Anti-mask." Nevertheless, YouTube has recently announced that they will also attempt to target COVID-19 vaccine misinformation [64].
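As a rough illustration of how the top results can be collected in the browser for the search experiment described earlier, the sketch below navigates to YouTube's public search URL and reads the result links; the `a#video-title` CSS selector is an assumption about the result-page markup at the time of the experiments and may need updating.

```python
# Hedged sketch: collect the top-k search results with the Selenium driver from
# the previous sketch. The CSS selector is an assumption about YouTube's markup.
import time
from urllib.parse import quote_plus
from selenium.webdriver.common.by import By

def top_search_results(driver, query, k=10):
    driver.get(f"https://www.youtube.com/results?search_query={quote_plus(query)}")
    time.sleep(3)  # crude wait for the results to render
    links = driver.find_elements(By.CSS_SELECTOR, "a#video-title")
    return [l.get_attribute("href") for l in links[:k] if l.get_attribute("href")]

# Each returned video URL can then be classified with the classifier described earlier.
```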
Finally, we set out to assess how prominent the problem of pseudoscientific content is, on a larger scale, by performing controlled, live random walks on YouTube's recommendation graph, while again measuring the effect of the user's watch history. This lets us simulate the behavior of users with different interests who search the platform for a video and then subsequently watch several videos according to recommendations.
Figure 6: Percentage of pseudoscientific videos over all unique videos that the random walker encounters at hop k, per user profile.

Note that in YouTube's recommendation graph, videos are nodes and video recommendations are directed edges connecting a video to its recommended videos. For example, a YouTube video page can be seen as a snapshot of YouTube's recommendation graph showing a single node (video) and all the directed edges to its recommended videos in the graph.

For the simulations, we use the 4 pseudoscientific topics considered for the creation of our ground truth dataset. For each pseudoscientific topic, we initially perform a search query on YouTube and randomly select one video from the top ten search results. Then, we watch the selected video, obtain its top ten recommended videos, and randomly select one. Again, we watch the randomly selected video and randomly select one of its top ten recommendations. Following this process, we simulate the behavior of a user who watches videos based on recommendations, selecting the next video at random from among the top ten recommendations of the current video, until we reach five hops (i.e., 6 total videos viewed), which constitutes the end of a single live random walk. We repeat this process for 20 random walks for each search term related to our pseudoscientific topics, while at the same time classifying each video we visit using our classifier. We perform this experiment for all three Google accounts, the user with no profile (browser), and the YouTube Data API, between October 1, 2020 and October 12, 2020. A minimal sketch of a single live random walk is shown below.
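In the sketch, `top_search_results`, `watch_video`, `get_top_recommendations`, and `classify` are hypothetical callables standing in for the browser automation and our classifier; they are passed in as parameters and are not real API functions.

```python
# Hedged sketch of one live random walk of five hops, picking uniformly at
# random among the top-10 recommendations at each step.
import random

def live_random_walk(driver, query, top_search_results, watch_video,
                     get_top_recommendations, classify, hops=5):
    labels = []
    current = random.choice(top_search_results(driver, query, k=10))
    watch_video(driver, current)                 # starting video
    labels.append(classify(current))             # pseudoscience vs. other
    for _ in range(hops):                        # five hops -> 6 videos in total
        current = random.choice(get_top_recommendations(driver, current, k=10))
        watch_video(driver, current)
        labels.append(classify(current))
    return labels
```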
Next, for the live random walks of each user profile, we calculate the percentage of pseudoscientific videos encountered over all unique videos that the random walker reaches up to the k-th hop. Figure 6 plots this percentage per hop for each of the pseudoscientific topics explored.

When observing the percentage of pseudoscientific videos encountered by each user profile in all the random walks of each pseudoscientific topic, we unveil some interesting findings. Initially, we observe that for "COVID-19," "Flat Earth," and "Anti-vaccination" the amount of pseudoscientific content being suggested to the Pseudoscience profile after five hops is higher than for the Science profile (see the individual plots in Figure 6; the per-profile percentages are summarized in Table 5). However, this is not the case for relatively new and more emerging topics like "Anti-mask," where the Science profile is suggested a higher proportion of pseudoscientific content than the Pseudoscience profile after five hops (see "Anti-mask" in Figure 6).

Interestingly, we also find that, for more traditional pseudoscientific topics like "Flat Earth," YouTube suggests more pseudoscientific content to all types of users compared to the other three more recent pseudoscientific topics, another indication that YouTube has taken measures to counter the spread of pseudoscientific misinformation related to important topics like the COVID-19 pandemic.

Overall, we find that in most cases the watch history of the user does affect user recommendations and the amount of pseudoscientific content being suggested by YouTube's algorithm. This is also observed when looking at the results of the random walks performed on the browser by the user with no profile. This profile does not maintain a watch history and is recommended less pseudoscientific content than all the other user profiles after five hops in almost all random walks. Finally, the results of the random walks performed using the YouTube Data API do not consistently follow the trends of the browser-based walks with user profiles across all topics. For example, in the random walks of "Flat Earth," "Anti-vaccination," and "Anti-mask," we observe a higher amount of pseudoscientific content being suggested after five hops than in any of the other walks.
We now summarize the proportion of pseudoscientific content found in the experiments discussed above and the main take-aways from them. Table 5 reports the percentage of unique pseudoscientific videos appearing on the YouTube homepage, in search results, and in the video recommendations section for each user profile, out of all the unique videos encountered by each user profile in each experiment.

The highest percentage of pseudoscientific videos occurs in the search results section of YouTube. When looking at the results of each individual experiment, we make some interesting observations. First, for the search results experiment, we observe that for all pseudoscientific topics the Science profile encountered more pseudoscientific content when searching for these topics than the Pseudoscience profile, except for COVID-19, where the Pseudoscience profile encountered more pseudoscientific content. When it comes to video recommendations, in all the random walks of all topics except anti-mask, the Pseudoscience profile encountered more pseudoscientific content than the Science profile.

Part of the Platform      Topic              Science   Pseudoscience   Science/Pseudoscience   No Profile (Browser)   No Profile (API)
Homepage (Top 20)         -                  .4%       26.5%           25.6%                   26.                    -
Search Results (Top 10)   COVID-19           .6%       33.8%           36.9%                   33.0%                  37.
                          Flat Earth         100.0%    100.0%          100.0%                  100.0%                 100.
                          Anti-vaccination   .5%       86.1%           92.6%                   90.0%                  90.
                          Anti-mask          .6%       69.2%           68.2%                   80.0%                  54.
                          All Topics         .8%       68.2%           71.2%                   71.0%                  69.
Video Recommendations     COVID-19           .2%       27.9%           14.8%                   15.8%                  10.
                          Flat Earth         .4%       53.3%           33.7%                   34.7%                  60.
                          Anti-vaccination   .8%       29.9%           32.9%                   31.0%                  32.
                          Anti-mask           .7%      11.5%           18.6%                   6.4%                   30.
                          All Topics         .4%       29.4%           24.7%                   22.6%                  29.

Table 5: Percentage of unique pseudoscientific videos out of all videos encountered by each user profile in each pseudoscientific topic in the three main parts of the YouTube platform.

Overall, the main take-away points from our analysis include:
1. The YouTube search results and video recommendations experiments show that the watch history of the user substantially affects what videos are suggested to the user.
2. It is more likely to encounter pseudoscientific videos in the search results page of the platform (i.e., when searching for a specific topic) than in the video recommendations section or the homepage of a user.
3. For traditional pseudoscience topics (e.g., flat earth), there is a higher rate of recommended pseudoscientific content than for more emerging/controversial topics like COVID-19, anti-vaccination, and anti-mask. Furthermore, for COVID-19, we observe an even smaller amount of pseudoscientific content being suggested, which may be a result of measures YouTube took to mitigate pseudoscientific misinformation related to the COVID-19 pandemic.
Related Work

In this section, we review prior work investigating pseudoscience and misinformation on YouTube, malicious activity on YouTube, audits of the recommendation algorithm, and user personalization across the Web.
Pseudoscience and Misinformation.
The scientific community has extensively studied the phenomenon of misinformation and the credibility issues of online content [35, 67]. The majority of prior work focuses on analyzing misinformation and pseudoscientific content on other social networks [7, 5, 50, 32], although some focuses on specific pseudoscientific, misinformative, and conspiratorial topics on YouTube. For instance, Li et al. [39] focus on misinformation related to the COVID-19 pandemic on YouTube. They search YouTube on March 21, 2020 using the terms 'coronavirus' and 'COVID-19', and they collect and analyze the top 75 viewed videos from each search term, finding a non-negligible fraction of them to be misinformation. Donzelli et al. [21] focus on misinformation related to vaccines supposedly causing autism by performing a quantitative analysis of YouTube videos. They find an annual increase in the number of such videos being available on YouTube, and they conclude that public health institutions should be more active on the Web in providing reliable information about vaccination to the general public. In another work, Loeb et al. [40] focus on the dissemination of misinformation about prostate cancer on YouTube. Landrum et al. [37] investigate how users with different science comprehension and attitudes towards conspiracies are susceptible to flat earth arguments on YouTube, finding that users with lower science intelligence and higher conspiracy mentality are more likely to be recommended flat earth-related videos. Faddoul et al. [24] develop a classifier to detect conspiratorial videos on YouTube and use it to perform a longitudinal analysis of conspiracy videos. In particular, they perform a simulation of YouTube's autoplay feature, without user personalization, and find that as the conspiracy likelihood of the source video increases, so does the conspiracy likelihood of the recommended video.

Malicious activity on YouTube.
A substantial body of work focuses on detecting and studying malicious content on YouTube. Jiang et al. [31] investigate how channel partisanship affects comment moderation on YouTube, and they find that comments are usually moderated if the channel that posted the video is ideologically extreme. Zannettou et al. [66] propose a deep learning classifier for identifying videos on YouTube that use manipulative techniques to increase their views (i.e., clickbait). Agarwal et al. [3] present a binary classifier trained with user and video features to detect videos promoting hate and extremism on YouTube, while Mariconti et al. [42] build a classifier to predict, at upload time, whether or not a YouTube video will be "raided" by hateful users. Furthermore, Hussain et al. [29] analyze disinformation and crowd manipulation tactics on YouTube. They analyze the metadata of videos promoting conspiracy theories on the platform and apply social network analysis techniques to identify malicious behaviors.
YouTube’s Recommendation Algorithm and Audits.
Covington et al. [16] provide a description of YouTube's recommendation algorithm, focusing on two models: (1) a deep candidate generation model used to retrieve a small subset of videos from a large corpus; and (2) a deep ranking model used to rank those videos based on their relevance to the user's activity. Zhao et al. [68] introduce a large-scale ranking system for YouTube recommendations that extends the Wide & Deep model architecture with Multi-gate Mixture-of-Experts for multi-task learning. The proposed model ranks the candidate recommendations of a given video taking into account user engagement (e.g., user clicks) and satisfaction objectives (e.g., video likes).

Ribeiro et al. [52] perform a large-scale audit of user radicalization on YouTube: they analyze videos from Intellectual Dark Web, Alt-lite, and Alt-right channels, showing that they increasingly share the same user base. They also analyze YouTube's recommendation algorithm, finding that Alt-right channels can be reached from both Intellectual Dark Web and Alt-lite channels. Papadamou et al. [46] focus on characterizing and detecting disturbing videos on YouTube targeting young children, and propose a classifier for detecting such videos. Using the proposed classifier, they analyze YouTube's recommendation algorithm, finding that young children are likely to encounter disturbing videos when they randomly browse the platform starting from benign videos. Papadamou et al. [47] study the Incel community on YouTube and how inappropriate and hateful content relevant to this community spreads on the platform. They also analyze how such videos are recommended to users by quantifying the probability that a user will encounter an Incel-related video by virtue of YouTube's recommendation algorithm.
User Personalization.
Most of the work on user personalization focuses on Web search engines and is motivated by the concerns around the Filter Bubble effect [48]. Hannak et al. [26] propose a methodology for measuring personalization in Web search results. Applying this methodology to Google Search, they find an 11.7% difference in search results due to personalization, and that account login status and the IP address of the user affect search results. Unlike their study, we focus on YouTube and its recommendation algorithm, and we devise a different methodology that enables us to assess the effect of a user's watch history on video recommendations in all the parts of the platform.

Kliman-Silver et al. [34] propose a methodology for exploring the impact of location-based personalization on Google search results. Robertson et al. [54] focus on the personalization and composition of politically-related search engine results, and they propose a methodology for auditing Google Search using a dynamic set of political queries. Le et al. [38] investigate whether politically oriented Google News search results are personalized based on the user's browsing history. Using a "sock puppet" audit system, they find significant personalization that tends to reinforce the presumed partisanship of a user. Stöcker et al. [59] analyze the effect of extreme recommendations on YouTube, finding that YouTube's auto-play feature is problematic. They conclude that preventing inappropriate personalized recommendations is technically infeasible due to the nature of the recommendation algorithm. Finally, Hussein et al. [30] focus on measuring misinformation on YouTube and perform audit experiments considering five popular topics, like 9/11 and chemtrail conspiracy theories, to investigate whether personalization contributes to amplifying misinformation. They audit three YouTube parts, namely, search results, the Up-Next video, and the Top 5 video recommendations. They find that, once a user develops a watch history, demographic attributes affect the extent of misinformation recommended to the user. More importantly, they also find a filter bubble effect in the video recommendations section for almost all the topics they analyze. Instead, we build a classifier and use it to characterize and detect pseudoscientific misinformation on YouTube, mostly focusing on health-related topics (e.g., COVID-19), which can have devastating effects on society. We also devise a methodology that allows us to better assess the effect of a user's watch history in all the main parts of the YouTube platform, including the homepage of the user. Unlike Hussein et al., our methodology also includes the simulation of the behavior of users with distinct watch histories who search the platform for a video and subsequently watch several videos according to recommendations.
Remarks.
Unlike previous work, we build a classifier and use it to characterize and detect pseudoscientific misinformation on YouTube, aiming to understand how a user's watch history affects YouTube's recommendations across multiple parts of the platform (i.e., the homepage, the search results page, and the video recommendations section). To do this, we devise a methodology that also includes the simulation of the behavior of users with distinct watch histories. Note that we also make our dataset and source code publicly available, hoping to enable further research on understanding the effect of personalization on YouTube, as well as studies focusing on auditing the recommendation algorithm, irrespective of the topic of interest.
Conclusion

In this work, we studied pseudoscientific content on the YouTube platform. We collected a dataset of 6.6K YouTube videos and, using crowdsourcing, we annotated them according to whether or not they include pseudoscientific content. We then trained a deep learning classifier to detect pseudoscientific videos and used it to perform experiments assessing the prevalence of pseudoscientific content on various parts of the platform, while accounting for the effects of the user's watch history. To do so, we crafted a set of accounts with different watch histories.

Overall, we found that the user's watch history indeed substantially affects future user recommendations by YouTube's algorithm. This result should be taken into consideration by communities aiming to audit the recommendation algorithm and understand how it drives users' content consumption patterns. We also found that YouTube search results are more likely to return pseudoscientific content than other parts of the platform, like the recommendation engine or a user's homepage. However, we also observed a non-negligible number of pseudoscientific videos in both the video recommendations section and the homepage of the users.

Finally, by investigating the differences across multiple pseudoscientific topics, we showed that the recommendation algorithm is more likely to recommend pseudoscientific content from traditional pseudoscience topics, e.g., flat earth, compared to more controversial topics like COVID-19. This likely indicates that YouTube takes measures to counter the spread of harmful information related to important and emerging topics like the COVID-19 pandemic. However, achieving this in a proactive and timely manner across topics remains a challenge.

In addition, the low agreement score of our crowdsourced annotation, as well as the accuracy of our binary classifier, point to the difficulty in identifying whether a video is pseudoscientific or not, and also indicate that it is not easy to automate the discovery of misinformation. Hence, we believe that the most potent way for YouTube to effectively cope with misinformation on the platform is a mitigation scheme that uses deep learning models that, in turn, provide a signal of potential pseudoscientific videos to human annotators who examine the videos and make the final decision.

Our work provides insights on pseudoscientific videos on YouTube and provides a set of resources to the research community, as we will make the dataset, the classifier, and all the source code of our experiments publicly available. In particular, the ability to run these kinds of experiments while taking into account users' watch history will arguably be particularly useful to researchers focusing on demystifying YouTube's recommendation algorithm, irrespective of the topic of interest. In other words, our methodology and codebase are generic and can be used to study other topics besides pseudoscience, e.g., other conspiracy theories.
Limitations.
Naturally, our work is not without limitations. First, we use crowdworkers, who are unlikely to have any expertise in identifying pseudoscientific content. Hence, a small percentage of the annotated videos may be misclassified. However, we mitigated this issue by not including annotators with low accuracy on a classification task performed on a test dataset, and by annotating each video based on the majority agreement. Second, our ground truth dataset is relatively small for such a subjective classification task. Nonetheless, we argue that the classifier provides a meaningful signal of the behavior of YouTube's recommendation algorithm with regard to recommending pseudoscientific content. Finally, as for user personalization, we only work with watch history, which is only a fraction of the signals YouTube uses for user personalization.
Future Work.
A more comprehensive user personalization methodology that accounts for factors outside of watch history, like account characteristics, location, and user engagement, is a clear direction for future work. We also plan to conduct studies to understand how people share and view pseudoscientific content on other social networks like Twitter and Facebook, and how people interact and engage with such content.
Acknowledgments

This project has received funding from the European Union's Horizon 2020 Research and Innovation program under the CONCORDIA project (Grant Agreement No. 830927), and from the Innovation and Networks Executive Agency (INEA) under the CYberSafety II project (Grant Agreement No. 1614254). This work reflects only the authors' views; the funding agencies are not responsible for any use that may be made of the information it contains.
References

[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. TensorFlow: A System for Large-Scale Machine Learning. USENIX OSDI, 2016.
[2] ABC/Reuters. Millions view viral Plandemic video featuring discredited medical researcher Judy Mikovits. 2020.
[3] S. Agarwal and A. Sureka. A Focused Crawler for Mining Hate and Extremism Promoting Videos on YouTube. ACM Hypertext, 2014.
[4] Alexa. The top 500 sites on the web. 2020.
[5] H. Allcott and M. Gentzkow. Social Media and Fake News in the 2016 Election. Journal of Economic Perspectives, 2017.
[6] M. Allman and V. Paxson. Issues and Etiquette Concerning Use of Shared Measurement Data. ACM SIGCOMM, 2007.
[7] G. W. Allport and L. Postman. An Analysis of Rumor. Public Opinion Quarterly, 1946.
[8] Anonymous. Annotation platform - Instructions given to the crowdsourcing annotators. https://drive.google.com/file/d/1qaoPAEzaruj5C0vBd78Kxfel8IVebZkn/view?usp=sharing, 2020.
[9] Appen. AI Solutions with confident Training Data. https://appen.com/solutions/training-data/, 2020.
[10] S. Arlot, A. Celisse, et al. A Survey of Cross-Validation Procedures for Model Selection. Statistics Surveys, 2010.
[11] P. Ball. Anti-vaccine movement could undermine efforts to end coronavirus pandemic, researchers warn. 2020.
[12] N. Carne. "Conspiracies" dominate YouTube climate modification videos. https://cosmosmagazine.com/social-sciences/conspiracies-dominate-youtube-climate-modification-videos, 2019.
[13] Pew Research Center. YouTube & News. 2012.
[14] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 2002.
[15] F. Chollet et al. Keras: The Python Deep Learning Library. ASCL, 2018.
[16] P. Covington, J. Adams, and E. Sargin. Deep Neural Networks for YouTube Recommendations. ACM RecSys, 2016.
[17] Google Developers. YouTube Data API. https://developers.google.com/youtube/v3/, 2020.
[18] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805, 2018.
[19] R. Diresta. The Complexity of Simply Searching for Medical Advice. 2018.
[20] D. Dittrich and E. Kenneally. The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research. U.S. Department of Homeland Security, 2012.
[21] G. Donzelli, G. Palomba, I. Federigi, F. Aquino, L. Cioni, M. Verani, A. Carducci, and P. Lopalco. Misinformation on Vaccination: A Quantitative Analysis of YouTube Videos. Human Vaccines & Immunotherapeutics, 2018.
[22] J. D'Urso and A. Wickham. YouTube Is Letting Millions Of People Watch Videos Promoting Misinformation About The Coronavirus. 2020.
[23] Facebook. fastText - Library for efficient text classification and representation learning. 2020.
[24] M. Faddoul, G. Chaslot, and H. Farid. A Longitudinal Analysis of YouTube's Promotion of Conspiracy Videos. arXiv:2003.03318, 2020.
[25] J. L. Fleiss. Measuring Nominal Scale Agreement Among Many Raters. Psychological Bulletin, 1971.
[26] A. Hannak, P. Sapiezynski, A. Molavi Kakhki, B. Krishnamurthy, D. Lazer, A. Mislove, and C. Wilson. Measuring Personalization of Web Search. TheWebConf, 2013.
[27] YouTube Help. COVID-19 Medical Misinformation Policy. https://support.google.com/youtube/answer/9891785?hl=en, 2020.
[28] YouTube Music Help. View, delete, or pause watch history. https://support.google.com/youtubemusic/answer/6364666?hl=en, 2020.
[29] M. N. Hussain, S. Tokdemir, N. Agarwal, and S. Al-Khateeb. Analyzing Disinformation and Crowd Manipulation Tactics on YouTube. ASONAM, 2018.
[30] E. Hussein, P. Juneja, and T. Mitra. Measuring Misinformation in Video Search Platforms: An Audit Study on YouTube. SIGCHI, 2020.
[31] S. Jiang, R. E. Robertson, and C. Wilson. Bias Misperceived: The Role of Partisanship and Misinformation in YouTube Comment Moderation. ICWSM, 2019.
[32] N. F. Johnson, N. Velásquez, N. J. Restrepo, R. Leahy, N. Gabriel, S. El Oud, M. Zheng, P. Manrique, S. Wuchty, and Y. Lupu. The Online Competition Between Pro- and Anti-vaccination Views. Nature.
[33] L. Kelion. Coronavirus: YouTube tightens rules after David Icke 5G interview. 2020.
[34] C. Kliman-Silver, A. Hannak, D. Lazer, C. Wilson, and A. Mislove. Location, Location, Location: The Impact of Geolocation on Web Search Personalization. IMC, 2015.
[35] S. Kumar and N. Shah. False Information on Web and Social Media: A Survey. arXiv:1804.08559, 2018.
[36] J. R. Landis and G. G. Koch. The Measurement of Observer Agreement for Categorical Data. Biometrics, 1977.
[37] A. R. Landrum, A. Olshansky, and O. Richards. Differential Susceptibility to Misleading Flat Earth Arguments on YouTube. Media Psychology, 2019.
[38] H. Le, R. Maragh, B. Ekdale, A. High, T. Havens, and Z. Shafiq. Measuring Political Personalization of Google News Search. TheWebConf, 2019.
[39] H. O.-Y. Li, A. Bailey, D. Huynh, and J. Chan. YouTube as a Source of Information on COVID-19: A Pandemic of Misinformation? BMJ Global Health, 2020.
[40] S. Loeb, S. Sengupta, M. Butaney, J. N. Macaluso Jr, S. W. Czarniecki, R. Robbins, R. S. Braithwaite, L. Gao, N. Byrne, D. Walter, et al. Dissemination of Misinformative and Biased Information About Prostate Cancer on YouTube.
European urol-ogy , 2019.[41] M. Lynas. 5G: What’s behind the latest COVID conspiracy the-ory? https://allianceforscience.cornell.edu/blog/2020/04/5g-whats-behind-the-latest-covid-conspiracy-theory/ , 2020.[42] E. Mariconti, G. Suarez-Tangil, J. Blackburn, E. De Cristo-faro, N. Kourtellis, I. Leontiadis, J. L. Serrano, and G. Stringh-ini. “You Know What to Do”: Proactive Detection of YouTubeVideos Targeted by Coordinated Hate Attacks.
CSCW , 2019.[43] T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin.Advances in Pre-Training Distributed Word Representations.
LREC , 2018.[44] B. Muthukadan. Selenium with Python - Official Documen-tation. https://selenium-python.readthedocs.io/ , 2018.[45] N. Newman, R. Fletcher, A. Schulz, S. Andi, and R. K.Nielsen. Reuters Institute Digital News Report. https://reutersinstitute.politics.ox.ac.uk/sites/default/files/2020-06/DNR_2020_FINAL.pdf ,2020.[46] K. Papadamou, A. Papasavva, S. Zannettou, J. Blackburn,N. Kourtellis, I. Leontiadis, G. Stringhini, and M. Sirivianos.Disturbed YouTube for Kids: Characterizing and Detecting In-appropriate Videos Targeting Young Children. 2020.[47] K. Papadamou, S. Zannettou, J. Blackburn, E. De Cristofaro,G. Stringhini, and M. Sirivianos. Understanding the Incel Com-munity on YouTube. arXiv:2001.08293 , 2020.[48] E. Pariser.
The filter bubble: How the new personalized web ischanging what we read and how we think . Penguin, 2011.[49] A. Plus. Adblock Plus - The world’s https://adblockplus.org/ , 2020.[50] M. Rajdev and K. Lee. Fake and Spam Messages: DetectingMisinformation During Natural Disasters on Social Media.
WI-IAT , 2015.[51] K. Renic. Coronavirus: Dozens show up at anti-maskrally in moncton, n.b. https://globalnews.ca/news/7391000/anti-mask-rally-moncton-new-brunswick/ , 2020.[52] M. H. Ribeiro, R. Ottoni, R. West, V. A. Almeida, andW. Meira Jr. Auditing Radicalization Pathways on YouTube. In
ACM FAT* , 2020.
53] C. M. Rivers and B. L. Lewis. Ethical research standards in aworld of big data.
F1000Research , 2014.[54] R. E. Robertson, D. Lazer, and C. Wilson. Auditing the Person-alization and Composition of Politically-Related Search EngineResults Pages.
TheWebConf , 2018.[55] E. Scott. Why people believe the Earth is flat and we shouldlisten to anti-vaxxers. , 2019.[56] M. Spring. Coronavirus: False claims viewed by mil-lions on YouTube. , 2020.[57] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, andR. Salakhutdinov. Dropout: A Simple Way to Prevent NeuralNetworks from Overfitting.
JMLR , 2014.[58] B. Stetka. Where’s the Proof that Mindfulness MeditationWorks? , 2017.[59] C. St¨ocker and M. Preuss. Riding the Wave of Misclassification:How We End up with Extreme YouTube Content.
SIGCHI , 2020.[60] I. Turc, M.-W. Chang, K. Lee, and K. Toutanova. Well-read stu-dents learn better: On the importance of pre-training compactmodels. arXiv:1908.08962v2 ,2020.[63] C. G. Weissman. Despite recent crackdown, YouTubestill promotes plenty of conspiracies. , 2019.[64] N. Westman. YouTube will remove videos with COVID-19vaccine misinformation. , 2020.[65] Wikipedia. Reiki. https://en.wikipedia.org/wiki/Reiki , 2020.[66] S. Zannettou, S. Chatzis, K. Papadamou, and M. Sirivianos.The Good, the Bad and The Bait: Detecting and CharacterizingClickbait on YouTube.
IEEE Security and Privacy Workshops(SPW) , 2018.[67] S. Zannettou, M. Sirivianos, J. Blackburn, and N. Kourtellis. TheWeb of False Information: Rumors, Fake News, Hoaxes, Click-bait, and Various Other Shenanigans.
JDIQ , 2019.[68] Z. Zhao, L. Hong, L. Wei, J. Chen, A. Nath, S. Andrews,A. Kumthekar, M. Sathiamoorthy, X. Yi, and E. Chi. Recom-mending What Video to Watch Next: A Multitask Ranking Sys-tem.
ACM RecSys , 2019., 2019.