"I Won the Election!": An Empirical Analysis of Soft Moderation Interventions on Twitter
Savvas Zannettou
Max Planck Institute for Informatics
[email protected]
Abstract
Over the past few years, there has been a heated debate and serious public concern regarding online content moderation, censorship, and the basic principle of free speech on the Web. To ease some of these concerns, mainstream social media platforms like Twitter and Facebook refined their content moderation systems to support soft moderation interventions. Soft moderation interventions refer to warning labels that are attached to potentially questionable or harmful content with the goal of informing other users about the content and its nature, while the content remains accessible, hence alleviating concerns related to censorship and free speech.

In this work, we perform one of the first empirical studies of soft moderation interventions on Twitter. Using a mixed-methods approach, we study the users that share tweets with warning labels on Twitter and their political leaning, the engagement that these tweets receive, and how users interact with tweets that have warning labels. Among other things, we find that 72% of the tweets with warning labels are shared by Republicans, while only 11% are shared by Democrats. By analyzing content engagement, we find that tweets with warning labels tend to receive more engagement. Also, we qualitatively analyze how users interact with content that has warning labels, finding that the most popular interactions are related to further debunking false claims, mocking the author or content of the disputed tweet, and further reinforcing or resharing false claims. Finally, we describe concrete examples of inconsistencies, such as warning labels that are incorrectly added, or warning labels that are not added to tweets despite them sharing questionable and potentially harmful information.
Introduction

Social media platforms like Twitter and Facebook are under pressure from the public to address issues related to the spread of harmful content like hate speech [9] and online misinformation [30], in particular during major events like elections. To ease the public's concerns and mitigate the effects of these important issues, platforms are continuously refining their guidelines and improving their content moderation systems [10].

Designing and implementing an ideal content moderation system is not straightforward, as there are many challenges and aspects to be considered [8]. First, content moderation should be performed in a timely manner, to ensure that harmful content is removed fast and only a small number of users are exposed to it. This is a particularly hard challenge given the scale of modern social media platforms like Twitter and Facebook. Second, content moderation should be consistent and fair across the user base. Finally, content moderation should be in accordance with basic principles of our society like freedom of speech.

Figure 1: An example of a soft moderation intervention on Twitter.

To ease concerns related to freedom of speech and censorship, Facebook and Twitter recently introduced a new feature in their content moderation systems: a type of soft moderation intervention that attaches warning labels and relevant information to content that is questionable or potentially harmful or misleading [25, 24]. An example of a soft moderation intervention is depicted in Fig. 1, where Twitter moderators attached a warning label to a tweet from President Trump related to the outcome of the 2020 US elections. These warning labels are designed to "correct" the content of the tweet and provide necessary related information, while ensuring that the freedom of speech principle is not violated.

Previous work investigated how users perceive these warning labels [13, 6, 27, 28], assessed their effectiveness and how their design can affect their effectiveness [1, 11, 16], and examined possible unintended consequences from the use of warning labels [20, 19]. Despite this rich body of research, the majority of it investigates these warning labels in artificial environments, either through interviews, surveys, or crowdsourcing studies. While these studies are useful and important, they do not consider platform-specific affordances such as user interactions with posts that have warning labels (e.g., retweets, likes, etc.). As a research community, we lack empirical evidence to understand how these warning labels are used on social media platforms like Twitter and how users interact and engage with them.

In this work, we aim to bridge this research gap by performing an empirical analysis of soft moderation interventions on Twitter. We focus on answering the following research questions:

• RQ1: What are the types of warning labels on Twitter and what kind of users have their tweets flagged more frequently? Are there differences across political leanings?
• RQ2: Is the engagement of content that includes warning labels significantly different compared to content without warning labels?
• RQ3: How do users on Twitter interact with content that includes warning labels?

To answer these research questions, we collect a dataset of tweets, shared between March 2020 and December 2020, which include soft moderation interventions (i.e., warning labels). To do this, we use Twitter's API and collect the timelines of popular verified users. We mainly focus on verified users as they usually have a large audience and their content can receive considerable engagement. Overall, we collect a set of 18K tweets that had warning labels, shared by 8.1K users between March 2020 and December 2020. Then, we follow a mixed-methods approach to analyze the engagement of tweets with warning labels and the users that share them (quantitative analysis), as well as how users interact with tweets and warning labels (qualitative analysis).
Findings.
Our main findings are:

• We find that 72.8% of the tweets that include warning labels were shared by Republicans, while only 11.6% of the tweets were shared by Democrats. This likely indicates that Republicans tend to disseminate more questionable or potentially harmful information that is eventually flagged by Twitter. Another possible explanation is that, due to the result of the 2020 US elections and claims about election fraud, Twitter's moderation team devotes more resources to moderating politics-related content coming from Republican users (RQ1).
• By analyzing the engagement of tweets, we find that tweets that have warning labels receive more engagement compared to tweets without warning labels. Also, by looking into the users that have increased engagement on tweets with warning labels, we find that most of the users that have high engagement in general have increased engagement on tweets with warning labels as well (RQ2).
• Our qualitative analysis indicates that many users interact with content that has warning labels by further debunking false claims, mocking or sharing emotions about the author/content of the questionable tweet, or by reinforcing the false claims that are included in tweets with warning labels. Also, we shed light on some of the challenges and issues that exist when designing and developing large-scale soft moderation intervention systems. We find instances where warning labels were incorrectly added (e.g., see Fig. 7) and cases where the moderation system is inconsistent (i.e., content should be flagged but it is not). Some of these cases are likely due to the dissemination of similar information across different languages (e.g., see Fig. 8) and across various formats of information like text and videos (RQ3).

Contributions. The contributions of this work are three-fold. First, to the best of our knowledge, we perform one of the first characterizations of soft moderation interventions based on empirical data from Twitter. Also, we plan to make our dataset publicly available (upon request), hence assisting the research community in conducting further studies on soft moderation interventions based on empirical data. Second, our quantitative analysis quantifies the effectiveness of soft moderation interventions on Twitter through the lens of the engagement they receive (e.g., likes, retweets, etc.). This analysis encapsulates engagement from real users interacting with timely content on Twitter, hence it complements and strengthens the findings from studies undertaken in controlled experiments (e.g., via surveys). Finally, our qualitative analysis sheds light on how users interact with content that includes warning labels and helps us understand some of the real-world challenges that exist when designing soft moderation intervention systems.
Background and Related Work

Moderation interventions on social media platforms can be applied at various levels. First, there are interventions that are applied at the post level (e.g., post removal). Second, there are interventions at the user level [17, 14], like user bans or shadow banning (i.e., limiting the visibility of a user's activity). Finally, there are community-wide moderation interventions, where platforms moderate specific sub-communities within their platforms (e.g., banning Facebook groups or subreddits) [2, 3, 18, 22, 26].

For each of the above-mentioned levels, there are two different types of interventions: hard and soft interventions. Hard moderation interventions refer to moderation actions that remove content or entities from social media platforms (posts, users, or communities). On the other hand, soft moderation interventions do not remove any content; they aim to inform other users about potential issues with the content (e.g., by adding warning labels) or limit the visibility of questionable content (shadow banning). Below, we review relevant previous work that studies post-level soft moderation interventions, as they are the most relevant to our work.
A rich body of previous work investigates soft moderation interventions mainly through interviews, surveys, and crowdsourcing studies. Specifically, Mena [13] performs an experiment using Amazon Mechanical Turk (AMT) workers to understand user perceptions of content that includes warning labels. By recruiting Facebook users and performing crowdsourcing studies, they find that the warning label had a significant effect on users' sharing intentions; that is, participants were less willing to share content with warning labels. Geeng et al. [6] focus on warning labels that are added on Twitter, Facebook, and Instagram related to COVID-19 misinformation. Through surveys, they find that users have a positive attitude towards warning labels; however, they highlight that users verify misinformation through other means as well, like searching the Web for relevant information. Saltz et al. [27] focus on warning labels added to visual misinformation related to COVID-19. By conducting in-depth interviews, they find that participants had different opinions regarding warning labels, with many participants perceiving them as politically biased and an act of censorship by the platforms.

Kaiser et al. [11] use methods from information security work to evaluate the effectiveness and the design of warning labels. Through controlled experiments, they find that despite the existence of warning labels, users seek information via other means, thus confirming the findings from [6]. Also, by performing crowdsourcing studies and asking users about eight warning label designs, they conclude that users' information-seeking behavior is significantly affected by the design of the warning label. Seo et al. [28] investigate user perceptions when users are exposed to fact-checking and machine-learning-generated warning labels. Through experiments on AMT, they find that users trust fact-checking warning labels more than machine-learning-generated ones. Moravec et al. [16] highlight that the design of warning labels (i.e., how warnings are presented to users) can significantly change their effectiveness. Also, they emphasize that clearly explaining the warning labels to users can lead to increased effectiveness. Bode et al. [1] study the related-stories functionality on Facebook as a means to detect or debunk misinformation. By conducting surveys, they find that when related stories debunk a misinformation story, it significantly reduces the participants' misperceptions (beliefs that are not supported by evidence or expert opinion [19]).

Other previous work demonstrates some unintentional consequences from the use of warning labels. Specifically, Pennycook et al. [20] conduct Amazon Mechanical Turk studies and show an implied truth effect, where posts that include misinformation and are not accompanied by a warning label are considered credible. Also, Nyhan and Reifler [19] conduct controlled experiments to assess the effectiveness of warning labels on political false information. They highlight that there is a backfire effect, where participants strengthen their support for false political stories after seeing the warning label that includes a correction. Pennycook et al. [21] emphasize the existence of the illusory truth effect, where users tend to believe false information after getting exposed to it multiple times or for an extended time period, despite the fact that the false information is accompanied by a warning label.
Remarks.
Previous work investigated soft moderation interventions in artificial testing environments like interviews, surveys, and crowdsourcing studies. This previous work is particularly important, as it helps us understand how people intend to interact and engage with content that includes warning labels or corrections. However, these studies do not capture platform-specific peculiarities, and they do not adequately capture how people interact and engage with warning labels in realistic scenarios (e.g., when reading a tweet from the US President). In this work, we address these limitations by performing, to the best of our knowledge, one of the first empirical analyses of soft moderation interventions on Twitter.

Table 1: Overview of our dataset.
                                                     Tweets    Users
Tweets with warning labels                            2,244      853
Quoted - Warning on quoted tweet (e.g., Fig. 9)
Quoted - Warning on comment above (e.g., Fig. 8)        219       98
Quoted - Warning on both (e.g., Fig. 7)                  50       30
Total                                                18,765    8,143
Dataset

We start our data collection on Twitter, and in particular on verified users, which are users who have an "especially large audience or are notable in government, news, entertainment, or another designated category" (definition obtained from the Twitter website). We mainly focus on verified users as they usually have a large audience and can have a substantial impact on online discussions, hence moderating content from these users is important.

We collect the dataset of Twitter verified users from Pushshift (https://files.pushshift.io/twitter/TW_verified_users.ndjson.zst). The dataset includes Twitter account metadata for 351,655 verified users. Then, for each user, we use Twitter's API to obtain recent tweets/retweets shared by these users (i.e., their timeline). We also collect soft-moderation-specific metadata for each tweet: this includes whether a tweet is accompanied by a warning label and relevant metadata (e.g., label text, landing URL, etc.). Note that, due to rate limiting of the Twitter API, we tried to collect activity only from the top 170,506 users based on the number of their followers (corresponding to 48.4% of all the Twitter verified accounts in the Pushshift dataset). We managed to collect data for 168,126 users, as the rest of them were either deleted, suspended, or set to private. Our data collection was conducted between December 7, 2020 and December 31, 2020. Overall, we collect 79,361,081 tweets shared during 2020 from 168,126 users.

Next, we filter all tweets that had soft moderation interventions (i.e., warning labels) from our dataset; we find 29,232 tweets from 9,334 verified users. This dataset also includes retweets of tweets with warning labels, as well as tweets that quote a tweet with a warning label. Due to this, we rehydrate, using the Twitter API, all quoted and retweeted tweets that had a warning label; we get an additional 3,106 tweets from 1,888 users. Note that this procedure resulted in the acquisition of tweets from unverified users. This is because verified users in our dataset retweeted or quoted tweets from unverified users. Given that this content appears to the followers of verified users, we keep tweets from unverified users in our dataset as well.

After excluding all retweets, our final dataset includes 18,765 tweets that include warning labels (either on the tweet itself or on referenced tweets like quoted tweets) from 8,143 users (see Table 1). We split our dataset into two parts: 1) tweets that have warning labels attached to them (first row in Table 1); and 2) tweets that quote other tweets, where any (or both) of the tweets have warning labels (see second-fourth row in Table 1). For the remainder of this paper, we call the first part of our dataset tweets with warning labels and the second part of our dataset quoted tweets.
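The collection pipeline described above can be sketched roughly as follows. This is an illustrative sketch only: it assumes the tweepy library, a local copy of the Pushshift verified-users file with an "id" field, and a hypothetical "warning_label" key for the soft-moderation metadata in the returned payload; the exact field names exposed by Twitter's API are not documented here.

# Illustrative sketch of the data collection described above (assumptions noted in comments).
import json
import tweepy

auth = tweepy.OAuth1UserHandler("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def fetch_timeline(user_id, max_tweets=3200):
    """Collect a user's recent tweets/retweets (their timeline)."""
    cursor = tweepy.Cursor(api.user_timeline, user_id=user_id,
                           count=200, tweet_mode="extended")
    return [status._json for status in cursor.items(max_tweets)]

def has_warning_label(tweet):
    # Placeholder check: keep tweets whose payload carries soft-moderation
    # metadata (label text, landing URL, etc.); "warning_label" is a hypothetical key.
    return tweet.get("warning_label") is not None

# Verified users from the Pushshift dump; assumed to be sorted by follower count upstream.
with open("TW_verified_users.ndjson") as f:
    verified_ids = [json.loads(line)["id"] for line in f]

flagged = [tweet
           for uid in verified_ids[:170506]
           for tweet in fetch_timeline(uid)
           if has_warning_label(tweet)]

Quoted and retweeted tweets that carry warning labels can then be rehydrated through the API's status-lookup endpoint, as described above.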
Ethical considerations and data availability. We emphasize that we collect and work entirely with publicly available data, as we do not collect any data from users who have a private account. Overall, we follow standard ethical research standards [23], such as refraining from tracking users across sites and compromising user privacy. Also, to help advance empirical research on soft moderation interventions on Twitter, we will make publicly available (upon request) the tweet IDs and their corresponding warning labels.
Warning Labels and Users (RQ1)

In this section, we analyze the different types of warning labels and how they are shared over time. Also, we perform a user-based analysis of the users who shared tweets with warning labels or quoted tweets, aiming to uncover differences across users with opposing political leanings.
We start by looking into the different types of warning labels that exist in our dataset. To do this, we focus on tweets that include warning labels (see first row in Table 1): specifically, 2,244 tweets posted by 853 users between March 7, 2020 and December 30, 2020.

Table 2: Warning labels in our dataset.
Warning label                                                                                   Tweets
This claim about election fraud is disputed                                                     1,305 (58.1%)
Learn about US 2020 election security efforts                                                   271 (12.1%)
Manipulated media                                                                               196 (8.7%)
Learn how voting by mail is safe and secure                                                     132 (5.8%)
Official sources may not have called the race when this was Tweeted                             101 (4.5%)
Multiple sources called this election differently                                               96 (4.2%)
Election officials have certified Joe Biden as the winner of the U.S. Presidential election     64 (2.8%)
Some votes may still need to be counted                                                         26 (1.1%)
Get the facts about COVID-19                                                                    26 (1.1%)
Esta reivindicação de fraude é contestada                                                       11 (0.5%)
Saiba por que urnas eletrônicas são seguras                                                     11 (0.5%)
Sources called this election differently                                                        3 (0.1%)
Get the facts about mail-in ballots                                                             2 (0.1%)

Table 2 shows all warning labels in our dataset along with their respective frequency and percentage over all the tweets. Overall, we find 13 different warning labels, with the majority of them being related to the 2020 US elections. For instance, the most popular warning label in our dataset is "This claim about election fraud is disputed", covering 58% of all tweets. Other 2020 US election warning labels are related to the security of the elections, like "Learn about US 2020 election security efforts" (12%) and "Learn how voting by mail is safe and secure" (5.8%), as well as to the outcome of the elections, like "Multiple sources called this election differently" (4.2%) and "Election officials have certified Joe Biden as the winner of the U.S. Presidential election" (2.8%). Interestingly, we also find warning labels referring to the 2020 US elections written in other languages (i.e., Portuguese). We find 0.49% of tweets including "Esta reivindicação de fraude é contestada" (which translates to "This fraud claim is disputed") and "Saiba por que urnas eletrônicas são seguras" (which translates to "Find out why electronic voting machines are safe"). Apart from politics-related warning labels, we find a general-purpose warning label that aims to inform users about manipulated media (e.g., images or videos), covering 8.7% of all tweets in our dataset. Finally, we find a COVID-19-specific warning label, "Get the facts about COVID-19" (1.15%), that aims to inform users about health-related issues and in particular the COVID-19 pandemic.

Next, we analyze how these warning labels are shared over time. Note that 92.8% of the tweets are shared between November 1, 2020 and December 30, 2020. Fig. 2 shows how the top 10 most popular warning labels in our dataset are shared over time (we focus on the period between November 1, 2020 and December 30, 2020 for readability purposes). We plot the frequency of warning labels over time and find two different temporal patterns. First, we find warning labels that are short-lived, as the majority of their appearances on tweets happen within a short period of time. Concretely, both "Learn about US 2020 election security efforts" and "Official sources may not have called the race when this was Tweeted" are exclusively used during the first week of November 2020. On the other hand, we find warning labels that are long-lived. For example, the label "This claim about election fraud is disputed" is used for the entirety of the period between November and December 2020. Overall, these results indicate that warning labels are time and context dependent, with some of them being short-lived (a few days) and some of them being long-lived (several months).

Figure 2: Number of tweets that include a warning label for each day between November 1, 2020 and December 30, 2020.
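As a rough illustration of how Table 2 and Figure 2 can be derived from the flagged tweets, the sketch below assumes a pandas DataFrame built from the collected tweets, with hypothetical "label" and "created_at" columns.

# Sketch of the label frequency and temporal analysis; "label" and "created_at"
# are assumed column names for the warning-label text and the tweet timestamp.
import pandas as pd

df = pd.DataFrame(flagged)  # tweets with warning labels (see the Dataset section)
df["created_at"] = pd.to_datetime(df["created_at"], utc=True).dt.tz_localize(None)

# Frequency and share of each warning label (cf. Table 2).
counts = df["label"].value_counts()
share = (counts / len(df) * 100).round(1)
print(pd.concat([counts, share], axis=1, keys=["tweets", "percent"]))

# Daily number of tweets per label between Nov 1 and Dec 30, 2020 (cf. Figure 2).
window = df[(df["created_at"] >= "2020-11-01") & (df["created_at"] <= "2020-12-30")]
daily = (window.groupby([window["created_at"].dt.date, "label"])
               .size()
               .unstack(fill_value=0))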
Here, we look into the users who share tweets with warning labels. Recall that our data collection involves 168K users and only 853 of them share tweets that have warning labels, hence indicating that only a small percentage (0.5%) of Twitter users have warning labels attached to their content. As per Fig. 3, out of the 853 users, 70% of them had only one tweet with a warning label, while only 3.6% of these users had at least 10 tweets with warning labels. Overall, only a small percentage of users have warning labels on multiple of their tweets.

Figure 3: CDF of the number of tweets with warning labels per user.
Users’ political leaning.
As we described above, our dataset has a strong political nature and the majority of the warning labels refer to claims about the 2020 US elections (e.g., claims about election fraud, see Table 2). Motivated by this, we augment our dataset with information about the political leaning of each user that shared tweets with warning labels. To infer users' political leaning, we use the methodology presented by [12] and in particular the Political Bias Inference API that is publicly available by [15]. The API generates a vector with the topical interests of each user and their frequency. To do this, the API collects all the friends of the user (i.e., the people that the user follows), generates all the topics inferred for each friend using the methodology in [7, 29], and calculates a vector with all the topics and their frequencies. Finally, by comparing the topical vectors to a ground-truth dataset of Republican and Democrat Twitter users, the API infers whether a Twitter user has a Republican, Democrat, or Neutral political leaning.
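The comparison step can be illustrated with the sketch below. Note that this is not the actual implementation of the Political Bias Inference API: the cosine-similarity comparison and the 0.1 margin used for the Neutral class are assumptions made purely for illustration.

# Illustrative sketch of inferring a user's leaning from topic-frequency vectors;
# the real Political Bias Inference API may differ in its comparison and thresholds.
import math

def cosine(u, v):
    """Cosine similarity between two sparse topic->frequency dictionaries."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def infer_leaning(user_topics, rep_topics, dem_topics, margin=0.1):
    """Compare a user's topic vector (aggregated over their friends' inferred
    topics) against ground-truth Republican and Democrat topic vectors."""
    sim_rep = cosine(user_topics, rep_topics)
    sim_dem = cosine(user_topics, dem_topics)
    if abs(sim_rep - sim_dem) < margin:   # hypothetical margin for "Neutral"
        return "Neutral"
    return "Republican" if sim_rep > sim_dem else "Democrat"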
In this work, we use the Political Bias Inference API, between January 3 and January 10, 2021, to infer the political leaning of the 8,142 Twitter users in our dataset. Table 3 reports the number of tweets and users per inferred political leaning for the entire dataset, broken down into tweets that had warning labels and quoted tweets. We observe that, for the entire dataset, 51% of the users are Democrats, 13.4% are Republicans, and almost 32% are inferred as neutral, while for the remaining 1.4% we were unable to infer their political leaning. This is because some users were either suspended or had made their accounts private by the time we were collecting their friend lists, hence the Political Bias Inference API was unable to make an inference.

Table 3: Inferred political leaning of users who shared tweets with warning labels or quoted a tweet that had a warning label.
              All                       Tweets with warning labels   Quoted tweets
              Tweets       Users        Tweets      Users            Tweets       Users
Republicans
Neutral
Democrats
N/A           176 (0.9%)   119 (1.4%)   54 (2.4%)   47 (5.5%)        127 (0.7%)   75 (0.9%)

Interestingly, when looking at the tweets with warning labels in Table 3, we find that the majority of the tweets with warning labels are shared by Republicans (72% of all tweets vs. 11% for Democrats). This likely indicates that, due to the context and developments related to the 2020 US elections, Republicans tend to share more questionable content that is more likely to receive warning labels from Twitter. Another possible explanation is that Twitter devotes more resources to moderating content coming from Republican users. For the quoted tweets, we observe that Democrats tend to comment on tweets with warning labels more often than Republicans (56% vs. 16.5% for Republicans).

Table 4: Top 20 users who had the most warning labels on their tweets. (U) refers to unverified users who exist in our dataset because verified users retweeted or quoted tweets of theirs that had warning labels. We also report the account status of each user as of January 9, 2021.
User                Political leaning   Account status   Tweets
realDonaldTrump     Republican          Suspended        321 (14.3%)
TeamTrump           Republican          Suspended        105 (4.6%)
gatewaypundit       Republican          Active           71 (3.1%)
va_shiva            Neutral             Active           38 (1.6%)
JudicialWatch       Republican          Active           36 (1.6%)
MichaelCoudrey      Republican          Suspended        27 (1.2%)
TomFitton           Republican          Active           20 (0.8%)
RudyGiuliani        Republican          Active           17 (0.7%)
JamesOKeefeIII (U)  Republican          Active           17 (0.7%)
EmeraldRobinson     Republican          Active           16 (0.7%)
RealJamesWoods      Republican          Active           16 (0.7%)
LLinWood (U)        Republican          Suspended        16 (0.7%)
realLizUSA          Republican          Active           15 (0.6%)
LouDobbs            Republican          Active           15 (0.6%)
KMCRadio            Republican          Suspended        14 (0.6%)
michellemalkin      Republican          Active           13 (0.5%)
CodeMonkeyZ (U)     Republican          Suspended        12 (0.5%)
charliekirk11       Neutral             Active           11 (0.4%)
TrumpWarRoom        Republican          Active           11 (0.4%)
chuckwoolery        Republican          Active           11 (0.4%)
Top users.
But who are the most "prolific" users with regards to tweets that include warning labels or quoted tweets? Table 4 and Table 5 show the top 20 users in our dataset based on the number of tweets that had warning labels and the number of quoted tweets, respectively. For each user, we report the inferred political leaning and whether the account was active or suspended on January 9, 2021. We make several observations. First, in both cases, the most prolific user is President Trump, with 14.3% of all tweets that had warning labels and 0.4% of all quoted tweets. The account of President Trump was permanently suspended by Twitter on January 8, 2021, due to the risk of further incitement of violence [31], after his supporters attacked the US Capitol causing the death of five people [5]. Second, we observe that the majority of the top 20 users who shared tweets with warning labels are inferred as Republicans (see Table 4). This is not the case for the quoted dataset (see Table 5), as 8 out of the top 20 users with quoted tweets are inferred as Democrats. Third, despite the fact that our study does not focus on unverified users, we observe the existence of three unverified accounts among the top 20 users who shared tweets with warning labels; this is because we collect the tweets that verified accounts retweeted or quoted from unverified accounts (see the Dataset section). This indicates that Twitter's moderation mechanism is not limited to verified users. Finally, we note that 6 out of the top 20 users with tweets that had warning labels were suspended by Twitter (as of January 9, 2021). This highlights that the continuous dissemination of questionable content that leads to the addition of warning labels is likely to result in hard moderation interventions (i.e., user suspensions).

Table 5: Top 20 users who quoted tweets that had warning labels. We also report the account status of each user as of January 9, 2021.
User             Political leaning   Account status   Tweets
realDonaldTrump  Republican          Suspended        78 (0.4%)
AndrewFeinberg   Democrat            Active           52 (0.3%)
svdate           Neutral             Active           50 (0.3%)
NumbersMuncher   Republican          Active           49 (0.3%)
atrupar          Democrat            Active           44 (0.2%)
GlennKesslerWP   Democrat            Active           42 (0.2%)
T_S_P_O_O_K_Y    Republican          Active           42 (0.2%)
BrianKarem       Democrat            Active           39 (0.2%)
Patterico        Republican          Active           38 (0.2%)
TalbertSwan      Neutral             Active           37 (0.2%)
BarnettforAZ     Republican          Active           33 (0.2%)
TomFitton        Republican          Active           32 (0.2%)
Justin_Stangel   Democrat            Active           31 (0.2%)
JLMarchese111    N/A                 Suspended        31 (0.2%)
HalSparks        Democrat            Active           30 (0.2%)
RhondaFurin      Republican          Active           29 (0.2%)
captainjanks     Democrat            Active           28 (0.2%)
rogerkimball     Republican          Active           27 (0.2%)
amhfarraj        Democrat            Active           26 (0.2%)
michellemalkin   Republican          Active           25 (0.1%)
Take-aways.
The main take-away points from our analysis of warning labels and Twitter users are:

1. Most of the warning labels on Twitter, between November 2020 and December 2020, were related to the 2020 US elections. Also, we find different temporal patterns in the use of warning labels, with a few of them being short-lived (less than a week) and some of them being long-lived (spanning several months).
2. We find warning labels used to inform users about manipulated multimedia, while some warning labels are in languages other than English (i.e., Portuguese). This highlights the effort put into soft moderation interventions and some of the challenges that exist (e.g., tracking claims across multiple information formats or languages).
3. The majority of tweets with warning labels (72%) are shared by Republicans, while Democrats are more likely to comment on tweets with warning labels using Twitter's quoting functionality (56% of the tweets compared to 16% for Republicans). These results likely indicate that Republicans are sharing more questionable content that is eventually flagged, or that Twitter devotes more resources to moderating content shared by Republicans, likely due to claims about the safety and result of the 2020 US elections.
4. The continuous dissemination of potentially harmful information that is annotated with warning labels can lead to hard moderation interventions like permanent user suspensions. We find that 6 out of the top 20 users, in terms of sharing tweets with warning labels, were permanently suspended by Twitter as of January 9, 2021.

Figure 4: Mean engagement metric for each user, for tweets with warning labels and for tweets without warning labels.
Engagement Analysis (RQ2)

The goal of warning labels is to provide adequate information on tweets that include questionable content and might be harmful to users or society. Thus, we expect that users who see content annotated with warning labels are likely to be less willing to engage with or reshare such content [13]. In this section, we aim to quantify the differences in engagement between tweets that include warning labels and tweets that do not. Our empirical analysis can quantify how effective warning labels on Twitter are, through the lens of engagement.

For each user in our dataset, we extract two sets of tweets: 1) tweets that have warning labels; and 2) a control dataset of tweets that do not have warning labels. Note that we limit our analysis to the 115 users that had at least three tweets with warning labels, to make sure that our user-level analysis is not influenced by one or two tweets. Then, for each engagement signal in our dataset, we calculate the mean number that each group of tweets (warning-label tweets and control) had for each user. Our analysis takes into account four engagement signals: 1) Likes (how many times the tweet was liked by other users); 2) Retweets (how many times the tweet was retweeted by other users); 3) Quotes (the number of other tweets that retweeted the tweet with a comment); and 4) Replies (the number of replies that the tweet received).

Fig. 4 shows the CDF of the average number of likes/retweets/quotes/replies of tweets with and without warning labels per user. For each engagement signal, we perform two-sample Kolmogorov-Smirnov statistical significance tests, finding that in all cases the engagement of tweets with warning labels is significantly different compared to tweets without warning labels (p < 0.01). We observe that, for all four engagement signals, users receive increased engagement on tweets that have warning labels.
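A minimal sketch of this per-user comparison is shown below, assuming a DataFrame of the collected tweets with hypothetical "user", "has_label", and per-signal count columns; the statistical test uses SciPy's two-sample Kolmogorov-Smirnov implementation.

# Sketch of the engagement comparison; the input file and column names
# ("user", "has_label", "likes", "retweets", "quotes", "replies") are placeholders.
import pandas as pd
from scipy.stats import ks_2samp

tweets = pd.read_json("tweets_with_engagement.jsonl", lines=True)  # placeholder input
signals = ["likes", "retweets", "quotes", "replies"]

# Keep users with at least three tweets that carry warning labels.
counts = tweets[tweets["has_label"]].groupby("user").size()
eligible_users = counts[counts >= 3].index
subset = tweets[tweets["user"].isin(eligible_users)]

# Mean engagement per user, separately for labeled and control tweets (cf. Fig. 4).
means = subset.groupby(["user", "has_label"])[signals].mean().unstack("has_label")

for s in signals:
    warning = means[(s, True)].dropna()
    control = means[(s, False)].dropna()
    stat, p = ks_2samp(warning, control)   # two-sample Kolmogorov-Smirnov test
    print(f"{s}: median warning={warning.median():.1f} "
          f"control={control.median():.1f} KS p={p:.4f}")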
Figure 5: CDF of the fraction of the mean engagement metric for tweets with warning labels over tweets without warning labels.

For likes (see Fig. 4(a)), we find a median value of 10,303.9 average likes per user for tweets with warning labels, whereas for the control dataset we find a median value of 3,834.3 (2.6x less than warning labels). For retweets (see Fig. 4(b)), we find a median value of 3,533 average retweets per user for tweets with warning labels, while for the control dataset the median value is only 1,129.2 (a 3.1x decrease compared to the warning labels). For replies (see Fig. 4(c)), we find a median value of 235.7 replies for the control dataset, while for warning labels the median value increases to 494 (a 2.1x increase over the control dataset). For quotes (see Fig. 4(d)), we find a median value of 350.6 average quotes per user for the warning-labels dataset, whereas for the control dataset we find a median value of 122.9 quotes (a 2.8x decrease compared to warning labels).

Also, from Fig. 4, we can observe that there is a small proportion of users who have less engagement on the warning-labels dataset. To quantify the proportion of users who have more engagement on control tweets than on the tweets that had warning labels, we plot the fraction of the mean number of each engagement metric on tweets with warning labels over the control dataset (see Fig. 5). When this fraction is below 1, it means that the user's control dataset had more engagement compared to the user's warning-labels dataset. We find that 26%, 23%, 21%, and 35% of the users had more engagement on their control tweets than on the ones with warning labels for likes, retweets, quotes, and replies, respectively.

From our analysis thus far, it is unclear which users have increased vs. decreased engagement on tweets with warning labels over the control dataset. To assess whether there is a correlation between the overall engagement that a user receives and whether a user will receive increased or decreased engagement on tweets with warning labels, we plot the overall engagement (i.e., the mean engagement metric over all the user's tweets) against the fraction of engagement on warning labels over the control dataset (see Fig. 6). We observe that, for all engagement metrics, most of the users that have on average high engagement on their content (i.e., over 1K likes, over 100 retweets, over 100 quotes, and over 100 replies) also receive increased engagement on tweets with warning labels over the control (note that the fraction for these users is in most cases between 1 and 10).

Figure 6: Number of user followers and the fraction of the mean engagement metric for tweets with warning labels over the control dataset.
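Continuing the sketch above, the per-user ratio behind Fig. 5 and the comparison against overall engagement behind Fig. 6 could be computed as follows; the 1K/100 thresholds mirror the values quoted in the text.

# Ratio of mean engagement on labeled tweets over control tweets, per user.
ratios = pd.DataFrame({s: means[(s, True)] / means[(s, False)] for s in signals}).dropna()

# Share of users whose control tweets received more engagement (ratio below 1, cf. Fig. 5).
print((ratios < 1).mean())

# Relate the ratio to each user's overall engagement (cf. Fig. 6).
overall = subset.groupby("user")[signals].mean().loc[ratios.index]
for s in signals:
    threshold = 1000 if s == "likes" else 100
    high = overall[s] > threshold
    frac = (ratios.loc[high, s] > 1).mean()
    print(f"{s}: {frac:.0%} of high-engagement users have ratio > 1")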
Take-aways.
The key take-away points from our engagement analysis are:

1. Tweets with warning labels tend to receive more engagement compared to tweets without warning labels.
2. We find that 65%-79% of users (depending on the engagement metric) receive increased engagement on their tweets that have warning labels compared to tweets without warning labels.
3. By looking at the users that have increased vs. decreased engagement on tweets with warning labels compared to the control dataset, we find that most users that in general have high engagement also have increased engagement on tweets with warning labels.
Qualitative Analysis (RQ3)

In this section, we study how users interact with tweets that have warning labels. To do this, we use Twitter's quote functionality, where users can retweet a tweet with a comment. Specifically, we qualitatively analyze three sets of tweets: 1) the 50 tweets that quote other tweets where Twitter includes warning labels on both tweets; 2) 122 tweets (out of 169) that quote other tweets where Twitter includes a warning label only on the top tweet (i.e., the user's comment); the other 47 tweets had a quoted tweet that was deleted when we tried to qualitatively assess them; and 3) 150 randomly selected tweets that quote another tweet that includes a warning label. We qualitatively analyze all three sets of tweets to understand how users interact with people that share content annotated with warning labels, how users interact with questionable content (e.g., false claims), and how users discuss or perceive the existence of warning labels on Twitter.
Intuitively, when both the quoted tweet and the comment tweet above it include warning labels (e.g., Fig. 7), one expects that both tweets include information that is questionable or potentially harmful. Here, we qualitatively analyze the tweets in our dataset to verify whether this is true and what other cases exist where both the quoted and the comment tweet include warning labels.
Reinforcing false claims.
The majority of the comments above the quoted tweets aim to retweet and reinforce the false claim that is included in the quoted tweet (86%, 43 out of the 50). Two of them achieve this using a single word ("this" or "true"), two of them use videos, and five of them achieve it by tweeting a single hashtag [...].

[...] content should be labeled (i.e., "Say NO to Big Tech censorship!" and "Twitter labeled this tweet as disputed.... What exactly is Twitter disputing here?"). These results further compound the findings from [27].
Testing warning labels.
We find one tweet where the user commented with exactly the same content as the quoted tweet, likely to verify whether their comment would eventually get a warning label.
Incorrect warning labels.
We find one specific case where the warning labels were seemingly incorrectly added (see Fig. 7). Both the comment and the quoted tweet had the warning label "Get the facts about COVID-19", and both included the terms oxygen and frequency/frequently. This likely indicates that Twitter employs automated means to attach warning labels, and that in some cases warning labels are incorrectly added to content.
Next, we investigate cases where users quote a tweet that has no warning label and subsequently their comment tweet receives a warning label (e.g., Fig. 8).
Commenting on news or real-world events and making false claims about the 2020 US elections.
We find 45 tweets (36%) that comment on real-world events, news, or facts about the election, and make false claims about the election (e.g., claims about election fraud).
Reinforcing questionable content.
In 18 tweets (14%), the comment above reinforces questionable content that is included in the quoted tweet and makes the claim even more questionable or harmful, hence getting flagged by Twitter.
Inconsistencies on warning labels.
We find several cases where there are inconsistencies in the inclusion of warning labels. Specifically, we find 28 cases (23%) where both the quoted tweet and the comment hint at election fraud during the 2020 US elections, yet only the quoted tweet includes a warning label. In 7 of these cases, the comment makes a similar claim to the quoted tweet, with the difference that it uses a video instead of text. This highlights the challenges in flagging content on social media platforms, and in particular flagging the same information across multiple diverse formats (i.e., text, images, videos). Also, we find another case with inconsistencies related to the use of language. In this case, the quoted tweet and the comment above it share the same information but in different languages (quoted tweet in French and comment above in English), yet only the English comment includes a warning label (see Fig. 8).

Figure 7: Example of an incorrect addition of warning labels on Twitter.

Figure 8: A quoted tweet that is not flagged, likely because it is in French.
Updates on warning labels.
During our qualitative analysis, we observed that Twitter occasionally updates the warning labels on some tweets. In particular, we find many instances where Twitter changed the warning label from "Multiple sources called this election differently" to "Election officials have certified Joe Biden as the winner of the U.S. Presidential election". This highlights that Twitter continuously refines the use of warning labels, and that it likely updates warning labels on content to make the warning clearer or stronger.
Figure 9: Example of a tweet that is mocking the author or the content of the quoted tweet.
Here, we aim to understand how users interact with content that includes warning labels by looking into tweets that quote content that has warning labels (e.g., Fig. 9). We find various behaviors, ranging from mocking the author/content of the quoted tweet, to debunking false claims that exist in the quoted tweet, reinforcing the false claims, and sharing opinions on Twitter's warning labels. We provide more details below.
Mocking or sharing emotions about the author/content of the questionable or false claim.
We find 37 tweets that mock the content or the author of the tweet that includes a warning label. For instance, when Trump posted the tweet in Fig. 1, several users quoted that tweet and made absurd claims about themselves, like "I WON THE NOBEL PRIZE !" (see Fig. 9) and "Let me try... I AM BEYONCE!!". Other users quoted tweets with warning labels to express their emotions about the content or the author of the tweet: 4 tweets call the quoted tweet's author a liar, 4 tweets call the author a loser, 6 tweets express disgust at the content of the tweet, and 1 tweet expresses embarrassment.
Debunking false claims.
We find 19 tweets that debunk false claims made in quoted tweets. For instance, a user quoted a tweet shared by President Trump and wrote: "President Trump just tweeted again about claims of "secretly dumped ballots" for Biden in Michigan. This is false. These claims are based on screenshots of a mistaken unofficial tally on one site's election map that was caused by a typo that was corrected in about 30 minutes."
Reinforcing false claims.
Similarly to the tweets where both the quoted tweet and the comment above had warning labels, we find 6 tweets that reinforce false claims that exist in the quoted tweets.
Sharing opinions on warning labels.
We find 6 tweets that share users' opinions on warning labels and how effective they are. Specifically, one tweet simply points out that the quoted tweet includes a warning label, and two tweets question how effective the warning labels are and request stronger and more straightforward labels. Also, we find three tweets that call for hard moderation interventions (i.e., user bans), in particular asking Jack Dorsey (Twitter's CEO) or Twitter Support to ban the account of President Trump due to the spread of false claims (e.g., ".@jack @Twitter make this lying stop! Your warnings of him lying just are not enough."). Interestingly, we find one tweet where the comment reinforces the false claim included in the quoted tweet by claiming that Twitter tries to cover up the election fraud by using warning labels.
Other.
The rest of the tweets we qualitatively analyzed are tweets where users shared their personal or political opinion on the content of the quoted tweet, or cases where users reshared the content of the quoted tweet, either paraphrasing it or translating it into other languages.
Take-aways.
The main take-away points from our qualitative analysis are:

1. We find various user interactions with tweets that have warning labels, such as debunking false claims, mocking users that tweeted questionable content, or reinforcing false claims despite the inclusion of warning labels.
2. Soft moderation intervention systems are not always consistent, as we find several cases where content should have warning labels but does not. For example, we find cases where videos share the same information as textual tweets that include warning labels, yet the tweet with the video does not include a warning. Another example involves content across various languages. These cases show the challenges that exist in large-scale soft moderation systems.
3. We find a case where warning labels were incorrectly added, likely due to the use of automated means. This shows the need to devise systems that rely on human moderators who receive signals from automated means (i.e., the human makes the final decision), hence decreasing the likelihood of such cases.
Discussion and Conclusion

In this work, we performed one of the first characterizations, based on empirical data, of soft moderation interventions on Twitter. Using a mixed-methods approach, we analyzed the warning labels, the users that share tweets that have warning labels, and the engagement that this content receives. Also, we investigated how users interact with such content and what challenges and inconsistencies exist in large-scale soft moderation systems.

Our user analysis showed that 72% of the tweets with warning labels were shared by Republicans. This likely indicates that Republicans were sharing more questionable content during the 2020 US elections, or that Twitter devoted more resources to moderating content from Republicans. Nevertheless, this finding prompts the need for greater transparency by social media platforms, to ease concerns related to censorship and possible moderation biases towards a specific political party [4].

Our engagement analysis showed that tweets with warning labels tend to receive more engagement compared to tweets without warning labels. This indicates that warning labels might not be very effective on politics-related content, hence reinforcing the results from [19]. This highlights the need to design stricter soft moderation interventions for content that is more harmful than other content, with the goal of reducing its spread. Finally, our qualitative analysis showed that users further debunk false claims using Twitter's quoting mechanism, mock the user/content of the tweet with the warning label, and reinforce false claims (despite the existence of warning labels). Also, we found inconsistencies in how content is flagged across multiple information formats or languages. This highlights the need to further study such moderation systems to fully understand how they work and what their caveats are, with the goal of increasing their effectiveness, consistency, fairness, and transparency.
Limitations.
Our work has some limitations. First, we analyzed mainly politics-related content, shared during a short period of time (two months), on a single platform (Twitter). Thus, it is unclear whether our results hold in contexts not related to politics or for soft moderation systems that exist on other platforms like Facebook (which has different platform affordances and a different design of soft moderation interventions). Also, our engagement analysis does not account for the content of tweets, hence we do not investigate whether the increased engagement on tweets with warning labels is due to the dissemination of more controversial or sensationalistic content that is likely to attract more users. Finally, since we do not know exactly when a soft moderation intervention happened and how the engagement changed over time, we do not analyze whether the warning labels were added because the tweets had already received large engagement.
Acknowledgements

We thank Jeremy Blackburn, Oana Goga, Krishna Gummadi, Shagun Jhaver, and Manoel Horta Ribeiro for fruitful discussions and feedback during this work.
References

[1] L. Bode and E. K. Vraga. In related news, that was wrong: The correction of misinformation through related stories functionality in social media. Journal of Communication, 65(4):619-638, 2015.
[2] E. Chandrasekharan, S. Jhaver, A. Bruckman, and E. Gilbert. Quarantined! Examining the Effects of a Community-Wide Moderation Intervention on Reddit. arXiv preprint arXiv:2009.11483, 2020.
[3] E. Chandrasekharan, U. Pavalanathan, A. Srinivasan, A. Glynn, J. Eisenstein, and E. Gilbert. You can't stay here: The efficacy of reddit's 2015 ban examined through hate speech. In CSCW, 2017.
[6] Geeng et al. arXiv preprint arXiv:2012.11055, 2020.
[7] S. Ghosh, N. Sharma, F. Benevenuto, N. Ganguly, and K. Gummadi. Cognos: crowdsourcing search for topic experts in microblogs. In SIGIR, 2012.
[8] T. Gillespie. Custodians of the Internet: Platforms, content moderation, and the hidden decisions that shape social media. Yale University Press, 2018.
[11] Kaiser et al. In Usenix Security, 2020.
[12] J. Kulshrestha, M. Eslami, J. Messias, M. B. Zafar, S. Ghosh, K. P. Gummadi, and K. Karahalios. Quantifying search bias: Investigating sources of bias for political searches in social media. In CSCW, 2017.
[13] P. Mena. Cleaning up social media: The effect of warning labels on likelihood of sharing false news on facebook. Policy & Internet, 12(2):165-183, 2020.
[14] E. L. Merrer, B. Morgan, and G. Trédan. Setting the record straighter on shadow banning. arXiv preprint arXiv:2012.05101, 2020.
[15] J. Messias. Political Bias Inference API. https://github.com/johnnatan-messias/bias_inference_api, 2017.
[16] P. L. Moravec, A. Kim, and A. R. Dennis. Appealing to sense and sensibility: System 1 and system 2 interventions for fake news on social media. Information Systems Research, 31(3):987-1006, 2020.
[17] S. Myers West. Censored, suspended, shadowbanned: User interpretations of content moderation on social media platforms. New Media & Society, 2018.
[18] E. Newell, D. Jurgens, H. M. Saleem, H. Vala, J. Sassine, C. Armstrong, and D. Ruths. User Migration in Online Social Networks: A Case Study on Reddit During a Period of Community Unrest. In ICWSM, pages 279-288, 2016.
[19] B. Nyhan and J. Reifler. When corrections fail: The persistence of political misperceptions. Political Behavior, 2010.
[20] G. Pennycook, A. Bear, E. T. Collins, and D. G. Rand. The implied truth effect: Attaching warnings to a subset of fake news headlines increases perceived accuracy of headlines without warnings. Management Science, 2020.
[21] G. Pennycook, T. D. Cannon, and D. G. Rand. Prior exposure increases perceived accuracy of fake news. Journal of Experimental Psychology: General, 147(12):1865, 2018.
[22] M. H. Ribeiro, S. Jhaver, S. Zannettou, J. Blackburn, E. De Cristofaro, G. Stringhini, and R. West. Does Platform Migration Compromise Content Moderation? Evidence from r/The_Donald and r/Incels. arXiv preprint arXiv:2010.10397, 2020.
[23] C. M. Rivers and B. L. Lewis. Ethical research standards in a world of big data. F1000Research, 2014.
[24] G. Rosen. An Update on Our Work to Keep People Informed and Limit Misinformation About COVID-19. https://about.fb.com/news/2020/04/covid-19-misinfo-update/, 2020.
[25] Y. Roth and N. Pickles. Updating our approach to misleading information. https://blog.twitter.com/en_us/topics/product/2020/updating-our-approach-to-misleading-information.html, 2020.
[26] H. M. Saleem and D. Ruths. The Aftermath of Disbanding an Online Hateful Community. arXiv preprint arXiv:1804.07354, 2018.
[27] E. Saltz, C. Leibowicz, and C. Wardle. Encounters with Visual Misinformation and Labels Across Platforms: An Interview and Diary Study to Inform Ecosystem Approaches to Misinformation Interventions. arXiv preprint arXiv:2011.12758, 2020.
[28] H. Seo, A. Xiong, and D. Lee. Trust It or Not: Effects of Machine-Learning Warnings in Helping Individuals Mitigate Misinformation. In WebSci, 2019.
[29] N. K. Sharma, S. Ghosh, F. Benevenuto, N. Ganguly, and K. Gummadi. Inferring who-is-who in the twitter social network.