No Echo in the Chambers of Political Interactions on Reddit
Gianmarco De Francisci Morales, Corrado Monti, Michele Starnini
GIANMARCO DE FRANCISCI MORALES,
ISI Foundation, Italy
CORRADO MONTI,
ISI Foundation, Italy
MICHELE STARNINI,
ISI Foundation, Italy

Echo chambers in online social networks, whereby users’ beliefs are reinforced by interactions with like-minded peers and insulation from others’ points of view, have been decried as a cause of political polarization. Here, we investigate their role in the debate around the 2016 US elections on Reddit, a fundamental platform for the success of Donald Trump. We identify Trump vs Clinton supporters and reconstruct their political interaction network. We observe a preference for cross-cutting political interactions between the two communities rather than within-group interactions, thus contradicting the echo chamber narrative. Furthermore, these interactions are asymmetrical: Clinton supporters are particularly eager to answer comments by Trump supporters. Besides asymmetric heterophily, users show assortative behavior for activity, and disassortative, asymmetric behavior for popularity. Our findings are tested against a null model of random interactions, by using two different approaches: a network rewiring which preserves the activity of nodes, and a logit regression which takes into account possible confounding factors. Finally, we explore possible socio-demographic implications. Users show a tendency for geographical homophily and a small positive correlation between cross-interactions and voter abstention. Our findings shed light on public opinion formation on social media, calling for a better understanding of the social dynamics at play in this context.
Polarization is a defining feature of contemporary politics [1]. Polarization along party lines in the United States is on the rise [2], and the 2016 elections have deepened the divide [3]. This polarization is easy to observe online, and especially on social media, where people share their opinions liberally. Indeed, several platforms have been the subject of polarization studies, from Twitter, to YouTube, to Facebook [4–7]. On Twitter, in particular, political polarization can be exacerbated by social bots that amplify divisive messages [8]. Polarized issues fall not only along ideological fault-lines [4], but can also touch on any collectively resonant topic [9].

Several scholars have identified social media itself as a cause of polarization, pointing to “echo chambers” [10–12]. Echo chambers are situations in which users have their beliefs reinforced due to repeated interactions with like-minded peers and insulation from others’ points of view [13, 14]. The dynamics leading to echo chambers on online social networks have been associated with selective exposure [15], biased assimilation [16], and group polarization [17]; in particular, Garrett [10] pointed to the pursuit of opinion reinforcement as a possible cause. Echo chambers have been empirically observed and characterized around several controversial topics, such as abortion or vaccines [13, 18]. Many have expressed concern that, as citizens become more polarized
about political issues, they do not hear the arguments of the opposite side, but are rather surrounded by people and news sources who express only opinions they agree with (e.g., Mark Zuckerberg [19]). However, the very existence of such echo chambers has been recently questioned [20, 21], as well as their relation with the news feed algorithm of different social media platforms [14]. In particular, the effect of echo chambers in increasing political polarization has been put under scrutiny [21, 22].

Authors’ addresses: Gianmarco De Francisci Morales, ISI Foundation, Italy, [email protected]; Corrado Monti, ISI Foundation, Italy, [email protected]; Michele Starnini, ISI Foundation, Italy, [email protected]
Published on: Scientific Reports volume 11, Article number: 2818 (February 2021). DOI: 10.1038/s41598-021-81531-x
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© 2021 Authors. https://doi.org/10.1038/s41598-021-81531-x

In this paper, we consider a highly polarized issue, the 2016 US presidential elections, and investigate the role of echo chambers on social media in exacerbating the debate.
We focus on Reddit as a platform to study political interactions between groups with opposite views. Reddit is a social news aggregation website; in 2016, it was the seventh most visited website in the United States, with more than 200 million visitors. It was a fundamental platform for the success of Donald Trump’s political campaign [23]. Given this context, we set out to characterize the interaction patterns between opposing political communities on Reddit, by considering supporters of the two main presidential candidates, Clinton and Trump. Then, we look at the way they interact in a common arena of political discussion (i.e., the most popular subreddit related to politics), by reconstructing the information flow between users, as determined by their comments and replies. Are echo chambers responsible for the increased polarization on Reddit during the 2016 electoral cycle [24]?

Our empirical investigation shows that there is no evidence of echo chambers in this case. On the contrary, cross-cutting political interactions between the two communities are more frequent than expected. This heterophily is not symmetric with respect to the two groups: Clinton supporters are particularly eager to answer comments by Trump supporters, an asymmetry that is not explained by other confounders. Finally, we ask how these findings are modulated by socio-demographic and environmental characteristics of the Reddit users involved in the discussions, determined by geo-localizing such users. Our results point to a preference for geographical homophily in online interactions: users are more likely to interact with other users from their own state. We find a statistically significant (albeit small) positive correlation between cross-interaction and voter abstention, which may support the hypothesis that exposure to cross-cutting political opinions is associated with diminished political participation [25–27].
We obtain a similar result, although in the negative direction, for living in a swing state: cross-interactions are suppressed in this case.

These results have important implications in terms of our understanding of public opinion formation. It is often assumed that echo chambers can be pierced by increasing the amount of cross-cutting content and interactions between the polarized sides [25, 28]. Instead, the present study shows that polarization around a highly controversial issue, such as the 2016 US presidential elections [29], can co-exist with a large presence of cross-cutting interactions. The nature of these interactions might even increase polarization via a “backfire effect” [30], as recently found empirically for Twitter [22]. Alternatively, cross-cutting interactions and polarization might be the result of growing underlying socio-economic divisions [31, 32]. Overall, our findings call for a better understanding of the social dynamics at play in this context before suggesting technical solutions for such social systems, which could have unintended consequences.
We gather data from Reddit, through the Pushshift collection [33]. Reddit is organized in communities, called subreddits, that share a common topic and a specific set of rules. Users subscribe to subreddits, which contribute to the news feed of the user (their home) with new posts. Inside each subreddit, a user can post, or comment on other posts and comments. Thus, the overall discussion under each post evolves as a tree structure, growing over time. In addition, users can also upvote posts and comments to show approval; they manifest disapproval with a downvote. Each message is therefore associated with a score, which is the number of upvotes minus the number of downvotes it has received.

Given the two-party nature of the US political system and the polarized state of its political discourse, we approach the problem by modeling the interactions between groups of users labeled by their political leaning: specifically, according to which candidate they support in the 2016 presidential elections. We then model such interactions as a weighted, directed network, where nodes represent users and links represent comments between them. On top of political leaning, we also characterize the users in terms of their activity, i.e., their propensity to engage in interactions with other peers, and popularity, as given by the score assigned to their comments. In the remainder of this section we explain these three steps in more detail.
Political leaning of Reddit users
We identify the political leaning of Reddit users by looking at their posting behavior. With respect to the 2016 US presidential elections, users can be characterized as supporting the Democratic candidate, Hillary Clinton, or the Republican candidate, Donald Trump. On Reddit, we identify specific subreddits dedicated to supporting the main presidential candidates. For Donald Trump, we select the subreddit r/The_Donald; for Hillary Clinton, we choose the subreddits r/hillaryclinton and r/HillaryForAmerica.

The subreddit r/The_Donald was created in June 2015, at the beginning of Donald Trump’s campaign for the Republican party nomination. It has been one of the largest online communities of Trump supporters, with 269 904 users in November 2016. Participation in this subreddit is a valid proxy to study Donald Trump support, as the rules of this subreddit explicitly state that the community is for “Trump Supporters Only”, and that dissenting users will be removed. As such, it has been previously used in the literature to analyze the behavior of Trump supporters [34, 35].

r/hillaryclinton and r/HillaryForAmerica are the main communities that supported Hillary Clinton’s campaign in 2016. The former was created in 2015, while the latter was created in 2016 specifically to support her presidential bid. In November 2016, they were able to attract 35 142 and 3025 Reddit users, respectively. Since the stated goal of these communities is to support her presidential campaign, and they forbid the use of the subreddit to campaign for other candidates, we consider active participation in these communities as a good proxy for support for the Democratic party candidate.

We call these subreddits the home communities for each candidate.
We identify 117 011 users who actively posted on r/The_Donald in 2016, and 13 821 on r/hillaryclinton and r/HillaryForAmerica. Given the massive use of Reddit as a political tool by Trump’s campaign [23], the difference in size between the two communities is not surprising.

Although these subreddits are dedicated to supporters of the candidate, we find that 3702 users post in both subreddits (2.[...]) [...] $u$ as a binary label $L_u$, assigned as Clinton supporter ($L_u = C$) if they post only on Clinton’s home community, or they post on both communities and have a larger average score on Clinton’s community, and as Trump supporter otherwise ($L_u = T$). Our method identifies 10 240 users as Clinton supporters and 110 806 users as Trump supporters.

Table 1. Main properties of the Politics network: number of users $N$, divided in Trump/Clinton supporters $N_T$ / $N_C$, number of links $E$, average degree $\langle k \rangle$, reciprocity $\rho$ (fraction of bidirectional links over the total), and total number of interactions $W$.

$N$       $N_T$     $N_C$   $E$        $\langle k \rangle$   $\rho$    $W$
31 218    27 012    4206    500 030    16.02                 0.[...]   716 765

Network of interactions on Politics
To study the interactions between the two sides, we need a community that is visited regularly by both groups, but which is still topically related to politics and popular enough. The best candidate for such a role is r/politics, since it is the largest political subreddit. We collect all submissions and comments in the year 2016. From the collected comments, we reconstruct the network of political interactions among the users we previously identified. Among these users, 31 218 authored a message on r/politics in 2016 and thus appear as nodes $V$ in the graph ($N_T = 27\,012$ Trump supporters, $N_C = 4206$ Clinton supporters). [...] $(u, v)$ corresponds to user $u$ posting a comment as a response to user $v$. The weight $w_{uv}$ corresponds to the number of such interactions from $u$ to $v$. Note that the link direction represents the interaction, and is opposite to the information flow (user $u$ should have read what $v$ wrote to answer, but it is not guaranteed that $v$ will read $u$’s reply).

In the Politics network, the probability to find a node labelled as $X \in \{C, T\}$ (henceforth, $X$ node for brevity) is $P(X) = N_X / N$, corresponding to $P(T) \simeq 0.87$ for Trump and $P(C) \simeq 0.13$ for Clinton. [...] The joint probability to observe an interaction from an $X$ node to a $Y$ node reads

$$
P(X \to Y) = \frac{W_{XY}}{W} \simeq
\begin{array}{c|cc}
\text{Author} \backslash \text{Target} & T & C \\ \hline
T & 0.40 & 0.25 \\
C & 0.25 & 0.10
\end{array}, \tag{1}
$$

where the rows of the matrix indicate the leaning of the author of a comment, and the columns the one of the target, $W = 716\,765$ is the total weight of the links in the network (that is, the number of interactions between all considered nodes), and $W_{XY}$ is the weight of directed links from $X$ nodes to $Y$ nodes:

$$
W_{XY} = \sum_{u, v \in V \,:\, L_u = X \,\wedge\, L_v = Y} w_{uv}.
$$

We denote with $W_{\to X} = \sum_Y W_{YX}$ the number of interactions received by $X$ nodes ($\sum_Y$ denotes the sum over all possible label assignments to $Y$), and with $W_{X \to}$ the ones originated by $X$ nodes. It follows that $\sum_{XY} W_{YX} = \sum_X W_{X \to} = \sum_X W_{\to X} = W$.

Diagonal elements of the matrix in Eq. (1) correspond to the interactions within political groups, off-diagonal ones to those across groups. The sum by rows (columns) of the matrix in Eq. (1) corresponds to the probability that an $X$ node initiates (receives) an interaction, $P(X \to) = W_{X \to} / W$ ($P(\to X) = W_{\to X} / W$). From Eq. (1), interactions across communities, or cross-interactions, look symmetric between Clinton and Trump communities. However, joint probabilities do not take into account the difference in size between the two groups. The apparent symmetry stems from the fact that the probability that Clinton nodes initiate an interaction, $P(C \to) = W_{C \to} / W \simeq 0.35$, is much larger than the fraction of Clinton supporters in the network, $N_C / N \simeq 0.13$, which implies that Clinton supporters have a much larger weighted out-degree than Trump ones.

These characteristics can be further inspected by considering the conditional probability to observe an interaction from an $X$ node to a $Y$ node, given that the first node has leaning $X$,

$$
P(X \to Y \mid X) = \frac{P(X \to Y)}{P(X \to)} = \frac{W_{XY}}{W_{X \to}} \simeq
\begin{array}{c|cc}
\text{Author} \backslash \text{Target} & T & C \\ \hline
T & 0.62 & 0.38 \\
C & 0.72 & 0.28
\end{array}. \tag{2}
$$

By looking at the columns of Eq. (2), in absence of homophilic or heterophilic effects, one would expect the elements of each column to be equal: given the author of a comment, the probability to interact with the two groups would be equal, given only by the size of the group. Instead, we can observe that Clinton supporters tend to interact more with Trump supporters (72% of interactions) than Trump supporters themselves within the community (62%). The same effect is visible for Trump supporters, who are more likely to interact with Clinton ones (38% of interactions) than the Clinton community within itself (28% of interactions). These intuitions will be solidified in Section 3, by comparing these values to a null model of random social interactions.

Finally, we compare the average sentiment polarity of each type of interaction. To do so, first we measure the sentiment polarity (ranging from $-$[...]) [...]

$$
\begin{array}{c|cc}
\text{Author} \backslash \text{Target} & T & C \\ \hline
T & 1.26 & 0.72 \\
C & 1.10 & 5.75
\end{array} \times 10^{-[\ldots]}. \tag{3}
$$

First, we observe that interactions within Trump supporters are more negative than interactions within Clinton supporters (average sentiment of 0.[...]) [...]

Reddit score and activity of users
Political interactions on Reddit can be further characterized in terms of the score assigned to each comment or submission, and the activity of users, i.e., their propensity to engage in interactions with other peers.

In network terms, the activity of a user $u$, $a_u$, can be measured by the total weight of outgoing links from node $u$, which corresponds to the out-strength of node $u$: $a_u = \sum_v w_{uv}$. Figure 1 (a) shows the activity distribution $P(a)$ in the Politics network, plotted separately for Clinton and Trump supporters, both with typical heavy-tailed behavior. The activity distribution of Trump supporters decays more rapidly than for the Clinton ones, thus indicating a propensity of Clinton supporters to engage in a larger number of interactions.

The Reddit score of a comment is a measure of its popularity and, as such, it strongly depends on the subreddit where this comment is posted: popular comments posted on the subreddit r/The_Donald will likely be unpopular in subreddits where opposite political views dominate, such as Clinton-oriented subreddits. We define the popularity of a user $u$ on a subreddit as the average score of their comments on that subreddit, $s_u$, and it will thus depend on the subreddit under consideration. Figure 1 (b) shows the popularity distribution $P(s)$ of users in the Politics network, separately for Clinton and Trump supporters. While the functional form of the $P(s)$ distribution is similar for Clinton and Trump supporters, comments by Clinton supporters have much larger scores on average, while the scores of Trump supporters span a larger interval of values. This observation implies that the overall attitude on the politics subreddit is more favorable to comments from Clinton than from Trump supporters, although users classified as Trump supporters are a much larger set than Clinton supporters.

[Figure 1: two panels, $P(a)$ versus activity $a$ and $P(s)$ versus average score $s$, each with one curve for Trump supporters and one for Clinton supporters.]
Fig. 1. (a) Probability distribution of the activity $a$ of users in the Politics network, $P(a)$, plotted separately for Clinton and Trump supporters. (b) Probability distribution of the average score $s$ of users in the Politics network, $P(s)$, plotted separately for Clinton and Trump supporters.

This liberal bias in the general opinion of r/politics, however, does not seem to discourage Trump supporters from commenting in large numbers.
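The two per-user quantities defined in this subsection can be sketched in a few lines of Python. This is only an illustration under assumptions not stated in the text: the function name `activity_and_popularity` and the input format, a list of `(author, target, score)` records with one record per reply between labelled users, are hypothetical, and the same records are used both for the out-strength $a_u$ and for the average score $s_u$.

```python
from collections import defaultdict

def activity_and_popularity(comments):
    """Compute activity a_u and popularity s_u from reply records.

    comments: iterable of (author, target, score) tuples (assumed format),
    one tuple per reply written by `author` to `target`.
    Returns (activity, popularity), where activity[u] is the number of
    replies written by u (out-strength) and popularity[u] is the mean
    score of those replies.
    """
    activity = defaultdict(int)
    score_sum = defaultdict(float)
    for author, _target, score in comments:
        activity[author] += 1        # each reply adds 1 to the out-strength
        score_sum[author] += score   # accumulate scores to average later
    popularity = {u: score_sum[u] / activity[u] for u in activity}
    return dict(activity), popularity
```

Heavy-tailed distributions such as those in Figure 1 are usually inspected by histogramming these values with logarithmic bins, which is left out of the sketch.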
Therefore, since we wish to study the two communities and how they interact, r/politics is the best arena to observe such interactions. Our set of users of interest is not representative of r/politics users. Nevertheless, we are not interested in studying the typical behavior of users in this subreddit, but in analyzing how these two polarized communities interact in this arena. The fact that the two communities are not representative of the politics subreddit is therefore of no consequence.

To understand whether the empirical patterns observed in the previous section represent a consistent behavior, we need to compare them with a theoretical null model of interactions. The simplest null model for our data follows the hypothesis that the interactions are unaffected by the political leaning of users. In mathematical terms, the null model is a directed, weighted, random network (RN). This network is obtained by reshuffling links of the original network while preserving the in- and out-strength (weighted degree) of each node.

In this network, the probability to observe a link from an $X$ node to a $Y$ node is the product of two independent probabilities: the probability that an $X$ node initiates an interaction, and the probability that a $Y$ node receives an interaction,

$$
p_{RN}(X \to Y) = \frac{W_{X \to}}{W} \, \frac{W_{\to Y}}{W} \simeq
\begin{array}{c|cc}
\text{Author} \backslash \text{Target} & T & C \\ \hline
T & 0.43 & 0.225 \\
C & 0.225 & 0.12
\end{array}. \tag{4}
$$

The RN model preserves both the in- and out-strength sequence of nodes, while rewiring connections among them, thus following the so-called configuration model [37]. In this RN model, the conditional probability to observe a link from an $X$ node to a $Y$ node, given that the first node has leaning $X$, reads

$$
p_{RN}(X \to Y \mid X) = \frac{W_{\to Y}}{W} \simeq
\begin{array}{c|cc}
\text{Author} \backslash \text{Target} & T & C \\ \hline
T & 0.66 & 0.34 \\
C & 0.66 & 0.34
\end{array}. \tag{5}
$$

In the following, we investigate deviations of observed data from this RN model, so as to highlight specific patterns of behavior.

[Figure 2: two 2×2 heat maps, rows = author, columns = target, both ordered Trump, Clinton. Left panel values: −0.021, 0.021 (Trump row); 0.021, −0.021 (Clinton row). Right panel values: −0.033, 0.033 (Trump row); 0.062, −0.062 (Clinton row).]
Fig. 2. Difference between empirical and random joint probabilities (left) and between empirical and random conditional probabilities (right) of interaction in the Politics network, with respect to Trump and Clinton supporters. We observe heterophily and asymmetry: off-diagonals are larger than zero and show distinct values (right-side plot).

Deviation of conditional and joint probabilities
The difference between the empirical and random joint probabilities, given by Equations (1) and (4), respectively, is shown in Figure 2 (a). Cross-interactions between opposite political groups in the Politics network happen more often than expected in an RN model, with an odds ratio of 1.195. This observation implies that there is a certain degree of heterophily in interactions, i.e., the preference to interact with users from the opposite political group. This result is surprising, considering the ample literature about homophily in social networks and especially about echo chambers in political discussion on social media [4, 13]. The difference between empirical and random conditional probabilities, given by Equations (2) and (5), respectively, is reported in Figure 2 (b). The Politics network is characterized by an asymmetry between the two political groups: Clinton supporters interact with Trump supporters more than the other way around, with a 6.2% difference with respect to the random model on one side, and 3.3% on the other.

[Figure 3: two heat maps, rows = author, columns = target; left panel over the 8 activity classes, right panel over the 4 score classes.]
Fig. 3. Difference between empirical and random conditional probabilities in the Politics network with respect to activity classes (on the left, with $a$ being the least active users) and to score classes (on the right, with $s$ being the lowest scores). We observe assortative behavior with respect to activity (higher values on diagonal) and disassortative behavior with respect to score (higher values off diagonal).

Given that Trump and Clinton supporters are different according to several metrics, there may be some confounding effects in the interactions. In particular, we explore the roles of activity (the strength of the node) and popularity (the average score of the node). To give a visualization similar to the ones in Figure 2, we define activity and score classes from the distributions shown in Figure 1. We manually define 4 score classes, from low to high score, and 8 activity classes, from low to high activity. In the following we indicate for brevity a user of score class $s$ as an $s$-user, and a user of activity class $a$ as an $a$-user.

Next, we define the empirical probability to observe a link from an $a$-node to an $a'$-node, $P(a \to a')$, by applying Eq. (1) to activity classes, i.e., $P(a \to a') = W_{a,a'} / W$, where $W_{a,a'}$ is the number of interactions from $a$-nodes to $a'$-nodes. The same can be done for the conditional probability, by applying Eq. (2) to activity classes. For random interactions represented by the null model, one can obtain the joint probability $p_{RN}(a \to a')$ and the conditional probability $p_{RN}(a \to a' \mid a)$. Figure 3 (left) shows the difference between empirical and random conditional probabilities for interactions with respect to activity classes. Positive values indicate that a pair of classes interacts more than expected by random chance; negative values indicate that they interact less than expected. Thus, we observe that users show an assortative behavior with respect to activity, i.e., users with high activity tend to interact with similarly active users, and the same holds for users with low activity.

We can also define the probability to observe a link from an $s$-node to an $s'$-node, $P(s \to s')$, by applying Eq. (1) to score classes. The same can be done for the conditional probability and for the null model.
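The same bookkeeping applies to any partition of the nodes (political leaning, activity class, or score class). The sketch below is a minimal illustration, not the authors' code: the function name `interaction_probabilities` and the input format (a dictionary of weighted directed links plus a dictionary of node classes) are assumptions made here. It computes the empirical joint and conditional probabilities in the form of Eqs. (1) and (2) and their random-network counterparts in the form of Eqs. (4) and (5) in closed form from the class marginals, without explicitly rewiring the network.

```python
from collections import defaultdict

def interaction_probabilities(edges, label):
    """Empirical and random-network (RN) interaction probabilities by class.

    edges: dict {(u, v): w_uv} of weighted directed links (assumed format).
    label: dict mapping each node to its class (e.g. "T"/"C", or a class index).
    Returns four dicts keyed by (X, Y): empirical joint, empirical
    conditional, RN joint, RN conditional.
    """
    W = sum(edges.values())
    W_xy = defaultdict(float)   # weight from class X to class Y
    W_out = defaultdict(float)  # W_{X ->}: weight originated by class X
    W_in = defaultdict(float)   # W_{-> Y}: weight received by class Y
    for (u, v), w in edges.items():
        x, y = label[u], label[v]
        W_xy[(x, y)] += w
        W_out[x] += w
        W_in[y] += w

    joint, cond, rn_joint, rn_cond = {}, {}, {}, {}
    for x in W_out:
        for y in W_in:
            joint[(x, y)] = W_xy[(x, y)] / W            # form of Eq. (1)
            cond[(x, y)] = W_xy[(x, y)] / W_out[x]      # form of Eq. (2)
            rn_joint[(x, y)] = W_out[x] * W_in[y] / W**2  # form of Eq. (4)
            rn_cond[(x, y)] = W_in[y] / W               # form of Eq. (5)
    return joint, cond, rn_joint, rn_cond
```

The quantities plotted in Figures 2 and 3 would then correspond to entry-wise differences such as `cond[(x, y)] - rn_cond[(x, y)]`. As a sanity check on the leaning labels, the marginals implied by Eq. (1), $P(T \to) = P(\to T) \simeq 0.65$, give an RN joint probability of about $0.65 \times 0.65 \approx 0.42$ for Trump-to-Trump interactions, in line with Eq. (4).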
Figure 3 (right) shows the difference between empirical and random conditional probabilities for interactions with respect to score classes. Users show a disassortative, slightly asymmetric behavior with respect to scores: users with low score tend to interact with users with high score, while the reverse is slightly less frequent. This is due to popular comments attracting many comments from other users, mostly of low popularity.

Given these characteristics of the network, it is important to understand how the differences in behavior with respect to activity and score affect the heterophily and asymmetry results found for the leaning. To do so, we need a unified model that puts all these ingredients together, and compares the resulting model against the random network null model. The next section explains our approach to this task.

Logit regression model
So far, we have recognized the effects of different groups and user characteristics in the commenting behavior. We now wish to assess whether the effects we have identified so far are statistically significant, and what the relationship among them is. In particular, we want to see how the variables of interest for our study (community interactions) are confounded by the popularity variables. To this end, we design a logit model that quantifies the odds of within-group and cross-group interaction, and the role of each variable. There is no need to control for activity variables, as our null model already takes that aspect into account, as explained below. Such a model also allows us to study the effect of geography-based variables in the next section.

Our logit model defines the probability of $u$ interacting with $v$ as a function of the features of $u$ and $v$. We consider three sets of features: community interaction features (the political labels we defined), confounding Reddit features (popularity metrics), and environmental features which capture real-world phenomena.

For community features, similarly to the previous section, we consider the following set of binary features:
• Clinton support: 1 if $L_u = C$ ($u$ is a Clinton supporter), 0 otherwise.
• Cross-group: 1 if $u$ and $v$ support different candidates ($L_u \neq L_v$), 0 otherwise.
• Clinton support, Cross-group: interaction feature between the previous variables, 1 if $L_u = C$ and $L_v = T$, 0 otherwise.

The combinations of these variables, which we assume to be independent, represent all four possible scenarios of political labels (depicted in Figure 2 (a)).

We then add several variables to control for potential effects of different levels of popularity between supporters of Clinton and Trump. We operationalize these confounding features as follows:
• Average score: average score obtained, separately, by $v$ (target) and by $u$ (author) ($s_v$ and $s_u$).
• Difference in average score: the absolute difference between the average scores of $v$ and $u$, namely $|s_v - s_u|$.
• Fraction of positives: the fraction of comments with a positive score over the total number of comments, separately for $v$ (target) and for $u$ (author).
• Difference in fraction of positives: the absolute difference between the fractions of $u$ and $v$.

We quantile-normalize these features so that they all display the same distribution, which makes the coefficients of our logit regression model easy to interpret.

We choose to use both the average score and the fraction of comments with positive score because the scores have a heavy-tailed distribution, therefore the average might be skewed. Conversely, the fraction of positives represents a summary statistic on a boolean property, and thus captures a different aspect of the data. We also add the differences to include link-based features that capture the dynamics of the interaction between the specific author and target.

To model the effects of these variables on comment creation, we use a logistic regression model. In this data set, we have $N = 31\,218$ users and $W = 716\,765$ comments. It is thus infeasible to use the complete set of negative links (i.e., pairs of users $(u, v)$ where $u$ did not interact with $v$). We therefore resort to sampling the set of negatives. This sampling procedure only changes the value of the intercept coefficient, which is not of interest, and does not affect the interpretation of the important parts of the model (the coefficients of the community variables under study).

It is important to carefully choose the sampling strategy, as it needs to faithfully represent the null model we are considering. In the null model we presented in Section 3, we consider the author and the target of the comments to be fixed, that is, we rewire the network while preserving the in- and out-strength of each node, equivalent to a configuration model [37]. In the sampling strategy of negative links, we follow exactly the same procedure. We choose node $u$ with probability proportional to its out-strength or activity $a_u$, and node $v$ with probability proportional to its in-strength, defined in Section 2. If the link between $u$ and $v$ exists, we discard it. This way, the negative sample reflects exactly the null model presented in Section 3: the probability of considering a pair of nodes is just the product of two independent probabilities, namely the probability that a node $u$ initiates an interaction, and the probability that a node $v$ receives it. The role of logistic regression is thus to capture how the variables we consider alter the chances of observing a link $u \to v$.

Table 2. Odds ratios obtained by logistic regression. Each column corresponds to a model with a specific set of variables: we have community features in the first column; then, we add to the model the first, the second, and both sets of confounding variables. All the variables shown here have a statistically significant ($p < 0.[...]$) impact on the likelihood of writing a comment.

Variable name                Comm.      Conf. 1    Conf. 2    Conf. 1+2
Clinton sup.                 0.942***   0.918***   0.936***   0.911***
Cross-group                  1.195***   1.172***   1.191***   1.165***
Clinton sup., Cross-group    1.064***   1.091***   1.070***   1.102***
Avg. score (author)                     1.166***              1.174***
Avg. score (target)                     1.151***              1.167***
Diff. avg. score                        1.213***              1.228***
Diff. frac. positive                               0.497***   0.498***
Frac. positive (author)                            1.260***   1.247***
Frac. positive (target)                            1.221***   1.195***
* $p < 0.05$, *** $p < 0.001$

Logit regression results
First, we present results for the model that only considers the community variable. We report the odds ratios obtained for this model in the first column of Table 2. All the coefficients are statistically significant at the 0.1% level. These results confirm our analysis so far:

(i) comments on r/politics are heterophilic: the likelihood of 𝑢 answering 𝑣 increases when 𝑢 and 𝑣 support different candidates (odds ratio 1 . . .

\[
\begin{array}{c|cc}
\text{Author} \backslash \text{Target} & T & C \\
\hline
T & 0.925 & 1.105 \\
C & 1.107 & 0.871
\end{array}
\qquad (6)
\]

The results obtained via this methodology are in line with those presented in the previous section (i.e., Figure 2).
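To make the link between the odds ratios in the first column of Table 2 and the author/target matrix in Eq. (6) more concrete, the sketch below multiplies the relevant odds ratios for each author/target combination. It rests on two assumptions that the surviving text does not confirm: that "Clinton sup." flags the author of the comment, and that Trump → Trump is the reference category. The function name `relative_odds` is hypothetical.

```python
# Odds ratios from the first column ("Comm.") of Table 2.
OR_CLINTON_AUTHOR = 0.942  # "Clinton sup."
OR_CROSS_GROUP = 1.195     # "Cross-group"
OR_CLINTON_CROSS = 1.064   # "Clinton sup., Cross-group" interaction term


def relative_odds(author, target):
    """Odds of author -> target relative to the Trump -> Trump baseline.

    Assumption: the "Clinton sup." dummy refers to the comment's author.
    """
    odds = 1.0
    clinton_author = author == "C"
    cross_group = author != target
    if clinton_author:
        odds *= OR_CLINTON_AUTHOR
    if cross_group:
        odds *= OR_CROSS_GROUP
    if clinton_author and cross_group:
        odds *= OR_CLINTON_CROSS
    return odds


matrix = {(a, t): relative_odds(a, t) for a in "TC" for t in "TC"}
for a in "TC":
    print(a, [round(matrix[(a, t)], 3) for t in "TC"])
```

Under these assumptions the ordering of the four entries matches Eq. (6): Clinton → Trump is the largest, followed by Trump → Clinton, with both within-group entries lower. The ratios between entries also appear to agree with Eq. (6) up to a common scaling factor and rounding, although the normalization used for Eq. (6) is not recoverable from the text here.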
Then, we control for the other Reddit variables we analyzed, in order to assess whether these effects are robust or if they can be explained by considering these other features. We report results for models that include the average score, the fraction of positively scored comments, and both, in the other columns of Table 2. Including these variables affects neither the odds ratios nor the statistical significance of the community features. This result confirms that the effects we observe for political interactions across communities are not confounded by these other user characteristics.

These control variables, in addition, show that users with a higher average score are more likely both to initiate and receive interactions. This relationship is heterophilic: a large difference in average score is associated with an increased likelihood of interaction. We can interpret average score as a proxy measure for visibility: authors able to attract a large number of upvotes are also more prolific, and they also attract more comments, which explains the larger-than-one ratios. They also tend to trigger a response even from unpopular authors, thus explaining the heterophily. Again, the results obtained with the logit regression model are in agreement with what is observed by comparing empirical interactions with a random network (i.e., Figure 3).

Authors with a larger fraction of positively scored comments are also more likely to send and receive comments. In fact, since score is a measure of the social feedback from the community, this shows that authors more aligned with the community tend to be more active in it, which is not surprising. With this variable, however, the relationship is homophilic: positively scored users are more likely to interact with other positively scored users, and negatively scored users with other negatively scored users. This result might be an effect of the community trying not to “feed the trolls” [38].
In this section, we investigate the connections between online interactions present in our data and offline socio-demographic factors. In particular, our research question is the following: which environmental factors are associated with higher levels of online cross-group interactions? While we cannot prove any causal effect of the environmental factors, such an observational study can provide insights for theory generation and follow-up investigation.

To answer our research question, we need a proxy of the socio-demographic environment of users. We choose US states as a proxy, as this is the finest spatial granularity we can reliably infer for Reddit users. We infer the state of each user according to the information gathered by Balsamo, Bajardi & Panisson [39], which is based on the usage of local Reddit communities. Out of our set of 121 046 Reddit users, we are able to geo-localize 37% of them at the state level. Henceforth, we restrict our analysis only to comments authored by users in this set. State information is slightly unbalanced (36% for Trump’s supporters, 43% for Clinton’s). The number of users we obtain for each state closely resembles their population (Spearman R 0 . “same state” that indicates whether 𝑢 and 𝑣 come from the same US state. Then, we select a set of macroscopic attributes of each state that we hypothesize might be related to their online behavior in a political community. These variables present a basic sketch of the environment of the authors, as represented by the state they live in. In particular, we focus on the following political, economic, and demographic variables:

• Swing state: a dummy variable which indicates whether the author lives in a US state where the 2016 presidential election margin was less than 4% for any candidate.
• Clinton/Trump share: the shares of votes obtained by these two candidates in the 2016 elections in the state where the author lives.
• Non-vote share: fraction of the population that did not vote for either of the two major candidates in that state.
• Unemployment: unemployment rate in 2016 in the state (source: US Bureau of Labor Statistics).
• Gini coefficient: income inequality in the state, as measured by the Gini coefficient in 2010 (data from the American Community Survey, conducted by the US Census Bureau).
• Median income: median household income in 2016 (source: American Community Survey).
• High school: fraction of the population with a high school degree or higher (source: 2013-2017 American Community Survey).

Table 3. Odds ratios obtained by logistic regression. Each column corresponds to a model with a specific set of variables. We indicate with three asterisks the statistically significant correlations ( 𝑝 < . ). Odds ratios and significance are similar with or without the inclusion of the “same state” variables.

Variable name
Clinton sup.                   0.909*** 0.884*** 0.883*** 0.883*** 0.883*** 0.884*** 0.883*** 0.883*** 0.884***
Cross-group                    1.172*** 1.156*** 1.133*** 1.165*** 1.117*** 1.162*** 1.163*** 1.122*** 1.144***
Clinton sup., Cross-group      1.104*** 1.162*** 1.163*** 1.163*** 1.165*** 1.164*** 1.165*** 1.162*** 1.164***
Avg. score (author)            1.176*** 1.174*** 1.175*** 1.175*** 1.175*** 1.174*** 1.174*** 1.174*** 1.173***
Avg. score (target)            1.169*** 1.151*** 1.151*** 1.151*** 1.151*** 1.151*** 1.151*** 1.151*** 1.151***
Diff. avg. score               1.232*** 1.208*** 1.209*** 1.209*** 1.210*** 1.209*** 1.209*** 1.209*** 1.209***
Frac. positive (author)        1.244*** 1.208*** 1.211*** 1.211*** 1.207*** 1.209*** 1.207*** 1.212*** 1.208***
Frac. positive (target)        1.192*** 1.179*** 1.179*** 1.179*** 1.178*** 1.179*** 1.178*** 1.180*** 1.178***
Diff. frac. positive           0.501*** 0.488*** 0.488*** 0.488*** 0.488*** 0.488*** 0.488*** 0.488*** 0.488***
Same state                     1.245*** 1.243*** 1.241*** 1.241*** 1.242*** 1.239*** 1.240*** 1.241*** 1.242***
Same state, Cross-group        0.811*** 0.811*** 0.810*** 0.809*** 0.812*** 0.815*** 0.817*** 0.809*** 0.814***
Swing state                    1.018*
Swing state, Cross-group       0.965*
Clinton share                  1.005
Clinton share, Cross-group     1.026
Trump share                    0.997
Trump share, Cross-group       0.971*
Non-vote share                 0.977*
Non-vote share, Cross-group    1.057***
Unemployment                   1.020*
Unemployment, Cross-group      0.978
Gini Coefficient               1.006
Gini Coefficient, Cross-group  0.977
Median Income                  1.006
Median Income, Cross-group     1.046*
High school                    1.002
High school, Cross-group       1.008

Odds ratios. * 𝑝 < .05, *** 𝑝 < .

We normalize all numerical variables according to quantile normalization, so that they display the same distribution. The data related to voting behavior refers to the election of November 2016, while our comments are in general gathered from the whole electoral year. This process is consistent with our hypothesis: is there any difference in behavior in the general population of a US state that could manifest itself also on social media, and that affected the electoral process?

We build a logistic regression model for each one of these variables separately. In each model, besides the studied variable, we also include the interaction feature between the selected environmental variable and the cross-group feature. This way, we capture whether the selected environmental variable has an effect on the likelihood of a user interacting with another user who supports a different presidential candidate. We also include in each model all the Reddit-related variables analyzed so far, since they all emerged as significant. We repeat this analysis for each variable, with and without including the same state feature in each of the other models, which may act as a large confounder for the other environmental variables.
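The preprocessing of the state-level variables described above can be illustrated with a short sketch. The text does not specify which flavor of quantile normalization is used, so the version below is an assumption: each value is mapped to its empirical quantile in (0, 1), which gives every variable the same (uniform) distribution. The function names `quantile_normalize` and `is_swing_state`, and all numbers in the usage lines, are hypothetical and not the paper's data.

```python
def quantile_normalize(values):
    """Map each value to its empirical quantile in (0, 1).

    Tied values share the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [(r - 0.5) / n for r in ranks]


def is_swing_state(margin):
    """Dummy variable: election margin (as a fraction) below 4%."""
    return abs(margin) < 0.04


# Hypothetical state-level values, for illustration only.
unemployment = [3.1, 4.9, 4.9, 6.2]
median_income = [48_000, 75_000, 61_000, 55_000]
print(quantile_normalize(unemployment))
print(quantile_normalize(median_income))
print(is_swing_state(0.012), is_swing_state(0.09))
```

After this transformation, variables measured in very different units (percentages, dollars) share the same scale, so the odds ratios of different models are easier to compare.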
We report only results including the same state variable, but the two cases are quantitatively similar.

Table 3 shows the odds ratio and the statistical significance obtained by these models. We consider significant for our hypothesis only models where both the analyzed variable and its interaction with the cross-group variable are significant. Note that the inclusion of environmental factors does not significantly alter the odds ratio and the significance obtained by the community variables, thus further testifying to the robustness of our main results on heterophily and asymmetry.

We summarize the findings obtained via the models in Table 3 as follows.

(i) There is in fact a significant ( 𝑝 < . non vote . In particular, states where individuals are most likely to abstain from voting for Trump or Clinton in the presidential elections are also those where cross-party interactions on Reddit are most likely ( 𝑝 < . 𝑝 < . 𝑝 < . 𝑝 < .05) between Trump-leaning and Clinton-leaning states: cross-group interactions are less likely in states with higher shares of Trump votes.
(vi) The other variables we test do not show a statistically significant correlation: the likelihood of interactions does not seem to be correlated with income inequality or education level.
In this work, we analyzed cross-group interactions between supporters of Trump and those of Clinton on Reddit during the 2016 US presidential elections. To this aim, we reconstructed the interaction network among these users on the main political discussion community, r/politics. We find that, despite the political polarization, these groups tend to interact more across groups than within them, that is, the network exhibits heterophily rather than homophily. This finding emerges by comparison with a null model of random social interactions, implemented both as a network rewiring that preserves the activity of users, and as a logistic regression model for link prediction which takes into account possible confounding factors.
Overall, our findings show that Reddit has been a tool for political discussion between opposing points of view during the 2016 elections. This behavior is in stark contrast with the echo chambers observed in other polarized debates regarding different topics, on several social media platforms. While it has been argued that polarization on social media can result in the presence of echo chambers, in which users do not hear opposing views, here we observe the reverse phenomenon: polarization is associated with increased interactions between groups holding opposite opinions. However, this relation between polarization and heterophily might not go beyond the digital realm. Reportedly, people perceive that they encounter more disagreement in online than in offline interactions [42]. Further research should be dedicated to understanding whether the heterophily found in this social network is specific to the 2016 presidential elections, or whether it applies to politics in general, and thus might be a general feature of the Reddit platform [14].

Several works in the literature have tried quantifying the presence of echo chambers in different social media, although most of them have not studied Reddit. Conover et al. [4] analyze 250,000 tweets from the 2010 U.S. congressional midterm elections. They measure the ratio between the observed and expected numbers of links in a random model: they find that users are more likely to interact with people with whom they agree, with an odds ratio of 1 . − . . − .

the general population, besides the ones specifically addressed by the geolocation of users which actually have the opposite causal direction. In this respect, a second limitation is evident: the state-level aggregation is coarse-grained, and does not take into consideration differences between areas within the same state (e.g., urban vs rural).
Nevertheless, the state level is the finest spatial granularity we can reliably infer for a large-enough sample of Reddit users.

Despite these limitations, we find several interesting patterns regarding sociodemographic and environmental factors associated with an increase in the likelihood of interactions between like-minded individuals [47]. To test this hypothesis, we analyzed the effect of different environmental factors by inferring the state of each user according to the information gathered by Balsamo, Bajardi & Panisson [39]. We observed an effect of geographical homophily on the r/politics network: interactions between users located in the same state are significantly more likely than random chance. At the same time, users from the same state are less likely to interact when they support different candidates. Therefore, we speculate that while different political views foster interactions in the general case, geographical location might act more as a barrier.

Among other environmental factors, we also observed a correlation between the likelihood of cross-group connections and the fraction of the population that abstained from voting. This finding suggests prudence when defining diversity of exposure as a normative goal. Similar results were measured through surveys, with Mutz [26] arguing that conflicts within one’s own social environment can produce ambivalence, which can in turn reduce the intensity of support for one’s side. Interestingly, Mutz [26] observes such a decrease even in the absence of new information. Further empirical evidence is needed to understand this phenomenon and which additional factors may drive it. We leave this question as important future work. While weighing our understanding of social media as a dissonating chamber or as an echoic one, we cannot escape the question of whether dissonance damages the pursuit of common goals for political groups, or whether it produces more realistic and less enthusiastic views of the available candidates.

ACKNOWLEDGEMENTS
GDFM and MS acknowledge the support from Intesa Sanpaolo Innovation Center. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
AUTHOR CONTRIBUTIONS
GDFM, CM, MS designed the study. GDFM, CM, MS analyzed and discussed the results. GDFM, CM, MS wrote the manuscript. All authors approved the final version of the manuscript.
COMPETING INTERESTS
The authors declare no competing interests.
REFERENCES
American Journal of Sociology, 408–446 (2008).
[3] Jacobson, G. C. Polarization, gridlock, and presidential campaign politics in 2016. The ANNALS of the American Academy of Political and Social Science, 226–246 (2016).
[4] Conover, M. D., Ratkiewicz, J., Francisco, M., Gonçalves, B., Menczer, F. & Flammini, A. Political polarization on twitter. In Fifth international AAAI conference on weblogs and social media (2011).
[5] Garimella, K., De Francisci Morales, G., Gionis, A. & Mathioudakis, M. Quantifying Controversy in Social Media. In WSDM ’16: 9th ACM International Conference on Web Search and Data Mining, 33–42 (2016).
[6] An, J., Quercia, D. & Crowcroft, J. Partisan Sharing: Facebook Evidence and Societal Consequences. In COSN ’14: ACM Conference on Online Social Networks, 13–24 (2014).
[7] Bessi, A., Zollo, F., Del Vicario, M., Puliga, M., Scala, A., Caldarelli, G., Uzzi, B. & Quattrociocchi, W. Users polarization on Facebook and Youtube. PLOS ONE, e0159641 (2016).
[8] Caldarelli, G., De Nicola, R., Del Vigna, F., Petrocchi, M. & Saracco, F. The role of bot squads in the political propaganda on Twitter. Communications Physics, 1–15 (2020).
[9] Garimella, K., De Francisci Morales, G., Gionis, A. & Mathioudakis, M. Quantifying Controversy on Social Media. TSC: ACM Transactions on Social Computing, 3 (2018).
[10] Garrett, R. K. Echo chambers online?: Politically motivated selective exposure among Internet news users. Journal of Computer-Mediated Communication, 265–285 (2009).
[11] Gilbert, E., Bergstrom, T. & Karahalios, K. Blogs are echo chambers: Blogs are echo chambers. In , 1–10 (2009).
[12] Quattrociocchi, W., Scala, A. & Sunstein, C. R. Echo chambers on Facebook. Available at SSRN 2795110 (2016).
[13] Garimella, K., De Francisci Morales, G., Gionis, A. & Mathioudakis, M. Political Discourse on Social Media: Echo Chambers, Gatekeepers, and the Price of Bipartisanship. In Proceedings of the 2018 World Wide Web Conference, 913–922 (2018).
[14] Cinelli, M., De Francisci Morales, G., Galeazzi, A., Quattrociocchi, W. & Starnini, M. Echo Chambers on Social Media: A comparative analysis (2020). Preprint at https://arxiv.org/abs/2004.09603.
[15] Klapper, J. T. The Effects of Mass Communication. Free Press (1960).
[16] Lord, C. G., Ross, L. & Lepper, M. R. Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. Journal of personality and social psychology, 2098 (1979).
[17] Baumann, F., Lorenz-Spreen, P., Sokolov, I. M. & Starnini, M. Modeling Echo Chambers and Polarization Dynamics in Social Networks. Phys. Rev. Lett., 048301 (2020).
[18] Cossard, A., De Francisci Morales, G., Kalimeri, K., Mejova, Y., Paolotti, D. & Starnini, M. Falling into the Echo Chamber: The Italian Vaccination Debate on Twitter. In ICWSM ’20: Fourteenth International AAAI Conference on Web and Social Media
Information, Communication & Society, 729–745 (2018).
[21] Guess, A., Nyhan, B., Lyons, B. & Reifler, J. Avoiding the echo chamber about echo chambers. Knight Foundation (2018).
[22] Bail, C. A., Argyle, L. P., Brown, T. W., Bumpus, J. P., Chen, H., Hunzaker, M. F., Lee, J., Mann, M., Merhout, F. & Volfovsky, A. Exposure to opposing views on social media can increase political polarization. Proceedings of the National Academy of Sciences, 9216–9221 (2018).
[23] Karpf, D. Digital politics after Trump. Annals of the International Communication Association, 198–207 (2017).
[24] Nithyanand, R., Schaffner, B. & Gill, P. Online political discourse in the Trump era. arXiv:1711.05303 (2017).
[25] Bakshy, E., Messing, S. & Adamic, L. A. Exposure to ideologically diverse news and opinion on Facebook. Science, 1130–1132 (2015).
[26] Mutz, D. C. The consequences of cross-cutting networks for political participation. American Journal of Political Science
Political Psychology, 65–95 (2004).
[28] Garimella, K., De Francisci Morales, G., Gionis, A. & Mathioudakis, M. Reducing Controversy by Connecting Opposing Views. In WSDM ’17: 10th ACM International Conference on Web Search and Data Mining, 81–90 (2017).
[29] Johnston, R., Jones, K. & Manley, D. An increasingly polarized America. Atlas of the 2016 elections
Political Behavior, 303–330 (2010).
[31] Duca, J. V. & Saving, J. L. Income inequality and political polarization: time series evidence over nine decades. Review of Income and Wealth, 445–466 (2016).
[32] Storper, M. Separate Worlds? Explaining the Current Wave of Regional Economic Polarization. Journal of Economic Geography, 247–270 (2018).
[33] Baumgartner, J., Zannettou, S., Keegan, B., Squire, M. & Blackburn, J. The pushshift reddit dataset. arXiv preprint arXiv:2001.08435 (2020).
[34] Flores-Saviaga, C. I., Keegan, B. C. & Savage, S. Mobilizing the trump train: Understanding collective action in a political trolling community. In Twelfth International AAAI Conference on Web and Social Media (2018).
[35] Massachs, J., Monti, C., De Francisci Morales, G. & Bonchi, F. Roots of Trumpism: Homophily and Social Feedback in Donald Trump Support on Reddit. In Proceedings of the 12th ACM conference on Web Science (2020).
[36] Hutto, C. J. & Gilbert, E. Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth international AAAI conference on weblogs and social media (2014).
[37] Molloy, M. & Reed, B. A critical point for random graphs with a given degree sequence. Random structures & algorithms, 161–180 (1995).
[38] Bergstrom, K. “Don’t feed the troll”: Shutting down debate about community expectations on Reddit. com. First Monday (2011).
[39] Balsamo, D., Bajardi, P. & Panisson, A. Firsthand opiates abuse on social media: monitoring geospatial patterns of interest through a digital cohort. In The World Wide Web Conference, 2572–2579 (2019).
[40] Bastos, M., Mercea, D. & Baronchelli, A. The geographic embedding of online echo chambers: Evidence from the Brexit campaign. PloS one (2018).
[41] Chen, M. K. & Rohla, R. The effect of partisanship and political advertising on close family ties. Science, 1020–1024 (2018).
[42] Vaccari, C. How Prevalent are Filter Bubbles and Echo Chambers on Social Media? Not as Much as Conventional Wisdom Has It (2018). URL https://cristianvaccari.com/2018/02/13/how-prevalent-are-filter-bubbles-and-echo-chambers-on-social-media-not-as-much-as-president-obama-thinks.
[43] Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A. & Bonneau, R. Tweeting from left to right: Is online political communication more than an echo chamber? Psychological science, 1531–1542 (2015).
[44] Duggan, M. & Smith, A. 6% of online adults are reddit users. Pew Internet & American Life Project, 1–10 (2013).
[45] Finlay, C. Age and Gender in Reddit Commenting and Success. Journal of Information Science Theory and Practice, 18–28 (2014).
[46] Singer, P., Flöck, F., Meinhart, C., Zeitfogel, E. & Strohmaier, M. Evolution of Reddit: From the Front Page of the Internet to a Self-Referential Community? In Proceedings of the 23rd International Conference on World Wide Web, 517–522. New York, NY, USA (2014).
[47] Gentzkow, M. & Shapiro, J. M. Ideological segregation online and offline. The Quarterly Journal of Economics 126