Analysing Twitter Semantic Networks: the case of 2018 Italian Elections

Abstract

Social media play a key role in shaping citizens' political opinion. According to the Eurobarometer, the percentage of EU citizens employing online social networks on a daily basis has increased from 18% in 2010 to 48% in 2019. The entwinement between social media and the unfolding of political dynamics has motivated the interest of researchers in the analysis of users' online behavior - with particular emphasis on group polarization during debates and echo-chamber formation. In this context, attention has been predominantly directed towards the study of online relations between users, while semantic aspects have remained under-explored. In the present paper, we aim at filling this gap by adopting a two-step approach. First, we identify the discursive communities animating the political debate in the run-up to the 2018 Italian Elections as groups of users with a significantly similar retweeting behavior. Second, we study the semantic mechanisms that shape their internal discussions by monitoring, on a daily basis, the structural evolution of the semantic networks they induce. Above and beyond specifying the semantic peculiarities of the Italian electoral competition, our approach innovates studies of online political discussions in two main ways. On the one hand, it grounds semantic analysis within users' behaviors by implementing a method, rooted in statistical theory, that guarantees that our inference of socio-semantic structures is not biased by any unsupported assumption about missing information; on the other, it is completely automated, as it does not rest upon any manual labelling (either based on the users' features or on their sharing patterns). These elements make our method applicable to any Twitter discussion regardless of the language or the topic addressed.


Analysing Twitter Semantic Networks: the Case of 2018 Italian Elections

Tommaso Radicioni

Scuola Normale Superiore, P.zza dei Cavalieri 7, 56126, Pisa (Italy)

Fabio Saracco

IMT School for Advanced Studies, P.zza S. Ponziano 6, 55100, Lucca (Italy)

Elena Pavan

University of Trento, Via Verdi 26, 38122, Trento (Italy)

Tiziano Squartini

IMT School for Advanced Studies, P.zza S. Ponziano 6, 55100, Lucca (Italy)

September 8, 2020

Abstract

Social media play a pivotal role in shaping citizens' political opinion. According to the Eurobarometer, the percentage of EU citizens employing online social networks to access information, on a daily basis, has increased from 18% in 2010 to 42% in 2017. The tight entwinement between social media and the unfolding of political dynamics has motivated the interest of researchers in the analysis of users' online behavior - with particular emphasis on topics like group polarization during debates and echo-chamber formation - to unveil the modes and the implications of online interactions for political processes. In this context, where attention has gone predominantly towards the study of online relations between users, semantic aspects have remained under-explored. In the present paper, we aim at filling this gap by, first, identifying the discursive communities that animate the political debate in the run-up to the 2018 Italian Elections and, then, studying the semantic mechanisms that shape their internal Twitter discussions. We do so by monitoring, on a daily basis, the structural evolution of the corresponding semantic networks. As our analysis points out, the supporters of the political alliances present at the elections are characterized by a markedly different online behavior, in turn inducing semantic networks with different topological structures. The supporters of the right-wing parties alliance display a particularly active behavior condensed in a single, densely connected cluster wherein discussions take place in conjunction with mediated events such as political talk shows. Daily semantic networks triggered by the users retweeting members of the 5 Star Movement (M5S) tend, instead, to be less centralized, suggesting a 'more distributed' way of discussing a variety of themes, e.g. those raised as central by this new incumbent in the Italian political scenario. Lastly, semantic networks triggered by users retweeting members of the center-left alliance show a combination of clustered and distributed arrangements.

Keywords: Complex networks · Filtered projections · Semantic networks · Twitter

In the last decade, the emergence of social media platforms has brought fundamental changes to the way information is produced, communicated, distributed and consumed. According to Eurobarometer, the percentage of Europeans employing online social networks to access information on a daily basis has increased from 18% in 2010 to 42% in 2017 [1]. A similar report concerning the US has shown that, as of August 2018, 68% of American adults retrieve at least some of their news on social media [2]. As social media facilitate rapid information sharing and large-scale information cascades, what emerges is a shift from a mediated, top-down communication model heavily ruled by legacy mass media to a disintermediated, horizontal one in which citizens actively select, share and contribute to the production of politically relevant news and information, in turn affecting the political life of their countries [3]. In a context in which political dynamics unfold seamlessly within a hybrid social and political space, studies that cut across traditional disciplinary boundaries have multiplied to uncover the many implications of users' online behavior for political participation and democratic processes.

The systematic investigation of online networks stemming from social media use during relevant political events has been particularly helpful in this respect.
Endorsing a view of online political activism as complementary to - and not as a substitution for - traditionally studied political participation dynamics [4], detailed and data-intensive explorations of online systems of interactions contributed to a more genuine and multilevel understanding of how social media relate to political participation processes.

At the macro level, research endeavors have focused on mapping the structural and processual features of online interaction systems to elaborate on the social media potential for fostering democratic and inclusive political debates. In this respect, specific attention has been paid to assessing grades of polarization and closure [5, 6] of online discussions within echo-chambers [7] with a view to connecting such features with the progressive polarization of political dynamics [8, 9].

At the micro level, research has focused on disambiguating the different roles that social media users may play within online networks - particularly, to identify influential spreaders [10, 11, 12] responsible for triggering the pervasive diffusion of certain types of information, but also to elaborate on the redefinition of political leadership in comparison to more traditional offline dynamics [13]. More specifically, accounting for users' behavior has helped to characterize the different contributions that are delivered by actors who exploit, to different extents, social media communication and networking potentials [14, 15, 16]. In this way, concepts like 'political relevance' and 'leadership' get redefined at the crossroads between actors' attributes and their actual engagement within online political discussions.

Additionally, increasing attention to online dynamics has entailed dealing with non-human actors, such as platform algorithms [17] and bots [18, 19, 20, 21], and their active contribution to online political dynamics.
Consideration for non-human actors follows from extant social sciences approaches such as actor-network theory and its invitation to disanchor agency from social actors, preferring a recognition for actants, that is, for any agent capable of intervening within social dynamics [22]. Nonetheless, the pervasive diffusion of social media in every domain of human action has revamped attention for both platform materiality (i.e. the modes in which specific technological artifacts are constructed and function) and for actants such as algorithms and bots, starting from the premise that online dynamics are inherently sociotechnical and, thus, technology features stand in a mutual and co-creative relationship with their social understanding and uses [23]. Shrouded in invisibility, platform algorithms and social bots actively filter and/or push specific types of contents, thus managing to manipulate users' behaviors and opinions - in some cases acting as true agents of disinformation [24, 25].

In all its heterogeneity, this multiplicity of studies shares a common feature, insofar as it is grounded in the study of networks of users and, thus, approaches the study of online political dynamics by privileging the investigation of direct relations amongst actors of different nature - individuals, organizations, institutions and even bots. Conversely, less attention has gone towards the study of the contents that circulate during online political discussions and how these contents contribute to nurture collective political identities which, in turn, drive political action and participation.

Studies that focus on social media content do exist and embrace a multiplicity of political instances, from electoral campaigns to social movements and protests.
For example, looking at Twitter, research has compared the content of tweets published by parties with the content of tweets sent by candidates [26], analyzed the contents of the 2017 French presidential election campaign [27], the online media coverage in the run-up to the 2018 Italian Elections [28] and looked at the keywords and hashtags related to the [...] discursive communities as groups of users with a similar retweeting behavior. In doing so, we acknowledge the specific meaning that the retweet feature holds within the Twitter platform - that is, an explicit recognition of the worthiness (for better or for worse) of the contents produced by other users [4].

From a purely methodological point of view, our analysis is grounded on a double filtering procedure. As a first step, we identify the aforementioned discursive communities by circumscribing similar retweeting behaviors. More specifically, any two specific verified accounts are linked if retweeted by a significantly large portion of non-verified users. Actual discursive communities are, then, identified by running a traditional community detection algorithm on such a configuration. As a second step, we focus on each identified discursive community and derive the corresponding semantic networks induced by the co-occurrences of hashtags within tweets sent by its members. Subsequently, we apply a core-periphery detection algorithm to isolate the main contents governing the collective discussion. Following this procedure, both the discursive communities and the semantic networks we trace are induced by the activity of users, hence overcoming the limitations of present studies and allowing us to approach the analysis of the behavioral as well as the semantic aspect of online political debates simultaneously. Finally, we implement several filtering algorithms [32] to detect the non-trivial content of our semantic networks, identifying the most debated subjects.
Filtering ultimately allows us to identify the communication strategies adopted by the different discursive communities and the backbone of the narratives developed by the different groups.

The paper is organized as follows. Section 2 describes data-acquisition and data-cleaning processes. In Section 3, we discuss the methods we employ to project our bipartite user-hashtag networks on the hashtag layer and to derive our collection of semantic networks. In Section 4, the results of our analysis are reported and discussed. Finally, in Section 5 we draw a set of concluding remarks reflecting on the potentialities as well as on the critical aspects of the proposed approach.

Case study.

The current study focuses on the Twitter-induced discursive communities that emerged during the weeks of the electoral campaign preceding the 2018 Italian Elections, which took place on March the 4th. The 2018 Italian Elections represented a crucial political event that subverted the traditionally bipolar political competition characterizing the so-called Italian Second Republic. A radically novel scenario, with three poles of (political) attraction, emerged. The first pole was represented by the centre-right coalition and eventually won the elections with 37% of the vote share. Interestingly, the victory of the right-wing alliance was not led by Silvio Berlusconi's party, Forza Italia, which obtained only 14% of preferences and thus gave way to the nationalist Lega led by Matteo Salvini (17.4%). The second pole was represented by the center-left coalition led by Partito Democratico (Democratic Party, PD) with 18.7% of the vote share - its worst result ever - under the leadership of the secretary and Prime Minister candidate Matteo Renzi. The third pole was represented by the populist party Movimento Cinque Stelle (Five Star Movement, M5S), which was the most voted party with 32.7% of the vote share, under the leadership of Luigi Di Maio.

Ultimately, the 2018 elections constituted a true electoral earthquake triggered by two elements: on the one hand, the extreme predominance of themes such as immigration and criminality, which eventually favored populist and right-wing parties over more traditional actors such as Forza Italia and the Democratic Party; on the other hand, a significant contribution to the shuffling of political balances was given also by the hybrid electoral campaign [33] put in place by all leaders and candidates, who combined traditional and social media, thus managing to engage voters with pervasive and low-cost communication strategies.

Social media platform and relations selection.

Twitter is hardly the only social media platform that hosted politically relevant discussions during the observation period, as all social media platforms have played an increasingly relevant political role [34, 35]. Nonetheless, extant studies show that Twitter is particularly prominent during electoral dynamics, as it is the platform used by the vast majority of public figures (e.g. political leaders, journalists, official media accounts, etc.) to provide visibility to their statements. More specifically, in the Italian context, Twitter is recognized to exert an 'agenda setting' effect on the country's mass media [37]. Hence, regardless of the fact that Twitter users are not representative of the Italian population, looking at the discursive communities present on this platform entails looking at a pivotal - albeit non-representative - portion of the political discussions that accompanied the electoral campaign.

Amongst all types of interaction modes featured by the platform, the current study is grounded on retweets, which we understand here as a baseline online relational mechanism that is particularly insightful when studying collective political identities. Indeed, as pointed out in [4], while mentions and replies in Twitter do sustain direct interaction and dialogue between users, retweets suggest a will to re-transmit contents produced by others. This, in turn, provides a more clear-cut indication of commonality and shared points of reference. Moreover, extant research suggests that retweets proxy the actual political alliances better than mentions and replies - as shown in [38], where the authors conclude that the use of retweets was more relevant than that of mentions to grasp the bipartisan nature of online debates in the run-up to the 2010 US midterm elections.

Data collection.

The extraction of Twitter data has been performed by selecting a set of keywords linked to the Twitter discussion about the 2018 Italian Elections. In particular, each collected tweet contains at least one of the following keywords: elezioni, elezioni2018, , (literally, elections, elections2018, , ). Data collection has been realized by using the Twitter Search API across a period of 51 days, from th January 2018 to th March 2018, i.e. a time interval covering the entire period of the electoral campaign and the two weeks after the Election Day (4th March 2018).

For a review of how Twitter is used during electoral campaigns, see [36].

Data cleaning.

The procedure described above led to a data-set containing 1.2 million tweets, posted by 123,210 users (uniquely identified via their user ID). As in the Twitter environment hashtags play a central role, acting as thematic tags designated by the 'hash' symbol [...] $\simeq$ of the original data-set. Hashtags were then subjected to a merging procedure, i.e. any two hashtags have been considered as the same if found 'similar enough' and only the most frequent hashtag has been retained. The similarity between hashtags has been quantified through the Levenshtein or edit distance (see Appendix A for more details), i.e. one of the most common sequence-based similarity measures [41]. As shown by a check a posteriori, our cleaning procedure misidentifies less than of the final list of hashtags.

Data representation.

The lists of user IDs and merged hashtags were, then, used to define a bipartite network for each day of our observation period, that is 51 bipartite networks in total. A bipartite network is defined by two distinct groups, or layers, of nodes, $\top$ and $\bot$, and only nodes belonging to different layers are allowed to be connected. The bipartite network corresponding to day $t$ can be, thus, represented as a matrix $M^{(t)}$ whose dimensions are $N_\top \times N_\bot$, with $N_\top$ being the total number of users on day $t$ and $N_\bot$ being the total number of hashtags (tweeted) during that specific day: $m^{(t)}_{i\alpha} = 1$ if the user $i$ has tweeted (at least once) the hashtag $\alpha$ on day $t$ and $0$ otherwise.

The simplest way to obtain a monopartite projection out of a bipartite network is that of linking any two nodes belonging to the layer of interest (say, $\alpha$ and $\beta$, for the sake of illustration) if their number of common neighbors is positive. Such a procedure yields an $N_\bot \times N_\bot$ adjacency matrix $A$ whose generic entry reads

$$a_{\alpha\beta} = \Theta[V^*_{\alpha\beta}] \tag{1}$$

where

$$V^*_{\alpha\beta} = \sum_{j=1}^{N_\top} m_{j\alpha} m_{j\beta} \tag{2}$$

counts the number of nodes both $\alpha$ and $\beta$ are linked to and $\Theta$ represents the Heaviside step function. The condition $a_{\alpha\beta} = \Theta[V^*_{\alpha\beta}] = 1$ can be also rephrased by saying that $\alpha$ and $\beta$ share at least one common neighbor.

A more refined method to obtain a monopartite projection is that of linking any two nodes if their number of common neighbors is found to be statistically significant [32]. More quantitatively, this second algorithm prescribes to compare the empirical value $V^*_{\alpha\beta} = \sum_{j=1}^{N_\top} m_{j\alpha} m_{j\beta}$ with the outcome of a properly-defined benchmark model - here, generically indicated with $f$ - via the calculation of the p-value

$$\text{p-value}(V^*_{\alpha\beta}) = \sum_{V_{\alpha\beta} \geq V^*_{\alpha\beta}} f(V_{\alpha\beta}) \tag{3}$$

and to link $\alpha$ and $\beta$ only in case it 'survives' a multiple hypotheses test (see Appendix B for more details).
Such a procedure outputs an $N_\bot \times N_\bot$ adjacency matrix $A$ whose generic entry reads $a_{\alpha\beta} = 1$ if nodes $\alpha$ and $\beta$ are found to be linked to the same neighbors a statistically significant number of times and $a_{\alpha\beta} = 0$ otherwise.

The null models used as filters for the present analysis are the Bipartite Random Graph Model (BiRGM), the Bipartite Partial Configuration Model (BiPCM) and the Bipartite Configuration Model (BiCM) [32] (see Appendix B for more details). In words, the BiRGM discounts the information provided by the total number of (re)tweets, the BiPCM [...]

Notably, this result indicates that only a reduced percentage of users employs at least one hashtag while tweeting, as already reported elsewhere [40].

This procedure is needed to get rid of duplication of hashtags due to typos or different conjugations, artificially altering the statistics of hashtags.
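As a rough illustration of the two projection schemes described in this section, the following Python sketch builds the naive projection of Eqs. (1)-(2) and a filtered one based on the p-value of Eq. (3). It rests on assumptions that are not taken from the text: the benchmark $f$ is read here as a BiRGM in which every user-hashtag link has the same probability $p = L / (N_\top N_\bot)$, with $L$ the total number of links, so that $V_{\alpha\beta}$ is binomially distributed with $N_\top$ trials and success probability $p^2$; and the multiple hypotheses test is implemented as a Benjamini-Hochberg false discovery rate procedure, which may differ from the one detailed in Appendix B. Function names and the threshold `alpha` are illustrative.

```python
from math import comb


def cooccurrences(M):
    """Return V*, where V*[a][b] is the number of users that tweeted both a and b.

    M is a list of rows, one per user; M[i][a] is 1 if user i tweeted hashtag a.
    """
    n_tags = len(M[0])
    V = [[0] * n_tags for _ in range(n_tags)]
    for row in M:
        used = [a for a, x in enumerate(row) if x]
        for a in used:
            for b in used:
                if a != b:
                    V[a][b] += 1
    return V


def naive_projection(M):
    """Eq. (1): link two hashtags if they share at least one user."""
    return [[1 if v > 0 else 0 for v in row] for row in cooccurrences(M)]


def birgm_pvalue(v_obs, n_users, p):
    """Eq. (3) under the assumed benchmark: P(V >= v_obs), V ~ Binomial(n_users, p**2)."""
    q = p * p
    return sum(comb(n_users, k) * q**k * (1 - q) ** (n_users - k)
               for k in range(v_obs, n_users + 1))


def validated_projection(M, alpha=0.05):
    """Keep only the links whose p-value survives a Benjamini-Hochberg test (assumed)."""
    n_users, n_tags = len(M), len(M[0])
    p = sum(map(sum, M)) / (n_users * n_tags)  # link density of the bipartite network
    V = cooccurrences(M)
    tests = sorted((birgm_pvalue(V[a][b], n_users, p), a, b)
                   for a in range(n_tags) for b in range(a + 1, n_tags))
    m = len(tests)
    # Largest rank i such that p_(i) <= i * alpha / m; all tests up to it are validated.
    cutoff = 0
    for i, (pval, _, _) in enumerate(tests, start=1):
        if pval <= i * alpha / m:
            cutoff = i
    A = [[0] * n_tags for _ in range(n_tags)]
    for _, a, b in tests[:cutoff]:
        A[a][b] = A[b][a] = 1
    return A
```

The exact binomial sum is only practical for small toy networks; for data of the size considered here one would presumably rely on a statistical library, and on the BiPCM or BiCM benchmarks, whose distributions are not binomial.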

Our first step to analyze the Twitter public discourse of the 2018 Italian electoral campaign is identifying communities of online users with a similar Twitter behaviour. To this aim, we have divided users into two groups, by distinguishing the accounts verified by the platform from the non-verified ones. A bipartite network is, then, built as follows: a verified and a non-verified user are linked if one of the two retweets the other one at least once during the observation period - notably, the retweeting action is mainly performed by non-verified users who share contents published by the verified ones. Then, the procedure described in Section 3 has been employed to project the bipartite network of retweets on the layer of verified users; to this aim, the BiCM filter has been employed. Lastly, a traditional community detection algorithm has been run to identify communities of verified users (see Appendix C for more details). These groups constitute discursive communities wherein the tweeting activity of the verified users triggers a discussion between the non-verified users sharing similar contents.

Interestingly, the identified discursive communities provide a faithful representation of the alliances running at the 2018 Italian Elections and of their supporters:

• M5S: a community composed of the accounts of politicians belonging to Movimento Cinque Stelle (e.g. DaniloToninelli, luigidimaio), the related institutional groups (e.g. M5S_Camera, M5S_Senato) and users engaging with all of them. The number of users belonging to this community is 11,151;

• Center-right (CDX): a community of users composed of the accounts of the political parties forming the alliance between right-wing parties (e.g. forza_italia, LegaSalvini), the related politicians (e.g. renatobrunetta, matteosalvinimi), their institutional representative groups (e.g. GruppoFICamera) and users interacting with all of them. The number of users belonging to this community is 5,842;

• Center-left (CSX): a rather heterogeneous community of users composed of the accounts of the political parties forming the center-left alliance (e.g. pdnetwork, PD_ROMA), their politicians (e.g. giorgio_gori, matteorenzi), journalists (e.g. vittoriozucconi, jacopo_iacoboni) and users engaging with them. The number of users belonging to this community is 12,065.

Activity level of discursive communities.

A first step in the analysis of these discursive communities consists of analysing their volume of activity. As fig. 1 shows, the evolution of the Twitter activity of the three discursive communities above is similar. Generally speaking, a flat trend is followed by a steep rise, a few days before the Election Day; then, a peak in the tweeting activity is registered in correspondence of the day after the Election Day, i.e. 5th March 2018. Afterwards, a rapid decrease of the number of tweets is observed: with respect to the value observed before the Election Day, the volume of CDX tweets decreases by $\simeq$ , the volume of M5S tweets decreases by $\simeq$ , the volume of CSX tweets decreases by $\simeq$ . Notice that the volume of tweets characterizing the M5S community is systematically larger than the volume of tweets characterizing both the CDX and the CSX community across the entire period considered here, an element that confirms the well-known attitude of M5S supporters towards leaning on digital media more extensively than other political groups.

Let us now move to the analysis of the monopartite projections on the layer of hashtags, i.e. what are hereby called hashtag by hashtag or semantic networks. In the present section we will discuss the results concerning the non-filtered projections; in the next one, we will compare them with the ones concerning the filtered projections.

The account verification procedure can be requested by any user to guarantee to other Twitter users that the account is authentic: for this reason, the verified accounts usually belong to 'entities' such as politicians, journalists, political parties or media. This information can be easily retrieved by employing the Twitter APIs.

As a side comment, we notice that also non-verified users can be 'assigned' to the communities of the verified ones, via the computation of the so-called polarization (see Appendix D and also [42]).

Figure 1: [...] after the Election Day, i.e. 5th March 2018. For what concerns the retweeting behavior of users, it is apparent that the volume of tweets characterizing the M5S community is systematically larger than the volume of tweets characterizing the other two communities, across the entire period considered here - an observation confirming the attitude of M5S supporters towards digital media.

Analyzing the topics prominence.

A closer inspection of semantic networks allows us to engage more systematically with the contents discussed within discursive communities. A first step in this direction can be made by exploring the number of nodes, which proxies the number of topics discussed by users, and their mean degree (i.e. the mean number of neighbors per node), which proxies the (average) prominence of the topics that characterize the discussion. Results obtained in this step are shown in fig. 2.

The evolution of the number of nodes shows a rising trend up to the day after the Election Day, followed by a decreasing one. This indicates that the number of topics debated by users increases as th March 2018 approaches. Again, the M5S seems to be the most active community, with the largest number of debated topics throughout our observation period. The trend characterizing the M5S community is closely followed by the trend characterizing the right-wing alliance up to the end of February, when an inversion takes place and a rise in the number of topics debated by the supporters of the center-left alliance becomes clearly visible.

The trend of the mean degree is, overall, much less regular: it is, in fact, characterized by several 'bumps of activity' throughout the entire period. Notice how the use of hashtags, on a daily basis, is highly influenced by the so-called mediated events, i.e. events of social relevance broadcast by mass media (in particular on television): this is suggested by hashtags like , , etc. (all referring to Italian political talk shows), pointing out that Twitter users are active online during political debates hosted in TV shows. Such a behavior is particularly evident for the CDX community, whose mean degree is characterized by a larger number of peaks.
More specifically, the peaks are observed in correspondence of the following TV shows:

• : interview of Silvio Berlusconi (one of the leaders of the right-wing alliance) at TG La7 (hashtags: , );

• : Silvio Berlusconi and Matteo Salvini (both leaders of the right-wing alliance) are interviewed at 'Mezz'ora in più' (hashtags: , );

• : Nicola Porro, an Italian journalist, announces, via a Facebook video, the topics that will be discussed on his TV show 'Matrix', broadcast by 'Canale 5', a TV channel owned by the Berlusconi family (hashtags: , );

• : interview of Silvio Berlusconi in the TV show 'Che tempo che fa' (hashtags: , );

• : interview of Silvio Berlusconi in the TV show 'Dalla vostra parte' (hashtags: , );

• : Matteo Salvini and Anna Maria Bernini (a right-wing alliance politician) are hosted in the TV show 'Quinta colonna', broadcast by 'Rete 4', another TV channel owned by the Berlusconi family (hashtags: , );

• : Guido Crosetto and Maurizio Gasparri (both right-wing alliance politicians) are hosted in the TV show 'L'aria che tira' (hashtag: );

• : interview of Michaela Biancofiore (a right-wing alliance politician) in the TV show 'Tagadà' (hashtags: , ).

Figure 2: Temporal evolution of the number of nodes (top panels) and of the mean degree (bottom panels) for each community-specific semantic network. Notice how the use of hashtags, on a daily basis, is highly influenced by the so-called mediated events, i.e. events of social relevance broadcast by communication media. This behavior is particularly evident for the CDX community, whose activity increases in correspondence of TV shows where right-wing alliance politicians are hosted - a result seemingly confirming the so-called group polarization phenomenon.

Besides confirming that Twitter discussions can be influenced by external events, our results point out that Twitter discussions can also be triggered by external events.
This is especially true for the CDX community, whose Twitter discussions do not emerge 'spontaneously' but are driven by the aforementioned mediated events [43], seemingly indicating that CDX users still conceive the TV as the reference medium when it comes to political processes.

Identifying persistent topics.

A second step towards a closer understanding of the contents discussed within discursive communities consists of quantifying the interest towards a topic throughout the entire period covered by our data-set. To this aim, we analyzed the hashtag persistence, $H_t$, i.e. the percentage of days a hashtag is present in our data-set, on the non-filtered projections. Results are reported in table 1. As can be seen, the most persistent hashtags (in fact, the ones that are always present) are those concerning the names of political parties (i.e. , , ) and political leaders (i.e. , , , ). Moreover, the more persistent hashtags in all discursive communities refer, in almost all cases, to political actors and figures, more often than not of an opposing alliance. When it comes to substantive electoral themes, instead, the three communities seem to hold a common interest for work-related matters but also to concentrate on peculiar interests: migration flows for the M5S, taxation for the CDX and the role of Europe for the CSX. This finding has been observed for all discursive communities and it highlights the fact that the [...]

[Table 1. Columns: $H_t$, M5S, CDX, CSX]
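The descriptive quantities used in this and the previous step are simple to compute once each daily semantic network is available as a list of hashtag pairs. The Python sketch below shows one possible way of doing so; the input format, the function names and the toy data are illustrative assumptions rather than the pipeline actually used for the analysis. Note that, with this input format, only hashtags having at least one link are counted as nodes.

```python
def nodes_and_mean_degree(edges):
    """Return (number of nodes, mean degree) of an undirected network.

    edges is an iterable of hashtag pairs; duplicates and self-loops are ignored.
    """
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    nodes = set().union(*unique) if unique else set()
    if not nodes:
        return 0, 0.0
    # Each link contributes to the degree of two nodes.
    return len(nodes), 2 * len(unique) / len(nodes)


def hashtag_persistence(daily_edges):
    """Return H_t: for each hashtag, the percentage of days on which it appears."""
    n_days = len(daily_edges)
    days_present = {}
    for edges in daily_edges:
        for tag in {tag for edge in edges for tag in edge}:
            days_present[tag] = days_present.get(tag, 0) + 1
    return {tag: 100.0 * d / n_days for tag, d in days_present.items()}


# Toy example with three days of (hypothetical) hashtag co-occurrences.
days = [
    [("pd", "m5s"), ("m5s", "lega")],
    [("pd", "m5s")],
    [("pd", "lega"), ("lega", "pd")],
]
persistence = hashtag_persistence(days)
```

Applying `nodes_and_mean_degree` to each daily network of a community yields the two time series discussed for fig. 2, while sorting `persistence` by value gives a ranking of the kind reported in table 1.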

Identifying central topics.

In order to identify topics that, regardless of their prominence and persistence, are more pivotal to the unfolding of the discussion, we computed hashtag betweenness centrality, a measure quantifying the percentage of shortest paths passing through each hashtag, i.e.

$$b_\gamma = \sum_{\beta (\neq \alpha)} \sum_{\alpha} \frac{\sigma^{\gamma}_{\alpha\beta}}{\sigma_{\alpha\beta}} \tag{4}$$

(where $\sigma^{\gamma}_{\alpha\beta}$ is the number of shortest paths between hashtags $\alpha$ and $\beta$ passing through hashtag $\gamma$ and $\sigma_{\alpha\beta}$ is the total number of shortest paths between hashtags $\alpha$ and $\beta$). In a sense, hashtag betweenness centrality provides an entry point to identify strategic topics that 'coordinate' the discussion, as they bridge other topics that users do not directly connect within their tweets. Interestingly, the basket of the most strategic hashtags (i.e. , , , , , , , , ) is basically the same for all communities: hence, our analysis suggests that the main players of the 2018 Italian Elections embody crucial concepts for the definition of the narratives shaping the political debates of all communities. Nonetheless, the specificities of each community are maintained when it comes to economic and societal issues.

Analysis of triadic closures.

As discussions develop around ‘communities’ of topics, increasingly complex structures are to be considered. To this aim, we have analyzed the presence and the persistence of triadic closures, i.e. triangles of connected hashtags. As has been noticed, this kind of structure provides a deeper insight into the users' tweeting behavior, by revealing which concepts appear simultaneously in a discussion and measuring how often they do [44]. This analysis is particularly insightful to distinguish the behavior of the three communities: as shown in table 2, while both the CDX and the CSX communities are characterized by triads of concepts exclusively about political leaders, parties and electoral slogans, the triads observed within the M5S community confirm the greater concern of their supporters about themes of public interest (e.g. the issues of precarious labour, migrants landing, public research).

Interestingly, we also notice that specific days exist in which an abundance of triadic closures is registered. For instance, on the first day of the electoral silence, i.e. 2nd March 2018, users are particularly active in building narratives around electoral slogans, while themes of public interest constitute the topic of tweets at the end of the electoral campaign (i.e. the last days of February). Finally, we notice that the abundance of hashtag triads tends to rise in correspondence with mediated events, as observed for the mean degree: this is the case for the days […] February 2018 (M5S community - i.e. when Luigi Di Maio was interviewed at the political talk show ‘diMartedì’), […] February 2018 (CDX community - i.e. when Silvio Berlusconi was interviewed in a talk show called […] Corriere della Sera) and […] February 2018 (CSX community - i.e. when Laura Boldrini was interviewed at the radio show ‘Circo Massimo’).

Table 2 (columns: $T_t$, M5S, CDX, CSX; cells of the same row are separated by "|")
31%: (ricercapubblica, 8800precari, campagnaelettorale)
27%: (salvini, pd, m5s)
24%: (pd, italia, m5s) | (pd, lega, m5s); (pd, dimaio, m5s)
21%: (cnr, campagnaelettorale, ricercapubblica); (precari, campagnaelettorale, ricercapubblica); (politica, pd, m5s); (dimaio, pd, m5s); (lega, pd, m5s) | (m5s, dimaio, salvini); (liberieuguali, pd, m5s); (m5s, berlusconi, pd); (usa, europa, russia); (savona, accettolasfida, poterealpopolo)
20%: (berlusconi, pd, m5s); (ottoemezzo, pd, m5s); (salvini, pd, m5s); (berlusconi, politica, m5s); (centrodestra, pd, m5s); (italia, stopinvasione, italiani); (italia, stopislam, italiani); (campagnaelettorale, piemonte, forzaitalia); (m5s, pd, m5salgoverno) | (salvini, pd, m5s) | (pd, m5s, renzi); (pd, italia, m5s); (salvini, lega, m5s); (forzaitalia, pd, m5s); (fattinonparole, partitodemocratico, avanti); (berlusconi, salvini, pd); (salvini, m5s, berlusconi)

Table 2: […] $T_t$ value, i.e. the largest percentage of days a specific triadic closure is present in our data-set, is considerably less than the number of days covered by our data-set (i.e. 51). Dates refer to the days with the largest number of triadic closures.
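The counting of triadic closures and of their persistence across days can be sketched as follows. This is only a minimal illustration, assuming that each daily semantic network is available as a list of undirected hashtag pairs; the function names `triangles` and `persistence` and the toy data are hypothetical and are not taken from the analysis above.

```python
from collections import Counter
from itertools import combinations


def triangles(edges):
    """Return the set of triangles (sorted 3-tuples) of an undirected graph."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot close a triangle
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            # every common neighbour of u and v closes a triangle
            for w in adj[u] & adj[v]:
                found.add(tuple(sorted((u, v, w))))
    return found


def persistence(daily_edges):
    """Fraction of days on which each triangle is observed."""
    counts = Counter()
    for edges in daily_edges:
        counts.update(triangles(edges))
    n_days = len(daily_edges)
    return {t: c / n_days for t, c in counts.items()}


# Toy data (hypothetical): three daily networks of co-occurring hashtags.
days = [
    [("pd", "m5s"), ("m5s", "salvini"), ("salvini", "pd")],
    [("pd", "m5s"), ("m5s", "salvini"), ("salvini", "pd"), ("pd", "lega")],
    [("pd", "m5s"), ("pd", "lega")],
]
T = persistence(days)
```

In this toy example the only triad is (m5s, pd, salvini), present on two days out of three; the largest value of such a fraction over all triads plays the role of the persistence reported in table 2.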

Analysis of degree-degree correlations.

A closer inspection of the correlations between the degrees of the hashtags allows us to elaborate more in depth on the ways prominent topics are connected to other ones, shaping broader politically relevant narratives in the semantic network. To this aim, we consider the average nearest-neighbors degree (ANND), defined, for the generic hashtag $\alpha$, as the arithmetic mean of the degrees of the neighbors of a node, i.e.

$$ \kappa_\alpha^{nn} = \frac{\sum_{\beta (\neq \alpha)} a_{\alpha\beta} \kappa_\beta}{\kappa_\alpha}, \quad \forall\, \alpha \qquad (5) $$

with $\kappa_\alpha$ indicating the degree of the hashtag $\alpha$ in the considered monopartite projection. The degree-degree correlation structure of a network can be easily inspected by plotting the $\kappa_\alpha^{nn}$ values versus the $\kappa_\alpha$ values: a decreasing trend would lead one to conclude that correlations between degrees are negative - nodes with small degree would be ‘preferentially’ connected to nodes with high degree and vice versa. Conversely, an increasing trend would signal that correlations between degrees are positive - nodes with a small (large) degree would be ‘preferentially’ connected to nodes with a small (large) degree. Thus, decreasing and increasing trends offer us an entry point to explore whether discussions in the three communities tend to anchor onto some key themes that work as conversational drivers.

The decreasing behaviour of the ANND throughout our data-set confirms the presence of negative degree-degree correlations, i.e. the considered networks are disassortative (less prominent hashtags are connected with more prominent hashtags and vice versa); examples of the aforementioned trends are reported in fig. 3. The days considered here, i.e. 19th February 2018 and 5th March 2018, have been chosen to highlight an interesting feature of our semantic networks: as is clearly visible upon inspecting the behavior of the CDX and the CSX communities, groups of nodes with a (much) larger value of the ANND appear. As will become evident in what follows, these hashtags constitute the core of the Twitter discussion in the corresponding community and are characterized by dynamics on a daily time scale, i.e. they appear in correspondence of a specific event (in the case of the CDX community, the interview of Silvio Berlusconi in a TV show; in the case of the CSX community, Laura Boldrini’s Twitter campaign) and disappear the day after.

[Figure 3. Recovered panel titles: M5S community (a) 2018-02-19; CDX community (b) 2018-02-19]

Figure 3: Analysis of the degree-degree correlations for two specific days, i.e. 2018-02-19 and 2018-03-05, on the non-filtered projections: as the trend of $\kappa_\alpha^{nn}$ reveals, the daily semantic networks are disassortative for all communities, i.e. nodes with small degree are (preferentially) connected to nodes with high degree and vice versa. As our analysis also reveals, upon inspecting the behavior of the CDX and the CSX communities, groups of nodes with a (much) larger value of the ANND appear: these clusters of hashtags constitute the core of the Twitter discussion in the corresponding community, appearing in correspondence of specific events and disappearing the day after.

As an additional analysis, we have also considered the clustering coefficient, defined as

$$ c_\alpha = \frac{\sum_{\gamma (\neq \alpha, \beta)} \sum_{\beta (\neq \alpha)} a_{\alpha\beta} a_{\beta\gamma} a_{\gamma\alpha}}{\kappa_\alpha (\kappa_\alpha - 1)}, \quad \forall\, \alpha \qquad (6) $$

and quantifying the percentage of neighbours of a given node $\alpha$ that are also neighbours of each other (i.e. the percentage of triangles, having $\alpha$ as a vertex, that are actually realized). As shown in fig. 4, decreasing trends are observed: poorly-connected hashtags are strongly inter-connected and vice versa, thus suggesting the presence of several (inter-connected) ‘small’ discussions that are connected to a bunch of central topics (a network with these features is also said to be hierarchical). Besides, it is also apparent that the hashtags with a larger value of the ANND are also the ones characterized by a larger value of the clustering coefficient - confirming the ‘coreness’ of such a bunch of topics.

Taken altogether, these results suggest that all discursive communities revolve around a handful of thematic drivers: overshadowed by the predominance of these issues, a set of niche discussions nonetheless tends to emerge, pointing out a variety of interests even within every discursive community.

Semantic networks at the mesoscale: k-core decomposition.

Shifting perspective onto the mesoscale structure of the semantic networks helps us clarify what the ephemeral power of the thematic drivers we have just identified consists of. In the following we focus our attention on the […] February 2018, but similar considerations hold true for other daily semantic networks, such as the one of the […] February 2018 and the one of the […] February 2018. We implement the so-called k-core decomposition, a technique that has been widely used to find the structural properties of networks across a broad range of disciplines including ecology, economics and social sciences [45]. The k-core decomposition can be described as a sort of pruning process, where the nodes that have degree less than k are removed, in order to identify the largest sub-graph of a network whose nodes have at least k neighbors. This method allows a ‘coreness’ score to be assigned to each node of the network, which remains naturally subdivided into shells.

[Figure 4. Recovered panel titles: M5S community (a) 2018-02-19; CDX community (b) 2018-02-19]

Figure 4: Analysis of the network hierarchical structure for two specific days, i.e. 2018-02-19 and 2018-03-05, on the non-filtered projections: as plotting the clustering coefficient $c_\alpha$ values versus the degree $\kappa_\alpha$ values for the three communities reveals, our daily semantic networks are hierarchical, i.e. poorly-connected hashtags are strongly inter-connected and vice versa. Besides, it also confirms that the nodes with a larger value of the ANND are also the ones characterized by a larger value of the clustering coefficient.

Figures 5, 6, 7 show the k-shell decomposition for the semantic networks of our discursive communities, for the day […] February 2018: five k-shells, corresponding to five quantiles of the degree distribution, have been colored, confirming the presence of a core of highly debated hashtags (the red one collecting the most prominent and intertwined ones). To inspect the presence of a sub-structure, nested into the discussion bulk, we have run the Louvain algorithm on the innermost k-shell of the semantic networks of our discursive communities. Their shell structure is indeed rich, as particularly evident upon considering the CSX and the M5S ones: several communities appear, seemingly indicating that the discussions in which supporters of the CSX and the M5S parties are (more) engaged self-organize around sub-topics.

For what concerns the CSX community, they emerge as a consequence of factors such as the Twitter campaign born in support of the center-left candidate Laura Boldrini (revealed by the presence of hashtags such as […] and […]), the visit of Matteo Renzi in Bologna (revealed by the presence of hashtags such as […]), the presence of Massimo D’Alema (another leader of the center-left alliance) in the radio show ‘Circo Massimo’ (revealed by the presence of hashtags such as […]).

On the other hand, the presence of multiple debates within the bulk of the M5S semantic network is related to events like the electoral tour of Alessandro Di Battista, who presented the M5S electoral program in the southern Italian region of Basilicata (hashtags: […]), the presence of a journalist of ‘Il Fatto Quotidiano’ (a newspaper supporting the M5S) invited to the TV show ‘Otto e mezzo’ (hashtags: […]), the presence of politicians supporting other coalitions in several TV shows such as ‘Porta a Porta’, ‘Mezz’ora in più’ and ‘Dalla vostra parte’.

Footnote: The coreness of a node equals $k$ if it is present in the $k$-core of the network but not in the $(k+1)$-core.

Footnote: Vasco Errani and Pier Ferdinando Casini were candidates for the Senate in Emilia-Romagna for Liberi e Uguali and for Partito Democratico, respectively.

Figure 5: k-core decomposition of the semantic network for the non-filtered projection of the CDX discursive community on day […] February 2018: on the left plot, five k-shells for each semantic network are represented with different colors while, on the right plot, an expanded view of the innermost k-shell - basically overlapping with the properly defined core identified by the bimodular surprise - is represented. The compact bulk is triggered by the interview of Silvio Berlusconi in the TV show ‘Dalla vostra parte’.

The observations above no longer hold true when the CDX-induced semantic network is considered: its innermost shell is, in fact, a compact group of topics that cannot be further partitioned.

As a second observation, we notice that - when present - the communities partitioning the core are ‘held together’ by the nodes with the largest betweenness centrality: as they coincide with the hashtags related to the names of political parties/leaders, the latter can be imagined to act as ‘bridges’ connecting different discussions. Generally speaking, this indicates that the concept of ‘most influential nodes’ can be found within the core of the networks of hashtags as well, a result that complements the one about the influential spreaders identified within the networks of users [46].
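The pruning process underlying the k-core decomposition can be sketched in a few lines. The snippet below is a minimal, library-free illustration that assigns to each node the coreness defined above (the largest k such that the node belongs to the k-core); the function name `coreness` and the toy network are hypothetical, and the grouping of shells into five quantiles used for the figures is not reproduced here.

```python
def coreness(edges):
    """Coreness of every node of an undirected graph, by iterative pruning."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    degree = {n: len(nb) for n, nb in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # the next shell starts at the smallest degree still present
        k = max(k, min(degree[n] for n in remaining))
        stack = [n for n in remaining if degree[n] <= k]
        while stack:
            n = stack.pop()
            if n not in remaining:
                continue  # already pruned
            remaining.discard(n)
            core[n] = k
            # removing n lowers the degree of its surviving neighbours
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        stack.append(m)
    return core


# Toy network (hypothetical): a 4-clique of party/leader hashtags plus a
# small chain of event-related hashtags attached to it.
edges = [
    ("pd", "m5s"), ("pd", "lega"), ("pd", "salvini"),
    ("m5s", "lega"), ("m5s", "salvini"), ("lega", "salvini"),
    ("renzi", "pd"), ("renzi", "m5s"), ("bologna", "renzi"),
]
core = coreness(edges)
innermost = {n for n, k in core.items() if k == max(core.values())}
```

Here the innermost shell is the 4-clique, while the event-related hashtags sit in the outer shells; a community detection algorithm such as Louvain could then be run on the sub-graph induced by `innermost`.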

Semantic networks at the mesoscale: the core-periphery structure.

In order to complement the analysis above, we have also implemented the method proposed in [47], prescribing to search for the network core-periphery partition minimizing the quantity called bimodular surprise, i.e.

$$ S_\parallel = \sum_{i \geq l_\bullet^*} \sum_{j \geq l_\circ^*} \frac{\binom{V_\bullet}{i} \binom{V_\circ}{j} \binom{V - (V_\bullet + V_\circ)}{L - (i + j)}}{\binom{V}{L}}; \qquad (7) $$

the quantity above is the multinomial version of the surprise, originally proposed to carry out a community detection exercise [47]. In the present case, $L$ is the total number of links observed in our projections, while $V$ is the total number of possible links, i.e. $V = N(N-1)/2$. The quantities marked with $\bullet$ ($\circ$) refer to the corresponding core (periphery) quantities: for example, $l_\bullet^*$ is the number of observed links within the core, while $l_\circ^*$ is the number of observed links within the periphery. The presence of three different binomial coefficients allows three different ‘species’ of links to be accounted for: the binomial coefficient $\binom{V_\bullet}{i}$ enumerates the number of ways $i$ links can be redistributed within the core, the binomial coefficient $\binom{V_\circ}{j}$ enumerates the number of ways $j$ links can be redistributed within the periphery and the binomial coefficient $\binom{V - (V_\bullet + V_\circ)}{L - (i + j)}$ enumerates the number of ways the remaining $L - (i + j)$ links can be redistributed between the two, i.e. over the remaining $V - (V_\bullet + V_\circ)$ node pairs (see Appendix C for more details).

The mesoscale structure characterizing all discursive communities consists of a bunch of (very) well-connected vertices linked to a group of low-degree, loosely inter-linked nodes, see figs. 5, 6, 7. Such a structure is known as core-periphery and is present in many social, economic and financial systems [48]. Remarkably, the nodes belonging to the innermost shell overlap with the core ones computed with the multinomial version of the surprise, as proved by computing the Jaccard index over the two sets of nodes.

Figure 6: k-core decomposition of the semantic network for the non-filtered projection of the CSX discursive community on day […] February 2018: on the left plot, five k-shells for each semantic network are represented with different colors while, on the right plot, an expanded view of the innermost k-shell - basically overlapping with the properly defined core identified by the bimodular surprise - is represented. Notice the presence of communities, found by running the Louvain algorithm and emerging as a consequence of factors as diverse as the Twitter campaign born in support of the center-left candidate Laura Boldrini, the visit of Matteo Renzi in Bologna, the presence of Massimo D’Alema (another leader of the center-left alliance) in the radio show ‘Circo Massimo’.

As a last comment, let us explicitly show the evolution of the number of nodes belonging to the core and to the periphery for each discursive community. As fig. 8 shows, the core size is nearly constant throughout the whole considered period while the periphery size rises as the Election Day approaches, showing a peak on the day after the Election Day (i.e. 5th March 2018). This behavior, common to all communities, seems to indicate that, as the Election Day approaches, the number of topics under discussion increases - hence, the number of hashtags ‘populating’ our semantic networks.
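To make eq. (7) and the comparison between the two notions of core more concrete, the sketch below evaluates the bimodular surprise of a given core-periphery partition by direct summation and computes the Jaccard index between two node sets. It is only an illustration under simplifying assumptions: the network is undirected and small enough for exact integer arithmetic, the partition is passed in as counts rather than searched for, and the function names `bimodular_surprise` and `jaccard` are hypothetical. The minimization over partitions prescribed in [47] is not implemented.

```python
from math import comb


def bimodular_surprise(n_core, n_periphery, l_core, l_periphery, n_links):
    """Evaluate eq. (7) for a given core-periphery partition.

    n_core, n_periphery: number of nodes in the core and in the periphery
    l_core, l_periphery: observed links within the core / the periphery
    n_links:             total number of observed links, L
    """
    n = n_core + n_periphery
    v_tot = n * (n - 1) // 2                  # V: all possible links
    v_core = n_core * (n_core - 1) // 2       # possible links within the core
    v_per = n_periphery * (n_periphery - 1) // 2
    v_between = v_tot - (v_core + v_per)      # possible core-periphery links
    total = 0
    for i in range(l_core, min(v_core, n_links) + 1):
        for j in range(l_periphery, min(v_per, n_links - i) + 1):
            total += (comb(v_core, i) * comb(v_per, j)
                      * comb(v_between, n_links - i - j))
    return total / comb(v_tot, n_links)


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


# Toy values (hypothetical): 3 core nodes forming a triangle, 3 peripheral
# nodes with no links among themselves, 5 links overall.
s = bimodular_surprise(3, 3, l_core=3, l_periphery=0, n_links=5)
overlap = jaccard({"pd", "m5s", "lega"}, {"pd", "m5s", "renzi"})
```

A smaller value of `s` signals a partition that is harder to explain by chance; when no links are required inside either block, the double sum covers every configuration and the function returns 1.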

Let us now focus on the structural features of the filtered projections. Before presenting the results of the analysis, let us briefly recall how the filtering procedure works.

Filtering lets the statistically significant overlaps of hashtags measured on the real system emerge. More in detail, for any couple of hashtags, we count how many users are employing both: then, we consider as a benchmark an ensemble of networks that, on average, preserves some information of the real users-hashtags bipartite network. This information could be the total number of links (as in a bipartite Erdös-Rényi, or bipartite Random Graph, BiRG), the degree sequence of the hashtag layer (Bipartite Partial Configuration Model, BiPCM) or the degree sequence of both layers (Bipartite Configuration Model, BiCM). The more the constraints (i.e. the properties preserved by the ensemble), the more detailed the description provided by the ensemble will be, as compared to the real network, and the fewer links are going to be validated. In this sense, BiCM predictions on the amount of the overlaps will be closer to the ones observed in the real network than the ones of BiPCM or of BiRG. We then expect the number of validated overlaps, i.e. those not compatible with the expectations of the various null models, to be higher for BiRG and BiPCM than for the BiCM. Otherwise stated, the sense of this projection is to detect the overlaps that are too high to be simply explained by the constraints defining the chosen null-model.

Such a procedure has been implemented in previous studies to detect the backbone of the network structure, filtering the real system from random noise, and to highlight non-trivial behaviours in the original system [32, 49].

Footnote: The Jaccard index is a measure of similarity between two sets of elements and is defined as the size of the intersection divided by the size of the union of the two sets: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$.

Figure 7: k-core decomposition of the semantic network for the non-filtered projection of the M5S discursive community on day […] February 2018: on the left plot, five k-shells for each semantic network are represented with different colors while, on the right plot, an expanded view of the innermost k-shell - basically overlapping with the properly defined core identified by the bimodular surprise - is represented. Notice the presence of communities, found by running the Louvain algorithm. These emerge as a consequence of events such as the electoral tour of Alessandro Di Battista (one of the M5S leaders), the presence of politicians in TV shows such as ‘Porta a Porta’, ‘Mezz’ora in più’ and ‘Dalla vostra parte’.
In the present case, the aforementioned filtering procedure will allow us to distinguish extremely viral hashtags from the ones building proper narratives: the former ones, in fact, are just nodes with large degree, a feature that is compatible with (at least one of) the null models considered here, hence filtered out by our procedure; the latter ones, on the other hand, will likely be constituted by groups of hashtags whose non-trivial co-occurrence will survive the filtering procedure.

Let us conclude this brief introduction with an important caveat: in our representation, the hashtags contained in the retweets coincide with the ones of the (original) retweeted message. It may happen that a single viral message contains several hashtags used, however, only once (e.g. as observed during the electoral campaign of some of the candidates, tweeting the names of the towns visited during a specific day). When this happens, the bipartite degree of the hashtags appearing once, but together, is given by the number of times the original message has been retweeted plus the contribution of the original message; hence, the number of times these hashtags co-occur basically coincides with their degree, in turn inducing a high probability for their overlap to be validated - since the null-model ‘expects’ the overlap to be distributed among all hashtags. In other words, the following analysis will not discard the hashtags appearing in viral messages, as they still contribute to the development of a narrative.

As mentioned above and as figs. 9, 10, 11 show, the overall effect of adopting a filtering procedure - irrespective of the details of the employed one - is that of reducing the total volume of the semantic networks obtained. Differences exist, instead, when coming to analyze the mean degree of nodes.
Particularly interesting is the behavior of the semantic networks corresponding to the M5S discursive community, whose mean degree is affected to a much lesser extent by the BiRG-induced filtering than the one characterizing both the CDX and the CSX discursive communities. This, in turn, implies that the information encoded into the total number of (re)tweets of the M5S bipartite user-hashtag network is able to account for the co-occurrences between any two hashtags ‘less effectively’ than for the CDX and the CSX configurations. Equivalently, we may say that the structure of the M5S bipartite user-hashtag network requires less trivial information to be explained and the BiRG (which is the simplest filter) recognizes it as significant.

Figure 8: Evolution of the number of nodes belonging to the core and to the periphery of each discursive community: the core size is nearly constant throughout the whole data-taking period while the periphery size rises as the Election Day approaches (the peak appears on the day after, i.e. the 5th March 2018). This behavior, common to all communities, is compatible with the following explanation: as the Election Day approaches, the number of topics animating the discussion increases - hence, the number of hashtags ‘populating’ our semantic networks.

For what concerns the issue of the topics' persistence, the ranking observed on the non-filtered projection basically coincides with the ranking observed on the filtered ones. Regarding topics' centrality, instead, it has been observed that the filtering procedure with increasingly restrictive benchmarks involves the ‘emergence’ of previously screened hashtags (e.g. […], […] and […], respectively for the semantic networks induced by the CDX, CSX and M5S discursive communities).

Let us now move to discuss the mesoscale structure of the filtered projections: as usual, we will focus on one of the days showing the richest structure, e.g. the […] February 2018.
Filtering the projections by adopting an increasingly restrictive benchmark has the effect of ‘sparsifying’ the projection while letting the less trivial structures emerge. Interestingly, the core portion of the semantic network corresponding to the M5S discursive community survives the most restrictive filtering (i.e. the BiCM-induced one), signalling the presence of a non-trivial bunch of keywords constituting the bulk of the communication in that community (see fig. 10). Moreover, basically all hashtags representing topics of interest of the 2018 Italian electoral campaign persist.

In the following we will describe in more detail the main characteristics of the filtered projections of the various semantic networks.
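The logic of the filtering procedure recalled earlier in this section can be illustrated for the simplest benchmark, the BiRG. The sketch below is a simplified, hypothetical implementation and not the code used for the analysis: it assumes that, under the BiRG, every user-hashtag link exists independently with probability p = L / (N_users · N_hashtags), so that the number of users employing both hashtags of a pair is binomially distributed with success probability p²; the multiple-testing step (Benjamini-Hochberg FDR at level `alpha`) and all names (`validate_birg`, `binomial_sf`, `user_hashtags`) are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))


def validate_birg(user_hashtags, alpha=0.05):
    """Return the hashtag pairs whose co-occurrence is not explained by the BiRG.

    user_hashtags: dict mapping each user to the set of hashtags employed.
    """
    n_users = len(user_hashtags)
    hashtags = sorted(set().union(*user_hashtags.values()))
    n_links = sum(len(tags) for tags in user_hashtags.values())
    p = n_links / (n_users * len(hashtags))   # BiRG link probability
    q = p * p                                 # a user employs both hashtags
    pvalues = {}
    for a, b in combinations(hashtags, 2):
        observed = sum(1 for tags in user_hashtags.values()
                       if a in tags and b in tags)
        pvalues[(a, b)] = binomial_sf(observed, n_users, q)
    # Benjamini-Hochberg: keep the pairs up to the largest rank r
    # such that p_(r) <= alpha * r / m
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, pv) in enumerate(ranked, start=1):
        if pv <= alpha * rank / m:
            cutoff = rank
    return [pair for pair, _ in ranked[:cutoff]]


# Toy data (hypothetical): ten users always pair the same two hashtags,
# thirty users employ one unrelated hashtag each.
data = {f"u{i}": {"x", "y"} for i in range(10)}
data.update({f"u{i}": {f"t{i}"} for i in range(10, 40)})
validated = validate_birg(data)
```

In this toy example only the pair (x, y) is validated. The BiPCM and the BiCM follow the same scheme but with link probabilities that depend on the degrees of one or both layers, so the co-occurrence distribution is no longer a plain binomial.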

The pictorial representation of the semantic network of the center-right alliance relative to the 19th of February can be found in fig. 9.

In the BiCM projection, i.e. the strictest one, few links survive. Actually, it is inappropriate to talk about communities, since we can find only links connecting two otherwise isolated nodes, or small cliques and chains. Nevertheless, even these few hashtags carry important information regarding the keywords used in the election campaign. It is the case of the cluster including […] (stop the invasion), […] (law enforcement agencies), asking for stronger countermeasures to the immigration flows from Northern Africa, perceived as a danger for the security and for Italian cultural identity. On a similar topic, there is a clique composed by […] (PD and idiots) and […] February. Finally, the last clusters present in the BiCM-induced projection are more institutional: the first contains […] (let’s go back to govern), […] and […] (program) and […]

Figure 9: […] February 2018 and of the projection of the same network filtered according to the BiRG, the BiCM and the BiPCM, respectively. The BiCM lets only few hashtags survive, reading […].

The BiPCM projection displays a structure in which the various sub-groups described above are reinforced (for instance, the chain […]. The hashtags […] (life) and […] (family), related to the Italian pro-life movement, can be found in this cluster.

Footnote: […] (What is the weather) on air on the national television service, to promote their campaign. Fazio, while being notoriously a left-wing journalist, has run generally balanced interviews, but he has been accused of being too leftist by the right wing and too accommodating towards right-wing politicians by the left wing. Salvini refused Fazio’s invitation, publicly saying: ‘Fazio mi sta sulle palle’, literally ‘Fazio pisses me off’.

In the BiRG validated projection, the clusters found in the previous stricter projections are merged together to form a network organised in two poles: the first is more ‘institutional’, with keywords related to the election campaign of Forza Italia (the political party of Silvio Berlusconi), including hashtags such as […] (election campaign), […] (united we will win), […] (vote Forza Italia); the second is linked to the other two right-wing parties with both the names of their leaders ([…] (center-right), the peripheries by […]

Overall, the communication strategy of the M5S is peculiar since the users tend to use a quite large number of hashtags about the discussion topics. Those hashtags are nearly the same across the various Twitter messages, since they were copied and pasted from older messages on the same topics. In a sense, Twitter accounts in the M5S community appear to be more coordinated, giving their hashtags more visibility.

Considering the tweets and retweets published on 19th February, the M5S validated semantic networks of fig. 10 display a rich structure, even in the BiCM projection, due to the tweeting behaviour described above. In particular, several clusters can be found, including the names of the opponents ([…] little plum, for Brunetta, representative of ‘Forza Italia’; […] let’s vote them away; […] no, but vote them again, ironically targeting PD supporters; […] this way or PD’s way, advertising the political successes of the M5S in the local administrations).

A few clusters represent some events in the election campaign. For instance, a cluster following the election campaign tour of Di Battista, a representative of the Movement, appears in this projection. Even a clique advertising a live streaming on Facebook can be observed, discussing the management of the public health system in the Lazio region (administrated by the PD), with the hashtags […] (public health system), […] (Naples investigation; give them to the microcredit), […] (refund scandal) refer to this issue. Other clusters target other harsh debates. It is the case of […]

In the case of the CSX community, the number of hashtags used is relatively small. With respect to the other discursive communities, the semantic network of the center-left discursive community validated by the BiCM (see fig. 11) focuses more on political subjects, as the pairs […] (rights) and […] (rainbow; both hashtags refer to the LGBT civil rights, the rainbow flag being one of the most recognizable signs of the movement. The center-left government of the Partito Democratico (PD, that is Democratic Party), in charge during the election campaign, established civil unions for same-sex or opposite-sex couples for the first time in Italy), […] (children) and […] the subject of the Rohingya exodus in Myanmar and the condition of young children). Other clusters are related to instructions for new adults to vote for the first time ([…] first vote; […] how to vote; […] To not make mistakes) and invoking fact checking during the election campaign, with hashtags […] can be found, with the two hashtags […] (come on!) and […] (choose PD) and […] (Monday), […] (good morning) and […] (PD team). The former one refers to an event led by the secretary and Prime Minister candidate Matteo Renzi in Bologna, the latter one appeared in a message promoting a carpet-bombing election campaign, due to the probable uncertainty of the result of the election.

Figure 11: […] February 2018 and of the projection of the same network filtered according to the BiRG, the BiCM and the BiPCM, respectively. The core portion of this network survives the most restrictive filtering (i.e. the BiCM-induced one), indicating that basically all hashtags representing topics of interest of the 2018 Italian electoral campaign persist.

In the BiPCM validated projection more connections appear, further developing the various subjects, as in the case of the candidacy of Paolo Siani mentioned above: […] (Naples), […] (childhood) merge with the previous hashtags […] choose PD; PD team; rights, and so on) and the one related to the candidacy of Paolo Siani. A peripheral clique advertising the event in Venice of Liberi e Uguali, a political party on the left of PD, can be found ([…] Venice).

Summarising, in all naïve projections we observe a rich structure, with a particularly evident core-periphery organisation. Due to the filtering, such structure is progressively disintegrated, depending on the strictness of the benchmark used. While this disintegration is present in all discursive communities, the various groups display a different resilience, the M5S being the strongest one. Actually, the different behaviors carry some information about the strategy followed by the various discursive communities during their political campaign.

The validation procedure proposed in [32] projects the non-trivial co-occurrences of links in the bipartite networks, i.e. those that are not explained by the ingredients of the null-model used for filtering. In this sense, the validated nodes in the projection are not necessarily those with, say, the highest (bipartite) degree, but those which group with other hashtags in the semantic network more than expected by just looking at the original bipartite network. In terms of the interpretation of the phenomenon, the validated projections are saying that the more the validated links, the more hashtags are used to refer to a single subject, against the random superposition of ubiquitous slogans: this seems to be the case of the M5S community (on specific topics, users in this community use a group of hashtags, typically always the same ones, as if they were copied and pasted from message to message, to increase the visibility of the topic) while this is true to a much lesser extent for the CDX discursive community (where the amount of nodes in the BiCM validated projection is extremely limited).

The validation procedure allows us to focus on the least trivial connections, i.e. the links within related topics. In this way, we are able to focus on the relevant information present in our dataset.
It is then possible to observe different themes that shape the political communication of the various discursive communities.

In the CDX, a clear thematic distance is present between the far-right leaders (Matteo Salvini and Giorgia Meloni) and the center-right politicians (Silvio Berlusconi and his party Forza Italia) in terms of topics and electoral slogans promoted by those two poles. While the former insists on security issues related to migration flows from Northern Africa, the latter tends to promote a united center-right alliance. There is an evident semantic diversification with completely different keywords used in the tweets: the former uses more aggressive statements and bad words, while the latter is more reassuring and institutional. The M5S projected semantic networks are especially rich in structure, due to the strong usage of hashtags in this community. Most of them refer to political opponents with nicknames and ironic slogans. A great part of the filtered semantic network is devoted to highlighting the deceitfulness of the M5S opponents. The CSX validated semantic network is less rich than the one of M5S, but richer than the one of CDX. Its major feature is to present mostly the events of the electoral campaign, its candidates at a national and regional level and the weaknesses of its political opponents.

It is worth noticing that the peculiarities of the three discursive filtered semantic networks are present in other days which are not explicitly commented here (e.g. focusing on the specific pieces of news or events of one specific day). […] February, we can still observe two different poles of the debates in the CDX, the one promoted by the supporters of Forza Italia and the one promoted by the supporters of far-right parties. As observed for the […] of February, the two poles use different vocabularies and focus, respectively, on reforming taxation and labour or on the migration issues. Analogously, the M5S displays a cluster of people against the use of vaccines, a few clusters against supposed quid pro quo between PD politicians and businessmen and some others teasing political opponents. Finally, the CSX focuses on the presentation of the election candidates and a few national problems (the increase of inequality and poverty and the decreasing birth-rate). There are also mentions of the demonstration against neo-fascism held in Macerata on this day, involving nearly thirty thousand persons, with different levels of attention, in all three semantic networks. More details on the Twitter discussion about the Macerata shooting can be found in [50].

Footnote: Paolo Siani is a physician, particularly active in providing support, in collaboration with local NGOs, to children of the poor neighborhoods of Naples, at risk of being recruited in Camorra’s criminal activities. His brother was a journalist killed by Camorra.

Social media platforms have dramatically changed the way we approach news consumption: over the last years, in fact, they have become increasingly central during political events, especially electoral campaigns. In this respect, Twitter has been shown to play a major role, thus attracting the attention of scientists from all disciplines.

So far, however, researchers have mainly focused on users' activity, paying little attention to semantic networks, the study of which is particularly relevant to detect online debates, understand their evolution and, ultimately, infer the behavioral rules driving (online but also offline) electoral campaigns.

In this paper, a comprehensive analysis of how political debates are born and grow around specific topics is carried out. Our study concerns the Twitter activity of three different discursive communities (M5S, CDX and CSX) during the weeks before the 2018 Italian Elections.
We have exploited ≃ tweets which have then been used to define networks of statistically significant co-occurrences of hashtags at a daily time scale.

One of the main findings of this paper concerns the way the topological structure of semantic networks "reacts" to the so-called mediated events, i.e. TV debates, the media coverage of offline events, etc. Interestingly enough, the three communities above react differently. The topology of the CDX community is strongly dependent on these events (the mean degree of nodes increases in correspondence of specific TV shows), meaning that this audience is more involved in retweeting while a CDX political actor involved in the electoral campaign appears on TV. The activity of the M5S community, instead, appears to be much more "distributed": although M5S supporters are sensitive to TV shows as well, their retweeting activity is not focused on single events, a phenomenon whose possible explanation lies in the attitude of the supporters of this political party towards social media. Finally, the activity of the CSX community is characterized by a somewhat "intermediate" behaviour: even when mediated events affect the Twitter discussion, the attention of the whole community is "shared" among the various actors constituting the center-left alliance. Interestingly enough, one of the most frequent criticisms of the Italian center-left parties concerns the presence of internal conflicts, a signal that is captured by our analysis.

Particularly insightful is the analysis of our semantic networks at the mesoscale: what emerges is the presence of a core of topics, i.e. a densely-connected bulk of hashtags surrounded by a periphery of loosely inter-connected (sub-)topics. This indicates that daily semantic networks are characterized by few relevant hashtags to which other, less relevant ones attach.
This structure is maintained even as the Election Day approaches: the main difference, in fact, seems to be constituted by the larger number of peripheral themes entering the discussion. The resilience of the core-periphery structure is not the same for the various discursive communities. In the context of semantic networks, the fact that the system is more or less resilient to the filtering implies that the various political groups have developed their political narratives differently, focusing their communications on few related terms per subject or mentioning a set of omnipresent hashtags in all messages. Even in the response to the filtering procedure, M5S and CDX represent the two extremes, displaying respectively the most and the least resilient semantic network; the CSX stays in between.

These differences are the effect of the various styles used in writing posts. As far as the M5S is concerned, when targeting a specific theme, several hashtags are used, which are subsequently reused by other users writing on the same subject, in order to make the keywords and slogans more recognizable and visible; moreover, the M5S still mentions the opponents (even using teasing nicknames), but focuses more on the episodes of misgovernment of the rivals. The CSX, instead, is more intent on presenting its team, even if it still criticises its opponents.

As far as the CDX is concerned, the number of hashtags per message is more limited, focusing just on some viral ones; moreover, the CDX shows a diversified communication strategy, due to the different approaches of the various parties in the alliance: right-wing politicians are more aggressive towards opponents, while center-right ones tend to focus on keywords that unify the coalition.

In the near future, we plan to extend this study by considering not only the presence of hashtags in the textual information of tweets but also of other keywords.
As already noticed, in fact, the percentage of tweets in which at least one hashtag is present amounts to just ≃ : additional information about the discussion on the 2018 Italian Elections can, in fact, be retrieved and employed for analysing our semantic networks in greater detail.

Figure 11: Mesoscale structure of (from bottom-right, clockwise) the non-filtered projection of the semantic network corresponding to the CSX discursive community on th February 2018 and of the projection of the same network filtered according to the BiRG, the BiCM and the BiPCM, respectively. The core portion of this network just partially survives the most restrictive filtering (i.e. the BiCM-induced one), while it is present in the less strict filterings (the BiPCM- and the BiRG-induced ones), representing a structure in between the stronger persistence of the M5S semantic network of fig. 10 and the CDX one depicted in fig. 9.

E.P., F.S., T.R. and T.S. outlined the research question, interpreted the results and contributed equally to the writing and reviewing of the manuscript. F.S., T.R. and T.S. provided the analysis tools and performed the analysis.

Additional information

Competing interests: The authors declare no competing interests. T.R. is responsible for submitting on behalf of all authors of the paper.

A Defining a similarity measure

A sequence-based similarity quantifies the cost of transforming a string $x$ into a string $y$ when the two strings are viewed as sequences of characters. String transformation is defined by three elementary operations: 1) deleting a character, 2) inserting a character and 3) substituting one character with another [51]. The edit distance function $d(x, y)$ aims at capturing the mistakes of human editing, such as inserting extra characters or swapping any two characters. To merge only strings that are either misspelled or differ by number (i.e. singular in place of plural and vice versa), we have set the threshold for the maximum number of allowed differences between any two strings to 2.

B Projecting and validating bipartite networks
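The merging rule of Appendix A can be illustrated with a minimal Python sketch. It assumes the plain Levenshtein distance (deletions, insertions and substitutions, each at unit cost) and applies the threshold of 2 stated above; the function names and the example hashtags are hypothetical and are not taken from the dataset.

```python
def edit_distance(x: str, y: str) -> int:
    """Levenshtein distance: minimum number of single-character
    deletions, insertions and substitutions turning x into y."""
    previous = list(range(len(y) + 1))  # distance from "" to each prefix of y
    for i, cx in enumerate(x, start=1):
        current = [i]  # distance from x[:i] to ""
        for j, cy in enumerate(y, start=1):
            current.append(min(
                previous[j] + 1,               # delete cx
                current[j - 1] + 1,            # insert cy
                previous[j - 1] + (cx != cy),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


MAX_DIFFERENCES = 2  # threshold stated in the text


def should_merge(a: str, b: str) -> bool:
    """True if the two strings differ by at most MAX_DIFFERENCES edits."""
    return edit_distance(a, b) <= MAX_DIFFERENCES


# Made-up hashtags: a singular/plural variant is merged, unrelated tags are not.
same = should_merge("elezioni2018", "elezione2018")
different = should_merge("elezioni2018", "macerata")
```

Note that, under this plain definition, swapping two adjacent characters counts as two substitutions, which still falls within the threshold.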

As anticipated in the main text, the idea behind a filtered projection is that of linking any two nodes belonging to the same layer if found to be sufficiently similar. The steps to implement such a procedure are described below.

Quantifying nodes similarity.

First, a measure quantifying the similarity between nodes is needed. Given any two nodes (say, $\alpha$ and $\beta$) we follow [32] and count the total number of common neighbors $V^*_{\alpha\beta}$, i.e.

$$ V^*_{\alpha\beta} = \sum_{j=1}^{N_\top} m_{j\alpha} m_{j\beta} = \sum_{j=1}^{N_\top} V^j_{\alpha\beta} \tag{8} $$

the value of $V^j_{\alpha\beta}$ being 1 if nodes $\alpha$ and $\beta$ share the node $j$ as a common neighbor and 0 otherwise. Notice that the non-filtered projection of a bipartite network corresponds to a monopartite network (say, $A$) whose generic entry reads $a_{\alpha\beta} = \Theta[V^*_{\alpha\beta}]$ (i.e. an edge is present in correspondence of any non-zero value of $V^*_{\alpha\beta}$).

Quantifying the statistical significance of nodes similarity.
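The quantity whose significance is assessed is the observed count $V^*_{\alpha\beta}$ of eq. (8). As a concrete illustration, the sketch below counts, for every pair of hashtags, the users who employed both, and then builds the non-filtered projection. The input format (a dictionary mapping each user to the set of hashtags used) and all names are assumptions made for the example only.

```python
from collections import Counter
from itertools import combinations


def common_neighbours(hashtags_by_user):
    """Return V*[(a, b)]: the number of users that used both hashtags a and b.

    Each user contributes 1 to every pair of distinct hashtags it is linked to,
    which is the sum over j of m_ja * m_jb in eq. (8).
    """
    counts = Counter()
    for hashtags in hashtags_by_user.values():
        for a, b in combinations(sorted(set(hashtags)), 2):
            counts[(a, b)] += 1
    return counts


def unfiltered_projection(counts):
    """Edges of the non-filtered projection: a pair is linked iff V* > 0."""
    return {pair for pair, v in counts.items() if v > 0}


# Toy bipartite network (made-up users and hashtags).
toy = {
    "user1": {"#a", "#b"},
    "user2": {"#a", "#b", "#c"},
    "user3": {"#c"},
}
v_star = common_neighbours(toy)
edges = unfiltered_projection(v_star)
```

Pairs are stored once, with the two hashtags in alphabetical order, since $V^*_{\alpha\beta}$ is symmetric.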

The statistical significance of the similarity of any two nodes is quantified with respect to a set of null models, which will now be derived from first principles. To this aim, let us consider the maximization of Shannon entropy

$$ S = -\sum_{G \in \mathcal{G}} P(G) \ln P(G) \tag{9} $$

over the set of all possible bipartite graphs with, respectively, $N_\top$ nodes on one layer (say, users) and $N_\perp$ nodes on the other (say, hashtags). Since entropy maximization will be carried out in a constrained framework, let us discuss each set of constraints separately.

Bipartite Configuration Model.

The Bipartite Configuration Model (BiCM) represents the bipartite variant of the Configuration Model (CM). Upon introducing the Lagrangian multipliers $\boldsymbol{\theta}$ and $\boldsymbol{\eta}$ to enforce the proper constraints (i.e. the ensemble average of the degrees of users and hashtags, respectively $h^*_i = \sum_\alpha m_{i\alpha}, \forall i$ and $k^*_\alpha = \sum_i m_{i\alpha}, \forall \alpha$) and $\psi$ to enforce the normalization of the probability, the recipe prescribes to maximize the function

$$ \mathcal{L} = S - \psi \left[ 1 - \sum_{G \in \mathcal{G}} P(G) \right] - \sum_{i=1}^{N_\top} \theta_i \left[ h^*_i - \langle h_i \rangle \right] - \sum_{\alpha=1}^{N_\perp} \eta_\alpha \left[ k^*_\alpha - \langle k_\alpha \rangle \right] \tag{10} $$

with respect to $P(G)$. This leads to

$$ P(G | \boldsymbol{\theta}, \boldsymbol{\eta}) = \frac{e^{-H(G)}}{Z} = \prod_{i=1}^{N_\top} \prod_{\alpha=1}^{N_\perp} \left( \frac{x_i y_\alpha}{1 + x_i y_\alpha} \right)^{m_{i\alpha}} \left( \frac{1}{1 + x_i y_\alpha} \right)^{1 - m_{i\alpha}} = \prod_{i=1}^{N_\top} \prod_{\alpha=1}^{N_\perp} p_{i\alpha}^{m_{i\alpha}} (1 - p_{i\alpha})^{1 - m_{i\alpha}} \tag{11} $$

where $x_i \equiv e^{-\theta_i}$ and $y_\alpha \equiv e^{-\eta_\alpha}$. The quantity $p_{i\alpha} = \frac{x_i y_\alpha}{1 + x_i y_\alpha}$ can be interpreted as the probability that a link connecting nodes $i$ and $\alpha$ is there; the matrix of probability coefficients $\{p_{i\alpha}\}$ induces the expected values $\langle h_i \rangle = \sum_\alpha p_{i\alpha}, \forall i$ and $\langle k_\alpha \rangle = \sum_i p_{i\alpha}, \forall \alpha$ and can be numerically determined by solving the set of $N_\top + N_\perp$ equations $\langle h_i \rangle = h^*_i, \forall i$ and $\langle k_\alpha \rangle = k^*_\alpha, \forall \alpha$.

According to the BiCM, the presence of each $V^j_{\alpha\beta}$ can be described as the outcome of a Bernoulli trial:

$$ f_{\text{Ber}}(V^j_{\alpha\beta} = 1) = p_{j\alpha} p_{j\beta}, \tag{12} $$

$$ f_{\text{Ber}}(V^j_{\alpha\beta} = 0) = 1 - p_{j\alpha} p_{j\beta}. \tag{13} $$

The independence of links implies that each $V_{\alpha\beta}$ is the sum of independent Bernoulli trials, each one characterized by a different probability. The behavior of such a random variable is described by a Probability Mass Function (PMF) called Poisson-Binomial.

Bipartite Partial Configuration Model.

The BiCM constrains the degrees of both the users and the hashtags. Such a model can be ‘relaxed’ by limiting ourselves to constrain the degrees of the nodes belonging to the layer of interest - in this case, the degrees of the hashtags. Upon ‘switching off’ the user-specific constraints, one ends up with a simplified version of the BiCM, characterized by a generic probability coefficient reading $p_{i\alpha} = \frac{k^*_\alpha}{N_\top}$, in turn leading to the expression $f_{\text{Ber}}(V^j_{\alpha\beta} = 1) = \frac{k^*_\alpha k^*_\beta}{N_\top^2}$. The evidence that the latter expression does not depend on $j$ simplifies the description of the random variable $V_{\alpha\beta}$, now obeying a PMF called Binomial, i.e.

$$ f_{\text{BiPCM}}(V_{\alpha\beta} = n) = \binom{N_\top}{n} \left( \frac{k^*_\alpha k^*_\beta}{N_\top^2} \right)^n \left( 1 - \frac{k^*_\alpha k^*_\beta}{N_\top^2} \right)^{N_\top - n}. \tag{14} $$

Bipartite Random Graph Model.

The BiRG (Bipartite Random Graph) model is the bipartite variant of the traditional Random Graph Model. As for its monopartite counterpart, the probability that any two nodes are linked is equal for all the nodes and reads $p_{i\alpha} = \frac{L}{N_\top N_\perp} \equiv p_{\text{BiRG}}$ (where $L$ is the empirical number of ‘bipartite’ edges). In this case, we have $f_{\text{Ber}}(V^j_{\alpha\beta} = 1) = p^2_{\text{BiRG}}$ and the PMF describing the behavior of $V_{\alpha\beta}$ is a Binomial, i.e.

$$ f_{\text{BiRG}}(V_{\alpha\beta} = n) = \binom{N_\top}{n} \left( p^2_{\text{BiRG}} \right)^n \left( 1 - p^2_{\text{BiRG}} \right)^{N_\top - n}. \tag{15} $$

Validating the monopartite projection.
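Each of the null models above provides a distribution for $V_{\alpha\beta}$, and what enters the validation is its upper tail. The sketch below, written under the assumption that the link probabilities are already known, evaluates a Poisson-Binomial PMF by dynamic programming (the BiCM case) and shows that it reduces to the Binomial of the BiRG when all trials share the same success probability. Function names and the toy numbers are illustrative only.

```python
from math import comb


def poisson_binomial_pmf(probs):
    """PMF of a sum of independent Bernoulli trials with success
    probabilities `probs` (dynamic programming, quadratic in len(probs))."""
    pmf = [1.0]  # with zero trials, the sum is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for n, mass in enumerate(pmf):
            nxt[n] += mass * (1.0 - p)  # this trial fails
            nxt[n + 1] += mass * p      # this trial succeeds
        pmf = nxt
    return pmf


def binomial_pmf(n_trials, q):
    """PMF of a Binomial with n_trials trials and success probability q."""
    return [comb(n_trials, n) * q**n * (1.0 - q) ** (n_trials - n)
            for n in range(n_trials + 1)]


def upper_tail(pmf, observed):
    """Probability of a value greater than, or equal to, `observed`."""
    return sum(pmf[observed:])


# Toy numbers (assumed): 5 users, 4 hashtags, 8 bipartite links.
n_users, n_hashtags, n_links = 5, 4, 8
p_birg = n_links / (n_users * n_hashtags)
q = p_birg**2  # a V-motif needs both links, hence the product of probabilities

pmf_birg = binomial_pmf(n_users, q)
pmf_general = poisson_binomial_pmf([q] * n_users)  # same model, generic routine
p_value = upper_tail(pmf_birg, 2)  # tail for an observed count of 2
```

For the BiCM, the list passed to `poisson_binomial_pmf` would contain one product $p_{j\alpha} p_{j\beta}$ per user $j$, as in eq. (12).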

The statistical significance of the similarity of nodes $\alpha$ and $\beta$, thus, amounts to computing a p-value on one of the aforementioned probability distributions, i.e. the probability of observing a number of V-motifs greater than, or equal to, the observed one:

$$ \text{p-value}(V^*_{\alpha\beta}) = \sum_{V_{\alpha\beta} \geq V^*_{\alpha\beta}} f(V_{\alpha\beta}). \tag{16} $$

After this procedure is repeated for each pair of nodes, an $N_\perp \times N_\perp$ matrix of p-values is obtained. The choice of which p-values to retain has to undergo a validation procedure for testing multiple hypotheses at the same time: here, the False Discovery Rate (FDR) procedure is used. The $m$ p-values (in our case, $m = N_\perp (N_\perp - 1)/2$) are, first, sorted in increasing order, $\text{p-value}_1 \leq \dots \leq \text{p-value}_m$ and, then, the largest integer $\hat{i}$ satisfying the condition

$$ \text{p-value}_{\hat{i}} \leq \frac{\hat{i}\, t}{m} \tag{17} $$

(where $t$ represents the single-test significance level - in our case, set to 0.05) is identified. All p-values that are less than, or equal to, $\text{p-value}_{\hat{i}}$ are kept, i.e. all node pairs corresponding to those p-values will be linked in the resulting monopartite projection.

C Analysing a network mesoscale structure
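The FDR selection of eq. (17), which closes the validation procedure of Appendix B, can be sketched in a few lines of Python. The sketch assumes that the input dictionary contains one p-value for each of the $m$ node pairs; the function name and the toy p-values are hypothetical.

```python
def fdr_validate(p_values, t=0.05):
    """Return the node pairs whose p-values pass the FDR threshold.

    p_values: dict mapping a node pair to its p-value (all m pairs included).
    t: single-test significance level.
    """
    m = len(p_values)
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    i_hat = 0
    for i, (_, p) in enumerate(ranked, start=1):
        if p <= i * t / m:
            i_hat = i  # remember the largest rank satisfying eq. (17)
    # Every pair ranked at or below i_hat is kept, even if its own
    # p-value exceeds its own rank-specific threshold.
    return {pair for pair, _ in ranked[:i_hat]}


# Toy p-values for the six pairs of four hashtags (made-up numbers).
toy_p_values = {
    ("a", "b"): 0.001,
    ("a", "c"): 0.020,
    ("b", "c"): 0.024,
    ("a", "d"): 0.500,
    ("b", "d"): 0.200,
    ("c", "d"): 0.900,
}
validated = fdr_validate(toy_p_values)
```

In the toy example the second-ranked p-value (0.020) exceeds its own threshold $2t/m$, but it is retained because the third-ranked one satisfies eq. (17): this is the difference between the FDR procedure and a rank-by-rank cut.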

Community detection: the Louvain algorithm

After the daily monopartite user networks have been obtained, the Louvain algorithm [52] has been run to detect the presence of communities. This algorithm works by searching for the partition attaining the maximum value of the modularity function $Q$, i.e.

$$ Q = \frac{1}{2L} \sum_{i,j} \left[ a_{ij} - \frac{k_i k_j}{2L} \right] \delta_{c_i, c_j} \tag{18} $$

a score function measuring the optimality of a given partition by comparing the empirical pattern of interconnections with the one predicted by a properly-defined benchmark model. In the expression above, $a_{ij}$ is the generic entry of the network adjacency matrix $A$, the factor $\frac{k_i k_j}{2L}$ is the probability that nodes $i$ and $j$ establish a connection according to the Chung-Lu model, $\mathbf{c}$ is the $N$-dimensional vector encoding the information carried by a given partition (the $i$-th component, $c_i$, denotes the module to which node $i$ is assigned) and the Kronecker delta $\delta_{c_i, c_j}$ ensures that only the nodes within the same module contribute to the sum. The normalization factor $2L$ guarantees that $-1 \leq Q(\mathbf{c}) \leq 1$. Moreover, a reshuffling procedure has been applied to overcome the dependence of the original algorithm on the order of the nodes taken as input.

Core-periphery detection
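For the community-detection step, the modularity of eq. (18) can be evaluated for any given partition with the short sketch below. It only computes $Q$ and does not implement the Louvain optimization itself; it assumes an undirected, unweighted network without self-loops, given as a list of edges, and all names are illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of eq. (18) for a given partition.

    edges: iterable of (i, j) pairs, each undirected edge listed once.
    community: dict mapping every node to its module label.
    """
    edges = list(edges)
    n_links = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # number of links inside each module
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        if community[i] == community[j]:
            internal[community[i]] += 1
    module_degree = defaultdict(int)  # sum of the degrees in each module
    for node, k in degree.items():
        module_degree[community[node]] += k
    # Grouping the double sum of eq. (18) by module gives, for module c,
    # internal links / L minus (total degree / 2L) squared.
    return sum(internal[c] / n_links - (module_degree[c] / (2 * n_links)) ** 2
               for c in module_degree)


# Toy network: two triangles joined by a single link.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {node: "A" for node in range(6)}
q_split = modularity(toy_edges, two_modules)
q_whole = modularity(toy_edges, one_module)
```

Assigning all nodes to a single module gives $Q = 0$, while separating the two triangles gives a positive value, which is what a modularity-maximizing algorithm would prefer.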

Core-periphery detection can be carried out upon adopting the method proposed in [47], which prescribes to search for the network partition minimizing the quantity called bimodular surprise, i.e.

$$ S_\parallel = \sum_{i \geq l^*_\bullet} \sum_{j \geq l^*_\circ} \frac{\binom{V_\bullet}{i} \binom{V_\circ}{j} \binom{V - (V_\bullet + V_\circ)}{L - (i + j)}}{\binom{V}{L}}; \tag{19} $$

as anticipated in the main text, $L$ is the total number of links, while $V$ is the total number of possible links, i.e. $V = N(N-1)/2$. The quantities marked with $\bullet$ ($\circ$) refer to the corresponding core (periphery) quantities, i.e. $V_\bullet$ is the total number of possible core links, $V_\circ$ is the total number of possible periphery links, $l^*_\bullet$ is the number of observed links within the core and $l^*_\circ$ is the number of observed links within the periphery.

From a technical point of view, $S_\parallel$ is the p-value of a multivariate hypergeometric distribution, describing the probability of $i + j$ successes in $L$ draws (without replacement), from a finite population of size $V$ that contains exactly $V_\bullet$ objects with a first specific feature and $V_\circ$ objects with a second specific feature, wherein each draw is either a ‘success’ or a ‘failure’: analogously to the univariate case, $i + j \in [l^*_\bullet + l^*_\circ, \min\{L, V_\bullet + V_\circ\}]$. The method outputs the most statistically significant core-periphery structure compatible with the network under analysis.

D Computing the polarization of non-verified users

Let $C_c$, with $c = 1, 2, 3$, indicate the set of (both verified and non-verified) users belonging to community $c$ and $N_\alpha$, with $\alpha = 1, 2, 3$, the set of neighbours of verified users belonging to the community $c = \alpha$. A non-verified user polarization is defined as

$$ \rho_\alpha = \max_c \{ I_{\alpha c} \} \tag{20} $$

where

$$ I_{\alpha c} = \frac{|C_c \cap N_\alpha|}{|N_\alpha|}. \tag{21} $$

As it has been shown in [42], the polarization index reveals how unbalanced the distribution of interactions between non-verified users and verified users is: non-verified accounts basically focus their retweeting activity on the tweets of verified users within the same community, thus providing a clear indication of the community of which a non-verified user is likely to be a member.

References

[1] Eurobarometer. Standard eurobarometer 88 “Media Use in the European Union” report. European Commission - Public Opinion, 2017. (accessed September 30, 2018).
[2] K. E. Matsa and E. Shearer. News use across social media platforms 2017. Pew Research Center, 2018. (accessed September 30, 2018).
[3] A. L. Schmidt, F. Zollo, M. Del Vicario, A. Bessi, A. Scala, G. Caldarelli, H. E. Stanley, and W. Quattrociocchi. Anatomy of news consumption on Facebook. Proceedings of the National Academy of Sciences, 114(12):3035–3039, 2017.
[4] E. Pavan. The integrative power of online collective action networks beyond protest. exploring social media use in the process of institutionalization. Social Movement Studies, 16(4):433–446, 2017.
[5] M. Del Vicario, A. Scala, G. Caldarelli, H. E. Stanley, and W. Quattrociocchi. Modeling confirmation bias and polarization. Scientific Reports, 7:40391, January 2017.
[6] A. L. Schmidt, F. Zollo, A. Scala, C. Betsch, and W. Quattrociocchi. Polarization of the vaccination debate on facebook. Vaccine, 36(25):3606–3612, 2018.
[7] F. Zollo, A. Bessi, M. Del Vicario, A. Scala, G. Caldarelli, L. Shekhtman, S. Havlin, and W. Quattrociocchi. Debunking in a world of tribes. PLOS ONE, 12(7):1–27, 07 2017.
[8] A. J. Morales, J. Borondo, J. C. Losada, and R. M. Benito. Measuring political polarization: Twitter shows the two sides of venezuela. Chaos: An Interdisciplinary Journal of Nonlinear Science, 25(3):033114, 2015.
[9] D. Cherepnalkoski and I. Mozetič. Retweet networks of the european parliament: evaluation of the community structure. Applied Network Science, 1(1):2, Jun 2016.
[10] S. Pei, L. Muchnik, J. S. Andrade, Jr., Z. Zheng, and H. A. Makse. Searching for superspreaders of information in real-world social media. Scientific Reports, 4:5547, July 2014.
[11] Carolina Becatti, Guido Caldarelli, Renaud Lambiotte, and Fabio Saracco. Extracting significant signal of news consumption from social networks: the case of Twitter in Italian political elections. Palgrave Commun., 2019.
[12] Guido Caldarelli, Rocco De Nicola, Fabio Del Vigna, Marinella Petrocchi, and Fabio Saracco. The role of bot squads in the political propaganda on Twitter. Commun. Phys., 3(1):1–15, dec 2020.
[13] S. González-Bailón, J. Borge-Holthoefer, and Y. Moreno. Broadcasters and hidden influentials in online protest diffusion. American Behavioral Scientist, 57(7):943–965, 2013.
[14] M. Castells. Network theory | a network theory of power. International Journal of Communication, 5(0), 2011.
[15] C. Padovani and E. Pavan. Global governance and icts: exploring online governance networks around gender and media. Global Networks, 16(3):350–371, 2016.
[16] M. T. Bastos and D. Mercea. Serial activists: Political twitter beyond influentials and the twittertariat. New Media & Society, 18(10):2359–2378, 2016.
[17] E. Pariser. The Filter Bubble: What the Internet Is Hiding from You. Penguin Group, The, 2011.
[18] Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, and Maurizio Tesconi. Fame for sale: Efficient detection of fake Twitter followers. Decis. Support Syst., 2015.
[19] E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini. The rise of social bots. Commun. ACM, 59(7):96–104, June 2016.
[20] Stefano Cresci, Marinella Petrocchi, Angelo Spognardi, and Stefano Tognazzi. From Reaction to Proaction: Unexplored Ways to the Detection of Evolving Spambots. In Web Conf. 2018 - Companion World Wide Web Conf. WWW 2018, 2018.
[21] Stefano Cresci, Marinella Petrocchi, Angelo Spognardi, and Stefano Tognazzi. On the capability of evolved spambots to evade detection via genetic engineering. Online Soc. Networks Media, 2019.
[22] B. Latour. Reassembling the Social: An Introduction to the Actor-Network Theory. Oxford University Press, 2005.
[23] T. Gillespie, P. J. Boczkowski, and K. A. Foot. Materiality and Media in Communication and Technology Studies: An Unfinished Project, pages 21–51. 2013.
[24] T. R. Keller and U. Klinger. Social bots in election campaigns: Theoretical, empirical, and methodological implications. Political Communication, 36(1):171–189, 2019.
[25] T. R. Keller, V. Hase, J. Thaker, D. Mahl, and M. S. Schäfer. News media coverage of climate change in india 1997–2016: Using automated content analysis to assess themes and topics. Environmental Communication, 14(2):219–235, 2020.
[26] H. K. Evans, S. Smith, A. Gonzales, and K. Strouse. Mudslinging on twitter during the 2014 election. Social Media + Society, 3(2):2056305117704408, 2017.
[27] N. Gaumont, M. Panahi, and D. Chavalarias. Reconstruction of the socio-semantic dynamics of political activist twitter networks - method and application to the 2017 french presidential election. PLOS ONE, 13(9):1–38, 09 2018.
[28] F. Giglietto, L. Iannelli, L. Rossi, A. Valeriani, N. Righetti, F. Carabini, G. Marino, S. Usai, and E. Zurovac. Mapping italian news media political coverage in the lead-up of 2018 general election. SSRN Electronic Journal, 2018.
[29] Y. Xiong, M. Cho, and B. Boatwright. Hashtag activism and message frames among social movement organizations: Semantic network analysis and thematic analysis of twitter during the Public Relations Review, 45(1):10–23, 2019.
[30] Fabio Celli and Luca Rossi. Long chains or stable communities? the role of emotional stability in twitter conversations. Computational Intelligence, 31(1):184–200, 2015.
[31] Fabio Giglietto and Yenn Lee. To be or not to be charlie: Twitter hashtags as a discourse and counter-discourse in the aftermath of the 2015 charlie hebdo shooting in france. In Proceedings of the 5th Workshop on Making Sense of Microposts co-located with the 24th International World Wide Web Conference, pages 33–37, 2015.
[32] F. Saracco, M. J. Straka, R. Di Clemente, A. Gabrielli, G. Caldarelli, and T. Squartini. Inferring monopartite projections of bipartite networks: an entropy-based approach. New Journal of Physics, 19(5):053022, may 2017.
[33] G. Bobba, R. Bracciale, C. Cepernich, A. Chiaramonte, R. D’Alimonte, V. Emanuele, S. Panebianco, A. Paparo, A. Pedrazzani, L. Pinto, F. Roncarolo, E. Salvati, P. Segatti, M. Vercesi, and F. Zucchini. Who’s the winner? an analysis of the 2018 italian general election. Italian Political Science, 1(13), 2018.
[34] A. Bovet and H. A. Makse. Influence of fake news in twitter during the 2016 us presidential election. Nature Communications, 10, 2019.
[35] M. Del Vicario, F. Zollo, G. Caldarelli, A. Scala, and W. Quattrociocchi. Mapping social dynamics on facebook: The brexit debate. Social Networks, 50:6–16, 2017.
[36] A. Jungherr. Twitter use in election campaigns: A systematic literature review. Journal of Information Technology & Politics, 13:72–91, 03 2016.
[37] R. Marchetti and D. Ceccobelli. Twitter and television in a hybrid media system. Journalism Practice, 10(5):626–644, 2016.
[38] M. D. Conover, B. Goncalves, J. Ratkiewicz, A. Flammini, and F. Menczer. Predicting the political alignment of twitter users. In , pages 192–199, Oct 2011.
[39] T. A. Small. What the hashtag? Information, Communication & Society, 14(6):872–895, 2011.
[40] L. Hong, G. Convertino, and E. Chi. Language matters in twitter: A large scale study, 2011.
[41] W. H. Gomaa and A. A. Fahmy. A survey of text similarity approaches. International Journal of Computer Applications, 68(13):13–18, April 2013. Full text available.
[42] C. Becatti, G. Caldarelli, R. Lambiotte, and F. Saracco. Extracting significant signal of news consumption from social networks: the case of Twitter in Italian political elections. Palgrave Commun., 2019.
[43] Y. Lin, B. Keegan, D. Margolin, and D. Lazer. Rising tides or rising stars?: Dynamics of shared attention on twitter during media events. PLOS ONE, 9(5):1–12, 05 2014.
[44] D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, New York, NY, USA, 2010.
[45] Y. Kong, G. Shi, R. Wu, and Y. Zhang. k-core: Theories and applications. Physics Reports, 832:1–32, 2019.
[46] M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, and H. A. Makse. Identification of influential spreaders in complex networks. Nature Physics, 6(11):888–893, Aug 2010.
[47] J. v. L. de Jeude, G. Caldarelli, and T. Squartini. Detecting core-periphery structures by surprise. Europhysics Letters, 125(6):68001, 2019.
[48] I. Malvestio, A. Cardillo, and N. Masuda. Interplay between k-core and community structure in complex networks, 2020.
[49] M. J. Straka, G. Caldarelli, and F. Saracco. Grand canonical validation of the bipartite international trade network. Phys. Rev. E, 96(2), 2017.
[50] A. Rapini E. Pavan. Antifascism retweeted. semantic networks around Annual Conference of the Association of Italian Political Communication (AssoComPol), 2019.
[51] V. I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys., Dokl. 10 (1965), 707-710 (1966); translation from Dokl. Akad. Nauk SSSR 163, 845-848 (1965).
[52] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks.