Local Variation of Collective Attention in Hashtag Spike Trains
Ceyda Sanli and
Renaud Lambiotte
CompleXity and Networks, naXys, Department of Mathematics, University of Namur, 5000 Namur, Belgium
Abstract
In this paper, we propose a methodology for quantifying temporal patterns of nonlinear hashtag time series. Our approach is based on an analogy between neuron spikes and hashtag diffusion. We adopt the local variation, originally developed to analyze local time delays in neuron spike trains. We show that the local variation successfully characterizes nonlinear features of hashtag spike trains such as burstiness and regularity. We apply this understanding to an extreme social event and are able to observe the temporal evolution of the online collective attention of Twitter users to that event.
Introduction
Hashtag diffusion in the Twitter social network is nonlinear in time. Pairwise or higher-order temporal correlations, bursts, and regular patterns are observed in data analysis. The distribution of time delays between two successive hashtag activities follows a power-law scaling with fat tails (Domenico et al. 2013), in contrast to the exponential distribution expected for an independent Poisson process. One proposed reason is that earlier hashtags influence later ones, such that past hashtags can both cooperate and compete with present hashtags (Myers and Leskovec 2012; Coscia 2013). Further factors driving time-dependent hashtag propagation are the heterogeneity of individual online user behavior at the micro scale, self-organized cascades (Cheng et al. 2014) due to unequal selection (Ratkiewicz et al. 2010; Weng et al. 2012; Gleeson et al. 2014; Coscia 2013; Cetin and Bingol 2014; Gleeson et al. 2015) in the hashtag pool at the macro scale, and the underlying cyclic rhythm of tweeting habits (Myers and Leskovec 2014; Franca et al. 2014; Mollgaard and Mathiesen 2015; Sanli and Lambiotte 2015). Despite this highly nonlinear nature, tools for characterizing hashtag time series, beyond obtaining distribution functions, have not yet been considered in detail.

Extreme social events such as elections and protests (Borge-Holthoefer et al. 2011; Gonzalez-Bailon et al. 2011), announcements of scientific innovations (Domenico et al. 2013), and panic events such as crises (Kenett et al. 2014) and earthquakes (Sasahara et al. 2013) artificially deform the Twitter network and encourage a massive amount of hashtag activity in a short time window, as shown in Figure 1. The resulting emergent online behavior has been studied both empirically (Yang and Leskovec 2011; Mollgaard and Mathiesen 2015) and theoretically (Mollgaard and Mathiesen 2015), and distinct temporal properties of collective attention have been quantified. These properties are important for predicting such extreme but rare social events (Kenett et al. 2014; Miotto and Altmann 2014).

Figure 1: Hashtag spike trains of ledebat on different days covering extreme social events: the debate of the French presidential election-2012 held on May 2, the election held on May 6, and a regular day between them, e.g. May 4. The upper row shows the dynamics on the debate day. Collective attention during the debate generates a tremendous amount of activity on the hashtag, so we observe a continuous series, in contrast to the distinct spikes before 4 pm. The middle row is for a regular day after the debate, followed by the spike train on the election day in the bottom row. A decay in the activity on the hashtag ledebat is visible from top to bottom, and the process suggests highly nonlinear characteristics on each day.

Our main motivation is to establish a systematic methodology to distinguish real noisy hashtag signals from independent random signals and to extract temporal patterns from the real signals. We apply an approach called the local variation $L_V$, originally introduced to analyze noisy neuron spike trains and to detrend for salient dynamics of neurons (Shinomoto, Shima, and Tanji 2003; Miura, Okada, and ichi Amari 2006; Omi and Shinomoto 2011). After establishing the usefulness of $L_V$ in semantic analysis, which was done extensively in our recent work (Sanli and Lambiotte 2015), we present a promising study on the evolution of collective attention by applying $L_V$ to a political election. A remarkable difference in $L_V$ during rush periods suggests that local nonlinear features could predict extreme social events.

Data Set
Data Collection
The data were collected via the publicly open Twitter API. A narrow time window, between April 30, 2012 and May 10, 2012, was chosen on purpose to cover two social events: the political debate of the French presidential election-2012 held on May 2 and the election day held on May 6. Having 10 days of data helps us visualize the activity on regular days, both between and after these extreme events, and compare the differences in hashtag dynamics. During this period, all tweeting activity is considered, but only for users located in France, in order to avoid time differences between countries and regions and other potential social events held in the same period. The time resolution is 1 second and no language selection is applied.

We examine 295,697 unique hashtags out of 2,942,239 tweets that include at least one hashtag, which is [...] of all tweets. 228,525 online users, almost half of the total online users, are associated with hashtag diffusion. The network in this period contains hashtags directly related to the debate, the election, and the two candidates for the presidency of France, Francois Hollande and Nicolas Sarkozy. Ranking them by the number of appearances (frequency), or equivalently popularity $p$, from the most popular to the least, we have ledebat (180946), hollande (143636), sarkozy (116906), votehollande (99908), avecsarkozy (67549), ledebat (66668) [in French], france2012 (20635), presidentielle (13799), and many others with smaller $p$. The numbers in parentheses give the corresponding $p$. These popular hashtags are at the top of the pool, e.g. [...] of all hashtags.

Real Hashtag Spike Trains
The diffusion of a single hashtag in time can be represented as a spike train, as shown in Figure 1. Each spike indicates that the corresponding hashtag is used at that time, without specifying how or by which user. With a resolution of 1 second, multiple events occurring within the same second cannot be distinguished, and therefore in this situation only one appearance is counted. We construct spike trains for all hashtags observed in the data, ordered from the earliest appearance time to the latest, e.g. $\ldots, \tau_{i-1}, \tau_i, \tau_{i+1}, \ldots$. Each hashtag has a unique number of (exact) appearances, its popularity $p$.

Randomized Hashtag Spike Trains
To compare the real dynamics with an artificial and independent one, a randomized version of the real hashtag spike trains is constructed to serve as a null model. First, all spikes coming from all hashtags are combined, giving a single merged hashtag spike train. The uniform spike appearance, one spike per spike time, remains valid. Child randomized hashtag spike trains are obtained by uniformly drawing, without repetition, $p$ spike times from the matrix $T$ of the spike times of the merged train, where $p$ is the number of spikes of the real train we compare with. We apply randperm($T$, $p$) in Matlab and obtain $p$ uniformly distributed, unique, and independent random spike times, e.g. $\ldots, \tau^r_{i-1}, \tau^r_i, \tau^r_{i+1}, \ldots$.

Local Variation
The local variation $L_V$, specifically introduced to quantify nonlinear neural time series and to uncover temporal patterns in neuron spike trains, is defined at spike time $\tau_i$ as (Omi and Shinomoto 2011)

$$L_V = \frac{3}{N-2} \sum_{i=2}^{N-1} \left( \frac{\Delta\tau_{i+1} - \Delta\tau_i}{\Delta\tau_{i+1} + \Delta\tau_i} \right)^2, \qquad (1)$$

where $N$ is the number of spikes in the train, $\Delta\tau_{i+1} = \tau_{i+1} - \tau_i$, and $\Delta\tau_i = \tau_i - \tau_{i-1}$. $\Delta\tau_{i+1}$ quantifies the forward delay and $\Delta\tau_i$ represents the backward waiting time. Importantly, the denominator normalizes the quantity so as to account for local variations of the rate at which events take place.

By definition, $L_V$ takes values in the interval [0:3]. Furthermore, it has been derived that $L_V$ is on average equal to 1, $\langle L_V \rangle = 1$, if the underlying process is an independent Poisson process, for which the distribution of the inter-spike intervals is exponential (Shinomoto, Shima, and Tanji 2003). Here, the brackets denote the average taken over the given distribution. All other situations can be generalized by Gamma processes (Shinomoto, Shima, and Tanji 2003; Miura, Okada, and ichi Amari 2006), and $\langle L_V \rangle$ should then be significantly different from 1. For instance, $\langle L_V \rangle \approx 3$ if the hashtag spike trains are extremely bursty (irregular), whereas $\langle L_V \rangle \approx 0$ when the trains present regular (homogeneous) temporal patterns (Sanli and Lambiotte 2015).

Figure 2 shows the results of our $L_V$ analysis for both real and randomized hashtag spike trains. The probability distribution $P(L_V)$ of the calculated values of $L_V$ on the two data sets, with hashtags grouped by popularity $p$, presents distinct behavior. Whereas $\langle L_V \rangle = 1$ for all groups of $p$ for the randomized trains, suggesting Poisson processes, $\langle L_V \rangle$ never equals 1 for the real trains.
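As an illustration of Eq. (1) and of the null model, the following Python sketch computes $L_V$ for a sorted train of distinct spike times and draws a randomized train from a merged train. It is a minimal sketch rather than the code used for the analysis: the function names `local_variation` and `randomized_train` are hypothetical, and NumPy's `Generator.choice` with `replace=False` is assumed here as a stand-in for the Matlab randperm call.

```python
import numpy as np


def local_variation(spike_times):
    """Local variation L_V of Eq. (1) for strictly increasing spike times."""
    tau = np.asarray(spike_times, dtype=float)
    if tau.size < 3:
        raise ValueError("at least 3 spikes are needed")
    dt = np.diff(tau)                      # inter-spike intervals
    if np.any(dt <= 0):
        raise ValueError("spike times must be strictly increasing")
    backward, forward = dt[:-1], dt[1:]    # Delta tau_i and Delta tau_{i+1}
    ratio = (forward - backward) / (forward + backward)
    return 3.0 / (tau.size - 2) * np.sum(ratio ** 2)


def randomized_train(merged_times, p, rng):
    """Null model (assumed form): p distinct spike times drawn uniformly
    from the merged train, returned in increasing order."""
    sample = rng.choice(np.asarray(merged_times), size=p, replace=False)
    return np.sort(sample)


rng = np.random.default_rng(0)
periodic = np.arange(0, 100, 5)                         # perfectly regular train
poisson = np.cumsum(rng.exponential(1.0, size=20000))   # exponential intervals
print(local_variation(periodic))   # 0.0 for a perfectly regular train
print(local_variation(poisson))    # close to 1 for a Poisson process
```

A perfectly periodic train gives exactly 0, while a long train with exponentially distributed inter-spike intervals gives a value close to 1, in line with the Poisson expectation $\langle L_V \rangle = 1$.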
The randomization removes the nonlinearity of the real trains, i.e. temporal correlations, burstiness, and regularity in the series, and constructs statistically stationary and independent, yet time-dependent, events. We therefore characterize time-dependent Poisson processes in Figure 2(b), where $P(L_V)$ fluctuates around 1. However, all nonlinearities are present in the real data, and so in $P(L_V)$. While popular hashtags (high $p$; red and orange symbols) show regular patterns, the trains become bursty (irregular) due to local temporal correlations for moderate (yellow and green symbols) and low popularity (blue and purple symbols). The trend is captured in the shift of the peak positions of $P(L_V)$ from small $L_V$ to large $L_V$ with decreasing $p$ in Figure 2(a). Consequently, we find that $L_V$ is a successful tool, not only for neurons but also for hashtags, to characterize salient dynamics in nonlinear social time series.

[Figure 2 panels: (a) Real activity, (b) Random activity; axes: $P(L_V)$ versus $L_V$; legend entries for the popularity groups: 91127, 18553, 1678, 318, 174, 117, 86, 68, 56, 47, 41, 35.]

Figure 2: Probability density function of the local variation $L_V$, $P(L_V)$, of hashtag spike trains (Sanli and Lambiotte 2015). (a) Real hashtag spike trains. We observe a clear shift of the peak positions toward higher values of $L_V$ with decreasing hashtag popularity $p$, which indicates that the process becomes bursty (irregular). For any $p$, the mean value never equals 1, so none of the real signals is a Poisson process. (b) Randomized hashtag spike trains. Independent of $p$, all curves fluctuate around 1, as expected for temporally independent signals. For better visualization, the results are grouped by ranking $p$ from the most popular to the least popular: high $p$, red and orange symbols; moderate $p$, yellow and green symbols; and low $p$, blue and purple symbols.

Empirical Application: Collective Attention
We now use $L_V$ for more practical purposes and ask: Can $L_V$ predict extreme social events? The investigation presented below is far from a complete understanding. However, we are able to capture the temporal evolution of online emergent behavior resulting from the collective attention of tweeting on the French presidential election-2012, in the first week of May 2012.

We specifically compare hashtag diffusion on extreme days, the debate day (May 2) and the election day (May 6), with the dynamics on a regular day between these events, e.g. May 4. Instead of considering all hashtags in the pool, as done in the previous section, we concentrate on topic-related hashtags such as ledebat (180946), hollande (143636), sarkozy (116906), votehollande (99908), avecsarkozy (67549), and ledebat (66668) [in French]. The numbers in parentheses indicate $p$ of the corresponding hashtag.

The local variation $L_V$ is obtained for these topic-oriented hashtag spike trains. The trains are constructed separately for the three days. $L_V$ for each train and for each day is calculated in time windows of duration 1 hour. Figure 3 presents the results for the debate (left), regular (middle), and election (right) days. The top row [Figure 3(a)] shows $L_V(t)$ for each day at hourly resolution. The bottom row [Figure 3(b)] summarizes the tweeting activity as the number of tweets including the hashtags listed in the legend versus time, again at hourly resolution.

Rush hours in online communication during the debate and the announcement of the election result are highlighted by the shaded yellow rectangle and the yellow vertical line, respectively. Significant decays in $L_V(t)$ on both the debate and election days, synchronizing perfectly with the peaks of the counts, indicate regular activity of the online users in the discussion of the election and so describe no burstiness, $L_V(t) \approx 0$.
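The hourly evaluation of $L_V(t)$ described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce Figure 3: spike times are assumed to be given in seconds since midnight, the name `hourly_local_variation` is hypothetical, and windows with fewer than three spikes are skipped because Eq. (1) needs at least two consecutive inter-spike intervals, a choice the text does not specify.

```python
import numpy as np


def local_variation(tau):
    """L_V of Eq. (1) for a strictly increasing array with at least 3 spikes."""
    dt = np.diff(tau)                      # inter-spike intervals
    backward, forward = dt[:-1], dt[1:]
    ratio = (forward - backward) / (forward + backward)
    return 3.0 / (tau.size - 2) * np.sum(ratio ** 2)


def hourly_local_variation(spike_times, window=3600):
    """Return {window index: L_V} for every window holding >= 3 spikes.

    spike_times: spike times in seconds since midnight (assumed unit).
    window: window length in seconds (1 hour by default).
    """
    # np.unique sorts and keeps one spike per time stamp, mirroring the
    # one-spike-per-second convention of the spike trains.
    tau = np.unique(np.asarray(spike_times, dtype=float))
    index = (tau // window).astype(int)
    result = {}
    for h in np.unique(index):
        chunk = tau[index == h]
        if chunk.size >= 3:                # assumption: skip sparse windows
            result[int(h)] = local_variation(chunk)
    return result


# Toy day: one spike per minute in hour 0, alternating gaps in hour 1,
# and only two spikes in hour 2 (skipped).
toy = list(range(0, 3600, 60)) + [3600, 3601, 3604, 3605, 3608] + [7200, 7300]
print(hourly_local_variation(toy))
```

Evaluating each window independently discards the intervals that straddle a window boundary; a sliding window would be an alternative design, at the cost of correlated successive values.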
The decay in $L_V(t)$ is not observed at all on the regular day, where mainly the cyclic rhythm of the Twitter network (Sanli and Lambiotte 2015) characterizes the values of $L_V(t)$. While large fluctuations are present in the inactive hours [0 am:6 am], for the rest of the day $\langle L_V(t) \rangle \approx 1$, suggesting time-dependent Poisson processes. These results are preliminary but promising, since the stages of collective attention are clearly visible in $L_V(t)$.

Discussion and Future Work
The main purpose of this paper is to establish a tool for noisy social time series and to uncover nonstationary features and temporal patterns, specifically in an online emergent limit. Our comparative test on the real and randomized data sets shows that the local variation $L_V$, a metric introduced to quantify the fluctuations of neuron spike trains as compared to a local characteristic time, works successfully for hashtag spike trains as well. This encourages us to develop further tools, for instance to predict extreme online events by evaluating the early noisy signal prior to an extreme event. As an example, we consider the week of the French presidential election-2012. This narrow time window is well suited to our aim, and we find that $L_V$ is sensitive enough to distinguish the collective attention period, in which users are active homogeneously in time, from the preceding period where temporal heterogeneity is present. A prediction could therefore be achieved by obtaining better statistics on the decay of $L_V(t)$.

[Figure 3 panels, left to right: debate day, regular day, election day; rows: (a) $L_V(t)$ versus hour, (b) tweet count/hour versus hour; annotations: debate at 7-11 pm, announcement of the result at 7 pm.]

Figure 3: Characterizing the temporal evolution of collective attention. From left to right, the debate day (May 2, 2012), a regular day (May 4, 2012), and the election day (May 6, 2012) are shown. (a) The local variation $L_V(t)$ of the topic-related hashtags about the debate and the election. The shaded yellow rectangle covers the debate hours and the yellow line indicates the announcement of the election result. Significant decays in $L_V(t)$, in the left and right panels, match well with the schedule of the events. However, no remarkable trend is observed on the regular day (middle panel). (b) Number of tweets per hour including at least one of the hashtags listed in the legends. The increase in activity coincides with the decays in $L_V(t)$, indicating that collective attention homogenizes the hashtag propagation, so that the hashtag spike trains in this limit present temporal regularity.

We obtain that $L_V(t)$ is almost 0 in rush periods. Such artificial regularity originates from our assumption, which is due to the lack of time resolution below 1 second. Although we observe heterogeneity in hashtag spike trains in rush hours in the empirical data, making the spike appearance uniform by setting it to 1 at any spike time creates unnatural homogeneity in the emergent limit. To resolve this, the trains should be constructed so as to preserve the heterogeneity in the data, and so $L_V$ must be redefined for a nonuniform number of spikes at different spike times in a train.

Acknowledgments
C. Sanli acknowledges support from the EU 7th Framework OptimizR Project and FNRS. This paper presents research results of the Belgian Network DYSCO, funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State, Science Policy Office.
References
Borge-Holthoefer, J.; Rivero, A.; Garcia, I.; Cauhe, E.; Ferrer, A.; Ferrer, D.; Francos, D.; Iniguez, D.; Perez, M. P.; Ruiz, G.; Sanz, F.; Serrano, F.; Vinas, C.; Tarancon, A.; and Moreno, Y. 2011. Structural and dynamical patterns on online social networks: The Spanish May 15th movement as a case study. PLoS ONE.
Cetin and Bingol. 2014. Phys. Rev. E.
Cheng et al. 2014. In Proceedings of the 23rd International Conference on World Wide Web, WWW '14, 925–936. New York, NY, USA: ACM.
Coscia, M. 2013. Competition and success in the meme pool: A case study on quickmeme.com. In International AAAI Conference on Weblogs and Social Media (ICWSM).
Domenico, M. D.; Lima, A.; Mougel, P.; and Musolesi, M. 2013. The anatomy of a scientific rumor. Sci. Rep.
Franca et al. 2014. ArXiv e-prints.
Gleeson, J. P.; Ward, J. A.; O'Sullivan, K. P.; and Lee, W. T. 2014. Competition-induced criticality in a model of meme popularity. Phys. Rev. Lett.
Gleeson et al. 2015. ArXiv e-prints.
Gonzalez-Bailon, S.; Borge-Holthoefer, J.; Rivero, A.; and Moreno, Y. 2011. The dynamics of protest recruitment through an online network. Sci. Rep.
Kenett et al. 2014. PLoS ONE.
Miotto and Altmann. 2014. PLoS ONE.
Miura, Okada, and ichi Amari. 2006. Neural Comput.
Mollgaard and Mathiesen. 2015. ArXiv e-prints.
Myers, S., and Leskovec, J. 2012. Clash of the contagions: Cooperation and competition in information diffusion. In Data Mining (ICDM), 2012 IEEE 12th International Conference on, 539–548.
Myers, S. A., and Leskovec, J. 2014. The bursty dynamics of the Twitter information network. In Proceedings of the 23rd International Conference on World Wide Web, WWW '14, 913–924. New York, NY, USA: ACM.
Omi, T., and Shinomoto, S. 2011. Optimizing time histograms for non-Poissonian spike trains. Neural Comput.
Ratkiewicz et al. 2010. Phys. Rev. Lett.
Sanli and Lambiotte. 2015. ArXiv e-prints.
Sasahara, K.; Hirata, Y.; Toyoda, M.; Kitsuregawa, M.; and Aihara, K. 2013. Quantifying collective attention from tweet stream. PLoS ONE.
Shinomoto, Shima, and Tanji. 2003. Neural Comput.
Weng et al. 2012. Sci. Rep.