Branching process descriptions of information cascades on Twitter

Abstract

A detailed analysis of Twitter-based information cascades is performed, and it is demonstrated that branching process hypotheses are approximately satisfied. Using a branching process framework, models of agent-to-agent transmission are compared to conclude that a limited attention model better reproduces the relevant characteristics of the data than the more common independent cascade model. Existing and new analytical results for branching processes are shown to match well to the important statistical characteristics of the empirical information cascades, thus demonstrating the power of branching process descriptions for understanding social information spreading.


arXiv [physics.soc-ph]

James P Gleeson, Tomokatsu Onaga, Peter Fennell, James Cotter, Raymond Burke, David J. P. O’Sullivan

MACSI, Department of Mathematics and Statistics, University of Limerick, Ireland.
Insight Centre for Data Analytics, University of Limerick, Ireland.
Confirm Centre for Smart Manufacturing, University of Limerick, Ireland.
The Frontier Research Institute for Interdisciplinary Sciences & Graduate School of Information Sciences, Tohoku University, Japan.
USC/ISI, 4676 Admiralty Way, Marina Del Rey, Los Angeles, California 90292, U.S.A.

13 July 2020


The transmission of information via online social networks is increasingly ubiquitous. The volume of freely-available data offers unprecedented opportunities for data-driven mathematical modelling of human behaviour. Twitter, for example, is a directed social network wherein users “follow” other users in order to receive their broadcast transmissions, called “tweets”. All tweets are public, making analysis of Twitter data particularly popular among data scientists. Twitter users may retweet messages they receive from the users they follow, and in this way cascades of information may stem from a single tweet event that we call the “seed”. In the search for mathematical models to describe such structures, branching processes [1,2] are an appealing option. As stochastic processes, they can potentially capture the wide variability observed in tweeting patterns and human behaviour, while offering a wealth of theoretical results that can be tested against data from online social networks.

Branching processes have already been applied in several studies of Twitter and other online fora. The recent review by Aragón et al. [3] surveys models of discussion threads, including Twitter reply cascades. Several of the generative models cited in [3] are based on branching processes, but most find it necessary to modify classical branching processes with some novel features in order to match to data. For example, Nishi et al. [4] studied reply cascades in Twitter (as distinct from the retweet cascades that we examine here) that were seeded by celebrities and found that these could not be fitted by classical Galton-Watson processes, so they introduced a modified version of the branching process. On the other hand, Galton-Watson processes (albeit with special seed offspring distributions) were successfully applied to discussion trees from Reddit by Medvedev et al. [5], and time-dependent continuous-time branching processes were fitted to a viral marketing campaign in [6, 7].
Golub and Jackson [8] reanalysed data from [9] to show that although standard branching processes did not appear to reproduce the features of the email cascades studied in [9], when selection bias was added (to model the fact that large, viral chains are more likely to be observed than small chains) the biased Galton-Watson process fitted quite well.

Although branching processes have been fitted to data to form the basis for simulation and prediction in several studies, the application of analytical results from branching process theory has been mostly limited to a small selection of features. The most common [3] is the cascade size, i.e., the accumulated number of tweets or replies to a single seeding post, for which the well-known Galton-Watson result for the expected total number of progeny has been used for prediction, e.g., [10]. However, determining the entire distribution of cascade sizes, not just its mean, is quite feasible [11, 12], as are analytical (or semi-analytical) methods for calculating the length and depth of cascade trees, as well as other measures. One such measure that we examine in Sec. 4.3 is the “structural virality” of a cascade, as introduced by Goel et al. [13]. Using this and other measures of cascade trees, Goel et al. performed large-scale numerical simulations of a simple transmission model on networks to fit to data from Twitter. The transmission model of [13] is a discrete-time version of the susceptible-infected-recovered disease-spread model, also known as the “independent cascade model” (ICM) [14]. Other network-based simulation models [15, 16] use variations of such dynamics (such as susceptible-infected-susceptible disease-spread models [17]) to understand the effects of network structure upon spreading.

In this paper we focus on three aspects of branching process models for Twitter retweet cascades, using a reanalysis of two previously-studied datasets [18, 19].
First, in Section 2, we extract the empirical offspring distributions from the tree structures and show that these remain approximately stable across a range of generations. This is a necessary condition for classical branching processes to provide accurate models for detailed features of cascade trees, and the simplicity of this result contrasts with models where explicit time-decay of novelty [20–22] or generation-dependent branching numbers [23] are required.

Secondly, we consider in Section 3 how the structure of the underlying social network and the modelling of user-to-user transmission mechanisms can affect the offspring distribution for cascade trees. By comparing with the empirical results of Sec. 2, we examine whether the offspring distribution is better modelled by the independent cascade model, or by an alternative model that accounts for limited attention of users of social media [24].

Finally, in Section 4, we use the branching process framework to derive predictions for features of cascades, focusing on both the distribution of metrics of interest across the entire dataset and on analytical results for expected values. For completeness, we first derive the well-known results for cascade sizes and durations and then build on this approach to derive results for the expected value of tree depths, the structural virality measure of Goel et al. [13], and apparent novelty decay factors [20, 21]. These results include integral expressions for expected structural virality and tree depth that we have not been able to locate in existing literature, and which are amenable to asymptotic analysis. We conclude the paper with a discussion of the results, limitations, and potential extensions in Section 5.

Data

As a motivation and test for branching process hypotheses, we reanalyse two independent Twitter datasets. Both datasets have been previously analysed and described in Refs. [18] and [19, 25], but here we use the identified cascade structures to focus on the accuracy of branching process models.

The first dataset, which we call “Marref”, comprises tweets related to the 2015 Irish same-sex marriage referendum, collected between May 8 and May 23, 2015. As described in [18], all tweets containing either of the hashtags

In this section we analyse the characteristics of the trees extracted from the data as described in Sec. 2.1. For each dataset, we consider an ensemble of $M$ trees, with each tree made up of particles (or nodes) in multiple generations, see Fig. 1. We define $Z_{m,n}$ to be the number of particles in generation $n$ of tree $m$ (where $m = 1, 2, \ldots, M$ and $n = 0, 1, 2, \ldots$). The individual trees have very heterogeneous characteristics (size, number of generations, etc.), so we first consider the ensemble as a whole.

Figure 1: Schematic of the ensemble of trees, indicating the $Z_{m,n}$ values for the first two trees in the ensemble.

Defining $z_n$ as the total number of generation-$n$ particles observed across all trees, i.e.,

$$z_n = \sum_{m=1}^{M} Z_{m,n}, \tag{1}$$

we plot in the top panels of Fig. 2 the dependence of $z_n$ on the generation $n$ using log-linear scales. Figure 2(a) is from the Marref dataset ($M = 7,$…) and Fig. 2(b) is from the URL dataset ($M = 39,$…). The exponential dependence of $z_n$ upon $n$ is shown by the nearly linear shape of the function on log-linear axes; such a dependence is consistent with a subcritical branching process. Note that small-number fluctuations occur when $z_n$ is relatively small: we choose 10 as a threshold level (shown by the black dashed line in Figs. 2(a) and (b)) and focus on $z_n$ values which are above this threshold.

In Fig. 2(c) and Fig. 2(d) (for Marref and URL, respectively) we show the effective branching number [23]

$$\xi_n = \frac{z_{n+1}}{z_n}, \tag{2}$$

which gives the average number of children particles for a particle of the $n$th generation. Observe that $\xi_n$ is approximately constant for the range of generations in which $z_n$ is sufficiently large (i.e., above the threshold marked in Figs. 2(a) and (b)). Figure 2(d) shows that the early generations of the URL dataset exhibit a lot of fluctuations in the $\xi_n$ values, consistent with the possible biasing of the data towards larger trees (see Sec. 2.1).
In both cases, the branching number $\xi_0$ of the seed generation appears to be anomalously high; this is partly due to the biasing introduced by the fact that no trees of size less than two are recorded (but see also the discussion leading to Eq. (10) below). The dashed green lines in Fig. 2 highlight the range of generations over which the branching number $\xi_n$ appears to be approximately constant. For each dataset we calculate an average value $\bar\xi$ of the branching number over the range shown by the dashed green line. The URL dataset, with a value $\bar\xi = 0.90$, has a high virality (recall the critical branching number of 1 separates the regime of subcritical cascades from that of supercritical cascades), while the Marref dataset has lower virality ($\bar\xi = 0.$…).

Figure 2: Number of nodes (top panels) and effective branching number (bottom panels) in the data, as defined by Eqs. (1) and (2). Panels (a) Marref and (b) URL plot the number of nodes in each generation against generation; panels (c) Marref and (d) URL plot the branching number. Here, and in most subsequent figures, the left panels ((a) and (c)) show results for the Marref dataset while the right panels ((b) and (d)) are the results from the URL dataset.

Next, we make a stronger test of the branching process hypothesis, by examining the empirical offspring distribution at each generation. For each particle $i$ in generation $n$ we record the number $Z^{(i)}_{n+1}$ of its offspring particles, i.e., the number of users in generation $n+1$ that are identified as children of particle $i$. Gathering the ensemble of $Z^{(i)}_{n+1}$ values across all trees, we calculate the empirical offspring distribution of generation $n$ as

$$\bar q_{\ell,n} = \mathrm{Prob}\left(Z^{(i)}_{n+1} = \ell \,\middle|\, \text{particle } i \text{ in generation } n\right), \tag{3}$$

i.e., $\bar q_{\ell,n}$ is the probability that a particle in generation $n$ spawns $\ell$ children particles in generation $n+1$, and we have used the fact that the maximum-likelihood estimate of the probability of having $\ell$ children is given by the fraction of nodes in the data with $\ell$ children [8].

In Figs. 3(a) and (b) we plot the empirical offspring distributions for several generations. Because of the data collection restriction to cascades of size exceeding one (and also because of the network structure, see Sec. 3 below), the seed generation offspring distribution $\bar q_{\ell,0}$ differs substantially from the other generations. However, for the Marref data set (Fig. 3(a)), observe that the $\bar q_{\ell,n}$ distributions for $n = 1$ through $n = 4$ (which is the range of generations giving $z_n$ values above threshold in Fig. 2(a)) are very similar to each other: the curves in Fig. 3(a) are almost indistinguishable. This collapse of the empirical offspring distributions is consistent with a branching process model in which the offspring distributions are identical for all generations with $n \geq 1$, see Sec. 3.

In the URL data set, the low-generation distributions $\bar q_{\ell,n}$ do not show as clean a collapse as seen in the Marref case, see the inset of Fig. 3(b). However, this may be due to the selection bias in the data collection, which means that small trees (those with fewer generations) are likely to have been omitted from the collected set of trees. Larger trees are more likely to be properly represented in the dataset, and these trees are also likely to consist of a large number of generations. Accordingly, we plot also the $\bar q_{\ell,n}$ curves for $n = 10, 15, 20, 25, 30, 35, 40$ in Fig. 3(b) (note the range of generations chosen matches the green dashed line in Fig. 2(b) and (d)), and we observe a good collapse of these distributions, which is again consistent with a branching process model.

From the evidence of Figs. 2, 3(a) and 3(b), we conclude that a branching process model may give a good approximation to the heterogeneous cascades represented by the trees extracted from the data sets. In the next section we will derive a mathematical model that explicitly links the network structure and various hypotheses on the information-spreading mechanism to predict offspring distributions, which we then compare with the empirical results of Fig. 3(a) and 3(b).
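As an illustration of Eqs. (1)–(3), the following minimal sketch computes the generation counts $z_n$, the effective branching numbers $\xi_n$, and the empirical offspring distributions $\bar q_{\ell,n}$ for a toy ensemble. The input format (each tree given as a list of parent indices, with −1 marking the seed and parents listed before their children) and the function name `generation_stats` are assumptions made for this example; they are not the format of the datasets analysed here.

```python
from collections import Counter, defaultdict

def generation_stats(trees):
    """Return (z, xi, q) for an ensemble of trees.

    Each tree is a list `parents`, where parents[i] is the index of node i's
    parent and -1 marks the seed (assumed format; parents precede children).
    """
    z = Counter()                    # z[n]: particles in generation n, Eq. (1)
    counts = defaultdict(Counter)    # counts[n][l]: generation-n particles with l children
    for parents in trees:
        gen = []
        n_children = [0] * len(parents)
        for p in parents:
            if p < 0:
                gen.append(0)                 # the seed is generation 0
            else:
                gen.append(gen[p] + 1)        # one generation after its parent
                n_children[p] += 1
        for g, c in zip(gen, n_children):
            z[g] += 1
            counts[g][c] += 1
    xi = {n: z[n + 1] / z[n] for n in z}      # effective branching number, Eq. (2)
    q = {n: {l: c / z[n] for l, c in counts[n].items()} for n in counts}  # Eq. (3)
    return dict(z), xi, q

# Toy ensemble of two trees (hypothetical data).
trees = [[-1, 0, 0, 1], [-1, 0]]
z, xi, q = generation_stats(trees)
print(z, xi, q)
```

With real data, the same counts restricted to generations where $z_n$ exceeds the chosen threshold would give the averaged branching number $\bar\xi$.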

We consider a directed network whose structure is minimally described (in the configuration-model sense [29]) by the joint distribution $p_{jk}$ of nodes’ in-degree $j$ and out-degree $k$: in other words, $p_{jk}$ is the probability that a randomly chosen node has $j$ friends and $k$ followers. We model the dynamics of information spreading at the level of $(j,k)$ classes also, defining the vulnerability $v_{jk}$ as the probability that a $(j,k)$-class node will retweet a message that it has received from one of its $j$ friends [30].

(Footnote: Following [19, 25], we call the nodes followed by node $i$ (the in-neighbours of $i$) its friends. The followers of $i$ are the out-neighbours of $i$, where we consider the direction of the edges to be the direction of information flow, i.e., edges point from a node to its followers.)

Figure 3: Empirical offspring distributions. (a) Offspring distributions for generations 0 (black symbols) through to 4 (coloured symbols) of the Marref dataset. The magenta curve is the LAM prediction; the black curve is the ICM prediction. (b) Offspring distributions for generations 10 through to 40 in steps of 5 (with generations 0 to 5 in inset) from the URL dataset, with LAM and ICM theory curves in magenta and black, respectively. (c) CCDF of the offspring distribution for Marref, plotted against offspring+1; blue symbols show the averaged empirical distribution (averaged over generations 1 through 4), curves are LAM (magenta) and ICM (black) predictions, with the fitted distribution of Eq. (17) in red. (d) As panel (c), but for the URL dataset, with the empirical distribution averaged over generations 10 through 40.

Consider a message that is tweeted by a node to its followers. Under the configuration-model assumption, the probability that a follower is in the $(j,k)$ class is given by $\frac{j}{\langle j\rangle} p_{jk}$, where $\langle j\rangle = \sum_{j,k} j\, p_{jk}$ is the mean in-degree (mean number of friends) over the network. This follower will retweet the message if he is vulnerable, which occurs with probability $v_{jk}$, and in doing so, he will expose all $k$ of his followers to the message. Thus, the probability that a randomly-chosen follower will retweet a message he receives is given by

$$\rho = \sum_{j,k} \frac{j}{\langle j\rangle}\, p_{jk}\, v_{jk}. \tag{4}$$

If we know that a follower has retweeted the message (i.e., if we condition on retweeting) then the probability that he is in the $(j,k)$ class is

$$\frac{1}{\rho}\, \frac{j}{\langle j\rangle}\, p_{jk}\, v_{jk}. \tag{5}$$

In particular, the probability that a retweeter has $k$ followers is given by summing over all possible $j$ values:

$$\sum_{j} \frac{1}{\rho}\, \frac{j}{\langle j\rangle}\, p_{jk}\, v_{jk}. \tag{6}$$

Assuming each of the $k$ followers to be independently vulnerable with probability $\rho$, the number $\ell$ of followers who themselves retweet has the binomial distribution

$$\binom{k}{\ell} \rho^{\ell} (1-\rho)^{k-\ell}. \tag{7}$$

Combining these probabilities, we have derived the offspring distribution $q_\ell$, which gives the probability that a retweeting by a node will lead to $\ell$ further retweets by followers of that node, as

$$q_\ell = \sum_{k} \underbrace{\sum_{j} \frac{1}{\rho}\, \frac{j}{\langle j\rangle}\, p_{jk}\, v_{jk}}_{\text{Prob. } k \text{ followers, conditioned on retweeting}} \; \underbrace{\binom{k}{\ell} \rho^{\ell} (1-\rho)^{k-\ell}}_{\text{Prob. } \ell \text{ of } k \text{ followers retweet}}. \tag{8}$$

The corresponding probability generating function (pgf) for the offspring distribution, $f(x) = \sum_\ell q_\ell x^\ell$, is

$$f(x) = \sum_{k,j} \frac{1}{\rho}\, \frac{j}{\langle j\rangle}\, p_{jk}\, v_{jk}\, (1 - \rho + \rho x)^{k}. \tag{9}$$

In the derivation of Eq. (9), we began by considering a node that receives the message from one of its friends. However, the initial source (or seed) of the cascade has a different dynamic, meaning that the seed generation of the branching process has an offspring distribution different from Eq. (8). We assume that the seed node for a cascade is chosen uniformly at random from all the nodes. This means that the seed node is in the $(j,k)$ class with probability $p_{jk}$. As above, the number $\ell$ of its $k$ followers who will retweet the message is given by Eq. (7), and so the seed-generation offspring distribution is

$$\tilde q_\ell = \sum_{k,j} p_{jk} \binom{k}{\ell} \rho^{\ell} (1-\rho)^{k-\ell}, \tag{10}$$

with corresponding pgf

$$\tilde f(x) = \sum_{\ell=0}^{\infty} \tilde q_\ell\, x^\ell = \sum_{k,j} p_{jk}\, (1 - \rho + \rho x)^{k}. \tag{11}$$

(Footnote: We use tildes to differentiate the seed-generation offspring distribution and pgf from those defined in Eqs. (8) and (9).)

Having expressed the offspring distributions in terms of the network structure (via $p_{jk}$) and the dynamics (via the vulnerability $v_{jk}$), we next examine two possible models for contagion dynamics.

In the independent cascade model (ICM) [14] each “infected” node (i.e., a node who tweets or retweets the message of interest) gets one attempt to infect each of its out-neighbours; the infection attempt is successful (meaning that the follower also retweets the message) with probability $C$, where $C$ is the single parameter of the model. In our modelling framework, this implies that the ICM vulnerability of every node is equal to $C$, regardless of the node’s $(j,k)$ class:

$$v^{\mathrm{ICM}}_{jk} = C \quad \forall\, j, k. \tag{12}$$

Note that in this case, the retweet probability $\rho$ is determined from Eq. (4) to be $\rho = C$. Moreover, in the special case of uncorrelated in- and out-degrees (i.e., if the number of friends $j$ and the number of followers $k$ of a node are uncorrelated), the joint distribution $p_{jk}$ factorises into the product $p^{\mathrm{in}}_j p^{\mathrm{out}}_k$ and the offspring distributions $q_\ell$ and $\tilde q_\ell$ are identical. However, in the more realistic case where the in- and out-degrees of nodes (the numbers of friends and followers of users) are correlated (see, for example, Fig. 2 of [31]), the offspring distribution of the seed generation differs from the offspring distributions of subsequent generations.

A number of researchers have pointed out that the limitations of human cognition impose an effective limit on how much information can be absorbed and shared by an individual. For a user on Twitter, having a larger number of friends $j$ leads to a faster influx of information into the user’s stream, with a consequent dividing of attention among the many tweets.
Empirical analyses [19,24] and models of information-sharing dynamics [11,12,32] both indicate that the probability that a user retweets a particular piece of information she has received can be modelled as being approximately inversely proportional to the number $j$ of her friends. In our notation, the vulnerability $v_{jk}$ of a $(j,k)$-class user in the limited attention model (LAM) is inversely proportional to $j$:

$$v^{\mathrm{LAM}}_{jk} = \frac{B}{j}, \tag{13}$$

where $B$ is a parameter of the model, and we assume no nodes have $j = 0$.

In the LAM, the probability $\rho$ of a random follower retweeting is given in terms of $B$ by Eq. (4):

$$\rho = \frac{B}{\langle j\rangle}. \tag{14}$$

Interestingly, under the assumption that the network has no nodes with $j = 0$, the LAM offspring distributions for the seed generation and for later generations are identical, even if the in- and out-degrees of nodes are correlated (unlike in the ICM):

$$f^{\mathrm{LAM}}(x) = \tilde f^{\mathrm{LAM}}(x) = \sum_{k,j} p_{jk}\, (1 - \rho + \rho x)^{k} \quad \text{if } p^{\mathrm{in}}_0 = 0. \tag{15}$$

(Footnote to the ICM case with uncorrelated degrees: the equality of $q_\ell$ and $\tilde q_\ell$ is easily seen from the corresponding pgfs, where the sum over $j$ can be performed in Eqs. (9) and (11) to give $f^{\mathrm{ICM}}(x) = \tilde f^{\mathrm{ICM}}(x) = \sum_k p^{\mathrm{out}}_k (1 - C + Cx)^k$ if $p_{jk} = p^{\mathrm{in}}_j p^{\mathrm{out}}_k$.)

3.3 Comparing ICM and LAM with empirical offspring distributions

Using the empirical network structure for the Marref and URL datasets, specifically the in-degree $j_i$ and out-degree $k_i$ of each node $i$ in the network, we construct the offspring distribution predicted by the independent cascade model and by the limited attention model, using Eqs. (12) and (13), respectively, in Eqs. (4), (8) and (10). In each case, we fit the parameters $C$ and $B$ by matching the branching number to the average value $\bar\xi$ calculated in Sec. 2.2. The sums over $j$ and $k$ are replaced by sums over the $N$ nodes: Equation (4), for example, becomes

$$\rho = \sum_{i=1}^{N} \frac{j_i}{\bar j\, N}\, v_{j_i k_i}, \tag{16}$$

where $\bar j$ is the sample mean of the in-degrees: $\bar j = \frac{1}{N} \sum_{i=1}^{N} j_i$.
(In effect, we replace $p_{jk}$ by $1/N$ and replace sums over $j$ and $k$ by a sum over all nodes.)

The black (for ICM) and magenta (for LAM) curves in Fig. 3 show how these predictions compare with the empirical offspring distribution. Evidently, the LAM predictions are closer to the empirical offspring distributions than the ICM predictions, at least for the relatively low values of $\ell$ in Figs. 3(a) and (b). To examine the empirical offspring distributions at higher values of $\ell$ we reduce the low-number fluctuations by averaging the distributions over the generations marked with the green line in Fig. 2(a) and (b), i.e., those generations for which the effective branching number is approximately constant. This averaged offspring distribution is shown by the blue symbols in Figs. 3(c) and (d): note we plot $\ell + 1$ on the horizontal axis in order to make the $\ell = 0$ case visible on the logarithmic scale.

Noting the near-linear decay of the offspring distribution on the log-log plot, we fit the empirical averaged offspring distribution with a truncated power law:

$$q_\ell \propto (\ell + 1)^{-\beta}\, e^{-\ell\theta}. \tag{17}$$

This distribution is chosen for its good fit and analytical convenience; calculations with this distribution can be more easily reproduced than by using the full ICM or LAM distributions, which require knowledge of the full set of node degrees $(j_i, k_i)$. To fit the parameters $\beta$ and $\theta$ in Eq. (17), we match the first and second moments of the distribution with the corresponding moments of the averaged empirical distribution. The fitted parameters are given in Table 1, and the red curves in Figs. 3(c) and 3(d) show that the fitted offspring distribution is reasonably close to the empirical distribution. A similar procedure is used to fit a seed generation offspring distribution $\tilde q_\ell$, using the form of Eq. (17) with the parameters $\beta$ and $\theta$ replaced by separately fitted seed-generation values, and with the domain restricted to $\ell > 0$.

(Footnote: The pgf of the distribution in Eq. (17) is $\mathrm{Li}_\beta\left(e^{-\theta} x\right) / \left(x\, \mathrm{Li}_\beta\left(e^{-\theta}\right)\right)$, where $\mathrm{Li}_\beta$ is the polylogarithm function of order $\beta$. The pgf for the seed generation is $\left(x - e^{\theta}\, \mathrm{Li}_\beta\left(e^{-\theta} x\right)\right) / \left(x - x e^{\theta}\, \mathrm{Li}_\beta\left(e^{-\theta}\right)\right)$, with $\beta$ and $\theta$ taking their seed-generation values.)

Table 1: Parameter values for the distribution in Eq. (17), fitted to the first and second moments of the averaged empirical distributions. [Columns: $\beta$ and $\theta$ for the later generations and for the seed generation; the numerical entries are garbled in this copy.]

To summarize this Section: we have derived a general formulation for the offspring distribution that results from cascades on a network with a given distribution $p_{jk}$ of in- and out-degrees. We used the vulnerability $v_{jk}$ to describe different models of information transmission, focussing on comparing the ICM with the LAM. In Fig. 3 we see that there are observable differences between the offspring distributions predicted by the two models, with the LAM case generally closer to the empirical observations. Finally, we fitted a standard distribution (Eq. (17)) to the empirical distribution to make our results in the next section more tractable and readily reproducible. Note, however, that in principle the data on the structure of the network (e.g., the $p_{jk}$ distribution) and the assumed vulnerability $v_{jk}$ suffice to determine the offspring distribution, and this opens the possibility of examining further hypotheses on the dependence of information spreading on the nodes’ in- and out-degrees [33].

It is also worth noting that the network structure, through the correlations in the $p_{jk}$ distribution, strongly affects the offspring distribution (and hence, as we show in the next Section, the predictions of the cascade structure); this point has recently been recognised by Ma et al. [31]. We point out that the $p_{jk}$ distribution of the network should therefore be included, when possible, in analysis of information spreading. This is not current practice: in Refs. [12, 13], for example, large-scale simulations are performed on synthetic networks with specified out-degree distributions but without considering the correlation structure between the in- and out-degrees of nodes.

In this Section we focus on analytical predictions of branching process theory that can be compared to statistical features of the two datasets. We begin with a discrete-time branching process, where (as in Sec. 3) the number of offspring of the seed particle is distributed according to the pgf $\tilde f(x) = \sum_{\ell=0}^{\infty} \tilde q_\ell x^\ell$ while all later generations of the tree have offspring numbers generated by $f(x) = \sum_{\ell=0}^{\infty} q_\ell x^\ell$. We consider the seed of the tree to be generation 0, and we are interested in various properties of the trees as observed a number of generations later. In Sections 4.1 and 4.2 we use a slightly unusual approach to derive known results on the distribution of cascade durations and sizes. We then extend this methodology to the calculation of other metrics in Secs. 4.3 and 4.4.
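Returning briefly to the construction of Sec. 3.3, the node-level versions of Eqs. (4), (8) and (10) can be sketched as follows. This is an illustrative sketch only: the function name `offspring_distributions`, the toy degree lists, and the values of `C` and `B` are assumptions for the example, and the binomial probabilities are evaluated directly, which is practical only for small out-degrees.

```python
from math import comb
import numpy as np

def offspring_distributions(j, k, v):
    """Offspring distributions from in-degrees j, out-degrees k, vulnerabilities v.

    Returns (rho, q, q_seed): q[l] and q_seed[l] follow Eqs. (8) and (10), with
    p_jk replaced by 1/N and sums over (j, k) replaced by sums over nodes.
    """
    j = np.asarray(j, dtype=float)
    k = np.asarray(k, dtype=int)
    v = np.asarray(v, dtype=float)
    N, jbar = len(j), j.mean()
    rho = np.sum(j * v) / (N * jbar)                  # Eq. (16)
    lmax = int(k.max())
    # binom[i, l]: probability that l of node i's k_i followers retweet, Eq. (7)
    binom = np.array([[comb(int(ki), l) * rho**l * (1 - rho)**(int(ki) - l)
                       if l <= ki else 0.0
                       for l in range(lmax + 1)] for ki in k])
    w = j * v / (rho * N * jbar)                      # Eq. (5): weight of node i among retweeters
    q = w @ binom                                     # Eq. (8)
    q_seed = binom.mean(axis=0)                       # Eq. (10): seed chosen uniformly at random
    return rho, q, q_seed

# Toy network with correlated in- and out-degrees (hypothetical values).
j = [1, 2, 4, 8]
k = [1, 3, 5, 9]
C, B = 0.1, 0.3
rho_icm, q_icm, qseed_icm = offspring_distributions(j, k, [C] * len(j))          # Eq. (12)
rho_lam, q_lam, qseed_lam = offspring_distributions(j, k, [B / ji for ji in j])  # Eq. (13)
print(rho_icm, rho_lam)
```

In this toy example the LAM arrays `q_lam` and `qseed_lam` coincide, as Eq. (15) implies, whereas the ICM arrays differ because the in- and out-degrees are correlated.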
4.1 … $n$; distribution of cascade lifetimes

As a first example, we define the random (non-negative integer) variable $\tilde Z_n$ to represent the number of particles in generation $n$ of the tree (the small nodes in Fig. 4). As schematically represented in Fig. 4, these particles are the descendants of the generation-0 seed node, observed $n$ generations after the seed. They can also be considered as the sum of the particles contained in all the subtrees that are seeded at generation 1 and which are observed $n-1$ generations after their birth. Given the number $k$ of particles in generation 1, we define $Z^{(i)}_{n-1}$ to be the number of particles in the subtree that is seeded by the $i$th particle in generation 1, as observed $n-1$ generations after its birth (i.e., in generation $n$ of the parent tree). Since all the subtrees are i.i.d., each of the $k$ random variables $Z^{(i)}_{n-1}$ has the same distribution.

Figure 4: Schematic of a tree generated by a seed particle; note the number of children of the seed particle is generated by $\tilde f$.

We define the pgf $\tilde F_n$ for the random variable $\tilde Z_n$ as

$$\tilde F_n(s) = E\left(s^{\tilde Z_n}\right) = \sum_{j=0}^{\infty} \mathrm{Prob}\left(\tilde Z_n = j\right) s^{j}, \tag{18}$$

where $E$ denotes expectation over the ensemble of trees and $s$ is a dummy variable. If we condition on the number $k$ of particles in generation 1, we can write $\tilde Z_n$ as the sum of the $k$ subtree variables $Z^{(i)}_{n-1}$ (the superscript $i$ denotes the $i$th i.i.d. copy):

$$\tilde Z_n = \sum_{i=1}^{k} Z^{(i)}_{n-1}, \tag{19}$$

and so

$$\begin{aligned} E\left(s^{\tilde Z_n} \,\middle|\, k \text{ particles in generation 1}\right) &= E\left(s^{\sum_{i=1}^{k} Z^{(i)}_{n-1}}\right) \\ &= E\left(s^{Z^{(1)}_{n-1}}\right) E\left(s^{Z^{(2)}_{n-1}}\right) \cdots E\left(s^{Z^{(k)}_{n-1}}\right) \\ &= \left[E\left(s^{Z_{n-1}}\right)\right]^{k}, \end{aligned} \tag{20}$$

where we have used the independence of the subtrees and the i.i.d. nature of the $Z^{(i)}_{n-1}$ variables. Writing $F_{n-1}(s)$ for the pgf $E\left(s^{Z_{n-1}}\right)$ and summing over all possible values of $k$ (recall that $\tilde q_k$ is the probability that there are $k$ children of the seed particle, i.e., $k$ particles in generation 1) yields

$$\tilde F_n(s) = E\left(s^{\tilde Z_n}\right) = \sum_{k=0}^{\infty} \tilde q_k\, E\left(s^{\tilde Z_n} \,\middle|\, k \text{ particles in generation 1}\right) = \sum_{k=0}^{\infty} \tilde q_k \left[E\left(s^{Z_{n-1}}\right)\right]^{k} = \tilde f\left(F_{n-1}(s)\right). \tag{21}$$

This equation relates the pgf for $\tilde Z_n$ to the pgf for the subtree quantities $Z_{n-1}$. The next step is to derive an equation that recursively links $Z_n$ (the number of particles in a subtree $n$ generations after its birth) to $Z_{n-1}$.

Figure 5: Schematic of a subtree generated from a particle that is not a seed; note the number of children of the particle is generated by $f$.

Figure 5 is a schematic view of this relationship. The main subtree in Fig. 5 is born with the first particle shown (left of the Figure) and we condition on the number $k$ of children particles of this first particle; recall that $k$ is a random variable with pgf $f(x)$. The number $Z_m$ of particles in the main subtree after $m$ generations is equal to the sum of the $k$ i.i.d. variables $Z^{(i)}_{m-1}$:

$$Z_m = \sum_{i=1}^{k} Z^{(i)}_{m-1}, \tag{22}$$

and so

$$E\left(s^{Z_m} \,\middle|\, k \text{ particles in generation 1}\right) = E\left(s^{\sum_{i=1}^{k} Z^{(i)}_{m-1}}\right) = \left[E\left(s^{Z_{m-1}}\right)\right]^{k}, \tag{23}$$

as in Eq. (21). Summing over the possible values of $k$ then yields

$$F_m(s) = E\left(s^{Z_m}\right) = \sum_{k=0}^{\infty} q_k\, E\left(s^{Z_m} \,\middle|\, k \text{ particles in generation 1}\right) = \sum_{k=0}^{\infty} q_k \left[E\left(s^{Z_{m-1}}\right)\right]^{k} = f\left(F_{m-1}(s)\right). \tag{24}$$

Equation (24) gives a recursion relation for the pgf $F_m(s)$, starting from the initial condition $F_0(s) = s$, corresponding to the tree being seeded from a single particle. Using the result of the recursion Eq. (24) in Eq. (21) then gives the pgf for the number of nodes in generation $n$ of the tree. This characterization of the branching process is called the backward approach in [34], in analogy with the backward Chapman-Kolmogorov equation of Markov processes. An alternative forward approach (wherein the state of the particles in generation $n+1$ is predicted from the state of the process after $n$ generations) is often used to derive Eq. (24), but we will find the backward approach easily generalizable to other quantities of interest.
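The backward recursion is straightforward to evaluate numerically. The sketch below iterates Eq. (24) from the initial condition $F_0(s) = s$ and applies Eq. (21) once at the end; evaluating at $s = 0$ gives the probability that generation $n$ contains no particles. Offspring distributions are passed as coefficient arrays (entry $\ell$ holding $q_\ell$ or $\tilde q_\ell$). The function name `generation_pgf` and the toy offspring distribution are assumptions for illustration.

```python
from numpy.polynomial.polynomial import polyval

def generation_pgf(s, n, q, q_seed):
    """Evaluate the pgf of the generation-n population at s, for n >= 1."""
    F = s                          # initial condition F_0(s) = s
    for _ in range(n - 1):
        F = polyval(F, q)          # F_m(s) = f(F_{m-1}(s)), Eq. (24)
    return polyval(F, q_seed)      # seed-generation pgf applied last, Eq. (21)

# Assumed toy case: every particle has 0 or 1 child, each with probability 1/2.
q = q_seed = [0.5, 0.5]
empty = [generation_pgf(0.0, n, q, q_seed) for n in range(1, 6)]
print(empty)   # probabilities that generations 1..5 are empty
```

For this toy case a line of descent survives each generation with probability 1/2, so the probability that generation $n$ is empty is $1 - 2^{-n}$, which gives a simple check on the iteration.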

Figure 6: Lifetime distribution of cascades in Marref (left) and URL (right) datasets. Blue symbols are empirical values; red line shows the theoretical distribution from Eq. (25), using the offspring distribution of Eq. (17).

The probability that the tree is terminated at or before generation $n$ is equal to the probability of the tree having zero nodes in generation $n$, which is $\tilde F_n(0)$. The probability that the tree terminates precisely at generation $n$ (i.e., that there are a nonzero number of particles in generation $n$ but each of these has zero offspring) is therefore

$$\Omega_n = \tilde F_n(0) - \tilde F_{n-1}(0) = \tilde f\left(F_{n-1}(0)\right) - \tilde f\left(F_{n-2}(0)\right), \tag{25}$$

where $F_n(0)$ is calculated by iteration from Eq. (24) and the initial condition $F_0(0) = 0$. We call $\Omega_n$ the lifetime distribution of trees, as it gives the probability that the observed lifetime of a tree is $n$ generations. See Figure 6 for a comparison of the empirical lifetime distribution with the predictions of Eq. (25), using the offspring distribution fitted in Eq. (17).

A similar approach can be applied to calculate the distribution of tree (cascade) sizes, i.e., the total number of particles that are in all generations of the tree, from the seed at generation 0 up to the last generation of the tree (this quantity is sometimes called the total progeny of the tree). We define the random variable $\tilde X_n$ to be the size of the tree observed $n$ generations after its seed particle is born. As before, $\tilde X_n$ can be decomposed into the sum of contributions from each of the subtrees born in generation 1. Conditioning on the seed node having $k$ children particles, we write

$$\tilde X_n = 1 + \sum_{i=1}^{k} X^{(i)}_{n-1}, \tag{26}$$

where $X^{(i)}_{n-1}$ represents the $i$th i.i.d. subtree size as observed after $n-1$ generations. For a given $k$, this gives

$$E\left(x^{\tilde X_n}\right) = E\left(x^{1 + \sum_{i=1}^{k} X^{(i)}_{n-1}}\right) = x \left[E\left(x^{X_{n-1}}\right)\right]^{k}, \tag{27}$$

and the pgf $\tilde G_n(x) = E\left(x^{\tilde X_n}\right) = \sum_{j=0}^{\infty} \mathrm{Prob}\left(\tilde X_n = j\right) x^{j}$ is then given by

$$\tilde G_n(x) = \sum_{k=0}^{\infty} \tilde q_k\, E\left(x^{\tilde X_n} \,\middle|\, k \text{ particles in generation 1}\right) = \sum_{k=0}^{\infty} \tilde q_k\, x \left[E\left(x^{X_{n-1}}\right)\right]^{k} = x\, \tilde f\left(G_{n-1}(x)\right), \tag{28}$$

where $G_{n-1}(x) = E\left(x^{X_{n-1}}\right)$ is the pgf for the size of a subtree after $n-1$ generations. A recursion for the subtree sizes follows in the same way, by conditioning on there being $k$ children in the first generation of the subtree:

$$X_m = 1 + \sum_{i=1}^{k} X^{(i)}_{m-1}, \tag{29}$$

and then proceeding as in Equations (24) and (28) to obtain the recursion relation

$$G_m(x) = x\, f\left(G_{m-1}(x)\right), \tag{30}$$

with initial condition $G_0(x) = x$.

By iterating Eq. (30) for $m = 1, 2, \ldots, n-1$ and substituting $G_{n-1}(x)$ into Eq. (28), we obtain the desired pgf $\tilde G_n(x)$ describing the distribution of cascade sizes after $n$ generations. In order to invert the pgf to obtain the distribution of cascade sizes, we iterate Eqs. (30) and (28) for a set of $x$ values that are uniformly spaced around the unit circle in the complex $x$-plane, and use a fast Fourier transform to approximate the Cauchy integral

$$\mathrm{Prob}\left(\tilde X_n = j\right) = \frac{1}{j!} \left. \frac{d^{j} \tilde G_n}{dx^{j}} \right|_{x=0} = \frac{1}{2\pi i} \oint_{C} \tilde G_n(x)\, x^{-(j+1)}\, dx, \tag{31}$$

as in section S2 of [11].

Figure 7 shows the large-$n$ limit of the cascade size distribution, and compares it with the empirical distribution. The good agreement between this theoretical prediction and the empirical results gives further support to the usage of branching process descriptions for such data.

In this subsection, we build on the approach used in Sec. 4.1 to derive results for measures of the shape of cascade trees, which are of considerable interest in analyses of Twitter [13, 18].
We focuson the distribution (and expected value) of two quantities [13]: the average depth of a tree, andthe structural virality of a tree.
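Before turning to tree shape, the lifetime and size calculations of Eqs. (25), (28), (30) and (31) can be sketched numerically. The block below is a minimal illustration, not the authors' code: it assumes a Poisson offspring distribution purely for convenience (the paper uses the fitted distribution of Eq. (17)), and the function names `poisson_pgf`, `lifetime_distribution` and `size_distribution` are hypothetical.

```python
import numpy as np

def poisson_pgf(mean):
    """pgf f(x) = exp(mean * (x - 1)) of a Poisson offspring law (an assumed example)."""
    return lambda x: np.exp(mean * (x - 1.0))

def lifetime_distribution(f, f_seed, n_max):
    """Omega_n for n = 1..n_max, using Omega_n = F~_n(0) - F~_{n-1}(0) as in Eq. (25)."""
    omega = np.empty(n_max)
    F_prev = 0.0    # F_0(0) = 0
    Ft_prev = 0.0   # F~_0(0) = 0: the seed is always present
    for n in range(1, n_max + 1):
        Ft = f_seed(F_prev)        # F~_n(0) = f~(F_{n-1}(0))
        omega[n - 1] = Ft - Ft_prev
        Ft_prev = Ft
        F_prev = f(F_prev)         # F_n(0) = f(F_{n-1}(0))
    return omega

def size_distribution(f, f_seed, n, n_points=1024):
    """Approximate Prob(X~_n = j) for j = 0..n_points-1 by FFT inversion of G~_n.

    Sizes of n_points or more are aliased onto smaller j, so n_points should
    exceed the largest size with non-negligible probability.
    """
    # x values uniformly spaced around the unit circle
    x = np.exp(2j * np.pi * np.arange(n_points) / n_points)
    G = x.copy()                   # G_0(x) = x
    for _ in range(n - 1):
        G = x * f(G)               # Eq. (30)
    G_seed = x * f_seed(G)         # Eq. (28)
    # discrete version of the Cauchy integral in Eq. (31)
    return np.fft.fft(G_seed).real / n_points

f = poisson_pgf(0.5)               # subcritical: branching number 0.5
omega = lifetime_distribution(f, f, 80)
sizes = size_distribution(f, f, 200)
```

For a subcritical process the entries of `omega` should sum to very nearly one, since every tree eventually terminates, and `sizes[j]` approximates the large-$n$ cascade size distribution once $n$ is large compared with typical lifetimes.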

To calculate the average depth of a sample tree, we first sum the depths (generation numbers) of all particles in the tree to obtain the cumulative depth of the tree, and then divide this by the size of the tree (the total number of particles in the tree); see Fig. 8. In this subsection we generalize the methods used in Sec. 4.2 to calculate the joint distribution of tree size and cumulative depth, and hence to find a formula for the expected average tree depth (EATD). In the ensemble of trees generated by the branching process, each tree has its own average depth, and the EATD is the mean of the average depths over all trees in the ensemble. We believe the formula we derive (Eq. (44)) is novel.

[Figure 7: panels (a) Marref and (b) URL show probability against cascade size; panels (c) Marref and (d) URL show ccdf against cascade size.]

Figure 7: Cascade size distributions: pdfs (top panels) and ccdfs (bottom panels) for Marref (left) and URL (right). Blue symbols are empirical values; red line shows the theoretical distribution from Sec. 4.2, using the offspring distribution of Eq. (17).

Figure 8: This is a tree of size 5. Each of the 5 particles is labelled by its depth (its generation number). The cumulative depth of the tree is 0+1+1+2+2=6, and so the average depth of this tree is 6/5. Note that the cumulative depth of the top subtree, considered on its own, is 0+1+1=2, and that each node of the subtree has a depth in the main tree that is one larger than its depth within the subtree; see Eq. (33).

We extend the approach of Sec. 4.2 to consider the joint distribution of $\tilde X_n$ (the tree size after $n$ generations) and of $\tilde Y_n$, which is the random variable giving the cumulative depth of the tree after $n$ generations. We define the two-variable pgf $\tilde H_n(x, y)$ as

$$\tilde H_n(x, y) = E\left(x^{\tilde X_n} y^{\tilde Y_n}\right) = \sum_{j,\ell=0}^{\infty} \mathrm{Prob}\left(\tilde X_n = j \text{ and } \tilde Y_n = \ell\right) x^j y^\ell. \tag{32}$$

As in earlier sections, we relate the variables $\tilde X_n$ and $\tilde Y_n$ to subtree quantities, and begin by assuming that the seed node (in Fig. 4 for example) has $k$ children. Each of the $k$ children generates a subtree with (after $n-1$ generations) size and cumulative depth $\left(X^{(i)}_{n-1}, Y^{(i)}_{n-1}\right)$ for $i = 1, 2, \ldots, k$.

The relationship between $\tilde X_n$ and $X^{(i)}_{n-1}$ is given by Eq. (26), but we must now also find an analogous expression for $\tilde Y_n$. We define $Y^{(i)}_{n-1}$ to be the cumulative depth of the $i$th i.i.d. subtree. Notice (see Fig. 8) that when we add the $Y^{(i)}_{n-1}$ values for all the subtrees, each node of the subtree has a depth that is one less than its depth in the main tree. Therefore, the $i$th subtree contributes to $\tilde Y_n$ a total of $Y^{(i)}_{n-1} + X^{(i)}_{n-1}$, where the second term adds one for each node in the subtree. From this relationship, we obtain

$$\tilde Y_n = \sum_{i=1}^{k} \left(Y^{(i)}_{n-1} + X^{(i)}_{n-1}\right), \tag{33}$$

and with Eq. (26) we find the pgf relations

$$\begin{aligned}
\tilde H_n(x, y) &= \sum_{k=0}^{\infty} \tilde q_k\, E\left(x^{\tilde X_n} y^{\tilde Y_n} \,\middle|\, k \text{ particles in generation 1}\right) \\
&= \sum_{k=0}^{\infty} \tilde q_k\, E\left(x^{1+\sum_{i=1}^{k} X^{(i)}_{n-1}}\, y^{\sum_{i=1}^{k}\left(Y^{(i)}_{n-1} + X^{(i)}_{n-1}\right)}\right) \\
&= x \sum_{k=0}^{\infty} \tilde q_k\, E\left((xy)^{\sum_{i=1}^{k} X^{(i)}_{n-1}}\, y^{\sum_{i=1}^{k} Y^{(i)}_{n-1}}\right) \\
&= x \sum_{k=0}^{\infty} \tilde q_k \left[E\left((xy)^{X_{n-1}} y^{Y_{n-1}}\right)\right]^k \\
&= x \tilde f\left(H_{n-1}(xy, y)\right),
\end{aligned} \tag{34}$$

where $H_{n-1}(x, y) = E\left(x^{X_{n-1}} y^{Y_{n-1}}\right)$.

Addressing the recursion relation for the subtrees in a similar fashion leads (as in Sec. 4.2) to

$$H_m(x, y) = x f\left(H_{m-1}(xy, y)\right), \tag{35}$$

with initial condition $H_0(x, y) = x$ (since a single particle is a tree of size 1, with zero depth). Iterating Eq. (35) for $m = 1, 2, \ldots, n-1$ and substituting the result into Eq. (34) gives the pgf $\tilde H_n(x, y)$ for the joint distribution of tree size and cumulative depth after $n$ generations.

We can use this joint distribution to calculate the EATD for trees of $n$ generations as

$$d_n = \sum_{j=1}^{\infty} \sum_{\ell=0}^{\infty} \mathrm{Prob}\left(\tilde X_n = j \text{ and } \tilde Y_n = \ell\right) \frac{\ell}{j} = \int_0^1 \frac{1}{x} \left.\frac{\partial \tilde H_n}{\partial y}\right|_{y=1} dx, \tag{36}$$

as can be verified by term-by-term differentiation and integration of the series in Eq. (32). Taking the $n \to \infty$ limit in order to include all trees, Eq. (35) gives a self-consistent equation for $H_\infty(x, y)$, and it can be differentiated with respect to $x$ to yield

$$\left.\frac{\partial H}{\partial x}\right|_{y=1} = \frac{f\left(H(x,1)\right)}{1 - x f'\left(H(x,1)\right)}, \tag{37}$$

where, for simplicity, we drop the subscript from $H_\infty$ for the remainder of this section. Similarly, differentiation of Eq. (35) with respect to $y$ gives

$$\left.\frac{\partial H}{\partial y}\right|_{y=1} = x f'\left(H(x,1)\right) \left[x \left.\frac{\partial H}{\partial x}\right|_{y=1} + \left.\frac{\partial H}{\partial y}\right|_{y=1}\right], \tag{38}$$

which can be solved for $\left.\frac{\partial H}{\partial y}\right|_{y=1}$, after substituting for $\left.\frac{\partial H}{\partial x}\right|_{y=1}$ from Eq. (37):

$$\left.\frac{\partial H}{\partial y}\right|_{y=1} = \frac{x^2 f'\left(H(x,1)\right) f\left(H(x,1)\right)}{\left[1 - x f'\left(H(x,1)\right)\right]^2}. \tag{39}$$

Differentiating the $n \to \infty$ limit of Eq. (34) with respect to $y$ yields

$$\left.\frac{\partial \tilde H}{\partial y}\right|_{y=1} = x \tilde f'\left(H(x,1)\right) \left[x \left.\frac{\partial H}{\partial x}\right|_{y=1} + \left.\frac{\partial H}{\partial y}\right|_{y=1}\right], \tag{40}$$

and substituting from Eqs. (37) and (39) gives

$$\left.\frac{\partial \tilde H}{\partial y}\right|_{y=1} = \frac{x^2 \tilde f'\left(H(x,1)\right) f\left(H(x,1)\right)}{\left[1 - x f'\left(H(x,1)\right)\right]^2}. \tag{41}$$

Thus, the expected average tree depth over all trees is given by Eq. (36) as

$$d = \int_0^1 \frac{x \tilde f'\left(H(x,1)\right) f\left(H(x,1)\right)}{\left[1 - x f'\left(H(x,1)\right)\right]^2}\, dx. \tag{42}$$

Noting that Eq. (35) relates $H(x,1)$ to $x$ through the implicit relation

$$H(x,1) = x f\left(H(x,1)\right), \tag{43}$$

we make the change of integration variable $x \mapsto h$ defined implicitly by $x = h/f(h)$ (with $dx = \left[f(h) - h f'(h)\right]/f(h)^2\, dh$), to yield a simple integral formula for the EATD:

$$d = \int_0^1 \frac{h \tilde f'(h)}{f(h) - h f'(h)}\, dh. \tag{44}$$

This remarkably simple formula is easily evaluated once the pgfs $f$ and $\tilde f$ of the offspring distributions of the branching process are given. In Table 2 we show that it agrees with Monte Carlo simulations and also gives a reasonably accurate estimate of the values found from the empirical data.

[Figure 9: panels (a) Marref and (b) URL show ccdf against average tree depth; panels (c) Marref and (d) URL show ccdf against structural virality.]
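As a hedged illustration of how Eq. (44) can be evaluated in practice, the sketch below applies the trapezoidal rule to the integrand and compares the result with the closed form for the binary fission process of Eqs. (52) and (53). The helper name `eatd_integral` and the choice of quadrature are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def eatd_integral(f, df, df_seed, n_points=200001):
    """Trapezoidal estimate of Eq. (44): int_0^1 h f~'(h) / (f(h) - h f'(h)) dh.

    f and df are the offspring pgf and its derivative; df_seed is the
    derivative of the seed's offspring pgf. The process is assumed to be
    subcritical, so the denominator stays positive on [0, 1].
    """
    h = np.linspace(0.0, 1.0, n_points)
    integrand = h * df_seed(h) / (f(h) - h * df(h))
    dh = h[1] - h[0]
    return dh * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

# Binary fission with branching number xi (Eq. (52)), seed treated like any other node.
xi = 0.5
f = lambda h: (1.0 - xi / 2.0) + (xi / 2.0) * h**2
df = lambda h: xi * h

d_numeric = eatd_integral(f, df, df)
# Closed form for d_bf from Eq. (53)
d_closed = 2.0 * np.sqrt((2.0 - xi) / xi) * np.arctanh(np.sqrt(xi / (2.0 - xi))) - 2.0
```

The two values should agree to within the quadrature error. Close to criticality the integrand becomes sharply peaked near $h = 1$, so a finer grid or an adaptive rule would be a sensible substitute there.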

Figure 9: Ccdfs of average tree depth (top panels) and structural virality (bottom panels) for Marref (left) and URL (right). Blue symbols are empirical distributions; red symbols are from Monte Carlo simulations of branching processes with offspring distribution $q_\ell$ given by Eq. (17), and by the corresponding distribution $\tilde q_\ell$ for the seed's offspring.

              Marref                  URL
integral      0.862                   1.22
Monte Carlo   0.862 (0.859, 0.866)    1.22 (1.21, 1.23)
data          0.899 (0.887, 0.912)    1.34 (1.32, 1.36)

Table 2: Expected average tree depth (EATD) from the integral formula of Eq. (44), compared with Monte Carlo simulations (10 realizations) and the data values. Bootstrap intervals given for the latter two cases show quantile 0.025 to quantile 0.975 (i.e., 95% of cases) for the expected average tree depth, using 10 bootstrap samples.

              Marref                  URL
integral      1.44                    1.81
Monte Carlo   1.440 (1.436, 1.443)    1.82 (1.81, 1.83)
data          1.47 (1.46, 1.49)       1.77 (1.75, 1.78)

Table 3: Expected structural virality from the integral formula of Eq. (51), compared with Monte Carlo simulations (10 realizations) and the data values. Bootstrap intervals given for the latter two cases show quantile 0.025 to quantile 0.975 (i.e., 95% of cases) for the expected structural virality, using 10 bootstrap samples.

The structural virality of a tree with size $n > 1$ is defined as

$$\frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij}, \tag{45}$$

where $d_{ij}$ is the graph distance from node $i$ to node $j$. The distribution of this metric across an ensemble of trees was used to fit models to data in [13].

As noted in [13], the structural virality of a tree is closely related to its Wiener index, the sum of the distances over all unordered pairs of nodes, $\frac{1}{2}\sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij}$. If we consider the expected value of the Wiener index across the ensemble of trees generated by the branching process, we can usefully adapt the approach of Entringer et al. [35], with the aim of calculating the expected structural virality for the ensemble. Entringer et al. define a generating function $W(x)$ so that the coefficient of $x^n$ in the power series is the contribution of trees of size $n$ to the ensemble-averaged Wiener index (note that $W(x)$ is not a probability generating function). Their Eq. (3.5) is

$$W(x) = D(x) + x f'(G)\, W(x) + x f''(G) \left[D(x) + x G'\right] x G', \tag{46}$$

where $D(x)$ is our $\left.\frac{\partial H}{\partial y}\right|_{y=1}$ from Eq. (39) and $G(x) = H(x,1)$ is the $m \to \infty$ cascade size pgf from Eq. (30). The first term in Eq. (46) comes from considering pairs of vertices $u$ and $v$ where one of $u$ or $v$ is the root of the tree. The second term arises from the case where $u$ and $v$ belong to the same subtree, and the third term stems from $u$ and $v$ belonging to different subtrees; see Sec. 3 of [35] for details.

We extend the approach of Eq. (46) to the case where the seed node of the tree has offspring distribution with pgf $\tilde f$, to get an analogous equation for $\tilde W(x)$:

$$\tilde W(x) = \tilde D(x) + x \tilde f'(G)\, W(x) + x \tilde f''(G) \left[D(x) + x G'\right] x G', \tag{47}$$

where $\tilde D(x)$ is $\left.\frac{\partial \tilde H}{\partial y}\right|_{y=1}$ as given by Eq. (41).

Solving Eq. (46) for $W(x)$ and substituting into Eq. (47) enables us to determine $\tilde W(x)$. Because each unordered pair of nodes is counted once in the Wiener index but twice in the double sum of Eq. (45), the expected structural virality for the ensemble of trees is then given by

$$s = \sum_{n=2}^{\infty} \frac{2 \tilde W_n}{n(n-1)}, \tag{48}$$

where $\tilde W_n$ is the coefficient of $x^n$ in the power series of $\tilde W(x)$. The value of $s$ can be calculated from the generating function $\tilde W(x)$ by a double integration:

$$s = 2 \int_0^1 \left(\int_0^y \frac{\tilde W(x)}{x^2}\, dx\right) dy = 2 \int_0^1 \frac{\tilde W(x)}{x^2} (1 - x)\, dx, \tag{49}$$

where the second equation follows from changing the order of integration, i.e., using the identity

$$\int_0^1 \int_0^y (\cdot)\, dx\, dy = \int_0^1 \int_x^1 (\cdot)\, dy\, dx. \tag{50}$$

Combining these results and then making the same change of variable as for Eq. (44) yields an integral formula for the expected structural virality:

$$s = 2 \int_0^1 \frac{f (f - h) \left[f \tilde f' + h f \tilde f'' - h f' \tilde f' + h^2 \left(f'' \tilde f' - \tilde f'' f'\right)\right]}{\left(f - h f'\right)^3}\, dh, \tag{51}$$

where $f$, $\tilde f$ and their derivatives are all evaluated at $h$. Table 3 shows that this formula agrees with Monte Carlo simulations of the branching process, and also matches reasonably well to the average structural virality of the ensemble of empirical trees in both datasets.

The integral formulas derived for the expected average tree depth (Eq. (44)) and the expected structural virality (Eq. (51)) enable us to analytically study the impact of the spreading process upon these measures.
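The structural virality integral of Eq. (51) can be checked against direct simulation. The sketch below is illustrative only and rests on stated assumptions: the generating function $W(x)$ of Eq. (46) is taken to count each unordered pair of nodes once, so a factor of 2 converts it to the ordered double sum of Eq. (45); the offspring law is Poisson for convenience; trees of size 1 contribute zero, as in the sum of Eq. (48). All function names are hypothetical.

```python
import numpy as np

def sv_integral(f, d1, d2, fs1, fs2, n_points=200001):
    """Trapezoidal estimate of Eq. (51), including the factor 2 for ordered pairs.

    f, d1, d2: offspring pgf and its first two derivatives.
    fs1, fs2: first two derivatives of the seed's offspring pgf.
    """
    h = np.linspace(0.0, 1.0, n_points)
    F, F1, F2, S1, S2 = f(h), d1(h), d2(h), fs1(h), fs2(h)
    bracket = F * S1 + h * F * S2 - h * F1 * S1 + h**2 * (F2 * S1 - S2 * F1)
    integrand = F * (F - h) * bracket / (F - h * F1) ** 3
    dh = h[1] - h[0]
    return 2.0 * dh * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

def simulate_tree(rng, mean):
    """Parent index of every node of one Galton-Watson tree (the seed has parent -1)."""
    parent = [-1]
    i = 0
    while i < len(parent):
        for _ in range(rng.poisson(mean)):
            parent.append(i)
        i += 1
    return parent

def structural_virality(parent):
    """Eq. (45) for one tree; returns 0 for a tree of size 1."""
    n = len(parent)
    if n < 2:
        return 0.0
    size = [1] * n
    # children have larger indices than their parents, so sweep backwards
    for v in range(n - 1, 0, -1):
        size[parent[v]] += size[v]
    # the edge above node v lies on the path between size[v] * (n - size[v]) pairs
    wiener = sum(size[v] * (n - size[v]) for v in range(1, n))
    return 2.0 * wiener / (n * (n - 1))

xi = 0.5                                    # subcritical branching number
f = lambda h: np.exp(xi * (h - 1.0))
d1 = lambda h: xi * f(h)
d2 = lambda h: xi**2 * f(h)

s_integral = sv_integral(f, d1, d2, d1, d2)
rng = np.random.default_rng(0)
s_monte_carlo = np.mean(
    [structural_virality(simulate_tree(rng, xi)) for _ in range(20000)]
)
```

With these settings the Monte Carlo mean should match the integral to within sampling error. Using the same `f` for the seed mirrors the simplifying assumption $\tilde f = f$; a different seed distribution only changes the arguments `fs1` and `fs2` and the first draw in `simulate_tree`.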
Such understanding can assist in the fitting of information-spreading models to empirical data. In Figure 2 of Ref. [13], for example, large-scale numerical simulations are used to calculate the dependence of the expected structural virality on the branching number, and this information is then used to guide model parameter fitting.

We are therefore motivated to examine how the integrals in Eqs. (44) and (51) depend upon the form of the offspring distribution (through its pgf $f(x)$) and in particular on the branching number $\xi = f'(1)$. For simplicity we will restrict ourselves in this section to the case where $\tilde f(x) = f(x)$, i.e., assuming that the seed node's offspring distribution is the same as that of the later generations.

First we note that both integrals may be performed exactly in the special case of a binary fission process [1], where each parent has either zero or two children:

$$f_{\mathrm{bf}}(x) = \left(1 - \frac{\xi}{2}\right) + \frac{\xi}{2} x^2. \tag{52}$$

The exact integrals for EATD and expected structural virality in this case are

$$\begin{aligned}
d_{\mathrm{bf}} &= 2 \sqrt{\frac{2-\xi}{\xi}}\, \mathrm{ArcTanh} \sqrt{\frac{\xi}{2-\xi}} - 2, \\
s_{\mathrm{bf}} &= 2 - \xi - 2(2-\xi) \log\left(\frac{2-2\xi}{2-\xi}\right) - 2 \sqrt{\frac{2-\xi}{\xi}}\, \mathrm{ArcTanh} \sqrt{\frac{\xi}{2-\xi}},
\end{aligned} \tag{53}$$

and each shows a logarithmic divergence as the branching number $\xi$ approaches the critical value of 1 from below (see dashed curves in Fig. 10).

[Figure 10: panels (a) EATD against $\xi$ and (b) structural virality (SV) against $\xi$.]

Figure 10: Results of the integral formulas in Eqs. (44) and (51) for Expected Average Tree Depth (left) and Structural Virality (right) for trees with binary fission offspring distribution (dashed curves) and with power-law (tail exponent $\gamma = 2.5$) offspring distribution (solid curves).

In fact, this logarithmic divergence as $\xi \to 1$ occurs for any offspring distribution with finite $f''(1)$, meaning that the second moment of the offspring distribution is finite. The integrands in Eqs. (44) and (51) are singular at $h = 1$, and the form of the singularity can be understood using the expansion of $f(h)$ about $h = 1$:

$$f(h) \sim f(1) + f'(1)(h-1) + \frac{1}{2} f''(1)(h-1)^2 + \ldots = 1 - \xi(1-h) + \frac{1}{2} f''(1)(1-h)^2 + \ldots \quad \text{as } h \to 1^-. \tag{54}$$

The integrand of Eq. (44), for example, has leading-order expansion

$$\sim \frac{\xi}{1 - \xi + f''(1)(1-h)} \quad \text{as } h \to 1^-, \tag{55}$$

and so the integral diverges logarithmically as $\xi \to 1$; the same asymptotic behaviour is found for the integrand in Eq. (51). Hence the behaviour of the dashed curves in Fig. 10 is quite generic for offspring distributions with finite second moments.

Offspring distributions with infinite second moments are also of interest, as they relate to heavy-tailed follower distributions in the Twitter network [12, 36]. An important example is the case of a power-law tail, i.e.,

$$q_\ell \sim D \ell^{-\gamma} \quad \text{as } \ell \to \infty, \tag{56}$$

for constant $D$ and for values of the exponent $\gamma$ between 2 and 3. The asymptotic series for $f(h)$ as $h \to 1^-$ is given in this case by [11, 37]

$$f(h) \sim 1 - \xi(1-h) + D\, \Gamma(1-\gamma) (1-h)^{\gamma-1} \quad \text{as } h \to 1^-, \tag{57}$$

where $\Gamma(\cdot)$ is the Gamma function. Using this asymptotic series, the integrands in both Eqs. (44) and (51) have the leading order behaviour $\sim (1-h)^{2-\gamma}$ as $h \to 1^-$ at the critical value of $\xi = 1$. Since this singularity is integrable, the resulting values of $d$ and $s$ are both finite at $\xi = 1$, in contrast to the divergence seen in the case where $f''(1)$ is finite. The example of the solid curves in Fig. 10 is for the offspring distribution where $q_\ell \propto \ell^{-\gamma}$ for all $\ell \geq 1$ (and $q_0 = 1 - \sum_{\ell>0} q_\ell$), with power-law exponent $\gamma = 2.5$. The finite limits of $d$ and $s$ as the branching number approaches 1 are evident.

Multiplicative stochastic processes have been used in a number of papers to model popularity growth [20, 21]. In our notation, the assumption of the multiplicative model is that the total number of tweets by generation $n$ (i.e., the tree size $\tilde X_n$) can be considered as proportional to the number of tweets that occurred in all previous generations ($\tilde X_{n-1}$), multiplied by a random factor $W_n$ that is modulated by a novelty decay factor $r_n$:

$$\tilde X_n = (1 + r_n W_n) \tilde X_{n-1}. \tag{58}$$

Here, the random variables $W_1, W_2, \ldots$ are assumed to be positive, independent, and identically distributed for each tree, while $r_n$ is a deterministic novelty decay factor that is common to all trees.

The novelty decay factor for this model can be obtained from Eq. (58) by rewriting it as

$$r_n W_n = \frac{\tilde X_n - \tilde X_{n-1}}{\tilde X_{n-1}} \tag{59}$$

and taking expectations (i.e., averaging over all trees). The deterministic novelty decay factor $r_n$ is then proportional to the expectation of the right hand side of Eq. (59). Using the fact that the number of particles $\tilde Z_n$ in generation $n$ can be related to the tree size by

$$\tilde Z_n = \tilde X_n - \tilde X_{n-1}, \tag{60}$$

we therefore consider the calculation of the quantity

$$\tilde r_n = E\left(\frac{\tilde Z_n}{\tilde X_n - \tilde Z_n}\right), \tag{61}$$

which, up to a multiplicative constant, is the novelty decay factor in such models. (The multiplicative constant is often set, as in [20] for example, by normalizing the value of $r_1$.)

Similar to Eq. (32), we consider here the joint distribution, at generation $n$, of tree size $\tilde X_n$ and number of particles $\tilde Z_n$, defining the two-variable pgf $\tilde K_n(x, z)$ as

$$\tilde K_n(x, z) = E\left(x^{\tilde X_n} z^{\tilde Z_n}\right) = \sum_{j,\ell=0}^{\infty} \mathrm{Prob}\left(\tilde X_n = j \text{ and } \tilde Z_n = \ell\right) x^j z^\ell. \tag{62}$$

The iteration equation for $\tilde K_n$ is, similar to Eqs. (28) and (21),

$$\tilde K_n(x, z) = x \tilde f\left(K_{n-1}(x, z)\right), \tag{63}$$

where $K_n(x, z)$ satisfies

$$K_n(x, z) = x f\left(K_{n-1}(x, z)\right), \tag{64}$$

and $K_0(x, z) = xz$.

We observe that if we modify the second argument of $\tilde K_n$ as follows

$$\begin{aligned}
\tilde K_n\left(x, \frac{z}{x}\right) &= \sum_{j,\ell=0}^{\infty} \mathrm{Prob}\left(\tilde X_n = j \text{ and } \tilde Z_n = \ell\right) x^j \left(\frac{z}{x}\right)^\ell && (65) \\
&= \sum_{j,\ell=0}^{\infty} \mathrm{Prob}\left(\tilde X_n = j \text{ and } \tilde Z_n = \ell\right) x^{j-\ell} z^\ell, && (66)
\end{aligned}$$

then we can write, analogous to Eq. (36),

$$\begin{aligned}
\tilde r_n &= \sum_{j,\ell=0}^{\infty} \mathrm{Prob}\left(\tilde X_n = j \text{ and } \tilde Z_n = \ell\right) \frac{\ell}{j-\ell} && (67) \\
&= \int_0^1 \frac{1}{x} \left.\frac{\partial \tilde J_n}{\partial z}\right|_{z=1} dx, && (68)
\end{aligned}$$

where $\tilde J_n(x, z) = \tilde K_n\left(x, \frac{z}{x}\right)$. The iteration equation for $\tilde J_n(x, z)$ is obtained from Eq. (63) as

$$\tilde J_n(x, z) = x \tilde f\left(J_{n-1}(x, z)\right), \tag{69}$$

where

$$J_n(x, z) = K_n\left(x, \frac{z}{x}\right) = x f\left(J_{n-1}(x, z)\right), \tag{70}$$

and $J_0(x, z) = z$.

To evaluate the integral in Eq. (68), it is convenient to define the single-argument function

$$\tilde L_n(x) = \frac{1}{x} \left.\frac{\partial \tilde J_n}{\partial z}\right|_{z=1} = \frac{1}{x} \left.\frac{\partial \tilde K_n\left(x, \frac{z}{x}\right)}{\partial z}\right|_{z=1}. \tag{71}$$

Then we obtain from Eq. (69) that $\tilde L_n$ can be expressed as

$$\tilde L_n(x) = x \tilde f'\left(J_{n-1}(x, 1)\right) L_{n-1}(x), \tag{72}$$

where $L_n(x)$, defined as

$$L_n(x) = \frac{1}{x} \left.\frac{\partial J_n}{\partial z}\right|_{z=1}, \tag{73}$$

obeys the iteration equation

$$L_n(x) = x f'\left(J_{n-1}(x, 1)\right) L_{n-1}(x), \tag{74}$$

with $L_0(x) = 1/x$. Iterating Eqs. (72), (74) and (70) for values of $x$ that partition the interval $[0, 1]$, we evaluate the integral

$$\tilde r_n = \int_0^1 \tilde L_n(x)\, dx \tag{75}$$

using the trapezoidal rule.

Thus, we have shown how a subcritical branching process model can give rise to an apparent novelty decay factor, even though the offspring distribution does not change from generation to generation.
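The iteration of Eqs. (70), (72) and (74) followed by the trapezoidal rule of Eq. (75) can be sketched as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: the offspring law is Poisson with $\tilde f = f$, the function name `novelty_factor` is hypothetical, and the code tracks $A_n(x) = x L_n(x)$ instead of $L_n(x)$ so that the singular initial condition $L_0(x) = 1/x$ never has to be evaluated at $x = 0$.

```python
import numpy as np

def novelty_factor(f, df, df_seed, n_max, n_points=4001):
    """Return r~_n for n = 1..n_max from Eqs. (70), (72), (74) and (75).

    f, df: offspring pgf and its derivative; df_seed: derivative of the seed's pgf.
    """
    x = np.linspace(0.0, 1.0, n_points)
    dx = x[1] - x[0]
    J = np.ones_like(x)    # J_0(x, 1) = 1
    A = np.ones_like(x)    # A_0(x) = x * L_0(x) = 1
    r = np.empty(n_max)
    for n in range(1, n_max + 1):
        # Eq. (72): L~_n(x) = x f~'(J_{n-1}) L_{n-1}(x) = f~'(J_{n-1}) A_{n-1}(x)
        L_seed = df_seed(J) * A
        # Eq. (75) by the trapezoidal rule
        r[n - 1] = dx * (L_seed.sum() - 0.5 * (L_seed[0] + L_seed[-1]))
        # Eq. (74) multiplied by x: A_n(x) = x f'(J_{n-1}) A_{n-1}(x)
        A = x * df(J) * A
        # Eq. (70) at z = 1: J_n(x, 1) = x f(J_{n-1}(x, 1))
        J = x * f(J)
    return r

xi = 0.5                                   # subcritical branching number
f = lambda u: np.exp(xi * (u - 1.0))       # Poisson pgf (assumed example)
df = lambda u: xi * f(u)
r_tilde = novelty_factor(f, df, df, n_max=10)
```

In this example the first entry equals the seed's mean number of offspring, $\tilde f'(1)$, and later entries decay with $n$ even though `f` is the same in every generation, which is the apparent novelty decay described above.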
The "apparent" nature of the decay in the novelty factor does not reflect any change in the likelihood of retweeting by a user who receives the tweet; rather it is the mechanism needed in the multiplicative process model of Eq. (58) to deal with the finite lifetimes of cascades. At each generation of the branching process fewer trees remain alive, and so the growth rate of the total number of tweets must decline with $n$, and in the multiplicative process this is mediated by the decay of the novelty factor $r_n$.

[Figure 11: panels Marref and URL; horizontal axis: $n$, vertical axis: $r_n$.]

Figure 11: Novelty function $\tilde r_n$ in Marref (left) and URL (right) datasets. Blue symbols are empirical values using Eq. (61); red lines are the predictions of the theoretical result (75), using the offspring distribution of Eq. (17).

In Section 2 we demonstrated that two datasets from Twitter can be approximately described by branching processes, at least when we examine the discrete generation-by-generation structure. An examination of the details of a continuous-time branching process that could produce these structures is left for further work. In Section 3 we argued that the observed offspring distributions were better fitted by a model based on the assumption that Twitter users have limited attention (so those who follow many others are less likely to notice and retweet any single message they receive) than by the more usual independent cascade model, with its assumption of equal transmission probability for each infection attempt.

Taking the fitted offspring distributions as inputs, in Section 4 we derived analytical and semi-analytical results using branching process theory. We began with well-established results on the distribution of cascade lifetimes and of cascade sizes, and then extended the arguments used to derive novel results for other measures. We derived integral formulas for the expected average tree depth (equation (44)) and for the expected structural virality (equation (51)) and showed that these provide a good match to the data. The integral formulas are also amenable to asymptotic analysis to understand the behaviour of the metrics as the branching number approaches the critical value. These results should assist in the fitting of transmission models to large-scale datasets, as was done (albeit using billions of numerical simulations rather than analytical methods) in Goel et al. [13]. Finally, we derived a formula that enables the calculation of the apparent novelty factor, as would be used in a multiplicative stochastic model for the cascades under study.
In the branching process model, information does not decrease in its transmission likelihood over generations, but the fact that the processes are subcritical means that the number of users who receive a cascading tweet decreases over time (Figure 3). In a multiplicative model, the stochastic lifetimes of cascade trees must be imposed through the assumption of novelty decay, and our results in Sec. 4.4 show how the two modelling approaches can be directly compared. We believe that the insights of the branching process approach will help inform applications of the multiplicative model, while the formula linking the offspring distribution to the apparent novelty decay (equation (75)) will allow the application of branching process theory to datasets that previously were studied only via the multiplicative model.

Our study has, of course, several limitations. The nature of cascades on Twitter is that they are rather short-lived, so our observation of a stable offspring distribution might not generalize to cascades on other social media where the attention given to topics is longer-lived, and hence where novelty decay might be more likely. We have implicitly assumed that all cascade topics are equally attractive to the Twitter users and so the identification of cascade-specific "fitnesses" [38] has not been addressed here. As noted above, a study based on continuous-time branching processes could potentially extend our results to include age-dependent effects [7], but we expect that the results presented here would remain valid in the long-time limit where all cascades have reached their final state. In conclusion, we hope that the results and the methodology presented here will prove useful to researchers investigating those aspects of human behaviour that are mediated by online social networks.

Acknowledgements

This work is partly supported by Science Foundation Ireland (grant numbers 16/IA/4470, 16/RC/3918, 12/RC/2289 P2 and 18/CRT/6049) with co-funding from the European Regional Development Fund (J.G.), by the James S. McDonnell Foundation (P.F.) and by JSPS KAKENHI (grant number JP19K14618) (T.O.). We acknowledge the work of the authors of [18, 19, 25] in gathering the initial datasets and making them available for study.

References

[1] Krishna B Athreya and Peter E Ney. Branching Processes. Springer Science & Business Media, 2012.
[2] Theodore Edward Harris. The theory of branching processes. Rand Corporation, 1964.
[3] Pablo Aragón, Vicenç Gómez, David García, and Andreas Kaltenbrunner. Generative models of online discussion threads: state of the art and research challenges. Journal of Internet Services and Applications, 8(1):15, 2017.
[4] Ryosuke Nishi, Taro Takaguchi, Keigo Oka, Takanori Maehara, Masashi Toyoda, Ken-ichi Kawarabayashi, and Naoki Masuda. Reply trees in Twitter: data analysis and branching process models. Social Network Analysis and Mining, 6(1):26, 2016.
[5] Alexey N Medvedev, Jean-Charles Delvenne, and Renaud Lambiotte. Modelling structure and predicting dynamics of discussion threads in online boards. Journal of Complex Networks, 2018.
[6] José Luis Iribarren and Esteban Moro. Impact of human activity patterns on the dynamics of information diffusion. Physical Review Letters, 103(3):038702, 2009.
[7] José Luis Iribarren and Esteban Moro. Branching dynamics of viral information spreading. Physical Review E, 84(4):046116, 2011.
[8] Benjamin Golub and Matthew O Jackson. Using selection bias to explain the observed structure of internet diffusions. Proceedings of the National Academy of Sciences, 107(24):10833–10836, 2010.
[9] David Liben-Nowell and Jon Kleinberg. Tracing information flow on a global scale using internet chain-letter data. Proceedings of the National Academy of Sciences, 105(12):4633–4638, 2008.
[10] Qingyuan Zhao, Murat A Erdogdu, Hera Y He, Anand Rajaraman, and Jure Leskovec. Seismic: A self-exciting point process model for predicting tweet popularity. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1513–1522, 2015.
[11] James P Gleeson, Jonathan A Ward, Kevin P O'Sullivan, and William T Lee. Competition-induced criticality in a model of meme popularity. Physical Review Letters, 112(4):048701, 2014.
[12] James P Gleeson, Kevin P O'Sullivan, Raquel A Baños, and Yamir Moreno. Effects of network structure, competition and memory time on social spreading phenomena. Physical Review X, 6(2):021019, 2016.
[13] Sharad Goel, Ashton Anderson, Jake Hofman, and Duncan J Watts. The structural virality of online diffusion. Management Science, 62(1):180–196, 2015.
[14] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137–146. ACM, 2003.
[15] Daniel B Larremore, Marshall Y Carpenter, Edward Ott, and Juan G Restrepo. Statistical properties of avalanches in networks. Physical Review E, 85(6):066131, 2012.
[16] Sameet Sreenivasan, Kevin S Chan, Ananthram Swami, Gyorgy Korniss, and Boleslaw Karol Szymanski. Information cascades in feed-based networks of users with limited attention. IEEE Transactions on Network Science and Engineering, 4(2):120–128, 2016.
[17] Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, and Matthew Hurst. Patterns of cascading behavior in large blog graphs. In Proceedings of the 2007 SIAM international conference on data mining, pages 551–556. SIAM, 2007.
[18] David JP O'Sullivan, Guillermo Garduño-Hernández, James P Gleeson, and Mariano Beguerisse-Díaz. Integrating sentiment and social structure to determine preference alignments: the Irish Marriage Referendum. Royal Society Open Science, 4(7):170154, 2017.
[19] Nathan O Hodas and Kristina Lerman. The simple rules of social contagion. Scientific Reports, 4, 2014.
[20] Fang Wu and Bernardo A Huberman. Novelty and collective attention. Proceedings of the National Academy of Sciences, 104(45):17599–17601, 2007.
[21] Taha Yasseri, Scott A Hale, and Helen Z Margetts. Rapid rise and decay in petition signing. EPJ Data Science, 6(1):20, 2017.
[22] James P Gleeson, Davide Cellai, Jukka-Pekka Onnela, Mason A Porter, and Felix Reed-Tsochas. A simple generative model of collective online behavior. Proceedings of the National Academy of Sciences, 111(29):10411–10415, 2014.
[23] Ian Dobson. Estimating the propagation and extent of cascading line outages from utility data with a branching process. IEEE Transactions on Power Systems, 27(4):2146–2155, 2012.
[24] Kristina Lerman. Information is not a virus, and other consequences of human cognitive limits. Future Internet, 8(2):21, 2016.
[25] Kristina Lerman, Rumi Ghosh, and Tawan Surachawala. Social contagion: An empirical study of information spread on Digg and Twitter follower graphs. arXiv:1202.3162, 2012.
[26] David JP O'Sullivan. Dynamics of behaviour and information diffusion on complex networks: analytical and empirical perspectives. PhD thesis, University of Limerick, 2017.
[27] Lerman group Twitter 2010 dataset.
[28] Tree structures data. https://github.com/DavidJPOS/Branching-process-descriptions-of-information-cascades-on-Twitter
[29] Mark Newman. Networks: an introduction. Oxford University Press, 2010.
[30] James P Gleeson and Rick Durrett. Temporal profiles of avalanches on networks. Nature Communications, 8(1):1227, 2017.
[31] Sijuan Ma, Ling Feng, and Choy-Heng Lai. Mechanistic modelling of viral spreading on empirical social network and popularity prediction. Scientific Reports, 8(1):13126, 2018.
[32] Lillian Weng, Alessandro Flammini, Alessandro Vespignani, and Filippo Menczer. Competition among memes in a world with limited attention. Scientific Reports, 2:335, 2012.
[33] Jiacheng Wu, Forrest W Crawford, David A Kim, Derek Stafford, and Nicholas A Christakis. Exposure, hazard, and survival analysis of diffusion on social networks. Statistics in Medicine, 37(17):2561–2585, 2018.
[34] Marek Kimmel and David E. Axelrod. Branching Processes in Biology. Springer, New York, 2002.
[35] Roger C Entringer, Amram Meir, John W Moon, and László A Székely. On the Wiener index of trees from certain families. Australasian J. Combinatorics, 10:211–224, 1994.
[36] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web, pages 591–600. ACM, 2010.
[37] Herbert S Wilf. generatingfunctionology. Elsevier, 2013.
[38] Soon-Hyung Yook and Yup Kim. Origin of the log-normal popularity distribution of trending memes in social networks.