Scaling laws of human interaction activity
Diego Rybski, Sergey V. Buldyrev, Shlomo Havlin, Fredrik Liljeros, Hernan A. Makse
arXiv [physics.soc-ph], Sep
Levich Institute and Physics Department, City College of New York, New York, NY 10031, USA
Department of Physics, Yeshiva University, New York, NY 10033, USA
Minerva Center and Department of Physics, Bar-Ilan University, Ramat-Gan 52900, Israel
Department of Sociology, Stockholm University, S-10691 Stockholm, Sweden
(Dated: June 14, 2018)
Abstract
Even though people in our contemporary, technological society depend on communication, the underlying laws of human communicational behavior remain poorly understood. Here we investigate the communication patterns in two social Internet communities in search of statistical laws in human interaction activity. This research reveals that human communication networks dynamically follow scaling laws that may also explain the observed trends in economic growth. Specifically, we identify a generalized version of Gibrat's law of social activity expressed as a scaling law between the fluctuations in the number of messages sent by members and their level of activity. Gibrat's law has been essential in understanding economic growth patterns, yet without an underlying general principle for its origin. We attribute this scaling law to long-term correlation patterns in human activity, which surprisingly span from days to the entire period of the available data of more than one year. Further, we provide a mathematical framework that relates the generalized version of Gibrat's law to the long-term correlated dynamics, which suggests that the same underlying mechanism could be the source of Gibrat's law in economics, ranging from large firms, research and development expenditures, and gross domestic product of countries to city population growth. These findings are also of importance for designing communication networks and for understanding the dynamics of social systems in which communication plays a role, such as economic markets and political systems.

I. INTRODUCTION

The question of whether unforeseen outcomes of social activity follow emergent statistical laws has been an acknowledged problem in the social sciences since at least the last decade of the 19th century [1, 2, 3, 4].
Earlier discoveries include Pareto's law for income distributions [5], Zipf's law, initially applied to word frequency in texts and later extended to firms, cities and others [6], and Gibrat's law of proportionate growth in economics [7, 8, 9].

Social networks are permanently evolving and Internet communities are growing more each day. Having access to the communication patterns of Internet users opens the possibility to unveil the origins of statistical laws that lead us to a better understanding of human behavior as a whole. In this paper, we analyze the dynamics of sending messages in two Internet communities in search of statistical laws of human communication activity. The first online community (OC1) is mainly used by the group of men who have sex with men (MSM) [38]. The data consists of over 80,000 members and more than 12 ...

[38] The study of the de-identified MSM dating site network data was approved by the Regional Ethical Review board in Stockholm, record 2005/5:3.

II. RESULTS

Growth in the number of messages
The cumulative number, $m_j(t)$, expresses how many messages have been sent by a certain member $j$ up to a given time $t$ [for better readability we will not write the index $j$ explicitly, $m(t)$; see details on the notation in the Supporting Information (SI) Sec. I]. The dynamics of $m(t)$ between times $t_0$ and $t_1$ within the period of data acquisition $T$ ($t_0 < t_1 \le T$) can be considered as a growth process, where each member exhibits a specific growth rate $r_j$ ($r$ for short notation):

$r \equiv \ln \frac{m_1}{m_0}$ , (1)

where $m_0 \equiv m(t_0)$ and $m_1 \equiv m(t_1)$ are the number of messages sent until $t_0$ and $t_1$, respectively, by every member. To characterize the dynamics of the activity, we consider two measures. (i) The conditional average growth rate, $\langle r(m_0) \rangle$, quantifies the average growth of the number of messages sent by the members between $t_0$ and $t_1$ depending on the initial number of messages, $m_0$. In other words, we consider the average growth rate of only those members that have sent $m_0$ messages until $t_0$ (see Methods, Sec. IV for more details). (ii) The conditional standard deviation of the growth rate for those members that have sent $m_0$ messages until $t_0$, $\sigma(m_0) \equiv \sqrt{\langle (r(m_0) - \langle r(m_0) \rangle)^2 \rangle}$, expresses the statistical spread or fluctuation of growth among the members depending on $m_0$. Both quantities are relevant in the context of Gibrat's law in economics [7, 8, 9], which proposes a proportionate growth process entailing the assumption that the average and the standard deviation of the growth rate of a given economic indicator are constant and independent of the specific indicator value. That is, both $\langle r(m_0) \rangle$ and $\sigma(m_0)$ are independent of $m_0$ [9].

In Fig. 2a,b we show the results of $\langle r(m_0) \rangle$ and $\sigma(m_0)$ versus $m_0$ for both online communities. We find that the conditional average growth rate is fairly independent of $m_0$. On the other hand, the standard deviation decreases as a power-law of the form:

$\sigma(m_0) \sim m_0^{-\beta}$ . (2)

We obtain by least square fitting the exponents $\beta_{\rm OC1} = 0.22 \pm 0.01$ for OC1 and $\beta_{\rm OC2} = 0.17 \pm 0.03$ for OC2 (the values deviate slightly for large $m_0$ due to low statistics). Although the web-sites are used by different member populations, the power-law and the obtained exponents are quite similar. The exponents are also close to those reported for growth in economic systems such as firms and countries (0.… to 0.18, [16]), research and development expenditures at universities (0.25, [17]), scientific output (0.… to 0.4, [18]), and city population growth (0.… to 0.27, [19]). The approximate agreement between the exponents obtained for very different systems (social or of human origin) can be considered as a generalization of Gibrat's law, suggesting that the mechanisms behind the growth properties in different systems may originate in the human activity represented by Eq. (2).

Figures 2c and d depict the results when we randomize the data of OC1 and OC2, respectively (see Sec. IV for details of the randomization procedure), such that any temporal correlations are removed. The typical dynamics for such a surrogate data set are shown in Fig. 1b (the brown curve), displaying a clear random pattern of small fluctuations in comparison with the larger fluctuations of the original data (green curve). We find that the random signal displays a close to constant average growth rate $\langle r(m_0) \rangle$ and that the fluctuations behave as in Eq. (2) but with an exponent $\beta_{\rm rnd} = 1/2$. In contrast to randomness, here we hypothesize that the origin of the generalized version of Gibrat's law with $\beta < 1/2$ lies in long-term temporal correlations in the activity of the members.

Long-term correlations

The exceptional quality of the data (more than 10 million messages spanning several effective decades of magnitude in terms of both activity and time) allows us to test the above hypothesis by investigating the presence of temporal correlations in the individuals' activity. We aggregate the data to records of messages per day (an example is shown in Fig. 1c) to avoid the daily cycle in the activity and analyze the number of messages sent by individuals per day, $\mu(t)$, where $t$ denotes the day [$m(t) \equiv \sum_{t'=1}^{t} \mu(t')$; Figs. 1d-f show the color coded daily activity of three members in OC1]. For every member we obtain a record of a length of 63 days (OC1) or 492 days (OC2). We note that former studies reporting Eq. (2), such as [16, 17, 18, 19], typically were not based on data with temporal resolution as we use it here, and therefore were not able to investigate its origin in terms of temporal correlations.

We quantify the temporal correlations in the members' activity by mapping the problem to a one-dimensional random walk. The quantity $Y(t) \equiv \sum_{t'=1}^{t} (\mu(t') - \langle \mu(t) \rangle)$, where $\langle \mu(t) \rangle$ is the average of the corresponding record $\mu(t)$, represents the position of the random walker that performs an up or down step given by $\mu(t') - \langle \mu(t) \rangle$ at time step $t'$. The correlations after $\Delta t$ steps are reflected in the behavior of the root-mean-square displacement $F(\Delta t) \equiv \sqrt{\langle [Y(t + \Delta t) - Y(t)]^2 \rangle}$ [20], where $\langle \cdot \rangle$ is the average over $t$ and members. If the activity $\mu(t)$ is uncorrelated or short-term correlated, then one obtains $F(\Delta t) \sim (\Delta t)^{1/2}$, Fick's law of diffusion, after some cross-over time. In the case of long-term correlations, the result is a power-law increase

$F(\Delta t) \sim (\Delta t)^{H}$ , (3)

where $H > 1/2$ is the fluctuation exponent. We use detrended fluctuation analysis (DFA) to determine $H$ (see SI Sec. III for a detailed description).

The results for OC1 are shown in Figs.
3a,b, where we calculate Eq. (3) by separating the members in groups with different total number of messages sent by the members, $M$. We find that $F(\Delta t)$ asymptotically follows a power-law with $H \approx 1/2$ for the least active members (small $M$), while $H \approx 0.75$ for the most active members (large $M$) (see Fig. 3b). The smaller value of $H$ for less active members could be due to the small amount of information that these members provide in the available time of data acquisition. When we shuffle the data to remove any temporal correlations, we obtain the random exponent $H_{\rm rnd} = 1/2$. For OC2 (Figs. 3c,d), the exponents for very active members are based on poor statistics and therefore carry large error bars. Analogous to the results obtained for OC1, there are no correlations in the shuffled records ($H_{\rm rnd} = 1/2$).

Relation between $\beta$ and $H$

Next, we elaborate the mathematical framework that relates the growth process, Eq. (2), to the long-term correlations, Eq. (3). To relate the exponent $\beta$ from Eq. (2) to the temporal correlation exponent $\gamma$ from Eq. (4), and therefore to $H$, one can first rewrite Eq. (1) as:

$r = \ln \frac{m_1}{m_0} = \ln \frac{m_0 + \Delta m}{m_0}$ with $\Delta m = m_1 - m_0$
$\;\; = \ln \left( \frac{\Delta m}{m_0} + 1 \right) \approx \frac{\Delta m}{m_0}$ for small $\frac{\Delta m}{m_0}$ .

Next, the total increment of messages $\Delta m$ is expressed in terms of smaller increments $\mu(t)$, such as messages per day:

$\Delta m = \sum_{t=t_0+1}^{t_0+\Delta t} \mu(t)$ ,

which is (assuming stationarity) statistically equivalent to $\Delta m = \sum_{t=1}^{\Delta t} \mu(t)$, and one can write $r \approx \frac{1}{m_0} \sum_{t=1}^{\Delta t} \mu(t)$ for the growth rate. The conditional average growth is then

$\langle r(m_0) \rangle = \left\langle \frac{1}{m_0} \sum_{t=1}^{\Delta t} \mu(t) \right\rangle \approx \frac{1}{m_0} \sum_{t=1}^{\Delta t} \langle \mu(t) \rangle$ .

The conditional standard deviation, $\sigma(m_0) = \sqrt{\langle [r(m_0) - \langle r(m_0) \rangle]^2 \rangle}$, can be written in terms of the auto-correlation function as follows:

$r(m_0) - \langle r(m_0) \rangle = \frac{1}{m_0} \left( \sum_{t=1}^{\Delta t} \mu(t) - \sum_{t=1}^{\Delta t} \langle \mu(t) \rangle \right)$
$[r(m_0) - \langle r(m_0) \rangle]^2 = \frac{1}{m_0^2} \left( \sum_{t=1}^{\Delta t} (\mu(t) - \langle \mu(t) \rangle) \right)^2$
$\langle [r(m_0) - \langle r(m_0) \rangle]^2 \rangle \approx \frac{1}{m_0^2} \sum_{i}^{\Delta t} \sum_{j}^{\Delta t} \sigma_\mu^2 \, C(j - i)$ ,

where $C(\Delta t) = \frac{1}{\sigma_\mu^2} \langle [\mu(t) - \langle \mu(t) \rangle] [\mu(t + \Delta t) - \langle \mu(t) \rangle] \rangle$ is the auto-correlation function of $\mu(t)$ and $\sigma_\mu$ is the standard deviation of $\mu(t)$. The auto-correlation function $C(\Delta t)$ measures the interdependencies between the values of the record $\mu(t)$. For uncorrelated values, $C(\Delta t)$ is zero for $\Delta t > 0$, because on average positive and negative products of the record will cancel each other out. In the case of short-term correlations, $C(\Delta t)$ has a characteristic decay time, $\Delta t_\times$. A prominent example is the exponential decay $C(\Delta t) \sim \exp(-\Delta t / \Delta t_\times)$. Long-term correlations are described by a slower decay, namely a power-law,

$C(\Delta t) \sim (\Delta t)^{-\gamma}$ , (4)

with the correlation exponent $0 < \gamma < 1$, which is related to the fluctuation exponent $H$ from Eq. (3) by $\gamma = 2 - 2H$ [20]. We note that $\gamma = 1$ (or $\gamma > 1$) corresponds to an uncorrelated record with $H = 1/2$. A key property of long-term correlations is a pronounced mountain-valley structure in the records [20]. Statistically, large values of $\mu(t)$ are likely to be followed by large values and small values by small ones. Ideally, this holds on all time scales, which means a sequence in daily, weekly or monthly resolution is correlated in the same way as the original sequence.

Assuming long-term correlations asymptotically decaying as in Eq. (4), we approximate the double sum with integrals and obtain:

$\langle [r(m_0) - \langle r(m_0) \rangle]^2 \rangle \approx \frac{1}{m_0^2} \sigma_\mu^2 \int \!\! \int^{\Delta t} (j - i)^{-\gamma} \, {\rm d}j \, {\rm d}i \sim \frac{1}{m_0^2} \sigma_\mu^2 (\Delta t)^{2-\gamma}$ .

In order to relate $\Delta t$ and $m_0$, one can use $\Delta t = x \, t_0$, where $x$ is an arbitrary (small) constant that simply states how large $\Delta t$ is compared to $t_0$, and $m_0 \sim t_0$, which states that the number of messages is proportional to time assuming stationary activity. Using these two arguments we obtain:

$\langle [r(m_0) - \langle r(m_0) \rangle]^2 \rangle \approx \frac{1}{m_0^2} \sigma_\mu^2 \, x^{2-\gamma} \, t_0^{2-\gamma} \sim \sigma_\mu^2 \, m_0^{-\gamma}$ ,
$\sigma(m_0) \sim \sigma_\mu \, m_0^{-\gamma/2}$ .

Comparing with Eq. (2), we finally obtain $\beta = \gamma/2$, and with $\gamma = 2 - 2H$:

$\beta = 1 - H$ . (5)

Equation (5) is a scaling law formalizing the relation between growth and long-term correlations in the activity and is confirmed by our data. For OC1 we measured $\beta_{\rm OC1} \approx 0.22$ and therefore $H_{\rm OC1} \approx 0.78$ from Eq. (5), which is in approximate agreement with the (maximum) exponent we obtained by direct measurements for OC1 ($H = 0.\ldots \pm 0.05$ from Fig. 3b). For OC2 we obtained $\beta_{\rm OC2} \approx 0.17$ and therefore $H_{\rm OC2} \approx 0.83$ through Eq. (5), which is not too far from the (maximum) exponent found by direct measurements for OC2 ($H = 0.\ldots \pm 0.\ldots$).

The original Gibrat's law ($\beta_G = 0$) corresponds to very strong long-term correlations with $H_G = 1$. This is the case when the activity on all time scales exhibits equally strong correlations. In contrast, $\beta_{\rm rnd} = 1/2$ corresponds to the uncorrelated case, $H_{\rm rnd} = 1/2$. The relation between the long-term correlations quantified by $H$ and the growth fluctuations quantified by $\beta$ could be relevant to other complex systems. While the generalized version of Gibrat's law has been reported for economic indicators, the observed values of $\beta$ could be explained by the existence of long-term correlations in the activity of the corresponding system, ranging from firms and markets to social and population dynamics. In turn, Eq. (5) establishes a missing link between studies of growth processes in economic or social systems [16, 17, 18] and studies of long-term correlations such as in finance and the economy [23], Ethernet traffic [24], human brain [25] or motor activity [26]. Our results foreshadow that systems involving other types of human interactions, such as various Internet activities, communication via cell phones, trading activity, etc., may display similar growth and correlation properties as found here, offering the possibility of explaining their dynamics in terms of the long-term persistence of the individuals' behavior.

Growth of the degree in the underlying social network

Communication among the members of a community represents a type of social interaction that defines a network, in which a message is sent either based on an existing relation between two members or establishing a new one. There is considerable interest in the origin of broad distributions of activity in social systems. Two paradigms have been invoked for various applications in social systems: the "rich-get-richer" idea used by Simon in 1955 [27] and the models based on optimization strategies as proposed by Mandelbrot [28].
Regarding network models, the preferential attachment (PA) model has been introduced [29] to generate a type of stochastic scale-free network with a power-law degree distribution in the network topology. Considering the social network of members linked when they exchange at least one message (that has not been sent before), we examine the dynamics of the number of outgoing links of each member [the out-degree $k(t)$] in analogy to Eq. (2).

We start from a network without links whose nodes are all the members in the community and chronologically add a directed link between two members when a message is sent. In analogy to the growth in the number of messages $m(t)$ of each member, we study the growth of the members' out-degree $k(t)$, i.e. the number of links to others. We define the growth rate of every member as

$r_k = \ln \frac{k_1}{k_0}$ , (6)

where $k_0 \equiv k(t_0)$ is the out-degree of a member at time $t_0$ and $k_1 \equiv k(t_1)$ is the out-degree at time $t_1$. Again, there is a growth rate for each member $j$, but for better readability, we skip the index. In Fig. 4 we study $\langle r_k(k_0) \rangle$, the average growth rate conditional to the initial out-degree $k_0$, and $\sigma_k(k_0)$, the standard deviation of the growth rate conditional to the initial out-degree $k_0$, for OC1 and OC2. We obtain almost constant average growth $\langle r_k(k_0) \rangle$ as a function of $k_0$, as in the study of messages.

The conditional standard deviation of the network degree, $\sigma_k(k_0)$, is shown in Fig. 4 for both social communities. We obtain a power-law relation analogous to Eq. (2):

$\sigma_k(k_0) \sim k_0^{-\beta_k}$ , (7)

with fluctuation exponents very similar to those found for the number of messages, namely $\beta_{k,{\rm OC1}} = 0.\ldots \pm 0.02$ for OC1 and $\beta_{k,{\rm OC2}} = 0.\ldots \pm 0.08$ for OC2. These values are consistent with those we obtained for the activity of sending messages.

Next, we consider the preferential attachment model, which has been introduced to generate scale-free networks [29] with power-law degree distribution $P(k)$ of the type investigated in the present study. Essentially, it consists of subsequently adding nodes to the network by linking them to existing nodes which are chosen randomly with a probability proportional to their degree. We consider the undirected network and study the degree growth properties using Eqs. (6) and (7) and calculate the conditional average growth rate $\langle r_{\rm PA}(k_0) \rangle$ and the conditional standard deviation $\sigma_{\rm PA}(k_0)$. The times $t_0$ and $t_1$ are defined by the number of nodes attached to the network. Figure 2 in the SI Sec. IV shows the results, where an average degree $\langle k \rangle = 20$, 50,000 nodes in $t_0$, and 100,000 nodes in $t_1$ were chosen. We find a constant average growth rate that does not depend on the initial degree $k_0$. The conditional standard deviation is a function of $k_0$ and exhibits a power-law decay characterized by Eq. (7), respectively Eq. (2), with $\beta_{\rm PA} = 1/2$. The value $\beta_{\rm PA} = 1/2$ corresponds, via Eq. (5), to the uncorrelated case $H = 1/2$. In the PA model the degree of a node grows as $k(t) \sim \left( \frac{t}{t^*} \right)^b$, where $t^*$ is the time when the corresponding node was introduced to the system and $b = 1/2$, so that the growth rate is $r_{\rm PA} = b \ln \frac{t_1}{t_0}$, which is constant independent of $k_0$, in accordance with our numerical findings. Furthermore, in SI Sec. IV we obtain analytically the exponent $\beta_{\rm PA} = 1/2$. An extension of the standard PA model, the fitness model [31], takes into account different fitnesses of the nodes for acquiring links, which involves a distribution of $b$-exponents and therefore a distribution of growth rates. This model opens the possibility to relate the distribution of fitness values to the fluctuations in the growth rates, a point that requires further investigation.

III. DISCUSSION
From a statistical physics point of view, the finding of long-term correlations opens the question of the origin of such a persistence pattern in the communication. At this point we speculate on two possible scenarios, which require further studies. The question is whether the finding of an exponent $H > 1/2$ ...

IV. MATERIALS AND METHODS

Calculations of $\langle r(m_0) \rangle$, $\sigma(m_0)$ and optimal times $t_0$ and $t_1$

The average growth rate, $\langle r(m_0) \rangle$, and the standard deviation, $\sigma(m_0) = \sqrt{\langle r(m_0)^2 \rangle - \langle r(m_0) \rangle^2}$, are defined as follows. Calling $P(r|m_0)$ the conditional probability density of finding a member with growth rate $r(m_0)$ under the condition of an initial number of messages $m_0$, we obtain:

$\langle r(m_0) \rangle = \int r \, P(r|m_0) \, {\rm d}r$ , (8)

and

$\langle r(m_0)^2 \rangle = \int r^2 \, P(r|m_0) \, {\rm d}r$ . (9)

In order to calculate the growth rate, Eq. (1), one has to choose the times $t_0$ and $t_1$ in the period of data acquisition $T$. Naturally, it is best to use all data in order to have optimal statistics. Accordingly, $t_1$ is best chosen at the end of the available data ($t_1 = T$). We argue that if $t_0$ is chosen too small, then $m(t_0)$ is zero for many members (those that send their messages later), which are then rejected in the calculation because of the division in Eq. (1). Conversely, if $t_0$ is chosen too large, then there is not enough time to observe the member's activity and $r = 0$ will occur frequently, indicating no change (members have sent their messages before). Thus, there must be an optimal time in between. In SI Sec. II, Fig. 1, we plot, as a function of $t_0$, the number of members that have sent at least one message at $t_0$ [$m_0 > 0$] and further exhibit at least some activity until $t_1 = T$ [$m_1 - m_0 > 0$]. We find optimal statistics for $t_0$ in the middle of the period of observation, $t_0 = T/2$.
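The conditional averages in Eqs. (8) and (9) can be estimated from data by grouping members according to $m_0$. The following is a minimal sketch, not the procedure used for the figures: the function name `growth_statistics`, the logarithmic binning of $m_0$ and the parameter `bins_per_decade` are assumptions made here for illustration. Members with $m_0 = 0$ are rejected because of the division in Eq. (1).

```python
import math
from collections import defaultdict


def growth_statistics(m0_values, m1_values, bins_per_decade=4):
    """Estimate <r(m0)> and sigma(m0) for r = ln(m1/m0), Eq. (1).

    Members are grouped into logarithmic bins of m0 (an assumption of this
    sketch). Returns {bin_center: (mean_r, std_r, number_of_members)}.
    """
    groups = defaultdict(list)
    for m0, m1 in zip(m0_values, m1_values):
        if m0 <= 0 or m1 < m0:
            continue  # m0 = 0 cannot be used because of the division in Eq. (1)
        r = math.log(m1 / m0)
        bin_index = math.floor(math.log10(m0) * bins_per_decade)
        groups[bin_index].append(r)

    stats = {}
    for bin_index, rates in sorted(groups.items()):
        mean_r = sum(rates) / len(rates)                    # Eq. (8)
        mean_r2 = sum(r * r for r in rates) / len(rates)    # Eq. (9)
        std_r = math.sqrt(max(mean_r2 - mean_r ** 2, 0.0))  # sigma(m0)
        center = 10 ** ((bin_index + 0.5) / bins_per_decade)
        stats[center] = (mean_r, std_r, len(rates))
    return stats


# Toy input: cumulative message counts of four hypothetical members at t0 and t1.
example = growth_statistics([1, 1, 10, 10], [2, 4, 20, 20])
for center, (mean_r, std_r, count) in example.items():
    print(f"m0 ~ {center:.2f}: <r> = {mean_r:.3f}, sigma = {std_r:.3f}, n = {count}")
```

Plotting the returned standard deviations against the bin centers in double-logarithmic representation and fitting a straight line would give an estimate of $\beta$ in Eq. (2).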
Shuffling of the message data
The raw data comprises one entry for each message, consisting of the time when the message is sent, the sender identifier and the receiver identifier. For example:

time sender receiver
1 a b
2 a c
4 b a
6 c d
7 a b
...
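For concreteness, the example entries can be held as a list of (time, sender, receiver) tuples, and the randomization of the sending times described in this subsection can be sketched as below. This is an illustration under assumptions: the function name `shuffle_times` and the `seed` parameter are hypothetical, and a single random permutation of the time column is used in place of many pairwise swaps, which leaves the same quantities unchanged (the set of instants and the sender-receiver pairs).

```python
import random

# Example entries from the table: (time, sender, receiver).
events = [(1, "a", "b"), (2, "a", "c"), (4, "b", "a"), (6, "c", "d"), (7, "a", "b")]


def shuffle_times(event_list, seed=0):
    """Return a surrogate event list in which the sending times are permuted.

    Each (sender, receiver) pair receives the time of a randomly chosen
    other entry; the set of instants itself is unchanged.
    """
    rng = random.Random(seed)
    times = [t for t, _, _ in event_list]
    rng.shuffle(times)  # random permutation of the time column
    surrogate = [(t, s, r) for t, (_, s, r) in zip(times, event_list)]
    return sorted(surrogate, key=lambda entry: entry[0])


surrogate = shuffle_times(events)
for entry in surrogate:
    print(entry)
```

The number of messages per member is the same in `events` and `surrogate`, so any difference in the growth fluctuations between the two can be attributed to the temporal ordering.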
In this example, at $t = 1$ member a sends a message to member b, at $t = 2$ member a sends a message to member c, and so on.

The randomized surrogate data set is created by randomly swapping the instants (time) at which the messages are sent between two events chosen at random. Thus, each message entry randomly obtains the time of another one. This means the total number of messages is preserved and the associations between messages and sending times get shuffled. Temporal correlations are destroyed, but the set of instants at which the messages are sent remains unchanged. For instance, swapping events at $t = 1$ and $t = 6$ results in: $t = 1$, c → d, and $t = 6$, a → b.

Acknowledgments

We thank NSF-SES-0624116 for financial support and C. Briscoe, L. K. Gallos and H. D. Rozenfeld for discussions. F. L. acknowledges financial support from The Swedish Bank Tercentenary Foundation.
[1] Merton RK (1936) The Unanticipated Consequences of Purposive Social Action. Am Sociol Rev.
[2] ... Economy and Society, Vol. 1. (University of California Press, Berkeley).
[3] Giddens A (1993) New Rules of Sociological Method. (Stanford University Press, Stanford).
[4] Durkheim E (1997) Suicide, reprint from 1897. (The Free Press, New York).
[5] Pareto V (1896) Cours d'Economie Politique. (Droz, Geneva).
[6] Zipf G (1932) Selective Studies and the Principle of Relative Frequency in Language. (Harvard University Press, Cambridge, MA).
[7] Gibrat R (1931) Les inégalités économiques. (Librairie du Recueil Sirey, Paris).
[8] Sutton J (1997) Gibrat's Legacy. J Econ Lit.
[9] ... Q J Econ.
[10] ... Soc Networks.
[11] ... IEEE/ACM Trans Networking.
[12] ... Proc. 2003 ACM SIGCOMM Conf. Internet Measurement (IMC-03). (ACM Press, New York).
[13] Barabási A-L (2005) The origin of bursts and heavy tails in human dynamics. Nature.
[14] ... Nature.
[15] ... Europhys Lett.
[16] ... et al. (1996) Scaling behaviour in the growth of companies. Nature.
[17] ... Nature.
[18] ... J Am Soc Inf Sci Tec.
[19] Rozenfeld HD, et al. (2008) Laws of Population Growth. Proc Nat Acad Sci USA.
[20] ... Fractals, Physics of Solids and Liquids. (Plenum Press, New York).
[21] Peng C-K, et al. (1994) Mosaic organization of DNA nucleotides. Phys Rev E.
[22] ... Adv Phys.
[23] ... An Introduction to Econophysics: Correlations and Complexity in Finance. (Cambridge University Press, Cambridge).
[24] Leland WE, Taqqu MS, Willinger W, Wilson DV (1994) On the Self-Similar Nature of Ethernet Traffic (Extended Version). IEEE/ACM Trans Networking.
[25] ... J Neurosci.
[26] ... Proc Nat Acad Sci USA.
[27] ... Biometrika.
[28] ... An informational theory of the statistical structure of language, ed. Jackson, W. (Butterworth, London), pp. 486-504.
[29] Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science.
[30] ... Rev Mod Phys.
[31] ... Europhys Lett.
[32] ... Phys Rev Lett.
[33] ... Phys Rev E.
[34] ... Biophys J.
[35] ... records. Phys Rev Lett.
[36] ... Dissecting the Social: On the Principles of Analytical Sociology. (Cambridge University Press, Cambridge).
[37] Kentsis A (2006) Mechanisms and models of human dynamics. Nature.

FIG. 1:
A typical example of an individual's message activity. a, Instants at which messages were sent by a member belonging to OC1. b, Cumulative number of messages $m(t)$ (green) and the same but with the messages placed at random (brown). c, Sequence of number of messages sent per day, $\mu(t)$, for the same individual. d,e,f, Color coded sequences $\mu(t)$ for members sending $M$ = 100; 1,000; or 10,000 messages overall, respectively. The color is proportional to the logarithm of the number of messages per day (red: 1 message, blue: 400 messages, white for no message).

FIG. 2: Average and standard deviation of the growth rate versus number of messages. a, Results for OC1. The average growth rate of messages conditional to $m_0$ is almost constant and the standard deviation decays with an exponent $\beta_{\rm OC1} = 0.22 \pm 0.01$. b, Results for OC2. The standard deviation conditional to $m_0$ decays with an exponent $\beta_{\rm OC2} = 0.17 \pm 0.03$. c, Results for OC1, when the messages are shuffled, displaying $\beta_{\rm rnd} = 1/2$. d, Results for OC2, when the messages are shuffled. In all cases $t_0$ corresponds to half of the period of data acquisition and $t_1$ to the end, which we found to provide optimal statistics (see SI Fig. 1).

FIG. 3: Long-term correlations in the message activity of OC1 (a and b) and OC2 (c and d). a, DFA fluctuation functions averaged conditional to $M$, the total number of messages sent by each member (black: 1-2, red: 3-7, green: 8-20, blue: 21-54, orange: 55-148, brown: 149-403, maroon: 404-1096, violet: 1097-2980, turquoise: 2981-8103). The dotted lines serve as guides: the one at the bottom corresponds to the uncorrelated case, while the one at the top corresponds to the exponent 0.… b, Fluctuation exponent $H$ measured from panel a on the scales 10 days $\le \Delta t \le$ 63 days as a function of the total number of messages sent, $M$, for real (blue) and individually shuffled (green) records. c, DFA fluctuation functions averaged conditional to $M$ [colors as in (a)]. The dotted lines correspond to the uncorrelated case (bottom) and to the exponent 1 (top). d, Fluctuation exponents obtained from panel c on the scales 32 days $\le \Delta t \le$ 200 days as a function of the total number of messages sent, $M$. Due to weak statistics causing large error bars we do not consider the last two values for $M > 500$ as reliable. For clarity the fluctuation functions in panels a and c are shifted vertically.

FIG. 4: Mean out-degree growth rate and standard deviation versus initial out-degree. a, Results for OC1. The average growth of out-degree conditional to the out-degree at $t_0$ is almost constant. The standard deviation decays with an exponent $\beta_{k,{\rm OC1}} = 0.\ldots \pm 0.02$. b, Results for OC2. The standard deviation conditional to the out-degree at $t_0$ decays with an exponent $\beta_{k,{\rm OC2}} = 0.\ldots \pm 0.08$. The quantities are analogous to those of Fig. 2 except that here the growth rate of the out-degree $r_k$ is considered instead of the number of messages sent.

SUPPORTING INFORMATION (SI)

Scaling laws of human interaction activity

Diego Rybski, Sergey V. Buldyrev, Shlomo Havlin, Fredrik Liljeros, and Hernán A. Makse
I. NOTATION
1. Member $j$ sends his/her $n$th message at time $t_j(n)$, where $1 \le n \le M_j$ and $M_j$ is the total number of messages sent by $j$ in the time of data acquisition $T$. The sequence of counts, defined as the number of messages in the period $\delta t$, is given by

$\mu_j^{\delta t}(t) = \sum_{n,\, t_j(n) \in [t, t+\delta t]} a_j(n)$ , (10)

where $a_j(n) = 1$. In addition, the periods are non-overlapping, $t = i \, \delta t$ with integer $i$, and therefore $1 \le t_j(n) \le T$. In the case of daily resolution $\delta t = 1$ day.

2. The cumulative number of messages that a member sends until time $t$ is:

$m_j^{\delta t}(t) = \sum_{t'=1}^{t} \mu_j^{\delta t}(t')$ . (11)

In particular, $m_j(1) = \mu_j(1)$ and $m_j(T) = M_j$.

3. The displacement of the random walk is the cumulative sum of the normalized $\mu_j^{\delta t}(t)$:

$Y_j^{\delta t}(t) = \sum_{t'=1}^{t} \left( \mu_j^{\delta t}(t') - \langle \mu_j^{\delta t}(t) \rangle \right)$ , (12)

where $\langle \mu_j^{\delta t}(t) \rangle$ is the average of $\mu_j^{\delta t}(t)$ in time $t$. The root-mean-square displacement after $\Delta t$ is defined as

$F_j^{\delta t}(\Delta t) = \sqrt{\langle [Y_j^{\delta t}(t + \Delta t) - Y_j^{\delta t}(t)]^2 \rangle_t}$ , (13)

where the average is performed over the time $t$. Additionally, we perform an average over members $j$ with activity level $M$ and define

$(F^{\delta t}(\Delta t))_M = \langle (F_j^{\delta t}) | M \rangle_j$ . (14)
[Figure 5 plot. Horizontal axes: $t_0$ ($t_1 = T$) in days (panel a) and in weeks (panel b). Vertical axes: number of members with $m_0 > 0$ and $m_1 - m_0 > 0$.]

FIG. 5: Optimal times $t_0$ and $t_1$. The panels show for a, OC1, and b, OC2, the number of members with both $m_0 > 0$ and $m_1 - m_0 > 0$. While $t_1$ obviously is optimal at the end of the period, $t_0$ is varied to find the value for which the number of members (with at least one message until $t_0$ and at least one new message between $t_0$ and $t_1$) is maximal.
4. For simplicity, in the main text we skip the index $j$ as well as $\delta t$ and write $\mu(t)$, $m(t)$, $Y(t)$, as well as $F(\Delta t)$.

5. To investigate the growth in the number of messages we use the quantities $r = \ln \frac{m_1}{m_0}$, $\langle r(m_0) \rangle$, $\sigma(m_0)$ and the exponents $\beta_{\rm OC1}$, $\beta_{\rm OC2}$, $\beta_G$, $\beta_{\rm rnd}$.

6. To investigate the growth of the degree we use the quantities $r_k = \ln \frac{k_1}{k_0}$, $\langle r_k(k_0) \rangle$, $\sigma_k(k_0)$ and the exponents $\beta_{k,{\rm OC1}}$, $\beta_{k,{\rm OC2}}$.

7. For the growth of the degree in the preferential attachment model we use the quantities $r_{\rm PA} = \ln \frac{k_1}{k_0}$, $\langle r_{\rm PA}(k_0) \rangle$, $\sigma_{\rm PA}(k_0)$ and the exponent $\beta_{\rm PA}$.

II. OPTIMAL TIMES $t_0$ AND $t_1$

Figure 5 displays the optimal times $t_0$ and $t_1$ to calculate the growth rates for OC1 (panel a) and OC2 (panel b).

III. DETAILS ON THE QUANTIFICATION OF LONG-TERM CORRELATIONS USING DETRENDED FLUCTUATION ANALYSIS
Statistical dependencies between the values of a record $\mu(t)$ with $t = 1, \ldots, T$ can be characterized by the auto-correlation function

$C(\Delta t) = \frac{1}{\sigma_\mu^2 (T - \Delta t)} \sum_{t=1}^{T - \Delta t} [\mu(t) - \langle \mu(t) \rangle] [\mu(t + \Delta t) - \langle \mu(t) \rangle]$ , (15)

where $T$ is the length of the record $\mu(t)$, $\langle \mu(t) \rangle$ its average, and $\sigma_\mu$ its standard deviation. For uncorrelated values of $\mu(t)$, $C(\Delta t)$ is zero for $\Delta t > 0$, because on average positive and negative products will cancel each other out. In the case of short-term correlations $C(\Delta t)$ has a characteristic decay time $\Delta t_\times$. A prominent example is the exponential decay $C(\Delta t) \sim \exp(-\Delta t / \Delta t_\times)$. Long-term correlations are described by a slower decay, i.e. with diverging $\Delta t_\times$, namely a power-law,

$C(\Delta t) \sim (\Delta t)^{-\gamma}$ , (16)

with the correlation exponent $0 < \gamma < 1$.

Detrended fluctuation analysis (DFA) of a record $\mu(t)$ of length $T$ consists of 5 steps:

1. Calculate the cumulative sum, the so-called profile:

$Y(t) = \sum_{t'=1}^{t} (\mu(t') - \langle \mu(t) \rangle)$ . (17)

2. Separate the profile $Y(t)$ into $T_{\Delta t} = {\rm int}\, \frac{T}{\Delta t}$ segments of length $\Delta t$. Often, the length of the record is not a multiple of $\Delta t$. In order not to disregard information, the segmentation procedure is repeated starting from the end of the record and one obtains $2 T_{\Delta t}$ segments.

3. Locally detrend each segment $\nu$ by determining best polynomial fits $p_\nu^{(n)}(t)$ of order $n$ and subsequently subtracting them from the profile:

$Y_{\Delta t}(t) = Y(t) - p_\nu^{(n)}(t)$ . (18)

4. Calculate for each segment the variance (squared residuals) of the detrended $Y_{\Delta t}(t)$,

$F_{\Delta t}^2(\nu) = \frac{1}{\Delta t} \sum_{j=1}^{\Delta t} Y_{\Delta t}^2 [(\nu - 1) \Delta t + j]$ , (19)

by averaging over all values in the corresponding $\nu$th segment.

5. The DFA fluctuation function is given by the square-root of the average over all segments:

$F(\Delta t) = \left[ \frac{1}{2 T_{\Delta t}} \sum_{\nu=1}^{2 T_{\Delta t}} F_{\Delta t}^2(\nu) \right]^{1/2}$ . (20)

The averaging of $F_{\Delta t}^2(\nu)$ is additionally performed over members of similar activity level $M$.

If the record $\mu(t)$ is long-term correlated according to a power-law decaying auto-correlation function, Eq. (16), then $F(\Delta t)$ increases for large scales $\Delta t$ also as a power-law:

$F(\Delta t) \sim (\Delta t)^{H}$ , (21)

where the fluctuation exponent $H$ is analogous to the well-known Hurst exponent [20]. The exponents are related via

$H = 1 - \gamma/2$ , $\gamma = 2 - 2H$ . (22)

When $\gamma = 1$ then $H_{\rm rnd} = 1/2$, that is the case of uncorrelated dynamics. If the correlations decay faster ($\gamma > 1$), one also obtains $H_{\rm rnd} = 1/2$. For long-term correlations, $0 < \gamma < 1$, the exponent lies in the range $1/2 < H < 1$. In practice, one plots $F(\Delta t)$ versus $\Delta t$ in double-logarithmic representation, determines the exponent $H$ on large scales and quantifies the correlation exponent $\gamma$. The order of the polynomials $p_\nu^{(n)}$ determines the detrending technique, which is named DFA$n$: DFA0 for constant detrending, DFA1 for linear, DFA2 for parabolic, etc.

The subtraction of the average in Eq. (17) is only necessary for DFA0. By definition the corresponding fluctuation function is only given for $\Delta t \ge n + 2$. The detrending order determines the capability of detrending. Since the local trends are subtracted from the profile, only trends of order $n - 1$ are eliminated in the original record $\mu(t)$. Throughout the paper we show the results using DFA2, which we found to be sufficient in terms of detrending.

[Figure 6 plot: preferential attachment. Horizontal axis: degree $k_0$. Vertical axis: $\langle r_{\rm PA}(k_0) \rangle$, $\sigma_{\rm PA}(k_0)$.]
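The five DFA steps of Sec. III can be sketched in plain Python as follows. This is a minimal illustration, not the code used for Fig. 3: the function names (`polyfit_residuals`, `dfa_fluctuation`, `fluctuation_exponent`), the test signal and the chosen scales are assumptions made here, the polynomial fit is a simple normal-equation solve, and the additional average over members of similar activity level $M$ is omitted.

```python
import math
import random


def polyfit_residuals(y, order):
    """Residuals of y after subtracting its least-squares polynomial (step 3)."""
    n = len(y)
    xs = [2.0 * i / (n - 1) - 1.0 for i in range(n)]  # rescaled to [-1, 1]
    k = order + 1
    # Normal equations A c = b with A[p][q] = sum x^(p+q) and b[p] = sum y x^p.
    a = [[sum(x ** (p + q) for x in xs) for q in range(k)] for p in range(k)]
    b = [sum(yi * x ** p for x, yi in zip(xs, y)) for p in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, k):
            f = a[row][col] / a[col][col]
            for c in range(col, k):
                a[row][c] -= f * a[col][c]
            b[row] -= f * b[col]
    coef = [0.0] * k
    for row in range(k - 1, -1, -1):  # back substitution
        s = sum(a[row][c] * coef[c] for c in range(row + 1, k))
        coef[row] = (b[row] - s) / a[row][row]
    return [yi - sum(cf * x ** p for p, cf in enumerate(coef)) for x, yi in zip(xs, y)]


def dfa_fluctuation(mu, scale, order=2):
    """DFA fluctuation function F(scale) of the record mu, Eqs. (17)-(20)."""
    mean = sum(mu) / len(mu)
    profile, total = [], 0.0
    for value in mu:  # step 1: profile, Eq. (17)
        total += value - mean
        profile.append(total)
    length = len(profile)
    n_seg = length // scale
    # Step 2: segments from the start and, again, from the end of the record.
    segments = [profile[i * scale:(i + 1) * scale] for i in range(n_seg)]
    segments += [profile[length - (i + 1) * scale:length - i * scale] for i in range(n_seg)]
    # Steps 3 and 4: detrend each segment and take its variance, Eqs. (18)-(19).
    variances = [sum(e * e for e in polyfit_residuals(seg, order)) / scale for seg in segments]
    return math.sqrt(sum(variances) / len(variances))  # step 5, Eq. (20)


def fluctuation_exponent(mu, scales, order=2):
    """Slope of log F versus log scale, an estimate of H in Eq. (21)."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(dfa_fluctuation(mu, s, order)) for s in scales]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


# Uncorrelated test record (synthetic Gaussian noise, not the message data).
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
print(fluctuation_exponent(noise, [16, 32, 64, 128, 256]))
```

For an uncorrelated record the estimated exponent is expected to lie near $H = 1/2$; with Eq. (5) of the main text, an estimate of $H$ translates into an expected growth-fluctuation exponent $\beta = 1 - H$.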
Figure 6 shows the results of the average growth rates and fluctuations of the growth rates as a function of the initial degree for the preferential attachment model [29]. We find a constant average growth rate and a standard deviation decreasing as a power law with exponent $\beta_{\rm PA} = 1/2$.

In the PA model the degree of a node grows as

$k(t) \sim \left( \frac{t}{t^*} \right)^b$ , (23)

where $t^*$ is the time when the corresponding node was introduced to the system and $b$ is the dynamics exponent in growing network models ($b = 1/2$ for the PA model). This implies a constant growth rate $r_{\rm PA} = b \ln \frac{t_1}{t_0}$, which we also find in Fig. 6.

To obtain $\sigma_{\rm PA}(k_0)$ one can use analogous considerations as for $\sigma(m_0)$ in the main text. Due to Eq. (6) in the main text, here we have

$r_{\rm PA} \approx \frac{1}{k_0} \sum_{t=1}^{\Delta t} \kappa(t)$ , (24)

where $\kappa(t)$ are small increments analogous to $\mu(t)$, whereas Eq. (23) implies

$\kappa(t) \sim (\Delta t)^{-1/2}$ . (25)

As before, the conditional variance of the growth rate (the square of the standard deviation) is

$\langle [r_{\rm PA}(k_0) - \langle r_{\rm PA}(k_0) \rangle]^2 \rangle \approx \frac{1}{k_0^2} \sum_{i}^{\Delta t} \sum_{j}^{\Delta t} \sigma_\kappa^2 \, C(j - i)$ . (26)

In the uncorrelated case, $C(j - i) = \delta_{ij}$, the double sum can be reduced to a single one:

$\sigma^2(k_0) = \frac{1}{k_0^2} \sum_{i}^{\Delta t} \sigma_\kappa^2(i)$ . (27)

As shown below, $\sigma_\kappa^2(i) \sim i^{-1/2}$, and integration leads to

$\sigma^2(k_0) \sim \frac{1}{k_0^2} \int^{\Delta t} i^{-1/2} \, {\rm d}i$ (28)
$\;\; \sim \frac{1}{k_0^2} (\Delta t)^{1/2}$ . (29)

Eliminating $\Delta t$ using $\Delta t \sim t_0$ and $k_0 \sim t_0^{1/2}$, Eq. (23), one obtains

$\sigma_{\rm PA}(k_0) \sim k_0^{-1/2}$ . (30)

That is, we obtain $\beta_{\rm PA} = 1/2$.

It remains to show that $\sigma_\kappa^2(t) \sim t^{-1/2}$. We assume new links are set according to a Poisson process, where every new link of a node represents an event. The intervals between these events (asymptotically) follow an exponential distribution $p(\tau) = \lambda \, {\rm e}^{-\lambda \tau}$. Accordingly, $\kappa(t)$ is a sequence of zeros, with a one only when a new link is set to the corresponding node. The standard deviation of this sequence is

$\sigma_\kappa \sim \lambda^{1/2}$ . (31)

Due to Eq. (23) the rate parameter decreases like

$\lambda(t) \sim t^{-1/2}$ . (32)

Accordingly,

$\sigma_\kappa^2(t) \sim t^{-1/2}$ . (33)

In order to extend the standard PA model, a fitness model has been introduced [31], taking into account different fitnesses of the nodes for acquiring links and therefore involving a distribution of $b$-exponents. The spread of growth rates $r$ could be related to the distribution of fitness. On the other hand, the growth according to Eq. (23) is superimposed with random fluctuations that we characterize with the exponent $\beta$.
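The degree growth in the preferential attachment model can also be explored numerically. The sketch below is an illustration under assumptions, not the simulation behind Fig. 6: the function names `grow_pa_network` and `conditional_growth` are hypothetical, the seed graph is a small complete graph, and the network (2,000 nodes at $t_0$, 4,000 at $t_1$, $m = 2$ links per new node) is much smaller than the one described in the main text.

```python
import math
import random
from collections import defaultdict


def grow_pa_network(n_nodes, m, rng, snapshots):
    """Grow a preferential attachment network and return degree snapshots.

    Each new node links to m distinct existing nodes chosen with probability
    proportional to their degree. Returns {size: degree list at that size}.
    """
    degree = [m] * (m + 1)  # seed: complete graph on m + 1 nodes
    endpoints = [v for v in range(m + 1) for _ in range(m)]  # node v appears degree[v] times
    saved = {}
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # degree-proportional choice
        degree.append(0)
        for target in targets:
            degree[target] += 1
            degree[new] += 1
            endpoints += [target, new]
        if new + 1 in snapshots:
            saved[new + 1] = list(degree)
    return saved


def conditional_growth(k0_list, k1_list, min_count=30):
    """Mean and standard deviation of r = ln(k1/k0) for each initial degree k0."""
    groups = defaultdict(list)
    for k0, k1 in zip(k0_list, k1_list):
        groups[k0].append(math.log(k1 / k0))
    result = {}
    for k0, rates in sorted(groups.items()):
        if len(rates) < min_count:
            continue  # skip degrees with too few nodes for a stable estimate
        mean_r = sum(rates) / len(rates)
        var_r = sum((r - mean_r) ** 2 for r in rates) / len(rates)
        result[k0] = (mean_r, math.sqrt(var_r))
    return result


rng = random.Random(0)
n0, n1, m = 2000, 4000, 2
saved = grow_pa_network(n1, m, rng, {n0, n1})
k0 = saved[n0]
k1 = saved[n1][:n0]  # degrees at t1 of the nodes already present at t0
for degree_value, (mean_r, std_r) in conditional_growth(k0, k1).items():
    print(degree_value, round(mean_r, 3), round(std_r, 3))
```

In such a run one can compare the printed standard deviations against $k_0$ on double-logarithmic axes to examine the decay described by Eq. (30); for networks this small, the estimates at low degree are affected by the discreteness of the increments.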