Scaling laws of human interaction activity

Abstract

Even though people in our contemporary, technological society depend on communication, the underlying laws of human communicational behavior remain poorly understood. Here we investigate the communication patterns in two social Internet communities in search of statistical laws in human interaction activity. This research reveals that human communication networks dynamically follow scaling laws that may also explain the observed trends in economic growth. Specifically, we identify a generalized version of Gibrat's law of social activity, expressed as a scaling law between the fluctuations in the number of messages sent by members and their level of activity. Gibrat's law has been essential in understanding economic growth patterns, yet it lacks an underlying general principle for its origin. We attribute this scaling law to long-term correlation patterns in human activity, which surprisingly span from days to the entire period of the available data of more than one year. Further, we provide a mathematical framework that relates the generalized version of Gibrat's law to the long-term correlated dynamics, which suggests that the same underlying mechanism could be the source of Gibrat's law in economics, ranging from large firms, research and development expenditures, and gross domestic product of countries to city population growth. These findings are also of importance for designing communication networks and for understanding the dynamics of social systems in which communication plays a role, such as economic markets and political systems.


arXiv [physics.soc-ph]

Diego Rybski, Sergey V. Buldyrev, Shlomo Havlin, Fredrik Liljeros, and Hernán A. Makse

Levich Institute and Physics Department, City College of New York, New York, NY 10031, USA
Department of Physics, Yeshiva University, New York, NY 10033, USA
Minerva Center and Department of Physics, Bar-Ilan University, Ramat-Gan 52900, Israel
Department of Sociology, Stockholm University, S-10691 Stockholm, Sweden

(Dated: June 14, 2018)

I. INTRODUCTION

The question of whether unforeseen outcomes of social activity follow emergent statistical laws has been an acknowledged problem in the social sciences since at least the last decade of the 19th century [1, 2, 3, 4]. Earlier discoveries include Pareto's law for income distributions [5], Zipf's law, initially applied to word frequency in texts and later extended to firms, cities and others [6], and Gibrat's law of proportionate growth in economics [7, 8, 9].

Social networks are permanently evolving and Internet communities are growing by the day. Having access to the communication patterns of Internet users opens the possibility of unveiling the origins of statistical laws, leading to a better understanding of human behavior as a whole. In this paper, we analyze the dynamics of sending messages in two Internet communities in search of statistical laws of human communication activity. The first online community (OC1) is mainly used by the group of men who have sex with men (MSM) [38]. The data consist of over 80,000 members and more than 12 [...]

[38] The study of the de-identified MSM dating site network data was approved by the Regional Ethical Review board in Stockholm, record 2005/5:3.

II. RESULTS

Growth in the number of messages

The cumulative number, $m_j(t)$, expresses how many messages have been sent by a certain member $j$ up to a given time $t$ [for better readability we will not write the index $j$ explicitly, $m(t)$; see details on the notation in the Supporting Information (SI) Sec. I]. The dynamics of $m(t)$ between times $t_0$ and $t_1$ within the period of data acquisition $T$ ($t_0 < t_1 \le T$) can be considered as a growth process, where each member exhibits a specific growth rate $r_j$ ($r$ for short notation):

$$ r \equiv \ln\frac{m_1}{m_0}, \qquad (1) $$

where $m_0 \equiv m(t_0)$ and $m_1 \equiv m(t_1)$ are the number of messages sent until $t_0$ and $t_1$, respectively, by every member. To characterize the dynamics of the activity, we consider two measures. (i) The conditional average growth rate, $\langle r(m_0) \rangle$, quantifies the average growth of the number of messages sent by the members between $t_0$ and $t_1$ depending on the initial number of messages, $m_0$. In other words, we consider the average growth rate of only those members that have sent $m_0$ messages until $t_0$ (see Methods, Sec. IV for more details). (ii) The conditional standard deviation of the growth rate for those members that have sent $m_0$ messages until $t_0$, $\sigma(m_0) \equiv \sqrt{\langle (r(m_0) - \langle r(m_0) \rangle)^2 \rangle}$, expresses the statistical spread or fluctuation of growth among the members depending on $m_0$. Both quantities are relevant in the context of Gibrat's law in economics [7, 8, 9], which proposes a proportionate growth process entailing the assumption that the average and the standard deviation of the growth rate of a given economic indicator are constant and independent of the specific indicator value. That is, both $\langle r(m_0) \rangle$ and $\sigma(m_0)$ are independent of $m_0$ [9].

In Fig. 2a,b we show the results of $\langle r(m_0) \rangle$ and $\sigma(m_0)$ versus $m_0$ for both online communities. We find that the conditional average growth rate is fairly independent of $m_0$. On the other hand, the standard deviation decreases as a power-law of the form:

$$ \sigma(m_0) \sim m_0^{-\beta}. \qquad (2) $$

We obtain by least square fitting the exponents $\beta_{OC1} = 0.22 \pm 0.01$ for OC1 and $\beta_{OC2} = 0.17 \pm 0.03$ for OC2 (the values deviate slightly for large $m_0$ due to low statistics). Although the web-sites are used by different member populations, the power-law and the obtained exponents are quite similar. The exponents are also close to those reported for growth in economic systems such as firms and countries (up to 0.18 [16]), research and development expenditures at universities (0.25 [17]), scientific output (up to 0.4 [18]), and city population growth (up to 0.27 [19]). The approximate agreement between the exponents obtained for very different systems (social or of human origin) can be considered as a generalization of Gibrat's law, suggesting that the mechanisms behind the growth properties in different systems may originate in the human activity represented by Eq. (2).

Figures 2c and d depict the results when we randomize the data of OC1 and OC2, respectively (see Sec. IV for details of the randomization procedure), such that any temporal correlations are removed. The typical dynamics for such a surrogate data set are shown in Fig. 1b (the brown curve), displaying a clear random pattern of small fluctuations in comparison with the larger fluctuations of the original data (green curve). We find that the random signal displays a close to constant average growth rate $\langle r(m_0) \rangle$ and that the fluctuations behave as in Eq. (2) but with an exponent $\beta_{rnd} = 1/2$ [...] $\beta_{rnd} = 1/2$. In contrast to randomness, here we hypothesize that the origin of the generalized version of Gibrat's law with $\beta < 1/2$ [...] $\beta \approx$ [...]

Long-term correlations

The exceptional quality of the data (more than 10 million messages spanning several effective decades of magnitude in terms of both activity and time) allows us to test the above hypothesis by investigating the presence of temporal correlations in the individuals' activity. We aggregate the data to records of messages per day (an example is shown in Fig. 1c) to avoid the daily cycle in the activity and analyze the number of messages sent by individuals per day, $\mu(t)$, where $t$ denotes the day [$m(t) \equiv \sum_{t'=1}^{t} \mu(t')$; Figs. 1d-f show the color coded daily activity of three members in OC1]. For every member we obtain a record of a length of 63 days (OC1) or 492 days (OC2). We note that former studies reporting Eq. (2), such as [16, 17, 18, 19], typically were not based on data with the temporal resolution we use here, and therefore were not able to investigate its origin in terms of temporal correlations.

We quantify the temporal correlations in the members' activity by mapping the problem to a one-dimensional random walk. The quantity $Y(t) \equiv \sum_{t'=1}^{t} (\mu(t') - \langle \mu(t) \rangle)$, where $\langle \mu(t) \rangle$ is the average of the corresponding record $\mu(t)$, represents the position of the random walker that performs an up or down step given by $\mu(t') - \langle \mu(t) \rangle$ at time step $t'$. The correlations after $\Delta t$ steps are reflected in the behavior of the root-mean-square displacement $F(\Delta t) \equiv \sqrt{\langle [Y(t + \Delta t) - Y(t)]^2 \rangle}$ [20], where $\langle \cdot \rangle$ is the average over $t$ and members. If the activity $\mu(t)$ is uncorrelated or short-term correlated, then one obtains $F(\Delta t) \sim (\Delta t)^{1/2}$, Fick's law of diffusion, after some cross-over time. In the case of long-term correlations, the result is a power-law increase

$$ F(\Delta t) \sim (\Delta t)^{H}, \qquad (3) $$

where $H > 1/2$ [...] $H$ (see SI Sec. III for a detailed description).

The results for OC1 are shown in Figs. 3a,b, where we calculate Eq. (3) by separating the members in groups with different total number of messages sent by the members, $M$. We find that $F(\Delta t)$ asymptotically follows a power-law with $H \approx 1/2$ [...] $M <$ [...] $H \approx 0.75$ for members with $M >$ [...] (see Fig. 3b). The smaller value of $H$ for less active members could be due to the small amount of information that these members provide in the available time of data acquisition. When we shuffle the data to remove any temporal correlations, we obtain the random exponent $H_{rnd} = 1/2$ [...] $H \approx$ [...] $H \approx$ [...] $M$ (the exponents for very active members are based on poor statistics and therefore carry large error bars). Analogous to the results obtained for OC1, there are no correlations in the shuffled records ($H_{rnd} = 1/2$) [...] $H > 1/2$ [...]

Relation between β and H

Next, we elaborate the mathematical framework that relates the growth process, Eq. (2), to the long-term correlations, Eq. (3). To relate the exponent from Eq. (2), $\beta$, to the temporal correlation exponent $\gamma$, from Eq. (4), and therefore to $H$, one can first rewrite Eq. (1) as:

$$ r = \ln\frac{m_1}{m_0} = \ln\frac{m_0 + \Delta m}{m_0} \quad \text{with } \Delta m = m_1 - m_0 $$

$$ \phantom{r} = \ln\left(\frac{\Delta m}{m_0} + 1\right) \approx \frac{\Delta m}{m_0} \quad \text{for small } \frac{\Delta m}{m_0}. $$

Next, the total increment of messages $\Delta m$ is expressed in terms of smaller increments $\mu(t)$, such as messages per day:

$$ \Delta m = \sum_{t = t_0 + 1}^{t_0 + \Delta t} \mu(t), $$

which is (assuming stationarity) statistically equivalent to $\Delta m = \sum_{t=1}^{\Delta t} \mu(t)$, and one can write $r \approx \frac{1}{m_0} \sum_{t=1}^{\Delta t} \mu(t)$ for the growth rate. The conditional average growth is then

$$ \langle r(m_0) \rangle = \left\langle \frac{1}{m_0} \sum_{t=1}^{\Delta t} \mu(t) \right\rangle \approx \frac{1}{m_0} \sum_{t=1}^{\Delta t} \langle \mu(t) \rangle. $$

The conditional standard deviation, $\sigma(m_0) = \sqrt{\langle [r(m_0) - \langle r(m_0) \rangle]^2 \rangle}$, can be written in terms of the auto-correlation function as follows:

$$ r(m_0) - \langle r(m_0) \rangle = \frac{1}{m_0} \left( \sum_{t=1}^{\Delta t} \mu(t) - \sum_{t=1}^{\Delta t} \langle \mu(t) \rangle \right) $$

$$ [r(m_0) - \langle r(m_0) \rangle]^2 = \frac{1}{m_0^2} \left( \sum_{t=1}^{\Delta t} (\mu(t) - \langle \mu(t) \rangle) \right)^2 $$

$$ \langle [r(m_0) - \langle r(m_0) \rangle]^2 \rangle \approx \frac{1}{m_0^2} \sum_{i}^{\Delta t} \sum_{j}^{\Delta t} \sigma_\mu^2 \, C(j - i), $$

where $C(\Delta t) = \frac{1}{\sigma_\mu^2} \langle [\mu(t) - \langle \mu(t) \rangle] [\mu(t + \Delta t) - \langle \mu(t) \rangle] \rangle$ is the auto-correlation function of $\mu(t)$ and $\sigma_\mu$ is the standard deviation of $\mu(t)$. The auto-correlation function $C(\Delta t)$ measures the interdependencies between the values of the record $\mu(t)$. For uncorrelated values, $C(\Delta t)$ is zero for $\Delta t > 0$, because on average positive and negative products of the record cancel each other out. In the case of short-term correlations, $C(\Delta t)$ has a characteristic decay time, $\Delta t_\times$. A prominent example is the exponential decay $C(\Delta t) \sim \exp(-\Delta t / \Delta t_\times)$. Long-term correlations are described by a slower decay, namely a power-law,

$$ C(\Delta t) \sim (\Delta t)^{-\gamma}, \qquad (4) $$

with the correlation exponent $0 < \gamma < 1$ [...] $H$ from Eq. (3) by $\gamma = 2 - 2H$ [20]. We note that $\gamma = 1$ (or $\gamma > 1$) corresponds to an uncorrelated record with $H = 1/2$. A key property of long-term correlations is a pronounced mountain-valley structure in the records [20]. Statistically, large values of $\mu(t)$ are likely to be followed by large values and small values by small ones. Ideally, this holds on all time scales, which means a sequence in daily, weekly or monthly resolution is correlated in the same way as the original sequence.

Assuming long-term correlations asymptotically decaying as in Eq. (4), we approximate the double sum with integrals and obtain:

$$ \langle [r(m_0) - \langle r(m_0) \rangle]^2 \rangle \approx \frac{1}{m_0^2} \sigma_\mu^2 \iint^{\Delta t} (j - i)^{-\gamma} \, dj \, di \sim \frac{1}{m_0^2} \sigma_\mu^2 (\Delta t)^{2 - \gamma}. $$

In order to relate $\Delta t$ and $m_0$, one can use $\Delta t = x t_0$, where $x$ is an arbitrary (small) constant that simply states how large $\Delta t$ is compared to $t_0$, and $m_0 \sim t_0$, which states that the number of messages is proportional to time assuming stationary activity. Using these two arguments we obtain:

$$ \langle [r(m_0) - \langle r(m_0) \rangle]^2 \rangle \approx \frac{1}{m_0^2} \sigma_\mu^2 \, x^{2 - \gamma} \, t_0^{2 - \gamma} \sim \sigma_\mu^2 \, m_0^{-\gamma}, $$

$$ \sigma(m_0) \sim \sigma_\mu \, m_0^{-\gamma/2}. $$

Comparing with Eq. (2), we finally obtain $\beta = \gamma/2$, and with $\gamma = 2 - 2H$:

$$ \beta = 1 - H. \qquad (5) $$

Equation (5) is a scaling law formalizing the relation between growth and long-term correlations in the activity and is confirmed by our data. For OC1 we measured $\beta_{OC1} \approx 0.22$, which gives $H_{OC1} \approx 0.78$ from Eq. (5), in approximate agreement with the (maximum) exponent we obtained by direct measurements for OC1 ($H = 0.$[...]$ \pm 0.05$ from Fig. 3b). For OC2 we obtained $\beta_{OC2} \approx 0.17$ and therefore $H_{OC2} \approx 0.83$ through Eq. (5), which is not too far from the (maximum) exponent found by direct measurements for OC2 ($H =$ [...]). [...] Gibrat's law ($\beta_G = 0$) corresponds to very strong long-term correlations with $H_G = 1$. This is the case when the activity on all time scales exhibits equally strong correlations. In contrast, $\beta_{rnd} = 1/2$ [...] $H_{rnd} = 1/2$ [...] $H$ and the growth fluctuations quantified by $\beta$ could be relevant to other complex systems. While the generalized version of Gibrat's law has been reported for economic indicators displaying $\beta \approx$ [...] $\beta$ could be explained by the existence of long-term correlations in the activity of the corresponding system, ranging from firms and markets to social and population dynamics. In turn, Eq. (5) establishes a missing link between studies of growth processes in economic or social systems [16, 17, 18] and studies of long-term correlations such as in finance and the economy [23], Ethernet traffic [24], human brain [25] or motor activity [26]. Our results foreshadow that systems involving other types of human interactions, such as various Internet activities, communication via cell phones, trading activity, etc., may display growth and correlation properties similar to those found here, offering the possibility of explaining their dynamics in terms of the long-term persistence of the individuals' behavior.

Growth of the degree in the underlying social network

Communication among the members of a community represents a type of social interaction that defines a network, in which a message is sent either based on an existing relation between two members or establishing a new one. There is considerable interest in the origin of broad distributions of activity in social systems. Two paradigms have been invoked for various applications in social systems: the "rich-get-richer" idea used by Simon in 1955 [27] and the models based on optimization strategies as proposed by Mandelbrot [28].
Regarding network models, the preferential attachment (PA) model has been introduced [29] to generate a type of stochastic scale-free network with a power-law degree distribution in the network topology. Considering the social network of members linked when they exchange at least one message (that has not been sent before), we examine the dynamics of the number of outgoing links of each member [the out-degree $k(t)$] in analogy to Eq. (2).

We start from the empty set of nodes consisting of all the members in the community and chronologically add a directed link between two members when a message is sent. In analogy to the growth in the number of messages $m(t)$ of each member, we study the growth of the members' out-degree $k(t)$, i.e. the number of links to others. We define the growth rate of every member as

$$ r_k = \ln\frac{k_1}{k_0}, \qquad (6) $$

where $k_0 \equiv k(t_0)$ is the out-degree of a member at time $t_0$ and $k_1 \equiv k(t_1)$ is the out-degree at time $t_1$. Again, there is a growth rate for each member $j$, but for better readability we skip the index. In Fig. 4 we study $\langle r_k(k_0) \rangle$, the average growth rate conditional to the initial out-degree $k_0$, and $\sigma_k(k_0)$, the standard deviation of the growth rate conditional to the initial out-degree $k_0$, for OC1 and OC2. We obtain an almost constant average growth $\langle r_k(k_0) \rangle$ as a function of $k_0$, as in the study of messages.

The conditional standard deviation of the network-degree, $\sigma_k(k_0)$, is shown in Fig. 4 for both social communities. We obtain a power-law relation analogous to Eq. (2):

$$ \sigma_k(k_0) \sim k_0^{-\beta_k}, \qquad (7) $$

with fluctuation exponents very similar to those found for the number of messages, namely $\beta_{k,OC1} = 0.$[...]$ \pm 0.02$ for OC1 and $\beta_{k,OC2} = 0.$[...]$ \pm 0.08$ for OC2. These values are consistent with those we obtained for the activity of sending messages.

Next, we consider the preferential attachment model, which has been introduced to generate scale-free networks [29] with a power-law degree distribution $P(k)$ of the type investigated in the present study. Essentially, it consists of successively adding nodes to the network by linking them to existing nodes, which are chosen randomly with a probability proportional to their degree. We consider the undirected network, study the degree growth properties using Eqs. (6) and (7), and calculate the conditional average growth rate $\langle r_{PA}(k_0) \rangle$ and the conditional standard deviation $\sigma_{PA}(k_0)$. The times $t_0$ and $t_1$ are defined by the number of nodes attached to the network. Figure 2 in the SI Sec. IV shows the results, where an average degree $\langle k \rangle = 20$, 50,000 nodes in $t_0$, and 100,000 nodes in $t_1$ were chosen. We find a constant average growth rate that does not depend on the initial degree $k_0$. The conditional standard deviation is a function of $k_0$ and exhibits a power-law decay characterized by Eq. (7), respectively Eq. (2), with $\beta_{PA} = 1/2$. The value $\beta_{PA} = 1/2$ [...] $H = 1/2$ [...] $k(t) \sim (t/t^*)^{b}$, where $t^*$ is the time when the corresponding node was introduced to the system and $b = 1/2$ [...] $r_{PA} = b \ln (t_1/t_0)$, which is constant and independent of $k_0$, in accordance with our numerical findings. Furthermore, in SI Sec. IV we obtain analytically the exponent $\beta_{PA} = 1/2$ [...] $b$-exponents and therefore a distribution of growth rates. This model opens the possibility of relating the distribution of fitness values to the fluctuations in the growth rates, a point that requires further investigation.

III. DISCUSSION

From a statistical physics point of view, the finding of long-term correlations opens the question of the origin of such a persistence pattern in the communication. At this point we speculate on two possible scenarios, which require further studies. The question is whether the finding of an exponent $H >$ [...] $H > 1/2$ [...]

IV. MATERIALS AND METHODS

Calculations of $\langle r(m_0) \rangle$, $\sigma(m_0)$ and optimal times $t_0$ and $t_1$

The average growth rate, $\langle r(m_0) \rangle$, and the standard deviation, $\sigma(m_0) = \sqrt{\langle r(m_0)^2 \rangle - \langle r(m_0) \rangle^2}$, are defined as follows. Calling $P(r|m_0)$ the conditional probability density of finding a member with growth rate $r(m_0)$ given the initial number of messages $m_0$, we obtain:

$$ \langle r(m_0) \rangle = \int r \, P(r|m_0) \, dr, \qquad (8) $$

and

$$ \langle r(m_0)^2 \rangle = \int r^2 \, P(r|m_0) \, dr. \qquad (9) $$

In order to calculate the growth rate, Eq. (1), one has to choose the times $t_0$ and $t_1$ in the period of data acquisition $T$. Naturally, it is best to use all data in order to have optimal statistics. Accordingly, $t_1$ is best chosen at the end of the available data ($t_1 = T$). We argue that if $t_0$ is chosen too small, then $m(t_0)$ is zero for many members (those that send their messages later), which are then rejected in the calculation because of the division in Eq. (1). Conversely, if $t_0$ is chosen too large, then there is not enough time to observe the member's activity and $r = 0$ will occur frequently, indicating no change (members have sent their messages before). Thus, there must be an optimal time in between. In SI Sec. II, Fig. 1, we plot, as a function of $t_0$, the number of members that have at least one message at $t_0$ [$m_0 > 0$] and further exhibit at least some activity until $t_1 = T$ [$m_1 - m_0 > 0$]. [...] $t_0$ in the middle of the period of observation, $t_0 = T/2$.
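The conditional statistics and the choice of $t_0$ described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`growth_stats`, `fit_beta`, `count_usable_members`, `best_t0`), the choice of four logarithmic bins per decade, and the synthetic Poisson data in the demo are all hypothetical and introduced here only for illustration.

```python
import numpy as np


def growth_stats(m0, m1, bins_per_decade=4):
    """Conditional mean and standard deviation of r = ln(m1/m0), Eq. (1).

    Members are grouped into logarithmic bins of m0 (the bin width is an
    assumption of this sketch). Members with m0 = 0 are rejected because
    of the division in Eq. (1).
    """
    m0 = np.asarray(m0, dtype=float)
    m1 = np.asarray(m1, dtype=float)
    keep = m0 > 0
    m0, m1 = m0[keep], m1[keep]
    r = np.log(m1 / m0)
    idx = np.floor(np.log10(m0) * bins_per_decade).astype(int)
    centers, mean_r, std_r = [], [], []
    for b in np.unique(idx):
        sel = idx == b
        if sel.sum() < 2:          # a spread needs at least two members
            continue
        centers.append(np.exp(np.mean(np.log(m0[sel]))))  # geometric mean of m0
        mean_r.append(r[sel].mean())
        std_r.append(r[sel].std())
    return np.array(centers), np.array(mean_r), np.array(std_r)


def fit_beta(centers, std_r):
    """Least-squares slope of log(sigma) versus log(m0); returns beta of Eq. (2)."""
    ok = std_r > 0
    slope, _intercept = np.polyfit(np.log(centers[ok]), np.log(std_r[ok]), 1)
    return -slope


def count_usable_members(cum, t0):
    """Number of members with m0 > 0 and m1 - m0 > 0, taking t1 = T.

    `cum` has shape (members, T) and holds the cumulative counts m(t)
    for t = 1..T; `t0` is a 1-based time index.
    """
    m0 = cum[:, t0 - 1]
    m1 = cum[:, -1]
    return int(np.sum((m0 > 0) & (m1 - m0 > 0)))


def best_t0(cum):
    """t0 (1-based) that maximizes the number of usable members."""
    T = cum.shape[1]
    return max(range(1, T + 1), key=lambda t0: count_usable_members(cum, t0))


if __name__ == "__main__":
    # Synthetic, uncorrelated activity (an assumption of this demo):
    # each member adds a Poisson number of messages with mean m0.
    rng = np.random.default_rng(0)
    m0 = np.floor(10 ** rng.uniform(1, 3, size=20000))
    m1 = m0 + rng.poisson(m0)
    centers, mean_r, std_r = growth_stats(m0, m1)
    print("fitted beta for uncorrelated synthetic data:", fit_beta(centers, std_r))
```

For uncorrelated increments such as those in the demo, one expects the fitted exponent to lie close to the random value $\beta_{rnd} = 1/2$; long-term correlated records would be needed to reproduce smaller exponents.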

Shuffling of the message data

The raw data comprise one entry for each message, consisting of the time when the message is sent, the sender identifier, and the receiver identifier. For example:

time  sender  receiver
1     a       b
2     a       c
4     b       a
6     c       d
7     a       b
...

This means that at t = 1 member a sends a message to member b, at t = 2 member a sends a message to member c, and so on.

The randomized surrogate data set is created by randomly swapping the instants (time) at which the messages are sent between two events chosen at random. Thus, each message entry randomly obtains the time of another one. This means the total number of messages is preserved and the association between messages and sending times gets shuffled. Temporal correlations are destroyed, but the set of instants at which the messages are sent remains unchanged. For instance, swapping the events at t = 1 and t = 6 results in: t = 1, c → d, and t = 6, a → b.

Acknowledgments

We thank NSF-SES-0624116 for financial support and C. Briscoe, L. K. Gallos and H. D. Rozenfeld for discussions. F. L. acknowledges financial support from The Swedish Bank Tercentenary Foundation.

[1] Merton RK (1936) The Unanticipated Consequences of Purposive Social Action. Am Sociol Rev
[...] Economy and Society, Vol. 1. (University of California Press, Berkeley).
[3] Giddens A (1993) New Rules of Sociological Method. (Stanford University Press, Stanford).
[4] Durkheim E (1997) Suicide, reprint from 1897. (The Free Press, New York).
[5] Pareto V (1896) Cours d'Economie Politique. (Droz, Geneva).
[6] Zipf G (1932) Selective Studies and the Principle of Relative Frequency in Language. (Harvard University Press, Cambridge, MA).
[7] Gibrat R (1931) Les inégalités économiques. (Libraire du Recueil Sierey, Paris).
[8] Sutton J (1997) Gibrat's Legacy. J Econ Lit
[...] Q J Econ
[...] Soc Networks
[...] IEEE/ACM Trans Networking
[...] Proc. 2003 ACM SIGCOMM Conf. Internet Measurement (IMC-03). (ACM Press, New York).
[13] Barabási A-L (2005) The origin of bursts and heavy tails in human dynamics. Nature
[...] Europhys Lett
[...] et al. (1996) Scaling behaviour in the growth of companies. Nature
[...] J Am Soc Inf Sci Tec
[19] Rozenfeld HD, et al. (2008) Laws of Population Growth. Proc Nat Acad Sci USA
[...] Fractals, Physics of Solids and Liquids. (Plenum Press, New York).
[21] Peng C-K, et al. (1994) Mosaic organization of DNA nucleotides. Phys Rev E
[...] Adv Phys
[...] An Introduction to Econophysics: Correlations and Complexity in Finance. (Cambridge University Press, Cambridge).
[24] Leland WE, Taqqu MS, Willinger W, Wilson DV (1994) On the Self-Similar Nature of Ethernet Traffic (Extended Version). IEEE/ACM Trans Networking
[...] J Neurosci
[...] Proc Nat Acad Sci USA
[...] Biometrika
[...] An informational theory of the statistical structure of language, ed. Jackson, W. (Butterworth, London), pp. 486–504.
[29] Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science
[...] Rev Mod Phys
[...] Europhys Lett
[...] Phys Rev Lett
[...] Phys Rev E
[...] Biophys J
[...] ecords. Phys Rev Lett
[...] Dissecting the Social: On the Principles of Analytical Sociology. (Cambridge University Press, Cambridge).
[37] Kentsis A (2006) Mechanisms and models of human dynamics.

Nature

FIG. 1: A typical example of an individual's message activity. a, Instants at which messages were sent by a member belonging to OC1. b, Cumulative number of messages $m(t)$ (green) and the same but with the messages placed at random (brown). c, Sequence of number of messages sent per day, $\mu(t)$, for the same individual. d, e, f, Color coded sequences $\mu(t)$ for members sending $M$ = 100; 1,000; or 10,000 messages overall, respectively. The color is proportional to the logarithm of the number of messages per day (red: 1 message, blue: 400 messages, white for no message).

FIG. 2: Average and standard deviation of the growth rate versus number of messages. a, Results for OC1. The average growth rate of messages conditional to $m_0$ is almost constant and the standard deviation decays with an exponent $\beta_{OC1} = 0.22 \pm 0.01$. b, Results for OC2. The standard deviation conditional to $m_0$ decays with an exponent $\beta_{OC2} = 0.17 \pm 0.03$. c, Results for OC1, when the messages are shuffled, displaying $\beta_{rnd} = 1/2$. d, Results for OC2, when the messages are shuffled. In all cases $t_0$ corresponds to half of the period of data acquisition and $t_1$ to the end, which we found to provide optimal statistics (see SI Fig. 1).

FIG. 3: Long-term correlations in the message activity of OC1 (a and b) and OC2 (c and d). a, DFA fluctuation functions averaged conditional to $M$, the total number of messages sent by each member (black: 1-2, red: 3-7, green: 8-20, blue: 21-54, orange: 55-148, brown: 149-403, maroon: 404-1096, violet: 1097-2980, turquoise: 2981-8103). The dotted lines serve as guides; the bottom one corresponds to the uncorrelated case, while the top one corresponds to the exponent 0.[...]. b, Fluctuation exponent $H$ measured from panel a on the scales 10 days $\le \Delta t \le$ 63 days as a function of the total number of messages sent, $M$, for real (blue) and individually shuffled (green) records. c, DFA fluctuation functions averaged conditional to $M$ [colors as in (a)]. The dotted lines correspond to the uncorrelated case (bottom) and to the exponent 1 (top). d, Fluctuation exponents obtained from panel c on the scales 32 days $\le \Delta t \le$ 200 days as a function of the total number of messages sent, $M$. Due to weak statistics causing large error bars we do not consider the last two values for $M >$ 500 as reliable. For clarity the fluctuation functions in panels a and c are shifted vertically.

FIG. 4: Mean out-degree growth rate and standard deviation versus initial out-degree. a, Results for OC1. The average growth of out-degree conditional to the out-degree at $t_0$ is almost constant. The standard deviation decays with an exponent $\beta_{k,OC1} = 0.$[...]$ \pm 0.02$. b, Results for OC2. The standard deviation conditional to the out-degree at $t_0$ decays with an exponent $\beta_{k,OC2} = 0.$[...]$ \pm 0.08$. The quantities are analogous to those of Fig. 2 except that here the growth rate of the out-degree $r_k$ is considered instead of the number of messages sent.

SUPPORTING INFORMATION (SI)

Scaling laws of human interaction activity

Diego Rybski, Sergey V. Buldyrev, Shlomo Havlin, Fredrik Liljeros, and Hernán A. Makse

I. NOTATION

1. Member $j$ sends his/her $n$th message at time $t_j(n)$, where $1 \le n \le M_j$ and $M_j$ is the total number of messages sent by $j$ in the time of data acquisition $T$. The sequence of counts, defined as the number of messages in the period $\delta t$, is given by

$$ \mu_j^{\delta t}(t) = \sum_{n,\, t_j(n) \in [t,\, t + \delta t]} a_j(n), \qquad (10) $$

where $a_j(n) = 1$. In addition, the periods are non-overlapping, $t = i\,\delta t$ with integer $i$, and therefore $1 \le t_j(n) \le T$. In the case of daily resolution $\delta t = 1$ day.

2. The cumulative number of messages that a member sends until time $t$ is:

$$ m_j^{\delta t}(t) = \sum_{t'=1}^{t} \mu_j^{\delta t}(t'). \qquad (11) $$

In particular, $m_j(1) = \mu_j(1)$ and $m_j(T) = M_j$.

3. The displacement of the random walk is the cumulative sum of the normalized $\mu_j^{\delta t}(t)$:

$$ Y_j^{\delta t}(t) = \sum_{t'=1}^{t} \left( \mu_j^{\delta t}(t') - \langle \mu_j^{\delta t}(t) \rangle \right), \qquad (12) $$

where $\langle \mu_j^{\delta t}(t) \rangle$ is the average of $\mu_j^{\delta t}(t)$ in time $t$. The root-mean-square displacement after $\Delta t$ is defined as

$$ F_j^{\delta t}(\Delta t) = \sqrt{ \left\langle \left[ Y_j^{\delta t}(t + \Delta t) - Y_j^{\delta t}(t) \right]^2 \right\rangle_t }, \qquad (13) $$

where the average is performed over the time $t$. Additionally, we perform an average over members $j$ with activity level $M$ and define

$$ \left( F^{\delta t}(\Delta t) \right)_M = \left\langle \left( F_j^{\delta t} \right) \big| M \right\rangle_j. \qquad (14) $$

[Figure: panels a (OC1) and b (OC2); horizontal axis: $t_0$ (with $t_1 = T$) in days (a) and weeks (b); vertical axis: number of members with $m_0 > 0$ and $m_1 - m_0 > 0$]

FIG. 5: Optimal times $t_0$ and $t_1$. The panels show for a, OC1, and b, OC2, the number of members with both $m_0 > 0$ and $m_1 - m_0 > 0$. While $t_1$ obviously is optimal at the end of the period, $t_0$ is varied to find the value for which the number of members (with at least one message until $t_0$ and at least one new message between $t_0$ and $t_1$) is maximal.

4. For simplicity, in the main text we skip the index $j$ as well as $\delta t$ and write $\mu(t)$, $m(t)$, $Y(t)$, as well as $F(\Delta t)$.

5. To investigate the growth in the number of messages we use the quantities $r = \ln\frac{m_1}{m_0}$, $\langle r(m_0) \rangle$, $\sigma(m_0)$ and the exponents $\beta_{OC1}$, $\beta_{OC2}$, $\beta_G$, $\beta_{rnd}$.

6. To investigate the growth of the degree we use the quantities $r_k = \ln\frac{k_1}{k_0}$, $\langle r_k(k_0) \rangle$, $\sigma_k(k_0)$ and the exponents $\beta_{k,OC1}$, $\beta_{k,OC2}$.

7. For the growth of the degree in the preferential attachment model we use the quantities $r_{PA} = \ln\frac{k_1}{k_0}$, $\langle r_{PA}(k_0) \rangle$, $\sigma_{PA}(k_0)$ and the exponent $\beta_{PA}$.

II. OPTIMAL TIMES $t_0$ AND $t_1$

Figure 5 displays the optimal times $t_0$ and $t_1$ to calculate the growth rates for OC1 (panel a) and OC2 (panel b).

III. DETAILS ON THE QUANTIFICATION OF LONG-TERM CORRELATIONS USING DETRENDED FLUCTUATION ANALYSIS

Statistical dependencies between the values of a record $\mu(t)$ with $t = 1, \dots, T$ can be characterized by the auto-correlation function

$$C(\Delta t) = \frac{1}{\sigma_\mu^2\,(T - \Delta t)} \sum_{t=1}^{T-\Delta t} \left[\mu(t) - \langle\mu(t)\rangle\right] \left[\mu(t+\Delta t) - \langle\mu(t)\rangle\right], \tag{15}$$

where $T$ is the length of the record $\mu(t)$, $\langle\mu(t)\rangle$ its average, and $\sigma_\mu$ its standard deviation. For uncorrelated values of $\mu(t)$, $C(\Delta t)$ is zero for $\Delta t > 0$, because on average positive and negative products will cancel each other out. In the case of short-term correlations $C(\Delta t)$ has a characteristic decay time $\Delta t_\times$. A prominent example is the exponential decay $C(\Delta t) \sim \exp(-\Delta t/\Delta t_\times)$. Long-term correlations are described by a slower decay, e.g. diverging $\Delta t_\times$, namely a power-law,

$$C(\Delta t) \sim (\Delta t)^{-\gamma}, \tag{16}$$

with the correlation exponent $0 < \gamma < 1$.

Detrended fluctuation analysis (DFA) of a record $\mu(t)$ of length $T$ consists of 5 steps:

1. Calculate the cumulative sum, the so-called profile:

$$Y(t) = \sum_{t'=1}^{t} \left(\mu(t') - \langle\mu(t)\rangle\right). \tag{17}$$

2. Separate the profile $Y(t)$ into $T_{\Delta t} = \mathrm{int}(T/\Delta t)$ segments of length $\Delta t$. Often, the length of the record is not a multiple of $\Delta t$. In order not to disregard information, the segmentation procedure is repeated starting from the end of the record and one obtains $2T_{\Delta t}$ segments.

3. Locally detrend each segment $\nu$ by determining the best polynomial fit $p^{(n)}_\nu(t)$ of order $n$ and subsequently subtracting it from the profile:

$$Y_{\Delta t}(t) = Y(t) - p^{(n)}_\nu(t). \tag{18}$$

4. Calculate for each segment the variance (squared residuals) of the detrended $Y_{\Delta t}(t)$,

$$F^2_{\Delta t}(\nu) = \frac{1}{\Delta t} \sum_{j=1}^{\Delta t} \left( Y_{\Delta t}[(\nu - 1)\Delta t + j] \right)^2, \tag{19}$$

by averaging over all values in the corresponding $\nu$th segment.

5. The DFA fluctuation function is given by the square-root of the average over all segments:

$$F(\Delta t) = \left[ \frac{1}{2T_{\Delta t}} \sum_{\nu=1}^{2T_{\Delta t}} F^2_{\Delta t}(\nu) \right]^{1/2}. \tag{20}$$

The averaging of $F^2_{\Delta t}(\nu)$ is additionally performed over members of similar activity level $M$.

If the record $\mu(t)$ is long-term correlated according to a power-law decaying auto-correlation function, Eq. (16), then $F(\Delta t)$ increases for large scales $\Delta t$ also as a power-law:

$$F(\Delta t) \sim (\Delta t)^{H}, \tag{21}$$

where the fluctuation exponent $H$ is analogous to the well-known Hurst exponent [20]. The exponents are related via

$$H = 1 - \gamma/2, \qquad \gamma = 2 - 2H. \tag{22}$$

When $\gamma = 1$ then $H_{\rm rnd} = 1/2$, that is the case of uncorrelated dynamics. If the correlations decay faster, $\gamma > 1$, one still finds $H_{\rm rnd} = 1/2$. Long-term correlations with $0 < \gamma < 1$ correspond to $1/2 < H < 1$. In practice, one plots $F(\Delta t)$ versus $\Delta t$ in double-logarithmic representation, determines the exponent $H$ on large scales and quantifies the correlation exponent $\gamma$.

The order of the polynomials $p^{(n)}_\nu$ determines the detrending technique, which is named DFA$n$: DFA0 for constant detrend, DFA1 for linear, DFA2 for parabolic, etc. The subtraction of the average in Eq. (17) is only necessary for DFA0. By definition the corresponding fluctuation function is only given for $\Delta t \ge n + 2$. The detrending order determines the capability of detrending. Since the local trends are subtracted from the profile, only trends of order $n - 1$ are eliminated in the original record $\mu(t)$. Throughout the paper we show the results using DFA2, which we found to be sufficient in terms of detrending.

[Figure 6 plot: $\langle r_{\rm PA}(k_0)\rangle$ and $\sigma_{\rm PA}(k_0)$ versus degree $k_0$ for preferential attachment, with a slope label of 1/2]

FIG. 6: Growth properties of the preferential attachment model [29] discussed in the main text. We plot the average (black circles) and standard deviation (blue squares) of the growth rate $r_{\rm PA}$ conditional to $k_0$, the degree of the corresponding nodes at the first stage.

Since the fluctuation functions $F(\Delta t)$ for single users are very noisy, it is useful to average fluctuation functions among various members. Thus, we first group the members in logarithmic bins according to their activity level, the total number of messages $M$ sent. Namely, we group all members that sent 1-2, 3-7, 8-20, ... messages in the period of data acquisition by using bins determined by $b = \mathrm{int}(\ln M)$. Next we average the fluctuation function among all members from each group $b$ and obtain for every activity level of the members one DFA fluctuation function. The error bars in Fig. 3a,c of the main text were obtained by subdividing each group and determining the standard deviations of the fluctuation exponents from different groups of the same activity level.

IV. GROWTH IN THE DEGREE
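As a companion to the growth analysis in this section, the following is a minimal sketch of how the average and the standard deviation of the growth rate, conditional on the initial degree, could be measured in a simulated preferential attachment network. It assumes the standard variant in which every new node brings a single link and attaches to an existing node with probability proportional to its degree; the model of Ref. [29] may differ in detail. The function names, the sizes `t0` and `t1`, the seed, and the base-2 logarithmic binning are hypothetical choices made only for illustration.

```python
import math
import random
from collections import defaultdict


def grow(endpoints, degree, n_new, rng):
    """Add n_new nodes. Each new node attaches one link to an existing
    node chosen with probability proportional to its degree."""
    for _ in range(n_new):
        # Every node appears in `endpoints` once per link end, so a uniform
        # draw from this list is a degree-proportional draw.
        target = rng.choice(endpoints)
        new = len(degree)
        degree.append(1)
        degree[target] += 1
        endpoints.extend((new, target))


def growth_statistics(t0=5000, t1=20000, seed=1):
    """Return {bin: (count, mean r, std r)} with r = ln(k1 / k0),
    conditional on the degree k0 at the first stage (assumed setup)."""
    rng = random.Random(seed)
    degree = [1, 1]        # start from two nodes joined by one link
    endpoints = [0, 1]
    grow(endpoints, degree, t0 - 2, rng)
    k0 = list(degree)      # degrees at the first stage (network size t0)
    grow(endpoints, degree, t1 - t0, rng)

    # Logarithmic bins in k0: 1, 2-3, 4-7, 8-15, ... (illustrative choice)
    bins = defaultdict(list)
    for node, k in enumerate(k0):
        bins[int(math.log2(k))].append(math.log(degree[node] / k))

    stats = {}
    for b, rates in sorted(bins.items()):
        mean = sum(rates) / len(rates)
        var = sum((r - mean) ** 2 for r in rates) / len(rates)
        stats[b] = (len(rates), mean, math.sqrt(var))
    return stats


if __name__ == "__main__":
    for b, (n, mean, std) in growth_statistics().items():
        print(f"k0 in [{2 ** b}, {2 ** (b + 1) - 1}]: n={n}, <r>={mean:.3f}, sigma={std:.3f}")
```

Plotting the mean and the standard deviation per bin against the initial degree on double-logarithmic axes gives a rough counterpart to Fig. 6. Bins with very few nodes are noisy, and the smallest degrees are affected by the discreteness of the degree, so any exponent read off such a run is only indicative.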

Figure 6 shows the results of the average growth rates and fluctuations of the growth rates as a function of the initial degree for the preferential attachment model [29]. We find a constant average growth rate and a standard deviation decreasing as a power law with exponent $\beta_{\rm PA} = 1/2$.

In the preferential attachment (PA) model the degree of a node grows as

$$k(t) \sim \left(\frac{t}{t^{*}}\right)^{b}, \tag{23}$$

where $t^{*}$ is the time when the corresponding node was introduced to the system and $b$ is the dynamics exponent in growing network models ($b = 1/2$ for the standard PA model). The growth rate between two times $t_0 < t_1$ is then independent of the initial degree $k_0$, $r_{\rm PA} = b \ln(t_1/t_0)$, which we also find in Fig. 6.

To obtain $\sigma_{\rm PA}(k_0)$ one can use analogous considerations as for $\sigma(m_0)$ in the main text. Due to Eq. (6) in the main text, here we have

$$r_{\rm PA} \approx \frac{1}{k_0} \sum_{t=1}^{\Delta t} \kappa(t), \tag{24}$$

where $\kappa(t)$ are small increments analogous to $\mu(t)$, whereas Eq. (23) implies

$$\kappa(t) \sim (\Delta t)^{-1/2}. \tag{25}$$

As before, the conditional variance (squared standard deviation) of the growth rate is

$$\left\langle \left[ r_{\rm PA}(k_0) - \langle r_{\rm PA}(k_0) \rangle \right]^2 \right\rangle \approx \frac{1}{k_0^2} \sum_{i}^{\Delta t} \sum_{j}^{\Delta t} \sigma_\kappa^2\, C(j - i). \tag{26}$$

In the uncorrelated case $C(j - i) = \delta_{ij}$, the double sum can be reduced to a single one:

$$\sigma^2(k_0) = \frac{1}{k_0^2} \sum_{i}^{\Delta t} \sigma_\kappa^2(i). \tag{27}$$

As shown below, $\sigma_\kappa^2(i) \sim i^{-1/2}$, and integration leads to

$$\sigma^2(k_0) \sim \frac{1}{k_0^2} \int^{\Delta t} i^{-1/2}\, \mathrm{d}i \tag{28}$$

$$\sim \frac{1}{k_0^2} (\Delta t)^{1/2}. \tag{29}$$

Eliminating $\Delta t$ using $k_0 \sim (\Delta t)^{1/2}$, which follows from Eq. (23) with $b = 1/2$, one obtains

$$\sigma_{\rm PA}(k_0) \sim k_0^{-1/2}. \tag{30}$$

That is, we obtain $\beta_{\rm PA} = 1/2$.

It remains to show that $\sigma_\kappa^2(t) \sim t^{-1/2}$. We assume new links are set according to a Poisson process, where every new link of a node represents an event. The intervals between these events (asymptotically) follow an exponential distribution $p(\tau) = \lambda e^{-\lambda\tau}$. Accordingly, $\kappa(t)$ is a sequence of zeros, with a one only when a new link is set to the corresponding node. The standard deviation of this sequence is

$$\sigma_\kappa \sim \lambda^{1/2}. \tag{31}$$

Due to Eq. (23) the rate parameter decreases like

$$\lambda(t) \sim t^{-1/2}. \tag{32}$$

Accordingly,

$$\sigma_\kappa^2(t) \sim t^{-1/2}. \tag{33}$$

In order to extend the standard PA model, a fitness model has been introduced [31], taking into account the different fitnesses of the nodes in acquiring links and therefore involving a distribution of $b$-exponents. The spread of growth rates $r$ could be related to the distribution of fitness. On the other hand, the growth according to Eq. (23) is superimposed with random fluctuations that we characterize with the exponent ββ