Activity Dynamics in Collaboration Networks
Simon Walk, Denis Helic, Florian Geigl and Markus Strohmaier
IICM - Graz University of Technology; KTI - Graz University of Technology; GESIS - Leibniz Institute for the Social Sciences; University of Koblenz-Landau
September 16, 2018
Abstract
Many online collaboration networks struggle to gain user activity and become self-sustaining due to the ramp-up problem or dwindling activity within the system. Prominent examples include online encyclopedias such as (Semantic) MediaWikis, Question and Answering portals such as StackOverflow, and many others. Only a small fraction of these systems manage to reach self-sustaining activity, a level of activity that prevents the system from reverting to a non-active state. In this paper, we model and analyze activity dynamics in synthetic and empirical collaboration networks. Our approach is based on two opposing and well-studied principles: (i) without incentives, users tend to lose interest in contributing and thus systems become inactive, and (ii) people are susceptible to actions taken by their peers (social or peer influence). With the activity dynamics model that we introduce in this paper we can represent typical situations of such collaboration networks. For example, activity in a collaborative network, without external impulses or investments, will vanish over time, eventually rendering the system inactive. However, by appropriately manipulating the activity dynamics and/or the underlying collaboration networks, we can jump-start a previously inactive system and advance it towards an active state. To be able to do so, we first describe our model and its underlying mechanisms. We then provide illustrative examples of empirical datasets and characterize the barrier that has to be breached by a system before it can become self-sustaining in terms of critical mass and activity dynamics. Additionally, we expand on this empirical illustration and introduce a new metric $p$, the Activity Momentum, to assess the activity robustness of collaboration networks.
One of the major problems faced by both new and existing online social and collaboration networks, such as Facebook or StackOverflow, revolves around efficiently identifying and motivating the appropriate users to contribute new content. In an optimal scenario, this newly contributed content provides enough incentive for other users to contribute, triggering further actions and contributions. Once such a self-reinforced state of increasing activity is reached, we can say that a system becomes self-sustaining, meaning that sufficiently high levels of activity are reached, which will keep the system active without further external impulses. For example, when looking at well-established collaborative websites, such as StackOverflow or Wikipedia, we already know that at some point in time these systems have become self-sustaining (in terms of activity), as is evident in their steadily growing number of supporters and overall activity.

However, these self-sustaining states are neither easy to reach nor guaranteed to last. For example, Suh et al. [81] showed that the growth of Wikipedia is slowing down, indicating a loss in momentum and perhaps even first evidence of a collapse. Moreover, we typically lack the tools to properly analyze these trends in activity dynamics and thus cannot even perform such simple tasks as detecting self-sustaining system states. Therefore, we argue that new tools and techniques are needed to model, monitor and simulate activity dynamics for collaboration networks.

The high-level contributions of this work are two-fold. First, we introduce a model that is capable of simulating activity dynamics for online collaboration networks. Second, we describe in detail how to fit the model to empirical datasets, simulate trends in activity dynamics and interpret our findings.

[Figure 1, panels (a) to (d): Intrinsic Activity (blue) and Peer Influence (yellow), each panel at one point in time.]

Figure 1: Intrinsic Activity and Positive Peer Influence. Activity dynamics in collaboration networks, represented by users as nodes, collaboration as edges and activity as node size (Figure 1(a)), are based on two opposing principles. The Activity Decay Rate postulates the loss of intrinsic activity (blue color of nodes) per user over time. In contrast, the Peer Influence Growth Rate follows the intuition that users in collaboration networks are (positively) influenced by their peers (yellow color of nodes), where more active peers exercise a higher influence than less active peers. We initialize the network at the initial time with random intrinsic activities. Nodes with a green halo at later times represent users that exhibit a gain in their overall activity between two iterations $t_n$ and $t_{n+1}$, as the exercised positive peer influence is higher than the intrinsic loss of activity. Analogously, red halos represent decreases in overall activity. At first, very central (high degree) nodes with smaller activity values manage to increase their overall activity, while very active central nodes already start to lose activity. After further iterations, due to overall decreasing activities and hence decreasing peer influences, all nodes in the collaboration network eventually start to lose activity and inevitably converge towards zero activity.

The proposed model is based on the formalism of continuous deterministic dynamical systems, meaning that activity is modeled by a system of coupled non-linear differential equations. Each user of the system is represented by a single quantity (the current activity), and the social ties between users define the coupling of variables. In general, when using dynamical systems on networks, we define the (micro-)behavior of each user to observe and gather new insights into the (macro-)behavior of the system. For a more detailed introduction to dynamical systems see Section 5 and Newman [65]. For simplicity, we do not take individual differences between users into account: the dynamics and its parameters are the same for each user in the population. This allows us to configure the model with a single parameter, which is a ratio of the following two parameters, representing two basic activity mechanisms (cf.
Figure 1) in online collaboration networks:

(i) Activity Decay Rate $\lambda$, which postulates how fast users lose interest in contributing,
(ii) Peer Influence Growth Rate $\mu$, which postulates to what extent users are influenced by the actions taken by their peers.

A first analysis of the model shows that activity dynamics in collaboration networks have an obvious and natural fixed point, the point of complete inactivity, where all contributions of the users have ceased. However, by slightly manipulating the parameters in our model we show that it is possible to destabilize the fixed point, resulting in a potential increase of activity. We then outline the process of calculating the Activity Decay Rate and Peer Influence Growth Rate for existing collaboration networks, simulate their corresponding activity dynamics and, by interpreting our findings, expand our understanding of critical mass in collaboration networks via the notions of System Mass and Activity Momentum.

The remainder of this paper is structured as follows: In Section 2 we introduce and examine our model analytically. We then continue with the model illustration by simulating activity dynamics for a synthetic dataset and discuss different evolution scenarios of our parameters and their implications. In Section 3 we outline the process of applying our model to empirical datasets. In Section 4 we introduce the notions of System Mass and Activity Momentum. We review related work in Section 5, and summarize our findings and discuss limitations and implications for future work in Section 6.
We model activity dynamics in an online collaboration network as a dynamical system on a network. The nodes of the network represent users of the system, and links represent the fact that the users have collaborated in the past. We represent the network with an $n \times n$ adjacency matrix $A$, where $n$ is the number of nodes (users) in the network. We set $A_{ij} = 1$ if nodes $i$ and $j$ are connected by a link and $A_{ij} = 0$ otherwise. Since collaboration links are undirected, the matrix $A$ is symmetric, thus $A_{ij} = A_{ji}$ for all $i$ and $j$. We denote the total number of links in the network with $m$, and thus we have $2m = \sum_{ij} A_{ij}$.

We model activity as a continuous real-valued variable $a_i$ evolving on node $i$ of the network in continuous time $t$. The general time evolution equation can be written as follows (see also Newman [65]):

$$\frac{da_i}{dt} = \underbrace{f_i(a_i)}_{\text{Intrinsic Activity Evolution of } i} + \overbrace{\sum_j A_{ij} \underbrace{g_i(a_i, a_j)}_{\text{Influence of } j \text{ on } i}}^{\text{Peer Influence}}, \tag{1}$$

where $f_i(a_i)$ specifies the intrinsic activity evolution of node $i$ and $g_i(a_i, a_j)$ describes the influence of neighbor $j$ on node $i$. To simplify, we assume that the intrinsic activity dynamics as well as the influence of node neighbors are the same for each node $i$ and for each neighbor pair $(i, j)$. This means that we have a single intrinsic activity function $f(a_i)$ for all nodes $i$, as well as a single peer influence function $g(a_i, a_j)$ for all node pairs $(i, j)$.

In addition, we make the following assumptions:

Intrinsic Activity Decay.
Without external incentives or without positive influence from their social connections, each user has a tendency to slowly reduce activity. For example, people slowly lose interest in participating in collaborative networks or exhaust their resources. An observation that specifically reflects this inherent exhaustion of activity over time has been made by Danescu-Niculescu-Mizil et al. [28] for different online communities. We model this situation by using a linear function for $f(a_i)$:

$$f(a_i) = -\lambda a_i, \quad \lambda > 0. \tag{2}$$

We call $\lambda$ the Activity Decay Rate: the rate at which users reduce their activity per unit time, given a complete absence of other (positive) incentives. The specific form of $f(a_i)$ results in an exponential decay of activity without any external influence, $a_i(t) = a_i(t_0) e^{-\lambda t}$, with $a_i(t_0)$ being the initial activity of node $i$ at time $t_0$. Thus, without other positive impulses the activity of every user will decay over time (see Figure 2(a)).

[Figure 2, panels: (a) Intrinsic Activity Decay; (b) Extrinsic Peer Influence.]

Figure 2: Intrinsic Activity Decay is the rate at which users reduce their activity per unit time and is represented as a linear function of the form $f(a) = -\lambda a$, which results in an exponential decay in activity that converges towards zero. Extrinsic Positive Peer Influence describes to what extent users are influenced by the actions taken by their peers, and is represented as a monotonically increasing function of a user's activity of the form $g(a) = qa / \sqrt{a_c^2 + a^2}$. It naturally saturates at the Maximum Peer Activity Flow $q$ as activity approaches infinity and, in our simulations, can never be negative by definition (see Equation 3). When the user activity passes the Critical Activity Threshold $a_c$, peer influence gains notable weight and influences neighbors to "do something" (become active).

Positive Peer Influence.
People tend to copy their friends [23; 5; 86], meaning that if neighbors of a node $i$ are active they will positively influence node $i$ to become active as well. The magnitude of the influence, or the "speed" at which the influence is transferred from an active node to its neighbors, will depend on two quantities (cf. Figure 2):

(i) Critical Activity Threshold $a_c$, which represents a soft threshold of activity that marks the point at which users have an activity potential that notably exercises influence on their peers. Note that influence is exercised at all levels of activity, not only above $a_c$. However, once $a_c$ is reached, the influence is deemed "notable" (e.g., a level of activity that is above the average activity per user) for the corresponding peers. Hence, this critical level of activity is a system-dependent quantity. One can imagine that in a system with high user activity (e.g., a large number of changes per user) the critical activity is higher than in a system with lower levels of activity. For example, in the latter case the users will sooner notice a neighbor who became active recently. We model the Critical Activity Threshold as a continuous threshold, meaning that active users will always influence their neighbors, but will exercise more influence after they have passed the critical level of activity.

(ii) Maximum Peer Activity Flow $q$, which represents the maximum activity flow per unit time from users to each of their neighbors. This maximum flow is reached as user activity approaches infinity. However, substantial fractions of the maximum flow are already reached once the user activity passes the critical activity $a_c$.

Thus, to model peer influence, we resort to a monotonically increasing function, where more active neighbors are always more influential than less active ones. Additionally, the function $g(a_j)$ saturates for sufficiently large values of activity, inducing a natural limit on how much users can be influenced by their neighbors. We model this by setting $g(a_i, a_j) = g(a_j)$ and choosing an algebraic sigmoid function:

$$g(a_j) = \frac{q a_j}{\sqrt{a_c^2 + a_j^2}}, \quad q, a_c > 0. \tag{3}$$

Peer influence can also be analyzed in terms of the growth rate of $g(a)$, that is, the derivative $dg/da$. After simplifying and rearranging, the growth rate can be calculated as:

$$\frac{dg}{da} = \frac{q a_c^2}{(a_c^2 + a^2)^{3/2}}. \tag{4}$$

In the limit of large activity $a$ the derivative of $g(a)$ tends towards zero, thus peer influence saturates at $q$. On the other hand, the maximum change in influence is observed when $a = 0$: neighbors who suddenly become active will be noticed most, in terms of activity, by their peers. With $f(a_i)$ and $g(a_j)$ defined, the activity dynamics equation becomes:

$$\frac{da_i}{dt} = -\lambda a_i + \sum_j A_{ij} \frac{q a_j}{\sqrt{a_c^2 + a_j^2}}. \tag{5}$$

The different parameters of the equation have dimensions. For example, $a_i$ and $a_c$ have activity as unit, $t$ has seconds as unit, $\lambda$ is a rate and has inverse seconds as unit, and $q$ has activity per second as unit. Further, the equation has three free parameters, which span a huge parameter space that is difficult to explore in detail.
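The peer influence function and its growth rate (Equations 3 and 4) can be checked numerically. The following is a minimal Python sketch, not the authors' implementation; the function names and the values chosen for $q$ and $a_c$ are illustrative assumptions.

```python
import math

def peer_influence(a, q, a_c):
    """Algebraic sigmoid g(a) = q*a / sqrt(a_c^2 + a^2) (Equation 3)."""
    return q * a / math.sqrt(a_c ** 2 + a ** 2)

def peer_influence_growth(a, q, a_c):
    """Analytic derivative dg/da = q*a_c^2 / (a_c^2 + a^2)^(3/2) (Equation 4)."""
    return q * a_c ** 2 / (a_c ** 2 + a ** 2) ** 1.5

def numeric_derivative(func, a, h=1e-6):
    """Central finite difference, used only to cross-check Equation 4."""
    return (func(a + h) - func(a - h)) / (2 * h)

# Hypothetical parameter values, chosen only for illustration.
q, a_c = 2.0, 4.0

# Saturation: g approaches q for activity far above a_c.
saturated = peer_influence(1e6 * a_c, q, a_c)

# Slope at zero activity equals q / a_c.
slope_at_zero = peer_influence_growth(0.0, q, a_c)

# Analytic and numeric derivatives should agree at an arbitrary point.
a_test = 3.0
analytic = peer_influence_growth(a_test, q, a_c)
numeric = numeric_derivative(lambda a: peer_influence(a, q, a_c), a_test)

print(saturated, slope_at_zero, analytic, numeric)
```

The slope at zero is the quantity that the dimensionless formulation later uses as a rate, which is why it is computed separately here.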
Therefore, our first step is to simplify the equation and express it in a dimensionless form, which typically also has a smaller number of parameters, as only their relative ratios, rather than their absolute values, are of importance. Another advantageous side-effect of a dimensionless formulation is that it eliminates the absolute values of the properties under investigation, in our case user activity, which can be difficult to interpret.

There are many ways to eliminate dimensions from such equations [53]. A useful heuristic is to first eliminate the dimensions from the most non-linear term in the equation, which in our case is $g(a_j)$. Thus, we begin by defining a relative activity $x$ as the ratio between the activity $a$ and the critical activity $a_c$:

$$x = \frac{a}{a_c}. \tag{6}$$

The variable $x$ is now dimensionless and easy to interpret. For example, $x = 5$ means that a user exercises a strong influence on their neighbors, since the level of activity is five times the critical activity $a_c$. In fact, the influence in this case is $g(5a_c) = 5q/\sqrt{26} \approx 0.98q$. On the other hand, if $x \ll 1$, for example $x = 0.1$, the influence is small: $g(0.1a_c) = 0.1q/\sqrt{1.01} \approx 0.1q$.

By rearranging, substituting $a_c x$ for $a$ and simplifying ($a_c$ cancels in the second term), our activity dynamics equation reduces to:

$$a_c \frac{dx_i}{dt} = -\lambda a_c x_i + \sum_j A_{ij} \frac{q x_j}{\sqrt{1 + x_j^2}}. \tag{7}$$

To eliminate the dimensions from the second term we divide both sides by $q$:

$$\frac{a_c}{q} \frac{dx_i}{dt} = -\lambda \frac{a_c}{q} x_i + \sum_j A_{ij} \frac{x_j}{\sqrt{1 + x_j^2}}. \tag{8}$$

The term $q/a_c$ is the growth rate of the function $g(a)$ evaluated at zero:

$$\left. \frac{dg}{da} \right|_{a=0} = \left. \frac{q a_c^2}{(a_c^2 + a^2)^{3/2}} \right|_{a=0} = \frac{q}{a_c}. \tag{9}$$

This quantity gives the rate at which the influence on the peers grows if the user activity experiences a small displacement from the point of zero activity.
Let us now define this quantity as the Peer Influence Growth Rate and denote it with $\mu = q/a_c$, since this will simplify the algebra and make the model interpretation more intuitive. The last equation can then be written as:

$$\frac{1}{\mu} \frac{dx_i}{dt} = -\frac{\lambda}{\mu} x_i + \sum_j A_{ij} \frac{x_j}{\sqrt{1 + x_j^2}}. \tag{10}$$

Finally, we also want to scale time $t$ and express the equation in terms of dimensionless time $\tau$. This last reformulation further simplifies the equation and allows us to interpret and compare activity dynamics over time across various systems, because the time evolution of different systems is expressed relative to each other on the dimensionless time scale $\tau$. Let us make the following substitution:

$$\tau = \mu t. \tag{11}$$

By substituting $\tau$ for $t$ on the left-hand side of Equation 10 we arrive at the dimensionless dynamics equation:

$$\frac{dx_i}{d\tau} = -\frac{\lambda}{\mu} x_i + \sum_j A_{ij} \frac{x_j}{\sqrt{1 + x_j^2}}. \tag{12}$$

Now, there is only one parameter in our dynamics equation, namely the ratio $\lambda/\mu$. This is a dimensionless ratio of two rates: (i) the Activity Decay Rate $\lambda$, which is the rate at which a user loses activity, and (ii) the Peer Influence Growth Rate $\mu$, which is the rate at which a user gains activity due to the influence of a single neighbor.

The ratio between those two rates states how much faster users lose activity due to the decay of intrinsic activity (or interest) than they can gain it through the positive peer influence of a single neighbor. For example, a ratio of $\lambda/\mu = 100$ would mean that users intrinsically lose activity 100 times faster than they can potentially get it back from one of their neighbors. If we set $\lambda/\mu = 1$, users would lose activity as fast as they can regain it from one of their peers.
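The rescaling can be sanity-checked numerically: since $da_i/dt = a_c \, dx_i/dt = a_c \mu \, dx_i/d\tau$, the right-hand side of Equation 5 should equal $a_c \mu$ times the right-hand side of Equation 12 evaluated at $x = a/a_c$. The following Python sketch illustrates this under stated assumptions; the three-node path graph, the parameter values and all function names are hypothetical.

```python
import math

def rhs_dimensional(a, A, lam, q, a_c):
    """Right-hand side of Equation 5: da_i/dt for every node i."""
    n = len(a)
    return [
        -lam * a[i]
        + sum(A[i][j] * q * a[j] / math.sqrt(a_c ** 2 + a[j] ** 2) for j in range(n))
        for i in range(n)
    ]

def rhs_dimensionless(x, A, ratio):
    """Right-hand side of Equation 12: dx_i/dtau, with ratio = lambda/mu."""
    n = len(x)
    return [
        -ratio * x[i]
        + sum(A[i][j] * x[j] / math.sqrt(1.0 + x[j] ** 2) for j in range(n))
        for i in range(n)
    ]

# Hypothetical toy network: undirected path 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

# Hypothetical parameters (activity units and inverse seconds).
lam, q, a_c = 0.3, 2.0, 4.0
mu = q / a_c          # Peer Influence Growth Rate
ratio = lam / mu      # the single remaining model parameter

a = [1.0, 8.0, 0.5]             # activities a_i
x = [a_i / a_c for a_i in a]    # relative activities x_i (Equation 6)

da_dt = rhs_dimensional(a, A, lam, q, a_c)
dx_dtau = rhs_dimensionless(x, A, ratio)

# da/dt = a_c * mu * dx/dtau, so both lists should agree up to rounding.
rescaled = [a_c * mu * v for v in dx_dtau]
max_diff = max(abs(u - v) for u, v in zip(da_dt, rescaled))
print(ratio, max_diff)
```

Three dimensional parameters collapse into the single value `ratio`, which is the only quantity the dimensionless right-hand side needs besides the network.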
For a short description of all parameters of the activity dynamics model see Table 1.

2.2 Linear Stability Analysis

In general, Equation 12 is a coupled set of $n$ ($n$ being the number of nodes or users in the network) non-linear differential equations, for which, in a typical case, no closed-form solution can be found. Therefore, we turn our attention to the properties of so-called fixed points. A fixed point $\mathbf{x}^*$ represents the values $x_i^*$ for which the system does not change in time:

$$\frac{dx_i}{d\tau} = -\frac{\lambda}{\mu} x_i + \sum_j A_{ij} \frac{x_j}{\sqrt{1 + x_j^2}} = 0, \quad \forall i. \tag{13}$$

Suppose that we are able to find a fixed point $\mathbf{x}^*$ by solving Equation 13. One obvious fixed point in our model is $\mathbf{x}^* = \mathbf{0}$, meaning that $x_i^*$ has the same value for every $i$: $x_i^* = x^* = 0$, representing a simple special case: a symmetric fixed point. We can easily check that $x^* = 0$ is indeed a fixed point since $f(x^*) = g(x^*) = 0$, and this also gives $f(x^*) + \sum_j A_{ij} g(x^*) = 0, \forall i$.

Table 1: Model and model parameters. The activity dynamics equation is in a dimensionless form and scales over relative time $\tau$. All properties, as well as the single parameter of the model, are briefly described under Properties and Parameters.

Equation                                                                              Name
$\frac{dx_i}{d\tau} = -\frac{\lambda}{\mu} x_i + \sum_j A_{ij} \frac{x_j}{\sqrt{1 + x_j^2}}$    Activity Dynamics Equation

Properties                                                                            Name
$\lambda$                                                                             Activity Decay Rate
$q$                                                                                   Maximum Peer Activity Flow
$a_c$                                                                                 Critical Activity Threshold
$\mu = q/a_c$                                                                         Peer Influence Growth Rate
$\tau$                                                                                Relative Time Scale

Parameter                                                                             Name
$\lambda/\mu$                                                                         The ratio describing how fast users intrinsically lose activity compared to how fast they get it back from (one of) their neighbors.

We are investigating this specific fixed point, as it also has a particular interpretation in our model. At this fixed point all users have zero activity, which means that they are completely inactive and the system is in an inactive or "dead" state. If the system is in such a state and no external incentives are provided, nothing will ever change and the system will remain inactive indefinitely.

Typically, we are interested in the implications for the system if we provide a small enough impulse to leave such a steady (inactive) state. In our context, the most interesting question is whether the system will move from an inactive state towards a state of lively activity or whether it will just revert to the inactive state. Technically, we are interested in the stability of the fixed point. In particular, we want to know if the fixed point is attracting (meaning that the system's activity in the proximity of the fixed point will be attracted to it) or repelling (meaning that the system's activity close to the fixed point will be pushed away from it).

To answer this question we linearize the functions in the proximity of a fixed point. We represent the value of $x_i$ close to the fixed point with $x_i = x^* + \epsilon_i$, where $\epsilon_i$ is sufficiently small. To simplify the calculations, we concentrate on the case of a symmetric fixed point, such as $\mathbf{x}^* = \mathbf{0}$. Next, we perform a Taylor expansion about the fixed point and linearize by neglecting the terms of second and higher orders. After simplification we obtain (for details see, e.g.,
Newman [65]):

$$\frac{d\epsilon_i}{d\tau} = -\frac{\lambda}{\mu} \epsilon_i + \sum_j A_{ij} \epsilon_j, \tag{14}$$

where $\epsilon_i$ is the displacement of $x_i$ from the fixed point $x^*$.

We can also write Equation 14 in matrix form, which gives:

$$\frac{d\boldsymbol{\epsilon}}{d\tau} = \left(-\frac{\lambda}{\mu} I + A\right) \boldsymbol{\epsilon}, \tag{15}$$

where $I$ is the identity matrix and $A$ is the adjacency matrix.

We can solve the last equation by writing $\boldsymbol{\epsilon}$ as a linear combination of eigenvectors $\mathbf{v}_r$ of the symmetric real matrix $(-(\lambda/\mu) I + A)$:

$$\boldsymbol{\epsilon}(\tau) = \sum_r c_r(\tau) \mathbf{v}_r. \tag{16}$$

Equation 15 then becomes:

$$\sum_r \frac{dc_r}{d\tau} \mathbf{v}_r = \left(-\frac{\lambda}{\mu} I + A\right) \sum_r c_r(\tau) \mathbf{v}_r = \sum_r c_r(\tau) \left(-\frac{\lambda}{\mu} + \kappa_r\right) \mathbf{v}_r, \tag{17}$$

where $\kappa_r$ are the eigenvalues of the graph adjacency matrix $A$. We also used the fact that the matrix $(-(\lambda/\mu) I + A)$ has the same eigenvectors as $A$, but with the eigenvalues $-\lambda/\mu + \kappa_r$.

The solution of the last equation for the coefficients of the linear combination is then:

$$\frac{dc_r}{d\tau} = \left(-\frac{\lambda}{\mu} + \kappa_r\right) c_r(\tau) \implies c_r(\tau) = c_r(t_0) \, e^{\left(-\frac{\lambda}{\mu} + \kappa_r\right) \tau}. \tag{18}$$

Now, the displacement from the fixed point will decay in time towards 0 if the exponents for the coefficients $c_r(\tau)$ are all negative. Thus, we arrive at the master stability equation for the special case of the dynamical system that we defined:

$$-\frac{\lambda}{\mu} + \kappa_r < 0, \quad \forall r. \tag{19}$$

Since the adjacency matrix has both positive and negative eigenvalues, a necessary stability condition is $\lambda/\mu > 0$, which is satisfied by definition. Thus, we can rearrange Equation 19 and obtain the following inequality:

$$\kappa_1 < \frac{\lambda}{\mu}, \tag{20}$$

where $\kappa_1$ is the largest positive eigenvalue of the graph adjacency matrix. Note that this inequality separates the network structure ($\kappa_1$) from the activity dynamics ($\lambda/\mu$).

If this stability condition is satisfied, the fixed point $\mathbf{x}^* = \mathbf{0}$, in which there is no activity at all ("inactive" system), represents a stable fixed point. This also means that small changes in activity only cause the system to momentarily leave the (attracting) fixed point until it becomes inactive again.

For illustration, we initialized Zachary's Karate Club Network (cf. Figures 3(a) and 3(b)) with small random activities and simulated the resulting activity evolution (cf. Figures 3(c) and 3(d)).

[Figure 3, panels: (a) Zachary's Karate Club Network; (b) Adjacency Spectrum; (c) Activity Evolution for $\lambda/\mu > \kappa_1$, activity per node over time; (d) Activity Evolution for $\lambda/\mu < \kappa_1$, activity per node over time.]

Figure 3:
Illustrative example. Top Left (a): Visualization of Zachary's Karate Club. The size and color of a node represent random activity values. Top Right (b): Eigenvalue spectrum of Zachary's Karate Club network. The highest eigenvalue is approximately 6.73. Bottom (c and d): Evolution of activity with random initial activities (averaged over 10 runs). Bottom Left (c): Activity dynamics with parameters satisfying the master stability condition $\kappa_1 < \lambda/\mu$. Each line represents one node; all activities converge to the state of zero activity. Bottom Right (d):
Invalidation of the master stability condition $\kappa_1 < \lambda/\mu$; activity converges towards a new and permanently active fixed point.

In practice, additional system configurations are imaginable. Whenever the ratio is below $\kappa_1$, the system becomes unstable and leaves the inactive state. However, due to the special form of the peer influence function, which saturates for large values of activity, the system will converge towards another stable state of immanent activity (see the ratios in the bottom panel of Figure 4).

[Figure 4, top panel: Changes in Activity over Time; activity over $t$ (in months; Init, 1 to 6). Bottom panel: Changes of Ratio over Time; $\lambda/\mu$ relative to $\kappa_1$ over the timespans of simulation (Init, 0-1, 1-2, 2-3, 3-4, 4-5, 5-6). Both panels are divided into the sections Activity Increase, Activity Variation and Activity Decrease.]

Figure 4: Coupled evolution of activity and $\lambda/\mu$. The top panel depicts the evolution of activity (y-axis) over time (x-axis; in months) for Zachary's Karate Club network with synthetically created (random) activities. The ratios that correspond to the activity evolutions in the top panel are depicted in the bottom panel (same symbol and color), with the y-axis representing the value of the ratio and the x-axis the different timespans. As long as $\lambda/\mu < \kappa_1$ the network converges towards a state of immanent activity, yet decreases in activity are possible (see the Activity Variation sections in the top and bottom panels). If $\lambda/\mu > \kappa_1$ the network converges towards an inactive state.

In the case of $\kappa_1 > \lambda/\mu$, we can think of three different activity evolution scenarios, depending on the current levels of activity present in the network:

1. If the levels of activity are lower than the ones the network converges towards with the new ratio, we will see an increase in activity (e.g., see the Activity Increase section in Figure 4).
2. If the new ratio lets the system converge towards lower levels of activity than currently present, activity will decrease, even though $\kappa_1 > \lambda/\mu$ (e.g., see the Activity Variation and Activity Decrease sections of Figure 4).
3. Lastly, if the levels of activity have already converged towards their fixed point and $\lambda/\mu$ is left unchanged, the system retains the levels of activity from the past (e.g., see the Activity Increase section in Figure 4).

If $\kappa_1 < \lambda/\mu$ holds, the system is stable and activity converges towards the attracting fixed point at zero activity (see the Activity Decrease section in Figure 4).
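The stability condition and the two regimes can be reproduced on a small example. The sketch below is an illustration under stated assumptions rather than the simulation used for Figures 3 and 4: it uses a hypothetical complete graph on four nodes instead of Zachary's Karate Club, estimates the largest adjacency eigenvalue with a shifted power iteration, and integrates the dimensionless dynamics equation with the explicit Euler method. All function names, the step size and the two ratios are assumptions.

```python
import math

def largest_eigenvalue(A, iterations=500):
    """Estimate the largest adjacency eigenvalue kappa_1 by power iteration.

    The iteration runs on A + I, which shifts every eigenvalue by +1 so that
    kappa_1 + 1 dominates in magnitude even for bipartite graphs.
    """
    n = len(A)
    v = [1.0] * n
    for _ in range(iterations):
        w = [v[i] + sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient v^T A v for the unit vector v.
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

def simulate(A, ratio, x0, d_tau=0.01, steps=5000):
    """Integrate the dimensionless dynamics with the explicit Euler method."""
    n = len(A)
    x = list(x0)
    for _ in range(steps):
        influence = [v / math.sqrt(1.0 + v * v) for v in x]
        x = [
            x[i] + d_tau * (-ratio * x[i] + sum(A[i][j] * influence[j] for j in range(n)))
            for i in range(n)
        ]
    return x

# Hypothetical toy network: complete graph on four nodes (kappa_1 = 3).
A = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
kappa_1 = largest_eigenvalue(A)

x0 = [0.01, 0.02, 0.03, 0.04]  # small displacement from the inactive state

stable = simulate(A, ratio=4.0, x0=x0)   # kappa_1 < lambda/mu: decays to zero
active = simulate(A, ratio=1.5, x0=x0)   # kappa_1 > lambda/mu: stays active

print(kappa_1, max(stable), active)
```

For this regular toy graph the active state can also be derived by hand: setting all $x_i$ equal in the fixed-point condition gives $x^* = \sqrt{(\kappa_1 / (\lambda/\mu))^2 - 1}$, which is $\sqrt{3}$ for the ratio 1.5 used above, so the second simulation can be compared against that value.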
Summary of system stability analysis.
In order to permanently leave the stable state of complete inactivity we are interested in making the system unstable. To be able to leave the attracting force of the fixed point at zero activity we have the following two options:

(i) We provide (continuous) external impulses to the system, for example, in the form of incentives for users to increase their activity, pushing the system far away from the fixed point of no activity (and hope that it will be attracted by another fixed point where activity is not zero).
(ii) We compromise the stability condition by manipulating either:
    (a) the network structure (i.e., making $\kappa_1$ larger), or
    (b) the activity dynamics (i.e., making $\lambda/\mu$ smaller).

Structurally, we can manipulate the size of $\kappa_1$ by creating or removing links (and nodes) in our network (for more information on how to manipulate $\kappa_1$ see [65]). Dynamically, $\lambda/\mu$ becomes smaller if either $\lambda$ becomes smaller, meaning that the intrinsic user activity decays at a slower pace, or $\mu$ becomes larger, meaning that people copy their friends more and faster, or both.

2.3 Discussion on Parameter Evolution

At this time, we leave the investigation of the manipulation of the activity dynamics ratio $\lambda/\mu$ as well as the manipulation of the network structure to invalidate the master stability equation open for future work. Nevertheless, before illustrating how our proposed activity dynamics model can be applied to empirical datasets, we discuss potential system evolution scenarios and their implications for activity.
Activity Decay Rate. Technically, if $\lambda$ increases, the ratio $\lambda/\mu$ increases as well, resulting in higher (faster) losses of activity per timespan. Once the system satisfies the master stability equation ($\kappa_1 < \lambda/\mu$) it will inevitably become inactive. To be precise, the larger $\lambda$ is for a stable system, the faster activity will converge towards zero. Essentially, an increase in $\lambda$ represents an increased intrinsic loss of activity for all users (e.g., due to a lack of interest in contributing), while a decrease of $\lambda$ can be interpreted as an increase of interest (more precisely, a slower loss of interest) and thus higher levels of activity.

Evolution scenarios of Activity Decay Rate. We would expect to see an increase in $\lambda$ on websites with low levels of user interaction and activity (i.e., where individual contributions are not valued, as no feedback is provided). On the other hand, websites that engage with their users and provide steady updates (e.g., new content or functionality) will likely see a constant or even decreasing $\lambda$. In general, practitioners can influence $\lambda$ by, for example, providing incentives for users to contribute, such as badges, barnstars, likes, reputation systems, or monetary incentives.

Peer Influence Growth Rate. With increasing values for $\mu$ the ratio $\lambda/\mu$ decreases, resulting either (i) in an overall increase in activity if the system is unstable ($\kappa_1 > \lambda/\mu$), (ii) in prolonged timespans of activity before converging towards inactivity if the system is stable ($\kappa_1 < \lambda/\mu$), or (iii) in an invalidation of the master stability equation if $\lambda/\mu$ reaches a tipping point where $\kappa_1 > \lambda/\mu$.

The evolution of $\mu$ directly corresponds to the evolution of the Maximum Peer Activity Flow and
Critical Activity Threshold.

Maximum Peer Activity Flow. The parameter $q$ defines the maximum amount of activity (peer influence) that can traverse along the edges of the collaboration network per unit time. If this parameter increases, $\mu = q/a_c$ will increase as well, resulting in an overall increase in activity. In contrast, reducing the value of $q$ results in overall decreasing levels of activity.

[Figure 5, panels: (a) Evolution of Critical Activity Threshold; (b) Coupled Evolution of Parameters. Each panel shows the Evolution of Parameters (parameter value over time; legend entries include $a_c$, $\lambda/\mu$, $q$ and $\kappa$) and the Evolution of Activity (activity over time).]

Figure 5: Parameter Evolution Scenarios. In a system with (at first) increasing overall levels of activity and fixed values for $q$ and $\lambda$ for all users, we expect $a_c$ to slowly increase (see (a)), as individual contributions become indistinguishable in a flood of newly added content (activity). As a consequence, more posts and replies are required from all users to exercise the same amount of peer influence, represented by increasing values for $a_c$ over time. After a certain point in time, $a_c$ will reach a threshold and activity will start to decrease unless administrators intervene. In a more realistic scenario (see (b)), again with increasing levels of overall activity, users will, in addition to increasing values of $a_c$, start to lose interest in contributing to the system, represented by increasing values for $\lambda$. As a consequence, activity will decrease at a faster pace.

Evolution scenarios of Maximum Peer Activity Flow. In real-world systems, $q$ is best interpreted as a proxy for the efficiency of the user interface, describing how well information (or influence) is transported (e.g., highlighted or visualized) across users. For example, practitioners can influence the Maximum Peer Activity Flow by adding recommendations for users to collaborate with or by optimizing the presentation of newly added/edited content. Note that with increasing numbers of users and levels of activity it becomes increasingly difficult for practitioners to keep $q$ at its current level, let alone positively influence the parameter, due to the vast amount of content and/or activity present in the system.

Critical Activity Threshold. The parameter $a_c$ represents a soft threshold, which defines when users start to "effectively notice" the actions of their peers and are, as a consequence, "notably" influenced by them (see Figure 2(b)). The larger $a_c$, the more actions (i.e., posts or replies) are required by users to positively influence their peers to copy their actions and increase their activity levels (see Figure 5).
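The effect of a growing Critical Activity Threshold on stability follows directly from $\mu = q/a_c$ and the stability condition of Equation 20: the inactive state becomes attracting once $\lambda a_c / q$ exceeds the largest adjacency eigenvalue, that is, once $a_c$ is larger than that eigenvalue times $q/\lambda$. The following Python sketch spells out this arithmetic; the parameter values and function names are purely hypothetical.

```python
def activity_ratio(lam, q, a_c):
    """Ratio lambda/mu with mu = q / a_c, i.e. lambda * a_c / q."""
    return lam * a_c / q

def critical_threshold_limit(kappa_1, lam, q):
    """Largest a_c for which kappa_1 >= lambda/mu still holds."""
    return kappa_1 * q / lam

# Hypothetical values: largest eigenvalue, decay rate, maximum flow.
kappa_1, lam, q = 3.0, 0.5, 2.0

limit = critical_threshold_limit(kappa_1, lam, q)

# Scan growing thresholds and record whether the inactive state is stable.
report = []
for a_c in [4.0, 8.0, 12.0, 16.0]:
    ratio = activity_ratio(lam, q, a_c)
    report.append((a_c, ratio, kappa_1 < ratio))

for a_c, ratio, inactive_is_stable in report:
    print(a_c, ratio, inactive_is_stable)
```

With fixed $q$ and $\lambda$, only the last threshold in the scan lies above the limit, which mirrors the scenario in which a steadily growing $a_c$ eventually pushes the system back towards inactivity.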
Evolution scenarios of Critical Activity Threshold. In practice, we would expect to see an increasing a_c with an increasing number of active users and levels of activity. For example, in a system with low activity and a small number of users, each action by a particular user will be noticed immediately by all others, meaning that the level of a_c is low. However, with increasing numbers of users and an increase in activity, users have to increase their number of posts and replies to be noticed by their peers. Hence, the more active users are present in a system, the harder it becomes for users to notice each contribution of their peers individually. In the worst case, users are confronted with an activity overload that might even result in decreasing levels of (positive) peer influence. In particular, an initial increase in activity likely leads to an increase in a_c, which in turn decreases activity in the system. Thus, the evolution of a_c represents a negative feedback loop in the system. In contrast to q, which serves as a proxy for the user interface, a_c represents an intrinsic parameter of the users of a system. Administrators of such networks and websites can influence a_c by either influencing q (e.g., by adjusting the user interface to better promote each individual action taken by the peers of a user) or by actively avoiding and counteracting the activity overflow by filtering and reducing the amount of new content that is displayed at once. For example, the mechanisms of how Facebook displays posts in its “News Feed” can be seen as a measure to filter and limit newly added content, actively avoiding information or activity overloads while maximizing the (peer) influence of each individual contribution.

Summary of evolution scenarios.
If activity increases over time and no adaptations to the system are implemented, activity will inevitably decrease due to a larger Critical Activity Threshold (see Figure 5). To counteract this development, website administrators could either try to manipulate the Activity Decay Rate, an intrinsic property that varies per user, or optimize the user interface and thus manipulate the Maximum Peer Activity Flow.

We are now interested in modeling and simulating activity dynamics for empirical datasets. In particular, we investigate activity dynamics for an array of different websites, consisting of instances of the StackExchange network as well as multiple Semantic MediaWikis. First, we characterize the investigated datasets and outline our methods for the empirical estimation of the required parameters (see Table 1). We then fit our model to the collaboration networks and present the results of the activity dynamics simulation.

[Figure 6 panels (x-axes: Degree; y-axes: Occurrences): (a) History StackExchange, (b) Bitcoin StackExchange, (c) English Language & Use StackExchange, (d) Mathematics StackExchange, (e) Beachapedia Wiki, (f) Nobbz Wiki, (g) NeuroLex Wiki, (h) 15Mpedia Wiki]

Figure 6: Degree Distribution of Empirical Collaboration Networks. Visualization of the degree distribution of all investigated collaboration networks. The top row (a to d) depicts the different StackExchange collaboration networks, while the bottom row (e to h) shows the collaboration network visualizations for the different Semantic MediaWiki instances. The majority of users, across all collaboration networks, exhibit between 0 and 10 collaboration edges.

.1 Datasets

We selected a total of four differently sized instances from the StackExchange network as well as four different Semantic MediaWiki instances to model activity dynamics. In particular, we concentrate our efforts on the History StackExchange (HSE), which is the smallest of the StackExchange datasets and allows users to discuss topics and questions related to history and historical events. The Bitcoin StackExchange (BSE) and the English Language & Usage StackExchange (ESE) represent two medium-sized websites and are platforms for asking and discussing questions related to the mining, buying and selling of bitcoins and to the English language, respectively. On the Mathematics StackExchange (MATHSE) website, which also represents our largest dataset, users can ask and discuss mathematics-related questions and topics.

We further investigate activity dynamics for the Beachapedia Wiki (BP), which represents the smallest dataset in our activity dynamics analysis and strives to create a structured knowledge base for a variety of topics on beaches in the United States.
The medium-sized German Nobbz Wiki (NZ) provides a structured knowledge base and discussion platform for the online game “Die Verdammten”. The second largest dataset, the NeuroLex Wiki (NLX), represents a large and semantically enriched lexicon of terms and topics related to neuroscience. Our largest dataset is the 15Mpedia Wiki (15MW), a Spanish Semantic MediaWiki instance that discusses a wide variety of topics related to Spain and its different areas and regions.

In general, the investigated datasets are very diverse in their characteristics. For example, the number of active users ranges from 35,476 in MATHSE to a total of 16 in BP. For the analyses conducted in this paper we focus on the last 52 weeks of each dataset. For more detailed information see Table 2.

The degree distributions of all collaboration networks are highly heterogeneous (cf. Figure 6). For all investigated datasets, the majority of

http://history.stackexchange.com http://bitcoin.stackexchange.com http://english.stackexchange.com http://mathematics.stackexchange.com http://nobbz.de/wiki http://neurolex.org/ http://wiki.15m.cc/wiki/Portada

λ/µ >

To estimate λ/µ for (preprocessed) empirical datasets we resort to an output-error estimation method. First, we formulate the estimation of the model parameter as an optimization problem. As objective function we use a well-known least-squares cost function. Second, we solve the optimization problem numerically, using the method of gradient descent in combination with Newton’s method to speed up the calculations. Finally (as a proof of concept), we evaluate the accuracy of the ratio estimate by calculating prediction errors on unseen data. Next, we describe these estimation steps in more detail.
Preprocessing. First, we aggregate all activities per user per day and

Figure 7: Collaboration Network Construction. This plot depicts the different elements of the StackExchange and Semantic MediaWiki datasets that have been classified as posts and replies (cf. Table 2), as well as the edges that have been drawn between certain entities and change-actions and that represent collaboration in our collaboration networks.

Table 2: Dataset statistics. Note that all datasets differ in the number of users, collaboration edges and activity. Users refers to the number of unique users that have contributed more than one post or reply to the corresponding datasets within our observation periods. Posts represent newly created questions in the case of the StackExchange network and newly created articles in the case of the Semantic MediaWiki datasets. Replies are either comments or answers for all StackExchange datasets and edits of existing articles for Semantic MediaWikis. κ denotes the largest eigenvalue of the corresponding collaboration network. For our experiments we limited our observation periods to the last 52 + 3 weeks of each dataset.

Dataset    HSE      BSE      ESE       MATHSE    BP      NZ      NLX      15MW
Users      682      1,299    7,893     35,476    16      36      112      394
Edges      5,179    5,528    83,457    477,133   38      125     383      772
κ          ….33     43.88    162.04    303.58    6.71    11.46   18.…     ….…
Activity   …,496    12,295   151,028   986,996   2,718   603     33,792   102,…

Formulating estimation as an optimization problem.
Depending on the particular application of the model, we may need to introduce a suitable objective function. For example, we may be interested in applying our model to analyze and simulate the aggregated levels of activity in a system. In other words, we are interested in the overall activity level in a system rather than in the particular activity distribution over the users (see below for another example involving user activity levels). Hence, we formulate the objective function (see Equation 21) as a least-squares cost function, which calculates the error of the sum of activity over multiple data points over a certain period of time T:

\[
J\left(\frac{\lambda}{\mu}\right) = \frac{1}{T} \sum_{k=0}^{T-1} \left[ \sum_{i}^{n} x_i(k+1) - \sum_{i}^{n} \hat{x}_i(k+1) \right]^{2}, \tag{21}
\]

where x_i(k) is the empirically observed activity of user i at time k, x̂_i(k) is the estimated activity for user i at time k, and n is the total number of users as before.

To calculate the estimates x̂_i(k) we numerically integrate the differential equations of our model by applying Euler’s method. Thus, we approximate the time evolution of x̂_i between all time steps k and k + 1 (for each of these steps we set the total time to τ) by iterating:

\[
\hat{x}_{i,t+1}(k) = \hat{x}_{i,t}(k) + \Delta\tau \left[ -\widehat{\frac{\lambda}{\mu}}\, \hat{x}_{i,t}(k) + \sum_{j} A_{ij} \frac{\hat{x}_{j,t}(k)}{\sqrt{1 + \hat{x}_{j,t}^{2}(k)}} \right], \tag{22}
\]

where we set x̂_{i,t=0}(k) = x̂_i(k), ∀ i, k and use the current estimate for λ/µ to perform calculations. The final equation for x̂_i(k + 1) becomes:

\[
\hat{x}_i(k+1) = \hat{x}_i(k) + \Delta\tau \sum_{t=0}^{t=\tau} \left[ -\widehat{\frac{\lambda}{\mu}}\, \hat{x}_{i,t}(k) + \sum_{j} A_{ij} \frac{\hat{x}_{j,t}(k)}{\sqrt{1 + \hat{x}_{j,t}^{2}(k)}} \right]. \tag{23}
\]

The local approximation error of Euler’s method is of the order O(Δτ²) and the global error is of the order O(Δτ). To perform the integration between steps k and k + 1 we need to iterate for τ/Δτ steps, where Δτ needs to be chosen with care.
In general, if we set Δτ too high (meaning that the calculations are less computationally intensive, as we have to run a smaller number of iterations), the accuracy of our simulation (including the estimation of the ratio) will decline, as the potential error per iteration due to our approximations becomes larger. This error can become so large that it leads to numerical instability, meaning that the overall activity in a system can become negative, which might cause activity to diverge towards ±∞. With certain combinations of the network structure, Δτ and the calculated ratios, activity can become negative without diverging, oscillating around the fixed point of zero activity until convergence. In contrast, if we set Δτ too low we end up with a very precise simulation, although the time necessary to compute the simulation will be much higher, as a much larger number of iterations has to be executed.

Numerical solution of the optimization problem.
We solve the optimization problem numerically using the method of gradient descent. The first derivative of the objective function (Equation 21) defines the update rule or gradient, which determines whether and to what extent we have to increase or decrease λ/µ to minimize the error of the sum of activities over all data points during T.

Once we have calculated the first derivative with the current values of the estimated activities, we update the ratio using the derivative multiplied by the learning rate η. Thus, the complete procedure is as follows. First, we initialize our estimation by using κ for the first simulation. Second, we estimate the activities and calculate the gradient with these estimates. Third, we calculate the error between our simulated and empirical values, and adapt the ratio according to the corresponding update function and step size η. Fourth, we repeat this process until the calculated update for the ratio is smaller than a given convergence criterion (e.g., 10^−…) or until we reach a total of 20,000 iterations without reaching convergence. Additionally, we have also implemented Newton’s method, which in our cases substantially reduces the computation time. In all our experiments we set T to four weeks, meaning that we optimize the objective function by calculating the optimal ratio over a span of four data points (weeks).

[Figure 8 panels: (a) Increasing Activity, (b) Decreasing Activity, (c) Variable Activity. Legend: Simulated Activity, Synthetic Activity, Ratios. x-axes: t (in weeks); left y-axes: Activity; right y-axes: Ratio]

Figure 8: Illustrations with Synthetic Data. The plots depict the results of the activity dynamics simulations for Zachary’s Karate Club network with synthetic activity values (left y-axes) and the corresponding ratios (right y-axes). The black solid lines with x markers represent the simulated activity over t (in weeks; x-axes). The solid gray lines with circles represent synthetic activities; the gray dotted lines with diamonds represent the ratios corresponding to the simulated activities. With increasing and decreasing activities, the ratios become smaller (see (a)) and larger (see (b)), respectively. When setting activity randomly (see (c)) the ratio adjusts analogously.

Evaluation of the parameter estimates.
We evaluate the accuracy of the estimated parameters by cross-validation (leave-one-out method). In particular, we use the ratios estimated over 4 weeks to simulate activity for the succeeding week. For example, we calculate the optimal λ/µ (according to our objective function) for weeks 1–4 and predict activity for week 5. Next, we use the empirical data of weeks 2–5 to calculate the ratio to predict activity for week 6. Hence, we calculate a total of 52 ratios to simulate activity for a total of 52 weeks.

As depicted in Figure 8, we have created three synthetic scenarios to test and illustrate the mechanisms of the Activity Dynamics Model. First, we estimate λ/µ (right y-axes; gray dotted lines with diamonds) for the three scenarios with synthetically created increasing, decreasing and variable or random activities (left y-axes; gray solid lines with circles) over 10 + 3 weeks (x-axes). In all three scenarios we use Zachary’s Karate Club as the underlying collaboration network. Due to our parameter estimation process, the simulated levels of activity (left y-axes; black solid lines with x markers) exhibit a small lag when activity steadily moves in one direction (i.e., increases or decreases). On the other hand, small fluctuations (see weeks 6–9 in Figure 8(c)) are mitigated. The ratios (right y-axes), which correspond to the simulated levels of activity in the same week, are depicted as well.

Discussion on parameter estimation method.
To validate the correctness of our implementation of the method of least squares, we have simulated activity for datasets with a preset ratio (and random weights for initialization) for 3 weeks. We then used the random activity initialization values, as well as the activity values for each of the 3 weeks, as input for the calculation of the ratio with the method of least squares. Using this approach, we were able to estimate previously set ratios with negligibly small errors. When adding noise to the simulated activity values, the obtained ratios were correspondingly less accurate.

Note that the estimation and validation method that we apply is only one of many possible methods. In this paper, we want to illustrate the general applicability of our method as well as its potential to provide new insights into the intricate dynamics of activity in online collaboration networks. We measure the accuracy of the prediction only as a general proof of concept of our model and leave further investigations of the predictive power of our method open for future work. Following up on this notion, we now briefly discuss some alternative approaches for formulating the objective function and their implications.
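The estimation and validation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the saturating peer-influence term x_j / sqrt(1 + x_j^2) is assumed from the model definition (cf. Equation 22), all function names are hypothetical, and a derivative-free golden-section search stands in for gradient descent and Newton's method because it needs neither an analytic gradient nor a learning rate.

```python
import math

def euler_step(x, adj, ratio, dt):
    """One Euler step of dx_i/dt = -ratio * x_i + sum_j A_ij * x_j / sqrt(1 + x_j^2).

    The form of the peer-influence term is an assumption of this sketch.
    """
    influence = [xj / math.sqrt(1.0 + xj * xj) for xj in x]
    return [
        xi + dt * (-ratio * xi + sum(a * f for a, f in zip(row, influence)))
        for xi, row in zip(x, adj)
    ]

def simulate(x0, adj, ratio, steps, dt=0.01, tau=1.0):
    """Return [x(0), ..., x(steps)], using tau / dt Euler steps per data point."""
    trajectory = [list(x0)]
    x = list(x0)
    for _ in range(steps):
        for _ in range(round(tau / dt)):
            x = euler_step(x, adj, ratio, dt)
        trajectory.append(x)
    return trajectory

def cost(ratio, observed, adj, dt=0.01):
    """Mean squared error of the aggregated activity (cf. Equation 21)."""
    steps = len(observed) - 1
    simulated = simulate(observed[0], adj, ratio, steps, dt)
    return sum(
        (sum(obs) - sum(sim)) ** 2
        for obs, sim in zip(observed[1:], simulated[1:])
    ) / steps

def estimate_ratio(observed, adj, lo=0.0, hi=10.0, tol=1e-6):
    """Minimize cost() over [lo, hi] with a golden-section search.

    Stand-in for gradient descent; the search interval is an assumption.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = cost(c, observed, adj), cost(d, observed, adj)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = cost(c, observed, adj)
        else:
            # The minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = cost(d, observed, adj)
    return (a + b) / 2.0

# Validation in the spirit of the text: simulate 3 weeks with a preset ratio,
# then recover that ratio from the simulated activity values.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # toy network (assumption)
weeks = simulate([1.0, 2.0, 0.5], triangle, ratio=1.5, steps=3)
print(estimate_ratio(weeks, triangle))
```

The step size `dt` plays the role of Δτ: larger values shorten the inner loop of `simulate` but increase the integration error, which is the trade-off discussed for Euler's method.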
Alternative objective functions.
To demonstrate the versatility of our model, if we are interested in answering questions about the distribution of the activities over users, we may change the formulation of the objective function to calculate ratios that minimize the error of activity per user and per data point (see Equation 24). Note that when optimizing towards aggregated levels of activity, we obtain ratios that characterize the systems. In contrast, with the adapted objective function, we are interested in learning more about the users of such systems. The alternative objective function may be defined as follows:

\[
J\left(\frac{\lambda}{\mu}\right) = \frac{1}{T} \sum_{k=0}^{T-1} \left[ x(k+1) - \hat{x}(k+1) \right]^{2}, \tag{24}
\]

where x and x̂ are now n-dimensional vectors storing the activities of all n users. Thus, this objective function represents the sum of squared errors calculated for each of the n users of the corresponding systems over a total of T data points.

We have estimated λ/µ and simulated activity for HSE using this objective function. In contrast to the aggregated levels of activity, we obtain a more accurate distribution of activities across all users, as was intended. However, each of the 4 data points in T now corresponds to a vector of n users, as opposed to a single value (the aggregated activities), resulting in either much higher computation times, a larger error for the prediction tasks, or both.

Additionally, to tackle the prediction problem and to avoid overfitting we may introduce a regularization term to the objective function. For example, we might be interested in keeping the ratio, or the difference between the ratio and κ, small. In the latter case we would add a term such as γ(κ − λ/µ) to our objective function, where γ represents the strength of regularization.

We leave a detailed analysis and comparison of different objective functions open for future work. The ratios calculated to minimize the error for aggregated activity levels exhibit higher accuracy in our simulations (in terms of overall activity per month).
The trade-off for a more accurate distribution of activities over users with the changed objective function is a less accurate simulation of overall activity, as not only the aggregated activity levels are considered but also the vector of activities of all users in our datasets over multiple points in time. However, these ratios provide a better overall correlation between simulated and empirical activities per contributor of our system.

After calculating λ/µ and setting Δτ we simulate activity in our collaboration networks. Due to our chosen approximations, the main goal of the presented illustration is not to predict activity in collaboration networks. Rather, we are interested in demonstrating that our assumptions regarding the Activity Decay Rate and the Peer Influence Growth Rate hold and allow us to simulate trends in activity dynamics for given and real values. Further, by modeling and simulating activity dynamics for empirical datasets we not only deepen our understanding of the model but also, depending on the values of the parameters, potentially obtain new insights into the systems under investigation.

[Figure 9 panels (x-axes: t (in weeks); y-axes: Activity; legend: Simulated Activity, Empirical Activity): (a) HistoryStackExchange Activity, (b) BitcoinStackExchange Activity, (c) EnglishStackExchange Activity, (d) MathematicsStackExchange Activity, (e) Beachapedia Activity, (f) NOBBZ Activity, (g) NeuroLex Activity, (h) 15MW Activity]

Figure 9: Results for the activity dynamics simulation. The plot depicts the results of our activity dynamics simulation for the StackExchange datasets (top row) and Semantic MediaWiki instances (bottom row). The solid gray lines with circles represent the empirical (observed) activity over t (in weeks; x-axes), while the solid black lines represent the simulated activity dynamics (y-axes). In all of our analyzed datasets, the simulated activity dynamics exhibit a notable resemblance to the empirical activity.

Figure 9 depicts the results of the activity dynamics simulation. The root mean-squared errors (RMSEs) of the simulations are listed in Table 3.

Overall, the results gathered from the activity dynamics simulation exhibit a notable resemblance to the real activities of the corresponding datasets. Due to the chosen approximations and simplifications when estimating λ/µ for our model (i.e., static network structure and average model parameters over weeks and users), the simulated activity is naturally limited in its accuracy. These limitations are particularly visible whenever there are large and sudden increases of activity in the collaboration networks. Note that λ/µ will only be higher than κ if activity in our datasets is either zero or the relative difference in activity between two months is extremely high, which is never the case for our smoothed empirical datasets.

Further, the assumption of a fixed network structure of our investigated collaboration networks also (negatively) influences the obtained results of our simulation. For example, it is possible for our simulation to yield higher increases in activity (e.g., Figure 9(b)), as users might be influenced by peers who would join the collaboration network only at a later point in time.
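The evaluation protocol behind these results (fit on four consecutive weeks, predict the following week, then compare simulated and empirical activity) can be sketched as follows. This is an illustration only: the helper names are hypothetical, the error is computed on aggregated weekly activity rather than per user, and `predict_next` is a placeholder for the model simulation, here a naive last-value forecast that merely keeps the example self-contained.

```python
import math

def rolling_windows(n_points, window=4):
    """Yield (training indices, target index): fit on `window` points, predict the next."""
    for start in range(n_points - window):
        yield list(range(start, start + window)), start + window

def rmse(observed, predicted):
    """Root mean-squared error between two equally long, non-empty sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def predict_next(training_values):
    """Placeholder predictor (assumption): repeat the last observed value."""
    return training_values[-1]

def evaluate(weekly_activity, window=4):
    """Roll the window over the series and return the RMSE of the one-step predictions."""
    targets, predictions = [], []
    for train_idx, target_idx in rolling_windows(len(weekly_activity), window):
        predictions.append(predict_next([weekly_activity[i] for i in train_idx]))
        targets.append(weekly_activity[target_idx])
    return rmse(targets, predictions)

# Hypothetical weekly activity counts, used only to exercise the helpers.
print(evaluate([120, 130, 125, 140, 150, 145, 160]))
```

Replacing `predict_next` with a function that estimates λ/µ on the training weeks and simulates one further week would turn this into the procedure described in the text.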
We can further analyze the obtained ratios and parameters of our activity dynamics simulation to broaden our understanding of the collaboration networks under investigation. Figure 10 depicts the value of the calculated ratios λ/µ (y-axis) for each week (x-axis). If the ratio is higher than κ (denoted in the title of each figure), our master stability equation holds and the system converges towards zero activity (over time). The amount of activity that is lost per iteration, and hence the speed of activity loss, is proportional to the value of the ratio and the activity already present in the network. In general, a higher ratio results in a higher and faster loss of activity.

Table 3: RMSE. The table depicts root mean-squared errors (RMSE) of our activity dynamics simulation per user and week for all datasets. Our simulation yields a small RMSE for all StackExchange datasets. RMSE for the Semantic MediaWiki datasets is slightly higher, which is likely due to the lower number of active users (listed in the Users row).

Dataset    HSE      BSE      ESE       MATHSE    BP      NZ      NLX      15MW
Activity   …,496    12,295   151,028   986,996   2,718   603     33,792   102,…
Users      682      1,299    7,893     35,476    16      36      112      394
RMSE       ….076    0.031    0.029     0.030     1.755   0.274   4.397    4.…

[Figure 10 panels (x-axes: t (in weeks); y-axes: Ratio (λ/µ); each panel title states Δτ and κ): (a) HistoryStackExchange Ratios, (b) BitcoinStackExchange Ratios, (c) EnglishStackExchange Ratios, (d) MathematicsStackExchange Ratios, (e) Beachapedia Ratios, (f) NOBBZ Ratios, (g) NeuroLex Ratios, (h) 15MW Ratios]
Figure 10: Evolution of ratios λ/µ. The evolution of the ratios λ/µ (y-axes) over τ (in weeks; x-axes) for the StackExchange datasets (top row) and for the Semantic MediaWiki instances (bottom row). The smaller the ratio, the higher the levels of activity in Figure 9. Small variances in λ/µ over time indicate that activities of the systems are less influenced by the activity of single individuals than they are by peer influence.

If the ratio is smaller than κ, the master stability equation has been invalidated and the system will converge towards a new fixed point of immanent activity (cf. Section 2.2). If this is the case, we can observe one of three potential behaviors, depending on the amount of activity already present in the network and the current ratio:

(i) An increase in activity if the new fixed point, corresponding to the new ratio, is of higher overall activity than the activity already present in the collaboration network (see τ = 20–30 in Figures 9(d) and 10(d)). This situation emerges whenever we invalidate the master stability equation from a previously stable fixed point or if the system is already stable in a situation when the new ratio is smaller than the last estimated ratio.
(ii) A decrease in activity if the new fixed point is of lower overall activity than the activity already present in the collaboration network (see τ = …).
(iii) No change in activity if the new fixed point corresponding to the new ratio is of the same overall activity as the activity already present in the collaboration network (see τ = …–30 in Figures 9(b) and 10(b)).
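The case distinction above hinges on comparing each weekly ratio with the largest eigenvalue κ of the collaboration network. The following is a minimal sketch, assuming an undirected, connected network given as a symmetric adjacency matrix; the function names and returned labels are hypothetical, the handling of the boundary case λ/µ = κ is an arbitrary choice of this sketch, and power iteration is only one of several ways to obtain κ.

```python
import math

def largest_eigenvalue(adj, max_iterations=1000, tol=1e-12):
    """Estimate the largest eigenvalue of a symmetric adjacency matrix.

    Power iteration is run on A + I; the shift keeps the iteration from
    oscillating on bipartite graphs and leaves the eigenvectors unchanged.
    """
    n = len(adj)
    if n == 0:
        raise ValueError("adjacency matrix must not be empty")
    v = [1.0] * n
    estimate = 0.0
    for _ in range(max_iterations):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
        # Rayleigh quotient v^T A v of the unshifted matrix for the unit vector v.
        new_estimate = sum(
            v[i] * sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)
        )
        if abs(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate

def regime(ratio, kappa):
    """Classify a ratio lambda/mu relative to kappa (equality is treated as non-decaying)."""
    if ratio > kappa:
        return "decays towards zero activity"
    return "converges towards a fixed point of non-zero activity"

# Toy example: a triangle of three mutually collaborating users (assumption).
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
kappa = largest_eigenvalue(triangle)
print(kappa, regime(1.2, kappa), regime(3.0, kappa))
```

Whether activity then increases, decreases or stays level additionally depends on how the current activity compares with the new fixed point, which this sketch does not compute.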
System Mass.
We can now use the obtained ratios to characterize the collaboration networks and quantify their robustness in terms of their activity dynamics. Robust systems are systems with lively and high levels of activity, which are able to keep that activity even in the case of small unfavorable changes in the dynamical parameters. Less robust systems are systems that lose their activity very quickly as a consequence of even small changes in the ratio. Thus, we calculate the standard deviation σ_{λ/µ} over all ratios over time and normalize it over κ, to account for the size of the collaboration networks, and refer to it as ρ, the normalized standard deviation of the ratio λ/µ (see Equation 25):

\[
\rho = \frac{\sigma_{\lambda/\mu}}{\kappa} \tag{25}
\]

The normalized standard deviation is a measure of system sensitivity and its inverse (1/ρ) represents a measure of system stability or inertia to changes in activity. Analogously to mass in classical mechanics, which defines the inertia or resistance of an object to being accelerated or decelerated by a given force, we call the quantity 1/ρ the System Mass. We denote this quantity by m_s, with the subscript s to distinguish it from the number of links m in a collaboration network (see Table 4). In systems with a large System Mass it is more difficult to induce changes in activity. In particular, this means that it is more difficult to reduce activity in a consistently active system (due to the small standard deviations of λ/µ), just as it is difficult to jump-start the same system if activity levels were consistently low in the past (again, due to small standard deviations of λ/µ).

Activity Momentum.
After calculating the
System Mass m s , we are nowinterested (again analogously to classical mechanics) in calculating the Ac-tivity Momentum p for our collaboration networks (see Equation 26). p = m s a (26)For activity we take (i) the average activity (posts and replies) per week and(ii) the activity in the last month of our observation periods (cf. Table 4)and calculate (i) the average and (ii) the current momentum.The higher the Activity Momentum of a collaboration network, the moreforce is needed to “stop” (make it inactive) the system. Hence, the higher themomentum, the more robust a given network. In particular, if a (sufficiently)small number of users would suddenly stop contributing to a collaborationnetwork that exhibits a very large
Activity Momentum p , activity in theoverall network would be minimally influenced. On the other hand, if thesame number of users would stop contributing to a collaboration networkwith a (significantly) smaller Activity Momentum p , chances are that theiractions (or lack thereof) will have a notable influence on the overall trends inactivity dynamics of the system. In particular, there are three factors thatTable 4: System Mass and Activity Momentum.
The table depicts theresults for the activity momentum analysis. ρ is the standard deviation ofthe calculated ratios normalized over κ . System Mass is represented by 1 /ρ and Activity Momentum represents System Mass multiplied with Activity.Activity depicts the average activity per week as well as the value for thelast observed months in brackets. Activity Momentum follows analogously.MATHSE and ESE exhibit the largest average and current Activity Momenti,followed by 15MW and NLX. Even though 15MW exhibits a System Masssimilar to HSE and NZ, its Activity Momentum is much larger. Dataset Activity (last month) ρ System Mass Activity Momentum (last month)MATHSE 19 ,
255 (70 , . .
65 1 , ,
415 (6 , , ,
952 (13 , . .
07 85 ,
815 (399 , . .
12 3 ,
228 (10 , , . .
10 4 ,
489 (20 , ,
999 (4 , . .
76 39 ,
500 (92 , , . .
80 12 .
558 (21 , . .
67 152 (3 , . .
Activity Momentum of collaboration networks:

(i) The standard deviation of λ/µ. If the ratio is very stable and does not frequently oscillate, the standard deviation and hence the normalized standard deviation will be very small. This also means that activity, as well as increases and decreases thereof, is equally distributed across τ and is not (frequently) exercised in bursts.

(ii) The largest eigenvalue κ. Larger and denser collaboration networks exhibit a larger highest eigenvalue κ. As ρ is the normalized variance of the ratios over κ, the largest eigenvalue will directly influence ρ. The notion of normalizing ρ over κ follows the intuition that large collaboration networks are less likely to exhibit sudden changes in activity than smaller ones.

(iii) The activity. The larger the average activity (posts and replies) per month, the higher the Activity Momentum of a collaboration network, and hence the higher the force that is needed to render the collaboration network inactive. Analogously, networks with a small Activity Momentum require less force to be influenced (i.e., to either speed up/increase or slow down/decrease activity).

Hence, we can use the calculated Activity Momentum p as an indicator of the activity level as well as the tendency of a system to stay at that activity level in the future. For example, MATHSE exhibits the most robust collaboration network of our datasets regarding changes in activity, with an Activity Momentum of order 10 (average per week and last month). ESE and 15MW both exhibit similar average Activity Momenta of orders 10. However, when looking at the Activity Momenta of the last months, ESE is roughly four times as hard to stop as 15MW.

In contrast, HSE and BSE exhibit very similar activity levels for the last month; however, the corresponding Activity Momentum of HSE is twice that of BSE, indicating that only half the force is needed to render BSE inactive compared with HSE. The other datasets follow analogously.

On the other hand, BP exhibits a high value for System Mass and a very low corresponding Activity Momentum, indicating that it will be very difficult to accelerate or jump-start the system with regards to activity.
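The first two factors listed above can be illustrated with a minimal sketch. It assumes a symmetric, unweighted adjacency matrix given as nested lists, and it approximates κ with a shifted power iteration instead of a library eigensolver so that it runs without dependencies. The helper names, the toy graphs, the made-up ratio series and the plain division by κ are illustrative assumptions; the sketch does not reproduce the definitions of System Mass or Activity Momentum.

```python
import math
import statistics


def largest_eigenvalue(adjacency, iterations=1000, tol=1e-12):
    """Approximate kappa, the largest eigenvalue of a symmetric, non-negative
    adjacency matrix, by power iteration on A + I (hypothetical helper)."""
    n = len(adjacency)
    vector = [1.0 / math.sqrt(n)] * n
    estimate = 0.0
    for _ in range(iterations):
        # Multiply by (A + I); the shift avoids oscillation on bipartite graphs.
        product = [
            vector[i] + sum(adjacency[i][j] * vector[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in product))
        new_estimate = norm - 1.0  # for a unit vector, norm approaches kappa + 1
        vector = [x / norm for x in product]
        if abs(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate


def ring_graph(n):
    """Adjacency matrix of a ring: every node is linked to its two neighbors."""
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        adjacency[i][j] = adjacency[j][i] = 1.0
    return adjacency


def complete_graph(n):
    """Adjacency matrix of a complete graph: every pair of nodes is linked."""
    return [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]


def normalized_std(ratios, kappa):
    """Standard deviation of a lambda/mu series divided by kappa (illustrative)."""
    return statistics.pstdev(ratios) / kappa


if __name__ == "__main__":
    kappa_sparse = largest_eigenvalue(ring_graph(10))
    kappa_dense = largest_eigenvalue(complete_graph(10))
    print("kappa (ring):", kappa_sparse)
    print("kappa (complete):", kappa_dense)

    steady = [4.0, 4.1, 3.9, 4.0]  # hypothetical weekly ratios, little oscillation
    bursty = [4.0, 1.0, 7.0, 4.0]  # hypothetical weekly ratios, strong bursts
    print("steady:", normalized_std(steady, kappa_dense))
    print("bursty:", normalized_std(bursty, kappa_dense))
```

For the same number of nodes, the denser graph yields the larger κ, and the steadier ratio series yields the smaller κ-normalized standard deviation.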
Related Work
The work presented in this paper was inspired by and builds upon work presented in the areas of critical mass theory and dynamical systems on networks.

In 1985 and 1988, Oliver et al. [68], Oliver and Marwell [69], and Marwell et al. [56] discussed and analyzed the concept of critical mass theory by introducing so-called production functions to characterize decisions made by groups or small collectives. Fundamentally, these production functions represent the link between individual benefits and benefits for the group. They argue that one very important aspect of critical mass is the natural limitation of collective goods for groups, such as housing, food, fuel or oil. Hence, the capacity of users (and thus critical mass) for such a group or system is naturally limited by the corresponding resource. However, collective (digital) goods are not (or only artificially) limited for online communities, theoretically allowing for an infinite increase in users and interest. Without users motivated to contribute, interest will decrease and critical mass will lose momentum and ultimately decelerate until all interest vanishes. In their work they identified multiple different types of production functions, with the most important ones being accelerating, decelerating and linear functions. The idea behind accelerating production functions is that each contribution is worth more than its preceding one. In a decelerating production function the opposite is the case, so that each succeeding contribution is worth less than the preceding one, while contributions to linearly growing functions are always worth the same. To this day it is still mostly unclear what these production functions look like for online communities (e.g., StackOverflow) and online production systems (e.g., Semantic MediaWikis).

Depending on the investigated or desired point of view, different characteristics of these communities and online production systems can be used as a basis for calculating production functions. The analysis of Oliver et al. [68] also highlights that different production functions can lead to very different outcomes in similar situations. For example, given an accelerating production function, users who contribute to a system are likely to find their potential contribution “profitable”, as each subsequent contribution increases the value of their own contribution. Naturally, this increases the incentive to make larger contributions to begin with. Given a decelerating production function, users would not immediately see the benefit of large contributions, given that each subsequent contribution increases the overall value less, while more effort, in the form of larger contributions, is needed to turn a decelerating production function into an accelerating one.

One approximation for critical mass by Solomon and Wash [78] involved the investigation of the number of changes (as activity) and the number of users (as growth of a community) for calculating production functions for WikiProjects.
The authors argue that activity in online production systems, after certain amounts of time, is the best indicator of a self-sustaining system. In this work, we have extended the analysis presented by Solomon and Wash and specifically define the point at which an online system has reached critical mass and has become self-sustaining in terms of its activity dynamics. Walk and Strohmaier [87] recently conducted a similar analysis to characterize critical mass for Semantic MediaWikis.

Raban et al. [75] investigated factors that allow for a prediction of survival rates for IRC channels and identified the production function of these chat channels, regarding the number of unique users versus the number of messages posted at certain times, as the best predictor.

Cheng and Bernstein [22] analyzed concepts of activation thresholds, which resemble features that, when achieved, can help to reach and sustain self-sustainability. They created an online platform that allows groups to pitch ideas, which will only be activated if enough people commit to them.

With regards to activity, Suh et al. [81] have shown that contributions to Wikipedia are slowing down, which is likely a direct consequence of the increase in required coordination activities, as well as comprehensive contribution guidelines which discourage posts by users. Kittur and Kraut [48] have demonstrated that reducing the overhead for editors, effectively minimizing the efforts necessary to contribute to Wikipedia, can help to increase the number of contributions and article quality. Similarly, Anderson et al. [3] investigated the value and development of contributions to the question answering portal StackOverflow. In contrast, Yang et al. [92] investigated the evolution of two different types of users in StackOverflow, namely sparrows (very active users) and owls (experts in the discussed topics), and could identify various differences between the two user groups.

We use the notion of critical mass to define the barrier that has to be overcome for collaboration networks to become self-sustaining in terms of activity.

Dynamical systems in a non-network context are a well-studied scientific and engineering field. Generally, a dynamical system is any system that changes in time, whose behavior is determined by some specific rules or (differential) equations over a set of quantifiable variables. We distinguish between continuous and discrete as well as deterministic and stochastic systems. Strogatz [80] and Barrat et al. [12] provide excellent introductions and analyses of dynamical systems.

Different social and economic processes, which take place both offline and online, have been modeled with the use of dynamical systems. In the context of the Web, the primary focus of dynamical systems was set on analyzing and understanding the diffusion of information in online social networks [51; 52; 64; 85], including the analysis of online memes and viral marketing.

On the other hand, the Bass Model [14] describes how novel products are accepted and adopted in a network and has seen a wide variety of applications in different fields of research and also for practical use. The model consists of two parameters, the propensity for innovation and the propensity for imitation. Whether a product will be successfully accepted and adopted by the community depends on the ratio between these two parameters.

Acerbi et al. [2] investigated factors that determine how social traits propagate within a specific population. Iribarren and Moro [42] conducted a viral email experiment, allowing them to track the diffusion of information in a social network.
They showed that, due to heterogeneity in human activity, the most common and simple growth equation from epidemic models is not suitable to model information diffusion in social networks.

Recently, in the context of activity dynamics, Ribeiro [76] conducted an analysis of the daily number of active users that visit specific websites, fitting a model that allows predicting whether a website has reached self-sustainability, defined by the shape of the curve of the daily number of active users over time. He uses two constants α and β, where α represents the constant rate of active members influencing inactive members to become active and β describes the rate of an active member spontaneously becoming inactive. Whenever β/α ≥ […] β/α < […] epidemic models, and opinions or traits of a person, also known as opinion dynamics.

Modeling the outbreak of diseases can be seen as a special case of dynamical systems. At first, epidemic models dealt with the spreading of diseases in social (real life) networks [57; 38; 4; 16; 17; 54; 43; 31], ignoring the underlying network aspect and simulating contractions and outbreaks via random encounters of the whole population under investigation. For an exhaustive survey of epidemic models refer to Pastor-Satorras et al. [74].

Since then, these models have been extended to include the structure and other aspects of the underlying networks [77; 31; 41; 55; 32; 27], limiting the spread and outbreaks according to different factors.
Further, epidemic models were also utilized to simulate the spread of a plethora of properties in different kinds of networks, such as viruses spreading in computer networks [46; 47; 70; 7; 73] and information propagation (e.g., memes) [51], among others.

In general, epidemic models are based on the intuition that a disease propagates through a social network with a given infection rate, defining the probability that a neighbor of an already infected node contracts the disease. Different models have been developed and analyzed to simulate outbreaks of diseases that can only transfer on contact in a population or network [9; 4; 39; 65]. Typically, such an outbreak is modeled using a small number of possible states for each node and a fixed probability of contraction (e.g., β, γ), which defines the probability or “threshold” that has to be reached for a node to change to a different state. For example, the SI model consists of only two states, susceptible and infected, and one probability parameter β that determines when the transition from susceptible to infected is initiated. Note that transitions in the SI model can only occur from susceptible to infected, while already infected nodes remain infected indefinitely. As the infection rate is relative to the population under investigation, epidemic simulations with a small number of originally infected hosts usually start off with the disease spreading slowly until exponential growth is reached. Once the majority of the population carries the disease, the infection process slows down again until the whole population is infected.

A more sophisticated extension to the SI model is the SIR model [4; 63], which additionally introduces the recovered (or removed) state as well as an additional parameter γ to model the transition from infected to recovered. Again, transitions only occur from susceptible to infected to recovered.
As the name suggests, this newly introduced state allows nodes to become immune to the disease, so that they will neither be infected in the future nor be able to infect other nodes. Other models for simulating epidemic outbreaks are the SIS and SIRS models, where the population can recover but does not become immune (SIS) or stays immune but still has a chance to become susceptible to infection again (SIRS) [18; 29].

Since their introduction, epidemic models have seen a wide array of applications, for example, to analyze how computer viruses spread [44; 45; 67] or to study epidemics in complex (scale-free, power-law) networks [70; 71; 72; 62].

Among others, Wang et al. [89] as well as Ganesh et al. [37] demonstrated the importance of the network's spectra (eigenvalues and eigenvectors of the network adjacency matrix) for epidemic and dynamical network models [24; 25]. We show a similar dependency of activity dynamics on eigenvalues in this paper in Section 2.

Another important field of application of dynamical systems on networks is opinion dynamics. Such models are used to model collective behavior and influence, usually in the form of a consensus-reaching task, at every point in time. The main idea behind the concept of social influence is that interacting agents strive to become more alike [33].

For example, agents in the Ising model for ferromagnets [15; 13] are influenced by the state/opinions of the majority of their peers. This influence naturally drives the system towards an ordered state where all agents are either positive or negative (ferromagnets). Hence, the model can be interpreted as a very simple model for simulating (binary) opinion dynamics. However, the transition probabilities of the Ising model are influenced by temperature, representing the modeling of external or influential factors. In particular, if the temperature is above a certain threshold, consensus-finding, in terms of magnetization, becomes an unstable process that never converges.
The Potts model [91; 30] further extends the Ising model by increasing the number of potential states an agent can assume from two (positive or negative) to an arbitrary number greater than two. Another factor that might influence the process of reaching consensus is the size of the system under investigation [82]. In particular, this means that differently sized (or connected) systems potentially need different strategies to reach consensus.

Opinions are usually represented as a set of words or numbers for each agent individually. Weidlich [90] introduced such a model, based on sociodynamics, in 1971. Galam et al. [36] and Galam and Moscovici [35] analyzed the potential applications of the Ising model for simulating opinion dynamics starting in 1982.

The most widespread and adapted models to simulate (among others) opinion dynamics are the voter model [26; 40], the Axelrod model [8] as well as The Naming Game [11].
The voter model posits that each agent is equipped with a binary variable. At each step in time, the binary variable of one (randomly chosen) agent is synchronized with the variable of one of its neighbors, introducing the concept of social influence for opinion dynamics. The voter model has since been adapted and extended by many researchers to fit an array of different purposes (e.g., [59; 60; 61; 84; 83; 21]).
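The update rule of the voter model can be sketched in a few lines. This is a minimal illustration and not code from any of the cited works: the adjacency-list representation, the function name `voter_model`, the stopping rule (stop at consensus or after `max_steps`) and the seeded random generator are assumptions made for the example.

```python
import random


def voter_model(neighbors, states, max_steps, seed=0):
    """Run a basic voter model (illustrative sketch).

    neighbors: list where neighbors[v] is the list of nodes adjacent to v
    states:    initial binary opinion (0 or 1) of every node
    Returns the final states and the number of update steps performed.
    """
    rng = random.Random(seed)
    states = list(states)  # work on a copy, leave the input untouched
    # Only nodes with at least one neighbor can copy an opinion.
    candidates = [v for v in range(len(states)) if neighbors[v]]
    for step in range(max_steps):
        if len(set(states)) <= 1 or not candidates:
            return states, step  # consensus reached (or nothing to update)
        agent = rng.choice(candidates)       # randomly chosen agent
        peer = rng.choice(neighbors[agent])  # one of its neighbors
        states[agent] = states[peer]         # synchronize with that neighbor
    return states, max_steps


if __name__ == "__main__":
    n = 6
    # Complete network: every agent is a neighbor of every other agent.
    complete = [[u for u in range(n) if u != v] for v in range(n)]
    final, steps = voter_model(complete, [0, 1, 0, 1, 0, 1], max_steps=10_000)
    print(final, steps)
```

Because an agent only ever copies an opinion that already exists among its neighbors, no new opinions can appear; on a small connected network the process ends in one of the two consensus states.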
The Axelrod model [8] combines the notion of social influence (individuals becoming more similar upon frequent interactions) with the tendency of similar individuals to interact with each other more frequently. Each agent is endowed with a set of characterizing variables. The more variables are shared between two agents, the more similar they are. Given this description, one would assume that the described notions are self-reinforcing dynamics and hence will inevitably produce stable networks with only identical agents. However, Castellano et al. [19] have shown that the resulting number of different states is dependent on the number of characterizing variables. Large numbers are likely to result in very few similar individuals (high agent diversity). Analogously to the voter model, the Axelrod model has been extensively adapted, analyzed and expanded by researchers to broaden our understanding of the spread of (cultural) traits across agents (e.g., Klemm et al. [50, 49]; Flache and Macy [34]).

The Naming Game originates from the idea to analyze and explore the evolution of language [79]. Baronchelli et al. [11] introduced the most basic version of The Naming Game in 2006, where a group of agents that communicate via a complete network try to reach consensus when naming an entity. Each agent holds a list of synonyms or words associated with the entity under investigation, also referred to as vocabulary. Every iteration (or step in time), two agents are chosen. One agent is assigned the role of the speaker, who randomly chooses a word from a given/pre-defined vocabulary. If the other agent, the listener, knows the chosen word (i.e., also has the word in its vocabulary), both agents discard all other words in their vocabularies and “agree” on the common word. However, if the listener does not know the word of the speaker, the word is appended to the listener's vocabulary and no words are discarded.
In the next step, another pair of nodes is chosen and the process is repeated until either consensus is found or a predetermined number of steps (time) has passed. The Naming Game has spurred a complete line of dynamical models with a variety of different parameters that each address different problems and tasks (e.g., Abrams and Strogatz [1]; Minett and Wang [58]; Wang and Minett [88]; Castelló et al. [21]). For an excellent and comprehensive introduction to opinion dynamics (among others) we refer the interested reader to Castellano et al. [20].
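The Naming Game procedure described above can be sketched as follows. The sketch is a simplified illustration under stated assumptions: every agent starts with a non-empty, pre-defined vocabulary, the speaker draws the word from its own vocabulary, any two agents may be paired (complete network), and the function name `naming_game` and its parameters are hypothetical.

```python
import random


def has_consensus(vocabularies):
    """True if every agent holds exactly the same single word."""
    first = vocabularies[0]
    return len(first) == 1 and all(v == first for v in vocabularies)


def naming_game(vocabularies, max_steps, seed=0):
    """Run a basic Naming Game on a complete network (illustrative sketch).

    vocabularies: one non-empty collection of words per agent
    Returns the final vocabularies (as sets) and the number of steps performed.
    """
    rng = random.Random(seed)
    vocab = [set(words) for words in vocabularies]  # copy the input
    for step in range(max_steps):
        if has_consensus(vocab):
            return vocab, step
        # Choose two distinct agents: the first speaks, the second listens.
        speaker, listener = rng.sample(range(len(vocab)), 2)
        # Sorting makes the draw reproducible for a fixed seed.
        word = rng.choice(sorted(vocab[speaker]))
        if word in vocab[listener]:
            # Success: both agents discard all other words.
            vocab[speaker] = {word}
            vocab[listener] = {word}
        else:
            # Failure: the listener learns the word, nothing is discarded.
            vocab[listener].add(word)
    return vocab, max_steps


if __name__ == "__main__":
    start = [{"a"}, {"b"}, {"c"}, {"d"}, {"e"}]
    final, steps = naming_game(start, max_steps=50_000)
    print(final, steps)
```

Since words are only copied or discarded, the agents can only converge on a word that was present in some initial vocabulary; the step limit guards against runs that have not yet reached consensus.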
We have developed a model to simulate and characterize the intricate dynamics of activity in collaboration networks, consisting of an Activity Decay Rate and a Peer Influence Growth Rate. First, we applied it to Zachary’s Karate Club dataset (see Figure 3) to illustrate its core mechanics. Subsequently, we continued with a linear stability analysis (cf. Section 2.2) and depicted the behavior that can occur when the master stability equation is invalidated (see Figure 3). Using our proposed model to simulate activity dynamics, we have shown that the overall activity in collaboration networks appears to be a composite of the Activity Decay Rate and the Peer Influence Growth Rate, as described in Section 2. In Section 3, we have fitted our model on synthetic and empirical datasets to simulate activity dynamics […] λ/µ. For example, a ratio of 4 means that users intrinsically lose activity 4 times faster than they can get back from one of their peers, while the coefficients of the autoregression lack such interpretable characteristics. Further, using the concept of dynamical systems we can represent the underlying mechanisms in a closed form, allowing for detailed analytical analyses (i.e., the linear stability analysis), which are much harder (if not impossible) to conduct for other models, such as agent-based models, autoregression or more complex models based on dynamical systems.

We have released a Python implementation of our model, to estimate empirical parameters and run activity dynamics simulations, as Open Source Software at https://github.com/simonwalk/ActivityDynamics.

For future work we plan on extending the ability of our model to not only reflect changes in activity dynamics but also properly cope with structural changes in the underlying collaboration networks. One additional limitation of the presented approach is the fact that nodes with a very small degree, which are not connected to the largest connected component, inevitably will lose activity until they reach the point of total inactivity. Including the structural evolution of a collaboration network in our analyses will allow us to mitigate this effect, as users will only be added to the collaboration network and considered in our calculations once they have actually become active. One potential approach involves the investigation of snapshots of the collaboration networks at every τ, providing additional insights into the evolution of the parameters of our model and the investigated systems. Additionally, we assume that peer influence is a symmetric property.
This means that posts and replies exercise the same amount of influence on peers, as we do not differentiate between different types of activity, and influence will always traverse along both directions of the edges in our collaboration networks. Further, tasks that do not trigger entries in the change-logs (i.e., reading articles, posts or replies) are not considered in our experiments due to a lack of available data.

The fact that the Activity Dynamics Model only requires a single parameter to be configured represents not only an advantage, but also a limitation. Given that there is only one parameter that determines the evolution of activity in a system, we are not able to model periodic fluctuations with only one ratio. Instead, we have to calculate ratios for multiple points in time. For future work we plan on extending the Activity Dynamics Model by adding parameters, for example, to model different external influences. With this extended model, we will be able to simulate such periodic patterns with a single configuration. On the other hand, we are only able to model additional (social) mechanisms with the use of additional parameters. For example, one reason for the decreasing levels of activity in Wikipedia might also be related to a very high barrier for newly registered users to add content due to comprehensive guidelines for contributions and a very concentrated and active community of power users. Over time, these power users leave Wikipedia for various reasons while new contributors are lacking to fill in the gaps.

Furthermore, all of our estimated parameters are calculated for the collaboration networks as a whole. Future work will also include extending the activity dynamics model to calculate the ratio λ/µ on a user level, rather than on a network level. This modification not only potentially increases the accuracy of our model but would also allow us to gather additional information for each user of the corresponding networks. Further, with an increased accuracy in our simulations it will be possible to conduct activity prediction experiments and emulate network attacks as well as optimize (arbitrary) cost-strategies for increasing activity in these systems.

In this context it is also worth mentioning that decreasing levels of activity for collaboration networks can also signal that the community has completed its work and no further actions are required as the intended goal has been achieved. Further analyses are required to determine if completeness and quality of content affect activity in collaboration networks.
One could even argue that, once we are able to calculate λ/µ for each user, we could potentially observe the evolution of users and categorize different types of users in collaboration networks (e.g., early adopters or experienced users versus new and inexperienced users).

The ratio λ/µ, describing how fast users lose activity (Activity Decay Rate λ) over how fast they regain activity from their neighbors (Peer Influence Growth Rate µ), fluctuates below the corresponding highest eigenvalue κ for all investigated empirical datasets. Negative peaks in this ratio represent periods of time (τ; in our case weeks) where activity grew faster than could be compensated by the Peer Influence Growth Rate. It naturally follows that a decrease of λ, resulting in less activity-loss per contribution for each user, is necessary to accomplish such drastic increases of activity. If the network itself is of a smaller scale and/or these negative peaks occur on a frequent basis, the activity dynamics of the corresponding networks depend on the contributions (and thus influence) of single (individual) users. To compare the stability of the activity dynamics across multiple networks we calculated the System Mass and Activity Momentum p, indicating the force required to accelerate the corresponding collaboration networks or render them inactive.

When comparing p and the results of our empirical illustration (cf. Figures 9 and 10) between the different datasets, we can see that the Activity Momentum is very small for datasets that either (i) exhibit only a very small number of changes and are close to inactivity or (ii) exhibit a small κ (see Figures 9 and 10). This suggests that we can use Activity Momentum as an indicator for the robustness of a collaboration network with regards to its activity dynamics.

Further, we can characterize the potential of a collaboration network to become self-sustaining by comparing the calculated ratios of λ/µ with the corresponding κ and Activity Momentum. If the ratio is below κ, our master stability equation is invalidated, pushing the system towards a new fixed point where the forces of the Activity Decay Rate and the Peer Influence Growth Rate reach an equilibrium so that the network converges towards a state of immanent and lasting activity (see Figure 3). If such a state is reached and combined with a high Activity Momentum, the corresponding collaboration network has reached critical mass of activity and has become self-sustaining; no external impulses are required to keep the network active. Of course, in real world scenarios, activity will not last forever without providing additional incentives, as interest (and thus activity) in a system potentially decays over time. As a consequence, this would first result in an increase of the ratio λ/µ and inevitably, with a sufficiently large ratio, the collaboration network would return to its stable fixed point once our master stability equation holds again, and activity would once more converge towards zero. Once we extend our model to allow for user-based calculations, we will be able to calculate Activity Momentum not only for collaboration networks, but also for single and individual users.

Acknowledgements

This research was in part funded by the FWF Austrian Science Fund research project P24866.
References [1] D. M. Abrams and S. H. Strogatz. Linguistics: Modelling the dynamicsof language death.
Nature , 424(6951):900–900, 2003.[2] A. Acerbi, S. Ghirlanda, and M. Enquist. The logic of fashion cycles.
PloS one , 7(3):e32541, 2012.[3] A. Anderson, D. Huttenlocher, J. Kleinberg, and J. Leskovec. Discover-ing value from community activity on focused question answering sites:a case study of stack overflow. In
Proceedings of the 18th ACM SIGKDDinternational conference on Knowledge discovery and data mining , pages850–858. ACM, 2012.[4] R. M. Anderson and R. M. May.
Infectious Diseases of Humans: Dy-namics and Control . Oxford University Press, USA, 1991.[5] S. Aral and D. Walker. Identifying influential and susceptible membersof social networks.
Science , 337(6092):337–341, 2012.[6] S. Aral, L. Muchnik, and A. Sundararajan. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks.
Proceedings of the National Academy of Sciences , 106(51):21544–21549,2009.[7] J. L. Aron, M. O’leary, R. A. Gove, S. Azadegan, and M. Schnei-der. The benefits of a notification process in addressing the wors-ening computer virus problem: Results of a survey and a simulationmodel.
Comput. Secur. , 21(2):142–163, Mar. 2002. ISSN 0167-4048. doi:10.1016/S0167-4048(02)00210-9. URL http://dx.doi.org/10.1016/S0167-4048(02)00210-9 .[8] R. Axelrod. Advancing the art of simulation in the social sciences. In
Simulating social phenomena , pages 21–40. Springer, 1997.449] N. T. Bailey et al.
The mathematical theory of infectious diseases andits applications . Charles Griffin & Company Ltd, 5a Crendon Street,High Wycombe, Bucks HP13 6LE., 1975.[10] A.-L. Barabˆasi, H. Jeong, Z. N´eda, E. Ravasz, A. Schubert, and T. Vic-sek. Evolution of the social network of scientific collaborations.
PhysicaA: Statistical mechanics and its applications , 311(3):590–614, 2002.[11] A. Baronchelli, M. Felici, V. Loreto, E. Caglioti, and L. Steels. Sharptransition towards shared vocabularies in multi-agent systems.
Jour-nal of Statistical Mechanics: Theory and Experiment , 2006(06):P06014,2006.[12] A. Barrat, M. Barthelemy, and A. Vespignani.
Dynamical processes oncomplex networks , volume 1. 2008.[13] M. Barth´elemy. Spatial networks.
Physics Reports , 499(1):1–101, 2011.[14] F. M. Bass. A new product growth model for consumer durables. In
Mathematical Models in Marketing , pages 351–353. Springer, 1976.[15] J. J. Binney, N. Dowrick, A. Fisher, and M. Newman.
The theory of crit-ical phenomena: an introduction to the renormalization group . OxfordUniversity Press, Inc., 1992.[16] B. Bolker and B. Grenfell. Chaos and biological complexity in measlesdynamics.
Proceedings of the Royal Society of London. Series B: Bio-logical Sciences , 251(1330):75–81, 1993.[17] B. Bolker and B. Grenfell. Space, persistence and dynamics of measlesepidemics.
Philosophical Transactions of the Royal Society of London.Series B: Biological Sciences , 348(1325):309–320, 1995.[18] T. Britton. Stochastic epidemic models: a survey.
Mathematical bio-sciences , 225(1):24–35, 2010.[19] C. Castellano, M. Marsili, and A. Vespignani. Nonequilibrium phasetransition in a model for social influence.
Physical Review Letters , 85(16):3536, 2000.[20] C. Castellano, S. Fortunato, and V. Loreto. Statistical physics of socialdynamics.
Reviews of modern physics , 81(2):591, 2009.4521] X. Castell´o, V. M. Egu´ıluz, and M. San Miguel. Ordering dynamics withtwo non-excluding options: bilingualism in language competition.
NewJournal of Physics , 8(12):308, 2006.[22] J. Cheng and M. S. Bernstein. Catalyst: Triggering collective actionwith thresholds. 2014.[23] N. A. Christakis and J. H. Fowler. The collective dynamics of smokingin a large social network.
New England journal of medicine , 358(21):2249–2258, 2008.[24] F. Chung, L. Lu, and V. Vu. Eigenvalues of random power law graphs.
Annals of Combinatorics , 7(1):21–33, 2003.[25] F. Chung, L. Lu, and V. Vu. Spectra of random graphs with givenexpected degrees.
Proceedings of the National Academy of Sciences , 100(11):6313–6318, 2003.[26] P. Clifford and A. Sudbury. A model for spatial conflict.
Biometrika , 60(3):581–588, 1973.[27] V. Colizza, A. Barrat, M. Barth´elemy, and A. Vespignani. The role ofthe airline transportation network in the prediction and predictabilityof global epidemics.
Proceedings of the National Academy of Sciences ofthe United States of America , 103(7):2015–2020, 2006.[28] C. Danescu-Niculescu-Mizil, R. West, D. Jurafsky, J. Leskovec, andC. Potts. No country for old members: User lifecycle and linguisticchange in online communities. In
Proceedings of the 22Nd InternationalConference on World Wide Web , WWW ’13, pages 307–318, Republicand Canton of Geneva, Switzerland, 2013. International World WideWeb Conferences Steering Committee. ISBN 978-1-4503-2035-1. URL http://dl.acm.org/citation.cfm?id=2488388.2488416 .[29] K. Dietz. Epidemics and rumours: A survey.
Journal of the RoyalStatistical Society. Series A (General) , pages 505–528, 1967.[30] S. N. Dorogovtsev, A. V. Goltsev, and J. F. Mendes. Critical phenomenain complex networks.
Reviews of Modern Physics , 80(4):1275, 2008.4631] N. M. Ferguson, M. J. Keeling, W. J. Edmunds, R. Gani, B. T. Grenfell,R. M. Anderson, and S. Leach. Planning for smallpox outbreaks.
Nature ,425(6959):681–685, 2003.[32] N. M. Ferguson, D. A. Cummings, S. Cauchemez, C. Fraser, S. Riley,A. Meeyai, S. Iamsirithaworn, and D. S. Burke. Strategies for containingan emerging influenza pandemic in southeast asia.
Nature , 437(7056):209–214, 2005.[33] L. Festinger.
Social pressures in informal groups: A study of humanfactors in housing . Stanford University Press, 1950.[34] A. Flache and M. W. Macy. Local convergence and global diversity:The robustness of cultural homophily. arXiv preprint physics/0701333 ,2007.[35] S. Galam and S. Moscovici. Towards a theory of collective phenomena:consensus and attitude changes in groups.
European Journal of SocialPsychology , 21(1):49–74, 1991.[36] S. Galam, Y. Gefen, and Y. Shapir. Sociophysics: A new approachof sociological collective behaviour. i. mean-behaviour description of astrike.
Journal of Mathematical Sociology , 9(1):1–13, 1982.[37] A. Ganesh, L. Massouli´e, and D. Towsley. The effect of network topol-ogy on the spread of epidemics. In
INFOCOM 2005. 24th Annual JointConference of the IEEE Computer and Communications Societies. Pro-ceedings IEEE , volume 2, pages 1455–1466. IEEE, 2005.[38] H. W. Hethcote. An immunization model for a heterogeneous popula-tion.
Theoretical population biology , 14(3):338–349, 1978.[39] H. W. Hethcote. The mathematics of infectious diseases.
SIAM review ,42(4):599–653, 2000.[40] R. A. Holley and T. M. Liggett. Ergodic theorems for weakly interactinginfinite systems and the voter model.
The annals of probability , pages643–663, 1975.[41] L. Hufnagel, D. Brockmann, and T. Geisel. Forecast and control ofepidemics in a globalized world.
Proceedings of the National Academy ofSciences of the United States of America , 101(42):15124–15129, 2004.4742] J. L. Iribarren and E. Moro. Impact of human activity patterns onthe dynamics of information diffusion.
Physical review letters, 103(3):038702, 2009.
[43] M. J. Keeling and P. Rohani. Estimating spatial coupling in epidemiological systems: a mechanistic approach. Ecology Letters, 5(1):20–29, 2002.
[44] J. O. Kephart and S. R. White. Directed-graph epidemiological models of computer viruses. In Research in Security and Privacy, 1991. Proceedings., 1991 IEEE Computer Society Symposium on, pages 343–359. IEEE, 1991.
[45] J. O. Kephart and S. R. White. Measuring and modeling computer virus prevalence. In Research in Security and Privacy, 1993. Proceedings., 1993 IEEE Computer Society Symposium on, pages 2–15. IEEE, 1993.
[46] J. O. Kephart, S. R. White, and D. M. Chess. Computers and epidemiology. Spectrum, IEEE, 30(5):20–26, 1993.
[47] J. O. Kephart, G. B. Sorkin, D. M. Chess, and S. R. White. Fighting computer viruses: Biological metaphors offer insights into many aspects of computer viruses and can inspire defenses against them. Scientific American, 1997.
[48] A. Kittur and R. E. Kraut. Harnessing the wisdom of crowds in wikipedia: quality through coordination. In Proceedings of the 2008 ACM conference on Computer supported cooperative work, pages 37–46. ACM, 2008.
[49] K. Klemm, V. M. Eguíluz, R. Toral, and M. San Miguel. Global culture: A noise-induced transition in finite systems. Physical Review E, 67(4):045101, 2003.
[50] K. Klemm, V. M. Eguíluz, R. Toral, and M. San Miguel. Nonequilibrium transitions in complex networks: A model of social interaction. Physical Review E, 67(2):026120, 2003.
[51] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing.
ACM Transactions on the Web (TWEB), 1(1):5, 2007.
[52] J. Leskovec, L. Backstrom, and J. Kleinberg. Meme-tracking and the dynamics of the news cycle. In
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 497–506. ACM, 2009.
[53] C. Lin and L. Segel. Mathematics Applied to Deterministic Problems in the Natural Sciences, volume 1. SIAM, 1988.
[54] A. L. Lloyd and R. M. May. Spatial heterogeneity in epidemic models. Journal of theoretical biology, 179(1):1–11, 1996.
[55] I. M. Longini, A. Nizam, S. Xu, K. Ungchusak, W. Hanshaoworakul, D. A. Cummings, and M. E. Halloran. Containing pandemic influenza at the source.
Science, 309(5737):1083–1087, 2005.
[56] G. Marwell, P. E. Oliver, and R. Prahl. Social networks and collective action: A theory of the critical mass. III.
American Journal of Sociology, 94(3):502–534, 1988.
[57] R. M. May and R. M. Anderson. Spatial heterogeneity and the design of immunization programs. Mathematical Biosciences, 72(1):83–111, 1984.
[58] J. W. Minett and W. S. Wang. Modelling endangered languages: The effects of bilingualism and social structure. Lingua, 118(1):19–45, 2008.
[59] M. Mobilia. Does a single zealot affect an infinite group of voters? Physical Review Letters, 91(2):028701, 2003.
[60] M. Mobilia and I. T. Georgiev. Voting and catalytic processes with inhomogeneities. Physical Review E, 71(4):046102, 2005.
[61] M. Mobilia, A. Petersen, and S. Redner. On the role of zealotry in the voter model. Journal of Statistical Mechanics: Theory and Experiment, 2007(08):P08029, 2007.
[62] Y. Moreno, R. Pastor-Satorras, and A. Vespignani. Epidemic outbreaks in complex heterogeneous networks. The European Physical Journal B-Condensed Matter and Complex Systems, 26(4):521–529, 2002.
[63] J. D. Murray.
Mathematical biology. Springer, New York, 2002. ISBN 0387952233 9780387952239.
[64] S. A. Myers, C. Zhu, and J. Leskovec. Information diffusion and external influence in networks. In
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 33–41. ACM, 2012.
[65] M. Newman. Networks: an introduction. Oxford University Press, 2010.
[66] M. E. Newman. Scientific collaboration networks. i. network construction and fundamental results. Physical review E, 64(1):016131, 2001.
[67] M. E. Newman, S. Forrest, and J. Balthrop. Email networks and the spread of computer viruses. Physical Review E, 66(3):035101, 2002.
[68] P. Oliver, G. Marwell, and R. Teixeira. A theory of the critical mass. i. interdependence, group heterogeneity, and the production of collective action. American journal of Sociology, pages 522–556, 1985.
[69] P. E. Oliver and G. Marwell. The paradox of group size in collective action: A theory of the critical mass. ii. American Sociological Review, pages 1–8, 1988.
[70] R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical review letters, 86(14):3200, 2001.
[71] R. Pastor-Satorras and A. Vespignani. Epidemic dynamics and endemic states in complex networks. Physical Review E, 63(6):066117, 2001.
[72] R. Pastor-Satorras and A. Vespignani. Epidemic dynamics in finite size scale-free networks. Physical Review E, 65(3):035108, 2002.
[73] R. Pastor-Satorras and A. Vespignani. Evolution and structure of the Internet: A statistical physics approach. Cambridge University Press, 2007.
[74] R. Pastor-Satorras, C. Castellano, P. Van Mieghem, and A. Vespignani. Epidemic processes in complex networks. arXiv preprint arXiv:1408.2701, 2014.
[75] D. R. Raban, M. Moldovan, and Q. Jones. An empirical study of critical mass and online community survival. In
Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, CSCW ’10, pages 71–80, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-795-0. doi: 10.1145/1718918.1718932.
[76] B. Ribeiro. Modeling and predicting the growth and death of membership-based websites. In
Proceedings of the 23rd International Conference on World Wide Web, WWW ’14, pages 653–664, Republic and Canton of Geneva, Switzerland, 2014. International World Wide Web Conferences Steering Committee. ISBN 978-1-4503-2744-2. doi: 10.1145/2566486.2567984. URL http://dx.doi.org/10.1145/2566486.2567984 .
[77] L. A. Rvachev and I. M. Longini. A mathematical model for the global spread of influenza. Mathematical biosciences, 75(1):3–22, 1985.
[78] J. Solomon and R. Wash. Critical mass of what? exploring community growth in wikiprojects. 2014.
[79] L. Steels. A self-organizing spatial vocabulary. Artificial life, 2(3):319–332, 1995.
[80] S. H. Strogatz. Nonlinear Dynamics And Chaos: With Applications To Physics, Biology, Chemistry, And Engineering (Studies in Nonlinearity). Studies in nonlinearity. Perseus Books Group, 1994.
[81] B. Suh, G. Convertino, E. H. Chi, and P. Pirolli. The singularity is not near: slowing growth of wikipedia. In Proceedings of the 5th International Symposium on Wikis and Open Collaboration, page 8. ACM, 2009.
[82] C. J. Tessone and R. Toral. Diversity-induced resonance in a model for opinion formation. The European Physical Journal B-Condensed Matter and Complex Systems, 71(4):549–555, 2009.
[83] F. Vazquez and S. Redner. Ultimate fate of constrained voters. Journal of Physics A: Mathematical and General, 37(35):8479, 2004.
[84] F. Vazquez, P. L. Krapivsky, and S. Redner. Constrained opinion dynamics: Freezing and slow evolution.
Journal of Physics A: Mathematical and General, 36(3):L61, 2003.
[85] A. Vespignani. Modelling dynamical processes in complex socio-technical systems.
Nature Physics, 8(1):32–39, 2012.
[86] C. Wagner, S. Mitter, C. Körner, and M. Strohmaier. When social bots attack: Modeling susceptibility of users in online social networks.
Making Sense of Microposts, page 2, 2012.
[87] S. Walk and M. Strohmaier. Characterizing and predicting activity in semantic mediawiki communities. In
SWCS14 Third International Workshop on Semantic Web Collaborative Spaces, 2014, page 21, 2014.
[88] W. S. Wang and J. W. Minett. The invasion of language: emergence, change and death. Trends in ecology & evolution, 20(5):263–269, 2005.
[89] Y. Wang, D. Chakrabarti, C. Wang, and C. Faloutsos. Epidemic spreading in real networks: An eigenvalue viewpoint. In Reliable Distributed Systems, 2003. Proceedings. 22nd International Symposium on, pages 25–34. IEEE, 2003.
[90] W. Weidlich. The statistical description of polarization phenomena in society. British Journal of Mathematical and Statistical Psychology, 24(2):251–266, 1971.
[91] F.-Y. Wu. The potts model. Reviews of modern physics, 54(1):235, 1982.
[92] J. Yang, K. Tao, A. Bozzon, and G.-J. Houben. Sparrows and owls: Characterisation of expert behaviour in stackoverflow. In