Predictive Analysis for Social Processes II: Predictability and Warning Analysis
Abstract: This two-part paper presents a new approach to predictive analysis for social processes. Part I identifies a class of social processes, called positive externality processes, which are both important and difficult to predict, and introduces a multi-scale, stochastic hybrid system modeling framework for these systems. In Part II of the paper we develop a systems theory-based, computationally tractable approach to predictive analysis for these systems. Among other capabilities, this analytic methodology enables assessment of process predictability, identification of measurables which have predictive power, discovery of reliable early indicators for events of interest, and robust, scalable prediction. The potential of the proposed approach is illustrated through case studies involving online markets, social movements, and protest behavior.

I. INTRODUCTION
As discussed in Part I of this two-part paper [1], predicting the outcome of social processes is both important and very challenging. Many social phenomena of interest in applications are positive externality processes (PEP), in which individuals are motivated to behave as others do. Research in the social and behavioral sciences provides compelling evidence that for such processes it is often not possible to obtain useful predictions using standard methods, which focus almost exclusively on the intrinsic characteristics of the process and its possible outcomes.

We propose that accurate prediction requires careful consideration of the interplay between the intrinsics of a process and the social dynamics which are its realization. We therefore adopt an inherently dynamical approach to predictive analysis: given a social process, a set of measurables, and the behavior of interest, we formulate prediction problems as questions about the reachability properties of the system.

As an illustrative example, consider the task of assessing the predictability of market share in a popular culture market, say for music, in which “buzz” about products spreads through various social networks. If, in a market containing two products with indistinguishable appeal, it is possible for one product to achieve a dominant market share, the market may be regarded as unpredictable [2]. Conversely, in a predictable market the market shares of indistinguishable products evolve similarly and market shares of superior products are typically larger than those of inferior ones. In our formulation, market share dominance by product A is associated with a region of market share state space, and deciding whether A can achieve such dominance while possessing an appeal that is indistinguishable from product B is posed as a question about state space reachability.

More generally, in order to formulate prediction questions in terms of reachability, the behavior about which predictions are to be made is used to define system state space subsets of interest (SSI). Candidate measurables allow identification of indistinguishable starting sets (ISS), that is, sets of initial states and system parameters which cannot be resolved with the available data. This setup permits the four predictive analysis tasks of interest to us (predictability assessment, identification of useful measurables, warning analysis, and prediction) to be performed in a systematic manner.

Predictability assessment involves determining which SSI can be reached from ISS and deciding if these reachability properties are compatible with the prediction goals. For example, if moving between state-parameter pairs within an ISS leads to unacceptably large variations in the probability of reaching the SSI, then the process is deemed unpredictable. This analysis leads naturally to a way of identifying those measurables with the most predictive power: these are the ISS coordinates for which predictability is most sensitive. If a system’s reachability properties are incompatible with the prediction goals (if, say, “hit” and “flop” in a cultural market are both reachable from a single ISS) then the given prediction question should be refined in some way. Possible refinements include relaxing the level of detail to be predicted or introducing additional measurables.

If and when a predictable situation is obtained, the problems of discovering reliable early indicators for events of interest and forming robust predictions can be addressed. These problems are also readily studied within a reachability framework. Warning analysis involves identifying indicator (state) sets with the property that observing a trajectory entering an indicator set implies that the event of interest is likely to occur. Prediction entails estimating the probability that the process will evolve to an SSI and quantifying the uncertainty associated with this estimate.

The remainder of this paper transforms these intuitive notions into a rigorous, tractable methodology for predictive analysis, with a focus on predictability and early warning, and illustrates the utility of the approach through several real world case studies.

The research described in this paper was supported in part by the U.S. Department of Homeland Security and Sandia National Laboratories. R. Colbaugh is with Sandia National Laboratories, Albuquerque, NM 87111 USA and New Mexico Institute of Mining and Technology, Socorro, NM 87801 USA (phone: 505-603-1248; e-mail: [email protected]). K. Glass is with New Mexico Institute of Mining and Technology, Socorro, NM 87801 USA (e-mail: [email protected]).
Richard Colbaugh, Kristin Glass

II. PREDICTIVE ANALYSIS
This section describes the proposed approach to predictive analysis for social processes. The presentation is structured to realize three objectives: 1.) provide reachability-based definitions for basic predictive analysis tasks; 2.) develop a rigorous, tractable methodology for reachability analysis; and 3.) derive efficient (reachability-based) algorithms for performing predictive analysis.

A. Problem formulation
We now provide quantitative definitions for predictability assessment, identification of useful measurables, early warning, and robust prediction. Assume the behavior about which predictions are to be made and the measurables upon which these predictions can be based have been used to specify the system SSI and ISS, respectively. Denote by $\Sigma_{\text{sp}}$ the social process of interest, and suppose it is modeled using the stochastic hybrid system (S-HDS) framework developed in [1].

Definition 2.1:
Let $X \subseteq \Re^n$ and $\text{Par} \subseteq \Re^p$ denote the (bounded) state and parameter sets for social process $\Sigma_{\text{sp}}$, $P_0$ be a subset of $\text{Par}$, and $X_0$, $X_{s1}$, $X_{s2}$ be subsets of $X$. Suppose $X_0 \times P_0$ and $\{X_{s1}, X_{s2}\}$ are the ISS and SSI, respectively, corresponding to the prediction question. Let a specification $\delta$ be given for the acceptable level of variation or uncertainty in system behavior relative to $\{X_{s1}, X_{s2}\}$, and suppose $(x_0^*, p_0^*) \in X_0 \times P_0$ is the best estimate for the process initialization.

Initial state (IS) predictability assessment involves determining whether unacceptable variation in the reachability properties of $\{X_{s1}\}$ results from changing the system initialization. More precisely, a prediction problem is IS unpredictable if $\gamma_{\max} - \gamma_{\text{BE}} > \delta$, where $\gamma_{\max}$ and $\gamma_{\text{BE}}$ are the probabilities of $\Sigma_{\text{sp}}$ reaching $X_{s1}$ from $X_0 \times P_0$ and $(x_0^*, p_0^*)$, respectively; the problem is IS predictable otherwise.

Eventual state (ES) predictability assessment involves evaluating the acceptability of uncertainty in system behavior relative to reachability of the two sets $X_{s1}$ and $X_{s2}$. Thus a situation is ES unpredictable if $\min(\gamma_1, \gamma_2) > \delta$, where $\gamma_i$ is the probability of $\Sigma_{\text{sp}}$ reaching $X_{si}$, and is ES predictable otherwise.

Note that in ES predictability problems it is expected that the two sets $\{X_{s1}, X_{s2}\}$ represent qualitatively different system behaviors (e.g., hit and flop in a cultural market), so that if the probability of reaching each from $X_0 \times P_0$ is relatively high then system behavior is unpredictable in an application-relevant sense.

These predictability concepts form the basis for our definition of useful measurables:

Definition 2.2:
Let the components of the vectors $(x_0, p_0) \in X_0 \times P_0$ which comprise the ISS be denoted $x_0 = [x_{01} \ldots x_{0n}]^T$ and $p_0 = [p_{01} \ldots p_{0p}]^T$. The measurables with most predictive power correspond to the state variables $x_{0j}$ and/or parameters $p_{0k}$ for which predictability is most sensitive.

We do not specify a particular measure of sensitivity to be used in identifying measurables with maximum predictive power and do not require that these measurables actually have sufficient power to be useful. Such considerations are ordinarily application-dependent and are explored in [3].

Remark 2.1:
Definitions 2.1 and 2.2 characterize the role played by initial states in the predictability of social processes. In some cases it is useful to expand this formulation to allow consideration of states other than initial states. For instance, we will show in the case studies that very early time series are often predictive for PEP, suggesting that it can be valuable to consider initial state trajectory segments, rather than just initial states, when assessing predictability. This extension can be accomplished by redefining the ISS $X_0 \times P_0$, for example by augmenting the state space $X$ with an explicit time coordinate.

We now consider warning analysis and prediction.

Definition 2.3:
Let an event of interest be specified in terms of some SSI $X_s$ (e.g., $\Sigma_{\text{sp}}$ reaching or leaving $X_s$) and let the required warning accuracy be given (e.g., a warning signal is to be issued only if the probability of event occurrence exceeds some level). Warning analysis involves identifying one or more state-parameter subsets $X_w \times P_w \subseteq X \times \text{Par}$, which we term indicators, with the property that “observing an indicator”, that is, observing the system trajectory entering $X_w \times P_w$, corresponds to the issuing of a warning with the specified accuracy.

For instance, we often specify the warning accuracy and indicator in such a way that if the indicator is observed then the probability of event occurrence exceeds the given threshold. Note that this definition for warning analysis and warning indicators captures the essence of the informal usage of these terms and is also convenient for formal analysis. Finally, we have:

Definition 2.4:
Let a behavior of interest be specified in terms of the trajectories of the social process $\Sigma_{\text{sp}}$ (e.g., the ultimate state for a convergent process or the maximum value attained by a state variable) and let $X_0 \times P_0$ denote the ISS. Prediction entails 1.) estimating the salient characteristics of the behavior and 2.) quantifying the uncertainty associated with this estimate.

This definition specifies that we seek “best estimate plus uncertainty” predictions, for example estimating the state $x^*$ to which $\Sigma_{\text{sp}}$ will converge given our best guess for $(x_0^*, p_0^*) \in X_0 \times P_0$, and also computing the set of all possible values for $x^*$ associated with feasible pairs $(x_0, p_0) \in X_0 \times P_0$.

B. Stochastic reachability analysis
The previous section formulates predictive analysis problems as reachability questions. In this section we show that these reachability questions can be addressed by adopting an analysis methodology which is related to familiar Lyapunov function stability analysis [4,5]. More specifically, we seek a scalar function of the system state that permits conclusions to be made regarding reachability without computing system trajectories. We refer to these as “altitude functions” to provide an intuitive sense of their role in reachability analysis: if some measure of “altitude” is low on the ISS and high on an SSI, and if the expected rate of change of altitude along system trajectories is nonincreasing, then it is unlikely for trajectories to reach this SSI from the ISS.

Part I of this paper [1] develops an S-HDS framework for modeling a broad range of social processes, including PEP, and we employ that framework here. Let $\Sigma_{\text{S-HDS}}$ denote a general S-HDS with bounded state space $Q \times X$, and suppose that the dynamics of $\Sigma_{\text{S-HDS}}$ is characterized by the infinitesimal generator $BA(q,x)$ [6]. We quantify the uncertainty associated with $\Sigma_{\text{S-HDS}}$ by specifying bounds on the possible values for some system parameters and perturbations and probabilistic descriptions for other uncertain system elements and disturbances. Given this representation for social processes, it is natural to seek a probabilistic assessment of system reachability.

We begin with an investigation of (probabilistic) reachability on infinite time horizons. The following result is proved in [7] and is instrumental in our development:
Lemma 2.1:
Consider a stochastic process $\Sigma_s$ with bounded state space $X$, and let $x(t)$ denote the stopped process associated with $\Sigma_s$ (i.e., $x(t)$ is the trajectory of $\Sigma_s$ which starts at $x_0$ and is stopped if it encounters the boundary of $X$). If $A(x(t))$ is a nonnegative supermartingale then for any $x_0$ and $\lambda > 0$

$$P\{\sup_{t \geq 0} A(x(t)) \geq \lambda \mid x(0) = x_0\} \leq A(x_0)/\lambda.$$

Denote by $X_0 \subseteq X$ and $X_u \subseteq X$ the initial state set and SSI, respectively, for the continuous system component of $\Sigma_{\text{S-HDS}}$, and assume that $X$ and the parameter set $\text{Par} \subseteq \Re^p$ are both bounded. Thus, for instance, the SSI is a subset of the continuous system state space $X$ alone; this is typically the case in applications and is easily extended if necessary. We are now in a position to state our first result:

Theorem 1: $\gamma$ is an upper bound on the probability of trajectories of $\Sigma_{\text{S-HDS}}$ reaching $X_u$ from $X_0$ while remaining in $Q \times X$ if there exists a family of differentiable functions $\{A_q(x)\}_{q \in Q}$ such that
▪ $A_q(x) \leq \gamma \;\; \forall x \in X_0, \; \forall q \in Q$;
▪ $A_q(x) \geq 1 \;\; \forall x \in X_u, \; \forall q \in Q$;
▪ $A_q(x) \geq 0 \;\; \forall x \in X, \; \forall q \in Q$;
▪ $BA_q(x) \leq 0 \;\; \forall x \in X, \; \forall q \in Q, \; \forall p \in \text{Par}$.
Proof:
As $BA_q(x)$ is the infinitesimal generator for $\Sigma_{\text{S-HDS}}$, the third and fourth conditions of the theorem imply that $A(q(t),x(t))$ is a nonnegative supermartingale $\forall p \in \text{Par}$. Thus, from Lemma 2.1 with $\lambda = 1$, we can conclude that

$$P\{x(t) \in X_u \text{ for some } t\} \leq P\{\sup_{t \geq 0} A(q(t),x(t)) \geq 1\} \leq A(q,x_0) \leq \gamma \quad \forall x_0 \in X_0, \; \forall q \in Q, \; \forall p \in \text{Par}. \;\; ∎$$
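To make the role of the supermartingale bound concrete, the following minimal sketch applies it to a toy process that is not taken from this paper: a biased random walk on the grid {0, 1/n, ..., 1}, stopped at the boundary, with a single discrete mode. If the walk steps up with probability at most 1/2, then A(x) = x is a nonnegative supermartingale with A = 1 on the target set {1}, so the argument above gives γ = A(x_0) = x_0 as an upper bound on the probability of reaching the target. The function names and the choice of process are assumptions made purely for illustration.

```python
import random


def altitude_bound(k0: int, n: int) -> float:
    """Bound gamma = A(x0) for the altitude function A(x) = x, with x0 = k0/n."""
    return k0 / n


def reach_prob_exact(k0: int, n: int, p_up: float) -> float:
    """Exact probability that the walk started at k0 hits n before 0
    (classical gambler's-ruin formula)."""
    if p_up == 0.5:
        return k0 / n
    r = (1.0 - p_up) / p_up
    return (1.0 - r ** k0) / (1.0 - r ** n)


def reach_prob_mc(k0: int, n: int, p_up: float, runs: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same reach probability (stopped at 0 or n)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        k = k0
        while 0 < k < n:
            k += 1 if rng.random() < p_up else -1
        hits += (k == n)
    return hits / runs


if __name__ == "__main__":
    gamma = altitude_bound(3, 10)
    print("upper bound gamma :", gamma)
    print("exact probability :", reach_prob_exact(3, 10, 0.45))
    print("simulated estimate:", reach_prob_mc(3, 10, 0.45))
```

For the unbiased walk the bound is attained exactly, while for a downward-biased walk it is conservative; in either case no trajectories need to be computed to obtain γ, which is the point of the altitude function construction.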
Remark 2.2:
Theorem 1 extends a similar result given in [4] to allow probability bounds to be established for reachability questions in the presence of set-bounded uncertainties.
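The requirement that the generator condition hold for every p in Par can be illustrated with a small numerical check. The sketch below assumes a hypothetical one-dimensional diffusion dx = -p dt + σ dW on X = [0, 1] with a single discrete mode, uncertain drift p in [p_min, p_max], initial set X_0 = [0, 0.2], and target set X_u = [0.9, 1]. For this diffusion the generator acts as BA(x) = A'(x)(-p) + (σ²/2) A''(x). The candidate altitude function A(x) = exp(k(x - 0.9)), the system itself, and all names are assumptions chosen for illustration; the test on a grid is only a sanity check of the four conditions of Theorem 1 and does not constitute a certificate.

```python
import math


def check_altitude(k, sigma, p_min, p_max, x0_max=0.2, x_u=0.9, n_grid=201, n_par=11):
    """Check the four conditions of Theorem 1 on a grid for the candidate
    A(x) = exp(k * (x - x_u)) and the assumed diffusion dx = -p dt + sigma dW.

    Returns (ok, gamma), where gamma is the largest value of A on X_0.
    """
    A = lambda x: math.exp(k * (x - x_u))
    dA = lambda x: k * A(x)            # first derivative of A
    d2A = lambda x: k * k * A(x)       # second derivative of A

    xs = [i / (n_grid - 1) for i in range(n_grid)]                    # grid on X = [0, 1]
    ps = [p_min + j * (p_max - p_min) / (n_par - 1) for j in range(n_par)]

    gamma = max(A(x) for x in xs if x <= x0_max)                      # condition 1 defines gamma
    cond2 = all(A(x) >= 1.0 - 1e-12 for x in xs if x >= x_u)          # A >= 1 on X_u
    cond3 = all(A(x) >= 0.0 for x in xs)                              # A >= 0 on X
    cond4 = all(                                                      # BA <= 0 for all x and p
        dA(x) * (-p) + 0.5 * sigma ** 2 * d2A(x) <= 1e-12
        for x in xs
        for p in ps
    )
    return (cond2 and cond3 and cond4), gamma


if __name__ == "__main__":
    print(check_altitude(k=3.0, sigma=0.5, p_min=0.5, p_max=1.0))   # conditions hold
    print(check_altitude(k=3.0, sigma=0.5, p_min=0.3, p_max=1.0))   # fails for weak drift
```

Widening the parameter set to include weaker drift makes the same candidate fail the generator condition, which shows why the bound must be established over all of Par rather than at a nominal parameter value.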
The preceding result characterizes reachability of S-HDS on infinite time horizons. In some situations, including important applications involving social systems, it is of interest to study system behavior on finite time horizons. The following result is useful for such analysis:
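Theorem 2: $\gamma$ is an upper bound on the probability of trajectories of $\Sigma_{\text{S-HDS}}$ reaching $X_u$ from $X_0$ during time interval $[0,T]$, while remaining in $Q \times X$, if there exists a family of differentiable functions $\{A_q(x,t)\}_{q \in Q}$ such that
▪ $A_q(x,t) \leq \gamma \;\; \forall (x,t) \in X_0 \times \{0\}, \; \forall q \in Q$;
▪ $A_q(x,t) \geq 1 \;\; \forall (x,t) \in X_u \times [0,T], \; \forall q \in Q$;
▪ $A_q(x,t) \geq 0 \;\; \forall (x,t) \in X \times \Re^+, \; \forall q \in Q$;
▪ $BA_q(x,t) \leq 0 \;\; \forall (x,t) \in X \times \Re^+, \; \forall q \in Q, \; \forall p \in \text{Par}$.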
Proof:
The proof follows immediately from that of Theorem 1 once it is observed that $P\{x(t) \in X_u \text{ for some } t \in [0,T]\} = P\{(x(t),t) \in X_u \times [0,T]\}$. ∎

The idea for the proof of Theorem 2 was suggested in [8].

The analytic methodology employed above can also be used to determine lower bounds on the probability of reaching an SSI. Consider a stochastic process $\Sigma_s$ with bounded state space $X$ and SSI $X_u \subseteq X$, and suppose it is of interest to determine a lower bound on the probability of $\Sigma_s$ reaching $X_u$ during some time interval $[0,T]$. Assume, for simplicity, that the dynamics of $\Sigma_s$ is such that $X_u$ is invariant (i.e., if $\Sigma_s$ enters $X_u$ it cannot escape this set); this situation is common in applications. We formulate the problem in terms of escaping $X_e = X \setminus X_u$, as the probability of reaching $X_u$ is identical to that of escaping $X_e$, and suppose that $x_0 \in X_e$ (otherwise $x_0 \in X_u$ and the problem is trivial). Let $X^* = X_e \times \{T\}$ denote the set obtained by augmenting $X_e$ with the time value $t = T$. Observe that

$$P\{(x(t),t) \in X^*\} = P\{x(t) \in X_e \text{ at } t = T\} = P\{x(t) \in X_e \;\; \forall t \in [0,T]\},$$

because once a trajectory escapes $X_e$ it cannot return to this set. Now an upper bound $\gamma$ can be determined for the probability of reaching $X^*$, $P\{(x(t),t) \in X^*\} \leq \gamma$, using the results developed in the preceding discussion. Then, since

$$P\{x(t) \in X_e \;\; \forall t \in [0,T]\} + P\{x(t^*) \notin X_e \text{ for some } t^* \in [0,T]\} = 1,$$

we can conclude

$$P\{x(t^*) \in X_u \text{ for some } t^* \in [0,T]\} = P\{x(t^*) \notin X_e \text{ for some } t^* \in [0,T]\} \geq 1 - \gamma.$$

Thus we have proved

Theorem 3:
Suppose $\gamma$ is an upper bound on the probability of $\Sigma_s$ reaching $X^* = X_e \times \{T\}$. Then $\gamma_{\text{lb}} = 1 - \gamma$ is a lower bound on the probability of $\Sigma_s$ reaching $X_u$ during the time interval $[0,T]$.

The preceding theoretical results are of direct practical interest only if it is possible to efficiently compute families of altitude functions $\{A_q(x)\}_{q \in Q}$. Toward that end, observe that the results presented in Theorems 1-3 specify convex conditions to be satisfied by the associated altitude functions. Thus the search for altitude functions can be formulated as a convex programming problem [9]. Moreover, if the system of interest admits a polynomial description (i.e., the system vector fields are polynomials and system sets are semialgebraic) and if we restrict our search to polynomial altitude functions, then the search can be carried out using sum of squares (SOS) optimization [4,10]. Importantly, this approach is tractable: for fixed polynomial degrees, the computational complexity of the associated SOS program grows polynomially in the dimension of the continuous state space, the cardinality of the discrete state set, and the dimension of the parameter space.

C. Reachability-based predictive analysis
Having formulated predictive analysis for social processes in terms of system reachability and presented a methodology for assessing reachability, we are now in a position to derive algorithms for predictive analysis. In what follows we focus on the tasks of predictability assessment and early warning analysis; algorithms for identifying measurables with predictive power and forming predictions are developed in [11].

Consider first predictability assessment as characterized in Definition 2.1. We have the following algorithms for infinite time horizon predictability:
Algorithm 2.1: IS predictability (outline)
Given: social process of interest is $\Sigma_{\text{S-HDS}}$, ISS = $X_0 \times P_0$, best estimate for initialization is $(x_0^*, p_0^*) \in X_0 \times P_0$, SSI = $X_s$, and acceptable level of variation = $\delta$.
Procedure:
▪ compute (upper bound for) probability $\gamma_{\max}$ of $\Sigma_{\text{S-HDS}}$ reaching $X_s$ from $X_0 \times P_0$;
▪ compute (upper bound for) probability $\gamma_{\text{BE}}$ of $\Sigma_{\text{S-HDS}}$ reaching $X_s$ from $(x_0^*, p_0^*)$;
▪ if $\gamma_{\max} - \gamma_{\text{BE}} > \delta$ then problem is IS unpredictable, else problem is IS predictable.
Note: $\gamma_{\max}$, $\gamma_{\text{BE}}$ can be computed using Theorem 1 and SOS programming.

Algorithm 2.2: ES predictability (outline)
Given: social process of interest is $\Sigma_{\text{S-HDS}}$, ISS = $X_0 \times P_0$, SSI = $\{X_{s1}, X_{s2}\}$, and acceptable level of uncertainty = $\delta$.
Procedure:
▪ compute (upper bound for) probability $\gamma_1$ of $\Sigma_{\text{S-HDS}}$ reaching $X_{s1}$ from $X_0 \times P_0$;
▪ compute (upper bound for) probability $\gamma_2$ of $\Sigma_{\text{S-HDS}}$ reaching $X_{s2}$ from $X_0 \times P_0$;
▪ if $\min(\gamma_1, \gamma_2) > \delta$ then problem is ES unpredictable, else problem is ES predictable.
Note: $\gamma_1$, $\gamma_2$ can be computed using Theorem 1 and SOS programming.

Remark 2.3:
IS and ES predictability can be assessed on finite time horizons in the same manner, with the required probability bounds being computed using SOS programming and the criteria given in Theorem 2.

We now examine the warning analysis problem specified in Definition 2.3. Assume given a social process $\Sigma_{\text{sp}}$, some SSI $X_s \subseteq X$, a finite time interval $[0,T]$, and a warning accuracy $\alpha \in (0,1]$. We are interested in two versions of the problem, corresponding to whether the event of interest involves $\Sigma_{\text{sp}}$ reaching $X_s$ or escaping from $X_s$. In either case, the warning is to be issued if and only if the probability of event occurrence during the time interval $[0,T]$ following this warning is at least $\alpha$. Following Definition 2.3, we seek an “indicator” $X_w \subseteq X$ with the property that if $\Sigma_{\text{sp}}$ enters $X_w$ then the probability of event occurrence is at least $\alpha$.

Consider first the situation in which the event to be anticipated is $\Sigma_{\text{sp}}$ reaching $X_s$. In this case, the objective is to identify the largest $X_w$, with $X_s \subseteq X_w$ necessarily, such that $x(0) \in X_w$ implies $P\{x(t) \in X_s \text{ for some } t \in [0,T]\} \geq \alpha$. Suppose, as above, that the dynamics of $\Sigma_{\text{sp}}$ is such that $X_s$ is invariant. The following algorithm provides a solution to this version of the warning problem:

Algorithm 2.3: reach warning analysis (outline)
Given: social process of interest is $\Sigma_{\text{sp}}$, SSI = $X_s$, and warning accuracy = $\alpha$.
Procedure:
Initialize $X_{w0} = X_s$.
For k = 1, 2, …, K:
▪ Incrementally enlarge $X^*_{wk} \supseteq X_{w(k-1)}$.
▪ Compute $\gamma_{\text{lb}} \leq P\{x(t) \in X_s \text{ for some } t \in [0,T] \mid x(0) \in X^*_{wk}\}$ (via Theorem 3 and SOS programming).
▪ If $\gamma_{\text{lb}} \geq \alpha$ set $X_{wk} = X^*_{wk}$ and RETURN, else STOP.
Note: There exist numerous methods for incrementally “growing” a sequence of nesting sets $X_{wk}$ [e.g., 12].

Next consider the case in which the event of interest is $\Sigma_{\text{sp}}$ escaping from $X_s$. Here the goal is to identify the largest $X_w$, with $X \setminus X_w \subseteq X_s$ necessarily, such that $x(0) \in X_w$ implies $P\{x(t) \notin X_s \text{ for some } t \in [0,T]\} \geq \alpha$. Suppose that $X \setminus X_s$ is invariant for the dynamics of $\Sigma_{\text{sp}}$. The next algorithm provides a solution to the escape warning problem:

Algorithm 2.4: escape warning analysis (outline)
Given: social process of interest is $\Sigma_{\text{sp}}$, SSI = $X_s$, and warning accuracy = $\alpha$.
Procedure:
Initialize $X_{w0} = X \setminus X_s$.
For k = 1, 2, …, K:
▪ Incrementally enlarge $X^*_{wk} \supseteq X_{w(k-1)}$.
▪ Compute $\gamma_{\text{lb}} \leq P\{x(t) \in X \setminus X_s \text{ for some } t \in [0,T] \mid x(0) \in X^*_{wk}\}$ (via Theorem 3 and SOS programming).
▪ If $\gamma_{\text{lb}} \geq \alpha$ set $X_{wk} = X^*_{wk}$ and RETURN, else STOP.

An alternative approach to identifying a warning indicator set $X_w$ for a given social process-event pair is to compare trajectories of $\Sigma_{\text{sp}}$ which lead to the event of interest with those that do not. If differences are found between the dynamics of the two classes of processes, and if these differences can be expressed in terms of some $X_w$, then this analysis identifies an empirically-grounded event indicator. Standard statistical or machine learning classification methods can be employed for this task in cases where there are sufficient data to enable an empirical comparison. For social processes, however, it is often the case that available data consist mainly of “positive” events, involving trajectories which lead to the event of interest. In such situations, it is sometimes possible to construct useful synthetic ensembles of negative events and to formulate a comparison study by combining the actual positive instances and the synthetic negative instances. This approach to warning analysis can be effective if good models are available for building the synthetic ensembles.

III. CASE STUDIES
This section presents three case studies involving predictive analysis for classes of social processes that have proven to be both practically important and challenging to predict. We begin with a discussion of predictability assessment for online markets and then address early warning analysis for social movements and mobilization/protest events.

A. Online markets
Consider an online market in which individuals visit a web site, browse an assortment of available items, and choose one or more items to download. An interesting and surprising characteristic of these markets (and many other markets as well) is that they are often both unequal and unpredictable: a few items capture a large share of the market, but which items achieve popularity appears to be hard to anticipate. For instance, the study reported in [2] created an artificial music market and demonstrated this phenomenon experimentally. Moreover, that work showed that increasing the opportunity for social influence increased both the inequality of the ultimate market shares and the unpredictability of which songs attained market dominance. Our study of CNET, the online software library, yielded similar results [3]. The positive externalities present in these markets make predictive analysis using standard methods a challenging undertaking.

We now assess the feasibility of forecasting ultimate market share in online markets. Consider a market visited by a sequence of consumers, with each visitor choosing between two items {A, B}; generalizing this simple binary choice setting to any finite number of choices is straightforward. We model this situation by supposing that agent i chooses item A with probability

$$\Sigma_{\text{online}}: \quad P_i(A) = \beta\pi + (1 - \beta) f$$

where $f \in [0,1]$ is item A’s current market share, $(1 - \beta)$ quantifies the intensity of social influence (with $\beta \in [0,1]$), and $\pi$ is the probability of an agent choosing A in the “no social influence” case (i.e., when $\beta = 1$). Agent i selects item B with probability $1 - P_i(A)$. In this model, $\pi$ can be interpreted as a measure of the “appeal” of item A (relative to B), $f$ is the social signal, and $\beta$ quantifies the relative importance of appeal and social influence in the decision-making process.
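The choice rule above is simple enough to simulate directly. The following sketch is one possible discrete-event reading of the model, under assumptions that the text does not specify: each agent makes exactly one choice, the market is seeded with one download per item so that f(0) = 1/2, and market share is the fraction of downloads that went to item A. The function names are hypothetical.

```python
import random


def choice_prob(beta: float, pi: float, f: float) -> float:
    """Probability that an agent chooses item A: beta*pi + (1 - beta)*f."""
    return beta * pi + (1.0 - beta) * f


def simulate_market(beta: float, pi: float, n_agents: int, seed: int = 0,
                    a0: int = 1, b0: int = 1) -> float:
    """Simulate n_agents sequential choices and return item A's final market share.

    a0 and b0 are assumed initial download counts (one each gives f(0) = 1/2).
    """
    rng = random.Random(seed)
    a, b = a0, b0
    for _ in range(n_agents):
        f = a / (a + b)                      # current market share of item A
        if rng.random() < choice_prob(beta, pi, f):
            a += 1
        else:
            b += 1
    return a / (a + b)


if __name__ == "__main__":
    # Identical appeal (pi = 1/2): compare the spread of outcomes across runs
    # for weak social influence (beta near 1) and strong social influence (beta near 0).
    for beta in (0.9, 0.1):
        shares = [simulate_market(beta, 0.5, 2000, seed=s) for s in range(10)]
        print(f"beta={beta}: min share={min(shares):.2f}, max share={max(shares):.2f}")
```

Running an ensemble of such simulations for different values of β gives an informal picture of how the spread of final market shares depends on social influence; it is a complement to, not a substitute for, the reachability bounds used in the analysis.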
The model $\Sigma_{\text{online}}$ is extremely simple, perhaps the simplest possible representation which captures the effects of both social influence and appeal in an online market. Nevertheless, this model is able to reproduce the key behaviors observed in the music market study described in [2], in our investigation of CNET site dynamics, and in other online markets (e.g., for books and DVDs) [3]. In particular, as social influence (SI) increases ($\beta$ decreases) both inequality and unpredictability of market shares increase. Thus, despite its simplicity, $\Sigma_{\text{online}}$ provides a useful starting point for studying predictability of online markets. Note that $\Sigma_{\text{online}}$ can be written in the form of the continuous system portion of the S-HDS model $\Sigma_{\text{S-HDS}}$, with state variables $x_1 = f$ and $x_2 = 1/(t+1)$. Consequently, the system’s reachability properties can be determined using Theorems 1-3.

We now investigate the predictability of ultimate market share for the system $\Sigma_{\text{online}}$. The standard approach to market share prediction is to assume that item appeal is a relevant measurable, estimate appeal in some way, and use this estimate to predict market share. To examine the utility of this approach, we assess ES predictability of market share for items with identical appeal ($\pi = 1/2$) and identical initial market shares ($f(0) = 1/2$). If it is reasonably likely that the market will evolve so one or the other item dominates ($f$ becomes large or small), then the market dynamics is not very dependent on item appeal and therefore is unpredictable using the standard approach. In this case we should seek a different prediction method, perhaps based on other measurables. Alternatively, if market dominance by either item is unlikely then the market dynamics depends on item appeal in a more predictable way and the standard method may be useful. We evaluate ES predictability, as specified in Definition 2.1, via Algorithm 2.2 and Theorem 1.
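Once probability bounds are available, the decision steps of Algorithms 2.1 and 2.2 reduce to simple comparisons. The sketch below encodes only those final comparisons; it does not compute the bounds, which in the paper come from Theorem 1 and SOS programming. The function names, the numerical inputs, and the threshold δ = 0.1 are illustrative assumptions, since the text does not state the value of δ used in the case study.

```python
def _check_probability(value: float, name: str) -> None:
    """Reject inputs that cannot be probabilities or probability bounds."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must lie in [0, 1], got {value}")


def is_unpredictable(gamma_max: float, gamma_be: float, delta: float) -> bool:
    """Algorithm 2.1 decision step: IS unpredictable if gamma_max - gamma_BE > delta."""
    _check_probability(gamma_max, "gamma_max")
    _check_probability(gamma_be, "gamma_be")
    return gamma_max - gamma_be > delta


def es_unpredictable(gamma_1: float, gamma_2: float, delta: float) -> bool:
    """Algorithm 2.2 decision step: ES unpredictable if min(gamma_1, gamma_2) > delta."""
    _check_probability(gamma_1, "gamma_1")
    _check_probability(gamma_2, "gamma_2")
    return min(gamma_1, gamma_2) > delta


if __name__ == "__main__":
    delta = 0.1  # assumed acceptable level of uncertainty
    # Both qualitatively different outcomes are fairly likely: unpredictable.
    print(es_unpredictable(0.35, 0.35, delta))
    # One outcome is likely and the other very unlikely: predictable.
    print(es_unpredictable(0.9, 1e-3, delta))
```

Separating the decision rule from the bound computation makes it easy to examine how sensitive the predictable/unpredictable verdict is to the choice of δ.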
Let the two SSI, $X_{s1}$ and $X_{s2}$, be defined to correspond to, respectively, market share equity ($f$ remaining near 1/2) and market share dominance ($f$ becoming large or small), and let $X_0$ be a small set surrounding $f(0) = 1/2$, the identical initial market share condition. Then, if both $X_{s1}$ and $X_{s2}$ are likely to be reached from $X_0$, the problem is ES unpredictable (and also unpredictable from a practical viewpoint). See Figure 1 for a diagram depicting the basic setup.

As an illustration of the insights obtainable with such analysis, consider the high SI case corresponding to small $\beta$ in $\Sigma_{\text{online}}$. For a broad range of noise models, the analysis generates relatively high probability bounds for reachability of both $X_{s1}$ and $X_{s2}$ from $X_0$ (e.g., $\gamma \in [0.3, 0.4]$ is typical). Thus two qualitatively different outcomes, market share equity ($X_{s1}$) and market share dominance ($X_{s2}$), are both likely, indicating that the system is ES unpredictable. This result is consistent with empirical findings [e.g., 2] and suggests that the standard approach to market share prediction is not likely to produce accurate forecasts.

Next consider the problem of searching for alternative measurables which provide better predictability properties in the high SI case. For example, it might be supposed that very early market share time series data would be useful for prediction when SI is high. The intuition behind this idea is that the “herding” behavior that can arise from SI, and which makes market prediction hard using standard methods, may lead to a lock-in effect, in which very early market share leaders become difficult to displace. To test this hypothesis, define $X_0^*$ to be a small set surrounding $f(t^*) = 1/2$, where $t^*$ is a small but nonzero time (see Figure 1). We compute, using Theorem 1 and SOS programming, an upper bound on the probability that $\Sigma_{\text{online}}$ with $\pi = 1/2$ will evolve from $X_0^*$ to $X_{s1}$ and $X_{s2}$.
In this case, the analysis generates large upper bounds for the probability of reaching $X_{s1}$ and small bounds for the probability of reaching $X_{s2}$ (typical bounds are on the order $\gamma_1 \sim 0.9$ and $\gamma_2 \sim 10^{-3}$, respectively). Thus using very early time series data to refine the ISS produces a more predictable situation, in which indistinguishable market configurations evolve to indistinguishable outcomes.

B. Social movements
Social movements are large, informal groupings of individuals and/or organizations focused on a particular issue, for instance of political, social, economic, or religious significance. There is considerable interest in developing methods for distinguishing successful social movements, that is, movements which attract significant followings, from unsuccessful ones early in their lifecycle. This task is naturally cast as a warning problem within the proposed approach to predictive analysis. We study the problem in two phases: 1.) a theoretical investigation, in which a collection of general models for social movement dynamics are analyzed, and 2.) an empirical study, involving the emergence and diffusion of Sweden’s Social Democratic Party (SDP).

We begin with the theoretical investigation. Movement success is quantified by defining an SSI, $X_s$, that corresponds to a level of movement membership consistent with movement goals, and we seek to identify an indicator set $X_w$ which permits early recognition of those movements that are likely to evolve to $X_s$ (see Definition 2.3). We construct a class of social movement models within a diffusion of innovations framework [11]; the resulting models are of the general form $\Sigma_{\text{S-HDS}}$ developed in Part I of this paper [1] and are consistent with the social movement theory (SMT) literature [e.g., 13]. In particular, the models capture important structural features of social networks, including the existence and topology of social contexts, that is, localized social settings defined by work, family, or physical neighborhood within which close interactions take place [1].

We conduct reach warning analysis by employing the procedure outlined in Algorithm 2.3. Briefly, the theoretical study produced two main results. First, the degree to which movement-related activity shows early diffusion across multiple social contexts is a powerful distinguisher of successful and unsuccessful social movements.
Indeed, this measurable has considerably more predictive power than the magnitude of such activity and also more power than various system intrinsics. Second, large social movements occur with finite probability only if 1.) the intra-context “infectivity” of the movement exceeds a certain threshold, and 2.) the inter-context interactions associated with the movement take place with a frequency that is larger than another threshold.

The latter result is particularly interesting, as it is reminiscent of, and significantly extends, well-known results for epidemic thresholds in disease propagation models. For instance, the characterization of intra-context infectivity generalizes the notion of epidemic reproduction number [11] to social movements. More intriguing is the completely new condition on inter-context interactions: in order for a social movement to propagate “globally”, that is, to extend into social contexts beyond its original local setting, the probability of context interaction must exceed a threshold value. This threshold behavior is depicted in Figure 2, which shows the way the probability of realizing global propagation depends on the rate at which individuals interact across social contexts; it can be seen that this dependency exhibits a classic threshold behavior. Note that the probabilities shown in Figure 2 are provably-correct upper bounds for the global cascade probabilities and were obtained using Theorem 1 and SOS programming.

Fig. 1. Setup for online market predictability assessment.

Fig. 2. Probability of global diffusion of social movement as a function of context switching rate.

The empirical investigation of early warning analysis for social movements focuses on the emergence and growth of the Swedish SDP. The case of the SDP is particularly relevant for our purposes, as the early activities of political “agitators” associated with the SDP led to the establishment of a well-defined and well-documented network linking previously disparate geographically- and demographically-based social contexts in Sweden [13]. We explore the role played by this inter-context network by analyzing archived data [14] and published accounts describing the dynamics of the SDP. Our investigation uses standard time series analysis techniques similar to those employed in [13], and reveals that an important predictor of SDP spatio-temporal dynamics is early diffusion of SDP-related activity across geographically-based social contexts. Thus both the theoretical and empirical investigations suggest that early social network dynamics are critical to social movement success.

C. Mobilization / protest
This case study examines whether diffusion across social contexts is a useful early indicator for successful mobilization and protest events. The investigation focuses on Muslim reaction to six recent incidents, each of which appeared at the outset to have the potential to trigger significant protests:
▪ publication of photographs and accounts of prisoner abuse at Abu Ghraib in Spring 2004;
▪ publication of cartoons depicting Mohammad in the Danish newspaper Jyllands-Posten in September 2005;
▪ distribution of the DVD “I was blind but now I can see” in Egypt in October 2005;
▪ the lecture given by Pope Benedict XVI in September 2006 quoting controversial material concerning Islam;
▪ Salman Rushdie being knighted in June 2007;
▪ republication of the “Danish cartoons” in various newspapers in February 2008.

Recall that the first Danish cartoons event ultimately led to substantial Muslim mobilization, including massive protests and considerable violence, and that the Egypt DVD event also resulted in significant Muslim protest and violence. In contrast, Muslim outrage triggered by Abu Ghraib, the pope lecture, the Rushdie knighting, and the second Danish cartoons event all subsided quickly with essentially no violence. Therefore, taken together, these six events provide a useful setting for testing whether the extent of early diffusion across social contexts can be used to distinguish nascent mobilization events which become large and self-sustaining (and potentially violent) from those that quickly dissipate.

A central element in the proposed approach to early warning analysis is the measurement, and appropriate processing, of social dynamics associated with the process of interest. In the present case study we use online social activity as a proxy for real world diffusion of mobilization-relevant information. More specifically, we use blog communications and discussions as our primary data set.
The “blogosphere” is modeled as a graph composed of two types of vertices, the blogs themselves and the concepts which appear in them. Two blogs are linked if a post in one hyperlinks to a post in the other, and a blog is linked to a concept if the blog contains (significant) occurrences of that concept. Among other things, this blog graph model enables the identification of blog communities, that is, groups of blogs with intra-group edge densities that are significantly higher than expected [15]. In what follows, these blog communities serve as one proxy for social contexts.

Consider the problem of deriving early indicators that reliably distinguish successful and unsuccessful mobilization/protest events. We adopt an approach which is analogous to that used in the preceding case study, quantifying mobilization success in terms of an SSI $X_s$ that is “large enough”, and seeking to identify an indicator condition that permits early recognition of events likely to evolve to $X_s$. In this case study, however, we employ the second of the two methods for identifying warning indicators given in Section II.C. Thus the warning condition is derived by comparing trajectories of the social process which led to successful mobilization events with those that did not. Because in the present application the available data are insufficient to support a purely empirical analysis, these data are augmented through construction of synthetic ensembles of events. This approach is feasible because the social diffusion model $\Sigma_{\text{S-HDS}}$ presented in [1] provides a reasonable mechanism for generating these ensembles.

More specifically, the following procedure is proposed for mobilization/protest warning analysis using blog data. Given a potential triggering event of interest:
1. Use key words and concepts associated with the triggering event to collect relevant blog posts and build the associated blog graph.
2. Identify the relevant blog social contexts (e.g., graph community-based, language-based).
3. Assemble post volume time series for each social context and compute the post/context entropy (PCE) time series associated with the post volume time series.
4. Construct a synthetic ensemble of PCE time series from (actual) post volume dynamics using the S-HDS social diffusion model $\Sigma_{\text{S-HDS}}$ [1].
5. Perform motif detection: compare the actual PCE time series to the synthetic ensemble series to determine if the early diffusion of activity across contexts is “excessive”.

We now provide a few additional details concerning this procedure. Step 1 is by now standard, and various off-the-shelf tools exist which can perform this task. In Step 2 we use two definitions for blog social context: graph-based, in which contexts are graph communities identified through community extraction applied to the blog graph [e.g., 15], and language-based, in which contexts are defined based on the language of the posts. In Step 3, post volume for a given context i and sampling interval t is obtained by counting the number of relevant posts made in the blogs comprising context i during interval t. PCE for a given sampling interval t is defined as follows:

$$\mathrm{PCE}(t) = -\sum_i f_i(t) \log f_i(t),$$

where $f_i(t)$ is the fraction of total relevant posts made during interval t which occur in context i.

Given the post volume time series obtained in Step 3, Step 4 involves the construction of an ensemble of PCE time series which would be expected under “normal circumstances”, that is, if Muslim reaction to the triggering event diffused from a small “seed set” of initiators according to SMT social dynamics. For this study, we use the multi-scale social diffusion model $\Sigma_{\text{S-HDS}}$ given in [1] to generate the PCE time series ensembles. Finally, motif detection in Step 5 is carried out by searching for time periods, if any, during which the actual PCE time series exceeds the mean of the synthetic PCE ensemble by at least two standard deviations.

Sample results of applying the proposed approach to early warning analysis to the Islamic mobilization case study are shown in Figure 3. It can be seen that early diffusion of discussions across blog communities is, indeed, an indicator that the associated Islamic mobilization event will be large.
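As a brief aside, the PCE computation of Step 3 and the two-standard-deviation test of Step 5 can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names are hypothetical, the logarithm is assumed to be natural, the convention 0 log 0 = 0 is assumed, the sample standard deviation is assumed, and the synthetic ensemble is simply passed in as a list of series rather than generated by the S-HDS model of [1].

```python
import math
from statistics import mean, stdev


def pce(volumes):
    """Post/context entropy for one sampling interval.

    volumes: relevant post counts, one per social context.
    Returns -sum_i f_i * log(f_i), where f_i is the fraction of posts
    in context i. Assumes natural log and 0 * log(0) = 0; an interval
    with no posts is assigned entropy 0.0.
    """
    total = sum(volumes)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in volumes:
        if count > 0:
            fraction = count / total
            entropy -= fraction * math.log(fraction)
    return entropy


def pce_series(volumes_by_interval):
    """PCE time series from per-interval lists of per-context post counts."""
    return [pce(volumes) for volumes in volumes_by_interval]


def detect_motifs(actual, ensemble, num_std=2.0):
    """Indices of intervals where the actual PCE is 'excessive'.

    actual: observed PCE time series.
    ensemble: list of synthetic PCE series, each as long as `actual`
        (at least two series are needed for a sample standard deviation).
    An interval is flagged when the actual PCE exceeds the ensemble mean
    by at least num_std ensemble standard deviations.
    """
    flagged = []
    for t, value in enumerate(actual):
        samples = [series[t] for series in ensemble]
        mu = mean(samples)
        sigma = stdev(samples)
        if value > mu and value - mu >= num_std * sigma:
            flagged.append(t)
    return flagged
```

For instance, calling `pce_series` on per-interval counts such as `[[10, 0, 0], [8, 1, 1], [4, 3, 3]]` yields a PCE series that rises as posting activity spreads across the three contexts, and `detect_motifs` then reports the intervals in which that rise is large relative to the supplied ensemble.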
Early diffusion of discussions across blog communities is observed in the mobilization associated with the first Danish cartoons and Egypt DVD events and not with the other four events, and this early diffusion is excessive relative to the synthetic ensemble. More specifically, in the case of the first Danish cartoons event, the PCE of relevant discussions (blue curve) experiences a dramatic increase a few weeks before the corresponding increase in volume of blog discussions (red curve); this latter increase, in turn, takes place before any violence (see Figure 3). In contrast, in the case of the pope event, PCE of blog discussions is small relative to the cartoons event, and any increase in this measure lags discussion volume. Similar curves are obtained for the other four events. More importantly, the proposed motif detection process also yields the expected result: motifs are found only for the Danish cartoons and Egypt DVD events, and these motifs precede significant blog volume and real world violence. Note that qualitatively similar results are obtained for the graph community-based and language-based definitions of social context. This case study suggests that early diffusion of mobilization-related activity (here blog discussions) across disparate social contexts may be a useful early indicator of successful mobilization events.

REFERENCES
[1] Colbaugh, R. and K. Glass, “Predictive analysis for social processes I: Multi-scale hybrid system modeling”, Proc. 18th IEEE International Conference on Control Applications, St. Petersburg, Russia, July 2009.
[2] Salganik, M., P. Dodds, and D. Watts, “Experimental study of inequality and unpredictability in an artificial cultural market”, Science, Vol. 311, pp. 854-856, 2006.
[3] Colbaugh, R. and K. Glass, “Predictability and prediction of social processes”, Proc. 4th Lake Arrowhead Conference on Human Complex Systems, Lake Arrowhead, CA, April 2007 (invited talk).
[4] Prajna, S., A. Jadbabaie, and G. Pappas, “A framework for worst case and stochastic safety verification using barrier certificates”, IEEE Trans. Automatic Control, Vol. 52, pp. 1415-1428, 2007.
[5] Sontag, E., Mathematical Control Theory, Second Edition, Springer, NY, 1998.
[6] Bujorianu, M. and J. Lygeros, “General stochastic hybrid systems: Modeling and optimal control”, Proc. 43rd IEEE Conference on Decision and Control, Bahamas, December 2004.
[7] Kushner, H., Stochastic Stability and Control, Academic Press, NY, 1967.
[8] Papachristodoulou, A., Personal communication, November 2007.
[9] Parrilo, P., Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization, PhD dissertation, California Institute of Technology, 2000.
[10] Colbaugh, R. and K. Glass, “Predictive analysis for social processes”, Sandia National Laboratories SAND Report 2009-0584, Jan. 2009.
[12] Ciliz, M. and A. Harova, “Estimation of the basins of attraction of recurrent type analog neural networks”, Proc. 31st IEEE Conference on Decision and Control, Tucson, AZ, December 1992.
[13] Hedstrom, P., R. Sandell, and C. Stern, “Mesolevel networks and the diffusion of social movements: The case of the Swedish Social Democratic Party”, American Journal of Sociology, Vol. 106, pp. 145-172, 2000.
[14] Swedish National Data Service, Popular Movement Archive, 1881-1950 Social-Democratic Labour Party of Sweden, Data Set 062, accessed 2007.
[15] Newman, M., “The structure and function of complex networks”, SIAM Review, Vol. 45, pp. 167-256, 2003.
Time series motif analysis (plot panels: blog activity versus date)
Event: Motif
Danish cartoons 1: 1/1–1/26/2006
Egypt DVD release: 10/2–10/9/2005
Abu Ghraib story: none
Pope lecture: none
Rushdie knighting: none
Danish cartoons 2: none