Modelling heterogeneous outcomes in multi-agent systems
Orowa Sikder
Department of Computer Science
University College London
London, WC1E 6BT
[email protected]
July 3, 2020

ABSTRACT
A broad set of empirical phenomena in the study of social, economic and machine behaviour can be modelled as complex systems with averaging dynamics. However, many of these models naturally result in consensus or consensus-like outcomes. In reality, empirical phenomena rarely converge to these and are instead characterized by rich, persistent variation in the agent states. Such heterogeneous outcomes are a natural consequence of a number of models that incorporate external perturbation to the otherwise convex dynamics of the agents. The purpose of this paper is to formalize the notion of heterogeneity and demonstrate which classes of models are able to achieve it as an outcome, and therefore are better suited to modelling important empirical questions. We do so by determining how the topology of (time-varying) interaction networks restricts the space of possible steady-state outcomes for agents, and how this is related to the study of random walks on graphs. We consider a number of intentionally diverse examples to demonstrate how the results can be applied.
Many empirical phenomena in the study of complex systems can be modelled as a form of averaging dynamics. In such models, we are interested in modelling the dynamic and steady-state behaviour of a large set of interacting agents, where a key feature is that agents tend to move closer to the states of their neighbours over time. For example, in the study of social behaviour, such models are often employed in the field of social learning or opinion dynamics, where agents each possess a state expressing their opinion, and agents' opinions will gravitate towards the opinions of other agents they interact with regularly (such as DeGroot [1] or bounded confidence models [2, 3]). In studies of economic behaviour, such models arise naturally in the study of strategic complements or coordination games, and are closely related to the theory of linear-quadratic games [4]. Averaging dynamics also play a key role in the study of animal behaviour in the form of swarm dynamics [5], and more recently have been applied to understand some elements of machine behaviour [6], such as the feedback loop between recommender systems and user preferences [7].

Such models of averaging dynamics overlap in an important way with control theory and the design of algorithms that can distribute computation over decentralised agents. In an important pair of reviews [8, 9], the authors synthesize a rich set of theoretical results from across opinion dynamics and control theory to demonstrate this correspondence.
In particular, the reviews discuss in detail the conditions under which we expect classes of models to converge, and if they do converge, whether this results in a consensus where all agents achieve the same state, or near-consensus outcomes where subsets of agents all achieve the same state.

While the convergence of such models to consensus is certainly a desirable outcome in the design of algorithms and control systems, in general this is not what we would like when we are modelling real empirical phenomena, which are often characterised by what we might refer to informally as "rich" outcomes: persistent individual variation which tends towards a continuum of outcomes as the size of the models grows, allowing for complex distributions to be realised in the steady state. We refer to these as heterogeneous outcomes, with a formal definition to follow (see Figure 2 for a visual depiction). Indeed, in a review of social learning models [10], the author concludes: "Long-run consensus is a central finding throughout this literature, occurring for a wide range of information structures and decision rules in large classes of networks... The consistency of this finding may cause some discomfort because we often observe disagreement empirically, even about matters of fact. Explaining such disagreement is an important task for this literature going forward."

An important class of averaging dynamics models that do tend to provide us with these heterogeneous outcomes are those models that include "stubborn agents" [11] or "zealots" [12], which influence the dynamics of other agents but are not influenced themselves. Another class which provides these outcomes are Friedkin-Johnsen models [13] and their time-varying alternative [14], which are characterised by "prejudiced" agents whose dynamics are perturbed by an external vector referred to as the agent's prejudice (in the original models an agent's prejudice was their state at time 0, demonstrating a form of hysteresis).

Clearly, there exists some correspondence between these classes of models that allow for the modelling of heterogeneous outcomes, and one might speculate that it is intimately related to the connectivity of the (potentially time-varying) interaction network between agents. The objective of this paper is to formalise this notion, and provide a set of necessary criteria for a model of averaging dynamics to result in heterogeneous outcomes. As such, we hope to provide utility for future research endeavours and support the explicit construction of models that are better suited to real-world phenomena.

In order to do this we proceed in the following manner. We first introduce a "generalised" model of averaging dynamics that can realise as a special case a number of different existing influential models.
We then demonstrate that this general class is isomorphic to an "augmented" graph representation that allows us to explicitly tie the steady-state outcomes to graph-theoretic features. We provide a number of theorems that develop necessary conditions for heterogeneity across our broad class of models. We also provide a conceptual bridge between these models and the study of random walks on graphs, and show how such a framework can provide a valuable shortcut to evaluate how topological features of the graph can dictate the distributional outcomes for the agents. Finally, we consider an intentionally diverse set of examples, where we demonstrate the application of different sets of results to models in social, animal, machine and economic behaviour.

We believe the key contributions of this paper are threefold. Firstly, we provide a general and intuitive framework of averaging dynamics in multi-agent systems that contains as a special case many important and influential models in the field. Secondly, we introduce a formal notion of heterogeneity and establish the necessary conditions to achieve these outcomes in the steady state, through a modest generalisation of existing results on the convergence of such models. Finally, we establish conditions for the convergence of these heterogeneous models and provide an intuitive framework to understand such models in the theory of random walks on graphs.

Consider a set of agents V = {v_1, v_2, …, v_N}, where each agent i possesses a state x_i ∈ X, where X is some compact subset of R^d. The states of all agents can be represented by x = (x_1, x_2, …, x_N) ∈ X^N = 𝒳 ⊂ R^{N×d}.
For this paper, we will be interested in the class of quasi-linear update dynamics that can be represented as:

x(t + 1) = (I − Λ)A(t)x(t) + Λb(t)    (1)

where the adjacency matrix A(t) is a row-stochastic matrix implicitly representing interactions between agents, b(t) ∈ 𝒳 is a private signal associated with each agent, and Λ is a diagonal matrix with Λ_ii = λ_i ∈ [0, 1], a parameter that weights the influence of each update component for the i-th node. The greater λ_i, the more weight each agent places on the set of private signals, and the less weight they place on "social" signals. Note that the framework is not very restrictive, as A(t) = A(x(t)) = f(x(t)) can represent any function f of x(t) such that f_i(x(t)) can be expressed as a convex combination over the states x of agents at time t. Similarly, b(t) = b(x(t)) can be quite broad, representing any function g : 𝒳 → 𝒳.

This general class of update dynamics contains as special cases a number of different existing and influential models. For example, traditional DeGroot updating [1] can be retrieved for
Λ = 0 and A(t) = A. Similarly, bounded confidence models [15] can also be characterised by Λ = 0, but with A(t) = A(x(t)), where the weights between nodes are determined by whether two agents are within the threshold required to interact. By allowing for Λ > 0, we include models that incorporate private signals for each agent. (The results in this paper can also be easily generalised to the case where Λ = Λ(t), but this necessitates a great deal of extra notation; we assume a fixed Λ for simplicity.) For example, for Λ > 0, A(t) = A and b(t) = b, we return to the standard Friedkin-Johnsen model [13], and allowing A(t) to vary gives us its time-varying alternative [14]. If b(t) is drawn at each time step from some distribution ρ_b over 𝒳, we can model noisy averaging processes. Finally, if b(t) = b(x(t)) we can model processes where the private signals received by agents are endogenous and incorporate feedback loops, as the authors previously explored in [7].

Figure 1: Illustrating how an original graph G with private signals (left) can be re-formulated as an augmented graph G̃ (right). The state space of the original nodes is [−1, +1]^N, so only two ghost nodes are required, one with value +1 and one with value −1. Nodes with a darker shade have a more positive steady state, and we can see on the augmented graph that nodes with a heavier weight on the positive ghost node tend to have a more positive steady state. An alternative formulation would provide each original node with its own pair of ghost nodes.

In order to develop properties for this general class of models, it is useful to consider an alternative "augmented" representation of the original affine form:

x̃(t + 1) = Ã(t)x̃(t)    (2)

where:

Ã(t) = [(I − Λ)A(t), ΛW(t); 0, I]    (3)

x̃(t) = [x(t); C]    (4)

Here Ã(t) is a row-stochastic matrix that includes as blocks the original update matrix A(t) as well as a weight matrix W(t), where W(t)C = b(t). The block C includes the upper and lower bounds of each dimension in 𝒳, meaning that any b(t) ∈ 𝒳 can be represented with an appropriate choice of weights W(t). One way of interpreting this augmented representation is to consider a set of "ghost nodes" associated with each of the original agents that represent the upper and lower bounds of 𝒳. The "private signals" can be re-interpreted as the weight each agent places on their ghost nodes.
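To make the correspondence concrete, here is a minimal numerical sketch (our own illustration with arbitrary example values for A, Λ and b; not code from the paper) checking that iterating Equation (1) and iterating the augmented system of Equations (2)-(4) produce the same trajectory:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.4, 0.4, 0.2]])          # row-stochastic interactions
lam = np.array([0.2, 0.1, 0.3])          # diagonal of Lambda
b = np.array([0.9, 0.1, 0.5])            # private signals, states in X = [0, 1]
x0 = rng.random(N)

# Direct iteration of Equation (1): x(t+1) = (I - Lambda) A x(t) + Lambda b.
x = x0.copy()
for _ in range(500):
    x = (1 - lam) * (A @ x) + lam * b

# Augmented form of Equations (2)-(4): two ghost nodes hold the bounds
# C = (0, 1) of X, and W is chosen so that W C = b (here W[i] = (1 - b_i, b_i)).
C = np.array([0.0, 1.0])
W = np.column_stack([1 - b, b])
A_aug = np.zeros((N + 2, N + 2))
A_aug[:N, :N] = (1 - lam)[:, None] * A   # (I - Lambda) A block
A_aug[:N, N:] = lam[:, None] * W         # Lambda W block
A_aug[N:, N:] = np.eye(2)                # ghost nodes are fixed points
x_aug = np.concatenate([x0, C])
for _ in range(500):
    x_aug = A_aug @ x_aug

assert np.allclose(x, x_aug[:N])         # the two dynamics coincide
```

Because the ghost rows of Ã are identity rows, the ghost states stay fixed at the bounds C, which is exactly what lets the affine term Λb be absorbed into a purely linear update.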
More technical details of this augmented structure are provided in the Appendix.

The convenience of the augmented framework in Equation (2) is that we can ignore the "augmented" nature of the dynamics and just consider the general properties of (time-varying) linear updates on a modified graph, without having to make specific considerations for the effect of private signals. The topology of the graph in question will dictate specific features of the dynamics. For example, if there are no private signals in our model (i.e. Λ = 0), then the augmented matrix Ã(t) will simply have a block-diagonal structure, and the graph it implicitly represents will have ghost nodes disconnected from the original set of nodes. On the other hand, if the private signals exist but vary over time, then the edge weights to the corresponding ghost nodes change over time. Regardless of the underlying dynamics we choose, the states of the agents at any time step t can be represented compactly as:

x̃(t) = ∏_{k=0}^{t} Ã(k) x̃(0) = Ã(t : 0) x̃(0)    (5)

where the notation M(t : t_0) represents the product of the matrices M(k) from t_0 to t.

Therefore, until stated otherwise, we ignore the special structure of the affine updates with private signals, and simply consider a generic set of nodes V with states x(t) and a general linear update A(t):

x(t + 1) = A(t)x(t)    (6)

By analyzing properties of this generic update, we can draw important conclusions about the general class of dynamics expressed in Equation 1.

We are interested in analyzing the steady-state outcome x* of such averaging dynamics, assuming it exists. The following definitions are useful in this regard:

Definition 1 (Convergence). A model converges iff for any initial state x(0), the limit x* = lim_{t→∞} x(t) exists.

Definition 2 (Consensus). A convergent model reaches consensus iff lim_{t→∞} x_i(t) = lim_{t→∞} x̄(t) for all i ∈ V.

We can see that a set of dynamics that result in consensus will have each agent possess the same steady state asymptotically. It turns out that this is characteristic of a broad class of models [10], even though the vast majority of real phenomena do not exhibit this. Persistent individual variation across agents is a key empirical feature we want to be able to capture, and in fact has been flagged as a shortcoming of models that always result in consensus [10]. In order to characterise non-consensus outcomes, consider the following:
Definition 3 (Heterogeneity). Define the heterogeneity of x as:

H(x) = min_{i ≠ j} |x_i − x_j|    (7)

A state x is heterogeneous iff H(x) > 0.

Definition 4 (Fragmentation). A non-consensus state x is fragmented iff H(x) = 0.

Intuitively, fragmented states are those where multiple steady-state outcomes exist, but multiple agents still converge to the same outcome. A typical example of this are the outcomes that result from bounded confidence models, where for small enough thresholds ε, the agents will fragment into a set of m outcomes, where typically m ≪ n. While this is certainly a step in the right direction with regards to modelling "rich" outcomes for the agents, we still might suppose that outcomes should approach a continuum of steady states as n grows large. The notion of heterogeneity captures this intuition. Heterogeneous outcomes are precisely those with persistent individual variation: each agent possesses a unique steady-state outcome. An example is provided in Figure 2, where we contrast the fragmented outcome of the dynamics of a bounded confidence model with the distributional outcomes of a biased learner model.

We now show that the topology of the graph implicitly represented by the linear updates A(t) can dictate some important properties of the steady states that can be generated. In order to do so we need to be able to characterise time-varying interactions, and we do so using a long-run interaction graph, as in [9]. We define the infinite graph G_∞ = {V, E_∞}, where (i, j) ∈ E_∞ ⟺ Σ_t A_ij(t) = ∞. That is, if an edge exists between i and j, it implies they interact indefinitely over the course of the dynamics.

An important result in [16, 17], summarised in [9], offers sufficient conditions to characterise steady-state outcomes. In particular, suppose there exists some t_0 such that the following assumptions hold for all t ≥ t_0:
1. A_ij(t) ∈ {0} ∪ [δ, 1]
2. A_ii(t) ≥ δ, ∀i
3. A_ij(t) > 0 ⟺ A_ji(t) > 0

for some δ > 0. Assumptions (1) and (2) are what we refer to as regularity assumptions and are relatively mild. They ensure that all persistent interactions between agents are edges of the graph G_∞, and that the graph G_∞ has self-loops (and is therefore aperiodic). Assumption (3) is a bit stronger and ensures that G_∞ is undirected.

Figure 2: (a, left) A typical example of non-heterogeneous outcomes in bounded confidence models: Hegselmann-Krause dynamics for the one-dimensional case for two different values of ε, resulting in fragmented (top) and consensus (bottom) outcomes in the asymptotic distribution. Taken from [2]. (b, right) A heterogeneous outcome as seen in the biased learner model, where agents converge to a continuum of steady-state outcomes instead. The histogram indicates steady-state outcomes of numerical simulations and the solid lines indicate a kernel density estimate, which can be shown to approximate the continuum outcome as N → ∞ [7].

Under these assumptions, it is shown (Lemma 1, [9]) that the dynamics will converge, and each connected component of G_∞ will converge to the same steady state. One particularly important consequence of this lemma is that it implies that even for graphs that are always time-varying (i.e. the sequence of graphs {G(A(t))} does not converge), the states of the agents will converge. In other words, convergence of states is robust in this instance of time-varying graphs.

Before drawing our conclusions, however, we would like to generalise this a bit further to directed graphs G_∞, since our private signal models are characterised by directed edges in the form of connections to the ghost nodes. We make use of the following conventions. A strongly connected component (SCC) is a "sink" SCC (SSCC) if all outgoing paths from the SCC are reciprocated. Intuitively, these SCCs form the sinks in the meta-graph constructed from the SCCs of the original graph. A graph consists of SSCCs if for every path between nodes i and j, a path from j to i exists.
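These graph-theoretic notions can be computed directly. The following is a small self-contained sketch (our own helper functions, not from the paper) that decomposes a directed graph into SCCs with Kosaraju's algorithm and classifies its nodes; an SCC is a sink exactly when no edge leaves the component:

```python
def sccs(adj):
    """Kosaraju's algorithm: strongly connected components of a digraph
    given as {node: set of successors}. Returns (list of SCCs, node -> SCC index)."""
    order, seen = [], set()
    def dfs(u):
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                dfs(v)
        order.append(u)                      # post-order finish times
    for u in adj:
        if u not in seen:
            dfs(u)
    radj = {u: set() for u in adj}           # reversed graph
    for u, vs in adj.items():
        for v in vs:
            radj[v].add(u)
    comps, comp, seen = [], {}, set()
    for u in reversed(order):                # second pass on the reversed graph
        if u in seen:
            continue
        stack, cur = [u], set()
        seen.add(u)
        while stack:
            w = stack.pop()
            cur.add(w)
            comp[w] = len(comps)
            for v in radj[w]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(cur)
    return comps, comp

def classify(adj):
    """Split nodes into sink SCCs (no edge leaves the component) and
    quasi-connected nodes (everything outside a sink SCC)."""
    comps, comp = sccs(adj)
    sinks = [c for c in comps
             if all(comp[v] == comp[u] for u in c for v in adj[u])]
    sink_nodes = set().union(*sinks) if sinks else set()
    return sinks, set(adj) - sink_nodes

# Node 2 behaves like a stubborn agent (its only edge is a self-loop):
sinks, quasi = classify({0: {1}, 1: {0, 2}, 2: {2}})
assert sinks == [{2}] and quasi == {0, 1}
```

Note that "no edge leaves the SCC" is equivalent to the reciprocation condition above: if an outgoing edge were reciprocated by a return path, its endpoint would belong to the same SCC.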
An obvious and important case of SSCC-only graphs are strongly connected graphs, but the class also includes graphs with multiple fully disconnected strongly connected components. All nodes outside the SSCCs are denoted as "quasi-connected". Intuitively, a node i is quasi-connected if it possesses a path to some j while the path from j to i does not exist. A graph that contains quasi-connected nodes is a quasi-connected graph. It is easy to see that all nodes are either quasi-connected or a member of an SSCC.

Theorem 1
Suppose Assumptions (1) and (2) hold for t ≥ t_0. All sink strongly connected components of G_∞ will converge to consensus.

Importantly, this means that all strongly connected graphs G_∞ converge to consensus. We can also immediately see that:

Corollary 1
Suppose Assumptions (1) and (2) hold for t ≥ t_0. If G_∞ consists only of SSCCs and without any isolated nodes, then H(x*) = 0 for the steady state x*.

That is to say, under quite mild regularity conditions, if our averaging dynamics can be represented by a sequence of updates {A(t)} where mutually influencing agents exist, heterogeneity is impossible. Put differently, quasi-connectedness in the infinite graph is a necessary condition for heterogeneity.

Quasi-connected graphs G(t) are represented by update matrices A(t) = [Q(t), R(t); 0, S(t)], where S(t) represents edges within SSCCs, Q(t) represents edges between quasi-connected nodes, and R(t) represents edges from quasi-connected nodes to SSCCs. Clearly, our private signal models with Λ > 0 will fulfil this. Other common classes of models that fulfil this are DeGroot models with leaders or stubborn agents. Such models assume there exists a set of nodes L that possess a self-weight of 1, meaning each such agent is an (isolated) SSCC, and all nodes that have a path to such nodes are quasi-connected. It is worth pointing out that when it comes to modelling real-life phenomena, models with private signals can often be interpreted as equivalent to ones with stubborn agents or leaders. For example, modelling information diffusion on social networks such as Twitter poses a methodological question: do influential accounts such as politicians or celebrities function as news sources or as non-reciprocating members of an otherwise peer-to-peer network? In the context of these models, both interpretations are likely to be functionally identical.

While quasi-connected (directed) graphs provide us with a necessary condition for heterogeneity, we do unfortunately lose the robust convergence properties we observed for the undirected graphs. We saw that in the undirected case the convergence of the graph was not necessary to ensure convergence of the states of the agents; this is no longer applicable to the quasi-connected case. In fact, we can characterise a general convergence criterion:

Theorem 2
Suppose Assumptions (1) and (2) hold for t ≥ t_0. Let edges between quasi-connected nodes, within SSCCs, and from quasi-connected nodes to SSCCs on G_∞ be represented in each A(t) by Q(t), S(t) and R(t) respectively. Then the model converges if and only if for all quasi-connected nodes one of the following conditions is met:

• Q(t) → Q and R(t) → R
• R(t)S_∞ = (I − Q(t))M + ε(t)

where ε(t) → 0, M is an arbitrary row-stochastic matrix, and S_∞ = lim_{t→∞} S(t : t_0).

The key takeaway from this result is that convergence under quasi-connected graphs is much more fragile. Convergence requires that the edges of all quasi-connected nodes converge, or alternatively that the edge weights are asymptotically affine transformations of each other. Independent variation in Q(t) and R(t) will generally invalidate the ability of the model to converge.

If the model does converge, the steady state x* takes a convenient form. Let x_QC(t) and x_SC(t) denote the states of the quasi-connected agents and the members of the SSCCs ("sink components", SC, for brevity) respectively, so that x(t) = [x_QC(t)^T x_SC(t)^T]^T. We can then see:

Corollary 2
If the model converges according to Theorem 2, then the steady-state outcomes are x*_SC = S_∞ x_SC(t_0) and x*_QC = M x_SC(t_0), where S_∞ = lim_{t→∞} S(t : t_0). If the first condition of Theorem 2 is met, M = (I − Q)^{−1}R; otherwise it is the arbitrary row-stochastic matrix M.

The corollary implies that the steady-state outcomes are purely a function of the states of the SSCC nodes at the time step t_0. In many models of interest (for example in stationary models), t_0 = 0, which means that the asymptotic outcomes of all agents are determined exclusively by what is potentially a small minority.

The steady states of the SSCCs will, by Theorem 1, converge to a consensus in each SSCC, regardless of the structure of x_SC(t_0) or S_∞, and will therefore have no heterogeneity. On the other hand, we can see that the steady states of the quasi-connected nodes have no special structure that encourages such consensus. Let M_i denote the i-th row of the matrix M. Then for all pairs i and j of quasi-connected nodes, x*_i = x*_j ⟺ (M_i − M_j) ⊥ x_SC(t_0). Without any further restrictions on the structure of M or x_SC(t_0), this will be violated almost everywhere in the set of possible M and x_SC(t_0). We consider technical details in the Appendix.

We now apply these general results to the specific case where we are modelling an augmented graph with ghost nodes representing a model that may incorporate private signals. Let the infinite graph of the augmented interactions be represented by G̃_∞ and the infinite graph of the original interactions be represented by G_∞. Let us assume for simplicity that G_∞ is strongly connected. (The results can be generalised to quasi-connected graphs where each node is path-connected in G_∞ to at least one node i where λ_i > 0; we discuss this briefly in the Appendix.) In this case, we can see that Q(t) = (I − Λ)A(t), R(t) = ΛW(t) and S(t) = I. That is, the SSCCs are just singleton ghost nodes, and all the original nodes are quasi-connected.

From this representation we can quickly conclude from Theorem 1 that for any model that can be represented by a strongly connected interaction graph (and with the regularity assumptions fulfilled), private signals are a necessary condition to model heterogeneity. For example, a very common set of assumptions in models of dynamics over graphs is that the graph (and adjacency matrix A) is stationary and strongly connected. We can see that any attempt to model averaging dynamics over this graph will require incorporating private signals for us to observe the rich heterogeneous outcomes that characterise real processes.

We can translate Theorem 2 into the following corollary:

Corollary 3
Consider a quasi-linear updating model where Λ ≠ 0 and G_∞ is strongly connected. Suppose Assumptions (1) and (2) hold for t ≥ t_0. The model converges if and only if one of the following conditions is met:

• A(t) → A and W(t)C = b(t) → b
• ΛW(t) = (I − (I − Λ)A(t))M + ε(t)

where ε(t) → 0 and M is an arbitrary row-stochastic matrix.

Here again we see that the conditions for convergence for a model with private signals are much stricter. Either both the interaction matrix A(t) and the private signals b(t) must converge, or again they are asymptotically affine transformations of one another.

With the structure of private signal models, we can go further and refine the conditions to observe heterogeneity in the steady state. It is straightforward to show that:

Corollary 4
Consider a quasi-linear updating model where Λ ≠ 0 and G_∞ is strongly connected. Suppose Assumptions (1) and (2) hold for t ≥ t_0. If the model converges, then the steady state is heterogeneous only if both A(t) and b(t) converge, or neither of them does.

In other words, if the convergence of the two elements A(t) and b(t) is mismatched, then heterogeneity will not be achieved. As we show in the Appendix, this follows from the fact that for a convergent model, if A(t) → A, then b(t) → b, and if b(t) → b, A(t) can keep oscillating only if the steady state is fragmented or at consensus.

This result can be quite useful if we know, for example, that one of A(t) or b(t) is a function of x(t), in which case any convergence of the latter will guarantee convergence of the update matrices. For example, suppose that b(t) = b(x(t)). In this case, for any convergent model (x(t) → x*), we can conclude that b(t) → b(x*) = b*. Therefore if A(t) continues to vary independently, we know that x* must be at consensus or fragmented. A typical case in which this occurs is if the neighbourhood updates are asynchronous (i.e. a subset of vertices update their states as a function of their neighbours at each step). In such models, without any need to evaluate the dynamics, we know that heterogeneity will never be achieved. We consider an example of such a model in Section 4.2.

We are generally more interested in the first condition for convergence in Corollary 3 (the latter is mostly used to illustrate the restrictiveness of the conditions that allow for convergence; note that the second condition implies the first).
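A quick numerical illustration of this fragility (a sketch with arbitrary parameters of our own choosing, not from the paper): with a fixed A and convergent private signals, the state settles at the unique fixed point of the update, while a persistently oscillating b(t) leaves the state cycling between two accumulation points, so the limit does not exist:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
A = rng.random((N, N))
A /= A.sum(axis=1, keepdims=True)        # a fixed row-stochastic interaction matrix
lam = np.full(N, 0.3)                    # Lambda = 0.3 I, so private signals matter
b1, b2 = np.zeros(N), np.ones(N)

def run(b_of_t, T):
    """Iterate x(t+1) = (I - Lambda) A x(t) + Lambda b(t) for T steps."""
    x = rng.random(N)
    for t in range(T):
        x = (1 - lam) * (A @ x) + lam * b_of_t(t)
    return x

# Convergent signals b(t) = b: the state settles at the unique fixed point
# x = (I - Lambda) A x + Lambda b, regardless of the initial state.
b = rng.random(N)
x_conv = run(lambda t: b, 2000)
x_fix = np.linalg.solve(np.eye(N) - (1 - lam)[:, None] * A, lam * b)
assert np.allclose(x_conv, x_fix, atol=1e-8)

# Oscillating signals: b(t) alternates between b1 and b2 forever. The state
# approaches a period-2 orbit, so lim x(t) does not exist.
osc = lambda t: b1 if t % 2 == 0 else b2
assert np.linalg.norm(run(osc, 2000) - run(osc, 2001)) > 1e-3
```

Since (1 − λ)A is a strict contraction here, the initial state is forgotten in both cases; only the asymptotic behaviour of b(t) decides whether a limit exists.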
If the model converges under this condition, we can see that the steady state of the original nodes will be:

x* = (I − (I − Λ)A)^{−1} ΛWC = (I − (I − Λ)A)^{−1} Λb = FB    (8)

where F = (I − (I − Λ)A)^{−1} and B = Λb. Here B ∈ R^{N×d} is a (weighted) vector of the asymptotic private signals that agents receive. The matrix F ∈ R^{N×N} is denoted the fundamental matrix, in reference to the analogous object in the theory of absorbing Markov chains [18].

There are a few interesting implications of the steady-state expression in Equation 8. If the model is exogenous (i.e. A(t) and b(t) evolve independently of the state x(t), or are stationary), we can see that the early-stage dynamics of the process have no impact on the steady-state outcome. That is, since x* is purely a function of the asymptotic A and b, we can effectively ignore all intermediate states when we determine the steady state. For example, suppose that A(t) and b(t) are realised stochastically, but we can show independently that the noisy realisations converge almost surely to A and b. In this case only the limits are required. In this sense, exogenous models with private signals are ergodic: the initial state x(0) has no impact on the steady-state outcomes (apart from, for example, very specific modelling choices such as b = x(0), as in the original Friedkin-Johnsen model [13]).

Of course, this restriction does not necessarily apply to endogenous models (where A(t) = A(x(t)) or b(t) = b(x(t))). We can see therefore that if we are attempting to model phenomena that demonstrate both hysteresis and heterogeneity (with the regularity assumptions), endogeneity will be a necessary element to incorporate into our averaging dynamics.

Equation 8 provides a useful decoupling of the role of topology (summarised by F) and the distribution of private signals (summarised by B) in the steady-state outcomes of a model with private signals. Furthermore, the form of the fundamental matrix provides a useful conceptual bridge to the theory of random walks and information diffusion over graphs. In this section we briefly introduce some of these concepts to show how they can help us intuitively interpret the equilibrium outcomes of many multi-agent system models.

Consider the augmented form of the dynamics as represented in the block matrix in Equation 3, except we consider the asymptotic form:

lim_{t→∞} Ã(t) = Ã = [(I − Λ)A, ΛW; 0, I]    (9)

We can see this can be interpreted as the transition matrix of an absorbing Markov chain, where the ghost nodes represent absorbing states and the original nodes represent transient states.
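This absorbing-chain interpretation can be checked numerically. The sketch below (illustrative values of our own choosing, not the paper's code) verifies that the transient block of the powers of Ã vanishes, that each row of FΛW is a probability distribution over the ghost nodes, and that the resulting steady state is heterogeneous for a generic choice of b:

```python
import numpy as np

N = 3
A = np.array([[0.0, 0.6, 0.4],
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])          # asymptotic A (row-stochastic)
lam = np.array([0.25, 0.10, 0.40])       # diagonal of Lambda
b = np.array([0.9, 0.2, 0.5])            # one-dimensional signals in [0, 1]
C = np.array([0.0, 1.0])                 # ghost-node values (bounds of X)
W = np.column_stack([1 - b, b])          # chosen so that W C = b

# Augmented transition matrix of the absorbing chain (Equation 9).
A_aug = np.zeros((N + 2, N + 2))
A_aug[:N, :N] = (1 - lam)[:, None] * A
A_aug[:N, N:] = lam[:, None] * W
A_aug[N:, N:] = np.eye(2)                # ghost nodes are absorbing

# Fundamental matrix and absorption probabilities F Lambda W.
F = np.linalg.inv(np.eye(N) - (1 - lam)[:, None] * A)
absorb = F @ (lam[:, None] * W)

A_pow = np.linalg.matrix_power(A_aug, 200)
assert np.allclose(A_pow[:N, :N], 0, atol=1e-7)       # transient block vanishes
assert np.allclose(A_pow[:N, N:], absorb, atol=1e-7)  # upper-right -> F Lambda W
assert np.allclose(absorb.sum(axis=1), 1)             # each row is a distribution

# Steady state and its heterogeneity: a generic b gives distinct outcomes.
x_star = absorb @ C
assert min(abs(x_star[i] - x_star[j]) for i in range(N) for j in range(i)) > 1e-3
```

Each agent's steady state is thus a convex combination of the ghost-node values, weighted by its absorption probabilities.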
The limiting powers of this matrix therefore encode the probability of a random walk starting at any of the transient states ending up at any of the absorbing states:

lim_{t→∞} Ã^t = [0, (I − (I − Λ)A)^{−1}ΛW; 0, I] = [0, FΛW; 0, I]    (10)

In the language of Markov chains, the fundamental matrix F encodes in element (i, j) the expected number of times a walk that begins at node i at time 0 hits node j over its lifetime. At some finite time, all random walks must "exit" the set of transient states because they will hit an absorbing state and remain fixed there. The probability of this happening at any transient state i is λ_i. The product ΛW therefore encodes for each transient state the probability of moving to each possible absorbing state. The (i, g)-th entry of the product FΛW encodes for each node i the total probability that a random walk beginning at i ends up being absorbed at absorbing state g.

The steady state of our model is x* = FΛWC = FB. Therefore, the steady state of a specific node i, x*_i, can be seen to be a weighted average over all ghost nodes, with the weight for ghost g being the probability of a random walk starting at i hitting node g first under the asymptotic dynamics A. For the sake of illustration, suppose 𝒳 = [0, 1]^N. That is, the agent states are one-dimensional and we can represent private signals as the weight between two ghost nodes, one with state 0 and one with state 1. The steady-state outcome for each node i then just encodes the probability of hitting the positive ghost node first.

An alternative but useful re-characterisation of this same process considers the diffusion of information from the private signals to the original nodes. Suppose for illustration again that the states of the agents are one-dimensional with x_i ∈ [−1, +1]. Consider the following model: each agent i accrues an information set I_i(t) of discrete signals, where each signal takes value +1 or −1.
At each time step, the agent i obtains a single new signal s i ( t ) in one of two ways.A signal is drawn from a neighbour j ’s previous signal s j ( t − with probability (1 − λ i ) A ij . Alternatively withprobability λ i the agent draws a signal from a ghost node. The positive ghost node g + produces signals of only +1 thenegative ghost node g − produces signals of only − . The ghost nodes are drawn with probabilities λ i W i + and λ i W i − such that W i + − W i − = b i .We are interested in establishing the asymptotic composition of the information set I i ( t ) . In order to do so considerthe probability that a signal s i ( t ) drawn from some large t is positive ( P [ s i ( t ) = +1] ). This is simply the probabilitythat the signal originated from the positive ghost node g + . For example, the agent i could have sampled fromneighbour j at t , and the neighbour t sampled from the positive ghost node at t − . Each of these possible pathways Apart from, for example, very specific modelling choices such as b = x (0) as in the original Friedkin-Johnsen model [13]. Since t is large, the probability that a signal arrives from a ghost node approaches . PREPRINT - J
Figure 3: Illustrating the idea of "contact tracing". For the graph G in the top left, we consider tracing the potential pathways of signals that appear in node 5's information set at time t, with two potential paths illustrated. We can use this to construct the sampling distribution of signals drawn by node 5 at various times t. We demonstrate this for t = 0, where it is concentrated on the node's own original signal, and the distribution converges over time to the limiting probability of the ghost nodes.
(i → j → ⋯ → g_+) is equivalent to a random walk from i to g_+. We use the analogy of "contact tracing" to describe these possible pathways, and illustrate the general idea in Figure 3, which shows how the probabilities P[s(t) = +1] and P[s(t) = −1] converge over time to a fixed distribution for draws at that agent. This fixed distribution therefore establishes the asymptotic composition of the accrued information set I_i(t).

In sum, the steady state outcomes of the agents in a model of averaging dynamics with private signals can be conceptualised as the aggregation of signals transmitted from a small set of ghost nodes. Of course, this is just a characterisation of the steady state (asymptotic) outcome: it is by no means precisely what happens in the intermediate dynamics. Recall, for example, that the intermediate dynamics can have varying A(t) or b(t), which need bear no relation to the random walks characterised by the asymptotic A and W.

Nonetheless, this random walk interpretation can be useful to build intuition about the steady state outcomes we might expect for different agents, so long as we have some sense of the asymptotic graph G(A). For example, if the clustering coefficient of G(A) is high, random walks that begin at any node i will circulate back to i with high probability. Therefore, each node will place a larger weight on their own private signals than in a comparable graph with a low clustering coefficient.
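This absorption-probability reading of the steady state can be checked numerically. The sketch below uses an arbitrary 3-node example (the matrices are illustrative choices, not taken from the text): it computes the fundamental matrix F, the predicted steady state x* = FΛWC, and compares the "contact tracing" probability of being absorbed at the positive ghost node against a Monte Carlo simulation of the killed random walk.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical 3-node example; A, lam and W are illustrative choices.
N = 3
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])          # row-stochastic interaction matrix
lam = np.array([0.2, 0.1, 0.3])          # per-node weight on private signals
W = np.array([[0.7, 0.3],                # weights on ghost nodes g+ and g-
              [0.4, 0.6],
              [0.9, 0.1]])
C = np.array([+1.0, -1.0])               # ghost-node states (one-dimensional)

Lam = np.diag(lam)
F = np.linalg.inv(np.eye(N) - (np.eye(N) - Lam) @ A)   # fundamental matrix
x_star = F @ Lam @ W @ C                                # predicted steady state

# Check against directly iterating x(t+1) = (I - Lam) A x(t) + Lam W C.
x = rng.uniform(-1, 1, N)
for _ in range(500):
    x = (np.eye(N) - Lam) @ A @ x + Lam @ W @ C
assert np.allclose(x, x_star, atol=1e-8)

# "Contact tracing": a walk from node i is absorbed at g+ with probability
# (F Lam W)[i, 0]; estimate this by simulating the killed walk directly.
absorb = F @ Lam @ W                     # rows: absorption probabilities
i, hits, trials = 0, 0, 20000
for _ in range(trials):
    node = i
    while True:
        if rng.random() < lam[node]:                 # jump to a ghost node
            hits += W[node, 0] > rng.random()        # g+ with prob W[node, 0]
            break
        node = rng.choice(N, p=A[node])              # else follow the graph
assert abs(hits / trials - absorb[i, 0]) < 0.02
```

The rows of FΛW sum to one, matching the interpretation that every walk is eventually absorbed at some ghost node.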
If nearby agents receive similar private signals (a form of "homophily"), the signals that get circulated locally tend to be of a single type, so the steady state outcomes of neighbouring nodes will be highly correlated, in contrast to a graph where private signals are independently distributed amongst all agents. If private signals are homophilous but there exist hubs in the network, signals can travel very "far" in the network, and local correlations are mitigated in the steady state outcomes.

In this section we consider an intentionally diverse set of models of social, economic and machine behaviour to illustrate how the frameworks we discussed can be applied.
Suppose there exists a set of N agents connected over some latent strongly connected graph G. Each agent possesses a state x_i(t) ∈ X ⊂ R^d. At every time step t, a randomly chosen subset of agents updates their states (i.e. updating is asynchronous).

Agents in this model are contrarians: they prefer to update their state towards observed states that are as different as possible from their current state, ignoring neighbours with similar states. We can think of this as modelling, for example, agents with multi-dimensional opinions who are most influenced by friends with "surprising" opinions. Alternatively, we can consider the agents as financial actors, with x_i(t) some representation of their investment strategy: if they observe a neighbour adopting a very different strategy, they may assume that neighbour possesses private information, and switch to mimic the behaviour.

In order to pick discordant neighbours, each updating agent observes the states x_j(t) of her neighbours j ∈ N(i) over the underlying graph G and measures the distance to her own state according to some metric ‖·‖ over X. A neighbour j is picked with probability:

$$p_{ij} = \frac{\lVert x_i(t) - x_j(t) \rVert}{\sum_{k \in \mathcal{N}(i)} \lVert x_i(t) - x_k(t) \rVert} \qquad (11)$$

The agent i then updates their state towards the chosen neighbour j: x_i(t+1) = γx_i(t) + (1−γ)x_j(t). The parameter γ > 0 modulates the speed of updating: if it is high, agents update their states slowly, and vice versa if it is low. We can see that the update matrix A(t) = A(x(t)) varies at each time step, as only a subset of edges are activated. The updates are not symmetric (i.e. Assumption 3, used in the convergence of undirected infinite graphs, is violated).
Nonetheless, the infinite graph will still be strongly connected, and through the parameter γ > 0 we can see that realised edge weights will not decay to 0, fulfilling our regularity assumptions. Therefore, the conditions for Theorem 1 are met, and we can conclude that the dynamics will converge to a consensus. (For example, if a graph is constructed to connect k-nearest neighbours over some metric space, there is likely to be high transitivity.)
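A minimal simulation of the contrarian model illustrates this convergence. All parameters below (N = 10, γ = 0.8, a ring plus random extra edges as the strongly connected directed graph) are assumed for illustration, not values from the text; neighbours are sampled with the distance-weighted probabilities of Equation 11.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters (not the paper's exact simulation).
N, d, gamma, T = 10, 2, 0.8, 5000

# A strongly connected directed graph: a ring plus random extra edges.
adj = [[(i + 1) % N] for i in range(N)]
for i in range(N):
    extra = int(rng.integers(N))
    if extra != i and extra not in adj[i]:
        adj[i].append(extra)

x = rng.uniform(-1, 1, (N, d))
for _ in range(T):
    for i in rng.choice(N, size=N // 2, replace=False):    # asynchronous updates
        dists = np.array([np.linalg.norm(x[i] - x[j]) for j in adj[i]])
        if dists.sum() == 0:
            continue                                       # locally at consensus
        j = adj[i][int(rng.choice(len(adj[i]), p=dists / dists.sum()))]
        x[i] = gamma * x[i] + (1 - gamma) * x[j]           # move toward "surprising" neighbour

spread = np.max(np.linalg.norm(x - x.mean(axis=0), axis=1))
assert spread < 0.05     # despite contrarian updating, states collapse together
```

As in Figure 4, the early trajectories are haphazard, but the convex updates shrink the convex hull of states until a consensus emerges.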
(a) Contrarian agents at t = 10. (b) Contrarian agents at t = 20. (c) Contrarian agents at t = 30. (d) Contrarian agents at t = 200. Figure 4:
The two-dimensional trajectories of contrarian agents converging towards consensus. The current position of an agent is denoted with a large circle and their trail of past positions is a solid line of the same colour. In the bottom right panel we can see the states have converged.
We illustrate this in Figure 4, where we take X = [−1, 1]^2, N = 10 and a fixed γ ∈ (0, 1) over a strongly connected directed Erdős–Rényi graph, and use the Euclidean norm to measure distance between nodes. We can see that despite the early dynamics of the agents being somewhat haphazard, as they try to gravitate away from nearby nodes, the dynamics eventually converge.

In the previous example we showed how strongly connected infinite graphs without private signals converge to a consensus and fail to produce heterogeneous outcomes. We now show that even if private signals are present, heterogeneity may not be achieved if the conditions of Corollary 4 are not met. In particular, we demonstrate that a lack of co-ordination between the convergence of A(t) and b(t) can eliminate or enable heterogeneity in steady state outcomes.

Suppose we have a "swarm" of agents located in [−1, 1]^2 that are attempting to search for food sources ("landmarks") while trying not to stray too far from a set of preferred neighbours. To model this, suppose we have the preferred set of neighbours encoded in an undirected k-regular graph (that is, each node has k other agents they are trying to stay near). There is also a set L = {L_1, L_2, …, L_k} of landmarks randomly distributed over [−1, 1]^2. The swarm agents search for landmarks conservatively, moving closer to their closest landmark l_i(t) at each time step, but also making sure they do not stray too far from their neighbourhood. Intuitively, we could suppose that agents obtain more nutrients the closer they are to a landmark, but do not want to stray too far from their neighbours. That is, each agent updates their position as:

$$x_i(t+1) = \gamma\, l_i(t) + \frac{1-\gamma}{2}\, x_i(t) + \frac{1-\gamma}{2k} \sum_{j \in \mathcal{N}(i)} x_j(t) \qquad (12)$$

We refer to these as the synchronous dynamics. We can see that Assumptions (1) and (2) are fulfilled, G∞ is strongly connected, and A(t) = A is fixed.
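The synchronous dynamics of Equation 12 can be sketched in a few lines. The setup below is an assumed toy instance (a ring with k = 2, two landmarks, agents initialised in a cluster near each landmark), not the configuration used for Figure 5; it checks that the fixed-A dynamics converge to a heterogeneous steady state.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy instance of the synchronous dynamics (Equation 12); all parameters are
# illustrative assumptions.
N, gamma, T = 12, 0.5, 1000
landmarks = np.array([[-0.8, -0.8], [0.8, 0.8]])
x = np.vstack([landmarks[0] + 0.1 * rng.standard_normal((N // 2, 2)),
               landmarks[1] + 0.1 * rng.standard_normal((N // 2, 2))])
nbrs = [[(i - 1) % N, (i + 1) % N] for i in range(N)]     # k-regular ring, k = 2

def closest_landmark(xi):
    return landmarks[np.argmin(np.linalg.norm(landmarks - xi, axis=1))]

for _ in range(T):
    new = np.empty_like(x)
    for i in range(N):
        nbr_mean = x[nbrs[i]].mean(axis=0)                # (1/k) sum over neighbours
        new[i] = (gamma * closest_landmark(x[i])          # pull toward closest landmark
                  + (1 - gamma) / 2 * x[i]
                  + (1 - gamma) / 2 * nbr_mean)
    step = np.abs(new - x).max()
    x = new

assert step < 1e-10                                       # the dynamics converged...
spread = np.linalg.norm(x - x.mean(axis=0), axis=1).max()
assert spread > 0.1                                       # ...to a heterogeneous steady state
```

Sampling a random subset of the neighbourhood at each step instead, as in the asynchronous variant discussed next in the text, makes A(t) vary endlessly and removes this guarantee of heterogeneity.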
Furthermore, we can see that if this model converges, then the private signals (the locations of the closest landmarks), which vary only as a function of x(t), will also converge. As such, this model converges (as per Theorem 2) and displays the necessary conditions to achieve heterogeneous outcomes.

On the other hand, consider the following marginal difference: when they update, agents choose a random subset of their neighbourhood at each time step:

$$x_i(t+1) = \gamma\, l_i(t) + \frac{1-\gamma}{2}\, x_i(t) + \frac{1-\gamma}{2k_i(t)} \sum_{j \in \mathcal{N}(i,t)} x_j(t) \qquad (13)$$

We refer to these as the asynchronous dynamics. The only difference is that the set of neighbours N(i, t) (and the degree k_i(t)) is now a function of t. We can now see that for a convergent model, b(x(t)) → b, but A(t) will vary endlessly, meaning that no heterogeneous steady states can be achieved, as per Corollary 4.

We illustrate two representative trajectories in Figure 5, with asynchronous dynamics on the top panels and synchronous dynamics on the bottom panels. We chose N = 20 and k = 3, randomly picking locations of landmarks over the state space. The starting positions of the agents are uniformly drawn over the state space but identical between the two sets of trajectories. The only difference between the dynamics is therefore that the synchronous agents average over their full neighbourhood at each time step, whereas the asynchronous agents sample a random subset of their neighbourhood. We can see that despite this minute difference, the trajectories are very different: the asynchronous dynamics result in a consensus, as predicted, while the synchronous dynamics result in a heterogeneous steady state, with each agent converging to a unique position.

We now consider an example where private signals exist explicitly and show how both heterogeneous and consensus outcomes can arise in different parts of the parameter space.
We consider an example of machine behaviour with a simple model of how recommender systems might adapt to the preferences of users, a version of which was considered in [7]. Consider a set of N agents over a fixed social network G. Agents possess some state x_i(t) ∈ [−1, 1] that represents their current tastes. (The choice of graph structure is arbitrary; the argument could equally be made for a fully connected graph, where agents have no preferences over neighbours, or a hub-and-spoke graph, i.e. a leader/follower structure.) Agents update their tastes by interpolating between the tastes of their neighbours and that
(a) Swarm x₁ and x₂ co-ordinates over time (asynchronous). (b) Swarm trajectories at t = 300 (asynchronous). (c) Swarm x₁ and x₂ co-ordinates over time (synchronous). (d) Swarm trajectories at t = 300 (synchronous). Figure 5:
The two-dimensional trajectories of swarms with asynchronous updating (top panels) and synchronous updating (bottom panels). The left-hand panels show the x₁ and x₂ co-ordinates of all the agents over time; the right-hand panels show the 2D trajectories. The current position of an agent is denoted with a large circle and their trail of past positions is a solid line of the same colour. The positions of the landmarks are denoted with solid black circles. In the top right panel the agents have converged to a single point, whereas in the bottom right panel they have each achieved a unique outcome. Both dynamics are realised over the same k-regular graph with N = 20 and k = 3, and the same starting states uniformly distributed over the unit square. The only difference otherwise is that asynchronous agents randomly sample a subset of neighbours when they update.
(a) Parameter space demonstrating areas of consensus (dark red/blue) and heterogeneous outcomes. (b) Example of a heterogeneous steady state (full distribution) for p = 0.55, α = 0.4.

Figure 6: A) Demonstration of how a model with private signals can result in both heterogeneous and consensus outcomes. The x-axis measures the weight on private signals α and the y-axis the fraction of positive signals p in σ(x(0)). The gradient at each point is the mean value of the steady state outcomes x*, given that the dynamics unfold as per Equation 14. In the top left and bottom left, where α is low and the initial distribution x(0) is weighted towards positive or negative signals, the dynamics cascade so that all agents (user and recommender) converge to either +1 or −1. The region in between denotes outcomes where the recommenders do not cascade, the mean steady state outcome is not at an extreme, and heterogeneous outcomes are supported. B) An example of a heterogeneous outcome, where p = 0.55, α = 0.4 (as indicated by a cross on the left panel). Numerical simulations were conducted over an Erdős–Rényi graph with ⟨k⟩ = 12, N = 1000, and where a_ij = 1/k_i if i, j were connected on G.

of a personalised recommender that attempts to provide a signal σ_i(t) ∈ {−1, +1} that is as close as possible to the current state of its user x_i(t). Clearly, σ_i(t) = sign(x_i(t)).

For example, x_i(t) could represent a user's political stance, with the recommender engine offering news articles that match the agent's stance. Alternatively, x_i(t) could represent a user's purchase history favouring one of two competing brands (+1 or −1), with the recommender system offering products that complement the user's past purchases.
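This user-recommender feedback loop (formalised as Equations 14 and 15 below) is easy to sketch numerically. The parameters below are assumed for illustration; the sketch also verifies a fact derivable from the update rule: when the recommender weight α exceeds 1/2, the recommender term dominates the neighbour term, so no sign ever flips and the affine fixed point can be computed in closed form.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative sketch of the taste/recommender dynamics on an assumed random
# row-stochastic A (not the paper's Erdos-Renyi construction).
N, alpha = 50, 0.6                       # alpha > 0.5: the frozen-sign regime
A = rng.uniform(size=(N, N))
A /= A.sum(axis=1, keepdims=True)        # row-stochastic neighbour weights

x0 = rng.uniform(-1, 1, N)
sigma0 = np.sign(x0)                     # initial recommender configuration

x = x0.copy()
for _ in range(300):
    x = (1 - alpha) * A @ x + alpha * np.sign(x)

# Since |(1 - alpha) A x| <= 1 - alpha < alpha, sigma stays frozen at
# sigma(x(0)) and the dynamics are affine with fixed point:
x_star = alpha * np.linalg.solve(np.eye(N) - (1 - alpha) * A, sigma0)
assert np.array_equal(np.sign(x), sigma0)       # no sign ever flipped
assert np.allclose(x, x_star, atol=1e-8)        # iterates reach the fixed point
assert len(np.unique(np.round(x_star, 6))) > 1  # heterogeneous steady state
```

For α below 1/2 the configuration σ(x(t)) can change over time, which is where the cascades of Figure 6 arise.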
A typical example of this is technology ecosystems that confer network effects, such as phones (Apple vs Android) and their associated accessories. If we are not interested in modelling social network effects, the dynamics can also be extended to understand how correlated tastes for a single user might evolve. For example, suppose there exist N different dimensions of a user's preferences for, say, food. The state x_i(t) represents the strength of preference for one of two extremes, and the preferences shape each other (i.e. if a user starts to prefer spicy food they also begin to prefer certain drinks). If the recommender system attempts to select from a combinatorial set of meals to recommend, we can see how our dynamics can explore the feedback loops between a user's tastes and the recommendations of the algorithm. Either way, we can summarise the dynamics of the system as:

$$x_i(t+1) = (1-\alpha) \sum_j a_{ij} x_j(t) + \alpha\, \mathrm{sign}(x_i(t)) \qquad (14)$$
$$\Rightarrow\; x(t+1) = (1-\alpha) A x(t) + \alpha\, \sigma(x(t)) \qquad (15)$$

where x(t) ∈ [−1, 1]^N represents the tastes of each user, σ(t) ∈ {−1, 1}^N represents the possible configurations of each personalisation algorithm, A is a stochastic matrix representing the weights nodes place on neighbours, and α denotes the strength of the recommender influence. The dynamics conferred by this model are rich, and we do not go
into a great amount of detail (see [7] for a more in-depth analysis); the key aspect we are interested in is how models with private signals can support both heterogeneous and consensus outcomes depending on the parameterisation. For example, if we take α > 0.5, the sign of each agent's state will never change, so the private signals are fixed, and with high probability the resulting steady state x* = α(I − (1−α)A)^{−1} σ(x(0)) will be heterogeneous. However, as α falls below 0.5, the configurations of the recommendations begin to vary over time. It turns out that as α falls (as the recommender effects get weaker), the probability of a cascade increases, in which case the personalisation systems all begin to align in their recommendations (σ(t) → ±1) and as a result all agents end up at a consensus around +1 or −1. In other words, the weaker the recommender effects, the less diversity is promoted for users (in the form of lower expected heterogeneity). This is illustrated in Figure 6.

Finally, we consider an example where the update matrices are not dynamic (A(t) = A and b(t) = b) but our random walk interpretation can provide useful intuition as to the distributions of the heterogeneous steady state outcomes that occur. One important class of stationary models in economics that are nested in our dynamics are linear-quadratic games. They are commonly used to model strategic complementarity in games played over networks, and have been used to investigate empirical questions ranging from criminal activity to educational attainment to industrial organisation (see [4] for a review).

The basic setup is as follows. For a set of N agents, each agent i chooses an effort level x_i ≥ 0 that incurs a private reward r_i x_i and a private cost ½x_i². Furthermore, the agent also receives a spillover reward a_ij x_i x_j > 0 from co-ordinating activity levels with other nodes j.
Gathering this into a utility function we get:

$$U_i(x) = r_i x_i + \sum_j a_{ij} x_i x_j - \frac{1}{2} x_i^2 \qquad (16)$$

A typical application is criminal activity, as for example in [20]. In such models, criminals choose a level of criminal activity to engage in. Criminal activity results in some expected private reward, which increases as more associates are involved in the crime; the costs can capture, for example, the probability of capture. We can see therefore that the utility structure encourages agents to engage in more activity the more of their peers do so.

Taking the partial derivative of Equation 16 with respect to x_i and setting it to zero provides the best reply dynamics:

$$x_i(x_{-i}) = \operatorname*{argmax}_{x_i}\, U_i(x_i, x_{-i}) = \sum_j a_{ij} x_j + r_i \qquad (17)$$
$$\Rightarrow\; x(k+1) = A x(k) + r \qquad (18)$$

where in the last step we vectorised the best reply function over states indexed by k to return to our familiar affine form, with A summarising interaction effects between agents and r being a vector of private rewards. We can suppose the level of effort is bounded (i.e. infinite effort levels are ruled out, so the state space is compact). For the sake of exposition, we suppose the interaction effects matrix A is row substochastic (i.e. Σ_j a_ij < 1). The best reply dynamics therefore converge to the Nash equilibrium, which is, as expected:

$$x^* = \underbrace{(I - A)^{-1}}_{F}\; \underbrace{r}_{B}$$

We can see that the equilibrium outcomes of such games will in general be heterogeneous (this is almost sure if the elements of A are drawn from some independent continuous distribution and r ≠ c1). This heterogeneity persists even if the private signal vector r is not particularly varied: for example, it can consist of only two levels of reward r_1 and

More general forms are possible, but incur a great deal of extra notation.
For example, [19] consider linear-quadratic Gaussian games with multi-dimensional action spaces for agents, alongside a learning framework where private rewards are realised stochastically, resulting in an analysis of a Bayes–Nash equilibrium instead. We consider a one-dimensional, deterministic version for simplicity. The substochasticity assumption is not particularly restrictive and is in fact closely related to the conditions required for a Nash equilibrium to exist in this game (see, for example, [21]). It is straightforward to generalise from the example we consider.
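The best-reply dynamics of Equation 18 and the equilibrium x* = (I − A)^{−1}r are easy to verify numerically. The matrix and the two reward levels below are illustrative choices, not values from the text; the sketch also checks the sensitivity ∂x*_i/∂r_i = F_ii of Equation 19.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative best-reply iteration: A is row substochastic and r holds only
# two distinct private reward levels (assumed values).
N = 30
A = rng.uniform(size=(N, N))
A *= 0.8 / A.sum(axis=1, keepdims=True)      # rows sum to 0.8 < 1
r = np.where(rng.random(N) < 0.5, 1.0, 2.0)  # only two reward levels

x = np.zeros(N)
for _ in range(500):
    x = A @ x + r                            # best-reply dynamics (Equation 18)

F = np.linalg.inv(np.eye(N) - A)             # fundamental matrix
x_star = F @ r                               # Nash equilibrium x* = (I - A)^{-1} r
assert np.allclose(x, x_star, atol=1e-8)

# Despite only two reward levels, topology makes every equilibrium effort
# level distinct here:
assert len(np.unique(np.round(x_star, 8))) == N

# And the marginal effect of agent 0's own reward is F_00 (Equation 19):
r2 = r.copy()
r2[0] += 1e-6
slope = (np.linalg.solve(np.eye(N) - A, r2) - x_star)[0] / 1e-6
assert abs(slope - F[0, 0]) < 1e-4
```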
Figure 7: We generated a sequence of small-world networks rewired from a 2D lattice with nodes connected to neighbours up to distance 2 and N = 100. The rewiring probability ranges in 20 steps from 0.2 to 0, with 20 iterations in each case. The interaction matrix is realised with a_ij proportional to 1/k_i if i and j are connected, where k_i is the degree of i. The mean transitivity at each level of rewiring is displayed alongside the mean value of F_ii − 1 over all nodes i. We can see that as transitivity increases, the expected number of returns increases. Note the solid line merely connects the numerical means, and is used to emphasise the monotonic increase.

r_2. The rows of the fundamental matrix F = (I − A)^{−1} encode how small differences in the topological position of a node in the weighted graph implicitly represented by A encourage different steady state actions to be adopted by the agents. Put differently, even if the variation in private signals is low, the topological variation is often sufficient to induce heterogeneous outcomes where each agent adopts a unique strategy in equilibrium.

We can also use the analogy of random walks we have developed to build some useful intuition about the general characteristics of such equilibria. For example, note that the partial derivative of an agent's steady state outcome with respect to their own private reward is:

$$\frac{\partial x_i^*}{\partial r_i} = F_{ii} \qquad (19)$$

Recall that the diagonal elements of the fundamental matrix encode the expected number of times a random walk that commences at i hits i before it is absorbed (the expected number of returns is F_ii − 1). Therefore, we can conclude that any change in the network topology that increases the number of cycles (while holding all other features fixed) will increase the attention agents pay to their own private rewards in equilibrium.
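The identity behind this interpretation, namely that F_ii − 1 is the expected number of returns of the killed random walk, can be sanity-checked by simulation on an arbitrary substochastic matrix (the matrix below is an assumed example, not the Figure 7 construction).

```python
import numpy as np

rng = np.random.default_rng(5)

# An assumed small substochastic matrix: the walk moves from node i to j with
# probability A[i, j] and is killed with the remaining probability.
N = 6
A = rng.uniform(size=(N, N))
A *= 0.7 / A.sum(axis=1, keepdims=True)       # rows sum to 0.7: killed walk
F = np.linalg.inv(np.eye(N) - A)              # fundamental matrix

i, trials, returns = 0, 40000, 0
for _ in range(trials):
    node = i
    while True:
        u = rng.random()
        cum = np.cumsum(A[node])
        if u >= cum[-1]:                      # the walk is killed this step
            break
        node = int(np.searchsorted(cum, u, side="right"))
        if node == i:
            returns += 1                      # a return to the starting node

# Expected visits to i (counting the start) is F_ii, so returns = F_ii - 1.
assert abs(returns / trials - (F[0, 0] - 1)) < 0.02
```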
We can sense-check this in Figure 7, where we consider a range of small-world networks generated on a 2D lattice with a decreasing rewiring probability. As the rewiring probability p decays to 0, the network shifts from an Erdős–Rényi-like network with low transitivity to a lattice with high transitivity. The (mean) transitivity is measured for p ranging from 0.2 to 0, and is compared to the mean of F_ii − 1, the expected number of returns of a random walk to each node, and the partial derivative we are interested in. Therefore, we can see that as the underlying interactions become increasingly transitive, the strategic choices of each agent will be more heavily influenced by their private reward. Suppose, for example, we wished to modulate the expected level of activity of some agent by reducing their private reward (in the example of criminal networks, this translates to increasing surveillance on that agent, increasing their probability of capture and reducing their expected reward for activity). We can see this strategy will be more effective if the network is highly transitive. This occurs
naturally if, for example, agent interactions are shaped by physical proximity, which might be the case for physical crimes as opposed to cyber-crimes. Intuitively, this occurs because increases in an agent's activity have a greater spillover effect if local clusters are closely connected and reinforce each other, whereas agents with disconnected neighbours will experience less reinforcement between those neighbours.
In this paper, we have considered the problem of modelling heterogeneous outcomes in multi-agent systems. As demonstrated, many models of such systems will invariably result in consensus, which is often unrepresentative of the real empirical phenomena we wish to investigate. In order to address this, we developed a set of necessary criteria for our models to instead produce heterogeneous outcomes, where each agent possesses a unique outcome in the steady state. Furthermore, through an appropriate analogy with random walks on graphs, we provided an intuitive characterisation of the features of this steady state, which can help ensure a model contains the features we hope to represent from the real-world phenomenon.

One of the key insights from our analysis was that for strongly connected graphs G∞, private signals are a necessary feature for heterogeneous outcomes to be possible. An intuitive way of seeing why this is the case follows from observing that the averaging dynamics enforced by the graph structure A(t) are an inherently convex operation, which by necessity ensures that the span of the agents' states after averaging is contained in the convex hull of the original set of states. In isolation, the hull must shrink, leading to the consensus outcomes we are familiar with. The presence of private signals helps us break out of the convexity of these dynamics, providing in some sense external perturbations that allow agents to explore the state space instead of iteratively compounding any similarity that exists between agents.

One shortcoming of our analysis is that we were not able to provide sufficient criteria for our models to result in heterogeneous outcomes. Given the structure we know the outcomes must take (FB), any such theorem is likely to be related to the eigen-structure of the interaction matrices, and a precise measure-theoretic analysis of distributions over such matrices.
We consider this a promising direction for future study.
Appendix
A1: Private signals and ghost nodes
We demonstrate how a model with private signals b(t) ∈ X ⊂ R^{N×d}, as in Equation 1, can be written in the augmented form:

$$\tilde X(t+1) = \tilde A(t) \tilde X(t) \qquad (20)$$

where Ã(t) is a row stochastic matrix. We have made the following augmentations:

$$\tilde X(t) = \begin{bmatrix} X(t) \\ C \end{bmatrix} \in \mathbb{R}^{(N+2d) \times d} \qquad (21)$$

$$\tilde A(t) = \begin{bmatrix} (I - \Lambda) A(t) & \Lambda W(t) \\ 0 & I \end{bmatrix} \in \mathbb{R}^{(N+2d) \times (N+2d)} \qquad (22)$$

$$C = d \begin{bmatrix} \bar x_1 e_1^\top \\ \underline x_1 e_1^\top \\ \vdots \\ \bar x_d e_d^\top \\ \underline x_d e_d^\top \end{bmatrix} \in \mathbb{R}^{2d \times d} \qquad (23)$$

$$W(t) = d^{-1} \begin{bmatrix} \bar w^{(1)}_1(t) & \underline w^{(1)}_1(t) & \cdots & \bar w^{(1)}_d(t) & \underline w^{(1)}_d(t) \\ \bar w^{(2)}_1(t) & \underline w^{(2)}_1(t) & \cdots & \bar w^{(2)}_d(t) & \underline w^{(2)}_d(t) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \bar w^{(N)}_1(t) & \underline w^{(N)}_1(t) & \cdots & \bar w^{(N)}_d(t) & \underline w^{(N)}_d(t) \end{bmatrix} \in \mathbb{R}^{N \times 2d} \qquad (24)$$

Here e_l denotes the l-th standard basis vector.
where x̄_l is the upper bound of the l-th dimension of the state space X, and x̲_l the corresponding lower bound. The weights satisfy w̄^{(i)}_l(t) + w̲^{(i)}_l(t) = 1 and refer to the weight the i-th node places on the upper and lower bound of the l-th component respectively, ensuring that we can express b_{il}(t) = w̄^{(i)}_l(t) x̄_l + (1 − w̄^{(i)}_l(t)) x̲_l, and more generally that b(t) = W(t)C. The factors d and d^{−1} in front of the matrices merely ensure that the matrix Ã(t) remains row stochastic.

A2: Proof of Theorem 1 and Corollary 1
Theorem 1 states that for any model expressible as:

$$x(t+1) = A(t) x(t) \qquad (25)$$

with G∞({A(t)}) defined as in the main text, all sink strongly connected components of G∞ must converge to consensus, so long as the regularity assumptions are fulfilled. As a reminder, these are, for some δ > 0:

1. A_ij(t) ∈ {0} ∪ [δ, 1]
2. A_ii(t) ≥ δ, ∀i

In order to prove our result we can utilise the following result from [22], which makes use of some extra terminology. Define γ(A):

$$\gamma(A) = \max_j \max_{i_1, i_2} \lvert a_{i_1 j} - a_{i_2 j} \rvert \qquad (26)$$

That is, γ(A) measures the extent to which the rows of A vary. Furthermore, a stochastic matrix A is indecomposable and aperiodic (SIA) iff A* = lim_{t→∞} A^t exists and γ(A*) = 0. Then the first theorem in [22] states:

Theorem 3
For any product of stochastic matrices A(t)A(t−1)⋯A(1)A(0), let every subproduct (product of some subset of consecutive matrices) be SIA. Then for any ε > 0 there exists n(ε) such that any subproduct A of length n ≥ n(ε) satisfies γ(A) < ε.

We can now proceed. Firstly, designate some t₀ such that for all t ≥ t₀, A(t)[i, j] > 0 → (i, j) ∈ E∞. That is, after some long enough time period, all finite interactions will have ceased, and any edges that are instantiated in the A(t) matrices must be drawn from the infinite edge set. Without loss of generality, assume that G∞ consists only of k SSCCs, denoted C = {C₁, C₂, …, C_k}.

Importantly, this means there exists some permutation of A(t) which organises the matrix into block diagonals, where each block diagonal is a stochastic matrix corresponding to a sink strongly connected component of G∞ (that is, no paths exist in either direction between two components of G∞; recall that in our definition of SSCCs there are no outgoing paths from each SSCC). In this case we can designate x(t₀) = Π_{t=0}^{t₀} A(t) x(0), and restart the dynamics with x(t₀) as our new initial vector. The matrix updates will then be of the form:

$$x(t+1) = \begin{bmatrix} A_1(t) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & A_k(t) \end{bmatrix} x(t) = \begin{bmatrix} \prod_{n=t_0}^{t} A_1(n) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \prod_{n=t_0}^{t} A_k(n) \end{bmatrix} x(t_0) \qquad (27)$$

Establishing the asymptotic properties of this process simplifies to establishing the asymptotic properties of each Π_{n=t₀}^t A_r(n), since the blocks do not otherwise interact.

The simplest case arises when the block is invariant (A_r(n) = A_r, ∀n ≥ t₀). In this case, all known results about DeGroot models can be applied directly (see, for example, [8]). In particular, by Assumption (2), the subgraph G^(r)∞ is strongly connected and aperiodic. Then A_r is irreducible and A_r^n → 1w′, meaning that the nodes in this block will converge to consensus.
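The conclusion of Theorem 3, that γ of long products of SIA matrices vanishes, can be illustrated numerically. The factors below are assumed all-positive stochastic matrices with δ-bounded self-loops (each trivially SIA), not matrices from the text.

```python
import numpy as np

rng = np.random.default_rng(7)

# gamma(A) from Equation 26: the largest variation within any column of A.
def gamma(A):
    return max(A[:, j].max() - A[:, j].min() for j in range(A.shape[1]))

# Random stochastic factors with self-loops bounded below by delta
# (Assumption 2); all entries positive, so each factor is SIA.
N, delta = 5, 0.2
def random_factor():
    M = rng.uniform(0.1, 1.0, size=(N, N))
    M /= M.sum(axis=1, keepdims=True)
    return (1 - delta) * M + delta * np.eye(N)   # enforce A_ii >= delta

P = random_factor()
g0 = gamma(P)                  # a single factor has visibly different rows
for _ in range(300):
    P = P @ random_factor()

assert g0 > 1e-3
assert gamma(P) < 1e-9                  # gamma of the long product vanishes
assert np.allclose(P, P[0], atol=1e-9)  # all rows agree: a consensus matrix
```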
Recall that X is a compact set, and is thus bounded. All quasi-connected components can be partitioned into a block that does not interact with or otherwise influence the strongly connected components.
A more general case is when A_r(n)[i, j] > 0 ⟺ (i, j) ∈ E∞. That is, each (sub)matrix contains all (as opposed to a subset of) the edges of the infinite interaction matrix, but the weights may vary. Denote by Â_r the "mean" interaction (sub)matrix: the stochastic matrix induced by row-normalising the adjacency matrix of the infinite (sub)graph (Â_r = A[G^(r)∞]). The product Π_{n=t₀}^t A_r(n) will inherit many of the properties of Â_r^{(t−t₀+1)}. In particular we can prove the following lemma:

Lemma 1
For some graph G, let A[G] be an induced adjacency matrix where A_ij ≥ δ ⟺ (i, j) ∈ E(G) for some δ > 0. Similarly, let G[A] be the graph induced by a square matrix A. Consider some arbitrary finite set of p induced adjacency matrices {A(1)[G], A(2)[G], …, A(p)[G]} and an arbitrary reference matrix Â = A[G]. Then G(Â^p) = G(Π_{n=1}^p A(n)).

Put differently, if we take some arbitrary reference matrix generated from a graph, Â[G], and raise it to some power p, we can denote the graph induced by Â^p as G(Â^p). If we do the same with a product of adjacency matrices Π_{n=1}^p A(n), where each A(n) also fully realises the original graph G, the resulting product generates the same graph.

This follows by induction. For the base case, note that under the definition we can directly see G(Â) = G(A(n)) for every n. Now suppose the claim holds for some arbitrary m−1. Then for m, we can see:

$$(i,j) \in E(G(\hat A^m)) \iff \hat A^m[i,j] > 0 \qquad (28)$$
$$\iff \langle \hat A[i, \cdot],\; \hat A^{m-1}[\cdot, j] \rangle > 0 \qquad (29)$$
$$\iff \exists k : (i,k) \in E(G(\hat A)) \wedge (k,j) \in E(G(\hat A^{m-1})) \qquad (30)$$

But note that, by the inductive step, G(Â^{m−1}) = G(Π_{n=1}^{m−1} A(n)). Together with the base case:

$$\exists k : (i,k) \in E(G(\hat A)) \wedge (k,j) \in E(G(\hat A^{m-1})) \qquad (31)$$
$$\iff \exists k : (i,k) \in E(G(A(m))) \wedge (k,j) \in E\Big(G\Big(\textstyle\prod_{n=1}^{m-1} A(n)\Big)\Big) \qquad (32)$$
$$\iff \Big\langle A(m)[i, \cdot],\; \Big[\textstyle\prod_{n=1}^{m-1} A(n)\Big][\cdot, j] \Big\rangle > 0 \qquad (33)$$
$$\iff \Big[\textstyle\prod_{n=1}^{m} A(n)\Big][i,j] > 0 \qquad (34)$$
$$\iff (i,j) \in E\Big(G\Big(\textstyle\prod_{n=1}^{m} A(n)\Big)\Big) \qquad (35)$$
$$\Rightarrow\; (i,j) \in E(G(\hat A^m)) \iff (i,j) \in E\Big(G\Big(\textstyle\prod_{n=1}^{m} A(n)\Big)\Big) \qquad (36)$$
$$\Rightarrow\; G(\hat A^m) = G\Big(\textstyle\prod_{n=1}^{m} A(n)\Big) \qquad (37)$$

giving us our desired result. We can therefore proceed with the knowledge that the graph induced by any power of the (sub)matrix Â_r^p will be identical to the graph induced by the product of the p terms Π_p A_r(n); that is, G[Â_r^p] = G[Π_p A_r(n)].
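Lemma 1 can be spot-checked numerically. The sketch below builds an assumed random graph G, several stochastic realisations whose support is exactly G (entries bounded below by δ), and confirms that the product of p realisations induces the same graph as the p-th power of a fixed reference realisation.

```python
import numpy as np

rng = np.random.default_rng(8)

def graph_of(M, tol=1e-12):
    """Boolean adjacency of the graph induced by a non-negative matrix."""
    return M > tol

# An assumed random directed graph with self-loops.
N, p, delta = 7, 5, 0.1
G = rng.random((N, N)) < 0.3
np.fill_diagonal(G, True)

def realisation():
    """A stochastic matrix whose support is exactly G, entries >= delta."""
    M = np.where(G, rng.uniform(delta, 1.0, (N, N)), 0.0)
    return M / M.sum(axis=1, keepdims=True)

A_hat = realisation()                     # the fixed reference matrix A_hat[G]
prod = np.eye(N)
for _ in range(p):
    prod = prod @ realisation()           # product of p distinct realisations

# Lemma 1: the graph induced by A_hat^p equals the graph induced by the product.
assert np.array_equal(graph_of(np.linalg.matrix_power(A_hat, p)),
                      graph_of(prod))
```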
Importantly, this means that any properties inherited from the adjacency matrix of such a graph are equivalent between these two representations.

To exploit this property, note that if the (sub)graph $G(r)_\infty$ is strongly connected and aperiodic, then so is $\tilde{G}[\hat{A}_r[G(r)_\infty]^p]$. (To see this: $G(r)_\infty$ is strongly connected and aperiodic if and only if its adjacency matrix $A[G(r)_\infty]$ is primitive; see, for example, Section 1.3 of [18]. This means that for some power $q$, $A[G(r)_\infty]^q$ has only strictly positive entries. Any power of the matrix is then also primitive, and therefore the graph induced by the power matrix must also possess strong connectivity and aperiodicity.) Since $G(\hat{A}_r^p) = G\big(\prod_{n=t_0}^{t} A_r(n)\big)$, the graph induced by the product of any set of consecutive matrices $A_r(n)$ is also strongly connected and aperiodic. Finally, this means that the products of the matrices themselves, $\prod_{n=t_0}^{t} A_r(n)$, are stochastic, irreducible and aperiodic. Since this holds for any $p$, we fulfil the conditions of Theorem 3, and as such the limit of the products of these matrices is a consensus matrix. That is, $\lim_{t\to\infty} \prod_{n=t_0}^{t} A_r(n) \to \mathbf{1}a_r'$.

Finally, consider the most general case where there are no restrictions on $A_r(n)$ (except, of course, that edges are only drawn from the infinite graph). By Assumption 2, $A_r(n)[i,i] \ge \delta$ for all $i$, so it is straightforward to see that the product of any two consecutive matrices $AB$ will contain the edges of both matrices. Since all edges in the infinite graph must recur, we can always partition the product $\lim_{t\to\infty} \prod_{n=t_0}^{t} A_r(n)$ into subproducts such that each subproduct "hits" all the edges of the infinite graph. We are then simply in the regime where each (sub)product is a full realisation of the infinite graph, and the results from above apply.

Corollary 1 follows straightforwardly from the definition of heterogeneity.

A3: Proof of Theorem 2, Corollary 2, Corollary 3 and Corollary 4
As a reminder, Theorem 2 states:
Theorem 2
Suppose Assumptions (1) and (2) hold for $t \ge t_0$. Let edges between quasi-connected nodes, within SSCCs, and from quasi-connected nodes to SSCCs on $G_\infty$ be represented in each $A(t)$ by $Q(t)$, $S(t)$ and $R(t)$ respectively. Then the model converges if and only if, for all quasi-connected nodes, one of the following conditions is met:

• $Q(t) \to Q$ and $R(t) \to R$
• $R(t)S = (I - Q(t))M + \epsilon(t)$

where $\epsilon(t) \to 0$, $M$ is an arbitrary row-stochastic matrix, and $S = \lim_{t\to\infty} S(t:t_0)$.

The first thing to note is that the first condition of Theorem 2 ($Q(t) \to Q$, $R(t) \to R$) is a special case of the second, since we can always set $M = (I-Q)^{-1}RS$. (Recall that since $S(t)$ consists of sink strongly connected components with positive self-weights, $S = \lim_{t\to\infty} S(t:t_0)$ is well-defined as per Theorem 1.) Then the left-hand expression converges to $RS$, while the right-hand expression converges to:

$\lim_{t\to\infty} (I - Q(t))(I-Q)^{-1}RS = (I-Q)(I-Q)^{-1}RS = RS$  (38)

Therefore, the burden of proof rests on the second condition:

• $R(t)S = (I - Q(t))M + \epsilon(t)$, where $S = \lim \prod S(t)$, $M$ is some stochastic matrix, and $\epsilon(t) \to 0$,

which clarifies the sole, highly specific condition under which the dynamics can converge without the convergence of each sub-matrix. Recall that for $t \ge t_0$,

$A(t) = \begin{bmatrix} Q(t) & R(t) \\ 0 & S(t) \end{bmatrix}$,

where $Q(t) \in \mathbb{R}^{m \times m}$, $R(t) \in \mathbb{R}^{m \times p}$, $S(t) \in \mathbb{R}^{p \times p}$ and $n = m + p$. Let us denote matrix products by $M(t:t_0) = \prod_{n=t_0}^{t} M(n)$. The state vector $x(t)$ can then be written:

$x(t+1) = \begin{bmatrix} Q(t) & R(t) \\ 0 & S(t) \end{bmatrix} x(t) = \begin{bmatrix} Q(t:t_0) & R(t:t_0) \\ 0 & S(t:t_0) \end{bmatrix} x(t_0)$  (39)

From here we can draw some quick conclusions.
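Before proceeding, the special case above can be checked numerically: with constant $Q$ and $R$, the choice $M = (I-Q)^{-1}RS$ satisfies the second condition with $\epsilon(t) = 0$, and the resulting $M$ is row-stochastic. A minimal sketch, with hypothetical block sizes and randomly drawn weights:

```python
import numpy as np

rng = np.random.default_rng(2)
m, p = 3, 4   # hypothetical sizes of the quasi-connected / SSCC blocks

# Fixed Q (substochastic) and R, chosen so that [Q R] is row-stochastic.
Q = 0.1 * rng.random((m, m))
R = rng.random((m, p))
R *= (1 - Q.sum(axis=1, keepdims=True)) / R.sum(axis=1, keepdims=True)

# S: a row-stochastic SSCC limit (here a single consensus block).
S = np.tile(rng.dirichlet(np.ones(p)), (p, 1))

M = np.linalg.solve(np.eye(m) - Q, R @ S)   # M = (I - Q)^{-1} R S

# Condition 2 holds with epsilon(t) = 0 ...
assert np.allclose(R @ S, (np.eye(m) - Q) @ M)
# ... and M is row-stochastic, as Theorem 2 requires.
assert np.allclose(M.sum(axis=1), 1.0)
```

The second assertion reflects the identity $R\mathbf{1} = \mathbf{1} - Q\mathbf{1}$, so that $(I-Q)^{-1}RS\mathbf{1} = (I-Q)^{-1}(I-Q)\mathbf{1} = \mathbf{1}$.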
Note that $S(t:t_0)$ consists of block-diagonal sink strongly connected components (one block per absorbing set), and therefore by Theorem 1, $S(t:t_0) \to S$, where $S$ consists of block-diagonal consensus matrices.

Next we show that $Q(t:t_0) = \prod_{n=t_0}^{t} Q(n) \to 0$. Let $G[Q]_\infty$ denote the subgraph of the infinite graph consisting of the quasi-connected nodes, and call the nodes of this subgraph that are directly connected to an SSCC the "exit nodes". Denote the $i$-th row sum of a matrix $M(t)$ by $\|M_i(t)\|$.

We will prove the following specific claim. Start from any $t^* \ge t_0$. For any node $i \in G[Q]_\infty$ at distance $d$ from an exit node, there exists some $t_{(d)} \ge t^*$ such that $\|Q_i(t:t^*)\| \le (1-\delta^{(d+1)T}) < 1$ for all $t \ge t_{(d)}$. Here, $T \ge 1$ is the longest time within which all edges of the infinite graph $G_\infty$ are realised at least once.

The proof follows by induction. Begin with $d = 0$ (i.e. any exit node), and pick some arbitrary $t^* \ge t_0$ to begin the analysis. There will exist some $t_{(0)} \ge t^*$ at which an edge from the exit node $i$ to an SSCC node $c$ is activated, in which case it must be realised with weight $a_{ic} \ge \delta$. As such, the $i$-th row sum satisfies $\|Q_i(t_{(0)})\| \le (1-\delta)$ (since the matrix $A(t_{(0)})$ is row stochastic). For the next time step $t_{(0)}+1$, we can see that:

$\|Q_i(t_{(0)}+1 : t_{(0)})\| = Q_{ii}(t_{(0)}+1)\,\|Q_i(t_{(0)})\| + \sum_{j \neq i} Q_{ij}(t_{(0)}+1)\,\underbrace{\|Q_j(t_{(0)})\|}_{\le 1}$  (40)

$\le \underbrace{Q_{ii}(t_{(0)}+1)}_{\ge \delta}(1-\delta) + \sum_{j \neq i} Q_{ij}(t_{(0)}+1) \le \delta(1-\delta) + (1-\delta) = 1-\delta^2 < 1$  (41)

Repeating the above argument until $t_{(0)}+T$ shows that:

$\|Q_i(t_{(0)}+T : t_{(0)})\| \le (1-\delta)(1+\delta+\delta^2+\cdots+\delta^{T-1}) = 1-\delta^T < 1$  (42)

Since the edge $a_{ic}$ must be realised again within $T$ steps, the row-sum bound is "reset" back to $(1-\delta) < (1-\delta^T)$; the latter quantity therefore represents an upper bound for all exit nodes for $t \ge t_{(0)}$.

Now suppose the claim holds for some $d$, and consider a node $i$ at distance $d+1$. We know that for all $t \ge t_{(d)}$, the row sum $\|Q_k(t:t_{(0)})\|$ of its neighbour $k$ is upper bounded by $(1-\delta^{(d+1)T})$. Suppose the next time the edge to the neighbour is realised is $t_{(d)} + T \ge t_{(d+1)} > t_{(d)}$.
Since the edge $Q_{ik}(t_{(d+1)})$ is realised with value at least $\delta$, we can see that:

$\|Q_i(t_{(d+1)} : t_{(0)})\| = Q_{ik}(t_{(d+1)})\underbrace{\|Q_k(t_{(d+1)}-1 : t_{(0)})\|}_{\le (1-\delta^{(d+1)T})} + \sum_{j \neq k} Q_{ij}(t_{(d+1)})\underbrace{\|Q_j(t_{(d+1)}-1 : t_{(0)})\|}_{\le 1}$  (43)

$\le \underbrace{Q_{ik}(t_{(d+1)})}_{\ge \delta}(1-\delta^{(d+1)T}) + \sum_{j \neq k} Q_{ij}(t_{(d+1)}) \le \delta(1-\delta^{(d+1)T}) + (1-\delta) < 1$  (44)

We now repeat the steps of the base case to get:

$\|Q_i(t_{(d+1)}+T : t_{(0)})\| \le (1-\delta)(1+\delta+\delta^2+\cdots+\delta^{T-1}) + \delta^T(1-\delta^{(d+1)T})$  (45)

$= (1-\delta^T) + \delta^T - \delta^{(d+2)T} = 1-\delta^{(d+2)T} < 1$  (46)

Once again, since all edges are realised within $T$ steps, we can conclude that for the node $i$ at distance $(d+1)$ from an exit node, for all $t \ge t_{(d+1)}$, the row sum $\|Q_i(t:t_{(0)})\| \le (1-\delta^{(d+2)T})$. Suppose the longest distance from an exit node over all nodes in $G[Q]_\infty$ is $D$. We can therefore conclude that for all $t \ge t_{(D)}$, the row sum $\|Q_i(t:t^*)\| \le \|Q_i(t:t_{(0)})\| \le (1-\delta^{(D+1)T}) = (1-\gamma)$ for all quasi-connected $i$.

Recall that $t_{(d)} \le t_{(d-1)} + T$. Therefore, $t_{(D)} \le t^* + (D+1)T$. It follows that for any starting point $t^*$, we can conclude that $\|Q((t^* + (D+1)T) : t^*)\|_\infty \le (1-\gamma)$, where $\|M\|_\infty$ denotes the maximum row sum of the matrix $M$.

Finally, we note that $\lim_{t\to\infty} Q(t:t_0) = \prod_{n=t_0}^{\infty} Q(n)$ can be partitioned into subproducts of length $(D+1)T$, which we refer to as $\tilde{Q}(n)$. Each subproduct will have $\|\tilde{Q}(n)\|_\infty \le (1-\gamma)$.
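The decay of the row sums that this argument establishes can be observed directly in a small simulation (an illustration only; the chain topology, $\delta$, the leak weight, and the horizon are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
m, delta, steps = 4, 0.1, 300

def Q_step():
    """One realisation of the quasi-connected block: self-loops of weight at
    least delta, a chain 3 -> 2 -> 1 -> 0, and exit node 0 sending weight
    0.3 >= delta into an SSCC (so its row sum is strictly below 1)."""
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = delta + rng.random()
        if i > 0:
            W[i, i - 1] = delta + rng.random()
    W /= W.sum(axis=1, keepdims=True)
    W[0] *= 0.7          # node 0 leaks 0.3 of its weight to the SSCC block
    return W

P = np.eye(m)
row_sums = []
for _ in range(steps):
    P = Q_step() @ P     # backward product Q(t : t0)
    row_sums.append(P.sum(axis=1).max())

assert row_sums[-1] < 1e-6        # Q(t : t0) -> 0
assert max(row_sums[m + 1:]) < 1  # every row sum soon drops strictly below 1
```

Here every edge is realised at every step ($T = 1$) and the longest distance to the exit node is $D = 3$, so the leak propagates to all rows within a few steps, exactly as in the inductive argument.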
In the following, let $\|M\| = \|M\|_\infty$:

$\lim_{t\to\infty} \|Q(t:t_0)\| = \lim_{t\to\infty} \Big\|\prod_{n=0}^{t} \tilde{Q}(n)\Big\| \le \lim_{t\to\infty} \prod_{n=0}^{t} \|\tilde{Q}(n)\| \le \lim_{t\to\infty} (1-\gamma)^t = 0$  (47)

$\Rightarrow \lim_{t\to\infty} Q(t:t_0) = 0$  (48)

Since the other blocks are guaranteed to converge, in order to show that $A(t:t_0)$ converges we just need to prove that $R(t:t_0)$ converges under the stated conditions. Beginning with the easier direction (only if), suppose first that $R(t:t_0) \to M$. Then the steady state is just:

$x^* = \lim_{t\to\infty} x(t) = \lim_{t\to\infty} A(t:t_0)\,x(t_0) = \begin{bmatrix} 0 & M \\ 0 & S \end{bmatrix} x(t_0)$  (49)

Clearly, since $M$ is a block in the stochastic matrix $\prod_{n=t_0}^{\infty} A(n)$, it must also be stochastic. We want to show that $R(t)S \to (I - Q(t))M$. The matrix $R(t:t_0)$ updates as follows:

$R(t:t_0) = Q(t)R(t-1:t_0) + R(t)S(t-1:t_0)$  (50)

$\Rightarrow \lim_{t\to\infty} R(t:t_0) = \lim_{t\to\infty} \big(Q(t)R(t-1:t_0) + R(t)S(t-1:t_0)\big)$  (51)

$= \lim_{t\to\infty} \big(Q(t)M + R(t)S + Q(t)\epsilon(t)^{(M)} + R(t)\epsilon(t)^{(S)}\big)$  (52)

$= \lim_{t\to\infty} \big(Q(t)M + R(t)S\big) + \lim_{t\to\infty} \big(Q(t)\epsilon(t)^{(M)} + R(t)\epsilon(t)^{(S)}\big)$  (53)

$= \lim_{t\to\infty} \big(Q(t)M + R(t)S\big)$  (54)

$\therefore \lim_{t\to\infty} \big(Q(t)M + R(t)S\big) = \lim_{t\to\infty} R(t:t_0) = M$  (55)

In the third step we introduced $\epsilon(t)^{(M)} = R(t-1:t_0) - M$ (and analogously $\epsilon(t)^{(S)} = S(t-1:t_0) - S$), where we know $\|\epsilon(t)^{(M)}\| \to 0$. This allows us to split the limit without issue in the fourth step. To complete this direction of the proof we just need to define appropriate terms. Define $\epsilon(t) = Q(t)M + R(t)S - M$. We can see that $\lim_{t\to\infty} \epsilon(t) = \lim_{t\to\infty} \big(Q(t)M + R(t)S\big) - M = M - M = 0$. Then we can simply rearrange to obtain:

$R(t)S = (I - Q(t))M + \epsilon(t)$  (56)

where $\lim_{t\to\infty} \epsilon(t) = 0$, the result we wanted.

Now we prove the other direction (if). Suppose $R(t)S = (I - Q(t))M + \epsilon(t)$ for some $M \in \mathbb{R}^{m \times p}$ where $M_{ij} \ge 0$ and $M\mathbf{1} = \mathbf{1}$ (i.e. $M$ is row stochastic).
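The recursion (50) is simply the block structure of the matrix product, which can be verified mechanically; a sketch with hypothetical block sizes and random row-stochastic updates:

```python
import numpy as np

rng = np.random.default_rng(4)
m, p = 3, 4

def A_step():
    """Random row-stochastic, block upper-triangular update matrix."""
    Q = 0.3 * rng.random((m, m))
    R = rng.random((m, p))
    R *= (1 - Q.sum(axis=1, keepdims=True)) / R.sum(axis=1, keepdims=True)
    S = rng.random((p, p))
    S /= S.sum(axis=1, keepdims=True)
    A = np.vstack([np.hstack([Q, R]),
                   np.hstack([np.zeros((p, m)), S])])
    return A, Q, R, S

# Accumulate the product A(t : t0) and check the block recursion (50).
P = np.eye(m + p)
for _ in range(10):
    A, Q, R, S = A_step()
    R_prev, S_prev = P[:m, m:], P[m:, m:]
    P = A @ P
    # R(t : t0) = Q(t) R(t-1 : t0) + R(t) S(t-1 : t0)
    assert np.allclose(P[:m, m:], Q @ R_prev + R @ S_prev)
print("block recursion verified over 10 steps")
```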
Then:

$R(t:t_0) = Q(t)R(t-1:t_0) + R(t)S(t-1:t_0)$  (57)

$= Q(t)R(t-1:t_0) + R(t)S + R(t)\epsilon(t)^{(S)}$  (58)

$= Q(t)R(t-1:t_0) + (I - Q(t))M + \epsilon(t) + R(t)\epsilon(t)^{(S)}$  (59)

$= Q(t)\big(R(t-1:t_0) - M\big) + M + \underbrace{\epsilon(t) + R(t)\epsilon(t)^{(S)}}_{\delta(t)}$  (60)

$= Q(t)\big(Q(t-1)(R(t-2:t_0) - M) + \delta(t-1)\big) + M + \delta(t)$  (61)

$= M + \underbrace{\Big(\prod_{n=t_0}^{t} Q(n)\Big)\big(R(t_0) - M\big)}_{\to 0} + \underbrace{\big(\delta(t) + Q(t)\delta(t-1) + Q(t)Q(t-1)\delta(t-2) + \cdots\big)}_{=\Delta(t)}$  (62)

where (61) follows by expanding $R(t-1:t_0) - M$ using (60) (the $M$ terms cancel), and (62) by iterating down to $t_0$. Finally, we need to show that $\Delta(t) \to 0$, given that $\delta(t) \to 0$. This is not entirely straightforward, since the error terms $\delta(t)$ can in principle accumulate rather than diminish exponentially (there is no guarantee that $\|Q(t)\| < 1$ for all $t$). To get around this, note that $\Delta(t) = Q(t)\Delta(t-1) + \delta(t)$, and therefore:

$\Delta(t+(D+1)T) = \Big(\prod_{n=t}^{t+(D+1)T} Q(n)\Big)\Delta(t-1) + \Big(\prod_{n=t+1}^{t+(D+1)T} Q(n)\Big)\delta(t) + \Big(\prod_{n=t+2}^{t+(D+1)T} Q(n)\Big)\delta(t+1) + \cdots + \delta(t+(D+1)T)$

$\Rightarrow \|\Delta(t+(D+1)T)\| \le \underbrace{\Big\|\prod_{n=t}^{t+(D+1)T} Q(n)\Big\|}_{\le (1-\gamma)} \|\Delta(t-1)\| + \underbrace{\Big\|\prod_{n=t+1}^{t+(D+1)T} Q(n)\Big\|}_{\le 1} \|\delta(t)\| + \cdots + \|\delta(t+(D+1)T)\|$

$\le (1-\gamma)\|\Delta(t-1)\| + \underbrace{(D+1)T \max_{n \in \{t, t+1, \ldots, t+(D+1)T\}} \|\delta(n)\|}_{\mu(t-1)}$

Writing $T' = (D+1)T$, this gives:

$\|\Delta(t+T')\| \le (1-\gamma)\|\Delta(t)\| + \mu(t)$
Note that since $\|\delta(t)\| \to 0$, the forcing term $\mu(t)$ can be made arbitrarily small for large enough $t$. To ease the analysis, consider the subsequence $(t, t+T', t+2T', \ldots) \to (k, k+1, k+2, \ldots)$, so that we can rewrite the above as:

$\|\Delta(k+1)\| \le (1-\gamma)\|\Delta(k)\| + \mu(k)$  (63)

We will show that each such subsequence $\{\|\Delta(k)\|\}$ converges to zero (and therefore the full sequence converges to zero). Suppose by contradiction that $\liminf \|\Delta(k)\| = \phi > 0$. We can pick some $k_0$ such that for all $k \ge k_0$, $\mu(k) < (1-\alpha)\phi$ for some $\alpha$ with $(1-\gamma) < \alpha < 1$. By the definition of the liminf, there always exists some $k \ge k_0$ such that $\|\Delta(k)\| \le \frac{\alpha\phi}{1-\gamma}$. Therefore, $\|\Delta(k+1)\| \le \alpha\phi + \mu(k) < \alpha\phi + (1-\alpha)\phi = \phi$. This means $\phi$ can no longer be the liminf of the sequence $\{\|\Delta(k)\|\}$, leading to a contradiction. Therefore, $\liminf \|\Delta(k)\| = 0$.

Finally, for completeness, suppose $\limsup \|\Delta(k)\| = \psi > 0$. Choose some $k_0$ and $\alpha < 1$ such that $\mu(k) \le \alpha\gamma\psi$ for all $k \ge k_0$. Since $\liminf \|\Delta(k)\| = 0$, there exists some $k_1 \ge k_0$ such that $\|\Delta(k_1)\| \le \alpha\psi$. Then $\|\Delta(k_1+1)\| \le \alpha\psi(1-\gamma) + \alpha\gamma\psi = \alpha\psi$. Since $\mu(k) \le \alpha\gamma\psi$ for all $k \ge k_0$, the sequence $\|\Delta(k)\|$ remains bounded above by $\alpha\psi < \psi$ for all $k \ge k_1$, and therefore $\limsup \|\Delta(k)\| \neq \psi$, leading to a contradiction. Therefore, $\lim \|\Delta(k)\| = \limsup \|\Delta(k)\| = \liminf \|\Delta(k)\| = 0$. Since this holds for any such subsequence, we can conclude that $\lim \|\Delta(t)\| = 0$.
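This last step rests on a simple fact about contractions with vanishing forcing terms, which a one-dimensional example makes concrete (an illustration only; $\gamma$ and the $\mu_k$ sequence are arbitrary choices):

```python
# One-dimensional illustration: if a_{k+1} <= (1 - gamma) a_k + mu_k with
# mu_k -> 0, then a_k -> 0, even though the mu_k terms keep injecting error
# at every step.
gamma = 0.3
a = 1.0
mus = [1.0 / (k + 1) for k in range(10000)]   # a forcing term that -> 0
for mu in mus:
    a = (1 - gamma) * a + mu

assert a < 1e-3
print("Delta-like sequence decayed to", a)
```

Intuitively, the sequence tracks $\mu_k / \gamma$, which vanishes along with $\mu_k$.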
Finally, we can conclude that $R(t:t_0) \to M$, and as such the entire process converges, proving Theorem 2.

Corollary 2 follows from the block structure:

$x(t+1) = \begin{bmatrix} x_{QC}(t+1) \\ x_{SC}(t+1) \end{bmatrix} = \begin{bmatrix} Q(t:t_0) & R(t:t_0) \\ 0 & S(t:t_0) \end{bmatrix} \begin{bmatrix} x_{QC}(t_0) \\ x_{SC}(t_0) \end{bmatrix}$  (64)

$\Rightarrow x(t) = \begin{bmatrix} x_{QC}(t) \\ x_{SC}(t) \end{bmatrix} \to \begin{bmatrix} 0 & M \\ 0 & S \end{bmatrix} \begin{bmatrix} x_{QC}(t_0) \\ x_{SC}(t_0) \end{bmatrix}$  (65)

$\Rightarrow x_{QC}(t) \to M x_{SC}(t_0)$  (66)

$\Rightarrow x_{SC}(t) \to S x_{SC}(t_0)$  (67)

Corollary 3 follows immediately from Theorem 2 with the appropriate mapping of the augmented matrix to the general form illustrated above, noting in particular that $S(t:t_0) = I$ for all $t$.

One final point: Corollary 3 assumes for simplicity that the infinite graph of the original nodes, $G_\infty$, is strongly connected. The results can be extended to the general case where $G_\infty$ is quasi-connected, but we must then make use of an additional assumption: all nodes must be path-connected to a node with $\lambda_i > 0$. To see this, note that in the proof of Theorem 2 the convergence $Q(t:t_0) \to 0$ relies on the fact that all quasi-connected nodes are path-connected to an "exit node" (if they were not, they would not be quasi-connected). The exit nodes in that proof correspond, in a private-signal model, to nodes that place some non-zero weight on their private signals (i.e. $\lambda_i > 0$).

Corollary 4 states that heterogeneous steady states require either $A(t) \to A$ and $b(t) \to b$, or $A(t) \not\to A$ and $b(t) \not\to b$. To show this we rule out heterogeneous outcomes when one update matrix converges and the other does not. Consider first $A(t) \to A$. By Theorem 2, we know that for a convergent model (where $\lambda_i > 0$ for all $i$):

$\Lambda W(t) = (I - (I - \Lambda)A(t))M + \epsilon(t)$  (68)

$\Rightarrow W(t) = \underbrace{\Lambda^{-1}(I - (I - \Lambda)A)M}_{L} + \delta(t)$  (69)

Since $\delta(t) \to 0$, we have $W(t) \to L$, meaning $W(t)C = b(t) \to b$.
If $\lambda_i = 0$ for any $i$, we can simply replace the $i$-th row of $W(t)$ with zeroes and set $\lambda_i = 1$ to repeat the argument above. As such, there cannot be an outcome where $A(t) \to A$ and $b(t) \not\to b$.

If $b(t) \to b$, then $W(t) \to W$, and by rearranging the expression in Theorem 2 we can see similarly that:

$A(t)M = (I - \Lambda)^{-1}(M - \Lambda W) + \phi(t)$  (70)

$\Rightarrow A(t)MC = A(t)x^* = \underbrace{(I - \Lambda)^{-1}(M - \Lambda W)C}_{K} + \kappa(t) = K(t)$  (71)

where in the second line we made use of the generic steady-state structure from Corollary 1 to show that $x^* = MC$. Since $\kappa(t) \to 0$, it follows that $A(t)x^* = K(t) \to K$. Without loss of generality suppose that $d = 1$ (i.e. $x_i$ is one-dimensional). We can write the $i$-th entry $K_i(t)$ as:

$K_i(t) = \sum_{j \in \mathcal{N}(i)} A_{ij}(t)\,x_j^* - \kappa_i(t)$  (72)

Suppose by contradiction that $H(x^*) > 0$, in which case all pairs $x_i^* \neq x_j^*$. For any $i$, this means that the sum $\sum_{j \in \mathcal{N}(i)} A_{ij}(t)\,x_j^*$ will vary whenever an edge $A_{ij}(t)$ varies. Since $\kappa_i(t) \to 0$, it follows that the term $K_i(t)$ will never converge, leading to a contradiction.

References

[1] Morris H. DeGroot. Reaching a consensus.
Journal of the American Statistical Association, 69(345):118–121, 1974.
[2] Rainer Hegselmann, Ulrich Krause, et al. Opinion dynamics and bounded confidence models, analysis, and simulation. Journal of Artificial Societies and Social Simulation, 5(3), 2002.
[3] Gérard Weisbuch, Guillaume Deffuant, Frédéric Amblard, and J.-P. Nadal. Interacting agents and continuous opinions dynamics. In Heterogenous Agents, Interactions and Economic Performance, pages 225–242. Springer, 2003.
[4] Matthew O. Jackson and Yves Zenou. Games on networks. In Handbook of Game Theory with Economic Applications, volume 4, pages 95–163. Elsevier, 2015.
[5] Tamás Vicsek, András Czirók, Eshel Ben-Jacob, Inon Cohen, and Ofer Shochet. Novel type of phase transition in a system of self-driven particles. Physical Review Letters, 75(6):1226, 1995.
[6] Iyad Rahwan, Manuel Cebrian, Nick Obradovich, Josh Bongard, Jean-François Bonnefon, Cynthia Breazeal, Jacob W. Crandall, Nicholas A. Christakis, Iain D. Couzin, Matthew O. Jackson, et al. Machine behaviour. Nature, 568(7753):477–486, 2019.
[7] Orowa Sikder, Robert E. Smith, Pierpaolo Vivo, and Giacomo Livan. A minimalistic model of bias, polarization and misinformation in social networks. Scientific Reports, 10(1):1–11, 2020.
[8] Anton V. Proskurnikov and Roberto Tempo. A tutorial on modeling and analysis of dynamic social networks. Part I. Annual Reviews in Control, 43:65–79, 2017.
[9] Anton V. Proskurnikov and Roberto Tempo. A tutorial on modeling and analysis of dynamic social networks. Part II. Annual Reviews in Control, 45:166–190, 2018.
[10] Benjamin Golub and Evan Sadler. Learning in social networks. Available at SSRN 2919146, 2017.
[11] Javad Ghaderi and Rayadurgam Srikant. Opinion dynamics in social networks with stubborn agents: Equilibrium and convergence rate. Automatica, 50(12):3209–3215, 2014.
[12] Naoki Masuda. Opinion control in complex networks. New Journal of Physics, 17(3):033031, 2015.
[13] Noah E. Friedkin and Eugene C. Johnsen. Social influence and opinions. Journal of Mathematical Sociology, 15(3-4):193–206, 1990.
[14] Anton V. Proskurnikov, Roberto Tempo, Ming Cao, and Noah E. Friedkin. Opinion evolution in time-varying social influence networks with prejudiced agents. IFAC-PapersOnLine, 50(1):11896–11901, 2017.
[15] Jan Lorenz. Continuous opinion dynamics under bounded confidence: A survey. International Journal of Modern Physics C, 18(12):1819–1838, 2007.
[16] Jan Lorenz. A stabilization theorem for dynamics of continuous opinions. Physica A: Statistical Mechanics and its Applications, 355(1):217–223, 2005.
[17] Vincent D. Blondel, Julien M. Hendrickx, Alex Olshevsky, and John N. Tsitsiklis. Convergence in multiagent coordination, consensus, and flocking. In Proceedings of the 44th IEEE Conference on Decision and Control, pages 2996–3000. IEEE, 2005.
[18] D. A. Levin and Y. Peres. Markov Chains and Mixing Times. American Mathematical Society, 2017.
[19] Nicolas S. Lambert, Giorgio Martini, and Michael Ostrovsky. Quadratic games. Technical report, National Bureau of Economic Research, 2018.
[20] Coralio Ballester, Yves Zenou, and Antoni Calvó-Armengol. Delinquent networks. Journal of the European Economic Association, 8(1):34–61, 2010.
[21] Coralio Ballester, Antoni Calvó-Armengol, and Yves Zenou. Who's who in networks. Wanted: The key player. Econometrica, 74(5):1403–1417, 2006.
[22] Jacob Wolfowitz. Products of indecomposable, aperiodic, stochastic matrices.