On social networks that support learning
Itai Arieli (Technion IE&M, Haifa, Israel), Fedor Sandomirskiy (Technion IE&M, Haifa, Israel, and Higher School of Economics, St. Petersburg, Russia), and Rann Smorodinsky (Technion IE&M, Haifa, Israel)

We are grateful (in alphabetic order) to Matthew Jackson, Herve Moulin, Alexander Nesterov, Omer Tamuz, and Nicolas Vieille for inspirational discussions and suggestions. We also thank attendees of the game-theory seminar at the Technion and the Conference on Mechanism and Institution Design 2020 for their feedback. We are grateful to Michael Borns for proofreading the paper. Arieli's research is supported by the Ministry of Science and Technology grant.

Abstract
It is well understood that the structure of a social network is critical to whether or not agents can aggregate information correctly. In this paper, we study social networks that support information aggregation when rational agents act sequentially and irrevocably. Whether or not information is aggregated depends, inter alia, on the order in which agents decide. Thus, to decouple the order and the topology, our model studies a random arrival order.

Unlike the case of a fixed arrival order, in our model the decision of an agent is unlikely to be affected by those who are far from him in the network. This observation allows us to identify a local learning requirement, a natural condition on the agent's neighborhood that guarantees that this agent makes the correct decision (with high probability) no matter how well other agents perform. Roughly speaking, the agent should belong to a multitude of mutually exclusive social circles.

We illustrate the power of the local learning requirement by constructing a family of social networks that guarantee information aggregation despite the fact that no agent is a social hub (in other words, there are no opinion leaders). Although the common wisdom of the social-learning literature suggests that information aggregation is very fragile, another application of the local learning requirement demonstrates the existence of networks where learning prevails even if a substantial fraction of the agents are not involved in the learning process. On a technical level, the networks we construct rely on the theory of expander graphs, i.e., highly connected sparse graphs with a wide range of applications from pure mathematics to error-correcting codes.
1 Introduction

The way ideas and information propagate in society is critical for engineering political campaigns, evaluating new technologies, marketing products, and introducing new social conventions. It is well known that whether or not the information is properly aggregated is highly dependent on the quality of information accessible to the individual agents, on the way information is transmitted from one agent to another, on the order and frequency with which agents take actions, and, finally, on the topology of the underlying social network.

In this paper, we study the role that the topology of the social network plays in information aggregation among rational agents. We do so using a variant of the herding model of Banerjee [1992] and Bikhchandani et al. [1992] adapted to a social network setting. Agents decide sequentially on a binary action based on a private (bounded) signal and the history of actions taken by their predecessors. In the variant we study, the set of predecessors an agent observes is restricted by his set of neighbors in an exogenously given social network.

Since agents act sequentially, a network may properly aggregate information for a particular sequence while failing to do so for most other sequences. To decouple the network's topology from the order in which agents take their actions, we make a distinction between the a-priori network, which is exogenously given, and the realized network, which is induced by the aforementioned social network and the ordering of agents. Such a distinction between the social network and the realized observability structure is not standard. Our goal in this paper is to find a sufficient condition on the a-priori network that guarantees learning for most sequences. We say that such a network supports learning. Our first step in the analysis, which is of independent interest, is to focus on a particular agent in the a-priori network and characterize features of the local structure of the network that guarantee that this agent will make the correct decision (with high probability). Indeed, we show that an agent that belongs to a variety of social circles that, among themselves, are mutually exclusive and socially distant has a high probability of taking the correct action. We refer to this condition as the local learning requirement. Therefore, social networks with the property that each agent satisfies the local learning requirement will aggregate information. We show, with an example, that this condition is not vacuous. Quite surprisingly, we demonstrate the existence of symmetric networks with the aforementioned property.

In the sociology literature, and in particular the literature on mass communication, it is often argued that learning is facilitated vis-à-vis a small number of opinion leaders predetermined by their position in the social network [Katz and Lazarsfeld, 1955].
Thus, the existence of symmetric networks that support learning is quite counterintuitive, as it implies that learning obtains without opinion leaders (see the discussion in Section 6).

When studying learning in social networks it may be quite unrealistic to assume that every decision problem pertains to all agents in the society. In fact, in large societies (such as those present in online social networks) it is far more realistic that only a minority of the agents have a stake in an arbitrary decision problem. For example, a dilemma between two candidates for mayor may interest one subset of the population, while a dilemma between two competing "green" technologies (e.g., hybrid engines vs. electric engines) may be relevant to some other small set of agents. Thus, the proper requirement on a social network is that information be properly aggregated when only a fraction of agents participate. We say that a network supports robust learning if any subnetwork containing a constant fraction of the agents supports learning. (An alternative, weaker definition could require that most such subnetworks support learning; we discuss this alternative definition in Appendix E.) To obtain robust learning it is sufficient that in any network induced by a fraction of the agents, most of those agents satisfy the local learning requirement. We show, with an example, that this condition is not vacuous. Once again, the example we have satisfies symmetry assumptions on the agents.

To construct social networks that support (robust) learning, we tap into the theory of expander graphs, and in particular we make use of well-known properties of a family of graphs known as
Ramanujan expanders.
1.1 Related literature

The literature on information aggregation studies many aspects of social learning. In our literature review, we focus on the connection between the topology of the social network and the learning outcome.

Acemoglu et al. [2010] study a setting where each agent observes a random sample of the actions taken by his predecessors. Thus, our approach complements that of Acemoglu et al. [2010] in the sense that their model better suits settings where the network structure is generated ad hoc, whereas our model fits social networks whose a-priori structure is fixed with no connection to the order in which agents decide in one problem or another, so that the same network underlies a variety of decision problems, each with its own order. (Unlike the independent random observation sampling assumed by Acemoglu et al. [2010], in our model the random order generates a high correlation between the observation sets of individuals. Indeed, if an individual observes only a small subset of his friends when he makes his decision, he may infer that those friends are early arrivals and, therefore, based their own decisions on very limited information. This is not the case in the random sampling model.) Arieli and Mueller-Frank [2019] also consider random sampling but assume the edges in the network are limited to the m-dimensional lattice.

Bahar et al. [2020] are the first to study social learning over an exogenously given network with a random arrival order of the agents. They demonstrate the existence of a social network where learning is guaranteed. Their network has a particular structure: it is a bipartite graph with a minority of agents on one side (whom they refer to as "celebrities") and a majority on the other side ("commoners"); for more details see Example 2.2. Although the family of celebrity graphs supports learning, they are sensitive to the participation of agents. In other words, in decision problems that are irrelevant to a minority of the agents, information need not be properly aggregated and social learning may fail; i.e., the celebrity graphs do not support robust learning.

Recently, various papers have demonstrated how fragile social learning can be (see, e.g., Bohren [2016]; Frick et al. [2020]; Mueller-Frank [2018]). In contrast to these findings, we construct a network structure that is robust in the sense that learning prevails even if a majority of agents, chosen adversarially, does not participate.

On a more technical level, the networks we construct rely on the theory of expander graphs, i.e., highly connected sparse graphs with a wide range of applications from solving problems in pure mathematics to designing error-correcting codes (see Lubotzky [2012] for a survey). Recently, this theory was applied to problems of social learning by Mossel et al. [2014] and Feldman et al. [2014] in the context of boundedly rational agents who take actions repeatedly.

1.2 Structure of the paper

Section 2 contains the description of the model. In Section 3, we demonstrate that whether or not an agent learns is determined by the local structure of the a-priori network around him and derive the local learning requirement on this neighborhood, which guarantees that the agent learns the state. Sections 4 and 5 are devoted to implications of the local learning requirement. Section 4 shows that learning is possible in egalitarian societies: there are symmetric networks that support learning.
Section 5 strengthens this result by constructing symmetric networks where learning is robust to adversarial elimination of groups of agents. Section 6 discusses alternative interpretations and extensions of the basic model. The results of Sections 4 and 5 rely heavily on insights from the theory of expander graphs, which are explained in Appendix A. Technical proofs for Sections 3 and 5 are in Appendices B, C, and D. Appendix E discusses an alternative, weaker notion of robustness, where groups of agents are eliminated at random.
2 Model

A social network is an undirected graph G = (V, E), where V is the set of agents and an edge vu is contained in E if v and u are "friends." Friendship is always mutual, as on Facebook: vu ∈ E ⇒ uv ∈ E. We denote by F_v the set of v's friends {u ∈ V : vu ∈ E}.

With each agent v ∈ V, we associate his arrival time t_v; the arrival times T = (t_v)_{v∈V} are independent random variables uniformly distributed on [0, 1]. T induces an orientation on the edges of G = (V, E) from late arrivals to early ones and converts it into the directed network G_T = (V, E_T) with E_T = {vu ∈ E : t_u < t_v}, which we call the realized network. We will refer to G as the a-priori network in order to distinguish it from G_T. The directed edge vu in the realized network means that agent v gets to observe agent u; i.e., v observes his friends who arrived earlier. We denote the set of all such friends of v by F_{v,T} = {u ∈ V : vu ∈ E_T} ⊂ F_v.

Upon arrival, each agent v takes an action a_v ∈ {0, 1}, depending on the information available. The agent gets a payoff of one if a_v = θ, where θ is the random state, a Bernoulli random variable with success probability 1/2; if the action does not match the state, the agent gets zero payoff. Nobody observes θ, but every agent receives a binary signal s_v ∈ {0, 1} that equals θ with probability p and 1 − θ with probability 1 − p, where 1/2 < p ≤ 1. Signals are independent conditional on θ. In addition to his own signal s_v, each agent v observes the set of his friends who arrived earlier, F_{v,T}, and their actions, (a_u)_{u∈F_{v,T}}. We denote the information set of an agent v by I_v = (s_v, F_{v,T}, (a_u)_{u∈F_{v,T}}). Note that an agent knows neither his arrival time nor the set of agents who arrived before him (except for his friends).

All agents are rational and risk-neutral; the description of the model, the probability distributions, and the a-priori network G are common knowledge. By contrast, the realized network is not observed by the agents. A mixed strategy σ_v of an agent v maps his information set I_v to a probability distribution on {0, 1}, according to which his action a_v is then chosen. The goal of each agent is to maximize his expected payoff, which coincides with the probability of taking the action that matches the state.

(The random arrival order allows us to disentangle the topology of the a-priori network and the order in which agents make their decisions. Randomness is also justified by the fact that this order usually differs from one issue to another (e.g., iPhone vs. Android, public kindergartens vs. private ones). We stress that in our model arrival refers to the time when the agent makes a decision, and not to the time when he joins the network as in the random sampling model of Acemoglu et al. [2010].)
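The arrival mechanics are easy to simulate. The sketch below is ours, not from the paper, and the helper name realized_network is hypothetical: it samples i.i.d. Uniform[0,1] arrival times and orients each edge from the later arrival to the earlier one, yielding the realized network G_T and the observation sets F_{v,T}.

```python
import random

def realized_network(vertices, edges, seed=None):
    """Sample i.i.d. Uniform[0,1] arrival times and orient every undirected
    edge vu from the later arrival to the earlier one: E_T = {vu : t_u < t_v}."""
    rng = random.Random(seed)
    t = {v: rng.random() for v in vertices}        # arrival times T
    observed = {v: set() for v in vertices}        # v -> F_{v,T}
    for v, u in edges:                             # each undirected edge once
        if t[u] < t[v]:
            observed[v].add(u)                     # v observes u
        else:
            observed[u].add(v)                     # u observes v
    return t, observed

# Toy example: a triangle; the latest arrival observes both friends.
times, obs = realized_network([1, 2, 3], [(1, 2), (2, 3), (1, 3)], seed=0)
print(times, obs)
```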
We consider an equilibrium σ = (σ_v)_{v∈V} of the induced Bayesian game and use P_σ and E_σ for the probability and the expectation with respect to all the randomness in the problem (the state, signals, arrival times, and actions). An equilibrium exists since the game is finite; however, it may be non-unique. We omit the subscript σ and write P and E when this creates no confusion.

We say that an equilibrium is state-symmetric if the distribution P is invariant under the mapping (θ, (s_v)_{v∈V}, (a_v)_{v∈V}) → (1 − θ, (1 − s_v)_{v∈V}, (1 − a_v)_{v∈V}), i.e., under simultaneous flipping of the state, signals, and actions. Since the payoffs enjoy this symmetry, a state-symmetric equilibrium exists.

In any equilibrium, an agent v selects a_v = 1 if P(θ = 1 | I_v) > 1/2, i.e., if, conditional on his information, the state θ = 1 is more likely. Similarly, a_v = 0 if P(θ = 1 | I_v) < 1/2. It may look as if equilibria can differ only in tie-breaking; however, this is not the whole truth, since P(θ = 1 | I_v) depends on the equilibrium strategies of the other agents, which, in turn, are optimal replies to the strategies of others, including v. In particular, the game lacks a sequential structure since no pair of agents v, u ∈ V with vu ∉ E knows which of them acted first.

The following definition quantifies how well an agent and the a-priori network itself aggregate information.
Definition 2.1 (Learning quality). For an a-priori network G = (V, E) and an equilibrium σ, the learning quality of an agent v ∈ V is the probability that this agent takes the correct action:

l_σ(v) = P_σ(a_v = θ).

The learning quality of the network G is the expected fraction of agents taking the correct action:

L_σ(G) = (1/|V|) · E_σ |{v ∈ V : a_v = θ}| = (1/|V|) Σ_{v∈V} l_σ(v).

We say that an agent with l_σ(v) close to one learns the state and that a network with L_σ(G) close to one supports learning. The following example shows that networks may fail to support learning due to the herding phenomenon.

Example 2.1 (Herding spoils learning). Let the a-priori network G be the n-clique K_n, the complete graph on n vertices. In this case, every agent v observes the actions of all those who came earlier. We consider an equilibrium σ where, in the case of indifference, each agent follows his own signal (one can show that the learning quality in other equilibria can only be worse). If the first two agents take the incorrect action a = 1 − θ, then the third agent will ignore his signal and repeat this wrong action, since the chance that both his predecessors are wrong is lower than the chance of him getting the wrong signal, and so on. This phenomenon, known as herding, ruins information aggregation. Indeed, with probability at least (1 − p)² all the agents take the incorrect action. Thus L_σ(K_n) ≤ 1 − (1 − p)²; i.e., the learning quality is bounded away from 1 even for large cliques. By symmetry, l_σ(v) = L_σ(K_n) for all agents v.

Bahar et al. [2020] consider a similar model of learning with random arrivals and ask whether there exist networks that support learning. They provide an affirmative answer to this question by identifying the family of celebrity graphs, the only previously known family of networks that supports learning with random arrivals. We will construct another such family in Section 4.

Example 2.2 (Celebrity graphs support learning). Consider a two-tier society, where a large set of k "commoners" observe a large but smaller set of m "celebrities," 1 ≪ m ≪ k. The corresponding a-priori network is a complete bipartite graph B_{k,m}; see Figure 1. On average, a set of k/(m+1) commoners arrives before the first celebrity, and these commoners make their decisions in isolation, based on their own signals. Since k/(m+1) ≫ 1, the law of large numbers suggests that the first celebrity aggregates information from these i.i.d. inputs and thereby takes the correct action with high probability. Subsequent commoners observe this celebrity and, hence, are also likely to make the right choice. This correct action propagates, and so we conclude that the whole population, except for a negligible fraction of roughly 1/(m+1) of the commoners, takes the right action with high probability. This informal reasoning suggests that for any δ > 0 there are k and m such that the learning quality of each agent and of the network itself is at least 1 − δ. The formal argument in Bahar et al. [2020] is tricky since a celebrity must "guess" which observed commoners are isolated and which are probably not.

Although the celebrity graphs support learning, they are not robust to the elimination of small groups of agents. Indeed, the elimination of all the celebrities would render the commoners completely isolated, and so learning would not obtain. We also note that the fact that celebrity graphs support learning hinges on the assumption of a random arrival order. Indeed, if all commoners arrive before the celebrities, then all of them make their decision in isolation and learning fails.

Figure 1: The celebrity graph: a complete bipartite graph with m celebrities and k commoners. When the first celebrity arrives, he typically observes k/(m+1) ≫ 1 commoners who made their decisions in isolation.

Throughout the paper we use the following notation. For a real number x, the floor and the ceiling are denoted by ⌊x⌋ and ⌈x⌉, respectively. The former is the biggest integer n such that n ≤ x and the latter is the smallest integer n ≥ x. The base of the natural logarithm is denoted by e.

Consider a network G = (V, E) that is possibly directed. By deg_G(v) we denote the total degree of a vertex v, i.e., the number of u ∈ V such that at least one of the edges uv or vu is in E. The network is D-regular if deg_G(v) = D for all v ∈ V. A map f : V → V is called an automorphism of G if f is a bijection and it preserves edges, i.e., uv ∈ E ⇔ f(u)f(v) ∈ E.

A network G′ = (V′, E′) is a subnetwork of G (denoted by G′ ⊂ G) if V′ ⊂ V and E′ ⊂ E ∩ (V′ × V′). The subnetwork is induced by the set of vertices V′ if E′ = E ∩ (V′ × V′); such a subnetwork is denoted by G_{V′}. We denote by G \ v = G_{V\{v}} the induced subnetwork obtained by deleting a given vertex v.

A path of length k in G is a sequence of vertices (u_0, u_1, ..., u_k) such that all the edges u_i u_{i+1}, i = 0, 1, ..., k − 1, belong to E. The distance d_G(v, v′) between two vertices v, v′ ∈ V is the minimal k such that there is a path of length k with u_0 = v and u_k = v′; if there is no such k, the distance is infinite, d_G(v, v′) = +∞. The r-neighborhood B_{r,G}(v) of a vertex v is the set of all vertices v′ such that d_G(v, v′) ≤ r; the number r is the radius of the neighborhood. Abusing notation, we will not distinguish between the r-neighborhood B_{r,G}(v) (the set of vertices) and the subnetwork of G induced by this set of vertices. A cycle of length k is a path with u_0 = u_k such that all the vertices u_0, u_1, ..., u_{k−1} are distinct (sometimes cycles with no repetitions are called simple). The girth g_G of G is the length of its shortest cycle.

When referring to the a-priori network G, we will omit the dependence of all the objects on G and write simply deg(v), d(v, v′), B_r(v), and g.
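To get a feel for Example 2.1, here is a small Monte Carlo sketch (our illustration, not the authors' code). It runs the classic fixed-order cascade dynamic on the clique (agents' actions reveal their signals until the revealed counts differ by two, after which everybody herds), which only approximates the equilibrium play of our random-arrival model but exhibits the same failure: the learning quality plateaus below one.

```python
import random

def herd_on_clique(n, p, rng):
    """One run of the fixed-order cascade on K_n: actions reveal signals
    until the revealed counts differ by 2; afterwards everyone herds."""
    theta = rng.randint(0, 1)
    c = 0                              # (#revealed 1-signals) - (#revealed 0s)
    correct = 0
    for _ in range(n):
        s = theta if rng.random() < p else 1 - theta
        if abs(c) >= 2:                # cascade: the signal is ignored
            a = 1 if c > 0 else 0
        else:                          # the action follows (and reveals) s
            a = s
            c += 1 if s == 1 else -1
        correct += (a == theta)
    return correct / n

rng = random.Random(1)
for n in [10, 100, 1000]:
    L = sum(herd_on_clique(n, 0.7, rng) for _ in range(2000)) / 2000
    print(n, round(L, 3))   # plateaus well below 1, as L <= 1 - (1-p)^2 predicts
```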
3 The local learning requirement

In this section we study the connection between the local structure of the a-priori network in the neighborhood of an agent and his learning quality.

In the case of a fixed arrival order, a major impediment to learning is that an agent can affect the decisions of those who are far from him in the social network, which results in the possibility of large information cascades involving most of the network. Surprisingly, in our model with random arrival order, decisions have a local nature: in Section 3.1, we show that if a pair of agents are far from each other in the a-priori network G, then with high probability the action of one cannot affect the other. This observation suggests that the learning quality of an agent must be determined by the local structure of his neighborhood in the a-priori network G, not by the global topology of G.

The local nature of decisions motivates one to look for a condition on the topology of an agent's neighborhood that ensures that this particular agent learns the state with high probability, no matter what the global topology of the network is and no matter what the learning qualities of the other agents are. We identify such a local learning requirement in Section 3.2.

3.1 Local nature of decisions

For a pair of agents v and u, the action of v may be affected by the choice made by u if v observes u, or if v observes somebody who observes u, or if v observes somebody who observes somebody who observes u, and so on.

Given the collection of arrival times T, the agent v can be influenced by u only if there is a path (v = u_0, u_1, u_2, ..., u_k = u) in the a-priori network G such that t_{u_i} > t_{u_{i+1}} for all i (i.e., if u is reachable from v by a path in the realized network G_T). We define the realized subnetwork G_{v,T} of an agent v to be the induced subnetwork of the realized network G_T composed of v and all the agents u reachable from v in G_T. Whether v learns the state is determined solely by his realized subnetwork: conditional on G_{v,T}, the action of v is independent of the signals and arrival times of all the agents outside G_{v,T}.

Recall that B_r(v) denotes the r-neighborhood of v in the a-priori network G and deg(v) denotes the degree of v in G. The following lemma shows that the realized subnetwork of v is contained in the neighborhood of v with high probability, provided that the radius of the neighborhood is big enough compared to the maximal degree.

Lemma 3.1 (Local nature of decisions). The probability that the realized subnetwork of v is contained in the (r − 1)-neighborhood of v enjoys the following lower bound:

P(G_{v,T} ⊂ B_{r−1}(v)) ≥ 1 − 2 · (e · max_{u∈B_r(v)} deg(u) / r)^r.  (3.1)

An extended version of the lemma is proved in Appendix B. Here we present a sketch.
Proof sketch for Lemma 3.1. The realized subnetwork of v belongs to the (r − 1)-neighborhood of v unless some path (v = u_0, u_1, u_2, ..., u_k = u) connecting v to the sphere B_r(v) \ B_{r−1}(v) in the a-priori network G is present in the realized network G_T. Note that a particular path of this form exists in G_T if and only if u_k arrives first, then u_{k−1}, then u_{k−2}, and so on. Hence, each path of length k in G remains in the realized network with probability 1/(k+1)!. Each path from v to the sphere has length k ≥ r, and the total number of paths of length k connecting v to the sphere in G is bounded by (max_{u∈B_r(v)} deg(u))^k. The union bound, accompanied by manipulations with factorials similar to the Stirling formula, leads to the desired inequality (3.1).

3.2 The local learning requirement

The previous subsection suggests that whether or not an agent v learns the state must be determined by the local structure of the a-priori network G around him. Here we formulate a local learning requirement on the topology of v's neighborhood that ensures high learning quality for v, no matter how well other agents perform and what the global structure of the network is. In subsequent sections we demonstrate the usefulness of this condition by constructing networks with exceptional learning and robustness properties.

The essence of the requirement is that v must bridge many social circles. To define the requirement formally, we recall the following notation: G \ v is the network obtained from the a-priori network G by eliminating the agent v together with all adjacent edges, and B_{r,G\v}(u) is the r-neighborhood of an agent u in G \ v.

Definition 3.1 (Local learning requirement). An agent v satisfies the local learning requirement with parameters (d, r, D) if among his friends we can find d such that each of them has degree at least d, their r-neighborhoods in G \ v are disjoint, and the degrees in all these neighborhoods are upper-bounded by D.

Let u_1, ..., u_d be the friends from the definition. Then their neighborhoods B_{r,G\v}(u_i) can be interpreted as disjoint social circles bridged by v. The intuition for why the definition refers to the network G \ v relies on the fact that the decision of v depends only on those agents who arrive before him and who, hence, do not observe v, as if he were absent from the network.

Figure 2 illustrates the definition.

Figure 2: Agent v satisfies the local learning requirement with parameters d = 3, r = 2, and D = 7. He has 3 friends with degree at least 3 (dark gray nodes), their 2-neighborhoods in the graph G \ v are disjoint (shaded areas), and the maximal degree in these neighborhoods is 7 (the agent at the bottom has 5 friends within the neighborhood and 2 outside).

The following theorem is our main technical result.

Theorem 3.1. If an agent v satisfies the local learning requirement with parameters (d, r, D), then the learning quality of v enjoys the following lower bound:

l_σ(v) ≥ 1 − δ(p, d, r, D),  (3.2)

where

δ(p, d, r, D) = (ψ + 18/√d) / (2p − 1 − ψ),  ψ = 2d · (e · D/r)^r.  (3.3)

The bound holds for any state-symmetric equilibrium σ provided that the probability p of the correct signal satisfies 2p − 1 > ψ; i.e., the denominator in (3.3) is positive. (The condition of state-symmetry can be dropped at the cost of getting (4p − 3 − ψ) in the denominator of (3.3) instead of (2p − 1 − ψ) and of imposing a stricter condition on p to ensure the positivity of the new denominator; see the discussion of the technical assumptions in Section 6.)
The theorem shows that the agent learns the state whenever δ(p, d, r, D) is close to zero.

Corollary 3.1. For fixed p > 1/2, the bound δ(p, d, r, D) goes to zero when all the elements of the triplet (d, r, D) go to infinity in such a way that lim inf r/D > e.

The intuition behind Theorem 3.1 is simple. Let u_1, ..., u_d be the friends of v from Definition 3.1. By the local nature of decisions (Lemma 3.1), the realized subnetwork of each u_i who arrives before v is likely to be contained within his social circle B_{r,G\v}(u_i). Hence, the realized subnetworks are disjoint, which ensures that there is no herding among (u_i)_{i=1,...,d}. As a result, v learns the state by observing a large sample of independent sources of information. However, formalizing this intuition faces some obstacles: the independence is only conditional, and the conditioning is on a family of events that do not belong to the information partition of any of the agents; hence, checking that the independent sources are informative requires approximating the condition by elements of information partitions. We are able to carry out this approximation only for high-degree friends of v since their partitions are finer (hence the condition deg(u_i) ≥ d).
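Corollary 3.1 is easy to check numerically. The sketch below (ours) evaluates formula (3.3) in the regime d = D and r = 3D, so that r/D = 3 > e; the bound becomes meaningful once δ < 1 and decays like 18/((2p − 1)√d).

```python
import math

def delta(p, d, r, D):
    """The bound of Theorem 3.1, formula (3.3)."""
    psi = 2 * d * (math.e * D / r) ** r
    assert 2 * p - 1 - psi > 0, "need the denominator in (3.3) to be positive"
    return (psi + 18 / math.sqrt(d)) / (2 * p - 1 - psi)

# Fixed p = 0.6; take d = D and r = 3*D, the regime of Corollary 3.1.
for D in [100, 10_000, 1_000_000]:
    print(D, delta(0.6, D, 3 * D, D))   # 9.0, 0.9, 0.09: tends to zero
```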
Here we present a sketch of the proof; all the details can be found in Appendix C.

Sketch of the proof of Theorem 3.1. Denote by F^d_{v,T} the subset of those friends u_1, ..., u_d from Definition 3.1 who arrive earlier than v. Consider the following deviation a′_v from the equilibrium strategy a_v of v: he repeats the action played by the majority of F^d_{v,T}. In equilibrium, the agent cannot benefit from this deviation and, hence, l_σ(v) ≥ P(a′_v = θ).

The probability of a mistake for a′_v can be bounded using the Hoeffding inequality, which states that P((1/N) Σ_{n=1}^N ξ_n ≤ E[(1/N) Σ_{n=1}^N ξ_n] − x) ≤ exp(−2x²N) for independent random variables 0 ≤ ξ_n ≤ 1 and x ≥ 0 [Hoeffding, 1994]. This requires two ingredients: independence of the actions a_u for u ∈ F^d_{v,T} and an upper bound on the probability of a mistake for each a_u.

Unfortunately, the requirement of independent actions is not satisfied. However, the actions can be made independent by conditioning on a certain collection of events. This should still be enough to drive the result, provided that the probability of this collection is close to one. Let W_v be the event that the realized subnetwork of each u_i who arrived before v is contained in his social circle B_{r,G\v}(u_i). Since these social circles are disjoint, the realized subnetworks do not intersect, and thus, conditional on W_v, each u_i aggregates information from disjoint families of sources. The local nature of decisions (Lemma 3.1) implies that the probability of W_v is close to one.

Naively, one could argue that conditional on W_v the actions taken by the friends of v are independent. However, this argument is wrong: there are other sources of dependence. For example, the fact that we are interested in agents who arrive before v creates dependence between their actions even if their realized subnetworks are disjoint. To see this, consider the case where v arrives early. This implies that all the friends he observes are also early arrivals and, hence, base their decisions on less information than if v arrived late; in particular, the earlier v arrives, the more mistakes the observed friends make, thus creating dependence between their decisions.

To eliminate all the sources of dependence, we end up conditioning on the arrival time of v, the set F^d_{v,T}, the realized state, and the event W_v.

In order to apply the Hoeffding inequality, it remains to show that the conditional probability of a mistake for a_u is bounded away from 1/2 for u ∈ F^d_{v,T}, i.e., that the independent sources observed by v are informative. We use the following idea. For any event A from the information partition of u, we have P(a_u ≠ θ | A) ≤ 1 − p, because otherwise u would be better off following his signal whenever A occurs; additional conditioning on the realized state does not change the bound because we assume a state-symmetric equilibrium. Unfortunately, the family of events on which we condition does not belong to u's information partition. We overcome this difficulty by approximating the condition by elements of the information partition and showing that the conditional probability of a mistake is at most (1/(1 − ψ)) · (1 − p + t_v · C/√deg(u)) for an explicit constant C (see Appendix C). The bound gets worse for low-degree agents since their information partition is not fine enough for a good approximation.
This is why the deviation a′_v of agent v takes into account high-degree friends only. After these preparations, Theorem 3.1 becomes a corollary of the Hoeffding inequality.

4 Learning in egalitarian societies

Here we demonstrate that there are a-priori networks where each agent satisfies the local learning requirement from the previous section and, hence, all agents achieve high learning quality, as does the network itself. Moreover, there are such networks with the additional property that any two agents play the same role.
Definition 4.1 (Symmetric networks). A network G = (V, E) is symmetric if for any pair v, v′ ∈ V there exists an automorphism f of G such that f(v) = v′. (In the mathematical literature, such graphs are usually called transitive since the group of automorphisms acts transitively on them; i.e., for any pair of vertices there is an automorphism mapping one to the other.)

Symmetric networks represent totally egalitarian societies. The celebrity graphs from Example 2.2 may lead to a conjecture that egalitarian societies cannot aggregate information, since one needs a designated minority of agents (like celebrities) in the a-priori network for learning to propagate. The main result of this section states that this intuition is false: symmetric networks can support learning. In Section 5, we will strengthen this result by showing robustness of learning.
Theorem 4.1. For any p_0 > 1/2 and any δ > 0 there exists a symmetric network G = (V, E) such that the learning quality satisfies l_σ(v) ≥ 1 − δ for each agent v ∈ V, any probability of the correct signal p > p_0, and any state-symmetric equilibrium σ. (The theorem extends to non-state-symmetric equilibria at the cost of assuming that signals are informative enough, namely, p > 3/4. This and other extensions are discussed in Section 6.2.)

In order to prove Theorem 4.1, we use classic results from the theory of expanders to construct networks where the local learning requirement is satisfied for each agent.

We will need the following lemma, which simplifies checking the local learning requirement. Recall that the girth g of a network G is the length of the shortest cycle; the girth of a tree is infinite, g = +∞.
Lemma 4.1. Consider a network G of girth g and maximal degree D. In such a network, any agent v satisfies the local learning requirement with parameters (d, ⌊(g−3)/2⌋, D), where d is any number such that v has at least d friends of degree d or more.

Proof. Let u_1, u_2, ..., u_d be v's friends with degrees deg(u_i) ≥ d. Recall that B_{r,G\v}(u) denotes the r-neighborhood of u in the network obtained by eliminating the agent v. We aim to select r such that B_{r,G\v}(u_i) and B_{r,G\v}(u_j) are disjoint for i ≠ j. If B_{r,G\v}(u_i) and B_{r,G\v}(u_j) intersect, this creates a cycle of length 2r + 2 (or shorter): start from v, then go to u_i, then to an agent in the intersection by the shortest path, then to u_j again by the shortest path, and back to v. Hence, an intersection is possible only if 2r + 2 ≥ g. Thus, choosing r such that 2r + 2 < g, we ensure that the r-neighborhoods are disjoint; r = ⌊(g−3)/2⌋ is the maximal such r. We conclude that v satisfies the local learning requirement with parameters (d, ⌊(g−3)/2⌋, D).

Recall that a network is called D-regular if all vertices have degree D. Thanks to Lemma 4.1 and Theorem 3.1, proving Theorem 4.1 reduces to the question of the existence of symmetric D-regular networks with arbitrarily high degree and girth. The existence of such networks is demonstrated in the theory of expanders; see Appendix A. The following is a corollary of a theorem by Lubotzky et al. [1988] (Theorem A.1 in Appendix A) that describes a family of so-called Ramanujan expanders.

Corollary 4.1 (of Theorem A.1 by Lubotzky et al. [1988]). There is a sequence D_k → ∞ such that for any g_0 and k, there exists a symmetric D_k-regular network G = (V, E) such that the girth satisfies g ≥ g_0 and |λ_2| ≤ 2√(D_k − 1), where λ_2 is the second-largest (in absolute value) eigenvalue of the adjacency matrix. (We do not use the bound on λ_2 in Section 4; in particular, any symmetric D-regular network with large D and girth would suffice for the proof of Theorem 4.1. The bound on |λ_2| will be critical for the discussion of robust learning in Section 5.)
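As a toy illustration of Lemma 4.1 (ours; the actual construction needs the large Ramanujan graphs above), the sketch below computes the girth of the Petersen graph by breadth-first search and reads off admissible parameters for the local learning requirement.

```python
from collections import deque

def girth(adj):
    """Length of the shortest cycle, via BFS from every vertex."""
    best = float("inf")
    for root in adj:
        dist, parent = {root: 0}, {root: None}
        q = deque([root])
        while q:
            v = q.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u], parent[u] = dist[v] + 1, v
                    q.append(u)
                elif parent[v] != u:       # non-tree edge closes a cycle
                    best = min(best, dist[v] + dist[u] + 1)
    return best

# Petersen graph: 3-regular with girth 5.
edges = [(i, (i + 1) % 5) for i in range(5)]           # outer cycle
edges += [(i, i + 5) for i in range(5)]                # spokes
edges += [(5 + i, 5 + (i + 2) % 5) for i in range(5)]  # inner pentagram
adj = {v: set() for v in range(10)}
for v, u in edges:
    adj[v].add(u)
    adj[u].add(v)

g = girth(adj)
r = (g - 3) // 2
print(g, r)  # g = 5, so every agent satisfies the requirement with (3, 1, 3)
```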
After all these preliminaries, proving Theorem 4.1 becomes easy.

Proof of Theorem 4.1.
For a D-regular network of girth g, Theorem 3.1 combined with Lemma 4.1 implies the lower bound l_σ(v) ≥ 1 − δ(p, D, ⌊(g−3)/2⌋, D) on the learning quality of any agent v in any state-symmetric equilibrium σ.

Corollary 4.1 allows us to pick a symmetric D-regular network G with arbitrarily high degree D and arbitrarily high girth/degree ratio g/D, while Corollary 3.1 implies that δ(p, D, ⌊(g−3)/2⌋, D) tends to zero if both the girth g and the degree D tend to infinity with the girth going to infinity faster: g/D → ∞. Hence, for any given δ and p_0, we can choose G to be a symmetric D-regular network with D and g such that δ(p_0, D, ⌊(g−3)/2⌋, D) ≤ δ. Since δ is decreasing in p, we obtain δ(p, D, ⌊(g−3)/2⌋, D) ≤ δ(p_0, D, ⌊(g−3)/2⌋, D) ≤ δ for any p ≥ p_0. Thus the network G satisfies the statement of Theorem 4.1.

5 Robust learning

Here we discuss the learning quality of a network when some of the agents are uninterested or unavailable and, hence, do not participate in the learning process. The celebrity graphs (Example 2.2) demonstrate that high learning quality can be very fragile: if the celebrities (a negligible minority of the population) do not show up, the network is left totally disconnected, and information aggregation breaks down. We show that there are networks that support robust learning, i.e., are free of such a flaw.

In Section 4, we saw that symmetric sparse networks that have high degrees but no short cycles support learning. In such networks, all agents play the same role, which makes it natural to expect that no small group of agents is critical; in particular, adversarial elimination of a small group cannot dramatically spoil the learning outcome. Here we obtain a surprising, much stronger result demonstrating that there are symmetric networks where learning is robust to the adversarial elimination of any group, even a large one.

Theorem 5.1. For any p_0 > 1/2 and any δ > 0 there exists a symmetric network G = (V, E) such that for any α ∈ (0, 1] and any subset V′ ⊂ V with ⌈α · |V|⌉ agents, the learning quality in the induced subnetwork G_{V′} is at least 1 − δ/α for any state-symmetric equilibrium and any probability of the correct signal p > p_0. (Similarly to Sections 3 and 4, the result extends to non-state-symmetric equilibria under the requirement p > 3/4.)

Robustness of learning turns out to be related to the spectral properties of the network. This is captured by Lemma 5.1 below, and Theorem 5.1 easily follows from this lemma combined with known results on expander graphs. Consider a D-regular network and denote by |λ_1| ≥ |λ_2| ≥ ... ≥ |λ_{|V|}| the eigenvalues of its adjacency matrix ordered by their absolute values. Expanders are networks with small |λ_2| relative to |λ_1|; see Appendix A.

The next lemma bounds by how much the learning quality can decrease if, instead of the original D-regular network, we consider an arbitrary subnetwork with ⌈α · |V|⌉ agents.
Lemma 5.1. Let G = (V, E) be a D-regular network of girth g with second eigenvalue λ_2. Then for any α ∈ (0, 1] and any subset V′ ⊂ V of size |V′| = ⌈α · |V|⌉, the learning quality in the induced subnetwork G_{V′} satisfies

L_{σ′}(G_{V′}) ≥ 1 − (1/√α + 2(1 − α)|λ_2|/(α · D)) · δ(p, D, ⌊(g−3)/2⌋, D)  (5.1)

for any state-symmetric equilibrium σ′. Here δ(p, d, r, D) is given by formula (3.3).

Lemma 5.1 is proved in Appendix D; here we present the main idea and then prove Theorem 5.1.
Sketch of the proof of Lemma 5.1.
The key tool is the mixing lemma from the theory of expanders (Lemma A.1). This lemma applies to any D-regular network and provides a bound, in terms of |λ_2|, on how much the number of edges |E(V_1, V_2)| between any two disjoint subsets of vertices V_1, V_2 deviates from the expected number of edges in the Erdős–Rényi model:

| |E(V_1, V_2)| − (D/|V|) · |V_1| · |V_2| | ≤ |λ_2| · √(|V_1| |V_2|).  (5.2)

This lemma allows us to bound the fraction of agents with low degree in G_{V′}. Indeed, for γ < α, let V_1 be the set of agents with degree less than γ · D in G_{V′} and let V_2 be the set of eliminated agents V \ V′. Since each agent has degree D in the original network G, there are at least (1 − γ)D|V_1| edges between V_1 and V_2. The Erdős–Rényi model prescribes (1 − α)D|V_1| edges in expectation. This discrepancy is compatible with (5.2) only for relatively small subsets V_1. We obtain that the fraction of agents in G_{V′} with degree γ · D or higher is at least 1 − θ, where θ = θ(α, D, γ, λ_2) is small.

By Lemma D.1 from the appendix, the lower bound on the fraction of high-degree agents implies a bound on the fraction of agents having a large number of high-degree friends: the fraction of agents having at least γ · D friends with degrees γ · D and higher is also close to one (the explicit bound is given in Appendix D). By Lemma 4.1, each such agent satisfies the local learning requirement with parameters (γ · D, ⌊(g−3)/2⌋, D). Application of Theorem 3.1 to this set of agents completes the proof.
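The mixing-lemma step is easy to verify numerically. The sketch below (ours; it assumes the third-party networkx and numpy packages) draws a random 6-regular graph, computes |λ_2| exactly, and checks inequality (5.2) for a pair of disjoint vertex sets.

```python
import numpy as np
import networkx as nx

n, D = 200, 6
G = nx.random_regular_graph(D, n, seed=0)

# |lambda_2|: second-largest absolute eigenvalue of the adjacency matrix
eigs = np.sort(np.abs(np.linalg.eigvalsh(nx.to_numpy_array(G))))[::-1]
lam2 = eigs[1]

# two disjoint vertex sets V1, V2
rng = np.random.default_rng(0)
nodes = rng.permutation(n).tolist()
V1, V2 = set(nodes[:50]), set(nodes[50:120])

e12 = sum(1 for u, v in G.edges
          if (u in V1 and v in V2) or (u in V2 and v in V1))
expected = D * len(V1) * len(V2) / n          # Erdos-Renyi prediction
bound = lam2 * (len(V1) * len(V2)) ** 0.5
print(abs(e12 - expected) <= bound)           # (5.2) holds: True
```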
Proof of Theorem 5.1. As in the proof of Theorem 4.1, we can pick a Ramanujan expander such that δ(p_0, D, ⌊(g−3)/2⌋, D) is at most δ/2. The second eigenvalue of the Ramanujan expander satisfies |λ_2| ≤ 2√(D − 1) ≤ 2√D (Corollary 4.1) and, hence, by Lemma 5.1, any subnetwork G_{V′} with |V′| = ⌈α · |V|⌉ has learning quality at least

1 − (1/√α + 4(1 − α)/(α√D)) · δ(p, D, ⌊(g−3)/2⌋, D).

Since 1/√α + 4(1 − α)/(α√D) ≤ 1/α + 1/α ≤ 2/α (recall that D can be taken arbitrarily large) and δ(p, D, ⌊(g−3)/2⌋, D) ≤ δ(p_0, D, ⌊(g−3)/2⌋, D) ≤ δ/2, the learning quality is bounded from below by 1 − δ/α.

Remark 5.1 (Randomized robustness). Instead of robustness with respect to adversarial elimination, one can consider a weaker notion: a subset of agents is deleted at random, and the resulting network has to support learning with high probability with respect to the choice of this subset.

In Appendix E, we show that, surprisingly, any network that supports learning has the property of randomized robustness. For example, in the celebrity graphs, if we eliminate a large set of agents randomly, say 50%, around 50% of the celebrities will remain in the network, thus ensuring information aggregation (with slightly lower learning quality).

More formally, for a network G = (V, E) consider an induced subnetwork G_{V′}, where V′ is a subset of ⌈α · |V|⌉ agents taken uniformly at random. Under an additional technical assumption, we demonstrate that L(G_{V′}) ≥ 1 − √((1 − L(G))/α) with probability at least 1 − √((1 − L(G))/α) with respect to the choice of V′ (Theorem E.1); here L(G) = max_σ L_σ(G) denotes the learning quality for the best equilibrium.

Randomized robustness holds because, in a sense, it is hardwired into the definition of learning quality with a random arrival order. Indeed, if a network aggregates information well for most arrival orders, then the subnetwork of the first ⌈α · |V|⌉ arrivals must also aggregate information well for most such subnetworks. The proof of Theorem E.1 is based on this coupling between the choice of the subset V′ and the learning process in the original network.

6 Discussion

In Section 6.1, we offer two alternative ways to perceive the results on learning in symmetric networks (Section 4) and those on robust learning (Section 5): first, we present the network-designer perspective, and then we explain the connection to the classic sociological theory of the two-step information flow. Section 6.2 discusses our technical assumptions and how to relax them.

6.1 Alternative interpretations
Network-designer perspective
Consider a designer of a social network who wants to ensure social learning. Sgroi [2002] constructs a network that supports learning for a particular arrival order, and Bahar et al. [2020] propose the celebrity graphs as networks that support learning and are robust to the arrival order of agents. However, the celebrity graphs support learning only if all agents actually participate and make a decision. In other words, if a minority of agents do not care about the decision at hand and are consequently inactive, then learning may fail. By contrast, our expander-based networks from Theorem 5.1 give a social structure that supports learning and is robust both to the arrival order and to the actual subset of active agents.

In practice, social networks are not designed from scratch, and the social connections can be considered as given. However, social networks on the Internet are usually combined with a recommendation system that decides which news to show to a particular user. In so doing, it can intervene in the structure of social connections by concealing some friends' news and possibly showing some news of non-friends. Our local learning requirement (Theorem 3.1) combined with Lemma 4.1 suggests a possible recipe for improving the overall learning quality: eliminate short cycles while keeping high the number of information sources an agent is exposed to. This proposal, however, requires additional empirical evaluation.
The paradigm of the two-step information flow
The classic sociological theory of the "two-step information flow" by Katz and Lazarsfeld [1955] conveys the idea that learning is always facilitated by a small group of opinion leaders, i.e., influential agents predetermined by the network structure. The original version of the theory, formulated during the golden era of TV networks, argues that although such networks were responsible for sparking new ideas and introducing new products, this content was consumed by most people in an indirect manner. In other words, people's actions, whether in the form of adopting a new product or voting for a certain political candidate, were not so much a result of what they heard from the TV networks but rather of what they heard from influential agents who, for their part, had consumed such content directly.

Let us call a group of agents influential if the network supports learning whenever this group is present and fails to aggregate the information when it is not. An example of an influential minority is the group of celebrities in the celebrity graphs of Example 2.2.

The theory of the two-step information flow suggests that networks supporting learning contain an influential minority, which plays the role of intermediary in the information spread over the network. The results of Sections 4 and 5 challenge this thesis (in our stylized theoretical model): high learning quality can be achieved in a-priori symmetric networks, where, hence, no group of agents is predetermined by the network structure; moreover, there are symmetric networks where no minority (or even majority) of agents is influential.
6.2 Technical assumptions and extensions

Some of our modeling assumptions were made to simplify the exposition and can be easily relaxed.
General distributions of arrival times
We assumed that agents' arrival times are i.i.d. and, in particular, uniformly distributed on the unit interval. However, any non-atomic distribution leads to an equivalent model (the equivalence is obtained by a monotone reparameterization), as long as the i.i.d. assumption is maintained. An important robustness result would be to extend our conclusions to some approximate notion of i.i.d., as some local dependence among agents is a realistic assumption.
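The monotone-reparameterization point can be seen in a few lines of code (our illustration): pushing any non-atomic arrival times through their own CDF yields Uniform[0,1] times with exactly the same arrival order, hence the same realized network.

```python
import math, random

rng = random.Random(0)
# Exponential(1) arrival times for five agents...
t = {v: rng.expovariate(1.0) for v in "abcde"}
# ...pushed through the CDF F(x) = 1 - exp(-x).  F is monotone, so the
# arrival order (and hence the realized network) is unchanged, and by
# the probability integral transform the new times are Uniform[0,1].
u = {v: 1 - math.exp(-x) for v, x in t.items()}
assert sorted(t, key=t.get) == sorted(u, key=u.get)
print(sorted(t, key=t.get))
```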
Non-binary signals
Our results hold for any non-binary signaling device as long as signals are informative and symmetry is maintained. By informativeness we mean that for a positive-probability set of signals s, the posteriors P(θ = 1 | s) belong to the union of intervals [0, 1 − p] ∪ [p, 1] (the probability of this set of signals will enter into the bound on the learning quality). By "symmetry" we mean that the distribution of the posteriors P(θ = 1 | s) is symmetric around 1/2. Note that signals of unbounded precision are also allowed.

Heterogeneous agents
Agents can be heterogeneous; i.e., the signaling device can be agent-specific, so that each agent v has his own signal precision p_v. In this case, all the results hold with p = min_{v∈V} p_v.

Non-symmetric equilibria
The symmetry assumption on equilibria and signals can be relaxed. However, for asymmetric equilibria, Theorems 3.1, 4.1, and 5.1 require the probability p of the correct signal to be above 3/4 (instead of 1/2). The reason for this is inequality (C.1), where for asymmetric equilibria we get 2(1 − p) in the parentheses.

General states
The extension to non-equiprobable states is straightforward. Note, however, that this breaks the state symmetry of equilibria and, hence, we again get 2(1 − p) in inequality (C.1) (see the comment above). The extension to a non-binary state does not lead to any additional technical difficulties.

More informed agents

The results of Sections 3, 4, and 5 hold if, in addition to observing the actions of his friends, each agent v gets some information about his realized subnetwork G_{v,T}, e.g., a possibly noisy signal about the set of agents in it, their arrival times, their actions, and even their private signals.

Less informed agents
All our results can be easily adapted to the case where the observation of friends' actions is noisy. Namely, each agent v, instead of observing the action a_u of his friend u ∈ F_{v,T}, observes either a_u with probability 1 − ε or the flipped action 1 − a_u with probability ε. One can consider two variants of this model: the action of u is flipped for all his neighbors at the same time but independently across u ∈ V, or the action is flipped independently for each pair (v, u).

Both variants require the same straightforward modifications in Sections 3, 4, and 5. They originate from a minor adjustment in the proof of Theorem 3.1: when applying the Hoeffding inequality in formula (C.3), instead of the upper bound on P(a_u ≠ θ), we will need an upper bound on the probability that the action observed by v does not match the state. The latter probability is equal to (1 − ε)P(a_u ≠ θ) + ε(1 − P(a_u ≠ θ)) and does not exceed (1 − ε)P(a_u ≠ θ) + ε, where P(a_u ≠ θ) can be bounded by Lemma C.3 as before.

References
Daron Acemoglu, Munther A. Dahleh, Ilan Lobel, and Asuman Ozdaglar. Bayesian learning in social networks. Review of Economic Studies, 78:1–34, 2010.

Noga Alon and Fan R. K. Chung. Explicit construction of linear sized tolerant networks. Discrete Mathematics, 72(1-3):15–19, 1988.

Itai Arieli and Manuel Mueller-Frank. Multidimensional social learning. The Review of Economic Studies, 86(3):913–940, 2019.

Gal Bahar, Itai Arieli, Rann Smorodinsky, and Moshe Tennenholtz. Multi-issue social learning. Mathematical Social Sciences, 104:29–39, 2020.

Abhijit V. Banerjee. A simple model of herd behavior. Quarterly Journal of Economics, 107(3):797–817, 1992.

Sushil Bikhchandani, David Hirshleifer, and Ivo Welch. A theory of fads, fashion, custom and cultural change as information cascade. Journal of Political Economy, 100:992–1026, 1992.

Aislinn Bohren. Informational herding with model misspecification. Journal of Economic Theory, 163:222–247, 2016.

Xavier Dahan. Regular graphs of large girth and arbitrary degree. Combinatorica, 34(4):407–426, 2014.

Michal Feldman, Nicole Immorlica, Brendan Lucier, and S. Matthew Weinberg. Reaching consensus via non-Bayesian asynchronous learning in social networks. In Leibniz International Proceedings in Informatics, Klaus Jansen, José Rolim, Nikhil Devanur, and Cristopher Moore (eds.), pp. 192–208. Dagstuhl Publishers, 2014.

Mira Frick, Ryota Iijima, and Yuhta Ishii. Misinterpreting others and the fragility of social learning. Econometrica (forthcoming), 2020.

Benjamin Golub and Matthew O. Jackson. Naive learning in social networks and the wisdom of crowds. American Economic Journal: Microeconomics, 2(1):112–149, 2010.

Wassily Hoeffding. Probability inequalities for sums of bounded random variables. In The Collected Works of Wassily Hoeffding, pages 409–426. Springer, 1994.

Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications. Bulletin of the American Mathematical Society, 43(4):439–561, 2006.

Elihu Katz and Paul F. Lazarsfeld. Personal Influence: The Part Played by People in the Flow of Mass Communications. Free Press, 1955.

Alexander Lubotzky. Expander graphs in pure and applied mathematics. Bulletin of the American Mathematical Society, 49(1):113–162, 2012.

Alexander Lubotzky, Ralph Phillips, and Peter Sarnak. Ramanujan graphs. Combinatorica, 8(3):261–277, 1988.

Moshe Morgenstern. Existence and explicit constructions of q + 1 regular Ramanujan graphs for every prime power q. Journal of Combinatorial Theory, Series B, 62(1):44–62, 1994.

Elchanan Mossel, Joe Neeman, and Omer Tamuz. Majority dynamics and aggregation of information in social networks. Autonomous Agents and Multi-Agent Systems, 28(3):408–429, 2014.

Elchanan Mossel, Allan Sly, and Omer Tamuz. Strategic learning and the topology of social networks. Econometrica, 83(5):1755–1794, 2015.

Manuel Mueller-Frank. Manipulating opinions in social networks. Available at SSRN 3080219, 2018.

Daniel Sgroi. Optimizing information in the herd: Guinea pigs, profits, and welfare. Games and Economic Behavior, 39:137–166, 2002.

Lones A. Smith. Essays on Dynamic Models of Equilibrium and Learning. PhD thesis, University of Chicago, Department of Economics, 1991.
A Expanders
There are several equivalent definitions of expanders; see Hoory et al. [2006]. The "spectral" definition is the most convenient for our needs. Consider a D-regular (deg(v) = D for all vertices) graph G = (V, E) and denote by (λ_k)_{k=1,...,|V|} the eigenvalues of its adjacency matrix ordered by absolute values: |λ_1| ≥ |λ_2| ≥ |λ_3| ≥ ... ≥ |λ_{|V|}|. The eigenvalue λ_1 = D corresponds to the eigenvector representing the uniform distribution over vertices: the smaller the second eigenvalue |λ_2| is, the faster the distribution of a random walk started at some vertex converges to the uniform distribution; see Proposition 1.6 in Lubotzky [2012]. A graph G is an expander if |λ_2| is small relative to |λ_1|; i.e., a random walk on G forgets its starting point fast.

Theorem A.1 (Lubotzky et al. [1988]). For any N and D such that D − 1 is a prime number and D − 2 is divisible by 4, there exists a D-regular symmetric graph G = (V, E) with at least N vertices, |λ_2| ≤ 2√(D − 1), and girth g ≥ (2/3) log_{D−1} |V|.

Graphs with |λ_2| ≤ 2√(D − 1) are called Ramanujan graphs; this bound on λ_2 is essentially the best possible. It is quite intuitive that the best expanders cannot have short cycles, since short cycles lead to recurrences in the random walk and, hence, slow down the expansion of the random walk over the graph.

The proof of Theorem A.1 is quite technical and relies on group theory. Lubotzky et al. [1988] and their successors (e.g., Morgenstern [1994], who relaxed the condition of divisibility by 4, and Dahan [2014], who extended the result to arbitrary D ≥ 11) construct G as a Cayley graph of a certain group. (Given a group Γ with a group operation ∘ and a subset S ⊂ Γ, its Cayley graph is defined in the following way: the set of vertices V coincides with Γ, and vu ∈ E if u = v ∘ s for some s ∈ S.) While the symmetry of the constructed graph is not mentioned explicitly in these papers, it comes for free because any Cayley graph is symmetric; see Claim 11.4 in [Hoory et al., 2006]. (Indeed, for any x ∈ Γ the map f : v → x ∘ v is an automorphism of the Cayley graph. For any pair of vertices v, v′ ∈ V, we achieve f(v) = v′ by choosing x = v′ ∘ v^{−1}.)

In addition to large girth, we need another property of expanders, demonstrating their similarity to random graphs in the Erdős–Rényi model with the probability of an edge between two given vertices equal to D/|V|. For a pair of disjoint subsets V_1, V_2 ⊂ V, denote by E(V_1, V_2) the set of edges with one endpoint in V_1 and the other in V_2. In the Erdős–Rényi model the expected number of such edges is equal to (D/|V|) · |V_1| |V_2|. The following result, known as the mixing lemma, shows that |E(V_1, V_2)| for an expander is close to this number.

Lemma A.1 (Mixing lemma, Alon and Chung [1988]). For any D-regular graph G = (V, E) and any two disjoint subsets V_1, V_2 ⊂ V, the following inequality holds:

| |E(V_1, V_2)| − (D/|V|) · |V_1| · |V_2| | ≤ |λ_2| · √(|V_1| |V_2|).
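As a numerical illustration (ours, assuming the third-party networkx and numpy packages), one can compare |λ_2| for a ring-like Cayley graph of Z_n, which mixes slowly, with a random D-regular graph, which is typically close to the Ramanujan threshold 2√(D − 1).

```python
import numpy as np
import networkx as nx

def lam2(G):
    """Second-largest absolute eigenvalue of the adjacency matrix."""
    return np.sort(np.abs(np.linalg.eigvalsh(nx.to_numpy_array(G))))[::-1][1]

n, D = 400, 4
threshold = 2 * (D - 1) ** 0.5                 # Ramanujan bound 2*sqrt(D-1)
ring = nx.circulant_graph(n, [1, 2])           # 4-regular Cayley graph of Z_n
rand = nx.random_regular_graph(D, n, seed=0)
print(f"threshold {threshold:.3f}  ring {lam2(ring):.3f}  random {lam2(rand):.3f}")
# The ring's lam2 is close to D = 4 (a poor expander: slow mixing), while a
# random 4-regular graph typically lands near the Ramanujan threshold.
```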
B Local nature of decisions

Here we prove Lemma 3.1, which ensures that the realized subnetwork G_{v,T} of v is likely to be contained in a neighborhood of v whenever the radius of this neighborhood is large enough compared to the maximal degree. In fact, we prove a slightly more general statement with conditioning on v's arrival time; this is a technical nuance that we will need in the proof of Theorem 3.1.
Lemma B.1. The probability that the realized subnetwork of v is contained in the (r − 1)-neighborhood of v, conditional on v's arrival time t_v, enjoys the following lower bound:

P(G_{v,T} ⊂ B_{r−1}(v) | t_v) ≥ 1 − 2 · (e · D/r)^r,  (B.1)

where D = max_{u∈B_r(v)} deg(u).
Without loss of generality, we can assume that $r > e\cdot D$ since otherwise the bound (B.1) trivializes.

The realized subnetwork of $v$ is contained in the $(r-1)$-neighborhood of $v$ unless some path between $v$ and the sphere $S_r(v) = B_r(v) \setminus B_{r-1}(v)$ in the a-priori network $G$ exists in the realized network $G_T$. Consider an agent $u \in S_r(v)$ and let $(v = u_0, u_1, u_2, \ldots, u_k = u)$ be a path connecting $v$ and $u$ in $G$. The chance that this path is present in the realized network $G_T$ is bounded by the probability that $u_k, u_{k-1}, \ldots, u_1$ arrive exactly in this order (we excluded the agent $u_0 = v$ since we condition on his arrival time); this probability is equal to $\frac{1}{k!}$. Each path between $v$ and the sphere $S_r(v)$ has length $k \ge r$. The total number of different paths of length $k$ starting from $v$ is at most $D^k$ since there are at most $D$ options for each of the $k$ steps. Therefore, by the union bound, the chance that there is a path between $v$ and $S_r(v)$ in the realized network $G_T$ is at most
$$\sum_{k=r}^{\infty} \frac{D^k}{k!}.$$
Using $\frac{D^k}{k!} \le \frac{D^r}{r!}\cdot\left(\frac{D}{r}\right)^{k-r}$, we can bound the sum by a geometric progression:
$$\sum_{k=r}^{\infty} \frac{D^k}{k!} \le \frac{D^r}{r!}\sum_{l=0}^{\infty}\left(\frac{D}{r}\right)^l = \frac{D^r}{r!}\cdot\frac{1}{1 - D/r} \le \frac{1}{1 - 1/e}\cdot\frac{D^r}{r!} \le 2\cdot\frac{D^r}{r!},$$
where the last two inequalities follow from the assumption $r > e\cdot D$ and the fact that $\frac{1}{1 - 1/e} \le 2$. Using $n! \ge \left(\frac{n}{e}\right)^n$, we can get rid of the factorial in the denominator:
$$2\cdot\frac{D^r}{r!} \le 2\cdot\left(\frac{e\cdot D}{r}\right)^r.$$
Hence, with probability at least $1 - 2\cdot\left(\frac{e\cdot D}{r}\right)^r$, there are no paths between $v$ and $S_r(v)$ in $G_T$ or, equivalently, $G_{v,T} \subset B_{r-1}(v)$.
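The following Monte Carlo sketch illustrates Lemma B.1. It assumes numpy and networkx; the graph, seed, and parameters $(n, D, r)$ are illustrative choices, and the empirical frequency averages over $t_v$, which only makes the comparison with the conditional bound more favorable.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(1)
n, D, r = 50_000, 3, 12          # need r > e*D for a non-trivial bound
G = nx.random_regular_graph(D, n, seed=1)
v = 0
ball = set(nx.single_source_shortest_path_length(G, v, cutoff=r - 1))

def realized_subnetwork_vertices(G, v, t):
    """Agents who can influence v: those reachable from v along paths
    with strictly decreasing arrival times."""
    reached, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in G[u]:
            if t[w] < t[u] and w not in reached:
                reached.add(w)
                stack.append(w)
    return reached

trials = 1_000
inside = sum(realized_subnetwork_vertices(G, v, rng.random(n)) <= ball
             for _ in range(trials))
print("empirical containment frequency:", inside / trials)
print("Lemma B.1 lower bound:", 1 - 2 * (np.e * D / r) ** r)
```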
C Proof of Theorem 3.1

Denote by $F^d_v = (u_i)_{i = 1, \ldots, d}$ the collection of $v$'s friends from the local learning requirement and by $F^d_{v,T} = \{u \in F^d_v : t_u \le t_v\}$ those of them who arrive earlier than $v$.

Consider the following deviation $a'_v$ of $v$ from his equilibrium action $a_v$. The agent decides to rely on his friends $F^d_{v,T}$: if the majority of them played action $a \in \{0, 1\}$, the agent $v$ repeats this action; in the case of a tie, or if $v$ is isolated, $a'_v = s_v$, i.e., $v$ follows his signal. In equilibrium, no deviation is profitable and, hence, $\mathbb{P}(a_v = \theta) \ge \mathbb{P}(a'_v = \theta)$. The probability of $a'_v$ being a mistake can be estimated using the Hoeffding inequality (see Footnote 5).

To apply the Hoeffding inequality, we need independence of the actions $(a_u)_{u \in F^d_{v,T}}$ and an upper bound on the probability that $a_u \ne \theta$. To make the actions independent, we condition on an appropriate family of events. The following lemmas describe the desired family of events and establish the bound.

Recall that $B_{r, G\setminus v}(u)$ denotes the $r$-neighborhood of $u$ in the network obtained from $G$ by eliminating $v$. Denote by $W_v$ the event that the realized subnetwork of each $u \in F^d_{v,T}$ is contained in $B_{r-1, G\setminus v}(u)$, i.e., $W_v = \cap_{u \in F^d_{v,T}} \{G_{u,T} \subset B_{r-1, G\setminus v}(u)\}$. Note that, given $W_v$, the realized subnetworks of $u \in F^d_{v,T}$ are disjoint.
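For concreteness, a minimal sketch of this majority deviation (the function name is ours, not the paper's; the tie-breaking and isolation cases follow the description above):

```python
def deviation_action(friend_actions, signal):
    """Majority vote over the earlier-arriving friends' actions;
    on a tie (or if no friend arrived), fall back on the private signal."""
    ones = sum(friend_actions)
    zeros = len(friend_actions) - ones
    if ones > zeros:
        return 1
    if zeros > ones:
        return 0
    return signal

# usage: a tied (or empty) sample falls back on the signal
assert deviation_action([1, 1, 0], signal=0) == 1
assert deviation_action([], signal=1) == 1
```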
Lemma C.1. Fix $\theta_0 \in \{0, 1\}$, $t \in [0, 1]$, and a subset of friends $F \subset F^d_v$ of agent $v$. The actions $(a_u)_{u \in F}$ are independent conditional on $\theta = \theta_0$, the arrival time $t_v = t$ of $v$, the set of friends who arrived earlier $F^d_{v,T} = F$, and $W_v$.

The second lemma ensures that $W_v$ is a high-probability event. It is a corollary of the local nature of decisions property (Lemma B.1).

Lemma C.2.
The probability of $W_v$ conditional on the arrival times of $v$ and of all his friends from $F^d_v$ enjoys the following lower bound:
$$\mathbb{P}\big(W_v \mid t_v, (t_z)_{z \in F^d_v}\big) \ge 1 - \psi, \qquad \text{where } \psi = 2d\cdot\left(\frac{e\cdot D}{r}\right)^r.$$
The third lemma provides an upper bound on the probability of the wrong action.
Lemma C.3.
For any subset $F$ of $F^d_v$, an agent $u \in F$, and any state-symmetric equilibrium, the conditional probability of the wrong action satisfies the following inequality:
$$\mathbb{P}\big(a_u \ne \theta \mid \theta = \theta_0,\ t_v = t,\ F^d_{v,T} = F,\ W_v\big) \le \frac{1}{1 - \psi}\left(1 - p + \frac{3}{t}\cdot\frac{1}{\sqrt{\deg(u) - 1}}\right), \tag{C.1}$$
where $\psi$ is from Lemma C.2.

With these lemmas at hand, the probability of a mistake of the deviation $a'_v$ for the high state ($\theta = 1$) is bounded as follows:
$$\mathbb{P}\big(a'_v \ne \theta \mid \theta = 1,\ t_v,\ F^d_{v,T},\ W_v\big) \le \mathbb{P}\Big(\textstyle\sum_{u \in F^d_{v,T}} a_u \le \tfrac{1}{2}\big|F^d_{v,T}\big| \ \Big|\ \theta = 1,\ t_v,\ F^d_{v,T},\ W_v\Big) \le \tag{C.2}$$
$$\le \exp\left(-2\left(\frac{1}{2} - \frac{1}{1-\psi}\Big(1 - p + \frac{3}{t_v\sqrt{d-1}}\Big)\right)^2\cdot\big|F^d_{v,T}\big|\right) = \tag{C.3}$$
$$= \exp\left(-\frac{\big(2p - 1 - \psi - \frac{6}{t_v\sqrt{d-1}}\big)^2}{2(1-\psi)^2}\cdot\big|F^d_{v,T}\big|\right), \tag{C.4}$$
where we used that $\deg(u) \ge d$ for $u \in F^d_v$. The bound (C.4) holds only if $t_v$ is not too small: to apply the Hoeffding inequality, the expression in the parentheses, $\big(2p - 1 - \psi - \frac{6}{t_v\sqrt{d-1}}\big)$, must be non-negative (see the requirement $x \ge 0$ in Footnote 5). For $t_v$ such that this expression is at least $\frac{2p - 1 - \psi}{2}$, or, equivalently, $t_v \ge \frac{12}{\sqrt{d-1}\,(2p - 1 - \psi)}$, the bound (C.4) is at most $\exp\big(-\frac{(2p-1-\psi)^2}{8(1-\psi)^2}\cdot|F^d_{v,T}|\big)$. For small $t_v$, we roughly bound the probability by 1 and get the following inequality valid for all $t_v$:
$$\mathbb{P}\big(a'_v \ne \theta \mid \theta = 1,\ t_v,\ F^d_{v,T},\ W_v\big) \le \mathbf{1}\left\{t_v < \frac{12}{\sqrt{d-1}\,(2p - 1 - \psi)}\right\} + \exp\left(-\frac{(2p-1-\psi)^2}{8(1-\psi)^2}\cdot\big|F^d_{v,T}\big|\right), \tag{C.5}$$
where $\mathbf{1}_A$ denotes the indicator of an event $A$.

Let us dispense with the conditioning on $\theta = 1$, $t_v$, $F^d_{v,T}$, and $W_v$. For $\theta = 0$, one similarly derives the same bound (C.5) and, hence, it holds without conditioning on $\theta$. Since $\mathbb{P}(\cdot) \le \mathbb{P}(\cdot \mid W_v) + (1 - \mathbb{P}(W_v))$, we dispense with the conditioning on $W_v$ at the cost of increasing the upper bound by $\psi$. It remains to average the bound over $F^d_{v,T}$ and $t_v$. The size $|F^d_{v,T}|$ is uniformly distributed over $\{0, 1, \ldots, d\}$ because it is the length of the prefix preceding $v$ in a random permutation of $F^d_v \cup \{v\}$. Thus, averaging the second summand in (C.5) results in a geometric progression of length $d + 1$. Taking into account that $t_v$ is uniformly distributed on $[0, 1]$ and bounding the geometric progression by the infinite one, we get
$$\mathbb{P}(a'_v \ne \theta) \le \psi + \frac{12}{\sqrt{d-1}\,(2p - 1 - \psi)} + \frac{1}{d+1}\cdot\frac{1}{1 - \exp\big(-\frac{(2p-1-\psi)^2}{8(1-\psi)^2}\big)}. \tag{C.6}$$
This upper bound can be simplified if we note that $1 - \exp(-x) \ge x\cdot\frac{1 - \exp(-a)}{a}$ for $x \in [0, a]$. Since $\frac{(2p-1-\psi)^2}{(1-\psi)^2} \le 1$, we obtain
$$\frac{1}{1 - \exp\big(-\frac{(2p-1-\psi)^2}{8(1-\psi)^2}\big)} \le \frac{(1-\psi)^2}{\big(1 - \exp(-\tfrac{1}{8})\big)\,(2p - 1 - \psi)^2} \le \frac{9}{(2p - 1 - \psi)^2},$$
where in the second inequality we bounded $(1-\psi)^2$ by 1 and used that $1 - \exp\big(-\tfrac{1}{8}\big) \ge \tfrac{1}{9}$. Denoting by $b$ and $c$ the second and the third summands in (C.6), respectively, we see that $c \le b^2/2$. For $b$ below 1, we have $b^2 \le b$ and, hence, $b + c \le \tfrac{3}{2}b$; for $b \ge 1$, the bound below holds trivially. In both cases,
$$\mathbb{P}(a'_v \ne \theta) \le \psi + \frac{18}{\sqrt{d-1}\,(2p - 1 - \psi)}.$$
Taking into account that $\mathbb{P}(a_v = \theta) \ge \mathbb{P}(a'_v = \theta) = 1 - \mathbb{P}(a'_v \ne \theta)$ and substituting the expression for $\psi$ from Lemma C.2, we complete the proof of Theorem 3.1.
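As a quick numeric illustration of the resulting guarantee, the sketch below (hypothetical helper name; the formula is the bound reconstructed above) evaluates $\delta(p, d, r, D) = \psi + \frac{18}{\sqrt{d-1}(2p - 1 - \psi)}$ with $\psi = 2d(eD/r)^r$ and shows its decay in $d$ for fixed illustrative $(p, r, D)$:

```python
import math

def mistake_bound(p, d, r, D):
    """Evaluate psi + 18 / (sqrt(d-1) * (2p - 1 - psi)) with
    psi = 2*d*(e*D/r)**r, capped at the trivial bound 1."""
    psi = 2 * d * (math.e * D / r) ** r
    if 2 * p - 1 - psi <= 0:
        return 1.0          # the bound trivializes
    return min(1.0, psi + 18 / (math.sqrt(d - 1) * (2 * p - 1 - psi)))

for d in (100, 1_000, 10_000):
    print(d, round(mistake_bound(p=0.9, d=d, r=50, D=10), 4))
```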
Proof of Lemma C.1: To ensure conditional independence, we make use of the following property of the realized subnetwork $G_{u,T} = (V_{u,T}, E_{u,T})$ of an agent $u \in V$. Conditional on $G_{u,T}$, his action $a_u$ is independent of $(s_z, a_z)_{z \in V \setminus V_{u,T}}$ and $(t_z)_{z \in V}$. In other words, $u$'s action is solely determined by his realized subnetwork and the signals of agents from this subnetwork. This property has an important consequence. Consider a group of agents $U \subset V$ and an event determined by the collection of arrival times, $\{T = (t_z)_{z \in V} \in \mathcal{T}\}$ for some $\mathcal{T} \subset [0,1]^V$. Then the actions $(a_u)_{u \in U}$ are independent conditional on $\theta = \theta_0$ and $\{T \in \mathcal{T}\}$, provided that the realized subnetworks $(G_{u,T})_{u \in U}$ are disjoint and independent as random networks conditional on $\{T \in \mathcal{T}\}$.

With this general observation, we will demonstrate that the actions $(a_u)_{u \in F}$ are conditionally independent given $\theta = \theta_0$, $t_v = t$, $F^d_{v,T} = F$, and $W_v$. By the definition of $W_v$, the realized subnetwork of each $u_i \in F$ belongs to $B_{r-1, G\setminus v}(u_i)$, and thus the realized subnetworks $(G_{u,T})_{u \in F}$ are disjoint. It remains to check their independence.

Recall that the event $W_v$ has the form $\cap_{u \in F^d_{v,T}} \{G_{u,T} \subset B_{r-1, G\setminus v}(u)\}$. We now check that each event $\{G_{u,T} \subset B_{r-1, G\setminus v}(u)\}$ is determined by the arrival times $t_z$ of agents $z \in B_{r, G\setminus v}(u) \cup \{v\}$. Indeed, $G_{u,T}$ belongs to $B_{r-1, G\setminus v}(u)$ if and only if $u$ arrives earlier than $v$ and, for any path $(u = z_0, z_1, z_2, \ldots, z_k)$ not passing through $v$ and connecting $u$ to the sphere $S_{r, G\setminus v}(u) = B_{r, G\setminus v}(u) \setminus B_{r-1, G\setminus v}(u)$ in the a-priori network $G$, there is an index $i$ such that $t_{z_i} < t_{z_{i+1}}$. Consequently, we can rewrite the event $\{G_{u,T} \subset B_{r-1, G\setminus v}(u)\}$ as $\{(t_z)_{z \in B_{r, G\setminus v}(u)} \in \mathcal{T}_u(t_v)\}$, where $\mathcal{T}_u(t_v)$ is a certain subset of $[0,1]^{B_{r, G\setminus v}(u)}$ depending on the arrival time of $v$.

Conditioning on $F^d_{v,T} = F$, $t_v = t$, and $W_v$ thus becomes equivalent to conditioning on the following family of events determined by arrival times: $\{(t_z)_{z \in B_{r, G\setminus v}(u)} \in \mathcal{T}_u(t)\}$ for each $u \in F$, $t_u > t$ for $u \in F^d_v \setminus F$, and $t_v = t$. The arrival times $(t_z)_{z \in V}$ are unconditionally independent, and the condition restricts the values of disjoint subsets of them (here we use the fact that the neighborhoods $B_{r, G\setminus v}(u)$ are disjoint for $u \in F^d_v$). Thus, conditionally on $F^d_{v,T} = F$, $t_v = t$, and $W_v$, the families of random variables $(t_z)_{z \in B_{r, G\setminus v}(u)}$ are independent across $u \in F$. The realized subnetwork $G_{u,T}$ of $u$ is determined by $(t_z)_{z \in B_{r, G\setminus v}(u)}$ and, hence, the networks $(G_{u,T})_{u \in F}$ are also conditionally independent, which implies the desired conditional independence of the actions $(a_u)_{u \in F}$.

Proof of Lemma C.2:
Recall that $W_v = \cap_{u \in F^d_{v,T}} \{G_{u,T} \subset B_{r-1, G\setminus v}(u)\}$ and consider one of these events. We pick an agent $u \in F^d_v$ with $t_u < t_v$ and demonstrate that
$$\mathbb{P}\big(G_{u,T} \subset B_{r-1, G\setminus v}(u) \mid t_v, (t_z)_{z \in F^d_v}\big) \ge 1 - 2\cdot\left(\frac{e\cdot D}{r}\right)^r. \tag{C.7}$$
Note that the event $\{G_{u,T} \subset B_{r-1, G\setminus v}(u)\}$ is determined by the arrival times $t_z$ of $z \in B_{r, G\setminus v}(u) \cup \{v\}$ only (see the argument in the proof of Lemma C.1). Hence, by the independence of arrival times, we can simplify the conditioning:
$$\mathbb{P}\big(G_{u,T} \subset B_{r-1, G\setminus v}(u) \mid t_v, (t_z)_{z \in F^d_v}\big) = \mathbb{P}\big(G_{u,T} \subset B_{r-1, G\setminus v}(u) \mid t_v, t_u\big)$$
because $\big(F^d_v \setminus \{u\}\big) \cap B_{r, G\setminus v}(u) = \emptyset$. Since we condition on $t_u$, the only information added by knowing $t_v$ is that at the time $u$ arrives, $v$ is absent (recall that we assume $t_u < t_v$). Thus
$$\mathbb{P}\big(G_{u,T} \subset B_{r-1, G\setminus v}(u) \mid t_v, t_u\big) = \mathbb{P}_{G\setminus v}\big(G_{u,T} \subset B_{r-1, G\setminus v}(u) \mid t_u\big),$$
where $\mathbb{P}_{G\setminus v}$ refers to the probability with respect to the arrival process in the network $G \setminus v$. By Lemma B.1 applied to $u$ in the network $G \setminus v$, we get
$$\mathbb{P}_{G\setminus v}\big(G_{u,T} \subset B_{r-1, G\setminus v}(u) \mid t_u\big) \ge 1 - 2\cdot\left(\frac{e\cdot D}{r}\right)^r,$$
and we deduce (C.7). The desired inequality for the probability of $W_v$ follows from the union bound and (C.7).

Proof of Lemma C.3: In the proof of Lemma C.1, we observed that, once the realized subnetwork $G_{u,T}$ of an agent $u$ is given, his action $a_u$ is determined by the signals of agents from this subnetwork. We also saw that, conditional on $F^d_{v,T} = F$, $t_v = t$, and $W_v$, the realized subnetwork of $u \in F$ is determined by the arrival times $(t_z)_{z \in B_{r, G\setminus v}(u)}$, and the condition is equivalent to $\{(t_z)_{z \in B_{r, G\setminus v}(u)} \in \mathcal{T}_u(t)\}$ (we use the notation introduced in that proof). Therefore, the distribution of $G_{u,T}$ (and, hence, of $a_u$) conditional on $\theta = \theta_0$, $F^d_{v,T} = F$, $t_v = t$, and $W_v$ is the same no matter what other agents are in $F$. This observation allows us to simplify the conditioning:
$$\mathbb{P}\big(a_u \ne \theta \mid \theta = \theta_0,\ t_v = t,\ F^d_{v,T} = F,\ W_v\big) = \mathbb{P}\big(a_u \ne \theta \mid \theta = \theta_0,\ t_u < t,\ v \notin F_{u,T},\ W_v\big).$$
By the state symmetry of the equilibrium, the latter probability does not change if we eliminate the conditioning on $\theta = \theta_0$. The resulting probability can be bounded as follows:
$$\mathbb{P}\big(a_u \ne \theta \mid t_u < t,\ v \notin F_{u,T},\ W_v\big) \le \frac{\mathbb{P}\big(a_u \ne \theta \mid t_u < t,\ v \notin F_{u,T}\big)}{1 - \psi} \tag{C.8}$$
by the formula of total probability and the lower bound $\mathbb{P}(W_v \mid t_u, t_v) \ge 1 - \psi$.

It remains to estimate the numerator. We use the following observation: for any event $A$ that belongs to the information partition of agent $u$, the conditional probability of the wrong action $\mathbb{P}(a_u \ne \theta \mid A)$ is at most $1 - p$. Otherwise, the agent could profitably deviate from his equilibrium strategy by following his signal whenever $A$ occurs.

The event $A' = \{t_u < t,\ v \notin F_{u,T}\}$ is not known to $u$ since $u$ does not observe his arrival time. However, we can approximate the event $A'$ by the event $A = \{\hat t_u < t,\ v \notin F_{u,T}\}$, where $\hat t_u = \frac{|F_{u,T}|}{\deg(u) - 1}$ is a proxy for $u$'s arrival time (for large-degree agents, $\hat t_u \approx t_u$ by the law of large numbers). Agent $u$ knows when $A$ occurs and, hence, $\mathbb{P}(a_u \ne \theta \mid A) \le 1 - p$.
The conditional probability with respect to $A'$ can be bounded as follows:
$$\mathbb{P}(a_u \ne \theta \mid A') \le \frac{\mathbb{P}(a_u \ne \theta,\ A) + \mathbb{P}(A' \setminus A)}{\mathbb{P}(A')} = \mathbb{P}(a_u \ne \theta \mid A)\cdot\frac{\mathbb{P}(A)}{\mathbb{P}(A')} + \frac{\mathbb{P}(A' \setminus A)}{\mathbb{P}(A')} \le (1 - p)\cdot\frac{\mathbb{P}(A)}{\mathbb{P}(A')} + \frac{\mathbb{P}(A' \setminus A)}{\mathbb{P}(A')}.$$
Let us estimate all the probabilities in this expression. The probability $\mathbb{P}(A')$ can be computed explicitly:
$$\mathbb{P}(A') = \mathbb{P}(t_u < t,\ t_v > t_u) = \int_0^t \mathrm{d}t_u \int_{t_u}^1 \mathrm{d}t_v = \int_0^t (1 - t_u)\,\mathrm{d}t_u = t - \frac{t^2}{2}.$$
To estimate $\mathbb{P}(A)$, we note that, conditionally on $t_u$ and $t_v > t_u$, the number of friends observed by $u$ has the binomial distribution with parameters $\deg(u) - 1$ and $t_u$ (there are $\deg(u) - 1$ friends other than $v$, and each of them arrives earlier than $u$ independently with probability $t_u$). By the Hoeffding inequality, we get
$$\mathbb{P}\left(\frac{|F_{u,T}|}{\deg(u) - 1} \le t \ \Big|\ t_u,\ v \notin F_{u,T}\right) \le \exp\big(-2(t_u - t)^2(\deg(u) - 1)\big) \quad \text{for } t \le t_u$$
and
$$\mathbb{P}\left(\frac{|F_{u,T}|}{\deg(u) - 1} \ge t \ \Big|\ t_u,\ v \notin F_{u,T}\right) \le \exp\big(-2(t_u - t)^2(\deg(u) - 1)\big) \quad \text{for } t \ge t_u.$$
Now we are ready to estimate $\mathbb{P}(A)$:
$$\mathbb{P}(A) = \mathbb{P}\left(\frac{|F_{u,T}|}{\deg(u) - 1} \le t,\ t_v > t_u\right) = \int_0^1 \mathrm{d}t_v \int_0^{t_v} \mathbb{P}\left(\frac{|F_{u,T}|}{\deg(u) - 1} \le t \ \Big|\ t_u,\ t_v > t_u\right)\mathrm{d}t_u \le$$
$$\le \int_0^t \mathrm{d}t_v \int_0^{t_v} \mathrm{d}t_u + \int_t^1 \mathrm{d}t_v \int_0^t \mathrm{d}t_u + \int_t^1 \mathrm{d}t_v \int_t^{t_v} \exp\big(-2(t_u - t)^2(\deg(u) - 1)\big)\,\mathrm{d}t_u \le$$
$$\le \frac{t^2}{2} + t(1 - t) + \int_0^{\infty} \exp\big(-2s^2(\deg(u) - 1)\big)\,\mathrm{d}s = t - \frac{t^2}{2} + \sqrt{\frac{\pi}{8(\deg(u) - 1)}}.$$
Similarly,
$$\mathbb{P}(A' \setminus A) = \mathbb{P}\left(t_u < t,\ t_v > t_u,\ \frac{|F_{u,T}|}{\deg(u) - 1} > t\right) = \int_0^t \mathrm{d}t_u\left(\int_{t_u}^1 \mathrm{d}t_v\right)\cdot\mathbb{P}\left(\frac{|F_{u,T}|}{\deg(u) - 1} > t \ \Big|\ t_u,\ v \notin F_{u,T}\right) \le$$
$$\le \int_0^{\infty} \exp\big(-2s^2(\deg(u) - 1)\big)\,\mathrm{d}s = \sqrt{\frac{\pi}{8(\deg(u) - 1)}}.$$
Putting all the pieces together, we obtain
$$\mathbb{P}(a_u \ne \theta \mid A') \le (1 - p)\left(1 + \frac{1}{t - t^2/2}\sqrt{\frac{\pi}{8(\deg(u) - 1)}}\right) + \frac{1}{t - t^2/2}\sqrt{\frac{\pi}{8(\deg(u) - 1)}} \le 1 - p + \frac{3}{t}\cdot\frac{1}{\sqrt{\deg(u) - 1}}.$$
In the last inequality, we took into account that $t - t^2/2 \ge t/2$, $p \ge 1/2$, and $\sqrt{\pi/8} \le 1$.
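A minimal simulation of the arrival-time proxy used in this proof (numpy assumed; the degree and tolerance are illustrative): conditionally on $t_u$, the fraction of friends arriving earlier, $\hat t_u = |F_{u,T}|/(\deg(u) - 1)$, concentrates around $t_u$ at the Hoeffding rate.

```python
import numpy as np

rng = np.random.default_rng(2)
deg_u, trials, eps = 401, 100_000, 0.1
t_u = rng.random(trials)                   # u's own (unobserved) arrival time
earlier = rng.binomial(deg_u - 1, t_u)     # friends arriving before u
t_hat = earlier / (deg_u - 1)              # the proxy \hat t_u

emp_tail = np.mean(np.abs(t_hat - t_u) >= eps)
hoeffding = 2 * np.exp(-2 * eps**2 * (deg_u - 1))
print(f"empirical P(|t_hat - t_u| >= {eps}) = {emp_tail:.5f} <= {hoeffding:.5f}")
```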
D Missing proofs for Section 5
Proof of Lemma 5.1.
Consider the induced subnetwork $G_{V'}$ of $G$ for some $V' \subset V$ with $|V'| = \lceil \alpha\cdot|V| \rceil$ and denote by $\deg'(v)$ the degree of an agent $v \in V'$ in $G_{V'}$. Fix a positive $\gamma \le \alpha$. Our goal is to bound the fraction of agents that have $\deg'(v) < \gamma\cdot D = \gamma\cdot\deg(v)$. Denote by $V_1$ the set of all such agents $v \in V'$ and by $V_2$ the set of eliminated agents $V \setminus V'$. Apply inequality (5.2) (see the mixing lemma A.1) to these $V_1$ and $V_2$. Since $|E(V_1, V_2)| > (1 - \gamma)D\cdot|V_1|$ and $(1 - \alpha)|V| - 1 < |V_2| \le (1 - \alpha)|V|$, we get
$$(1 - \gamma)D\cdot|V_1| - \frac{D}{|V|}\cdot|V_1|\cdot(1 - \alpha)|V| \le |\lambda_2|\sqrt{|V_1|\cdot(1 - \alpha)|V|}.$$
Dividing both sides by $\sqrt{|V_1|}$ and rearranging the terms, we get
$$(\alpha - \gamma)D\sqrt{|V_1|} \le |\lambda_2|\sqrt{(1 - \alpha)|V|}$$
and, therefore,
$$|V_1| \le \frac{(1 - \alpha)|\lambda_2|^2}{(\alpha - \gamma)^2 D^2}\,|V| \le \frac{(1 - \alpha)|\lambda_2|^2}{\alpha(\alpha - \gamma)^2 D^2}\,|V'|.$$
Thus, at least a fraction $\Big(1 - \frac{(1 - \alpha)|\lambda_2|^2}{\alpha(\alpha - \gamma)^2 D^2}\Big)$ of the agents $v \in V'$ have $\deg'(v) \ge \gamma\cdot D$. By Lemma D.1 below, at least $\Big(1 - \frac{3(1 - \alpha)|\lambda_2|^2}{\alpha(\alpha - \gamma)^2 D^2}\Big)|V'|$ agents have at least $\gamma D/2$ friends with degree $\gamma D$ or higher.

Applying Theorem 3.1 combined with Lemma 4.1 to each agent in this set and estimating the chance of the correct action outside this set by zero, we obtain the following bound on the learning quality for any state-symmetric equilibrium $\sigma'$ in $G_{V'}$:
$$L_{\sigma'}\big(G_{V'}\big) \ge \left(1 - \frac{3(1 - \alpha)|\lambda_2|^2}{\alpha(\alpha - \gamma)^2 D^2}\right)\Big(1 - \delta\big(p,\ \tfrac{\gamma D}{2},\ r,\ D\big)\Big),$$
where $r = \lfloor (g - 3)/2 \rfloor$ is the largest radius for which the $r$-neighborhoods of distinct friends are guaranteed to be disjoint.

Taking into account that $\delta(p, D, r, D) > \frac{1}{\sqrt{D}}$, we see that the expression in the first parenthesis is greater than $1 - \frac{3(1 - \alpha)|\lambda_2|^2}{\alpha(\alpha - \gamma)^2 D^{3/2}}\cdot\delta(p, D, r, D)$. It is easy to check that $\delta(p, \beta D, r, D) \le \sqrt{2\beta^{-1}}\,\delta(p, D, r, D)$ for any $\beta \ge 2/D$ and, hence, the expression in the second parenthesis is at least $1 - \sqrt{4\gamma^{-1}}\cdot\delta(p, D, r, D)$. Opening the brackets and dispensing with positive terms, we get
$$L_{\sigma'}\big(G_{V'}\big) \ge 1 - \left(\frac{3(1 - \alpha)|\lambda_2|^2}{\alpha(\alpha - \gamma)^2 D^{3/2}} + \sqrt{\frac{4}{\gamma}}\right)\cdot\delta(p, D, r, D).$$
Picking $\gamma = \alpha/2$ and assuming that $\alpha \ge 8/D$, we obtain the desired bound (5.1). In the complementary case of small $\alpha < 8/D$, the bound (5.1) trivializes since then $\sqrt{8\alpha^{-1}}\cdot\delta(p, D, r, D) > \sqrt{D}\cdot\frac{1}{\sqrt{D}} = 1$.
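A rough Monte Carlo illustration of the degree claim in this proof (a sketch with assumed parameters; numpy and networkx are required, and a random regular graph stands in for an explicit expander): after keeping a random fraction $\alpha$ of the agents, only a small fraction of survivors lose more than a $(1 - \gamma)$ share of their $D$ neighbors.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(3)
n, D, alpha, gamma = 2_000, 50, 0.6, 0.3   # gamma = alpha/2, as in the proof
G = nx.random_regular_graph(D, n, seed=3)
lam2 = np.sort(np.abs(np.linalg.eigvalsh(nx.to_numpy_array(G))))[::-1][1]

keep = rng.permutation(n)[: int(np.ceil(alpha * n))]
H = G.subgraph(keep)
deg_induced = np.array([H.degree(v) for v in keep])

frac_low = np.mean(deg_induced < gamma * D)
bound = (1 - alpha) * lam2**2 / (alpha * (alpha - gamma) ** 2 * D**2)
print(f"fraction of survivors with deg' < gamma*D: {frac_low:.4f}")
print(f"mixing-lemma bound on this fraction:      {bound:.4f}")
```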
Lemma D.1. If in a network $G = (V, E)$ at least $(1 - \beta)|V|$ agents have degree $D$ or higher, then at least $(1 - 3\beta)|V|$ agents have at least $D/2$ friends with degree at least $D$.

Proof. Denote by $V_0$ the set of agents with degree below $D$; by assumption, $|V_0| \le \beta|V|$. The number of edge-endpoints at agents from $V_0$ is less than $D\cdot|V_0|$ and, hence, fewer than $2|V_0| \le 2\beta|V|$ agents have more than $D/2$ neighbors in $V_0$. Every agent outside $V_0$ with at most $D/2$ neighbors in $V_0$ has at least $D - D/2 = D/2$ neighbors of degree $D$ or higher, and there are at least $(1 - \beta - 2\beta)|V| = (1 - 3\beta)|V|$ such agents.

E Randomized robustness

In Section 5, we demonstrated the existence of a-priori networks satisfying a very strong notion of adversarial robustness: the network aggregates information even if a subset of agents is eliminated in an adversarial way. Here we consider a weaker notion, robustness to random elimination, and show that any network has this property. Namely, for networks with high learning quality, even if a substantial fraction of agents leaves the network, the remaining agents find a way to learn the state, even though most paths of information diffusion (those involving eliminated agents) disappear.

We will demonstrate randomized robustness under an additional assumption about the information available to agents. Recall that $G_{v,T} = (V_{v,T}, E_{v,T})$ denotes the realized subnetwork of an agent $v$; see the definition in Section 3.1.

Assumption E.1. Each agent $v$, in addition to his signal and the actions of friends who arrived earlier than him, observes the set of agents $V_{v,T}$ of his realized subnetwork (without their actions and arrival times). In other words, an agent knows the set of those who can possibly affect his action. The information set of $v$ is therefore $I_v = \big(s_v, (a_u)_{u \in F_{v,T}}, V_{v,T}\big)$.

Remark E.1 (The role of Assumption E.1). We believe that this assumption plays a technical role and that randomized robustness must be ubiquitous without it as well; however, we were unable to get rid of it in our proof. Assumption E.1 ensures the sequential structure of equilibrium. If $V_{v,T} = \emptyset$, agent $v$ follows his signal. If $V_{v,T}$ is non-empty with $|V_{v,T}| = k$, the equilibrium strategy $\sigma_v\big(s_v, (a_u)_{u \in F_{v,T}}, V_{v,T}\big)$ is the optimal reply to $(\sigma_u)_{u \in V_{v,T}}$ with $|V_{u,T}| \le k - 1$ (indeed, for $u \in V_{v,T}$, the set $V_{u,T}$ is a strict subset of $V_{v,T}$). This sequential structure implies that an agent gains no advantage from learning the set of agents who have not yet arrived, a property critical to the proof of Theorem E.1, the main result of this section.

For a given network $G$ and probability $p$ of the correct signal, denote by $L(G)$ the learning quality for the best equilibrium: $L(G) = \max_\sigma L_\sigma(G)$.

Theorem E.1 (Learning is robust to random elimination). Under Assumption E.1, consider a network $G = (V, E)$ that has learning quality $L(G) = 1 - \delta$ with some $\delta > 0$. Fix $\alpha \in (0, 1)$ and pick a subset $V' \subset V$ with $\lceil \alpha\cdot|V| \rceil$ agents uniformly at random. Then the learning quality for the induced subnetwork $G_{V'}$ enjoys the lower bound
$$L\big(G_{V'}\big) \ge 1 - \sqrt{\frac{\delta}{\alpha}}$$
with probability at least $1 - \sqrt{\frac{\delta}{\alpha}}$ with respect to the choice of $V'$.

Proof of Theorem E.1. The argument is based on a coupling of the learning process in the original network $G$ and the selection of the random subnetwork $G_{V'}$. Fix an equilibrium $\sigma = (\sigma_v)_{v \in V}$ maximizing $L_\sigma(G)$ and pick the subset $V'$ to be the set of $\lceil \alpha\cdot|V| \rceil$ earliest arrivals; since arrival times are i.i.d., this $V'$ is distributed uniformly among the subsets of the required size. For $v \in V'$, the equilibrium strategy $\sigma_v$ in the original network $G$ can be used as a strategy in $G_{V'}$. The resulting family of strategies $(\sigma_v)_{v \in V'}$ constitutes an equilibrium in $G_{V'}$, which we denote by $\sigma_{V'}$.
Note that here we use Assumption E.1: in the game played over $V'$, all the present agents know that the agents from $V \setminus V'$ are absent; without the assumption, an agent $v \in V'$ would know more about his predecessors than the same agent in the game played over $V$, and a best reply in $V$ might no longer be a best reply in the game restricted to $V'$, even if all other agents maintained their strategies.

The constructed coupling allows us to link the learning quality for $G$ and for $G_{V'}$ under the equilibria $\sigma$ and $\sigma_{V'}$, respectively. By the formula of total probability, the learning quality for $G$ can be represented as
$$L_\sigma(G) = \frac{1}{|V|}\sum_{v \in V}\Big(\mathbb{P}(a_v = \theta \mid v \in V')\cdot\mathbb{P}(v \in V') + \mathbb{P}(a_v = \theta \mid v \notin V')\cdot\mathbb{P}(v \notin V')\Big).$$
Using the rough estimate $\mathbb{P}(a_v = \theta \mid v \notin V') \le 1$ and taking into account that $\mathbb{P}(v \in V')$ is bounded from below by $\alpha$, we get the following inequality:
$$L_\sigma(G) \le \alpha\cdot\mathbb{E}_{V'} L_{\sigma_{V'}}\big(G_{V'}\big) + (1 - \alpha),$$
where $\mathbb{E}_{V'}$ denotes the expectation with respect to the choice of $V'$. Since the left-hand side is equal to $1 - \delta$, we obtain
$$1 - \mathbb{E}_{V'} L_{\sigma_{V'}}\big(G_{V'}\big) \le \frac{\delta}{\alpha}.$$
An application of the Markov inequality completes the proof.
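A tiny check of the distributional fact behind the coupling (a sketch; numpy assumed, parameters illustrative): with i.i.d. uniform arrival times, the set of the $\lceil \alpha|V| \rceil$ earliest arrivals is indeed a uniformly random subset of that size, so random elimination can be read off the arrival order.

```python
import numpy as np
from collections import Counter
from math import comb, ceil

rng = np.random.default_rng(4)
n, alpha, trials = 6, 0.5, 200_000
k = ceil(alpha * n)

counts = Counter()
for _ in range(trials):
    t = rng.random(n)                          # i.i.d. uniform arrival times
    counts[frozenset(np.argsort(t)[:k])] += 1  # the k earliest arrivals

print("distinct subsets seen:", len(counts), "of", comb(n, k))
freqs = np.array(list(counts.values())) / trials
print("empirical frequencies in [%.4f, %.4f], uniform = %.4f"
      % (freqs.min(), freqs.max(), 1 / comb(n, k)))
```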