AVERAGING + LEARNING IN FINANCIAL MARKETS

IONEL POPESCU AND TUSHAR VAIDYA

ABSTRACT. This paper develops original models to study interacting agents in financial markets. The key feature of these models is how interactions are formulated and analysed. Agents learn from their observations and from their ability to interpret news or private information. Central limit theorems are developed, but they arise rather unexpectedly. Under certain conditions governing the learning, agents' beliefs converge to a distribution that can even be fractal. The underlying randomness in the systems is not restricted to a particular class. Fresh insights are gained not only from developing new nonlinear social learning models but also from using different techniques to study discrete-time random linear dynamical systems.
1. INTRODUCTION
How do markets reach consensus on prices? This is the central theme of this paper. Traders interact with one another and learn from their environment. Our aim is to propose new models of interaction and learning. These models entail agents who observe the actions of other traders and update their own beliefs. Repeated interaction can, in certain cases, lead to consensus on a particular value of a tradeable commodity. Interaction models should take into account the environment of trading. The more traditional approach is to analyze limit order books; the introduction of electronic limit order books poses challenges but also offers new opportunities to develop models. Learning models offer a cogent and natural way to analyse interaction when agents learn and observe each others' past actions. For such models there is a rich interplay between probability, dynamical systems and game theoretic ideas [MT+].

The classical starting point of mathematical finance is a postulated price process such as

$dS_t = \mu S_t \, dt + \sigma S_t \, dW_t.$

Here $\mu$ and $\sigma$ denote the mean rate of return and volatility for some stock, and $W_t$ is a Brownian motion. These basic processes then form the backbone of advanced option pricing models that

Mathematics Subject Classification.
Primary: 60F05, 91D30; Secondary: 91A99.
I.P. was partially supported by UEFISCDI PN-III-P4-ID-PCE-2016-0372 and T.V. is supported by the SUTD Presidential Graduate Fellowship.

FIGURE
1. Trust chain of agents on opinions. All individuals have self-belief, which is indicated by loops.

postulate a process for the asset. Let us turn the question on its head. What if we don't know the process? Traditional finance models assure us that $S_t$ is a good process to model the stock price, and that $S_t$ is the market consensus price or the mid price of indicative quotes. But if we dig a little deeper, we have to ask how the marketplace decided on the stock price $S_t$ in the first place. There must have been interactions between the players to arrive at this quote. One may propose more advanced stochastic processes, but we are interested in a more basic question: how do we study interaction at the microscopic level? At a higher frequency, agents or machines (algorithms) interact before a consensus is reached. An alternative way to ask this: how do the agents actually trading come to reach a consensus on a particular price? In many instances, models postulate that a financial asset's current price is simply available. What mechanism led to that price being selected? It seems natural to develop aspects of social learning as a starting point.

2. FOUNDATIONS
Social learning models are now actively studied in many disciplines, and there are many distinct frameworks. The literature is too vast for us to cite all the major works, so we will highlight the most relevant ones. In all walks of life, individuals make decisions by observing and inferring the actions of others. What thought process leads one to make a choice after seeing his or her peers select theirs is a central question not only in the social sciences but also in engineering and physics [Lor05]. The key point is observation. Human beings are visual creatures. One of the most canonical models of learning and aggregation of information is the Degroot model [DeG74].
Example 1.
Imagine we have agents who each have an initial opinion $X_0$. They also take a weighted average of their neighbours' opinions (figure 1). Individuals act simultaneously.
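The repeated-averaging dynamics of Example 1 are easy to simulate. A minimal sketch follows, with an arbitrary row-stochastic matrix chosen for illustration (it is not the matrix of figure 1):

```python
import numpy as np

# Hypothetical 3-agent weights matrix; each row sums to one (row-stochastic),
# and positive diagonal entries encode each agent's self-belief.
A = np.array([
    [0.5, 0.25, 0.25],
    [0.3, 0.4,  0.3 ],
    [0.2, 0.3,  0.5 ],
])
X0 = np.array([1.0, 5.0, 9.0])   # initial opinions

X = X0.copy()
for _ in range(200):             # X_t = A X_{t-1}
    X = A @ X

# A is irreducible and aperiodic, so all agents converge to a common value.
consensus_spread = X.max() - X.min()
```

The consensus value is a weighted average of the initial opinions, with weights given by the stationary distribution of $A$.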
Round by round the agents observe the previous quotes and update their beliefs by taking new weighted averages. The averaging matrix $A$ is row-stochastic, and the dynamics are $X_t = AX_{t-1}$. Iterating this, we obtain $X_t = A^t X_0$. Provided that the matrix $A$ is aperiodic and irreducible, consensus is reached and all the agents reach the same decision, $\lim_{t\to\infty} X_t = C\mathbf{1}$ for some $C \in \mathbb{R}$. Of course, the consensus value depends on the initial value. Instructive and illustrative examples are developed in [Jac10].

This simple Degroot example belies many important subtleties. Some social learning purists might object that there are redundancies: agent 1 may take a weighted average of all agents, but agent 2 is also incorporating the views of the other agents, which gets double counted by agent 1 in further iterations. In fact this is a strength of the model. The whole updating process is such that, provided the matrix $A$ is irreducible and aperiodic, there is eventual consensus. To see the double counting as a flaw is to view the problem incorrectly: each player may weigh beliefs differently, and players' different averaging weights are their own unique take on the averaging rule. By repeated averaging, agents come to average the same way: the rows of $A^t$ become equal.

We focus on Degroot learning models as these represent the reality of trading accurately. This style of learning is preferable because agents act simultaneously in a round-by-round fashion. In contrast, in sequential learning models each agent makes a decision or update based on the information set of previous choices. Aggregation of information occurs as more agents update, but at each point in time only one agent updates. Private signals can also be incorporated in this setting. However, the sequential nature of updating seems unnatural. For a good survey of social learning in both sequential and simultaneous settings one may refer to [GS, MT+].

2.1. Bayesian Observational Learning.
Theoretical social learning models are roughly divided into two paradigms: Bayesian and non-Bayesian. Bayesian observational learning examples include [Ban92, BHW92] and [SS00]. They fall under the category of herd behaviour. These models are sequential in nature. Agents have a common prior $P(\theta)$ for some state of the world $\theta \in \Theta$ at $t = 0$, where $\Theta$ is the set of possible states. As time passes, a player in turn observes the actions of previous agents and receives a private signal. Each agent has a one-off decision when she updates her posterior probability and takes an action (usually a binary choice). In some instances a correct decision is reached on the true state of the world by the $n$th agent as $n \to \infty$. After some point, everyone may take the same action. So do agents asymptotically learn the truth?
Even in the simplest of settings, characterizing equilibria is intractable [CEMS08] and computationally difficult [HJMR18]. Agents are assumed to be perfect Bayesian machines, who can do complex posterior calculations by observing past actions and who possess a common prior. These assumptions may seem a bit unrealistic or too strong. There could be signals that lead society astray. Information flows in one direction, where an infinite number of agents are exogenously ordered on a line. If the first few signals are wrong, there can be a cascade and no asymptotic learning takes place. Nevertheless, Bayesian models serve as a useful benchmark. Asymptotic Bayesian social learning is examined at length in [MST14], where the one-off action is relaxed to allow for repeated plays.

Many modelling environments assume there is a ground truth that agents want to learn. It could be that there is no ground truth. Recently there has been some work on an axiomatic semi-Bayesian approach [MTSJ18]. A more general framework for rational learning is offered in [MF13] from a theoretical economics standpoint.

2.2. Financial markets: non-Bayesian.
In financial markets, trading is never sequential. Transactions occur at breakneck speed. Agents move simultaneously: cancellations are the norm in today's fast markets. In practical terms, sequential learning models don't seem appropriate. Interaction is important in the emergence of consensus. Choices by agents from the previous round of play are available to all agents in the current round of play. The question is then what sort of averaging or heuristic process is ideal.

Degroot learning models convey an essential idea: they offer a functional form of updating. Myopic updating occurs in each round. Something akin to persuasion bias could explain our basic model [DVZ03]. As in an echo chamber, agents in our setup have fixed weights but update their responses until consensus is reached. One could think of this as a behavioural heuristic and an explanation of why repeated averaging is effective. Alternatively, with the right cost function representing the distance of an agent's opinion from other opinions, the best response is repeated averaging. Recently there have been experimental papers providing evidence of Degroot updating [CLX15, BBC17]. Repeated averaging models are our base precisely because they capture the nature of interaction and learning in financial markets so succinctly. On top of the base models we develop more sophisticated extensions, relaxing the fixed nature of the weights and learning matrices.

2.3. Multiagent learning.
Degroot updating is also studied as distributed consensus in the engineering community [Bau16]. A group of sensors or drones communicate to reach consensus. Here existing methods use graph theory; moreover, the techniques we introduce to solve the consensus problems are quite distinct from the usual ones in the engineering literature. Distributed consensus has, in the simplest of cases, an updating rule $x_t = A(t) x_{t-1}$, with $x_t \in \mathbb{R}^n$ and $A$ a row-stochastic matrix. Agents can be seen as vertices in a graph $G(V, E)$. Usually, the graphs have a fixed set of vertices $V = \{1, 2, \dots, n\}$, and an edge $(j, k)$ denotes that agent $j$ puts weight on $k$'s opinion. In our setting, this corresponds to the number of agents being fixed while the edges or links can be random or time varying. One can interpret the framework we investigate as a distributed consensus problem. Generally, in engineering problems, the emphasis is on the design of algorithms that can control the decentralized process to reach consensus. Distributed algorithms for agreement have been extensively studied in engineering; some related works are [MS07, BHOT05, OSFM07, Mor05].

2.4. Game theory.
Our emphasis is on trading, but any network where the players have access to some sort of learning feedback is suitable. A game theoretic framework where every player takes into account other players' payoffs is unrealistic and points to serious difficulties in how to even represent utilities; these are economic arguments that are better addressed by philosophical interludes. Moreover, traders rarely have access to private information on how previous decisions led to a certain payoff for their opponents, at least not in a high frequency sense. If a trading firm is a publicly listed company, then one can infer its trading losses or gains from public records. Nevertheless, specific profit and loss accounts for trading individual stocks are a private matter. Firms never break their income statements down to specific asset classes or instruments. Results are amalgamated and reported quarterly: not per hour, minute or second.

Therefore, pure game theory has its shortcomings. Similar questions and issues to this paper were raised in [Kir02] at an informal level. Our interest is in building a suitable mathematical structure on which to ask those interesting questions of price formation. Players can observe previous choices but not the payoffs of their competitors. A more in-depth discussion of learning in games would take us further away from our goal of studying the mathematical nature of interaction. The reader can consult [FDLL98, KL94] for a game theoretic perspective. Dynamical learning is an active area of research in computer science as well. Articles [PP18, PNGCS14, MPV17] propose and analyze the dynamics separately from the concept of Nash equilibrium.

3. BASIC MODELS
Economists also have many models of learning [Sob00]. Depending on the question, different paradigms have been put forth. Our objective is learning, and so we aim to use aspects of both game theory and dynamical systems. Difficulties in Bayesian environments mean the Degroot model has become a workhorse for social learning [BBCM19]. It offers a way forward for tractable models that can relax simple assumptions. Research using this framework is still active. In our setting, a group of traders observe the quotes of others and incorporate an average of the previous round's quotes. The departure from standard Degroot learning comes from the fact that not only are the agents learning, but they are also getting feedback from an external source on the true consensus value. To our knowledge, the application of these types of consensus models to trading is new. We use the framework of [VMP18] as the base case for our models. Consider

(3.1)  $x_{i,t+1} = \sum_{j=1}^{n} a_{ij} x_{jt} + \epsilon_i (\bar{\sigma} - x_{it}),$

which in matrix form reads

(3.2)  $X_{t+1} = A X_t + E(\bar{\sigma}_n - X_t),$

where $X_t = (x_{1t}, \dots, x_{nt})^T$ is the opinion of each agent in discrete time $t$, $\bar{\sigma}_n = \bar{\sigma}\mathbf{1}$, and $E = \mathrm{diag}(\epsilon_1, \dots, \epsilon_n)$ is the learning rate of each agent when they are provided with feedback on the consensus $\bar{\sigma}$. The opinion matrix $A$ encapsulates the weights agents put on each other. We require $\sum_{j=1}^{n} a_{ij} = 1$. Agents' aptitude for determining the quality of feedback is their ability $\epsilon_i$.

For our purposes, we are careful to distinguish between two concepts: learning time and trading time. We will focus on learning time. Typically in active financial markets, the quotes (bids and offers) that agents post are cancelled or revised many times before actual trades occur. Although trading occurs at a high frequency, the revision of quotes occurs at an even higher frequency. See [GPW+
13] for a discussion of cancellations. Agents or market participants are all trying to learn the true value of a traded instrument. Agents can see all the previous quotes and thus take a weighted view of what the next quote should be. The learning activity occurs before $\bar{\sigma}$ actually evolves due to trading. For us, time $t$ is learning time and is quite distinct from trading time, which we will assume to be constant. We weaken the condition for convergence stated in [VMP18].

The feedback best describes the situation where a similar instrument is traded on another exchange or there is a common source of market chatter. Such chatter is commonly provided through voice box brokers or over-the-counter markets. We assume all agents have access to this feedback or chatter. One example would be the S&P 500 European ETF (SPY) options, which are not cash settled like SPX options but stock settled. Quotes for SPY options will also be linked with SPX options. Another example of a contract that contains information on vols is a VIX (volatility index) futures contract. Sometimes trades occur off-exchange and get reported at the end of the day through the exchange's clearing system. How agents interpret information or market chatter is their unique learning ability.

4. ORGANIZATION OF RESULTS
We investigate variations of the model (3.1), characterizing different features. In section 6, the result from [VMP18] is relaxed to see under what conditions consensus is still possible. A key feature is that, provided agents have positive learning rates $\epsilon_i$, consensus is the equilibrium value. In this case, while the particular value is unknown at the start, learning and interaction ensure convergence to equilibrium.

While the first type of deterministic dynamics is useful, it ignores the reality of noise. Randomness enters as an additional term in the feedback in section 7. We introduce a random variable $\gamma_t$ as a source of noise. The main theorem shows that if $\gamma_t \to 0$ almost surely or in probability, then $X_t \to \bar{\sigma}$. However, the argument is not straightforward. Theorem 6 explains the mechanics behind these concepts. Furthermore, provided the weights matrix $A$ and learning rates $E$ satisfy some weak conditions, $X_t$ reaches consensus. If the noise does not go to zero, then the system converges in distribution. Numerical simulations confirm that $X_t$ does reach an asymptotic distribution, which may not even be Gaussian.

Nonlinear learning (section 8) is an extension of our Degroot learners. Players still average over their observations of past actions, but their own unique learning ability, and how they interpret the extra information, is a nonlinear function. This type of model fits with the earlier linear models, preserving the averaging nature of interaction. Suitable conditions on the nonlinear function are derived that exhibit consensus. If the shocks are permanent, then convergence in distribution is possible as in the linear case.

Section 9 presents an important weak convergence (in law) result. The true state is arrived at endogenously and the consensus value is an endogenous feature. While the earlier sections assumed that the equilibrium value $\bar{\sigma}$ is present in the system, this section assumes no such universal truth.
Though the agents in the earlier dynamics come to learn $\bar{\sigma}$, they do not know it outright. In the averaging case, we propose that $\bar{\sigma}$ is not part of the system. Whatever value the agents asymptotically converge to, provided they all agree, is the consensus value. This presents challenges in proving convergence to a probability measure. The main result in this section is the central limit theorem (9.13). Lindeberg's original argument is helpful in proving the CLT.

In all our models, if the agents are already synchronized, i.e. at consensus, then the system stays there. While this may seem a moot point, it is worth mentioning. In traditional game-theoretic models the focus is on equilibrium. The focus here is on how agents reach the steady state.

5. NOTATION AND ASSUMPTIONS
In all subsequent analysis, $A$ refers to a row-stochastic weights matrix, whose rows sum to one. Depending on the setup, $A$ can be time varying or fixed.

We use the infinity norm, namely for a vector $v = (v_1, \dots, v_n)^T$ we take

$|v|_\infty = \max_{i=1,\dots,n} |v_i|.$

For any $m \times n$ matrix $B$, we denote

$|B|_\infty = \sup_{i=1,\dots,m} \sum_{j=1}^{n} |b_{ij}|.$

We then have, for any $m \times n$ matrix $B$ and any $n$-dimensional vector $v$,

$|Bv|_\infty \le |B|_\infty |v|_\infty.$
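These norms are straightforward to check numerically. A small sketch, with an arbitrary matrix and vector chosen for illustration (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 6))    # arbitrary m x n matrix
v = rng.normal(size=6)         # arbitrary n-vector

v_inf = np.max(np.abs(v))                # |v|_inf = max_i |v_i|
B_inf = np.max(np.abs(B).sum(axis=1))    # |B|_inf = max absolute row sum

# The operator-norm bound |Bv|_inf <= |B|_inf * |v|_inf
lhs = np.max(np.abs(B @ v))
assert lhs <= B_inf * v_inf
```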
6. BASE MODEL
In the base model, we have $n$ agents and a fixed row-stochastic matrix $A$, which is the weights matrix. The dynamics of updating are

(6.1)  $X_{t+1} = A X_t + E(\bar{\sigma}_n - X_t).$

We can impose a weaker condition on $\epsilon_i$, and we write $\bar{\sigma} = \bar{\sigma}_n$ for notational convenience when the dimension is clear.
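A minimal numerical sketch of the base dynamics (6.1); the matrix $A$, the learning rates, and $\bar{\sigma}$ below are illustrative choices, not values from the paper:

```python
import numpy as np

# Sketch of X_{t+1} = A X_t + E(sigma_bar - X_t) with illustrative numbers.
A = np.array([
    [0.6, 0.2, 0.2],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
])                                # row-stochastic weights
eps = np.array([0.4, 0.3, 0.5])   # learning rates
E = np.diag(eps)
sigma_bar = 2.0                   # feedback value the agents try to learn

X = np.array([10.0, -4.0, 7.0])   # arbitrary initial quotes
for _ in range(300):
    X = A @ X + E @ (sigma_bar * np.ones(3) - X)

# All agents end up at the feedback value sigma_bar.
max_error = np.max(np.abs(X - sigma_bar))
```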
Proposition 2. If $0 < \epsilon_i < 2a_{ii}$, then all agents reach the same consensus value, $\lim_{t\to\infty} X_t = \bar{\sigma}$.

Proof.
Equation (6.1) becomes $X_{t+1} - \bar{\sigma} = (A - E)(X_t - \bar{\sigma})$. Setting $B = A - E$ and $Y_t = X_t - \bar{\sigma}$, the updating rule simplifies to

$(Y_t)_i = \sum_{j=1}^{n} b_{ij} (Y_{t-1})_j,$

from which we obtain

$|(Y_t)_i| \le \sum_{j=1}^{n} |b_{ij}|\,|(Y_{t-1})_j| \le |Y_{t-1}|_\infty \sum_{j=1}^{n} |b_{ij}|.$

Therefore, $|Y_t|_\infty \le |Y_{t-1}|_\infty \max_{i=1,\dots,n} \sum_{j=1}^{n} |b_{ij}|$. On the other hand, $b_{ij} = a_{ij}$ if $i \ne j$, so that

$\sum_{j=1}^{n} |b_{ij}| = |a_{ii} - \epsilon_i| + \sum_{j \ne i} |a_{ij}| = |a_{ii} - \epsilon_i| + 1 - a_{ii},$

where we have used the stochasticity of $A$, that is, the sum of the elements of each row is 1. From this, if we check that $|a_{ii} - \epsilon_i| + 1 - a_{ii} < 1$, which is the same as $|a_{ii} - \epsilon_i| < a_{ii}$, or equivalently $0 < \epsilon_i < 2a_{ii}$, then with

$\rho = \max_i (|a_{ii} - \epsilon_i| + 1 - a_{ii})$

we obtain $0 \le \rho < 1$ and $|Y_t|_\infty \le \rho |Y_{t-1}|_\infty$. This is enough to conclude that $|Y_t|_\infty \le \rho^t |Y_0|_\infty$, from which letting $t \to \infty$ shows that $|Y_t|_\infty \to 0$, and in particular that $Y_t \to 0$. $\square$
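The contraction bound $|Y_t|_\infty \le \rho^t |Y_0|_\infty$ from the proof can be checked directly. A sketch with an illustrative $A$ and $E$ satisfying $0 < \epsilon_i < 2a_{ii}$ (not taken from the paper):

```python
import numpy as np

# Check the contraction rate rho = max_i(|a_ii - eps_i| + 1 - a_ii).
A = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.4, 0.2, 0.4],
])
eps = np.array([0.6, 0.8, 0.3])          # satisfies 0 < eps_i < 2*a_ii

rho = np.max(np.abs(np.diag(A) - eps) + 1.0 - np.diag(A))
assert rho < 1.0                         # contraction condition holds

Y = np.array([3.0, -2.0, 5.0])           # Y_0 = X_0 - sigma_bar
Y0_norm = np.max(np.abs(Y))
B = A - np.diag(eps)

ok = True
for t in range(1, 60):
    Y = B @ Y                            # Y_t = (A - E) Y_{t-1}
    ok = ok and np.max(np.abs(Y)) <= rho**t * Y0_norm + 1e-12
```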
This argument allows an extension to the case when the matrices $A_t$ and $E_t$ depend on $t$. The bottom line is that, with

$\rho_t = \max_i (|a_{ii}(t) - \epsilon_i(t)| + 1 - a_{ii}(t)),$

we want

(6.2)  $\prod_{i=1}^{t} \rho_i \to 0 \quad \text{as } t \to \infty.$

For example, this is the case if all $\rho_t$ are bounded by some $\rho < 1$. However, condition (6.2) also allows cases where $\rho_t \to 1$ as $t \to \infty$. We highlight two examples. For the first we have convergence.
Example 3.
Let's consider $\rho_t = \frac{t}{t+1}$; then $\prod_{i=1}^{t} \rho_i = \frac{1}{t+1}$, which converges to 0 as $t \to \infty$. However, condition (6.2) also ensures we don't have the following situation.
Example 4.
Let's consider $\rho_t = \exp(-1/t^2)$; then $\prod_{i=1}^{t} \rho_i = \exp(-\sum_{k=1}^{t} 1/k^2)$, which does not converge to zero.

Condition (6.2) can also be written as

$\sum_{i=1}^{t} \log \rho_i \to -\infty \quad \text{as } t \to \infty,$

or, differently, as

$\sum_{i=1}^{t} (-\log \rho_i) \to \infty \quad \text{as } t \to \infty.$

In fact, this is the case if $(-\log \rho_t)\, t^{\alpha} \ge C$ for some $C > 0$ and $0 < \alpha < 1$, which translates to $\rho_t \le e^{-C/t^{\alpha}}$.

We can extend the conclusions if we replace the $\infty$-norm of a vector by something of the form

$|\nu|_{\infty,\beta} = \max_{i=1,\dots,n} |\nu_i| / \beta_i,$

where $\beta$ is a vector of positive values such that $A\beta \le \delta\beta$. In this new norm we now have

$|(Y_t)_i| \le \sum_{j=1}^{n} |b_{ij}|\,|(Y_{t-1})_j|, \qquad \frac{|(Y_t)_i|}{\beta_i} \le \sum_{j=1}^{n} |b_{ij}| \frac{\beta_j}{\beta_i} \frac{|(Y_{t-1})_j|}{\beta_j},$

which yields

$|Y_t|_{\infty,\beta} \le |Y_{t-1}|_{\infty,\beta} \max_{i=1,\dots,n} \sum_{j=1}^{n} |b_{ij}| \frac{\beta_j}{\beta_i} = |Y_{t-1}|_{\infty,\beta} \max_{i=1,\dots,n} \Big( |a_{ii} - \epsilon_i| + \frac{1}{\beta_i} \sum_{j \ne i} a_{ij} \beta_j \Big).$

From the assumption $A\beta \le \delta\beta$ we get in the first place that $\sum_{j=1}^{n} a_{ij}\beta_j \le \delta\beta_i$, hence $\sum_{j \ne i} a_{ij}\beta_j \le \beta_i(\delta - a_{ii})$, and thus $\frac{1}{\beta_i}\sum_{j \ne i} a_{ij}\beta_j \le \delta - a_{ii}$. This yields

$|Y_t|_{\infty,\beta} \le |Y_{t-1}|_{\infty,\beta} \max_{i=1,\dots,n} (|a_{ii} - \epsilon_i| + \delta - a_{ii}),$

as long as $|a_{ii} - \epsilon_i| + \delta - a_{ii} < 1$, which is satisfied for $-(1-\delta) < \epsilon_i < 1 - \delta + 2a_{ii}$. The question is whether there exists such a vector with $A\beta \le \delta\beta$ (componentwise). One such choice is $\beta = (1, \dots, 1)^T$ and $\delta = 1$, since $A$ is a stochastic matrix. If such a $\beta$ exists with $\delta < 1$, then we get a relaxation of the main condition.

Interestingly, if $A$ is not necessarily stochastic but has positive entries, then by the Perron-Frobenius theorem there exists a real eigenvalue that is greater than the absolute value of all the other eigenvalues, and its eigenvector has positive entries. The argument above shows that we can then choose $\delta$ and $\beta$ to obtain the same result.

The above arguments allow us to posit the following result.

Theorem 5.
Assume $X_t = A_t X_{t-1} + E_t(\bar{\sigma} - X_{t-1})$ and let $\rho_t = \max_{i=1,\dots,n} (|(a_t)_{ii} - (\epsilon_t)_i| + 1 - (a_t)_{ii})$. If $\prod_{s=1}^{t} \rho_s \to 0$ as $t \to \infty$, then $X_t \to \bar{\sigma}$ as $t \to \infty$. In the case where the $A_t$ are all equal to $A$: if $0 < \epsilon_i < 2a_{ii}$ for $i = 1, \dots, n$, then $X_t \to \bar{\sigma}$ as $t \to \infty$.

7. LEARNING WITH RANDOM NOISE
Our base model with learning can be extended to have random noise in the feedback term. We introduce a random vector $\gamma_t$, which we quantify later. The hypothesis is that $\gamma_t$ is small. For this section we also consider the case of time-dependent evolution. The model is given by

$X_t = A_t X_{t-1} + E_t(\bar{\sigma} + \gamma_t - X_{t-1}),$

where $X_t$ is the vector of prices at time $t$ and $\bar{\sigma}$ is the vector of equilibrium prices, or the consensus value the agents are trying to learn. In order to prove that $X_t - \bar{\sigma}$ converges to 0, we rewrite the equation as

$X_t - \bar{\sigma} = A_t X_{t-1} - \bar{\sigma} + E_t(\bar{\sigma} - X_{t-1}) + E_t\gamma_t = A_t X_{t-1} - A_t\bar{\sigma} + E_t(\bar{\sigma} - X_{t-1}) + E_t\gamma_t,$

since $A_t\bar{\sigma} = \bar{\sigma}$, so that

$X_t - \bar{\sigma} = (A_t - E_t)(X_{t-1} - \bar{\sigma}) + E_t\gamma_t.$

Therefore, if we denote $Y_t = X_t - \bar{\sigma}$, then we can simplify the above expression as

$Y_t = (A_t - E_t) Y_{t-1} + E_t\gamma_t.$

With the same argument as before we obtain $|Y_t|_\infty \le \rho_t |Y_{t-1}|_\infty + C|\gamma_t|_\infty$ with

(7.1)  $\rho_t = \max_{i=1,\dots,n} (|(a_t)_{ii} - (\epsilon_t)_i| + 1 - (a_t)_{ii}).$

We formulate a general result as follows.
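Before the general statement, the noisy recursion is easy to sanity-check numerically. A sketch with an illustrative fixed $A$ and $E$ and a noise sequence $\gamma_t \to 0$; none of these numbers are from the paper:

```python
import numpy as np

# Sketch of X_t = A X_{t-1} + E(sigma_bar + gamma_t - X_{t-1})
# with shrinking noise gamma_t -> 0 (scale 1/t).
rng = np.random.default_rng(1)
A = np.array([
    [0.5,  0.25, 0.25],
    [0.25, 0.5,  0.25],
    [0.25, 0.25, 0.5 ],
])
E = np.diag([0.4, 0.5, 0.3])
sigma_bar = 1.0
ones = np.ones(3)

X = np.array([4.0, -3.0, 8.0])
for t in range(1, 2000):
    gamma = rng.normal(scale=1.0 / t, size=3)   # noise vanishing over time
    X = A @ X + E @ (sigma_bar * ones + gamma - X)

# With vanishing noise, the quotes still settle near sigma_bar.
final_error = np.max(np.abs(X - sigma_bar))
```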
Theorem 6.
Assume the model $X_t = A_t X_{t-1} + E_t(\bar{\sigma} + \gamma_t - X_{t-1})$. With the notation from (7.1), assume that

(7.2)  $\sup_{t \ge 1} \{\rho_t + \rho_t\rho_{t-1} + \rho_t\rho_{t-1}\rho_{t-2} + \dots + \rho_t\rho_{t-1}\dots\rho_1\} < \infty.$

(1) If $\gamma_t \to 0$ a.s. as $t \to \infty$, then $X_t \to \bar{\sigma}$ a.s.

(2) If $\gamma_t \to 0$ in probability, then $X_t \to \bar{\sigma}$ in probability.

(3) If $\gamma_t \to 0$ in $L^p$, then $X_t \to \bar{\sigma}$ in $L^p$.

(4) If we assume

(7.3)  $X_t = A_t X_{t-1} + E_t(\gamma_t - X_{t-1}),$

where now $(\gamma_t)_{t \ge 1}$ are iid and integrable, and in addition to (7.2) we assume that

(7.4)  $\sum_{t \ge 1} (|A_t - A_{t-1}|_\infty + |E_t - E_{t-1}|_\infty) < \infty,$

then

(7.5)  $X_t$ converges in distribution as $t \to \infty$.

(5) Furthermore, if $\gamma_t$ is integrable but not constant almost surely, then, without condition (7.4), the conclusion of (7.5) does not hold.

Observe that in the last part of the theorem we incorporated the constant $\bar{\sigma}$ into $\gamma_t$. The convergence is in the distributional sense and thus does not lead to convergence as in the previous cases. Even if we assume that $\gamma_t$ is of the form $\bar{\sigma} + \gamma_t'$, the convergence will not be to $\bar{\sigma}$ alone. Thus this is a different convergence scenario and in spirit is not of the same form as the other cases.

Proof. (1) Our base model in terms of $Y_t$ is

(7.6)  $Y_t = (A_t - E_t) Y_{t-1} + E_t\gamma_t.$

From this we get

(7.7)  $|Y_t|_\infty \le \rho_t |Y_{t-1}|_\infty + C|\gamma_t|_\infty.$

If we assume that $|\gamma_t|_\infty \to 0$ a.s., then we get that $|Y_t|_\infty \to 0$ a.s. Indeed, this becomes a purely deterministic statement. For a given $\epsilon > 0$, we can find $t_\epsilon$ such that $|\gamma_t|_\infty \le \epsilon$ for all $t \ge t_\epsilon$. Then

$|Y_t|_\infty \le \rho_t |Y_{t-1}|_\infty + C\epsilon \quad \forall t \ge t_\epsilon.$

Using the previous inequalities for $t-1, t-2, \dots, t_\epsilon$ gives

$|Y_t|_\infty \le \Big(\prod_{s=t_\epsilon}^{t} \rho_s\Big) |Y_{t_\epsilon - 1}|_\infty + C\epsilon \Big(1 + \rho_t + \dots + \prod_{s=t_\epsilon}^{t} \rho_s\Big).$

From (7.2) we obtain in the first place that, for some constant
$A \ge 1$ (enlarging the constant if necessary),

(7.8)  $A \ge \rho_t + \rho_t\rho_{t-1} + \dots + \rho_t\rho_{t-1}\dots\rho_1.$

We recall the Cauchy-Schwarz inequality, which states that for any real numbers $a_1, \dots, a_t$ and $b_1, \dots, b_t$ we have

$\Big(\sum_{i=1}^{t} a_i^2\Big)\Big(\sum_{i=1}^{t} b_i^2\Big) \ge \Big(\sum_{i=1}^{t} a_i b_i\Big)^2,$

and in particular this implies that for any $a_i > 0$,

(7.9)  $\Big(\sum_{i=1}^{t} a_i\Big)\Big(\sum_{i=1}^{t} \frac{1}{a_i}\Big) \ge t^2.$

Now we can write, using (7.9), that

$A \ge \rho_t\rho_{t-1}\dots\rho_1\Big(1 + \frac{1}{\rho_1} + \frac{1}{\rho_1\rho_2} + \dots + \frac{1}{\rho_1\rho_2\dots\rho_{t-1}}\Big) \ge \rho_t\rho_{t-1}\dots\rho_1\,\frac{t^2}{1 + \rho_1 + \rho_1\rho_2 + \dots + \rho_1\dots\rho_{t-1}} \ge \rho_t\rho_{t-1}\dots\rho_1\,\frac{t^2}{tA},$

where we used the fact that, from (7.8), $\rho_1\rho_2\dots\rho_s \le A$ for any $s \ge 1$ (and $1 \le A$). Notice that we need to distinguish two cases here, namely $\rho_t\rho_{t-1}\dots\rho_1 > 0$ and $\rho_t\rho_{t-1}\dots\rho_1 = 0$. The above inequality works in the former case, but in both cases we obtain

(7.10)  $\rho_t\rho_{t-1}\dots\rho_1 \le \frac{A^2}{t},$

which yields that $\rho_t\rho_{t-1}\dots\rho_1$ converges to $0$. Thus in this case we get $|Y_t|_\infty \to 0$ a.s.

(2) If we only assume the weaker condition that $\gamma_t \to 0$ in probability, then iterating (7.7) we obtain

(7.11)  $|Y_t|_\infty \le \Big(\prod_{s=1}^{t} \rho_s\Big)|Y_0|_\infty + \sum_{s=0}^{t} \Big(\prod_{i=t-s+1}^{t} \rho_i\Big)|\gamma_{t-s}|_\infty,$

with the convention that $\prod_{i=t+1}^{t} \rho_i = 1$. To finish the proof, we use the following lemma with $u_t = |\gamma_t|_\infty$.
Let $(u_n)_{n \ge 1}$ be a random sequence such that

(7.12)  $u_n \to 0$ in probability as $n \to \infty$, and

(7.13)  $P\big(\sup_{n \ge 1} |u_n| < \infty\big) = 1.$

Then, under the assumption (7.2), we have the convergence $\sum_{i=1}^{t} \rho_t\rho_{t-1}\dots\rho_{t-i+1}\, u_{t-i} \to 0$ in probability as $t \to \infty$.
Proof.
For the argument, denote for simplicity of writing $\eta_{t,i} = \rho_t\rho_{t-1}\dots\rho_{t-i+1}$. The first observation is that condition (7.2) gives, for any $t \ge s$, using (7.9) and proceeding as in the proof of (7.10),

$A \ge \rho_t + \rho_t\rho_{t-1} + \dots + \rho_t\rho_{t-1}\dots\rho_{t-s+1} = \rho_t\rho_{t-1}\dots\rho_{t-s+1}\Big(1 + \frac{1}{\rho_{t-s+1}} + \frac{1}{\rho_{t-s+1}\rho_{t-s+2}} + \dots + \frac{1}{\rho_{t-s+1}\dots\rho_{t-1}}\Big)$
$\ge \rho_t\rho_{t-1}\dots\rho_{t-s+1}\,\frac{s^2}{1 + \rho_{t-s+1} + \rho_{t-s+1}\rho_{t-s+2} + \dots + \rho_{t-s+1}\dots\rho_{t-1}} \ge \rho_t\rho_{t-1}\dots\rho_{t-s+1}\,\frac{s^2}{sA}.$
Notice that we used the fact (from (7.8)) that for any $s \le t$, $\rho_t\rho_{t-1}\dots\rho_s \le A$. Therefore

(7.14)  $\rho_t\rho_{t-1}\dots\rho_{t-s+1} \le \frac{A^2}{s},$

and using this together with (7.8), for $t$ replaced by $t - s$, we obtain

(7.15)  $\rho_t\rho_{t-1}\dots\rho_{t-s+1}\,(1 + \rho_{t-s} + \rho_{t-s}\rho_{t-s-1} + \dots + \rho_{t-s}\dots\rho_1) \le \frac{A^2(1+A)}{s}.$

Now, we fix $s \le t$ and write

$\Big|\sum_{i=1}^{t} \eta_{t,i} u_{t-i}\Big| \le \sum_{i=1}^{s} \eta_{t,i}|u_{t-i}| + \sum_{i=s+1}^{t} \eta_{t,i}|u_{t-i}|.$

For a given $\epsilon$, if $|\sum_{i=1}^{t} \eta_{t,i} u_{t-i}| > \epsilon$, then at least one of the two sums above must be at least $\epsilon/2$; thus, for each fixed $\epsilon > 0$,

(7.16)  $P\Big(\Big|\sum_{i=1}^{t} \eta_{t,i} u_{t-i}\Big| > \epsilon\Big) \le P\Big(\sum_{i=1}^{s} \eta_{t,i}|u_{t-i}| \ge \epsilon/2\Big) + P\Big(\sum_{i=s+1}^{t} \eta_{t,i}|u_{t-i}| > \epsilon/2\Big).$

The next step is to use the boundedness of $u_t$. Take arbitrary $\delta, M > 0$ (here $\delta$ is meant to be small and $M$ to be large) and set

$A_M = \{|u_n| \le M \text{ for all } n \ge 1\}.$

From condition (7.13) we definitely have that $P(A_M)$ converges to $1$ as $M$ tends to infinity. Therefore we can continue (7.16) with

$P\Big(\Big|\sum_{i=1}^{t} \eta_{t,i} u_{t-i}\Big| > \epsilon\Big) \le P\Big(\sum_{i=1}^{s} \eta_{t,i}|u_{t-i}| \ge \epsilon/2\Big) + P\Big(\sum_{i=s+1}^{t} \eta_{t,i}|u_{t-i}| > \epsilon/2,\, A_M\Big) + P\Big(\sum_{i=s+1}^{t} \eta_{t,i}|u_{t-i}| > \epsilon/2,\, A_M^c\Big)$
$\le \sum_{i=1}^{s} P\big(\eta_{t,i}|u_{t-i}| \ge \epsilon/(2s)\big) + P\Big(M\sum_{i=s+1}^{t} \eta_{t,i} > \epsilon/2,\, A_M\Big) + P(A_M^c)$
$\le \sum_{i=1}^{s} P\big(\eta_{t,i}|u_{t-i}| \ge \epsilon/(2s)\big) + P\Big(\sum_{i=s+1}^{t} \eta_{t,i} > \epsilon/(2M)\Big) + P(A_M^c)$
$\le \sum_{i=1}^{s} P\big(\eta_{t,i}|u_{t-i}| \ge \epsilon/(2s)\big) + P\Big(\frac{A^2(1+A)}{s} > \epsilon/(2M)\Big) + P(A_M^c),$

where, in the passage from the first line to the second, we used the union bound (if $\sum_{i=1}^{s} \eta_{t,i}|u_{t-i}| \ge \epsilon/2$, then at least one of the terms must be $\ge \epsilon/(2s)$), and in the passage to the last line we used (7.15), since $\sum_{i=s+1}^{t} \eta_{t,i} = \eta_{t,s}(\rho_{t-s} + \rho_{t-s}\rho_{t-s-1} + \dots + \rho_{t-s}\dots\rho_1)$.

Next, we freeze $\epsilon, s, M$ and use the fact that, for each $i$, $\eta_{t,i} u_{t-i}$ converges to $0$ in probability (since $\eta_{t,i}$ is bounded by $A$), so that taking the limit as $t \to \infty$ we obtain

$0 \le \limsup_{t\to\infty} P\Big(\Big|\sum_{i=1}^{t} \eta_{t,i} u_{t-i}\Big| > \epsilon\Big) \le P\Big(\frac{A^2(1+A)}{s} > \epsilon/(2M)\Big) + P(A_M^c).$

For large $s$, obviously $P\big(\frac{A^2(1+A)}{s} > \epsilon/(2M)\big) = 0$, and thus we arrive at

$0 \le \limsup_{t\to\infty} P\Big(\Big|\sum_{i=1}^{t} \eta_{t,i} u_{t-i}\Big| > \epsilon\Big) \le P(A_M^c).$

From this, taking the limit as $M \to \infty$ and using (7.13),

$\limsup_{t\to\infty} P\Big(\Big|\sum_{i=1}^{t} \eta_{t,i} u_{t-i}\Big| > \epsilon\Big) = 0,$

which means convergence of $\sum_{i=1}^{t} \eta_{t,i} u_{t-i}$ to $0$ in probability. $\square$

Now let us return to the proof of the theorem.

(3) For the $L^p$ convergence we just need to take expectations in (7.11).

(4) For the convergence in distribution we start by writing $X_t = B_t X_{t-1} + E_t\gamma_t$, where $B_t = A_t - E_t$. The idea is that, because the $\gamma_t$ are in $L^1$, so are all the variables $X_t$.
We are going to use the Wasserstein distance to control the difference between the distributions of $X_t$ and $X_{t-1}$. The basic idea is that, in a slightly modified Wasserstein distance $D$, we have a contraction in the sense that there exists some $\rho < 1$ such that

(7.17)  $D(X_t, X_{t-1}) \le \rho D(X_{t-1}, X_{t-2}).$

For the sake of completeness, we define here, for two $n$-dimensional random variables $X, Y$, or better, for their distributions $\mu_X, \mu_Y$,

(7.18)  $D(X, Y) = \inf_{\alpha} \int |x - y|_\infty \, \alpha(dx, dy) = \inf_{\alpha} E[|\tilde{X} - \tilde{Y}|_\infty],$

where $\alpha$ is a distribution on $\mathbb{R}^n \times \mathbb{R}^n$ with marginals $\mu_X$ and $\mu_Y$, and $\tilde{X}, \tilde{Y}$ are two random variables on the same probability space (we call this a coupling) with the same distributions as $X$, respectively $Y$. The second equality follows easily by taking $\tilde{X}$ and $\tilde{Y}$ to be the projections $\pi_i : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$, given by $\pi_1(x, y) = x$ and $\pi_2(x, y) = y$. To go from the pair $(\tilde{X}, \tilde{Y})$ back to the measure $\alpha$, we just take $\alpha$ to be the distribution of the pair $(\tilde{X}, \tilde{Y})$.
The standard Wasserstein distance is defined as
\[
W(X,Y) = \inf_\alpha\int|x-y|\,\alpha(dx,dy) = \inf E[|\tilde X - \tilde Y|].
\]
Because any two norms on $\mathbb{R}^n$ are equivalent, we can find two constants $c_1, c_2 > 0$ such that
\[
c_1W(X,Y) \le D(X,Y) \le c_2W(X,Y).
\]
It is known that $W$ gives the topology of weak convergence on the space of probability measures with finite first moment (that is, $\int|x|\,\mu(dx) < \infty$). Due to the above inequality, we also infer completeness with respect to the metric $D$ on the same space $P_1(\mathbb{R}^n)$.

To carry out this program, we define for a distribution $\mu$ the map
\[
F_t(\mu) = \text{the distribution of } g_t(X, \gamma), \quad\text{with } g_t(x, \lambda) = (A_t - \mathcal{E}_t)x + \mathcal{E}_t\lambda, \quad x, \lambda\in\mathbb{R}^n,
\]
where $X$ is a random variable with distribution $\mu$ and $\gamma$ is a random variable independent of $X$ having the same distribution as the $\gamma_t$.

Now we want to estimate $D(X_t, X_{t-1})$ from above. To do this, assume that we have a coupling of $X_{t-1}$ and $X_{t-2}$ (an optimal one certainly exists, by Kantorovich's general result), take $\gamma$ independent of both $X_{t-1}$ and $X_{t-2}$, and use
\[
X_t - X_{t-1} = (A_t - \mathcal{E}_t)X_{t-1} + \mathcal{E}_t\gamma - (A_{t-1} - \mathcal{E}_{t-1})X_{t-2} - \mathcal{E}_{t-1}\gamma
= (A_t - \mathcal{E}_t)(X_{t-1} - X_{t-2}) + (A_t - A_{t-1} - \mathcal{E}_t + \mathcal{E}_{t-1})X_{t-2} + (\mathcal{E}_t - \mathcal{E}_{t-1})\gamma.
\]
Taking $|\cdot|_\infty$ and then expectations on both sides, we get the estimate
\[
(7.19)\qquad E[|X_t - X_{t-1}|_\infty] \le E[|(A_t - \mathcal{E}_t)(X_{t-1} - X_{t-2})|_\infty] + E[|(A_t - A_{t-1} - \mathcal{E}_t + \mathcal{E}_{t-1})X_{t-2}|_\infty] + E[|(\mathcal{E}_t - \mathcal{E}_{t-1})\gamma|_\infty]
\le \rho_t E[|X_{t-1} - X_{t-2}|_\infty] + \alpha_t\big(E[|X_{t-2}|_\infty] + E[|\gamma|_\infty]\big),
\]
where we denoted $\alpha_t = |A_t - A_{t-1}|_\infty + |\mathcal{E}_t - \mathcal{E}_{t-1}|_\infty$. Notice that in the time-independent case the term $\alpha_t$ is $0$, which implies that $X_t$ converges in distribution. In the general case we need to use the extra conditions from (7.4).
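In one dimension per component, the Wasserstein distance between sampled laws is computable from sorted samples, so the contraction (7.17) can be observed numerically. The following sketch (the matrices $A$ and $\mathcal{E}$ are illustrative choices, not taken from the paper) simulates $X_t = (A - \mathcal{E})X_{t-1} + \mathcal{E}\gamma_t$ with $\pm1$ noise:

```python
import random

random.seed(2)
A = [[0.8, 0.2], [0.3, 0.7]]   # illustrative row-stochastic interaction matrix
eps = 0.5                      # E = 0.5*I, so the rows of A - E sum to 0.5

N, T = 5000, 5
# N independent chains of X_t = (A - E) X_{t-1} + E gamma_t, gamma = +-1 per component
snaps = [[[0.0, 0.0] for _ in range(N)]]
for _ in range(T):
    new = []
    for x in snaps[-1]:
        g = [random.choice([-1.0, 1.0]), random.choice([-1.0, 1.0])]
        new.append([A[i][0] * x[0] + A[i][1] * x[1] - eps * x[i] + eps * g[i]
                    for i in range(2)])
    snaps.append(new)

def w1(s0, s1, c):
    # empirical 1-d Wasserstein-1 distance between the marginals of component c
    a = sorted(v[c] for v in s0)
    b = sorted(v[c] for v in s1)
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)

d = [max(w1(snaps[t], snaps[t + 1], c) for c in (0, 1)) for t in range(T)]
# d[t] tracks D(X_t, X_{t+1}) and decays roughly geometrically (rho ~ 0.5 here)
```

Each successive distance shrinks by roughly the contraction factor $\rho = |A - \mathcal{E}|_\infty$, matching (7.17).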
From the above considerations we first show that the expectation of $X_t$ obeys (keep in mind that $\sup_{t\ge1}|\mathcal{E}_t|_\infty \le A+1$)
\[
E[|X_t|_\infty] \le \rho_t E[|X_{t-1}|_\infty] + (A+1)E[|\gamma|_\infty].
\]
Using this and standard iterations combined with (7.2), we get that $\sup_t E[|X_t|_\infty] \le C < \infty$. On the other hand, from (7.19) we get that
\[
(7.20)\qquad D(X_t, X_{t-1}) \le \rho_t D(X_{t-1}, X_{t-2}) + C\alpha_t.
\]
Using this, a simple iteration leads to
\[
D(X_t, X_{t-1}) \le \rho_t\rho_{t-1}\cdots\rho_2\,D(X_1, X_0) + C(\alpha_t + \alpha_{t-1}\rho_t + \alpha_{t-2}\rho_t\rho_{t-1} + \dots + \alpha_2\rho_t\rho_{t-1}\cdots\rho_3).
\]
In particular, summing this over the range from $t$ to $t+s$ leads to
\[
D(X_t, X_{t+s}) \le \sum_{i=1}^{s}\rho_{t+i-1}\cdots\rho_2\,D(X_1, X_0) + C\sum_{k=1}^{t+s}\alpha_k\sum_{i=1}^{s}\rho_{t+i}\rho_{t+i-1}\cdots\rho_k.
\]
According to (7.15), the sum $\sum_{i=1}^{s}\rho_{t+i-1}\cdots\rho_2$ converges to $0$ as $s,t\to\infty$. We will show that the other sum also converges to $0$ as both $t,s\to\infty$. To this end, notice that from (7.4) we can set
\[
\beta_t = \sum_{i\ge t}\alpha_i \quad\text{and write}\quad \alpha_t = \beta_t - \beta_{t+1}.
\]
After rearrangement, this leads to
\[
\sum_{k=1}^{t+s}\alpha_k\sum_{i=1}^{s}\rho_{t+i}\cdots\rho_k = \beta_1\rho_{t+s}\rho_{t+s-1}\cdots\rho_1 + \beta_2\rho_{t+s}\rho_{t+s-1}\cdots\rho_2 + \dots + \beta_{t+s}.
\]
The first term converges to $0$ because of (7.14), and the rest converges to $0$ by the above Lemma, thanks to the fact that $\beta_t$ converges to $0$. This proves the convergence in distribution.

(5) Next we show that condition (7.4) is also necessary. Indeed, take the one-dimensional case
\[
X_t = X_{t-1} + \epsilon_t(\gamma_t - X_{t-1})
\]
with $|\epsilon_t - \epsilon_{t-1}| = 1/(10t)$ for $t\ge1$. In fact, we will choose
\[
\epsilon_t = \frac14 + c\sum_{k=1}^{t}w_k/k, \qquad c = \frac1{10},
\]
and we will choose the signs $w_k = \pm1$ in the following fashion. First we take $w_1, w_2, \dots, w_{\tau_1}$ all equal to $1$, with $\tau_1$ such that $\epsilon_{\tau_1} \le 3/4$ but $3/4 < \epsilon_{\tau_1} + c/(\tau_1+1)$. Notice that we can do this because the harmonic series is divergent.
Now we choose $\tau_2 > \tau_1$ such that $w_{\tau_1+1} = w_{\tau_1+2} = \dots = w_{\tau_2} = -1$ and $\epsilon_{\tau_2} - 1/(10(\tau_2+1)) < 1/4 \le \epsilon_{\tau_2}$. Next we choose $\tau_3 > \tau_2$ and $w_{\tau_2+1} = \dots = w_{\tau_3} = 1$ such that $\epsilon_{\tau_3} \le 3/4 < \epsilon_{\tau_3} + c/(\tau_3+1)$. Then we choose $\tau_4 > \tau_3$ with $w_{\tau_3+1} = w_{\tau_3+2} = \dots = w_{\tau_4} = -1$ such that $\epsilon_{\tau_4} - 1/(10(\tau_4+1)) < 1/4 \le \epsilon_{\tau_4}$, and we continue inductively. Thus we have defined a sequence $\epsilon_t$ with $1/4 \le \epsilon_t \le 3/4$ whose set of limit points is exactly the interval $[1/4, 3/4]$, and obviously condition (7.2) is fulfilled.
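The effect of such a slowly varying learning rate can be probed numerically. As a simplified stand-in for the $w_k$ construction above (the sinusoidal rate below is an illustrative assumption, not the sequence in the proof), take $\epsilon_t = 1/2 + \frac14\sin(\log t)$: it also satisfies $|\epsilon_t - \epsilon_{t-1}| = O(1/t)$ and its limit points fill $[1/4, 3/4]$. The law of $X_t$ then keeps tracking the current value of $\epsilon_t$ instead of settling down:

```python
import math
import random
import statistics

random.seed(1)

def eps(t):
    # slowly varying rate: |eps(t) - eps(t-1)| = O(1/t), limit points fill [1/4, 3/4]
    return 0.5 + 0.25 * math.sin(math.log(t))

t_hi = 5     # log 5   ~ pi/2,   so eps ~ 3/4 here
t_lo = 111   # log 111 ~ 3*pi/2, so eps ~ 1/4 here
N = 4000
snap = {t_hi: [], t_lo: []}
for _ in range(N):
    x = 0.0
    for t in range(1, t_lo + 1):
        x = (1 - eps(t)) * x + eps(t) * random.choice([-1.0, 1.0])
        if t in snap:
            snap[t].append(x)

v_hi = statistics.pvariance(snap[t_hi])  # tracks eps/(2-eps) ~ 0.60 at eps = 3/4
v_lo = statistics.pvariance(snap[t_lo])  # tracks eps/(2-eps) ~ 0.14 at eps = 1/4
```

Along subsequences where $\epsilon_t \approx 3/4$ the variance stays near $0.6$, while where $\epsilon_t \approx 1/4$ it drops near $0.14$: the law keeps oscillating between distinct limits, so $X_t$ cannot converge in distribution.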
With this choice of the sequence $\epsilon_t$, we claim that the sequence $X_t$ does not converge in distribution. Indeed, the argument is based on the simple observation that if it did, then taking characteristic functions $\phi_{X_t}$ we would get
\[
\phi_{X_t}(\xi) = \phi_{X_{t-1}}((1-\epsilon_t)\xi)\,\phi_\gamma(\epsilon_t\xi).
\]
As a reminder, $\phi_X(\xi) = E[e^{i\xi X}]$ for any $\xi\in\mathbb{R}$. In particular, this means that if $X_t$ converges to some random variable $Y$, then taking a subsequence $t_n$ for which $\epsilon_{t_n}\to x$, we obtain
\[
(7.21)\qquad \phi_Y(\xi) = \phi_Y((1-x)\xi)\,\phi_\gamma(x\xi) \quad\text{for any } x\in[1/4, 3/4].
\]
Under the assumption that $\gamma$ is integrable, we claim that $\gamma$ must be constant, and $Y$ is then the same constant. To carry this out, we apply (7.21) for $x = 1/4$ and $x = 3/4$ and divide, obtaining
\[
\frac{\phi_\gamma(3\xi/4)}{\phi_\gamma(\xi/4)} = \frac{\phi_Y(3\xi/4)}{\phi_Y(\xi/4)}.
\]
Replacing $\xi$ by $4\xi/3$, we arrive at
\[
\frac{\phi_\gamma(\xi)}{\phi_\gamma(\xi/3)} = \frac{\phi_Y(\xi)}{\phi_Y(\xi/3)}.
\]
Replacing here $\xi$ by $\xi/3, \xi/3^2, \dots, \xi/3^{n-1}$ and multiplying the resulting identities, we get
\[
\frac{\phi_\gamma(\xi)}{\phi_\gamma(\xi/3^n)} = \frac{\phi_Y(\xi)}{\phi_Y(\xi/3^n)}.
\]
Now, letting $n\to\infty$ and using the fact that for any random variable $Z$, $\phi_Z(\xi/3^n)\to1$, we obtain
\[
\phi_\gamma(\xi) = \phi_Y(\xi);
\]
in other words, $Y$ has the same distribution as $\gamma$. Using this in (7.21) with $x = 1/2$, we arrive at
\[
\phi_Y(\xi) = \phi_Y(\xi/2)^2.
\]
Iterating this, we get $\phi_Y(\xi) = \phi_Y(\xi/2^n)^{2^n}$, which can be written alternatively as
\[
(7.22)\qquad \phi_Y(\xi) = \phi_{\frac{Y_1+Y_2+\dots+Y_m}{m}}(\xi), \qquad m = 2^n,
\]
where $Y_1, Y_2, \dots$ are iid with the same distribution as $Y$. Since $Y$ and $\gamma$ have the same distribution and $\gamma$ is integrable, it follows that $Y$ is also integrable. By the law of large numbers, $\frac{Y_1+Y_2+\dots+Y_m}{m}$ converges almost surely to $E[Y] = E[\gamma]$. Since almost sure convergence implies convergence in distribution, we get
\[
(7.23)\qquad \phi_Y(\xi) = \phi_{E[Y]}(\xi);
\]
in other words, $Y$ must be constant. This implies that $\gamma$ is also constant, which finishes the argument. $\Box$

FIGURE 2. When the noise is Gaussian, $X_t$ converges to a normal distribution; this is the picture on the left. The joint plot illustrates the case of two agents who learn from each other with $A$ and $\mathcal{E}$ fixed; the variables x1 and x2 represent agents 1 and 2. The right-hand picture shows the convergence result for $\gamma_t$ taking only the values $\pm1$ with equal probability, independently in each component.

Remark 8.
We point out that integrability is key for the conclusion of the last part of Theorem 6. If we drop the integrability condition, the passage from (7.22) to (7.23) is no longer possible. In fact, if we take $(\gamma_t)_{t\ge1}$ to be iid Cauchy$(0,1)$ and $X_0 = 0$, then $X_t$ also follows a Cauchy distribution for any choice of $0\le\epsilon_t\le1$ with $\epsilon_t > 0$. Certainly, in this case we do not need any other assumptions on $\epsilon_t$ or $\rho_t$ to get convergence. We leave as an open problem the optimal conditions under which the model (7.3) converges as $t\to\infty$.

Simulations for convergence to distribution.
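A minimal script in the spirit of these simulations (two agents; the matrices and $\bar\sigma$ below are illustrative choices, not the exact parameters behind Figure 2):

```python
import random

random.seed(0)
A = [[0.8, 0.2], [0.3, 0.7]]   # illustrative row-stochastic interaction matrix
eps = [0.5, 0.5]               # learning rates, E = diag(eps)
sigma_bar = 1.0                # assumed true value
N, T = 5000, 60                # number of sample paths, time horizon

def run(noise):
    # one path of X_t = A X_{t-1} + E (sigma_bar + gamma_t - X_{t-1}); returns X_T
    x = [0.0, 0.0]
    for _ in range(T):
        g = noise()
        x = [sum(A[i][j] * x[j] for j in range(2))
             + eps[i] * (sigma_bar + g[i] - x[i]) for i in range(2)]
    return x

def excess_kurtosis(v):
    m = sum(v) / len(v)
    m2 = sum((u - m) ** 2 for u in v) / len(v)
    m4 = sum((u - m) ** 4 for u in v) / len(v)
    return m4 / m2 ** 2 - 3.0

gauss = [run(lambda: [random.gauss(0, 1) for _ in range(2)])[0] for _ in range(N)]
rade = [run(lambda: [random.choice([-1.0, 1.0]) for _ in range(2)])[0] for _ in range(N)]
# both limits are centered at sigma_bar with no rescaling of X_t; the Gaussian-noise
# limit is Gaussian, while the +-1 noise gives a bounded, visibly non-Gaussian law
```

The $\pm1$ case comes out strongly platykurtic (bounded support), matching the distinctly non-Gaussian right panel of Figure 2.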
Let us illustrate Theorem 6 and result 7.5. Suppose that the noise $\gamma_t$ is a normal random variable. Numerical simulations show that each component of $X_t$ converges to a Gaussian random variable (Figure 2). The asymptotic distribution is Gaussian, centered around the true value $\bar\sigma$. The main point is that we do not need to scale $X_t$. If instead the iid $\gamma_t$'s are vectors with entries $+1$ or $-1$, then $X_t$ still converges in distribution, but in Figure 2 the simulated distribution looks distinctly non-Gaussian. For other noises, non-standard distributions can occur.

8. NONLINEAR LEARNING
While DeGroot updating is retained in this section, we develop nonlinear models of learning: instead of the matrix $\mathcal{E}$, the feedback enters through a nonlinear function.
Definition 9.
The learning function $f_t:\mathbb{R}^n\to\mathbb{R}^n$ is continuous on some compact convex subset $K\subseteq\mathbb{R}^n$ and differentiable on its interior, with $f_t(0)=0$. Componentwise,
\[
f_t\begin{pmatrix} x_1\\ \vdots\\ x_n\end{pmatrix} = \begin{pmatrix} f_{t,1}(x_1)\\ \vdots\\ f_{t,n}(x_n)\end{pmatrix}.
\]
Notice that the update, or feedback, now varies with time. Learning, or feedback, stops when $\bar\sigma - X_t = 0$, and the condition $f_t(0)=0$ ensures this. The updating rule for agent $i$ becomes
\[
x^i_{t+1} = \sum_{j=1}^{n}(a_{ij})_t\, x^j_t + f_{t,i}(\bar\sigma - x^i_t).
\]
Moreover, the weight matrix $A$ is also time varying. Previous sections established convergence results for the linear updating $f_{t,i}(x) = \epsilon_i x$ with a fixed scalar $\epsilon_i$. Actual feedback can be quite complex, and allowing a nonlinear feedback or learning rule extends the linear model.

Theorem 10.
For all $i\in\{1,\dots,n\}$ and all $t\ge1$, suppose the learning function satisfies
\[
(8.1)\qquad 0 < \inf_\xi f'_{t,i}(\xi) \le \sup_\xi f'_{t,i}(\xi) < 2(a_{ii})_t,
\]
and, denoting $\rho_t = \sup_i\sup_\xi\big(|(a_{ii})_t - f'_{t,i}(\xi)| + 1 - (a_{ii})_t\big)$, assume that
\[
(8.2)\qquad \sup_{t\ge1}\big(\rho_t + \rho_t\rho_{t-1} + \dots + \rho_t\rho_{t-1}\cdots\rho_1\big) < \infty.
\]
(1) With the dynamics $X_t = A_tX_{t-1} + f_t(\bar\sigma - X_{t-1})$, consensus is reached and $\lim_{t\to\infty}X_t = \bar\sigma$.
(2) If the evolution is given by $X_t = A_tX_{t-1} + f_t(\bar\sigma + \gamma_t - X_{t-1})$, then, under the same assumption (8.1), $\gamma_t\to0$ as $t\to\infty$ yields $X_t\to\bar\sigma$ as $t\to\infty$. (If the noise converges to zero a.s., in probability, or in $L^p$, then $X_t$ converges accordingly.)
(3) Again assume (8.2) and
\[
(8.3)\qquad X_t = A_tX_{t-1} + f_t(\gamma_t - X_{t-1}),
\]
where the sequence $(\gamma_t)_{t\ge1}$ is assumed to be iid and integrable. If in addition we have
\[
(8.4)\qquad \sum_{t\ge2}\Big(|A_t - A_{t-1}|_\infty + \max_i\sup_{\xi\in\mathbb{R}}|f'_{t,i}(\xi) - f'_{t-1,i}(\xi)|\Big) < \infty,
\]
then $X_t$ converges in distribution as $t\to\infty$.

Notice that the last part of the result does not involve $\bar\sigma$, because it is hidden in the sequence $\gamma_t$. As opposed to the other two cases, the convergence is only in distribution; the limit is implicitly defined and is not a constant random variable as in the previous cases.

Proof. (1) First we subtract $\bar\sigma$ from both sides of the dynamics equation. As $A_t$ is stochastic, $A_t\bar\sigma = \bar\sigma$, hence
\[
X_{t+1} - \bar\sigma = A_t(X_t - \bar\sigma) + f_t(\bar\sigma - X_t).
\]
Second, we recast the equation using the sup norm $|X_{t+1}-\bar\sigma|_\infty = \sup_i|(X_{t+1}-\bar\sigma)_i|$.
For individual $i$, the updating rule becomes
\[
(X_{t+1}-\bar\sigma)_i = \sum_{j=1}^{n}(a_{ij})_t(X_t-\bar\sigma)_j + f_{t,i}(\bar\sigma - (X_t)_i)
= \Big((a_{ii})_t - \frac{f_{t,i}(\bar\sigma-(X_t)_i)}{\bar\sigma-(X_t)_i}\Big)(X_t-\bar\sigma)_i + \sum_{j\ne i}(a_{ij})_t(X_t-\bar\sigma)_j,
\]
hence
\[
|(X_{t+1}-\bar\sigma)_i| \le |(a_{ii})_t - f'_{t,i}(\xi_i)|\,|(X_t-\bar\sigma)_i| + \sum_{j\ne i}(a_{ij})_t|(X_t-\bar\sigma)_j|
\le \big(|(a_{ii})_t - f'_{t,i}(\xi_i)| + 1-(a_{ii})_t\big)|X_t-\bar\sigma|_\infty
\le \sup_i\sup_\xi\big(|(a_{ii})_t - f'_{t,i}(\xi)| + 1-(a_{ii})_t\big)|X_t-\bar\sigma|_\infty.
\]
The second equality follows because the learning function is continuous and differentiable, so by the mean value theorem
\[
f_{t,i}(x) - f_{t,i}(0) = x\, f'_{t,i}(\xi_i) \implies \frac{f_{t,i}(x)}{x} = f'_{t,i}(\xi_i)
\]
for some $\xi_i\in(0,x)$.

By assumption, $0 < \inf f'_{t,i} \le \sup f'_{t,i} < 2(a_{ii})_t$, which is equivalent to the existence of some $0 < \delta_i < 1$ such that for all $\xi\in\mathbb{R}$
\[
(8.5)\qquad \delta_i < f'_{t,i}(\xi) < 2(a_{ii})_t - \delta_i.
\]
The above condition gives us two cases to consider. In the first case (suppressing the dependence on $t$), for all $i\in\{1,\dots,n\}$ and all $\xi$,
\[
a_{ii} > f'_i(\xi) \qquad\text{(case 1)},
\]
in which case $|a_{ii} - f'_i(\xi)| + 1 - a_{ii} = 1 - f'_{t,i}(\xi) < 1-\delta_i$. In the second case,
\[
a_{ii} \le f'_i(\xi) \qquad\text{(case 2)},
\]
in which case $|a_{ii} - f'_i(\xi)| + 1 - a_{ii} = 1 + f'_{t,i}(\xi) - 2a_{ii} < 1-\delta_i$.
Thus we obtain that
\[
\sup_i\sup_\xi\big(|(a_{ii})_t - f'_{t,i}(\xi)| + 1-(a_{ii})_t\big) \le 1 - \min_i\delta_i < 1,
\]
so we have a contraction in $|X_t-\bar\sigma|_\infty$ and, consequently, $\lim_{t\to\infty}X_t = \bar\sigma$.

(2) The deviation equation from consensus is
\[
X_{t+1}-\bar\sigma = A_t(X_t-\bar\sigma) + f_t(\bar\sigma+\gamma_t-X_t).
\]
Essentially the same steps as in the noiseless proof give
\[
(X_{t+1}-\bar\sigma)_i = \sum_{j=1}^{n}(a_{ij})_t(X_t-\bar\sigma)_j + f_{t,i}(\bar\sigma+\gamma_t-(X_t)_i)
= \big((a_{ii})_t - f'_{t,i}(\xi_i)\big)(X_t-\bar\sigma)_i + \sum_{j\ne i}(a_{ij})_t(X_t-\bar\sigma)_j + (\gamma_t)_i\, f'_{t,i}(\xi_i),
\]
hence
\[
|(X_{t+1}-\bar\sigma)_i| \le |(a_{ii})_t - f'_{t,i}(\xi_i)|\,|(X_t-\bar\sigma)_i| + (1-(a_{ii})_t)|X_t-\bar\sigma|_\infty + |\gamma_t|_\infty\, f'_{t,i}(\xi_i)
\le \sup_i\sup_\xi\big(|(a_{ii})_t - f'_{t,i}(\xi)| + 1-(a_{ii})_t\big)|X_t-\bar\sigma|_\infty + C|\gamma_t|_\infty.
\]
The rest of the proof follows as in the proof of Theorem 6, more precisely by the same argument starting with (7.7). In all instances the convergence follows the same arguments as in the linear case.

(3) First observe that from (8.3) we get
\[
E[|X_t|_\infty] \le \rho_t E[|X_{t-1}|_\infty] + 2E[|\gamma|_\infty].
\]
From this, iterating and using (8.2) as in the linear case, we obtain that $\sup_{t\ge1}E[|X_t|_\infty] = C < \infty$.
To treat the case where the $\gamma_t$ are all iid, we follow the same argument as in the linear case, using the distance defined in (7.18). For the estimate of $D(X_t, X_{t-1})$, given any coupling $(\tilde X_{t-1}, \tilde X_{t-2})$, we consider the coupling $A_t\tilde X_{t-1} + f_t(\gamma-\tilde X_{t-1})$ and $A_{t-1}\tilde X_{t-2} + f_{t-1}(\gamma-\tilde X_{t-2})$. Then
\[
D(X_t, X_{t-1}) \le E\big[|A_t\tilde X_{t-1} + f_t(\gamma-\tilde X_{t-1}) - A_{t-1}\tilde X_{t-2} - f_{t-1}(\gamma-\tilde X_{t-2})|_\infty\big]
\]
\[
\le E\big[|A_t\tilde X_{t-1} + f_t(\gamma-\tilde X_{t-1}) - \big(A_t\tilde X_{t-2} + f_t(\gamma-\tilde X_{t-2})\big)|_\infty\big]
+ E\big[|A_t\tilde X_{t-2} + f_t(\gamma-\tilde X_{t-2}) - \big(A_{t-1}\tilde X_{t-2} + f_{t-1}(\gamma-\tilde X_{t-2})\big)|_\infty\big]
\]
\[
\le \rho_t E[|\tilde X_{t-1}-\tilde X_{t-2}|_\infty] + \Big(|A_t-A_{t-1}|_\infty + \max_i\sup_{\xi\in\mathbb{R}}|f'_{t,i}(\xi)-f'_{t-1,i}(\xi)|\Big)E\big[|X_{t-2}|_\infty + |\gamma|_\infty\big]
\]
\[
\le \rho_t D(X_{t-1}, X_{t-2}) + C\Big(|A_t-A_{t-1}|_\infty + \max_i\sup_{\xi\in\mathbb{R}}|f'_{t,i}(\xi)-f'_{t-1,i}(\xi)|\Big).
\]
From this we proceed exactly as in the linear case, more precisely as in the proof following (7.20), to show that $X_t$ is Cauchy in the metric $D$. $\Box$

Remark 11.
The matrix $A_t$ and the learning function $f_t$ are allowed to be time dependent, or slowly varying; they could even be random, in a controlled way. Were $A$ and $f$ fixed in time, the above result would still hold, so the constant case is a special case of what we have shown.

Continuity of the learning function $f_t$ is essential. We give an example of a situation where it breaks down.

Example 12.
Consider the sign function
\[
\mathrm{sign}(x) = \begin{cases} -1 & x < 0,\\ 0 & x = 0,\\ 1 & x > 0. \end{cases}
\]
If the learning $f$ were the sign function, the dynamics would be
\[
X_t = A_tX_{t-1} + \mathcal{E}\,\mathrm{sign}(\bar\sigma - X_{t-1}).
\]
Consensus in this case would not be achieved. One can see this plainly in the one-dimensional case $A_t = 1$, $\bar\sigma = 1$, $Y_t = X_t - \bar\sigma$, $Y_0 = 1$, with $1/3 < \mathcal{E} < 1/2$. With this setup we get
\[
Y_1 = 1-\mathcal{E},\quad Y_2 = 1-2\mathcal{E},\quad Y_3 = 1-3\mathcal{E},\quad Y_4 = 1-2\mathcal{E},\quad Y_5 = 1-3\mathcal{E},\ \dots,
\]
which shows that $Y_t$ becomes periodic, thus not convergent. This behaviour extends to more general situations, of course, where the same periodic pattern appears.
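The one-dimensional computation in Example 12 can be replayed directly (take $\mathcal{E} = 0.4 \in (1/3, 1/2)$):

```python
E = 0.4  # learning rate in (1/3, 1/2)

def sign(x):
    return (x > 0) - (x < 0)

y = 1.0        # Y_0 = X_0 - sigma_bar
ys = [y]
for _ in range(12):
    y = y + E * sign(-y)   # Y_t = Y_{t-1} + E sign(-Y_{t-1})
    ys.append(y)
# ys begins 1, 0.6, 0.2, -0.2, 0.2, -0.2, ...: after two steps the iterates
# bounce forever between 1 - 2E and 1 - 3E, so there is no consensus
```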
9. AVERAGE DYNAMICS
The previous model with learning assumes that $\bar\sigma$ is already known. When a true $\sigma$ is not assumed a priori to exist, one possibility is to replace $\bar\sigma$ by some average over all the players, and the model becomes
\[
(9.1)\qquad X_t = A_tX_{t-1} + \mathcal{E}_t(\overline X_{t-1} - X_{t-1}),
\]
where $\overline X_{t-1}$ is the vector whose entries all equal the average of the players' beliefs. On a pure information level this seems more satisfactory: there is no outside knowledge, and all information is contained entirely within the interactions. For a large number of players, this makes perfect sense. However, in this case the main issue is to show that the model converges. To do this we interpret $\overline U$, for any vector $U$, as
\[
\overline U = \begin{pmatrix} \frac{U_1+\dots+U_n}{n}\\ \vdots\\ \frac{U_1+\dots+U_n}{n}\end{pmatrix}
= \underbrace{\begin{pmatrix} \frac1n & \cdots & \frac1n\\ \vdots & & \vdots\\ \frac1n & \cdots & \frac1n\end{pmatrix}}_{\Delta}
\begin{pmatrix} U_1\\ \vdots\\ U_n\end{pmatrix}.
\]
The system now becomes
\[
(9.2)\qquad X_t = (A_t + \mathcal{E}_t\Delta - \mathcal{E}_t)X_{t-1}.
\]
We want to show that $X_t$ converges to a vector which is a multiple of $\mathbf{1} = (1,\dots,1)^\top$.

Fixed $A$ and $\mathcal{E}$. In this section we give an algebraic approach to the case of $A$ and $\mathcal{E}$ constant in time.

Theorem 13. If $A$ is an $n\times n$ stochastic matrix, $\mathcal{E}$ is such that $0 < \epsilon_i < \frac{n}{n-1}a_{ii}$ for every $i$, and $X_t = (A + \mathcal{E}\Delta - \mathcal{E})X_{t-1}$, then $X_t \to \lambda\mathbf{1}$ as $t\to\infty$, for some scalar $\lambda$. The convergence is also exponentially fast.

Proof. Note that $\Delta X_{t-1} = \overline X_{t-1}$. Since
\[
\mathcal{E}\Delta - \mathcal{E} = \begin{pmatrix} -\frac{\epsilon_1(n-1)}{n} & \frac{\epsilon_1}{n} & \cdots & \frac{\epsilon_1}{n}\\ \frac{\epsilon_2}{n} & -\frac{\epsilon_2(n-1)}{n} & \cdots & \frac{\epsilon_2}{n}\\ \vdots & & \ddots & \end{pmatrix},
\]
it follows that
\[
A + \mathcal{E}\Delta - \mathcal{E} = \begin{pmatrix} a_{11}-\frac{\epsilon_1(n-1)}{n} & a_{12}+\frac{\epsilon_1}{n} & \cdots & a_{1n}+\frac{\epsilon_1}{n}\\ a_{21}+\frac{\epsilon_2}{n} & a_{22}-\frac{\epsilon_2(n-1)}{n} & \cdots & a_{2n}+\frac{\epsilon_2}{n}\\ \vdots & \vdots & & \end{pmatrix}.
\]
For the time-homogeneous case we require $B = A + \mathcal{E}\Delta - \mathcal{E}$ to be a stochastic matrix; it suffices to enforce the condition $0 < \epsilon_i < \frac{n}{n-1}a_{ii}$. In this case $B$ is a stochastic matrix and we are requiring $b_{ii} > 0$, or positive self-belief. The system now becomes $X_t = BX_{t-1}$. To show that $X_t$ converges, we can put $B$ in Jordan form $B = J^{-1}DJ$, where $D$ is a block Jordan matrix.
If we let $Y_t = JX_t$, we obtain the equivalent system $Y_t = DY_{t-1}$. From the Perron-Frobenius theorem (notice here that $B$ indeed has positive entries since $\epsilon_i > 0$) we know that $1$ is a simple eigenvalue and all other eigenvalues are less than $1$ in absolute value. Thus
\[
(9.3)\qquad D = \begin{pmatrix} 1 & & \\ & J_1 & \\ & & \ddots \end{pmatrix},
\]
where $J_i$ corresponds to a Jordan block
\[
\begin{pmatrix} \lambda_i & 1 & \\ & \lambda_i & \ddots\\ & & \lambda_i \end{pmatrix}.
\]
In fact, from the above representation, $(Y_t)_1 = (Y_{t-1})_1$, which means that the first entry of $Y_t$ does not change. The rest of the analysis reduces to systems of the form $Y_t = (\lambda I + U)Y_{t-1}$ with $|\lambda| < 1$, where $U$ is the matrix with $1$'s on the superdiagonal and $0$'s elsewhere. For such a system we use the concrete expression $Y_t = (\lambda I + U)^tY_0$ together with
\[
(\lambda I + U)^t = \sum_{k=0}^{t}\binom{t}{k}\lambda^{t-k}U^k.
\]
On the other hand, it is not difficult to see that $U^n = 0$, thus in the above sum only the terms with $k < n$ survive; in other words,
\[
(\lambda I + U)^t = \sum_{k=0}^{n-1}\binom{t}{k}\lambda^{t-k}U^k \xrightarrow[t\to\infty]{} 0,
\]
because $|\lambda| < 1$ while each $\binom{t}{k}$ grows only polynomially in $t$. The conclusion is that for $0 < \epsilon_i < \frac{n}{n-1}a_{ii}$, $X_t$ converges to a vector $v$. From the equation $X_t = BX_{t-1}$ we get $v = Bv$, which by Perron-Frobenius shows that $v = \lambda\mathbf{1}$. $\Box$

Time varying $A(t)$ and $\mathcal{E}(t)$. Time varying here allows for some randomness, in that the entries can change over time, but not drastically enough to alter the structure of the matrix. In later sections we will consider matrices which are allowed to be truly random, with fewer restrictions on the entries. In the previous section $A$ and $\mathcal{E}$ were constant; we now look at the behaviour of time-varying stochastic matrices and learning rates.
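Before moving to the time-varying case, Theorem 13 is easy to verify numerically; the stochastic matrix $A$ and rates $\epsilon_i$ below are an arbitrary admissible choice:

```python
n = 3
A = [[0.6, 0.2, 0.2], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]  # stochastic, positive diagonal
eps = [0.5, 0.5, 0.5]   # each eps_i < n/(n-1) * a_ii = 1.5 * a_ii

# B = A + E*Delta - E, where Delta is the matrix with all entries 1/n
B = [[A[i][j] + eps[i] * (1.0 / n - (1.0 if i == j else 0.0)) for j in range(n)]
     for i in range(n)]
assert all(abs(sum(row) - 1.0) < 1e-12 for row in B)  # B is stochastic
assert all(b > 0 for row in B for b in row)           # with positive entries

x = [5.0, -1.0, 2.0]
for _ in range(200):
    x = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
spread = max(x) - min(x)   # consensus: X_t -> lambda * (1, ..., 1)
```

After a couple of hundred iterations the components agree to machine precision, and the decay of the spread is geometric, as the Jordan-form argument predicts.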
We still have the general model (9.1), which reduces to $X_t = (A_t + \mathcal{E}_t\Delta - \mathcal{E}_t)X_{t-1}$, or, setting
\[
(9.4)\qquad B_t = A_t + \mathcal{E}_t\Delta - \mathcal{E}_t,
\]
the system $X_t = B_tX_{t-1}$. For the general case of time-varying matrices we need a different idea, since the Jordan form of $B_t$ produces a matrix $J_t$ which depends on $t$: the argument used above fails because each $t$ has a different corresponding $J_t$. Another way is to look at how much the components of $X_t$ differ from each other. A quantifier for this is the oscillation, which we introduce now.

Definition 14.
Define, for a vector $X$,
\[
\mathrm{osc}(X) = \max_{i,j}|X_i - X_j|,
\]
and, for an $n\times n$ matrix $A$, the Dobrushin coefficient
\[
\delta(A) = \frac12\max_{i,j=1,\dots,n}\sum_{k=1}^{n}|a_{ik} - a_{jk}|.
\]
We record next a Lemma which is classical; we include the proof for the reader's convenience.
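Both quantities are elementary to compute, and the classical contraction property recorded in the Lemma below can be sanity-checked on random stochastic matrices (a numerical check, not part of the proof):

```python
import random

random.seed(3)

def dobrushin(A):
    n = len(A)
    return 0.5 * max(sum(abs(A[i][k] - A[j][k]) for k in range(n))
                     for i in range(n) for j in range(n))

def osc(x):
    return max(x) - min(x)

for _ in range(200):
    n = random.randint(2, 5)
    A = []
    for _ in range(n):               # random stochastic matrix: normalized rows
        row = [random.random() for _ in range(n)]
        s = sum(row)
        A.append([r / s for r in row])
    y = [random.uniform(-10, 10) for _ in range(n)]
    x = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
    assert osc(x) <= dobrushin(A) * osc(y) + 1e-9   # oscillation inequality
    assert dobrushin(A) <= 1.0 + 1e-12              # stochastic => delta(A) <= 1
checked = True
```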
Lemma 15. If $A$ is a matrix whose row sums are all equal and $X = AY$, then
\[
(9.5)\qquad \mathrm{osc}(X) \le \delta(A)\,\mathrm{osc}(Y).
\]
In particular, if $A$ is a stochastic matrix, then $\mathrm{osc}(X) \le \mathrm{osc}(Y)$.

Proof.
We can write $Y_i = (Y_{\min}+Y_{\max})/2 + Z_i$ with $Z_i \in [-\mathrm{osc}(Y)/2,\ \mathrm{osc}(Y)/2]$. Because the row sums are all equal, the constant part cancels in differences, and we actually have
\[
X_i - X_j = \sum_{k=1}^{n}(a_{ik}-a_{jk})Z_k.
\]
Taking absolute values and using the fact that $|Z_k| \le \mathrm{osc}(Y)/2$ gives the result. $\Box$

Before we state the next result, we define $\mathbf{1} = (1,\dots,1)^\top$ and
\[
\Delta = \frac1n\mathbf{1}\mathbf{1}^\top,
\]
the $n\times n$ matrix with all entries equal to $1/n$.

Theorem 16.
Assume we have the model with deterministic stochastic matrices $A_t$, random variables $\gamma_t$, and
\[
X_t = A_tX_{t-1} + \mathcal{E}_t(\overline X_{t-1} - X_{t-1} + \gamma_t),
\]
with
\[
(9.6)\qquad 0 \le (\epsilon_i)_t \le \frac{n}{n-1}(a_{ii})_t \quad \forall i = 1,\dots,n,\ \forall t\ge1.
\]
(1) Let $B_t = A_t + \mathcal{E}_t(\Delta - I)$ and assume that $\rho_t = \delta(B_t) > 0$ for each $t\ge1$ and
\[
(9.7)\qquad \prod_{s=1}^{t}\rho_s \xrightarrow[t\to\infty]{} 0.
\]
If $\gamma_t = 0$ for all $t\ge1$, then
\[
(9.8)\qquad \prod_{s=1}^{t}B_s \xrightarrow[t\to\infty]{} C,
\]
where, for some $\nu_1,\dots,\nu_n \ge 0$ with $\sum_{s=1}^{n}\nu_s = 1$,
\[
(9.9)\qquad C = \begin{pmatrix} \nu_1 & \nu_2 & \cdots & \nu_n\\ \nu_1 & \nu_2 & \cdots & \nu_n\\ \vdots & \vdots & & \vdots\\ \nu_1 & \nu_2 & \cdots & \nu_n \end{pmatrix},
\]
and, for some scalar $\lambda$,
\[
(9.10)\qquad X_t \xrightarrow[t\to\infty]{a.s.} \lambda\mathbf{1}.
\]
(2) Assume now that we have
\[
(9.11)\qquad \sup_{t\ge1}\{\rho_t + \rho_t\rho_{t-1} + \rho_t\rho_{t-1}\rho_{t-2} + \dots + \rho_t\rho_{t-1}\cdots\rho_1\} < \infty.
\]
(a) If $\sum_{t\ge1}|\gamma_t|_\infty < \infty$ almost surely, then $X_t \xrightarrow[t\to\infty]{a.s.} \lambda\mathbf{1}$ for some random variable $\lambda$.
(b) If $\sum_{t\ge1}|\gamma_t|_\infty$ converges in $L^p$, then $X_t \xrightarrow[t\to\infty]{L^p} \lambda\mathbf{1}$ for some random variable $\lambda\in L^p$.
(3) Assume that (9.11) holds true, that
\[
(9.12)\qquad \mathcal{E}_t \xrightarrow[t\to\infty]{} \mathcal{E},
\]
and that $(\gamma_t)_{t\ge1}$ is a sequence of iid random variables with mean $\mu\in\mathbb{R}^n$ and covariance matrix $\Sigma$. Then
\[
(9.13)\qquad \frac{X_t - E[X_t]}{\sqrt t} \Longrightarrow N(0,\ C\mathcal{E}\Sigma\mathcal{E}^\top C^\top)
\]
in the sense of convergence in distribution.

Remark 17. (1)
Notice here that because of our assumption (9.6) we get
\[
0 < \rho_t = \delta(A_t + \mathcal{E}_t(\Delta - I)) < 1,
\]
and in particular we obtain another proof for the time-independent model as well.
(2) For any matrix $B$ whose row sums equal $1$, which is the case of $B = A_t + \mathcal{E}_t(\Delta - I)$, using the fact that $a\wedge b = \frac{a+b-|a-b|}{2}$, we have another expression for the Dobrushin coefficient:
\[
\delta(B) = 1 - \min_{i,j}\sum_{k=1}^{n}b_{ik}\wedge b_{jk}.
\]
For instance, if we define $\omega(B) = \max_{i,j}(1-b_{ij})$ and $B$ is a stochastic matrix with all positive entries, then $\delta(B) \le \omega(B) < 1$. This is our situation, generated by the condition (9.6).
(3) Convergence in probability of the series $\sum_{t\ge1}|\gamma_t|_\infty$ is the same as its almost sure convergence, which is the reason this case is left out.
(4) As opposed to the case of Theorem 6, this time, to guarantee consensus in the long run for $X_t$, we need the hypothesis of convergence of $\sum_{t\ge1}|\gamma_t|_\infty$; mere convergence of $\gamma_t$ to $0$ is not enough. For instance, if $\gamma_t = (1/t)e_1$ and $X_0 = 0$, with $A_t$ and $\mathcal{E}_t = \mathcal{E}$ independent of time $t$, then $X_t \approx c\big(\sum_{i=1}^{t}1/i\big)\mathbf{1}$, which is clearly not convergent.
(5) Contrary to the case of Theorem 6, if we take the $\gamma_t$ to be iid, we do not get convergence in distribution of $X_t$ itself. For instance, if $X_0 = 0$ and $A_t$, $\mathcal{E}_t$ are constant, with the diagonal of $\mathcal{E}$ also constant, then for $\gamma_t = u_te_1$ we get $X_t \approx c\big(\sum_{i=1}^{t}u_i\big)\mathbf{1}$, which is not convergent unless $u_t = 0$ a.s. However, this is complemented by the last part of the Theorem, which shows that, properly scaled, $X_t$ converges to a multidimensional normal random variable.
(6) Notice that the covariance matrix of the normal limit (9.13) is actually rank one, because the matrix $C$ is rank one. Thus the normal random variable is supported on a line.

Proof.
We will in fact show first (9.13); then (9.9) follows by taking as initial vectors the coordinate vectors of $\mathbb{R}^n$. We treat the general situation: with the notation $B_t = A_t + \mathcal{E}_t(\Delta - I)$,
\[
X_t = B_tX_{t-1} + \tilde\gamma_t, \quad\text{where } \tilde\gamma_t = \mathcal{E}_t\gamma_t,
\]
from which we get in the first place that
\[
(9.14)\qquad \mathrm{osc}(X_t) \le \delta(B_t)\,\mathrm{osc}(X_{t-1}) + \mathrm{osc}(\tilde\gamma_t) = \rho_t\,\mathrm{osc}(X_{t-1}) + \mathrm{osc}(\tilde\gamma_t),
\]
and, by the same argument as in Theorem 6, under (9.7) we obtain that $\mathrm{osc}(X_t)$ converges to $0$ whenever $\sum_{t\ge1}|\gamma_t|_\infty$ converges (in both the a.s. and the $L^p$ sense). On the other hand, since $B_t$ is a stochastic matrix, we also have that
\[
(9.15)\qquad (X_{t-1})_{\min} + (\tilde\gamma_t)_{\min} \le (X_t)_{\min} \le (X_t)_{\max} \le (X_{t-1})_{\max} + (\tilde\gamma_t)_{\max}.
\]
(1) From this, if we take $\gamma_t = 0$, then (9.15) together with (9.14) gives that $X_t$ converges to a multiple of the vector $\mathbf{1}$. Thus, taking for $X_0$ the basis vectors of $\mathbb{R}^n$ and using the fact that $X_t = \prod_{s=1}^{t}B_s\,X_0$, we obtain that the matrix $\prod_{s=1}^{t}B_s$ converges to a matrix $C$ which has repeated rows.
(2) We need here the following rewriting of (9.15):
\[
(9.16)\qquad (X_t)_{\max} - \sum_{s=1}^{t}(\tilde\gamma_s)_{\max} \le (X_{t-1})_{\max} - \sum_{s=1}^{t-1}(\tilde\gamma_s)_{\max} \quad\text{and}\quad (X_{t-1})_{\min} - \sum_{s=1}^{t-1}(\tilde\gamma_s)_{\min} \le (X_t)_{\min} - \sum_{s=1}^{t}(\tilde\gamma_s)_{\min}.
\]
(a) If $\sum_{t\ge1}|\gamma_t|_\infty < \infty$ almost surely, then from (9.16) the sequence $(X_t)_{\min} - \sum_{s=1}^{t}(\tilde\gamma_s)_{\min}$ is convergent (monotone and bounded); in a similar way, $(X_t)_{\max} - \sum_{s=1}^{t}(\tilde\gamma_s)_{\max}$ is also convergent. Combined with the fact that $\mathrm{osc}(X_t)$ converges to $0$, this implies the convergence of $X_t$ in the almost sure sense.
(b) If $\sum_{t\ge1}|\gamma_t|_\infty$ converges in some $L^p$, then it is also a.s. convergent.
Consequently, $\sum_{t\ge1}|\gamma_t|_\infty < \infty$ a.s., and we can invoke the previous part to argue that $X_t$ converges a.s. to a random variable $X_\infty$. To show the $L^p$ convergence, we first notice that $X_t$ is in $L^p$ for every $t$ (just from the recurrence relation). Next, using (9.14), we have that $\mathrm{osc}(X_t)$ converges to $0$ in $L^p$. On the other hand, for any $s < t$,
\[
X_t - X_s = \Big(\prod_{i=1}^{t}B_i - \prod_{i=1}^{s}B_i\Big)X_0 + \sum_{k=s+1}^{t}\Big(\prod_{i=k+1}^{t}B_i\Big)\tilde\gamma_k + \sum_{k=1}^{s}\Big(\prod_{i=k+1}^{t}B_i - \prod_{i=k+1}^{s}B_i\Big)\tilde\gamma_k.
\]
Using this, the convergence of the products of the $B_i$ to $C$, and some standard estimates, we get that for some constant $K > 0$,
\[
\big\|\,|X_t - X_s|_\infty\big\|_p \le \Big\|\Big(\prod_{i=1}^{t}B_i - \prod_{i=1}^{s}B_i\Big)X_0\Big\|_p + K\Big\|\sum_{k=s+1}^{t}|\gamma_k|_\infty\Big\|_p + K\sum_{k=1}^{s}\Big|\prod_{i=k+1}^{t}B_i - \prod_{i=k+1}^{s}B_i\Big|_\infty\big\|\,|\gamma_k|_\infty\big\|_p,
\]
and each term tends to $0$ as $s,t\to\infty$. This shows that the sequence $(X_t)_{t\ge1}$ is Cauchy in $L^p$; in particular it converges in $L^p$, and because it also converges almost surely, the two limits coincide.
(3) To show convergence in distribution we proceed as follows. We first observe that in the case where the $\gamma_t$ are normally distributed, the random variable $X_t$ is itself normally distributed, with mean
\[
\mu_t = \prod_{k=1}^{t}B_k\,X_0 + \sum_{s=1}^{t}\Big(\prod_{k=s+1}^{t}B_k\Big)\mathcal{E}_s\mu
\]
and covariance matrix
\[
\Gamma_t = \sum_{s=1}^{t}\Big(\prod_{k=s+1}^{t}B_k\Big)\mathcal{E}_s\Sigma\mathcal{E}_s^\top\Big(\prod_{k=s+1}^{t}B_k\Big)^\top.
\]
Because $\prod_{k=s+1}^{t}B_k$ converges to $C$ as $t-s\to\infty$ and $\mathcal{E}_t$ converges to $\mathcal{E}$, this means that
\[
\frac{\Gamma_t}{t} \xrightarrow[t\to\infty]{} C\mathcal{E}\Sigma\mathcal{E}^\top C^\top.
\]
Therefore
\[
\frac{X_t - E[X_t]}{\sqrt t} \Longrightarrow N(0,\ C\mathcal{E}\Sigma\mathcal{E}^\top C^\top).
\]
On the other hand, if the $\gamma_t$ are not normal, we can take normal copies $Z_t$ with the same mean and covariance and put all the variables $\gamma_t$ and $Z_t$ on the same probability space. We will compare $X_t$ with $Y_t$, which we define by $Y_t = B_tY_{t-1} + \mathcal{E}_tZ_t$ with $Y_0 = X_0$. We certainly have in the first place that
\[
E[X_t] = E[Y_t] = \tilde B_{t,0}X_0 + \sum_{s=1}^{t}\tilde B_{t,s}\mathcal{E}_sE[\gamma_s],
\]
where, in order to simplify the notation, we set $\tilde B_{t,s} = \prod_{k=s+1}^{t}B_k$.
From this we can reduce the rest of the proof to the case $X_0 = 0$ and $E[\gamma_t] = 0$; these reductions do not change the covariance matrices of $X_t$ and $Y_t$. Next we will use the Lindeberg argument, which involves the comparison of $X_t$ and $Y_t$; for a reference, see [Str00]. In the first place, realize that
\[
X_t = \sum_{s=1}^{t}\tilde B_{t,s}\mathcal{E}_s\gamma_s, \quad\text{while}\quad Y_t = \sum_{s=1}^{t}\tilde B_{t,s}\mathcal{E}_sZ_s.
\]
Take now a smooth function $\phi:\mathbb{R}^n\to\mathbb{R}$ with all derivatives bounded, and compare
\[
(9.17)\qquad E\Big[\phi\Big(\frac{X_t}{\sqrt t}\Big)\Big] - E\Big[\phi\Big(\frac{Y_t}{\sqrt t}\Big)\Big]
= \sum_{s=1}^{t}\Big(E\Big[\phi\Big(\frac{W_s + \tilde B_{t,s}\mathcal{E}_s\gamma_s}{\sqrt t}\Big)\Big] - E\Big[\phi\Big(\frac{W_s + \tilde B_{t,s}\mathcal{E}_sZ_s}{\sqrt t}\Big)\Big]\Big),
\]
with the definition
\[
W_s = \sum_{i=1}^{s-1}\tilde B_{t,i}\mathcal{E}_iZ_i + \sum_{i=s+1}^{t}\tilde B_{t,i}\mathcal{E}_i\gamma_i.
\]
Thus $W_s$ is independent of $\gamma_s$ and $Z_s$. For a smooth function we can use Taylor's formula with integral remainder,
\[
\phi(x+y) = \phi(x) + D\phi(x)(y) + \int_0^1(1-u)\,D^2\phi(x+uy)(y,y)\,du,
\]
as well as the third-order version,
\[
\phi(x+y) = \phi(x) + D\phi(x)(y) + \frac12D^2\phi(x)(y,y) + \frac12\int_0^1(1-u)^2\,D^3\phi(x+uy)(y,y,y)\,du.
\]
We can combine both of these to write
\[
\phi(x+y) - \phi(x) - D\phi(x)(y) - \frac12D^2\phi(x)(y,y) = R(x,y)
\]
with
\[
(9.18)\qquad |R(x,y)| \le \min\Big(\|D^2\phi\|_\infty|y|^2,\ \frac16\|D^3\phi\|_\infty|y|^3\Big).
\]
Going now back to the decomposition (9.17), using the independence of $W_s$ from both $\gamma_s$ and $Z_s$, and the fact that $\gamma_s$ and $Z_s$ have the same mean and covariance (so the first- and second-order Taylor terms cancel in expectation), we can write
\[
\Big|E\Big[\phi\Big(\frac{W_s+\tilde B_{t,s}\mathcal{E}_s\gamma_s}{\sqrt t}\Big) - \phi\Big(\frac{W_s+\tilde B_{t,s}\mathcal{E}_sZ_s}{\sqrt t}\Big)\Big]\Big|
\le E\Big[\Big|R\Big(\frac{W_s}{\sqrt t},\ \frac{\tilde B_{t,s}\mathcal{E}_s\gamma_s}{\sqrt t}\Big)\Big|\Big] + E\Big[\Big|R\Big(\frac{W_s}{\sqrt t},\ \frac{\tilde B_{t,s}\mathcal{E}_sZ_s}{\sqrt t}\Big)\Big|\Big].
\]
Now, for a given $\epsilon > 0$, we write
\[
E\Big[\Big|R\Big(\frac{W_s}{\sqrt t},\ \frac{\tilde B_{t,s}\mathcal{E}_s\gamma_s}{\sqrt t}\Big)\Big|\Big]
= E\big[|R(\cdot)|;\ |\gamma_s| \le \epsilon\sqrt t\big] + E\big[|R(\cdot)|;\ |\gamma_s| > \epsilon\sqrt t\big]
\]
\[
\le \frac{1}{6t^{3/2}}\|D^3\phi\|_\infty E\big[|\tilde B_{t,s}\mathcal{E}_s\gamma_s|^3;\ |\gamma_s| \le \epsilon\sqrt t\big] + \frac1t\|D^2\phi\|_\infty E\big[|\tilde B_{t,s}\mathcal{E}_s\gamma_s|^2;\ |\gamma_s| > \epsilon\sqrt t\big]
\]
\[
\le \frac{\epsilon K^3}{6t}\|D^3\phi\|_\infty E[|\gamma|^2] + \frac{K^2}{t}\|D^2\phi\|_\infty E\big[|\gamma|^2;\ |\gamma| > \epsilon\sqrt t\big].
\]
For the first expectation we used the cubic estimate of $R$ in (9.18), while for the second we used the quadratic one. The constants come from the fact that $\tilde B_{t,s}$ converges, so its operator norm is bounded; similarly $\mathcal{E}_s$ is bounded, and $K$ is a common bound for all $|\tilde B_{t,s}\mathcal{E}_s|$. A similar inequality is obtained in the other case, with $Z_s$ instead of $\gamma_s$. Thus each term in the summation of (9.17) is bounded by
\[
\frac{\epsilon K^3}{6t}\|D^3\phi\|_\infty\big(E[|\gamma|^2] + E[|Z|^2]\big) + \frac{K^2}{t}\|D^2\phi\|_\infty\big(E[|\gamma|^2;\ |\gamma|>\epsilon\sqrt t] + E[|Z|^2;\ |Z|>\epsilon\sqrt t]\big).
\]
Summing over $s = 1,\dots,t$, we can now bound, for every $\epsilon > 0$,
\[
\Big|E\Big[\phi\Big(\frac{X_t}{\sqrt t}\Big)\Big] - E\Big[\phi\Big(\frac{Y_t}{\sqrt t}\Big)\Big]\Big|
\le \frac{\epsilon K^3}{6}\|D^3\phi\|_\infty\big(E[|\gamma|^2] + E[|Z|^2]\big) + K^2\|D^2\phi\|_\infty\big(E[|\gamma|^2;\ |\gamma|>\epsilon\sqrt t] + E[|Z|^2;\ |Z|>\epsilon\sqrt t]\big).
\]
Letting now $t\to\infty$ and then $\epsilon\to0$ finishes the argument, showing that we can replace the $\gamma_t$ by normal random variables with the same mean and covariance, for which the CLT is clear. $\Box$

Simulation result for CLT.
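A minimal two-agent illustration of the CLT just proved (constant matrices and $\pm1$ noise; the parameters are illustrative, not those used for Figure 3):

```python
import math
import random

random.seed(4)
A = [[0.7, 0.3], [0.4, 0.6]]   # illustrative stochastic matrix
e = 0.5                        # E = 0.5*I satisfies eps_i < n/(n-1) * a_ii
# B = A + E(Delta - I), with Delta the 2x2 matrix of entries 1/2
B = [[A[i][j] + e * (0.5 - (1.0 if i == j else 0.0)) for j in range(2)]
     for i in range(2)]

N, T = 3000, 400
Z1, Z2 = [], []
for _ in range(N):
    x = [0.0, 0.0]
    for _ in range(T):
        g = [random.choice([-1.0, 1.0]), random.choice([-1.0, 1.0])]
        x = [B[i][0] * x[0] + B[i][1] * x[1] + e * g[i] for i in range(2)]
    Z1.append(x[0] / math.sqrt(T))   # E[X_t] = 0 here since X_0 = 0, E[gamma] = 0
    Z2.append(x[1] / math.sqrt(T))

def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cuv = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n
    vu = sum((a - mu) ** 2 for a in u) / n
    vv = sum((b - mv) ** 2 for b in v) / n
    return cuv / math.sqrt(vu * vv)

r = corr(Z1, Z2)
# the covariance of the limit is rank one: the two agents' fluctuations
# line up along the consensus direction, so r is close to 1
```

Each marginal of $X_t/\sqrt t$ is approximately Gaussian, while the joint samples concentrate on a line, which is exactly the picture in Figure 3.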
In the averaging case, when the noise is Gaussian, so is the asymptotic distribution. However, we illustrate the less obvious CLT result (Figure 3): the limit is Gaussian even when the noise terms are iid with a general distribution. Again the simulations are for two agents with fixed $A$ and $E$ matrices, and $(\gamma^t_s)$ is a vector with entries $+1$ or $-1$. The joint plots are heavily concentrated on a straight line, indicating that the agents are synchronized.

10. CONCLUSION
[Figure 3. Joint plot of contours with marginals, axes $x_1$ and $x_2$. Here $X_t$ converges to a normal random variable; the joint plot illustrates the case of two agents who learn from each other with $A$ and $E$ fixed. Large $t = 2000$ and 3000 samples were used.]

To isolate learning, we dispensed with traditional game-theoretic notions of utility. There has been a growing trend across disciplines to study this aspect. The abundance of data on interactions from the online world means that social network models are attracting the interest of theoreticians as well as experimentalists. Our work is, to our knowledge, the first to generalize DeGroot learning to incorporate randomness and to develop distributional results on the beliefs themselves. What kinds of distributions arise when there is no consensus? This question was examined at length. A Gaussian distribution need not arise; the answer depends on the underlying randomness.

In previous studies of social learning, the noise is interpreted as a private signal. In our setting, one can think of the noise in this way as well. However, our emphasis was on the probabilistic notions of consensus. Agreement can be to a point, to a probability measure, or to a line. This holds regardless of the number of agents.

When the noise is not decaying, condition (7.4) is crucial to ensure convergence in distribution. This condition can be thought of as a stabilization feature of learning: individuals learn with varying $A_t$ and $E_t$, but these cannot change too drastically. Eventually, all agents settle down. We extended the standard DeGroot learning models to incorporate a variety of noise terms.

One criticism of having $\bar\sigma$ is that it is already incorporated into the learning. To relax this assumption, we introduced averaging dynamics where the ground truth is endogenous [Son16] to the social network. The central limit theorem developed in Theorem 16 has an unusual feature: the marginal distributions are Gaussian, but the joint distribution encapsulates the consensus property, as agents synchronize along a line. An interesting aspect of our results is that we place no restriction on the interaction matrix $A$, the implicit network topology, being fully connected all the time. Social connections can change with time. The only requirement was that agents have self-belief. There can, however, be periods of insanity where the learning rates $\epsilon_i$ are zero or the interaction matrix is just the identity matrix; individual players can be insane and refuse to learn for short bouts. Mixing of beliefs and convergence to consensus is ensured by conditions (9.7) or (7.2). These assumptions are weaker than those of previous attempts, which require strong connectivity between the agents.

Thus far, the agents' rules are mechanical. Future work should address the issue of rationality. In DeGroot learning, individuals are boundedly rational and use the same rule. What if the agents are strategic?
In the presence of noise or disturbance, manipulation of opinion dynamics by forceful agents [AOP10] becomes an interesting but difficult question. A possible way forward is to look at fully nonlinear models. Random dynamical systems were reviewed by [BM03]; our results use different techniques to study social learning. Though recursive random dynamical systems are not new in economics, their probabilistic analysis poses several challenges to researchers. The interaction between the mathematical finance and game theory communities should take into account the resurgence of social learning models. How a distribution of beliefs on prices for financial assets arises is not only a fundamental question for game theorists but is also of interest to proponents of stochastic volatility and asset pricing models. Rather than viewing trading as an exogenous activity, it should be seen as an essential combination of interaction and learning.

REFERENCES

[AO11] Daron Acemoglu and Asuman Ozdaglar, Opinion dynamics and learning in social networks, Dynamic Games and Applications (2011), no. 1, 3–49.
[AOP10] Daron Acemoglu, Asuman Ozdaglar, and Ali ParandehGheibi, Spread of (mis)information in social networks, Games and Economic Behavior (2010), no. 2, 194–227.
[Ban92] Abhijit V. Banerjee, A simple model of herd behavior, The Quarterly Journal of Economics (1992), no. 3, 797–817.
[Bau16] Dario Bauso, Game theory with engineering applications, SIAM, 2016.
[BBC17] Joshua Becker, Devon Brackbill, and Damon Centola, Network dynamics of social influence in the wisdom of crowds, Proceedings of the National Academy of Sciences (2017), no. 26, E5070–E5076.
[BBCM19] Abhijit Banerjee, Emily Breza, Arun G. Chandrasekhar, and Markus Mobius, Naive learning with uninformed agents, Tech. report, National Bureau of Economic Research, 2019.
[BHOT05] Vincent D. Blondel, Julien M. Hendrickx, Alex Olshevsky, and John N. Tsitsiklis, Convergence in multiagent coordination, consensus, and flocking, Proceedings of the 44th IEEE Conference on Decision and Control, IEEE, 2005, pp. 2996–3000.
[BHW92] Sushil Bikhchandani, David Hirshleifer, and Ivo Welch, A theory of fads, fashion, custom, and cultural change as informational cascades, Journal of Political Economy (1992), no. 5, 992–1026.
[BM03] Rabi Bhattacharya and Mukul Majumdar, Random dynamical systems: a review, Economic Theory (2003), no. 1, 13–38.
[CEMS08] Martin W. Cripps, Jeffrey C. Ely, George J. Mailath, and Larry Samuelson, Common learning, Econometrica (2008), no. 4, 909–933.
[CLX15] Arun G. Chandrasekhar, Horacio Larreguy, and Juan Pablo Xandri, Testing models of social learning on networks: Evidence from a lab experiment in the field, Tech. report, National Bureau of Economic Research, 2015.
[DeG74] Morris H. DeGroot, Reaching a consensus, Journal of the American Statistical Association (1974), no. 345, 118–121.
[DVZ03] Peter M. DeMarzo, Dimitri Vayanos, and Jeffrey Zwiebel, Persuasion bias, social influence, and unidimensional opinions, The Quarterly Journal of Economics (2003), no. 3, 909–968.
[FDLL98] Drew Fudenberg and David K. Levine, The theory of learning in games, vol. 2, MIT Press, 1998.
[GJ10] Benjamin Golub and Matthew O. Jackson, Naive learning in social networks and the wisdom of crowds, American Economic Journal: Microeconomics (2010), no. 1, 112–149.
[GPW+13] Martin D. Gould, Mason A. Porter, Stacy Williams, Mark McDonald, Daniel J. Fenn, and Sam D. Howison, Limit order books, Quantitative Finance (2013), no. 11, 1709–1742.
[GS] Ben Golub and Evan Sadler, Learning in social networks, The Oxford Handbook of the Economics of Networks.
[HJMR18] Jan Hazla, Ali Jadbabaie, Elchanan Mossel, and M. Amin Rahimian, Reasoning in Bayesian opinion exchange networks is PSPACE-hard, CoRR abs/1809.01077 (2018).
[Jac10] Matthew O. Jackson, Social and economic networks, Princeton University Press, 2010.
[Kir02] Alan Kirman, Reflections on interaction and markets, Quantitative Finance (2002), 322–326.
[KL94] Ehud Kalai and Ehud Lehrer, Weak and strong merging of opinions, Journal of Mathematical Economics (1994), no. 1, 73–86.
[Lor05] Jan Lorenz, A stabilization theorem for dynamics of continuous opinions, Physica A: Statistical Mechanics and its Applications (2005), no. 1, 217–223, Market Dynamics and Quantitative Economics.
[MF13] Manuel Mueller-Frank, A general framework for rational learning in social networks, Theoretical Economics (2013), no. 1, 1–40.
[Mor05] Luc Moreau, Stability of multiagent systems with time-dependent communication links, IEEE Transactions on Automatic Control (2005), no. 2, 169–182.
[MPV17] Tung Mai, Ioannis Panageas, and Vijay V. Vazirani, Opinion dynamics in networks: Convergence, stability and lack of explosion, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.
[MS07] Shie Mannor and Jeff S. Shamma, Multi-agent learning for engineers, Artificial Intelligence (2007), no. 7, 417–422.
[MST14] Elchanan Mossel, Allan Sly, and Omer Tamuz, Asymptotic learning on Bayesian social networks, Probability Theory and Related Fields (2014), no. 1-2, 127–157.
[MT+17] Elchanan Mossel, Omer Tamuz, et al., Opinion exchange dynamics, Probability Surveys (2017), 155–204.
[MTSJ18] Pooya Molavi, Alireza Tahbaz-Salehi, and Ali Jadbabaie, A theory of non-Bayesian social learning, Econometrica (2018), no. 2, 445–490.
[OSFM07] Reza Olfati-Saber, J. Alex Fax, and Richard M. Murray, Consensus and cooperation in networked multi-agent systems, Proceedings of the IEEE (2007), no. 1, 215–233.
[PNGCS14] Georgios Piliouras, Carlos Nieto-Granda, Henrik I. Christensen, and Jeff S. Shamma, Persistent patterns: Multi-agent learning beyond equilibrium and utility, Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS '14, 2014, pp. 181–188.
[PP18] Christos Papadimitriou and Georgios Piliouras, Game dynamics as the meaning of a game, SIGecom Exchanges (2018).
[Sob00] Joel Sobel, Economists' models of learning, Journal of Economic Theory (2000), 241–261.
[Son16] Yangbo Song, Social learning with endogenous observation, Journal of Economic Theory (2016), 324–333.
[SS00] Lones Smith and Peter Sørensen, Pathological outcomes of observational learning, Econometrica (2000), no. 2, 371–398.
[Str00] Daniel W. Stroock, Probability Theory, an Analytic View, revised ed., Cambridge University Press, 2000.
[VMP18] Tushar Vaidya, Carlos Murguia, and Georgios Piliouras, Learning agents in financial markets: Consensus dynamics on volatility, Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (2018), 2106–2108.

School of Mathematics, Georgia Institute of Technology, 686 Cherry Street, Atlanta, GA 30332-0160 USA
E-mail address: [email protected]

Engineering Systems and Design, Singapore University of Technology and Design, 8 Somapah Rd, Singapore
E-mail address: