Disordered high-dimensional optimal control
Pierfrancesco Urbani
Université Paris-Saclay, CNRS, CEA, Institut de physique théorique, 91191, Gif-sur-Yvette, France.
Mean field games are a class of optimization problems that arise from optimal control when applied to the many body setting. In the noisy case one has a set of controllable stochastic processes and a cost function that is a functional of their trajectories. The goal of the optimization is to minimize this cost over the control variables. In this work we consider the case in which we have N stochastic processes, or agents, with the associated control variables, which interact in a disordered way so that the resulting cost function is random. The goal is to find the average minimal cost for N → ∞ when a typical realization of the quenched randomness is considered. We introduce a simple model and analyze it using the replica method. We perform a dimensional reduction from the infinite dimensional case to a set of one dimensional stochastic partial differential equations of the Hamilton-Jacobi-Bellman and Fokker-Planck type. The statistical properties of the corresponding stochastic terms must be computed self-consistently.

INTRODUCTION
The theory of mean field games (MFG) is a rapidly expanding branch of analysis and partial differential equations [1, 2]. In a nutshell, the setting is as follows: one has a set of N agents whose states evolve in time (and one is interested in the limit N → ∞). These agents are initialized in a given initial condition and aim at reaching a fixed goal at some final time (which may be infinite). Each agent has some control on its strategy to pursue the final goal, which determines its trajectory from the initial to the final time. Along their trajectories, the agents interact among themselves and are subjected to some potential cost due to the environment. Therefore, one has a total cost which depends on the collective positions of the agents during their time evolution. The optimal strategy (or control) is the one minimizing this cost.

The main idea introduced in the foundational works by Huang, Malhamé, Caines [3] and Lasry and Lions [4] is that if the interactions between agents are of mean field type, one can perform a dimensional reduction in which the optimal strategy is obtained as a solution of an optimal control problem for an effective agent which feels the effect of the other ones only through their density. This basic idea is of course very reminiscent of mean field theory in statistical physics. Mean field games can be used to model several types of complex systems, see [5] for a review of the relevant models. In physics, for example, they have been applied to study active matter systems and in particular the flocking transition as described by the Cucker-Smale model [6]. Many studies have focused on the case in which the interactions between agents are non-random. However, in several cases this assumption is too constraining, and it is reasonable to introduce interactions between agents that are more complex and heterogeneous. The simplest way to investigate this case is to assume that the agents interact in a random manner.
While MFG with random environments have already been studied, see [7–9], the case of random interactions between the agents is largely open and, to the best of our knowledge, it is still to be investigated from the statistical physics perspective. In this work we study a very simple model of this type and we describe how to construct the mean field theory using techniques of disordered systems in statistical physics [10]. The main result of this analysis is a dimensional reduction in which one goes from a stochastic optimal control problem for a large number of interacting agents to an effective stochastic optimal control problem for a representative agent in a random environment whose statistical properties must be computed self-consistently.

The paper is organized as follows. In the first section we introduce the very simple model which we will focus on. In the second section we closely follow [11, 12] and set up the formalism. In the third section we set up the replica approach and show how the replica symmetric closure of the equations gives rise to an effective stochastic optimal control problem in a random environment whose statistical properties are self-consistently determined. We conclude with a set of perspectives on future directions and applications.

I. A SIMPLE MODEL
We consider a set of one dimensional random walkers obeying the controlled stochastic processes

\dot x_i = u_i(t) + \xi_i(t)   (1)

(what follows can be easily generalized to higher dimensions) where the noise is Gaussian and such that

\langle \xi_i(t) \rangle_\xi = 0, \qquad \langle \xi_i(t)\, \xi_j(t') \rangle_\xi = \delta_{ij}\, \delta(t-t')   (2)

and we have denoted with brackets the average over the functional distribution of \{\xi_i(t)\}. The initial condition for these walkers may be deterministic, for example

x_i(0) = 0   (3)

but our computation can be extended straightforwardly to the case in which one has a separable probability distribution over the initial conditions, meaning that

P(x(0)) = \prod_{i=1}^N \hat P(x_i(0)).   (4)

In the following we will consider the deterministic setting for simplicity, and in particular Eq. (3), just to fix the ideas. The variables u_i(t) are time dependent and must be chosen in such a way that they minimize the following cost function C[x(0),
0] = \left\langle \frac12 \sum_{i=1}^N \int_0^{t_f} d\tau\, u_i^2(\tau) + \int_0^{t_f} d\tau\, V(x(\tau)) + \sum_{i=1}^N \phi(x_i(t_f)) \right\rangle_\xi   (5)

where we have denoted x(t) = \{x_1(t), \dots, x_N(t)\}. The cost function in Eq. (5) is composed of three terms. The first one is an elastic running cost on the control variables u(t) = \{u_1(t), \dots, u_N(t)\}. Since the cost is quadratic in u_i, these MFG are called quadratic MFG [13]. The second one contains the cost of interactions between agents and the last one is a separable terminal cost function that each agent tries to minimize. We consider the following interaction potential

V(x) = \sum_{i=1}^N \nu(x_i) + \sum_{i<j} J_{ij}\, x_i x_j   (6)

where the couplings J_{ij} are quenched, independent Gaussian random variables with zero mean and variance J^2/N. The goal is to compute the typical optimal cost per agent

r = \lim_{N\to\infty} \frac1N\, \overline{\min_u C[x(0), 0]}   (10)

and the overline stands for the average over the random variables J_{ij}. The solution to this problem can be constructed in the following way. We define the optimal cost-to-go function f(x, t) as the optimal cost for the walkers starting in x at time t and walking up to time t_f, namely

f(x,t) = \min_u \left\langle \frac12 \sum_{i=1}^N \int_t^{t_f} d\tau\, u_i^2(\tau) + \int_t^{t_f} d\tau\, V(x(\tau)) + \sum_{i=1}^N \phi(x_i(t_f)) \right\rangle_\xi.   (11)

Then f(x, t) satisfies the Hamilton-Jacobi-Bellman (HJB) equation [14]

-\partial_t f(x,t) = -\frac12 |\nabla_x f(x,t)|^2 + \frac12 \nabla_x^2 f(x,t) + V(x), \qquad f(x, t_f) = \sum_{i=1}^N \phi(x_i).   (12)

The HJB equation must be solved backward in time and its boundary condition is provided by the final cost the agents want to minimize. Once the solution is found we get the optimal cost as

r = \lim_{N\to\infty} \frac1N\, \overline{f(x(0), 0)}.   (13)

Furthermore one can show that the optimal strategy is given by choosing [14]

u(x,t) = -\nabla_x f(x,t).   (14)

The difficulty comes from the fact that we are interested in the high dimensional limit N → ∞ and therefore we need to solve the HJB equation in that limit. This is the standard problem which is analyzed by the theory of MFG. However, here we face the additional problem that we want to compute the typical cost, meaning the average over the random interactions.
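As a concrete illustration of Eqs. (1)-(5), the following sketch simulates the N controlled walkers under an arbitrary feedback control and estimates the cost by Monte Carlo. All specific choices here (the potential ν, the terminal cost φ, the feedback control u, and the sampled couplings) are illustrative assumptions, not fixed by the model; the control is deliberately suboptimal, so the estimate is only an upper bound on the optimal cost.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative instance of the model (all choices are assumptions):
N, tf, dt = 50, 1.0, 1e-3           # agents, horizon, time step
J = 0.5                             # coupling strength
Jij = np.triu(rng.normal(0.0, J / np.sqrt(N), (N, N)), 1)  # J_ij, i < j
nu = lambda x: 0.5 * x**2           # single-site potential nu(x)
phi = lambda x: (x - 1.0)**2        # terminal cost phi(x)
u = lambda x, t: -(x - 1.0)         # a hypothetical (suboptimal) feedback

def cost_one_run():
    """Euler-Maruyama for dx_i = u_i dt + dW_i, accumulating Eq. (5)."""
    x = np.zeros(N)                 # deterministic start, Eq. (3)
    c = 0.0
    for k in range(int(tf / dt)):
        ui = u(x, k * dt)
        V = nu(x).sum() + x @ Jij @ x          # potential of Eq. (6)
        c += (0.5 * (ui**2).sum() + V) * dt    # running cost
        x += ui * dt + rng.normal(0.0, np.sqrt(dt), N)
    return c + phi(x).sum()

# Averaging over noise realizations approximates the bracket <...>_xi
cost = np.mean([cost_one_run() for _ in range(20)])
print(cost / N)   # intensive cost of this control, one disorder sample
```

Minimizing this quantity over the feedback u, and then averaging the minimum over the couplings, is the computation that Eq. (10) summarizes.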
We will show in the following how to perform a dimensional reduction that takes into account the average over disorder and that allows us to make progress.

II. THE STRUCTURE OF THE COMPUTATION

We will closely follow [11, 12] and consider a logarithmic (Cole-Hopf) transformation [15] to define

f(x,t) = -\log \psi(x,t)   (15)

so that we have

r = -\lim_{N\to\infty} \frac1N\, \overline{\log \psi(0,0)}   (16)

where we have assumed for simplicity the initial conditions as in Eq. (3). Given the equation on f one can show that ψ(x, t) satisfies the following equation

-\partial_t \psi(x,t) = \left[ -V(x) + \frac12 \nabla_x^2 \right] \psi(x,t)   (17)

with boundary condition \psi(x, t_f) = e^{-\sum_{i=1}^N \phi(x_i)}. Defining \rho(y, t|x, t') as the solution of the following (forward in time) problem

\partial_t \rho(y,t|x,t') = \left[ -V(y) + \frac12 \nabla_y^2 \right] \rho(y,t|x,t') \quad t > t', \qquad \rho(x', t|x, t) = \prod_{i=1}^N \delta(x'_i - x_i)   (18)

one can show [11] that

\psi(0,0) = \int dy\, \rho(y, t_f|0, 0)\, e^{-\sum_{i=1}^N \phi(y_i)}.   (19)

It follows that there is a path integral representation for ψ(0, 0),

\psi(0,0) = \int_{x(0)=0} Dx\, \exp\left[ -S[x] \right]   (20)

where the action is given by

S[x] = \sum_{i=1}^N \phi(x_i(t_f)) + \frac12 \sum_{i=1}^N \int_0^{t_f} d\tau \left[ \dot x_i^2(\tau) + 2\nu(x_i(\tau)) \right] + \sum_{i<j} J_{ij} \int_0^{t_f} d\tau\, x_i(\tau)\, x_j(\tau).   (21)

III. THE REPLICA APPROACH

We start by defining the partition function

Z = \int_{x(0)=0} Dx\, \exp\left[ -S[x] \right].   (23)

The replica method builds on the identity [10]

\overline{\ln Z} = \lim_{n\to 0} \partial_n \overline{Z^n}.   (24)

If we assume that n is integer then we can write the following expression for the powers of Z:

\overline{Z^n} = \overline{ \int_{\{x^{(a)}(0)=0\}_{a=1,\dots,n}} \prod_{a=1}^n Dx^{(a)}\, \exp\left[ -\sum_{a=1}^n S[x^{(a)}] \right] }.   (25)

The main advantage of having integer n is that we can perform explicitly the average over disorder. Once this is done one needs to find a way to analytically continue n → 0. Averaging over the random couplings J_{ij}, and introducing the overlap order parameter Q_{ab}(\tau,\tau') = \frac1N \sum_{i=1}^N x_i^{(a)}(\tau)\, x_i^{(b)}(\tau'), we get

\overline{Z^n} = \int DQ\, \exp\left[ -N \frac{J^2}{4} \sum_{ab} \int_0^{t_f} d\tau \int_0^{t_f} d\tau'\, Q_{ab}^2(\tau,\tau') + N \ln Z[Q] \right]   (27)

where Z[Q] is a single-site partition function over n replicated trajectories x_a(\tau), weighted by the quadratic inter-replica coupling written explicitly in Eq. (30) and by the replicated single-site action

\tilde S[x_a] = \phi(x_a(t_f)) + \int_0^{t_f} d\tau \left[ \frac12 \dot x_a^2(\tau) + \nu(x_a(\tau)) \right].   (28)

The measure DQ in Eq. (27) is properly normalized. When N → ∞ we can use the saddle point method to evaluate the functional integral.
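The disorder average leading to Eq. (27) relies on the elementary Gaussian identity E[e^{-Ja}] = e^{\sigma^2 a^2/2} for J ~ N(0, σ²): averaging the term linear in each J_{ij} produces the quadratic coupling between replicas that the overlap Q_{ab} resums. A quick numerical check of the identity:

```python
import numpy as np

rng = np.random.default_rng(1)

# E[exp(-J a)] = exp(sigma^2 a^2 / 2) for a zero-mean Gaussian J.
# This is the step that, applied to each coupling J_ij, couples the
# replicas pairwise and brings in Q_ab(tau, tau').
sigma, a = 0.3, 1.7
samples = rng.normal(0.0, sigma, 2_000_000)
lhs = np.mean(np.exp(-samples * a))
rhs = np.exp(0.5 * sigma**2 * a**2)
print(lhs, rhs)   # the two agree up to Monte Carlo error
```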
The equation for the saddle point is

Q_{ab}(\tau, \tau') = \langle x_a(\tau)\, x_b(\tau') \rangle_Z   (29)

where we have defined the average as

\langle O \rangle_Z = \frac{1}{Z[Q]} \int_{\{x_a(0)=0\}_{a=1,\dots,n}} \prod_{a=1}^n Dx_a\, \exp\left[ -\sum_a \tilde S[x_a] + \frac{J^2}{2} \int_0^{t_f} d\tau \int_0^{t_f} d\tau' \sum_{ab} Q_{ab}(\tau,\tau')\, x_a(\tau)\, x_b(\tau') \right] O.   (30)

At this point we need to find the solution of the saddle point equation (29) and find how such a solution behaves when we take the analytical continuation down to n → 0. We note that Eq. (29) is very close to the one emerging from the replica treatment of the quantum partition function of mean field quantum spin glasses [16]. However, in that case trajectories are periodic in (imaginary) time (meaning that x(0) = x(t_f)) and t_f plays the role of the inverse temperature, which is due to the fact that one is trying to compute the trace of the Boltzmann factor using the Suzuki-Trotter formula. In that case one can show that translational invariance on the torus implies that Q_{a\neq b}(\tau, \tau') does not depend on τ and τ'. Here the situation is different since we do not have any periodicity condition on the trajectories. Therefore on very general grounds we only have

Q_{aa}(\tau, \tau') = Q_{aa}(\tau', \tau), \qquad Q_{ab}(\tau, \tau') = Q_{ba}(\tau', \tau).   (31)

To move forward, we need to consider an ansatz for the form of the solution Q_{ab}(\tau, \tau') which is amenable to the n → 0 limit.

A. The replica symmetric ansatz

We assume that the solution of the saddle point equations is of the replica symmetric form

Q_{ab}(\tau, \tau') = D(\tau, \tau')\, \delta_{ab} + (1 - \delta_{ab})\, F(\tau, \tau').   (32)

With this assumption one can rewrite ln Z[Q] in terms of functional Gaussian integrals as

\ln Z[Q] = \ln \int DH \exp\left[ -\frac12 \int_0^{t_f} d\tau \int_0^{t_f} d\tau'\, H(\tau)\, F^{-1}(\tau,\tau')\, H(\tau') \right] \hat Z^n(H) \simeq n \int DH \exp\left[ -\frac12 \int_0^{t_f} d\tau \int_0^{t_f} d\tau'\, H(\tau)\, F^{-1}(\tau,\tau')\, H(\tau') \right] \ln \hat Z(H)   (33)

where we used that \hat Z^n \simeq 1 + n \ln \hat Z as n → 0.
Furthermore we used that the measure DH is properly normalized, so that

\int DH \exp\left[ -\frac12 \int_0^{t_f} d\tau \int_0^{t_f} d\tau'\, H(\tau)\, F^{-1}(\tau,\tau')\, H(\tau') \right] = 1.   (34)

Finally F^{-1} is the inverse operator of the kernel F and

\hat Z(H) = \int Dh \exp\left[ -\frac12 \int_0^{t_f} d\tau \int_0^{t_f} d\tau'\, h(\tau)\, (D-F)^{-1}(\tau,\tau')\, h(\tau') \right] \hat\psi[h, H].   (35)

Again the measure Dh is properly normalized analogously to Eq. (34) and we have defined

\hat\psi[h,H] = \int_{x(0)=0} Dx \exp\left[ -\phi(x(t_f)) - \int_0^{t_f} d\tau \left( \frac12 \dot x^2(\tau) + \nu(x(\tau)) + J\, (h(\tau) + H(\tau))\, x(\tau) \right) \right].   (36)

Therefore one can consider both h(t) and H(t) as uncorrelated Gaussian random fields with zero average and two point functions given by

\langle H(t) \rangle_H = 0, \quad \langle h(t) \rangle_h = 0, \quad \langle H(t) H(t') \rangle_H = F(t,t'), \quad \langle h(t) h(t') \rangle_h = D(t,t') - F(t,t').   (37)

We can then rewrite the saddle point equations in the following form

D(\tau,\tau') = \left\langle \frac{ \left\langle \frac{1}{J^2} \frac{\delta^2}{\delta h(\tau)\, \delta h(\tau')} \hat\psi[h,H] \right\rangle_h }{ \left\langle \hat\psi[h,H] \right\rangle_h } \right\rangle_H, \qquad F(\tau,\tau') = \langle R(\tau) R(\tau') \rangle_H, \qquad R(\tau) = \frac{ \left\langle \frac{1}{J} \frac{\delta}{\delta h(\tau)} \hat\psi[h,H] \right\rangle_h }{ \left\langle \hat\psi[h,H] \right\rangle_h }.   (38)

We will now give an alternative route to compute both \hat\psi[h,H] and the two point correlation functions that define the kernels D and F, without resorting to the calculation of the full path integral. It is easy to show, just by rewinding the same procedure we used to get to ψ(x, t), that one can write

\hat\psi[h,H] = e^{-c(0,0|h,H)}   (39)

where c(x, t|h, H) satisfies the backward stochastic HJB equation

-\partial_t c(x,t|h,H) = -\frac12 \left( \partial_x c(x,t|h,H) \right)^2 + \frac12 \partial_x^2 c(x,t|h,H) + \nu(x) + J\, (h(t) + H(t))\, x, \qquad c(x, t_f|h,H) = \phi(x).   (40)

This equation is associated to a single body random stochastic optimal control problem. We have an effective one dimensional walker \dot x(t) = u(t) + \xi(t), where ξ(t) is a noise with the same properties as in Eq.
(2), that tries to minimize

C_{h,H}(x', t) = \left\langle \frac12 \int_t^{t_f} d\tau\, u^2(\tau) + \int_t^{t_f} d\tau \left[ \nu(x(\tau)) + J\, (h(\tau) + H(\tau))\, x(\tau) \right] + \phi(x(t_f)) \right\rangle_\xi   (41)

and we need to impose that the stochastic process starts at x(t) = x'. The optimal strategy solving this optimal control problem reads

u(x,t) = -\partial_x c(x,t|h,H)   (42)

which is analogous to Eq. (14). We note that by taking Eq. (40) and deriving it with respect to x one gets an equation for u that is a forced stochastic Burgers equation with two stochastic terms given by the random fields h and H, whose statistical properties are self-consistently determined. When the stochastic process for the effective agent is run with the optimal strategy provided by Eq. (42) one can compute the associated Fokker-Planck equation

\partial_t \pi_{h,H}(x,t|y,t') = \frac12 \partial_x^2 \pi_{h,H}(x,t|y,t') + \partial_x \left( \pi_{h,H}(x,t|y,t')\, \partial_x c(x,t|h,H) \right), \qquad \pi_{h,H}(x,t'|y,t') = \delta(x-y).   (43)

This equation must be solved forward in time. We note that \pi_{h,H}(x,t|y,t') is a random measure since it depends on the full histories of h(t) and H(t) through c(x,t|h,H). Using again the result of [11] one can write

\hat\psi[h,H] = \int dy\, \rho_{h,H}(y, t_f|0,0)\, e^{-\phi(y)}   (44)

where \rho_{h,H}(y,t|x,t') satisfies the following equation

\partial_t \rho_{h,H}(y,t|x,t') = \left[ -\nu(y) - J\, (h(t) + H(t))\, y + \frac12 \partial_y^2 \right] \rho_{h,H}(y,t|x,t'), \qquad \rho_{h,H}(x',t|x,t) = \delta(x'-x).   (45)

In order to compute numerically \rho_{h,H}(y,t|x,t') on a given realization of the random fields h and H, one can follow directly [11, 12] and consider a population of independent one dimensional walkers, all starting at x at time t and all subjected to the same realization of h(t) and H(t). They undergo pure diffusion and are killed (namely, dropped from the simulation) with a rate ν(x) + J(H(t) + h(t)) x. Here we also give an alternative way to compute \rho_{h,H}(y,t|x,t').
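Before turning to that alternative, the population method just described can be sketched as follows. Instead of killing walkers outright, each one carries a multiplicative Feynman-Kac weight exp(-(ν(x) + J(h+H)x) dt) per step, which has the same average as the killing procedure of [11, 12] and stays well defined when the rate goes negative. The potential ν, terminal cost φ and field realization below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo for the kernel rho_{h,H} of Eq. (45): free diffusers with
# Feynman-Kac weights in place of killing.
tf, dt, W = 1.0, 1e-3, 20_000
steps = int(tf / dt)
J = 0.5
nu = lambda x: 0.5 * x**2          # illustrative nu(x)
hH = np.zeros(steps)               # stand-in for one realization of h(t) + H(t)

x = np.zeros(W)                    # all walkers start at x = 0 at t = 0
w = np.ones(W)                     # Feynman-Kac weights
for k in range(steps):
    w *= np.exp(-(nu(x) + J * hH[k] * x) * dt)   # "killing" as reweighting
    x += rng.normal(0.0, np.sqrt(dt), W)         # pure diffusion

# hat-psi[h,H] of Eq. (44); phi(y) = y^2 is an illustrative terminal cost
phi = lambda y: y**2
psi_hat = np.mean(w * np.exp(-phi(x)))
print(psi_hat)
```

A weighted histogram of the final positions x gives the kernel \rho_{h,H}(y, t_f|0,0) itself.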
We can introduce \psi_{h,H}(x,t) that satisfies

-\partial_t \psi_{h,H}(x,t) = \left[ -\nu(x) - J\, (h(t) + H(t))\, x + \frac12 \partial_x^2 \right] \psi_{h,H}(x,t), \qquad \psi_{h,H}(x, t_f) = e^{-\phi(x)}   (46)

and we have

\hat\psi[h,H] = \psi_{h,H}(0,0), \qquad \rho_{h,H}(x,t|y,t') = \pi_{h,H}(x,t|y,t')\, \frac{\psi_{h,H}(y,t')}{\psi_{h,H}(x,t)}.   (47)

Furthermore we define

\llbracket O[h] \rrbracket = \frac{1}{N_h} \left\langle O[h]\, e^{-c(0,0|h,H)} \right\rangle_h, \qquad N_h = \left\langle e^{-c(0,0|h,H)} \right\rangle_h.   (48)

With these definitions the replica symmetric equations are given by

D(\tau,\tau') = \langle \llbracket C(\tau,\tau') \rrbracket \rangle_H, \qquad F(\tau,\tau') = \langle \llbracket m(\tau) \rrbracket\, \llbracket m(\tau') \rrbracket \rangle_H   (49)

where we have defined

C(\tau,\tau') = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dx'\, x\, x'\, \pi_{h,H}(x,\tau|x',\tau')\, \pi_{h,H}(x',\tau'|0,0), \qquad m(\tau) = \int_{-\infty}^{\infty} dx\, x\, \pi_{h,H}(x,\tau|0,0)   (50)

and we have assumed τ ≥ τ'. This concludes the replica symmetric solution of the model. The interpretation of these equations is rather clear. We have reduced the problem of computing the solution of a deterministic and disordered HJB equation in infinite dimension to the solution of a one dimensional stochastic HJB equation. In other words, thanks to the mean field nature of the interactions between the agents, one has an effective agent that feels the interaction with the others through the random fields h(t) and H(t). The statistical properties of these random fields must be computed self-consistently through Eq. (50). These equations tell us that the random fields have Gaussian statistics and that one needs only to control their two point functions.

Finally we can write the optimal cost as

r = \frac{J^2}{4} \int_0^{t_f} d\tau \int_0^{t_f} d\tau' \left[ D^2(\tau,\tau') - F^2(\tau,\tau') \right] - \langle \ln N_h \rangle_H   (51)

which concludes the replica symmetric treatment of the model.

We finally sketch an idealized algorithmic strategy to solve the saddle point equations. We start from a guess for the kernels D(τ, τ') and F(τ, τ'). Given these two kernels one extracts samples of the trajectories of h(t) and H(t) from their Gaussian probability distributions.
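This sampling step can be sketched on a discretized time grid by drawing from the kernels of Eq. (37) with a Cholesky factorization. The kernels below are illustrative stand-ins; in the actual scheme F and D - F come out of the self-consistency equations.

```python
import numpy as np

rng = np.random.default_rng(2)

# Time grid on [0, tf] and two illustrative covariance kernels.
tf, M = 1.0, 64
tau = np.linspace(0.0, tf, M)
F = np.exp(-np.abs(tau[:, None] - tau[None, :]))          # cov of H, Eq. (37)
DmF = 0.5 * np.exp(-((tau[:, None] - tau[None, :]) ** 2)) # cov of h, D - F

def sample_field(cov, n=1, jitter=1e-8):
    """Draw n paths of a zero-mean Gaussian field with the given covariance."""
    L = np.linalg.cholesky(cov + jitter * np.eye(len(cov)))
    return rng.normal(size=(n, len(cov))) @ L.T

H = sample_field(F)[0]      # one realization of H(t)
h = sample_field(DmF)[0]    # one realization of h(t)

# Sanity check: the empirical covariance of many samples reproduces F
emp = np.cov(sample_field(F, 4000), rowvar=False)
print(np.abs(emp - F).max())   # small, shrinking with more samples
```

Each such pair (h, H) then feeds one realization of the stochastic HJB and Fokker-Planck equations in the iteration described next.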
For each of these samples one computes the solution of the stochastic backward HJB equation, Eq. (40), and given a solution of this equation one gets the solution of the associated Fokker-Planck equation, Eq. (43). The update of the kernels D(τ, τ') and F(τ, τ') is then done using Eqs. (49) and (50).

We note that when the interactions between the agents vanish, J → 0, one recovers the usual optimal control problem for independent agents. C and m are then related to the dynamical two point functions (connected and disconnected) for a typical agent. However, since J → 0, the final cost in Eq. (51) does not depend on these two point functions, as it should.

IV. CONCLUSIONS AND PERSPECTIVES

We introduced a simple model of a mean field game where the running cost depends on a set of random couplings describing heterogeneous interactions between the agents. We presented the analysis of the model using the replica method under the replica symmetric ansatz and deduced a set of stochastic HJB and Fokker-Planck equations describing the optimal strategy of an effective agent. The stochastic equations must be solved self-consistently, in a way that is close to what happens for the dynamical mean field theory of disordered systems [17–19]. The strategy of the computation can be extended to more complex cases. For example, one can consider a situation in which the interaction between the agents is multi-body (here we considered only the case in which it is two-body).

An interesting perspective of this work is the analysis of the solution of the stochastic PDEs that we have obtained when the cost function contains non-analyticities. For example, one could have a terminal cost function that is non-analytic at some points. In this case one could think that the kernels D and F have a scaling behavior that regularizes the solution of the HJB close to the non-analytic points.
An interesting case of this kind could be when the phase space for the trajectories of the stochastic processes contains obstacles and inaccessible regions which could create bottlenecks resulting in crowding effects. For example, one may think of generalizing the problem to agents moving in higher dimensions and placing forbidden regions along their trajectories. Close to the boundary of these regions one could have interesting behaviors. Finally, we did not attempt a numerical solution of the saddle point equations, which is left for future work. This goes together with the analysis of the stability of the replica symmetric assumption that we used to derive the effective single agent optimal control problem and whose validity must be self-consistently checked.

V. ACKNOWLEDGMENTS

This work was supported by "Investissements d'Avenir" LabEx PALM (ANR-10-LABX-0039-PALM). The author warmly thanks Cesare Nardini for useful discussions.

[1] O. Guéant, J.-M. Lasry, and P.-L. Lions, in Paris-Princeton lectures on mathematical finance 2010 (Springer, 2011) pp. 205-266.
[2] P. Cardaliaguet, Notes on mean field games, Tech. Rep. (Technical report, 2010).
[3] M. Huang, R. P. Malhamé, and P. E. Caines, Communications in Information & Systems, 221 (2006).
[4] J.-M. Lasry and P.-L. Lions, Japanese Journal of Mathematics, 229 (2007).
[5] D. A. Gomes and J. Saúde, Dynamic Games and Applications, 110 (2014).
[6] F. Cucker and S. Smale, IEEE Transactions on Automatic Control, 852 (2007).
[7] R. Carmona, F. Delarue, D. Lacker, et al., The Annals of Probability, 3740 (2016).
[8] G. Conforti, A. Kazeykina, and Z. Ren, arXiv preprint arXiv:2004.02457 (2020).
[9] F. Delarue, ESAIM: Proceedings and Surveys, 1 (2017).
[10] M. Mézard, G. Parisi, and M. A. Virasoro, Spin glass theory and beyond (World Scientific, Singapore, 1987).
[11] H. J. Kappen, Physical Review Letters, 200201 (2005).
[12] H. J. Kappen, Journal of Statistical Mechanics: Theory and Experiment, P11011 (2005).
[13] D. Ullmo, I.
Swiecicki, and T. Gobron, Physics Reports, 1 (2019).
[14] R. E. Bellman and S. E. Dreyfus, Applied dynamic programming (Princeton University Press, 2015).
[15] W. H. Fleming, Applied Mathematics and Optimization, 329 (1977).
[16] A. Bray and M. Moore, Journal of Physics C: Solid State Physics, L655 (1980).
[17] H. Eissfeller and M. Opper, Physical Review Letters, 2094 (1992).
[18] A. Georges, G. Kotliar, W. Krauth, and M. J. Rozenberg, Reviews of Modern Physics 68