Relative wealth concerns with partial information and heterogeneous priors

Abstract

We establish a Nash equilibrium in a market with N agents whose performance criterion is the relative wealth level when the market return is unobservable. Each investor has a random prior belief about the return rate of the risky asset. The investors can be heterogeneous in both the mean and the variance of the prior. By a separation result and a martingale argument, we show that the optimal investment strategy under a stochastic return rate model can be characterized by a fully-coupled linear FBSDE. Two sets of deep neural networks are used for the numerical computation, first to find each investor's estimate of the mean return rate and then to solve the FBSDEs. We establish the existence and uniqueness result for the class of FBSDEs with stochastic coefficients and solve the utility game under partial information using deep neural network function approximators. We demonstrate the efficiency and accuracy by a base-case comparison with the solution from the finite difference scheme in the linear case and apply the algorithm to the general case of a nonlinear hidden variable process. Simulations of investment strategies show a herd effect: investors trade more aggressively under relative wealth concerns. Statistical properties of the investment strategies and the portfolio performance, including the Sharpe ratios and the Variance Risk ratios (VRRs), are examined. We observe that the agent with the most accurate prior estimate is likely to lead the herd, and that the effect of competition on heterogeneous agents varies more with market characteristics than in the homogeneous case.


arXiv [q-fin.PM]

CHAO DENG, XIZHI SU AND CHAO ZHOU

Keywords: Portfolio allocation; Relative wealth concerns; Partial information; FBSDE; Deep neural networks.

Mathematics Subject Classification (2010):

JEL Classification: G11, C73

Contents

1. Introduction
2. Market model and the control problem
3. Nash equilibrium by FBSDE
4. Existence and Uniqueness of the FBSDE solution
5. Deep learning algorithms
6. Numerical results and model implications
7. Conclusion and further remarks
Acknowledgements
Appendices
References

Date: July 24, 2020.

1. Introduction

This paper contributes to the theory of portfolio optimization under partial information and relative wealth criteria, and to the theory of forward-backward stochastic differential equations (FBSDEs for short). For the former, we establish a system of stochastic equations whose solution corresponds to the value function and the optimal control. The information is updated by a general filter, which may be nonlinear. We show the uniqueness of the solution to the fully coupled multi-dimensional FBSDE under certain boundedness assumptions on the generator coefficients. We are the first to use a deep learning method to solve for the portfolio allocation strategy in a utility game, and we explicitly examine the strategies under various market conditions, investors' risk preferences, and informational heterogeneity. In combination with the martingale approach, the deep learning algorithm explores the structural features of the controlled process. On the theoretical side, we are the first to use the variational FBSDE in the multi-dimensional case to solve the N-equation system that characterizes the Nash equilibrium of the utility game. This transformation motivates future applications to the analysis of coupled systems such as mean field games in a general non-Markovian setting. Simulation of the optimal portfolio and the wealth process reveals novel insights into the effect of information heterogeneity under the relative utility. The investors interact through competition, and the investor with the most accurate information is likely to be the leader.

In practice, a fund can use the market average as a benchmark and measure its performance by how much it outperforms that benchmark. This is the case of an investment with competition. We also refer to information incompleteness as partial information. Our market consists of a risk-free bond and $d$ stocks, $S = (S^1, \dots, S^d)$, for some integer $d < \infty$. For simplicity, we assume the risk-free interest rate $r = 0$.
The stock prices are continuous processes adapted to the filtration $\mathcal{F}_t$ on a filtered probability space $(\Omega, (\mathcal{F}_t)_{t \le T}, P)$. Each stock $S^i$ has a return rate that depends on the stock fundamentals, modeled by a hidden variable $A_t \in C([0,T], \mathbb{R}^l)$. $W$ and $B$ are independent standard Brownian motions adapted to $\mathcal{F}$, with $W \in C([0,T], \mathbb{R}^d)$ and $B \in C([0,T], \mathbb{R}^l)$ for an integer $l < \infty$. The stock processes and hidden variable processes have the following dynamics:

\[ \frac{dS^i_t}{S^i_t} = h_i(A_t)\,dt + \sum_{j=1}^{d} \sigma^{ij}_w\,dW^j_t + \sum_{j=1}^{l} \sigma^{ij}_h\,dB^j_t \quad \text{(observed)}, \tag{1.1} \]

\[ dA_t = \mu(A_t)\,dt + m(A_t)\,dB_t \quad \text{(hidden)}, \tag{1.2} \]

where the initial condition of the stochastic differential equation (SDE) (1.2), denoted by $A_0$, is also unobserved and independent of the Brownian motions $W$ and $B$. The coefficients $h(a)$, $\mu(a)$ and $m(a)$ are continuously differentiable and Lipschitz continuous. These conditions ensure the existence and uniqueness of a strong solution to the above SDEs. The dependence of stock returns on the hidden variable $A_t$ is through the function $h(a)$. We further assume the diffusion coefficient is positive definite and uniformly elliptic.

The observable filtration is the one generated by the stock prices. We denote by $\mathcal{F}^S_t$ the $\sigma$-algebra generated by $(S_u)_{u \le t}$. Clearly, $\mathcal{F}^S \subset \mathcal{F}$. In the following discussion, we abbreviate $h_i(A_t)$ as $h^i_t$, and write $\hat h^i_t = E[h_i(A_t) \mid \mathcal{F}^S_t]$. Each investor is aware of the model of the hidden variable and the stock prices. Her initial belief about the distribution of the return is a normally distributed random variable; we write $\hat h_i(A_0) \sim N(m_i, v_i)$ if investor $i$'s initial belief is normal with mean $m_i$ and variance $v_i$.
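The dynamics (1.1)-(1.2) can be illustrated with a minimal Euler-Maruyama sketch for the scalar case $d = l = 1$. The function name `simulate_market`, the step count, and the example coefficients (an Ornstein-Uhlenbeck hidden factor with $h(a) = a$) are illustrative assumptions, not values taken from the paper.

```python
import math
import random


def simulate_market(h, mu, m, a0, s0, sigma_w, sigma_h,
                    T=1.0, n_steps=250, seed=0):
    """Euler-Maruyama paths of the hidden factor A (1.2) and the
    stock price S (1.1) in the scalar case d = l = 1."""
    rng = random.Random(seed)
    dt = T / n_steps
    a, s = a0, s0
    A, S = [a], [s]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, math.sqrt(dt))  # noise seen only by the stock
        dB = rng.gauss(0.0, math.sqrt(dt))  # noise shared by stock and factor
        # Return over [t, t + dt]: drift h(A_t) plus both noise sources.
        ret = h(a) * dt + sigma_w * dW + sigma_h * dB
        s *= 1.0 + ret
        # The hidden factor is driven by the same increment dB.
        a += mu(a) * dt + m(a) * dB
        A.append(a)
        S.append(s)
    return A, S


# Assumed example: mean-reverting hidden factor with h(a) = a.
lam, mu_bar, sigma_a = 2.0, 0.05, 0.1
A, S = simulate_market(h=lambda a: a,
                       mu=lambda a: -lam * (a - mu_bar),
                       m=lambda a: sigma_a,
                       a0=0.05, s0=1.0, sigma_w=0.2, sigma_h=0.05)
```

Only the path `S` would be visible to an investor; `A` plays the role of the hidden variable that has to be filtered from the observed prices.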

To write stock prices under partial information in a complete market form, define the innovation process as

\[ \nu^i_t = \int_0^t \Big( \frac{dS^i_u}{S^i_u} - \hat h^i_u\,du \Big) = \sigma^i \zeta^i_t, \]

where $\zeta^i_t$ is an $\mathcal{F}^S_t$-adapted standard Brownian motion. With the total volatility $\sigma = (\sigma_w \sigma_w^\top + \sigma_h \sigma_h^\top)^{1/2}$, the stock dynamics (1.1) can be written as

\[ \frac{dS_t}{S_t} = \hat h_t\,dt + \sigma\,d\zeta_t. \]

The objective of an individual investor or a fund manager is to find a portfolio allocation strategy over the available assets that maximizes the expected utility, which depends on the amount by which the investor's wealth exceeds the average of all investors at the terminal time $T$. The investors are of CARA type; mathematically, the utility function is

\[ U_i(X^i_T, \bar X_T) = -e^{-\frac{1}{\delta_i}\left(X^i_T - \theta_i \bar X_T\right)}, \quad \text{where } \bar X_T = \frac{1}{N}\sum_{k=1}^{N} X^k_T. \tag{1.3} \]

The parameters $\delta_i > 0$ and $\theta_i \in [0,1]$ represent the $i$-th agent's absolute risk tolerance and competition weight. A high value of $\delta$ implies a high risk tolerance which, in general, induces an aggressive investment strategy. The case $\theta = 0$ corresponds to an investor with no relative wealth concern. We adopt the convention for CARA utility of denoting the dollar amount of investment by $\pi_t \in \mathbb{R}^d$. We aim to identify a Nash equilibrium $\pi^* = (\pi^{1,*}_t, \dots, \pi^{N,*}_t)_{t \in [0,T]}$. The strategy is optimal in the sense that no one is better off by unilaterally deviating from it.

When the return process is linear Gaussian, we can derive the dynamics of the estimated return rate and then use the PDE approach to solve the investor's problem. The value function depends on the Markovian state variables, which consist of the investor's wealth and the estimated return rate. We then derive an HJB equation for the value function. For exponential utility, we can reduce the dimension of the PDE. The resulting PDE depends only on the spatial variable $\hat h_t$. In other words, increasing the number of agents in the game does not increase the dimension of the problem. We obtain an analytical solution for the value function, and hence the equilibrium strategy. The strategies of all investors can be solved through a linear system whose coefficients depend on risk preferences, observable market parameters, estimates of market returns, and the investment horizon.

BSDE is an essential tool for the problem of a single investor under partial information. In the case of a nonlinear hidden variable process, the mean return cannot be written as a deterministic function of any finite-dimensional state. Therefore, the control problem is non-Markovian. Using a non-standard martingale representation theorem, we can write an $\mathcal{F}^S$-adapted martingale as a stochastic integral against the innovation process. We derive a non-Markovian one-dimensional BSDE similar to the one in [24].
Combining the one-dimensional BSDEs derived for the single-agent problems, we obtain a multi-dimensional fully coupled FBSDE, by which the terminal condition for the unidimensional BSDE is endogenously determined.

The Nash equilibrium is unique under certain assumptions on the market parameters and risk preferences. The uniqueness of the equilibrium follows from the uniqueness of the FBSDE solution. Since the return parameter in the FBSDE comes from estimation, it is bounded if the investor has prior knowledge about the range of the true return process. Under this assumption, the generator $f$ is linear and Lipschitz, and so are the drift and volatility of the forward wealth process, which we denote by $g(t, \cdot)$ and $\sigma(t, \cdot)$, respectively. We show the uniqueness of the solution for the fully coupled FBSDE in this case. Furthermore, since $\sigma(t, \cdot)$ can be degenerate, many of the existing results on the well-posedness of FBSDEs do not apply. However, an important feature of this FBSDE is that the forward equation does not depend on the $Y$ component of the backward equation. We can therefore apply the main theorem in [52] to establish the existence and uniqueness of the solution.

We numerically solve this multi-dimensional FBSDE by a deep learning method. The numerical scheme is conducted in two stages. In Stage I, we estimate the stock returns using an $L^2$ projection. A recurrent neural network (RNN) is used in order to exploit the time series feature of the input and to facilitate sequential learning. The RNN first produces the hidden state, which is then transformed by a linear map into the final output corresponding to the estimate at each discrete time step. The RNN as a function approximator takes the stock paths from time 0 up to time $T$, as well as the investor's initial belief, as the input. However, the estimate at each time step depends only on the past stock prices and the investor's initial belief.
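The causal structure of Stage I can be sketched with a small untrained recurrent cell: a tanh hidden state followed by a linear read-out, so that the estimate at step $k$ uses only returns up to step $k$ and the prior. The architecture, the hidden size, the input layout (return, prior mean, prior variance) and all names below are hypothetical choices for illustration; the network actually used in the paper is not specified in this section.

```python
import math
import random


def init_rnn(hidden=4, n_inputs=3, seed=0):
    """Random (untrained) weights of a minimal recurrent cell."""
    rng = random.Random(seed)
    u = lambda: rng.uniform(-0.5, 0.5)
    return {
        "W_in": [[u() for _ in range(n_inputs)] for _ in range(hidden)],
        "W_h": [[u() for _ in range(hidden)] for _ in range(hidden)],
        "w_out": [u() for _ in range(hidden)],
        "b_out": u(),
    }


def rnn_estimates(params, returns, prior_mean, prior_var):
    """One return estimate per time step, using only past observations."""
    hidden = len(params["w_out"])
    state = [0.0] * hidden
    estimates = []
    for r in returns:
        x = (r, prior_mean, prior_var)
        # New hidden state from the current input and the previous state.
        state = [
            math.tanh(
                sum(w * xi for w, xi in zip(params["W_in"][k], x))
                + sum(w * sj for w, sj in zip(params["W_h"][k], state))
            )
            for k in range(hidden)
        ]
        # Linear map from hidden state to the estimate at this step.
        estimates.append(
            sum(w * s for w, s in zip(params["w_out"], state)) + params["b_out"]
        )
    return estimates


def l2_loss(estimates, true_drift):
    """Mean squared error, the sample analogue of the L^2 projection."""
    return sum((e - h) ** 2 for e, h in zip(estimates, true_drift)) / len(estimates)
```

Training would minimize `l2_loss` over simulated paths for which the hidden drift is known; changing a future return leaves all earlier estimates unchanged, which is the adaptedness property the text describes.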
The estimated return process appears in the drift and diffusion terms of the forward equation and in the generator of the backward equation. In Stage II, we solve the FBSDE, again using neural networks as function approximators. The algorithm in this step is similar to [18]. The FBSDE coupling requires no extra care in designing the neural network structure. However, the loss function must include a terminal condition that is a function of the wealth process, which is also computed from the NN parameters. We call the difference between the parametrized function and the forward simulation at the terminal time the terminal loss. Experiments with the deep learning scheme on different sets of model parameters show the efficiency and robustness of our method. The deep learning method is flexible in that it allows nonlinear filters. Moreover, deep learning can be easily adapted to multi-asset cases, where most numerical schemes fail due to the explosion of the number of grid points as the problem dimension grows, also known as the curse of dimensionality.

Investment strategies are compared through time series statistics. With our choice of market and risk parameters, in the linear Gaussian case, the standard deviation of the absolute value of the investment strategy is larger under full information. Competition increases the mean and the standard deviation. We also compute the coefficient of variation (CV) as the ratio between the standard deviation (Std) and the mean. Averaged over three agents, competition does not change the CV significantly, which means that the increase in the volatility of the investment strategy is mainly attributable to the increase in the value itself rather than to the variation across time. The Sharpe ratio and the VRR further illustrate the performance of the optimal strategies for the utility game.
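The CV statistic described above can be computed as in the following sketch. Using the population standard deviation, and applying it to the absolute value of the strategy path, are assumptions about conventions that the text does not spell out.

```python
import statistics


def coefficient_of_variation(pi_path):
    """Std / Mean of the absolute value of an investment strategy path."""
    abs_vals = [abs(p) for p in pi_path]
    return statistics.pstdev(abs_vals) / statistics.fmean(abs_vals)


# Scaling a strategy by a constant changes its mean and Std but not its CV.
base = [1.0, 2.0, 3.0]
scaled = [2.0 * p for p in base]
cv_base = coefficient_of_variation(base)
cv_scaled = coefficient_of_variation(scaled)
```

The scale invariance shown here is what makes the CV useful for separating "larger positions" from "more variation across time".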
In the case of nonlinear filters, we find that the CV is significantly smaller under partial information for the three sets of market parameters, indicating that the strategies are less volatile when the investors estimate returns from the market. Competition may reduce the average CV. The numerical results are presented in Section 6. Comparing the cases with linear and nonlinear filters, we observe that the agent with the most accurate prior estimate is likely to lead the herd, and that the effect of competition on heterogeneous agents varies more with market characteristics. More generally, agent heterogeneity exploits market properties. This finding provides an additional reason to introduce information heterogeneity with agents' interactions into the portfolio model.

1.1. Literature review.

Competition among fund managers stems from various incentives, from career advancement motives to the purpose of seeking clients. The empirical works [8], [2] and [9] have documented the phenomenon. However, competition among a group of more than two managers has hardly been considered. Studies have also shown the importance of relative concerns in financial economics, including [1], where the utility is defined over relative consumption levels. With a time-separable utility function, [21] shows that the Joneses behavior yields portfolio bias when the agents face non-diversifiable risks. [16] examined empirical implications of relative wealth concerns, providing an explanation for financial bubbles. The work [5] analyzed the effect of social interactions when a market derivative is traded to share the risk among investors with relative performance concerns. The above works obtain the herd effect through risk-sharing motives or relative utilities. A recent work that includes private information in the model is [43]. It analyzed the informative trading of managers and the implications for the efficiency of asset prices using a one-period mean-variance criterion. Its focus is on price informativeness under different information structures, whereas our focus is on the dynamic investment strategy with informationally heterogeneous agents.

The optimal investment problem was initiated by Merton in the 1970s [38] as part of asset pricing theory. A large body of literature has been developed since then. Classical works include [42], [27], and [12], among others. The duality approach was developed in [29] and [45] and has become a useful tool for solving incomplete market models, with [39] a recent application with partial information. Problems with portfolio constraints were studied in [13] and [51], among others. For other models with market frictions, the transaction cost case was considered in [35], [15], and [48]. [14] first studied the portfolio problem with both transaction costs and position limits.

The paper [30] used the PDE approach to solve an N-agent game. They established a Nash equilibrium under which all agents maximize their utilities with relative performance concerns for a variety of standard time-separable utility functions.
The HJB equation can be derived when the agents are of the CARA or the CRRA type. However, the PDE method has its limitations. It is restricted to the case of a deterministic mean return process and may fail in our market model with partial information and information filtering. In [30], the idiosyncratic noise is specific to an individual stock that is available only to a particular agent. In our model, the same set of stocks driven by the common noise is available to all agents. Our model can be modified without much difficulty to accommodate the case where the stocks are driven by the common noise but agent $i$ invests only in stock $i$, with return $b^i$ and stock price $S^i$.

Initial works on consumption-portfolio choice and asset pricing under partial information include [20]. The separation principle holds in the linear Gaussian setting: the investor's optimal decision is equivalent to first estimating and then optimizing. The HJB equation for the optimization step involves the estimated return as a new state variable. An early work [17] derived the equilibrium asset price under the assumption of symmetric information. [4] built an asset pricing model with investors of heterogeneous beliefs, although the agents do not interact through a utility game as in our model.

Based on a martingale representation theorem, [28] reduced the partial information portfolio optimization problem to a complete market problem. With linear Gaussian filtering, partial information was studied in [7], where the loss of utility due to incomplete information was quantified. Partial information also appears in [31] for the optimization of pairs trading strategies, and in [49] for recursive optimization. [41] applied martingale and duality theory to the case of stochastic volatility, although without explicitly solving for the optimal strategy.

For information models, the nonlinear filter has been considered less often than the linear one. The Wonham filter is used to estimate the states of a Markov chain.
[44] and [47] studied partial information with regime switching, with [44] using the PDE approach and [47] using the martingale approach. The latter is in fact more general since it allows stochastic interest rates and multiple assets. [36] considered the exponential utility under partial information in a general semimartingale framework where the available information is part of the filtration generating the stock prices. It shows the equivalence to a new optimization problem formulated in terms of the observable processes. By reducing to the complete market case, [6] explicitly solved for the optimal strategies for various utility functions under partial information. In addition to the Merton proportion, the strategy includes a hedging demand for the volatility of the return process. [39] used results from filtering, duality, and BSDE theory to solve the investment problem, which includes the case of an unbounded mean return. It argues that the BSDE solution is the unique limit of solutions to a sequence of truncated problems with unique solutions obtained by a martingale representation theorem. In our model, extra difficulty arises due to the FBSDE coupling, and a uniqueness result for unbounded returns in the general setting could not be obtained similarly.

The work [40] established a correspondence between path-dependent HJB equations and non-Markovian control problems. The technique of using BSDEs to solve the portfolio maximization problem was first introduced by [46] and further developed by [24]. Both works consider the portfolio problem by indifference pricing for a contingent claim. More recently, [37] applied the BSDE approach to robust utility maximization, and second-order BSDEs to the robust problem under volatility uncertainty.

Compared to BSDEs, the theory of coupled FBSDEs was developed more recently. Antonelli [3] first obtained a result on the solvability of an FBSDE over a “small” time duration.
Later, the Four Step Scheme in [33] and the Method of Continuation in [25] and [50] were used to establish well-posedness results on an arbitrary time duration. The main result used in this paper is from [52], which covers the case of the fully-coupled equation with a degenerate diffusion coefficient. Classical methods for solving BSDEs include the Monte Carlo method. More recently, [18] introduced a numerical method using deep neural networks (NNs). The neural networks approximate the conditional expectations after time discretization. [22] proved the theoretical convergence of the deep learning (DL) algorithm for coupled FBSDEs using properties of the discretized equations. A backward scheme that also treats the deep neural networks as function approximators was developed in [26], where theoretical convergence of the numerical scheme was also shown. A deep NN based method to solve the mean field game (MFG) is given in [10] and [11], for the ergodic and the finite horizon case, respectively. A systematic study of the performance of the DL method for solving (F)BSDEs with varying hyperparameters and network structures is still missing, as is an analysis of the convergence and stability of the optimization with stochastic gradient descent (SGD). In our paper, the solution to the FBSDE is unique; hence we regard the loss value as an indicator of the training accuracy, and the convergence is justified by the theoretical features of the FBSDE and the empirical success of the DL algorithm.

Using a martingale approach, [24] derived BSDEs for investors with portfolio constraints when the return is stochastic and uniformly bounded. Due to the boundedness of the coefficients, the existence and uniqueness result is classical. [19] studied an investment problem with relative performance concerns, and the argument relies on [24]. The paper identified the market average as the payoff of a contingent claim and solved the problem in the utility indifference pricing framework.
The generator was quadratic due to the portfolio constraint. For deterministic mean returns, they were able to derive an analytical solution for the N-agent equilibrium and show by verification that the solution exists. The uniqueness result for stochastic coefficients is not available since the FBSDE was fully coupled with quadratic generators. In our model, the equation is simpler, with linear coefficients.

The following table summarizes the works and corresponding methods mentioned above that are most closely related to this paper.

Table 1. Comparison of works

| Paper | Hu, Imkeller, Müller | Espinosa, Touzi | Zariphopoulou, Lacker | This paper |
|---|---|---|---|---|
| Utility | exponential, power, log | general utilities | exponential, power, log | exponential |
| Game | No | Yes | Yes | Yes |
| Class of return process | Stochastic | Deterministic | Deterministic | Stochastic |
| Main method | BSDE | FBSDE | PDE | FBSDE |
| Analytical solution | No | Yes | Yes | No |
| Adapt to portfolio constraints | Yes | Yes | No | Yes |
| Numerical solution | No | Yes | Yes | Yes |
| Learning | No | No | No | Yes |

2. Market model and the control problem

2.1. Preliminaries on market model.

For simplicity and without loss of generality, we assume that the risk-free interest rate $r = 0$. The price dynamics for stock $i$ are

\[ \frac{dS^i_t}{S^i_t} = h_i(A_t)\,dt + \sum_{j=1}^{d} \sigma^{ij}_w\,dW^j_t + \sum_{j=1}^{l} \sigma^{ij}_h\,dB^j_t, \tag{2.1} \]

for $i \in \{1, \dots, d\}$. $\sigma_w \in \mathbb{R}^{d \times d}$ and $\sigma_h \in \mathbb{R}^{d \times l}$ are constants. The Brownian motions $W$ and $B$ are $\mathbb{R}^d$-valued and $\mathbb{R}^l$-valued, respectively, and independent of each other. The functions $h_i$ are $\mathbb{R}$-valued, with forms to be specified.

Following the partial information model in [39], we consider two classes of filters. Recall that the relation between the asset return rate and the hidden variable is specified by the function $h(a)$. The following examples consider different dimensionalities of the hidden variable $A_t$.

Example 1 (Multi-dimensional case). Suppose $A_t$ is $\mathbb{R}^{d_L}$-valued and $h(A_t)$ is linear in $A_t$. Suppose $A_t$ follows an Ornstein-Uhlenbeck (OU) process that reverts to a constant $\bar\mu \in \mathbb{R}^{d_L}$:

\[ dA_t = -\lambda (A_t - \bar\mu)\,dt + \sigma_a\,dB_t, \tag{2.2} \]

where $\bar\mu \in \mathbb{R}^{d_L}$, $\sigma_a \in \mathbb{R}^{d_L \times l}$ and $B$ is the $l$-dimensional standard Brownian motion. The function $h$ is linear with $h(a) = (w_L)^\top a + c_L$ for $w_L \in \mathbb{R}^{d_L}$ and $c_L \in \mathbb{R}$. Based on the stock prices, investors update beliefs according to a Kalman filter (KF); refer to [32] for a general theory of the Kalman-Bucy filter.

Our framework of FBSDE is general enough to include the class of nonlinear filters, as in the following examples.

Example 2 (One-dimensional case).

\[ dA_t = -\lambda (A_t - \bar\mu)\,dt + \sigma_a \sqrt{(A_t - a_l)(a_u - A_t)}\,dB_t. \tag{2.3} \]

Remark 1. In the above dynamics, $A_t$ is essentially bounded between $a_l$ and $a_u$.

Example 3 (One-dimensional case). The hidden variable $A_t$ follows the Cox-Ingersoll-Ross (CIR) process that is mean reverting to $\bar\mu$:

\[ dA_t = -\lambda (A_t - \bar\mu)\,dt + \sigma_a \sqrt{A_t}\,dB_t. \tag{2.4} \]

For the following discussion, $|\cdot|$ denotes the Euclidean norm in $\mathbb{R}^m$, $m \in \mathbb{N}$. For $p > 0$, $L^p$ denotes the set of $\mathcal{F}_T$-measurable random variables $F$ such that $E[|F|^p] < \infty$. For $k \in \mathbb{N}$, $H^k(\mathbb{R}^d)$ denotes the set of all $\mathbb{R}^d$-valued stochastic processes $\phi$ that are predictable with respect to $\mathcal{F}$ and such that $E[\int_0^T |\phi_t|^k\,dt] < \infty$. $H^\infty(\mathbb{R}^d)$ is the set of all $\mathcal{F}$-predictable $\mathbb{R}^d$-valued processes that are $\lambda \otimes P$-a.e. bounded on $[0,T] \times \Omega$, where $\lambda$ is the Lebesgue measure on $\mathbb{R}$. We write $\mathcal{E}(X)$ for the exponential martingale of $X$.

Recall that for $1 \le j \le d$, the process $\pi^j_t$ is the dollar amount invested in stock $j$ at time $t$. The number of shares to hold for stock $j$ is therefore $\pi^j_t / S^j_t$.

Assumption 1 (Uniformly elliptic). The total variance $\sigma\sigma^\top$ is bounded, i.e.,

\[ \frac{1}{\epsilon} \le \sigma\sigma^\top \le \epsilon < \infty \tag{2.5} \]

for some positive definite matrix $\epsilon$.

Condition 1. The process $h_t$, $t \in [0,T]$, satisfies the Novikov condition:

\[ E\left[ e^{\epsilon_h \int_0^T \|h_t\|^2\,dt} \right] < \infty \tag{2.6} \]

for some small constant $\epsilon_h$.

Remark 2. Since $h_t = h(A_t)$, the Novikov condition is satisfied if $h(a)$ is a square-root or power function and $A_t$ is essentially bounded. By Jensen's inequality,

\[ E\left[ e^{\int_0^T \epsilon_h \|\hat h_t\|^2\,dt} \right] \le \frac{1}{T} \int_0^T E\left[ e^{T \epsilon_h \|h_t\|^2} \right] dt. \]

Hence, provided the right-hand side is finite, all moments of the estimated return process are bounded.
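For the linear Gaussian setting of Example 1, the belief update can be sketched with an Euler-discretized scalar Kalman-Bucy filter. The sketch makes a simplifying assumption that is not part of the model above: the observation noise is taken to be independent of the noise driving $A_t$ (that is, $\sigma_h = 0$), so that the standard filter equations apply. The function name, the parameter values, and the reading of the prior $(m_0, v_0)$ as a prior on $A_0$ are likewise illustrative.

```python
import math


def kalman_bucy(dY, dt, lam, mu_bar, sigma_a, w, c, sigma, m0, v0):
    """Euler-discretized scalar Kalman-Bucy filter.

    Signal:      dA = -lam * (A - mu_bar) dt + sigma_a dB
    Observation: dY = (w * A + c) dt + sigma dW, with W independent of B
                 (an assumption made for this sketch).
    Returns the conditional mean and variance paths of A.
    """
    a_hat, P = m0, v0
    est, var = [a_hat], [P]
    for dy in dY:
        gain = P * w / sigma ** 2
        innovation = dy - (w * a_hat + c) * dt
        a_hat += -lam * (a_hat - mu_bar) * dt + gain * innovation
        # Riccati equation for the conditional variance.
        P += (-2.0 * lam * P + sigma_a ** 2 - (w * P / sigma) ** 2) * dt
        est.append(a_hat)
        var.append(P)
    return est, var


def steady_state_variance(lam, sigma_a, w, sigma):
    """Positive root of the stationary Riccati equation."""
    return sigma ** 2 * (-lam + math.sqrt(lam ** 2 + (w * sigma_a / sigma) ** 2)) / w ** 2
```

Here `dY` would be the observed returns $dS_t / S_t$, and the estimated return is $\hat h_t = w \hat A_t + c$. The variance path does not depend on the data, so two investors with different prior variances end up with the same long-run estimation error.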

2.2. Objective function under relative wealth concerns.

We consider the market with $N$ investors, each with constant absolute risk aversion (CARA), or exponential, risk preference. Investors are concerned about their performance, measured by their wealth relative to the average of market investors at a future time $T$. The market average is modeled by a random payoff at the terminal time in the utility function. Denote by $\bar X_t$ the average wealth of investors at time $t$, and by $X^i_0 = x^i \in \mathbb{R}$ the $i$-th investor's initial wealth. Since we refer to the problem as an N-agent utility game, we use the words agent and investor interchangeably in this paper.

The relative utility function for the CARA investor $i$ is

\[ U_i(X^i_T, \bar X_T) = -e^{-\frac{1}{\delta_i}\left(X^i_T - \theta_i \bar X_T\right)}, \quad \text{where } \bar X_T = \frac{1}{N}\sum_{k=1}^{N} X^k_T. \]

The objective function for an arbitrary investor with risk parameters $(\delta, \theta)$ and wealth $X_t$ at time $t$ is

\[ J(t, \pi^1, \dots, \pi^N) = E\left[ -e^{-\frac{1}{\delta}\left(X_T - \theta \bar X_T\right)} \right], \tag{2.7} \]

where $\delta > 0$ is the personal risk tolerance and $\theta \in [0,1]$ is the investor's competition weight parameter. In the following discussion, the superscript $i$ indicates variables for investor $i$.

Write $\tilde X^i_t = \frac{1}{N}\sum_{j \ne i} X^j_t$. By simple algebra, the objective can be written as

\[ \begin{aligned} J(t, \pi^1, \dots, \pi^N) &= E\left[ -e^{-\frac{1}{\delta_i}\left( \left(1 - \frac{\theta_i}{N}\right) X^i_T - \theta_i \tilde X^i_T \right)} \right] \\ &= E\left[ -e^{-\frac{1}{\delta_i}\left(1 - \frac{\theta_i}{N}\right)\left( X^i_T - \frac{N\theta_i}{N - \theta_i} \tilde X^i_T \right)} \right]. \end{aligned} \tag{2.8} \]
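The algebra behind (2.8) can be checked numerically with the short sketch below. It assumes the reading of $\delta_i$ as a risk tolerance, so that $1/\delta_i$ multiplies the exponent; the function names and sample values are illustrative.

```python
import math


def relative_utility(i, X_T, delta, theta):
    """CARA utility of agent i over wealth relative to the market average."""
    N = len(X_T)
    average = sum(X_T) / N
    return -math.exp(-(X_T[i] - theta[i] * average) / delta[i])


def relative_utility_factored(i, X_T, delta, theta):
    """Same utility written as in the second line of (2.8)."""
    N = len(X_T)
    # Sum of the other agents' wealth, still divided by N.
    x_tilde = sum(x for j, x in enumerate(X_T) if j != i) / N
    scale = 1.0 - theta[i] / N
    benchmark = N * theta[i] / (N - theta[i]) * x_tilde
    return -math.exp(-scale * (X_T[i] - benchmark) / delta[i])


X_T = [1.2, 0.7, 1.5]
delta = [1.0, 0.5, 2.0]
theta = [0.3, 0.0, 1.0]
pairs = [(relative_utility(i, X_T, delta, theta),
          relative_utility_factored(i, X_T, delta, theta))
         for i in range(len(X_T))]
```

Each pair in `pairs` agrees up to rounding, and for an agent with $\theta_i = 0$ the expression reduces to the standard exponential utility of own wealth.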

The value function for agent $i$ with initial wealth $x^i$ is

\[ V(0, x^i) = \sup_{\pi^i \in \mathcal{A}_i} E\left[ -e^{-\frac{1}{\delta_i}\left(1 - \frac{\theta_i}{N}\right)\left( X^i_T - \frac{N\theta_i}{N - \theta_i} \tilde X^i_T \right)} \right]. \tag{2.9} \]

Admissible set. The set of admissible strategies $\mathcal{A}_i$ for agent $i$ is the set of all predictable processes $\pi = (\pi_t)_{0 \le t \le T}$ such that
(1) $E\left[\int_0^T |\pi_t \sigma|^2\,dt\right] < \infty$;
(2) the set $\{ e^{\pm X^{i,\pi}_\tau} : \tau \text{ is a stopping time with values in } [0,T] \}$ is uniformly bounded in $L^q(P)$ for all $q > 1$.

Definition 1 (Nash equilibrium). A vector $(\pi^{1,*}, \dots, \pi^{N,*})$ of admissible strategies is a Nash equilibrium if, for all $\pi^i \in \mathcal{A}_i$, $i \in \{1, \dots, N\}$,

\[ J_i(\pi^{1,*}, \dots, \pi^{i,*}, \dots, \pi^{N,*}) \ge J_i(\pi^{1,*}, \dots, \pi^{i}, \dots, \pi^{N,*}). \]

3. Nash equilibrium by FBSDE

For simplicity of presentation, we set $d = 1$ for the rest of the discussion. The analysis adapts to the case of multiple common stocks with purely notational changes. See Remark 3 for more details.

3.1. Formal derivation of FBSDE.

Recall that when the terminal condition and $b_t := \frac{h_t}{\sigma}$ are bounded, [24] has shown that the investor's value function and optimal strategy (without constraint) correspond to the solution $(Y_t, Z_t)$ of the following BSDE:

\[ Y_t = F - \int_t^T Z_s\,dW_s - \int_t^T f(s, Z_s)\,ds, \quad t \in [0,T], \tag{3.1} \]

with the generator

\[ f(\cdot, z) = z b_t + \frac{\delta}{2} |b_t|^2 \tag{3.2} \]

and some bounded random variable $F$ that is the terminal condition of the BSDE. The optimal strategy is given by $\pi^*_t = \frac{p^*_t}{\sigma}$, where $p^*_t$ is linear in the $Z_t$ component of the BSDE solution,

\[ p^*_t = Z_t + \delta b_t, \quad t \in [0,T]. \tag{3.3} \]

The value function at the initial time is given by

\[ V(x) = -e^{-\frac{1}{\delta}(x - Y_0)}. \tag{3.4} \]

We now formulate the optimization problem under the relative performance criterion. The random variable $F$ represents the benchmark market average at the terminal time. The constant $\alpha_i$ captures the risk preference and competition concern of the $i$-th investor. More explicitly, taking $F = \frac{N\theta}{N - \theta} \tilde X_T$ and $\frac{1}{\delta} = \frac{1}{\delta_i}\left(1 - \frac{\theta_i}{N}\right)$ in (3.1) and (3.2), we obtain the objective function for the $i$-th investor. Suppose all agents adopt the strategies corresponding to $p^*$. We can then write explicitly the wealth process of agent $i$ in terms of $Z^i_t$ and the estimated market parameters:

\[ \begin{aligned} X^i_t &= x^i + \int_0^t \sigma^{-1} p^{i,*}_u\,\frac{dS_u}{S_u} \\ &= x^i + \int_0^t \left( Z^i_u + \delta b_u \right) \left( \frac{\hat h_u}{\sigma}\,du + d\zeta_u \right). \end{aligned} \tag{3.5} \]

Since $X^i$ may be unbounded, the terminal condition $F$ does not satisfy the condition in [24]. Therefore, the correspondence between our optimization problem and the above BSDE is not immediate.
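The link between the generator (3.2) and the strategy (3.3) can be checked with a small numerical sketch. It assumes the reading of $\delta$ as a risk tolerance, under which the drift term to be maximized pointwise in $p$ is $b\,p - f - \frac{1}{2\delta}|p - z|^2$; the function names are illustrative.

```python
def generator(z, b, delta):
    """Generator f(., z) = z * b + (delta / 2) * b**2, as in (3.2)."""
    return z * b + 0.5 * delta * b * b


def exponent_integrand(p, z, b, delta):
    """Drift term whose pointwise maximum over p should be zero."""
    return b * p - generator(z, b, delta) - (p - z) ** 2 / (2.0 * delta)


def optimal_p(z, b, delta):
    """Candidate maximizer p* = z + delta * b, as in (3.3)."""
    return z + delta * b


z, b, delta = 0.3, 0.4, 2.0
p_star = optimal_p(z, b, delta)
value_at_optimum = exponent_integrand(p_star, z, b, delta)
values_nearby = [exponent_integrand(p_star + eps, z, b, delta)
                 for eps in (-1.0, -0.1, 0.1, 1.0)]
```

Completing the square gives `exponent_integrand(p, ...) = -(p - z - delta * b)**2 / (2 * delta)`, which is zero at `p_star` and negative elsewhere, so the generator is exactly the value that makes the drift vanish at the optimum.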
We will show that under certain integrability conditions on the return rate, the solution to a single agent's investment problem is still characterized by the BSDE (3.1).

To write the system of BSDEs corresponding to each investor as a multi-dimensional equation, we introduce the vector notation

\[ X = \left( X^i \right)_{i \in \{1, \dots, N\}}, \tag{3.6} \]

where $X^i$ is an arbitrary random variable that corresponds to investor $i$. We introduce a matrix notation to compute the performance benchmark of the wealth amount:

\[ F = A X_T, \tag{3.7} \]

where $A$ is an N-by-N matrix that both averages and takes into account the relative concern of a particular agent. To be more specific,

\[ A = \begin{pmatrix} 0 & \frac{\theta_1}{N - \theta_1} & \cdots & \frac{\theta_1}{N - \theta_1} \\ \frac{\theta_2}{N - \theta_2} & 0 & \cdots & \frac{\theta_2}{N - \theta_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\theta_N}{N - \theta_N} & \frac{\theta_N}{N - \theta_N} & \cdots & 0 \end{pmatrix}. \tag{3.8} \]

From now on, let $f$ operate on $z$ componentwise, and let $\circ$ denote componentwise multiplication. If all agents solve the optimization problem, we can write

\[ f(\cdot, z) = z \circ b_t + \frac{\delta}{2} |b_t|^2. \tag{3.9} \]

Notice that if $b_t$ is bounded, then $f$ is Lipschitz.

The above derivation suggests that any Nash equilibrium strategy for $N$ agents with relative performance concerns corresponds to the following multi-dimensional FBSDE:

\[ X_t = x + \int_0^t \left( Z_s \circ b_s + \delta |b_s|^2 \right) ds + \int_0^t \left( Z_s + \delta b_s \right) \circ d\zeta_s, \tag{3.10} \]

\[ Y_t = A X_T - \int_t^T Z_s \circ d\zeta_s - \int_t^T f(s, Z_s)\,ds, \tag{3.11} \]

where $Z_s \circ d\zeta_s = \left( Z^i_s\,d\zeta^i_s \right)^\top_{i \in \{1, \dots, N\}}$.

Remark 3. As previously mentioned, in the present setting, agents invest in the same set of stocks. Hence all components of $b$ and $\zeta$ are identical. In the case where agent $i$ invests in stock $i$ while all the stocks are still driven by the common noise, the components of $b$ and the Brownian motion $\zeta$ will depend on agent $i$'s individual stock.
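The benchmark matrix in (3.8) can be built and checked against the single-agent terminal payoff $F^i = \frac{N\theta_i}{N - \theta_i} \tilde X^i_T$ as in the following sketch. The helper names and sample values are illustrative.

```python
def benchmark_matrix(theta):
    """N-by-N matrix A of (3.8): zero diagonal, theta_i / (N - theta_i) elsewhere in row i."""
    N = len(theta)
    return [[0.0 if i == j else theta[i] / (N - theta[i]) for j in range(N)]
            for i in range(N)]


def matvec(A, x):
    """Plain matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


theta = [0.2, 0.5, 1.0]
X_T = [1.0, 2.0, 4.0]
A = benchmark_matrix(theta)
F = matvec(A, X_T)

# Single-agent benchmark computed directly from the other agents' wealth.
N = len(theta)
F_direct = [N * theta[i] / (N - theta[i]) * (sum(X_T) - X_T[i]) / N
            for i in range(N)]
```

`F` and `F_direct` agree entry by entry, which is the sense in which $A$ both averages over the other agents and applies each agent's own competition weight.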

Theorem 1.

Suppose the return rate satisfies the Novikov condition. Let $\hat{h}^i_t = \hat{h}(A_t)$ be the estimated mean return of the $i$-th agent. If there exists a solution $(Y_t, Z_t)$ to the FBSDE (3.10)-(3.11), then there is a Nash equilibrium strategy $\pi^* = (\pi^{1,*}, \dots, \pi^{N,*})$ given by

\[ \pi^* = \sigma^{-1}\left(Z_t + \frac{1}{\delta} b_t\right). \tag{3.12} \]

In order to construct the investment strategy by the martingale approach under the observable filtration, we need the following lemma.

Lemma 1. Every martingale in $\mathcal{F}^{\zeta}$ is a martingale in $\mathcal{F}^{S}$.

Proof. Let $M$ be an $\mathcal{F}^{\zeta}$-martingale. By a martingale representation theorem in [6], the martingale $M^S_t = E\left[M_T \mid \mathcal{F}^S_t\right]$ has a unique representation:

\[ M^S_t = E[M_T] + \int_0^t M^S_u \sigma^M_u\,d\zeta_u. \tag{3.13} \]

Since $M^S_t$ is also a stochastic integral against $\zeta_t$, it is a martingale under $\mathcal{F}^{\zeta}_t$. Alternatively, we can write it as

\[ M^S_t = E\left[M_T \mid \mathcal{F}^{\zeta}_t\right] = M_t. \tag{3.14} \]

Since $M^S$ is an $\mathcal{F}^S$-martingale, so is $M$. □

We next prove that the solution to the FBSDE we constructed is indeed a Nash equilibrium. The above lemma is used to show that the martingale we will later construct is a martingale in the larger filtration $\mathcal{F}^S$, not only in $\mathcal{F}^{\zeta}$. From now on, we omit the superscript or subscript $i$ when there is no ambiguity.

Proof.

We first show that the solution to the unidimensional BSDE is the value function for a single agent's investment problem with terminal payoff $F = \frac{N\theta}{N-\theta}\tilde{X}_T$.

For investor $i$, following the proof in [24], first define for a strategy $p \in \mathcal{A}$ the process

\[ R^{(p)}_t := -e^{-\delta\left(X^{(p)}_t - Y_t\right)}, \quad t \in [0,T], \tag{3.15} \]

where $Y_t$ is a component of the solution to the unidimensional BSDE

\[ Y_t = F - \int_t^T Z_s\,d\zeta_s - \int_t^T f(s, Z_s)\,ds, \quad t \in [0,T]. \tag{3.16} \]

Define

\[ M^{(p)}_t := -e^{-\delta(x - Y_0)}\,\mathcal{E}\left(-\delta\int_0^t (p_s - Z_s)\,d\zeta_s\right), \]

which is a local martingale in $\mathcal{F}^{\zeta}$. For

\[ C^{(p)}_t := \exp\left(-\delta\int_0^t \left(b_s p_s - f(s, Z_s) - \frac{\delta}{2}|p_s - Z_s|^2\right) ds\right), \quad t \in [0,T], \]

we have $R^{(p)}_t = M^{(p)}_t C^{(p)}_t$. For $R^{(p)}_t$ to be a local martingale for some $p^*$ and a supermartingale for all $p \in \mathcal{A}$, we need $C^{(p)}_t$ decreasing and $C^{(p^*)}_t = 1$, $\mu_L \otimes P$-a.s., for $\mu_L$ the Lebesgue measure on $\mathbb{R}$ and some $p^*$. The exponent of $C^{(p)}_t$ is a quadratic function in $p$. The first-order condition $b_s - \delta(p_s - Z_s) = 0$ together with $C^{(p^*)}_t = 1$ yields

\[ f(\cdot, z) = z b_t + \frac{1}{2\delta}|b_t|^2, \tag{3.17} \]

and

\[ p^*_t = Z_t + \frac{1}{\delta} b_t, \quad t \in [0,T]. \tag{3.18} \]

Hence $C^{(p^*)}_t = 1$ and $R^{(p^*)}_t = M^{(p^*)}_t$ is a local martingale in $\mathcal{F}^{\zeta}$.

Since $b_t$ satisfies the Novikov condition, $M^{(p^*)}_t$ and hence $R^{(p^*)}_t$ are true martingales. The Novikov condition on $b_t$ also implies that $R^{(p^*)}_t$ is uniformly integrable. Hence, $R^{(p^*)}_{\tau}$ is uniformly bounded in $L^q(P)$, which implies that the strategy $p^*_t$ is admissible.

It remains to show that $R^{(p)}$ is a supermartingale for all $p \in \mathcal{A}$ under $\mathcal{F}^S$. Since the process $M^{(p)}_t$ is a local martingale in $\mathcal{F}^{\zeta}$, there exists a sequence of stopping times $(\tau_n)_{n \in \mathbb{N}}$ such that $\lim_{n \to \infty} \tau_n = T$, $P$-a.s., and $(M_{t \wedge \tau_n})_{t \in [0,T]}$ is a positive $\mathcal{F}^{\zeta}$-martingale for each $n \in \mathbb{N}$. By Lemma 1, they are $\mathcal{F}^S$-martingales. The process $\tilde{C}^{(p)}$ is decreasing. Thus $R^{(p)}_{t \wedge \tau_n}$ is an $\mathcal{F}^S$-supermartingale for each $n$. That is, for $s \le t$,

\[ E\left[R^{(p)}_{t \wedge \tau_n} \mid \mathcal{F}^S_s\right] \le R^{(p)}_{s \wedge \tau_n}. \]
Equivalently, for any set U ∈ F S , we have E h R ( p ) t ∧ τ n U i ≤ E h R ( p ) s ∧ τ n U i . Notice that the admissible condition implies that e X τ is uniformly bounded in L q ( P ). By theforward equation for X , we have e R τ ( Z s b s ds + Z s dζ s ) = e X τ − x − R τ δ | b s | ds − R τ δb s dζ s = e X τ · e − x − R τ δ | b s | ds − R τ δb s dζ s is uniformly bounded in L q ( P ).Furthermore, the solution for the backward equation Y has (cid:12)(cid:12)(cid:12) e Y τ (cid:12)(cid:12)(cid:12) q = k e Y + R τ Z s dζ s + R τ ( Z s b s + δ | b s | ) ds k q ≤ e qC Y · e qC R τ ( Z s dζ s + Z s b s ds ) · e qC R τ δ | b s | ds < ∞ . Therefore, e Y τ is uniformly bounded in L q ( P ).The process R ( p ) τ is a constant multiple of the product of e − δ X ( p ) τ and e δ Y τ . Combining thedeﬁnition of admissiblility and the above argument, we have(3.19) E [ e qR ( p ) τ ] ≤ C E [ e − r δ X ( p ) τ ] E [ e r ′ δ Y τ ] < ∞ where r = q and r + r ′ = q .we have n R ( p ) t ∧ t n o n and n R ( p ) s ∧ t n o n are uniformly bounded in L q ( P ) over n for all q >

0. Letting $n \to \infty$, we obtain $E\left[R^{(p)}_t \mathbf{1}_U\right] \le E\left[R^{(p)}_s \mathbf{1}_U\right]$, which implies the supermartingale property of $R^{(p)}$ as claimed. □

Explicit solution in case of linear ﬁlters.

In this section, we derive a PDE solution for linear Gaussian filters. Consider a model with the following stock dynamics, which corresponds to $d = 1$, $h(a) = a$, $\sigma_w = \sigma_S\sqrt{1-\rho^2}$ and $\sigma_h = \sigma_S\rho$, $\mu(a) = -\lambda(a - \bar{\mu})$, $m(a) = \sigma_\mu$ in (1.1)-(1.2),

\[ \frac{dS_t}{S_t} = A_t\,dt + \sigma_S\left(\sqrt{1-\rho^2}\,dW_t + \rho\,dB_t\right), \tag{3.20} \]

\[ dA_t = -\lambda(A_t - \bar{\mu})\,dt + \sigma_\mu\,dB_t, \quad t \in [0,T]. \tag{3.21} \]

In the following discussion, we adopt the notation $\hat{h}_t = \hat{h}(A_t) = E\left[h(A_t) \mid \mathcal{F}^S_t \vee \hat{h}(A_0)\right]$. From filtering theory, the innovation process

\[ \nu_t = \int_0^t \left(\frac{dS_u}{S_u} - \hat{h}_u\,du\right) \tag{3.22} \]

is a scaled Brownian motion under $\mathcal{F}^S_t$, i.e., $\zeta_t = \sigma^{-1}\nu(t)$ is a standard Brownian motion under $\mathcal{F}^S_t$. With the estimated return rate and the Brownian motion in a smaller filtration, the asset price can be regarded as one in a complete market. The price evolves as

\[ \frac{dS_t}{S_t} = \hat{A}_t\,dt + \sigma_S\,d\zeta_t, \]

where the stochastic term is an $\mathcal{F}^S_t$-adapted Brownian motion $\zeta_t$.

We use the notation $\Sigma(t)$ to indicate that $\Sigma$ is a deterministic function of $t$. By filtering theory,

\[ d\hat{A}_t = -\lambda(\hat{A}_t - \bar{\mu})\,dt + \left(\frac{\hat{\Sigma}(t) + \sigma_S\sigma_\mu\rho}{\sigma_S}\right) d\zeta_t, \tag{3.23} \]

with $\hat{A}_0 \sim N(\eta, \hat{\Sigma}(0))$. In addition, the conditional variance $\hat{\Sigma}(t) = E\left[(A_t - \hat{A}_t)^2 \mid \mathcal{F}^S_t \vee \hat{A}_0\right]$ satisfies a Riccati ODE, with analytical solution

\[ \hat{\Sigma}(t) = \hat{\Sigma}(t; \Sigma_0) = \sqrt{k}\,\sigma_S\,\frac{k_1 e^{\frac{2\sqrt{k}}{\sigma_S}t} + k_2}{k_1 e^{\frac{2\sqrt{k}}{\sigma_S}t} - k_2} - \left(\lambda + \frac{\sigma_\mu\rho}{\sigma_S}\right)\sigma_S^2. \tag{3.24} \]

The constants $k$, $k_1$ and $k_2$ are

\[ k = \lambda^2\sigma_S^2 + 2\sigma_S\sigma_\mu\lambda\rho + \sigma_\mu^2, \]
\[ k_1 = \sqrt{k}\,\sigma_S + (\lambda\sigma_S^2 + \sigma_S\sigma_\mu\rho) + \Sigma_0, \]
\[ k_2 = -\sqrt{k}\,\sigma_S + (\lambda\sigma_S^2 + \sigma_S\sigma_\mu\rho) + \Sigma_0. \]

Let $(\alpha^i_t)_{t \in [0,T]}$, $i \in \{1,\dots,N\}$, be an investment strategy among $N$ investors. For investor $i$, denote the competitors' average by $\alpha^{-i}$, where $q^{-i}_t = \frac{1}{N}\sum_{j \ne i} q^j_t$ for an arbitrary process $q_t$. Then

\[ dX^i_t = \pi^i_t \hat{A}^i_t\,dt + \pi^i_t\sigma\,d\zeta_t, \]
\[ d\tilde{X}^i_t = (\alpha_t\hat{A}_t)^{-i}\,dt + \alpha^{-i}_t\sigma\,d\zeta_t. \]
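Returning to the filter variance in (3.24), the following minimal Python sketch compares the closed form with a direct Euler integration of the Riccati ODE. The ODE is not written out in the text. The form used here, $\dot{\Sigma} = -2\lambda\Sigma + \sigma_\mu^2 - \left((\Sigma + \sigma_S\sigma_\mu\rho)/\sigma_S\right)^2$, is the standard Kalman-Bucy equation implied by the gain in (3.23) and should be read as an assumption, as should the factor 2 in the exponent. The function names and all parameter values other than $\lambda = 8$ are hypothetical.

```python
import math


def sigma_hat_closed_form(t, sigma0, lam, sigma_s, sigma_mu, rho):
    """Conditional variance at time t, following the structure of (3.24)."""
    k = lam**2 * sigma_s**2 + 2.0 * sigma_s * sigma_mu * lam * rho + sigma_mu**2
    root = math.sqrt(k) * sigma_s
    shift = lam * sigma_s**2 + sigma_s * sigma_mu * rho
    k1 = root + shift + sigma0
    k2 = -root + shift + sigma0
    growth = math.exp(2.0 * math.sqrt(k) * t / sigma_s)
    return root * (k1 * growth + k2) / (k1 * growth - k2) - shift


def sigma_hat_euler(t, sigma0, lam, sigma_s, sigma_mu, rho, n_steps=20_000):
    """Explicit Euler integration of the (assumed) Riccati ODE up to time t."""
    dt = t / n_steps
    sigma = sigma0
    for _ in range(n_steps):
        gain = (sigma + sigma_s * sigma_mu * rho) / sigma_s  # filter gain in (3.23)
        sigma += (-2.0 * lam * sigma + sigma_mu**2 - gain**2) * dt
    return sigma


# Hypothetical market parameters, chosen only for illustration.
params = dict(lam=8.0, sigma_s=0.2, sigma_mu=0.3, rho=-0.5)
closed = sigma_hat_closed_form(0.5, 0.01, **params)
numeric = sigma_hat_euler(0.5, 0.01, **params)
```

If the two values agree to within the discretization error, the reconstructed closed form is consistent with the assumed ODE; a disagreement would point to a different convention for the constants.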
We can derive a PDE for $V(t, x, y, \eta) = \sup_{\pi \in \mathcal{A}} E\left[J(t) \mid X_t = x, \tilde{X}_t = y, \hat{\mu}_t = \eta\right]$ and obtain the following results for the optimal strategy of the investors, from which we can compute the value function of each investor. To save space, the explicit expression for the value function is given in the appendix. Theorem 2.

The stock return rate satisﬁes the condition 1. δ i > , θ i ∈ [0 , for all i ∈ { , ..., N } ,and deﬁne w i = δ i θ i , the investor i ’s estimate of return is η i , i ∈ { , ..., N } . Then there exists aunique Nash equilibrium strategy among investors.Deﬁne the following constants depending on i , m i = δ i σ S (1 − θ i N ) ( σ S w i (cid:16) ˆΣ( t ) + σ S σ µ ρ (cid:17) Z Tt e − R st (cid:18) λ + ˆΣ( u )+ σSσµρσ S (cid:19) du · Z st (cid:16) ˆΣ( u ) + σ S σ µ ρ (cid:17) e − R su (cid:18) λ + m )+ σSσµρ ) σS − ˆΣ( m )+ σSσµρσ S (cid:19) dm duds + θ i − θ i N ) , and β i = δ i σ S (cid:16) − θ i N (cid:17) (cid:26) η i + (cid:16) ˆΣ( t ) + σ S σ µ ρ (cid:17) Z Tt − σ S ! e − R st (cid:18) λ + ˆΣ( u )+ σSσµρσS (cid:19) du η i + ˆΣ( u ) + σ S σ µ ρσ S Z Tt Z st λ ¯ µe − R su (cid:18) λ + m )+ σSσµρ ) σS − ˆΣ( m )+ σSσµρσ S (cid:19) dm du · e − R st (cid:18) λ + ˆΣ( u )+ σSσµρσ S (cid:19) du ds (cid:27) . Let M be the matrix with the i -th row equals to m i e i , where ( e i ) j = 1 − δ { i = j } . The vector β withthe i-th component β i = β i . Then, the optimal strategy π ∗ = ( π ∗ ,i ) Ni =1 can be expressed in terms M and β as π ∗ = M − β . The detailed calculation is in the Appendix. Observe that the analytical solution includes all themarket parameters and risk preferences of all investors. In our numerical experiments, we set thenumber of agents N = 3. Given ˆΣ(0) ∈ R , we compute ˆΣ( t ) for t ∈ [0 , T ], then compute M and β by numerical integration. Agents can have identical or heterogeneous initial belief on the hiddenparameter. At each time step t k , instead of solving a linear system that involves a matrix inversion,we iteratively solve for strategy α it k until α converges. The convergence is within 3 iterations ateach time step, and hence the method is eﬃcient.4. Existence and Uniqueness of the FBSDE solution

To establish the existence and uniqueness of the FBSDE solution, we need to make some assumptions on the market parameters. Since the FBSDE is fully coupled, we first show that a unique solution exists on a small time interval, then apply a pasting argument to get a unique solution on an arbitrary time interval $[0,T]$.

Condition 2. Let $L_a$ be the norm of the matrix $A$ in (3.8),

\[ L_a = \sup\left\{\|Ax\| : x \in \mathbb{R}^N \text{ with } \|x\| = 1\right\}. \]

Suppose the following holds:

\[ L_a^2 e < 1, \]

where $e$ denotes Euler's number.
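To make the condition concrete, here is a small Python sketch that builds the matrix $A$ of (3.8) from the competition weights $\theta_i$ and evaluates its operator norm by power iteration. It assumes the squared-norm reading $L_a^2 e < 1$ of the condition, which is the reading consistent with the threshold $\max_i \theta_i < e^{-0.5}$. The function names are hypothetical and the $\theta$ values are illustrative.

```python
import math


def relative_concern_matrix(thetas):
    """A[i][j] = theta_i / (N - theta_i) for j != i and 0 on the diagonal, as in (3.8)."""
    n = len(thetas)
    return [[0.0 if i == j else thetas[i] / (n - thetas[i]) for j in range(n)]
            for i in range(n)]


def spectral_norm(a, n_iter=500):
    """Operator 2-norm of a square matrix via power iteration on A^T A."""
    n = len(a)
    x = [1.0] * n  # positive start vector; A^T A has nonnegative entries here
    for _ in range(n_iter):
        ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        atax = [sum(a[i][j] * ax[i] for i in range(n)) for j in range(n)]
        norm = math.sqrt(sum(v * v for v in atax))
        if norm == 0.0:
            return 0.0
        x = [v / norm for v in atax]
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(v * v for v in ax))


def condition_holds(thetas):
    """Check L_a^2 * e < 1 (assumed squared-norm reading of the condition)."""
    l_a = spectral_norm(relative_concern_matrix(thetas))
    return l_a**2 * math.e < 1.0


homogeneous = [0.5, 0.5, 0.5]  # illustrative weights for N = 3
l_a = spectral_norm(relative_concern_matrix(homogeneous))
```

For identical weights the matrix is a multiple of the all-ones matrix minus the identity, so its norm is $\theta(N-1)/(N-\theta)$, which gives a quick hand check of the routine.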

Remark. We estimate the matrix norm as

\[ L_a^2 = \|A\|_2^2 \le \|A\|_1\|A\|_\infty = \left(\sum_{i=1}^N \frac{\theta_i}{N-\theta_i}\right)\max_i\left(\frac{\theta_i(N-1)}{N-\theta_i}\right) \le \left(\sum_{i=1}^N \frac{\theta_i}{N-1}\right)\max_i\left(\frac{\theta_i(N-1)}{N-1}\right) \le \max_i\left(\theta_i\right)^2. \tag{4.1} \]

So the condition is satisfied if $\max_i \theta_i < e^{-0.5} \approx 0.6$. The moderate level of $\theta_i$ is a reasonable assumption in view of the weak interaction in the existing literature [23].

Theorem 3.

Assume Condition 2 and that ( h t ) t ≤ T is a bounded process. There exists a uniquesolution to the FBSDE (3.11).Proof. Suppose | b t | = (cid:12)(cid:12)(cid:12) ˆ h t σ (cid:12)(cid:12)(cid:12) ≤ ˜ b ∈ R . We ﬁrst show the solution exists on a small interval, i.e, thereexists a δ b s.t for δ c ≤ δ b , the FBSDE has a unique solution on S δ c ( R N ) × S δ c ( R N ) × H δ c ( R N,d ).Denote g ( t, z ) = zb + 1 α | b t | , (4.2) σ ( t, z ) = z + 1 α b t , (4.3) f ( t, z ) = zb + 12 α | b t | . (4.4)Let δ h > δ c ∈ (0 , δ h ]. Let x ∈ R N be ﬁxed. We introduce the followingnorm(4.5) k ( Y, Z ) k N [0 ,δ c ] = sup t ∈ [0 ,δ c ] ( E | Y t | + E Z δ c t | Z t | ds ) / . Let N [0 , δ c ] be the completion of N [0 , δ c ] in H δ c ( R N ) × H δ c ( R N,d ) under norm (4.5). Take any( Y ( i ) , Z ( i ) ) ∈ N [0 , δ c ], i = 1 ,

2, the SDE for X ( i ) is: dX ( i ) = g ( t, Z ( i ) ) dt + σ ( t, Z ( i ) ) dζ t , t ∈ [0 , δ c ] , (4.6) X ( i )0 = x ( i ) . (4.7)Since both g and σ are independent of x , the above SDE has a unique strong solution X ( i ) ∈ H δ c ( R N ). Apply Itˆo’s formula to (cid:12)(cid:12)(cid:12) X (1) t − X (2) t (cid:12)(cid:12)(cid:12) , we then obtain the following estimate: E (cid:12)(cid:12)(cid:12) X (1) t − X (2) t (cid:12)(cid:12)(cid:12) = E Z t (cid:18) b s (cid:12)(cid:12)(cid:12) X (1) s − X (2) s (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Z (1) s − Z (2) s (cid:12)(cid:12)(cid:12) + (cid:12)(cid:12)(cid:12) Z (1) s − Z (2) s (cid:12)(cid:12)(cid:12) (cid:19) ds ≤ E Z t ǫ (cid:12)(cid:12)(cid:12) Z (1) s − Z (2) s (cid:12)(cid:12)(cid:12) ds + Z t ˜ b ǫ (cid:12)(cid:12)(cid:12) X (1) s − X (2) s (cid:12)(cid:12)(cid:12) ds + Z t (cid:12)(cid:12)(cid:12) Z (1) s − Z (2) s (cid:12)(cid:12)(cid:12) ds  for some constant ǫ > E (cid:12)(cid:12)(cid:12) X (1) t − X (2) t (cid:12)(cid:12)(cid:12) ≤ e ǫ R δc ˜ b ds E Z δ c (1 + ǫ ) (cid:12)(cid:12)(cid:12) Z (1) s − Z (2) s (cid:12)(cid:12)(cid:12) ds. Next, we solve the following BSDEs ( i = 1 , d ¯ Y ( i ) = f ( t, ¯ Z ( i ) ) dt + ¯ Z ( i ) dζ t , t ∈ [0 , δ c ] , (4.9) ¯ Y ( i ) δ c = AX ( i ) δ c . (4.10)where X ( i ) δ c is simulated with process Y ( i ) and Z ( i ) . Recall that we use A without the time indexto denote a constant matrix.Applying results to BSDEs with random coeﬃcients, we see that the BSDE (4.9) has a uniqueadapted solution ( ¯ Y ( i ) , ¯ Z ( i ) ) ∈ N [0 , δ c ] ⊂ ¯ N [0 , δ c ]. Deﬁne a map T : N [0 , δ c ] → N [0 , δ c ] by( Y ( i ) , Z ( i ) ) ( ¯ Y ( i ) , ¯ Z ( i ) ). 
Apply Ito’s formula to (cid:12)(cid:12)(cid:12) ¯ Y (1) t − ¯ Y (2) t (cid:12)(cid:12)(cid:12) , we have E (cid:12)(cid:12)(cid:12) ¯ Y (1) t − ¯ Y (2) t (cid:12)(cid:12)(cid:12) + Z δ c t (cid:12)(cid:12)(cid:12) ¯ Z (1) s − ¯ Z (2) s (cid:12)(cid:12)(cid:12) ds ≤ L a E (cid:12)(cid:12)(cid:12) X (1) δ c − X (2) δ c (cid:12)(cid:12)(cid:12) + Z δ c t b s (cid:12)(cid:12)(cid:12) ¯ Y (1) s − ¯ Y (2) s (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Z (1) s − Z (2) s (cid:12)(cid:12)(cid:12) ds ≤ L a e ǫ R δc | b s | ds E Z δ c (1 + ǫ ) (cid:12)(cid:12)(cid:12) Z (1) s − Z (2) s (cid:12)(cid:12)(cid:12) ds + Z δ c t b s (cid:12)(cid:12)(cid:12) ¯ Y (1) s − ¯ Y (2) s (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) Z (1) s − Z (2) s (cid:12)(cid:12)(cid:12) ds ≤ L a e ǫ δ c ˜ b E Z δ c (1 + ǫ ) (cid:12)(cid:12)(cid:12) Z (1) s − Z (2) s (cid:12)(cid:12)(cid:12) ds + Z δ c t ˜ b ǫ (cid:12)(cid:12)(cid:12) ¯ Y (1) s − ¯ Y (2) s (cid:12)(cid:12)(cid:12) ds + ǫ Z δ c t (cid:12)(cid:12)(cid:12) Z (1) s − Z (2) s (cid:12)(cid:12)(cid:12) ds where we used (4.8) in the second last inequality. Hence, E (cid:12)(cid:12)(cid:12) ¯ Y (1) t − ¯ Y (2) t (cid:12)(cid:12)(cid:12) + Z δ c t (cid:12)(cid:12)(cid:12) ¯ Z (1) s − ¯ Z (2) s (cid:12)(cid:12)(cid:12) ds ≤ e ǫ δ c ˜ b ((cid:20) ǫ + L a e ǫ δ c ˜ b (1 + ǫ ) (cid:21) E Z δ c (cid:12)(cid:12)(cid:12) Z (1) s − Z (2) s (cid:12)(cid:12)(cid:12) ds ) + ˜ b ǫ Z δ c t (cid:12)(cid:12)(cid:12) ¯ Y (1) s − ¯ Y (2) s (cid:12)(cid:12)(cid:12) ds ≤ e ǫ δ c ˜ b ((cid:20) ǫ + L a e ǫ δ c ˜ b (1 + ǫ ) (cid:21) E Z δ c (cid:12)(cid:12)(cid:12) Z (1) s − Z (2) s (cid:12)(cid:12)(cid:12) ds ) + ˜ b ǫ δ c sup t ≤ δ c (cid:12)(cid:12)(cid:12) ¯ Y (1) t − ¯ Y (2) t (cid:12)(cid:12)(cid:12) ≤ C ( ǫ , ǫ , δ c ) k ( Y (1) , Z (1) ) − ( Y (2) , Z (2) ) k N [0 ,δ c ] for C ( ǫ , ǫ , δ c ) = max ( e ǫ δ c ˜ b (cid:20) ǫ + L a e ǫ δ c ˜ b (1 + ǫ ) (cid:21) , ˜ b ǫ ) .Denote C ( ǫ i ) = e ǫi δ c ˜ b . 
By choosing ǫ = ǫ = 2 δ c ˜ b , we have C ( ǫ ) = C ( ǫ ) = 12 . Therefore, C ( ǫ , ǫ , δ c ) = max (cid:26) e (cid:16) ǫ + L a e (cid:17) (1 + ǫ ) , (cid:27) ≤ max (cid:26) L a e + e ǫ + L a eǫ + e ǫ ǫ , (cid:27) < . ELATIVE WEALTH CONCERNS WITH PARTIAL INFORMATION AND HETEROGENEOUS PRIORS 17 for δ c small enough by condition (2).By the contraction mapping theorem, there exists a unique ﬁxed point ( Y, Z ) for T . In fact,( Y, Z ) ∈ N [0 , δ c ]. Let X be the corresponding solution to the SDE (4.6), ( X, Y, Z ) ∈ S δ c ( R N ) × S δ c ( R N ) × H δ c ( R N,d ) is a unique adapted solution of (4.9) with forward process (4.6).We next show that a unique solution exists for the problem with an arbitrary time horizon

T > Y component in the FBSDE solution.To this end, denote Θ := ( X , Y , Z ), and consider a FBSDE on a subinterval [ t , t ]: X t = ¯ η + Z tt g ( s, Θ s ) ds + Z t t σ ( s, Θ s ) dζ s , (4.11) Y t = φ ( X t ) − Z t t f ( s, Θ s ) ds − Z t t Z s dζ s , t ∈ [ t , t ] . (4.12)where ¯ η ∈ L ( F t ) and φ ( x, · ) ∈ L ( F t ), for each ﬁxed x . In the Markovian case and if σ ( t, · )is independent of Z , the solution Y t = u ( t, X t ) is a (viscosity) solution to a quasilinear PDE. Fornon-Markovian case as in the present setting, the solution Y t is a random function of the stochasticprocess. This correspondence provides intuition to the following deﬁnition that is standard in theliterature ([34]). Deﬁnition 2.

A decoupling ﬁeld of FBSDE (3.11) is an F -progressively measurable random ﬁeld u : [0 , T ] × R × Ω R with u ( T, x ) = h ( x ) if there exists a constant δ h > t < t ≤ T with t − t ≤ δ c and any ¯ η ∈ L ( F t ), the FBSDE ((4.11) - (4.12)) with initialvalue ¯ η and terminal condition u ( t , · ) has a unique solution that satisﬁes Y t = u ( t, X t ) for t ∈ [0 , T ].Such decoupling ﬁeld u is called regular if it is uniformly Lipschitz in the spacial variable x .To construct a regular decoupling ﬁeld, we look at a variational FBSDE. Omitting the subscript t , ﬁx processes Y (1) , Y (2) , X (1) , X (2) ∈ [0 , T ] × R N . The initial conditions for X (1) , X (2) are x (1) and x (2) , respectively. Let i and j be the indices for vector components.Deﬁne the operator ¯ ∇ : ([0 , T ] × R N ) → [0 , T ] × R N × N as( ¯ ∇ Y ) i,j :=  Y (1) i − Y (2) i x (1) j − x (2) j if x (1) j = x (2) j , x (1) j = x (2) j . for i, j ∈ { , ..., N } . The operator ¯ ∇ on a constant vector is deﬁned similarly.We next show the FBSDE in ¯ ∇ X , ¯ ∇ Y and ¯ ∇ Z has a unique solution that is in fact a constant.This solution leads to a decoupling ﬁeld associated to the original FBSDE. Then by a pastingargument, the FBSDE has a unique solution on an arbitrary time interval [0 , T ].By equation (4.2) and (4.3), and let z , z denote arbitrary vectors in R N , we have g ( t, z ) − g ( t, z ) z − z = b t , (4.13) σ ( t, z ) − σ ( t, z ) z − z = 1 . (4.14) We can compute X (1) − X (2) = x (1) − x (2) + Z t g (1) − g (2) Z (1) − Z (2) (cid:16) Z (1) − Z (2) (cid:17) ds + Z t σ − σ Z (1) − Z (2) ( Z (1) − Z (2) ) dζ s , (4.15) Y (1) − Y (2) = A ( X (1) T − X (2) T ) − Z Tt f (1) − f (2) Z (1) − Z (2) ( Z (1) − Z (2) ) ds − Z Tt ( Z (1) − Z (2) ) dζ s . (4.16)It can be veriﬁed that (4.15) - (4.16) imply that the processes ¯ ∇ X , ¯ ∇ Y and ¯ ∇ Z satisfy thefollowing FBSDE ¯ ∇ X = ¯ ∇ x + Z t b s ¯ ∇ Z s ds + Z t ¯ ∇ Z s dζ s , (4.17) ¯ ∇ Y = A ¯ ∇ X T − Z Tt b s ¯ ∇ Z s ds − Z Tt ¯ ∇ Z s dζ s . 
(4.18)A solution to the above FBSDE is ( ¯ ∇ X, ¯ ∇ Y, ¯ ∇ Z ) t ∈ [0 ,T ] = ( ¯ ∇ x, A ¯ ∇ x, t ∈ [0 ,T ] .Similarly, deﬁne(4.19) ( ¯ ∇ u ( t )) i,j = u i ( t, X (1) t ) − u i ( t, X (2) t ) X (1) ,jt − X (2) ,jt , we must have ¯ ∇ u ( t ) = ¯ ∇ Y t ( ¯ ∇ X t ) − = A. The random ﬁeld u ( t, x ) is uniformly Lipschitz continuous in the spacial variable. Hence it isregular.Let δ c > ∈ L for t ≤ δ c . For any ( t, x ), denote the (unique) solution to FBSDE (3.11) starting from ( t, x ) by Θ t,x ,and denote a random ﬁeld by u ( t, x ) = Y t,xt . The uniqueness of solution to FBSDE then leads tothat Y t,xs = u ( s, X t,xs ), for s ∈ [ t, T ], P -a.s.Let 0 = t < ... < t n = T be a partition of [0 , T ] such that t i − t i − ≤ δ c , i − , ..., n . Weﬁrst consider the FBSDE (3.11) on [ t n − , t n ]. By existence of solution on a small interval, thereexists a process Y t n − ,xt , for x = X t n − and hence a random ﬁeld u ( t, x ) for t ∈ [ t n − , t n ] suchthat ¯ ∇ u ( t ) = A for all t ∈ [ t n − , t n ]. Next, consider FBSDE (3.11) on [ t n − , t n − ] with terminalcondition u ( t n − , · ). Apply the results on solution on small interval, we ﬁnd u on [ t n − , t n − ] suchthat ¯ ∇ u ( t ) = A for t ∈ [ t n − , t n − ]. Repeating this procedure backward n times, we extend therandom ﬁeld u to the whole interval [0 , T ].We now show the solution obtained in this way is in the right space.Deﬁne I := E  Z T (cid:0) | g | + | f | (cid:1) ( s, ds ! + Z T | σ ( s, | ds  ≤ E Z T | b s | ds ! + E Z T | b s | ds ≤ T ˜ b + T ˜ b < ∞ . ELATIVE WEALTH CONCERNS WITH PARTIAL INFORMATION AND HETEROGENEOUS PRIORS 19 By | u ( t, x ) | ≤ | u ( t, | + | x | , considering the FBSDE on each interval [ t i , t i +1 ] with initial value X t i = 0, we see that there exists a constant C such that(4.20) E | u ( t i , | = E (cid:12)(cid:12)(cid:12) Y t i , t i (cid:12)(cid:12)(cid:12) ≤ C (cid:16) E | u ( t i +1 , | + E (cid:12)(cid:12) X t i +1 (cid:12)(cid:12) (cid:17) + CI . 
Since u ( t n ,

0) = 0, we have(4.21) max ≤ i ≤ n E | u ( t i , | ≤ CI . A standard estimation using the forward and backward dynamics and the bounds for the coeﬃ-cients yields, E ( sup t i ≤ t ≤ t i +1 (cid:16) | X t | + | Y t | (cid:17) + Z t i +1 t i | Z s | ds ) ≤ C E h | X t i | + (cid:12)(cid:12) u ( t i +1 , (cid:12)(cid:12) i + CI (4.22)To estimate | X t i | , notice that (4.22) and (4.20) imply that E | X t i +1 | ≤ C E h | X t i | + | u ( t i +1 , | i + CI ≤ C E | X t i | + CI . Therefore, E ( sup ≤ t ≤ T (cid:16) | X t | + | Y t | (cid:17) + Z T | Z s | ds ) ≤ C (cid:16) | x | + I (cid:17) . (4.23) (cid:3) Remark . The proof relies crucially on the boundedness of ( b t ) t ∈ [0 ,T ] , or equivalently, of the returnrate ( h t ) t ∈ [0 ,T ] . The case where h t is an unbounded stochastic process is still open and it is left forfuture research. 5. Deep learning algorithms

In this section, we give a brief introduction to the deep neural networks used in our numerical scheme.

5.1. The neural network as function approximators.

Neural networks are compositions of simple functions. They are efficient in approximating the solutions of (stochastic) differential equations. Obtaining a good approximator usually requires finding the best parameters in the function composition, which in many cases is conveniently done by stochastic gradient descent (SGD).

We adopt notations from [?] and consider simple feedforward neural networks (NNs). Denote the dimension of the state variable $x$ by $d_x$. Fix an input dimension $d_I = d_x$ if the approximated function is only in the variable $x$. We may take time $t$ as an additional input to enable parameter sharing across time steps. In this case, the solution at all time steps is modeled by a single neural network, the function depends on $(t, x)$, and $d_I = d_x + 1$. We denote the output dimension by $d_O$, with $d_O = N$ for $N$ the dimension of the FBSDE solution. There are $L + 1 \in \mathbb{N}$ layers in total, with $m_l$, $l \in \{0, \dots, L\}$, the number of neurons in each layer. For simplicity, we choose $m_l = m$ for $l \in \{1, \dots, L-1\}$.

More specifically, the fully connected feedforward neural network for the FBSDE solver is a function from $\mathbb{R}^{d_I}$ to $\mathbb{R}^{d_O}$ defined by the composition map

\[ x \mapsto P_L \circ \varphi \circ P_{L-1} \circ \dots \circ \varphi \circ P_1(x) := f(x) \in \mathbb{R}^N \]

for $x \in \mathbb{R}^{d_I}$. Here, $P_l$, $l \in \{1, \dots, L\}$, are affine functions with assigned input and output dimensions. Specifically,

\[ P_l(x) = w_l x + b_l, \quad l \in \{1, \dots, L\}. \]

The matrix $w_l$ and vector $b_l$ are the weight and bias of layer $l$, respectively. $\varphi : \mathbb{R} \to \mathbb{R}$ is the activation function, a nonlinear function that can be customized. Some standard activation functions are tanh, Softmax, Sigmoid, and ReLU. For all our experiments, we use $\mathrm{ReLU}(x) = \max\{0, x\}$ as the activation function for the fully connected networks.

Denote the parameters of the neural network by $\theta \in \mathbb{R}^{N_\theta}$, which includes all the matrices $w_l$ and vectors $b_l$. Let $N_\theta(m) = \sum_{l=0}^{L-1} m_l(1 + m_{l+1}) = d_I(1 + m) + m(1 + m)(L - 1) + m(1 + N)$. Denote by $\Theta_m$ the set of possible parameters with $m$ hidden units.

A neural network with the given input and output dimensions, number of layers, and nonlinear function $\varphi$ belongs to the function space

\[ \mathcal{NN}^{\varphi}_{d_I, d_O, L} = \bigcup_{m \in \mathbb{N}} \mathcal{NN}^{\varphi}_{d_I, N, L, m}(\Theta_m) = \bigcup_{m \in \mathbb{N}} \mathcal{NN}^{\varphi}_{d_I, N, L, m}\left(\mathbb{R}^{N_\theta(m)}\right). \]

By a learnable variable, we mean any variable that is needed to compute the value of the loss function and can be optimized, other than the parameters in the above functional form of neural networks.

5.1.1. The recurrent neural network.

We will use the recurrent neural network for the network-based estimation step, and we introduce it here. Denote the weight and bias parameters as before, and let $\varphi$ denote the activation function. The detailed network structure is as follows. For a time series sequence $x = (x_0, \dots, x_t, \dots, x_T)$, we compute the hidden state at time $t$, $H_t$, inductively by

\[ H_t = \varphi\left(w_{it} x_t + b_{it} + w_{ht} H_{t-1} + b_{ht}\right), \tag{5.1} \]

\[ f_t = \varphi\left(w_t H_t + b_t\right), \quad t \in \{0, \dots, T\}, \tag{5.2} \]

where the subscript $it$ denotes the weight and bias for the input at time $t$ and the subscript $ht$ indicates the weight and bias for the hidden state at time $t < T$. The parameters $w_t$ and $b_t$ denote the weight and bias of the linear map applied to the time-$t$ hidden state. The RNN structure we use for Stage I estimation is exactly this one, with $x_t$ being the stock price at time $t$.

5.2. Deep Learning Scheme.

We perform the deep learning scheme on several independent models. Each model corresponds to a particular market setting. HM indicates the homogeneous initial belief, HT the heterogeneous initial belief, PI partial information, FI full information, L the case of a linear Gaussian filter, NL a nonlinear filter, C the game with competition, and NC no competition.

We use a uniform time discretization of the interval $[0,T]$. Let $0 = t_0 < \dots < t_K = T$ be such that $\Delta t = t_k - t_{k-1}$, $k \in \{1, \dots, K\}$. The conditional expectation in the previous section is a function of stock prices and the initial belief that best approximates the conditioned variable in the least squares sense. The deep learning scheme is performed in two stages. In Stage I, we approximate the conditional expectation of the mean stock return as a function of the hidden variable, $h_t = h(A_t)$, on $\mathcal{F}^S_t$. The hidden state $H_t$ computed by the neural network depends on the past stock prices up to time $t$, as well as the initial prior of the investor's estimate of the market return. Therefore, so does the approximated estimate of the investor, $\hat{h}$, at time $t$.

We optimize $E = 3$ independent networks in parallel and take the average of the network outputs to get a single agent's estimate. The variance reduction technique of averaging random outcomes is common in the classical Monte Carlo method. Let $\mathcal{G}_k$ be the neural network approximation of the conditional expectation at time step $k$. The input consists of all the asset prices at the discrete time steps, including time 0, and the investor's initial prior. Hence, $d_{I,K} = d(K + 1) + 1$, where $d$ is the number of stocks and $K$ is the maximal time step index. Let $\mathcal{G} \in \mathrm{Rnn}^{\varphi}_{d_{I,K}, N, m=64}$, and let $\mathcal{G}_k$ be the $k$-th output in sequential order, which depends on information up to time $t_k$. For a mini-batch of size $B$, the loss function for Stage I is

\[ \mathrm{LossI} = \frac{1}{B}\frac{1}{K+1}\sum_{j=1}^{B}\sum_{k=0}^{K}\left\|\mathcal{G}_k\left(S^{(j)}_{\cdot \wedge t_k}, \hat{A}^{(j)}_0\right) - h^{(j)}_k\left(\hat{A}^{(j)}_0\right)\right\|^2, \]

where $S^{(j)}_{\cdot \wedge t_k}$ denotes the stock prices up to time $t_k$, and $h^{(j)}_k(\hat{A}^{(j)}_0)$ indicates the $j$-th simulated stock return under the investor's subjective probability measure $P^i$, which differs across investors mainly because of different initial beliefs. Suppose $\mathcal{G}^{*,(e)}$ is the trained model for the $e$-th independent network. The estimate of the stock return at time $t_k$ given the stock price path $S$ and initial estimate $\hat{A}_0$ is

\[ \hat{h}_k = \frac{1}{E}\sum_{e=1}^{E}\mathcal{G}^{*,(e)}_k\left(S_{\cdot \wedge t_k}, \hat{A}_0\right). \]

In Stage II, we use neural networks to approximate the solution $(X_t, Y_t, Z_t)$, $t \in [0,T]$, of the FBSDE. Let $\hat{Y}_0$ be learnable variables that will be optimized by SGD, and set $\hat{X}_0 = x$. First, define the vector $h$ by $(h_k)_i = h^i_k$, where $h^i_k$ is agent $i$'s estimate of the return at time $t_k$. The network input is $(\hat{h}_k, S_k, t_k)$, a vector consisting of the estimated returns, the stock prices and a time variable. Let $d_I = 2dN + 1 := d_S$ (S for Solver) for $d$ the number of stocks as before, $\mathcal{G}^S \in \mathcal{NN}_{d_S, N, 3}$, $\hat{h}_k, S_k \in \mathbb{R}^d$. Compute $\hat{Y}_{k+1}$ from $\hat{Y}_k$ by (omitting the index $j$ for $j \in \{1, \dots, B\}$)

\[ \hat{Y}_{k+1} = \hat{Y}_k + \hat{Z}_k\,\Delta\zeta_k + f(\hat{h}_k, \hat{Z}_k)\,\Delta t, \]

and $\hat{X}_{k+1}$ from $\hat{X}_k$ by

\[ \hat{X}_{k+1} = \hat{X}_k + \left(\hat{Z}_k\frac{\hat{h}_k}{\sigma} + \frac{1}{\alpha}\left|\frac{\hat{h}_k}{\sigma}\right|^2\right)\Delta t + \left(\hat{Z}_k + \frac{1}{\alpha}\frac{\hat{h}_k}{\sigma}\right)\Delta\zeta_k, \]

where $\hat{Z}_k = \mathcal{G}^S(\hat{h}_k, S_k, t_k)$, until $k = K$. The loss function is

\[ \mathrm{LossII} = \frac{1}{B}\sum_{j=1}^{B}\left\|\hat{Y}^{(j)}_K - A\hat{X}^{(j)}_K\right\|^2. \]

We use a single-layer recurrent neural network (RNN) with $m = 64$ hidden units for Stage I, the estimation step.
The choice of network structure exploits the time series nature of the input variable and at the same time reduces computational complexity. Due to the path dependence of the estimation, the estimates $\hat{h}_k$ at different times would require neural network approximators of varying input dimension without the RNN. The RNN takes the sequence of stock prices indexed by time as input and outputs a sequence of hidden states indexed by time. Each element of the output sequence depends on the data up to the index time. We then transform each hidden state by a linear map to obtain the estimate of the return at the corresponding time. The loss is the mean squared error (MSE) of the estimate against the true return. To minimize the effort of hyperparameter tuning that gives no structural changes, we use the default activation function from the PyTorch RNN module and let $\varphi = \tanh$.

For Stage II, we use networks with $L = 3$ layers and $m = 64$ hidden units for $\mathcal{G}^S$. We use the Adam optimizer in both stages. The initial rate for Stage I is lr = 1 e − and we use learning rate decay, halving the learning rate every $ep_{decay} = 400$ steps. The learning rate for Stage II is lr = 3 e − , which is a standard choice for solving BSDEs. Learning rates on the same scale produce similar results. We include a comparison of the same deep learning scheme with different learning rates in the appendix. As mentioned in the previous section, we use ReLU as the activation function for the fully connected network in the FBSDE solver.

The training proceeds with $ep_{train} = 5000$ epochs for Stage I, followed by $ep_{train} = 5000$ epochs for Stage II. The mini-batch size is $B = 64$ for both stages. The deep learning scheme is efficient and robust across different sets of hyperparameters. We use deeper networks in Stage I than in Stage II because of the path-dependent nature of the estimation problem. Since the network is long, fewer hidden units in each layer are needed to achieve the same complexity of the function approximator.
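The Stage II recursion and its loss can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the callable `z_fn` stands in for the trained network $\mathcal{G}^S$, the risk parameter `alpha` is taken per agent, the noise increment is common to all agents, and all function and argument names are hypothetical. In practice the same recursion would be written in an automatic differentiation framework so that $\hat{Y}_0$ and the network weights can be trained by SGD.

```python
def terminal_mismatch(z_fn, y0, x0, h_hat, s_path, d_zeta, alpha, a_mat, sigma, dt):
    """Squared terminal mismatch |Y_K - A X_K|^2 along one simulated path.

    z_fn(h_k, s_k, t_k) is a placeholder for the solver network and must
    return a list of N values, one per agent.
    """
    n = len(x0)
    x, y = list(x0), list(y0)
    for k, dz in enumerate(d_zeta):
        z = z_fn(h_hat[k], s_path[k], k * dt)
        for i in range(n):
            b = h_hat[k][i] / sigma                        # market price of risk estimate
            drift_y = z[i] * b + b * b / (2.0 * alpha[i])  # generator f(h, Z)
            drift_x = z[i] * b + b * b / alpha[i]          # forward drift
            y[i] += z[i] * dz + drift_y * dt
            x[i] += drift_x * dt + (z[i] + b / alpha[i]) * dz
    target = [sum(a_mat[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum((y[i] - target[i]) ** 2 for i in range(n))


def loss_stage_two(z_fn, y0, x0, paths, alpha, a_mat, sigma, dt):
    """Mini-batch average of the terminal mismatch.

    Each path is a dict with keys 'h_hat', 's' and 'd_zeta' (hypothetical layout).
    """
    total = sum(
        terminal_mismatch(z_fn, y0, x0, p["h_hat"], p["s"], p["d_zeta"],
                          alpha, a_mat, sigma, dt)
        for p in paths
    )
    return total / len(paths)
```

Keeping the rollout separate from the choice of `z_fn` makes it easy to test the discretization with simple closed-form controls before plugging in a network.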
In all the numerical experiments, we ﬁx the investment horizon T = 0 . η, ¯ µ ) = (0 . , s ≈ Numerical results and model implications

Although the solution to the PDE is in analytic form, we still need to compute the values of the integrals by numerical integration. In this section, we present the solution to the HJB equation, as well as the numerical solution obtained by solving the multi-dimensional FBSDE (3.11) using the deep learning method. We compare the results from both methods in the case of linear filters, where PDE solutions are available. We further apply the deep learning scheme to FBSDEs when nonlinear filters are used to obtain the estimate of the return rate. When the estimated return $\hat{h}_t$ is bounded, the solution to the FBSDE is unique, and the deep learning solution converges to the unique Nash equilibrium. When the return process is not necessarily bounded, which in our case can be a CIR process for the hidden variable $A_t$ with a square root relation between the mean return and the hidden variable, we do not have theoretical results on the uniqueness of the solution. However, we can still apply the numerical method to find an equilibrium strategy.

6.1. Linear filter: homogeneous initial belief.

In this section, we assume the investor's initial estimate is accurate, i.e., $\hat{\Sigma}(0) = 0$. Denote $\Delta W_{t_k} = W_{t_{k+1}} - W_{t_k}$ and $\Delta B_{t_k} = B_{t_{k+1}} - B_{t_k}$. By the dynamics (3.20)-(3.21), generate sample paths of stock prices by

\[ A_{t_{k+1}} = A_{t_k} - \lambda(A_{t_k} - \bar{\mu})\,\Delta t + \sigma_\mu\,\Delta B_{t_k}, \tag{6.1} \]

\[ \frac{S_{t_{k+1}} - S_{t_k}}{S_{t_k}} = A_{t_k}\,\Delta t + \sigma_S\left(\sqrt{1-\rho^2}\,\Delta W_{t_k} + \rho\,\Delta B_{t_k}\right), \quad k \in \{0, \dots, K-1\}. \tag{6.2} \]

Set the base market parameters for the case of linear filters to be

(6.3) λ = 8 , σ S = 0 . , σ µ = 0 . , ρ = − . ,

where we allow the initial condition $h_0 = h(A_0)$ to vary, as well as $\bar{\mu}$, across experiments. We specify the cases later when presenting the corresponding investment strategies.

The risk preference parameters are shown in Table 2.
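The Euler scheme (6.1)-(6.2) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and argument layout are hypothetical, and apart from $\lambda = 8$ the parameter values below are placeholders rather than the values of (6.3).

```python
import math
import random


def simulate_paths(a0, s0, lam, mu_bar, sigma_s, sigma_mu, rho, dt, n_steps, rng):
    """Simulate one path of the hidden return A and the stock price S via (6.1)-(6.2)."""
    a, s = [a0], [s0]
    sqrt_dt = math.sqrt(dt)
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sqrt_dt)  # increment of W
        db = rng.gauss(0.0, sqrt_dt)  # increment of B, shared by A and S
        # Simple return over the step, driven by the current hidden state (6.2).
        ret = a[-1] * dt + sigma_s * (math.sqrt(1.0 - rho**2) * dw + rho * db)
        # Mean-reverting update of the hidden return (6.1).
        a_next = a[-1] - lam * (a[-1] - mu_bar) * dt + sigma_mu * db
        a.append(a_next)
        s.append(s[-1] * (1.0 + ret))
    return a, s


# Placeholder parameters for illustration; only lam = 8 is taken from the text.
rng = random.Random(0)
a_path, s_path = simulate_paths(
    a0=0.1, s0=1.0, lam=8.0, mu_bar=0.1, sigma_s=0.2, sigma_mu=0.3,
    rho=-0.5, dt=0.01, n_steps=50, rng=rng,
)
```

Passing the random generator explicitly keeps the paths reproducible, which is convenient when the same sample is reused for the PDE benchmark and the deep learning solver.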

Case δ δ δ θ θ θ NC C Table 2.

Investors' risk parameters with or without competition. Case C indicates investment under wealth competition, and NC indicates the standard CARA utility case.

To illustrate the investor's estimated return process, we plot the sample path of estimates from the RNN structure in Figure 1, together with the true return process. The estimate is the average of three independent RNN approximations. Consistent with standard results from filtering theory, the estimates exhibit a trend that is similar to the true process but with a time lag. The values of the estimate process are less extreme due to the effect of estimation.

Table 3 compares the deep learning results to the benchmark solutions from the PDE method. Both the initial positions and the value functions are accurate for the experimented parameter sets. The largest relative error of the initial position is (5 . − . / .

58 = 4 . . .

25. The initial values areaccurate to the 2nd signiﬁcant digit, and the maximal relative error in the value is less than 0 . π t of Agent 1 and Agent 3, and omit Agent 2 inthe ﬁrst row. In the second row, the wealth processes for all three agents are shown. In the ﬁrstrow, the black dash-dot line indicates the Merton strategy under the competition case, that is, thestrategy by assuming deterministic return process, which strategy can be solved as in [30]. Thediﬀerence between the investment strategy and the Merton strategy is the hedging demand of theinvestor under competition utility. The hedging demand vanishes as time approachs the end of theinvestment horizon, regardless of the market or risk parameters.Table 4 shows the statistics of investment strategies for all three investors under diﬀerent marketparameters and risk preferences. Each mean and standard derivation the empirical statistics of B = 64 sample paths. The point here is not to estimate the true time series mean and std usingMonte Carlo method, but to illustrate the distribution of strategies, hence a sample size the sameas the training mini-batch size is used. We compute the CV (coeﬃcient of variation) as the ratioof the std and the mean, or the std per unit of the mean, as an additional indicator for the timeseries volatility of strategies, and we report the average of CVs for the three investors in the table.Investors’ strategies are more volatile under full information by observing the std and the CV.However, the CV for the ﬁrst set of market parameter indicates that the variation per unit of themean may increase under the partial information setting. Competition does not have a signiﬁcanteﬀect on the CVs for the three test market parameters, indicating that standard deviation increasemostly due to the increase of the strategy in term of absolute value.Table 5 reports the empirical Sharpe ratio and the VRR, which is deﬁned as the mean returnover variance, instead of over the std, following [19]. 
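The summary statistics used in Tables 4 and 5 (CV, Sharpe ratio and VRR) can be computed from simulated paths as in the sketch below. It is a hedged illustration rather than the authors' code: the helper names are hypothetical, and it assumes that "return" means the per-step wealth increment pooled over paths and time, which the text does not specify.

```python
import numpy as np

def strategy_cv(pi_paths):
    """Mean, Std and CV of |pi| for one investor.

    pi_paths has shape (n_paths, n_steps). Time-series statistics are taken
    per path and then averaged over the sample paths; CV = Std / Mean.
    """
    abs_pi = np.abs(pi_paths)
    mean = abs_pi.mean(axis=1).mean()
    std = abs_pi.std(axis=1).mean()
    return mean, std, std / mean

def sharpe_and_vrr(wealth_paths):
    """Empirical Sharpe ratio (mean / std) and VRR (mean / variance).

    Assumption: returns are the per-step wealth increments.
    """
    increments = np.diff(wealth_paths, axis=1)
    m = increments.mean()
    v = increments.var()
    return m / np.sqrt(v), m / v

mean, std, cv = strategy_cv(np.array([[1.0, -3.0], [2.0, 2.0]]))
sharpe, vrr = sharpe_and_vrr(np.array([[0.0, 1.0, 3.0]]))
```

Because the VRR divides by the variance rather than the standard deviation, it penalizes volatile wealth paths more heavily than the Sharpe ratio does when the standard deviation exceeds 1, and less heavily when it is below 1.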
We also report the Sharpe ratio and the VRR obtained by viewing the three agents' portfolios as one social portfolio. In the current experiments, the mean-reverting level ($\bar\mu = 0,\ 0.02$) is quite low, since we take a conservative view of the market, which causes both the Sharpe ratio and the VRR to be small. For the first set of parameters (the top panel), the small variation in the portfolio returns compensates for the inaccuracy of the return estimates, resulting in a larger Sharpe ratio and VRR in the case PI than in the case FI, for all three agents. Comparing the portfolio performances for the C and NC cases under partial information, whether competition increases or decreases the Sharpe ratio and the VRR depends on the interaction of the market and investors' parameters. We leave it to future research to find the conditions on the parameters and risk preferences that induce each case.

Figure 1. Sample paths of the estimated return versus the real market return for the linear Gaussian return dynamics with process parameters specified in (6.3). The black dashed line indicates the average of 3 independent neural network approximations, with $h_0 = 0.05$ and $\bar\mu = 0.02$, and the blue dashed line indicates the true return. The initial estimate is a constant equal to the true return rate, i.e., $\hat h_0 = 0.05$. The x-axis indicates time; the y-axis is the stock return rate.

| η | $\bar\mu$ | Equation | Initial position | V(0) |
|---|---|---|---|---|
| 0.02 | 0 | PDE | (2.…, …, …) | (−…, −…, −…) |
| | | FBSDE | (…, …, …) | (−…, −…, −…) |
| 0.05 | 0.02 | PDE | (7.…, …, …) | (−…, −…, −…) |
| | | FBSDE | (…, …, …) | (−…, −…, −…) |
| 0.1 | 0.02 | PDE | (14.…, …, …) | (−…, −…, −…) |
| | | FBSDE | (…, …, …) | (−…, −…, −…) |

Table 3. The investors' initial positions and values of investment obtained from solving the PDEs and the FBSDEs. The top, middle and bottom panels show the solutions under 3 different market environments with different initial returns and mean-reverting levels. The investors' initial prior is a constant equal to the true market return rate.

6.2. Linear filter: heterogeneous prior beliefs.

Recall that the estimated return of Agent $i$ is given by $E^i\big[h(t)\,\big|\,\mathcal F^S \vee \hat A^i\big]$, where $E^i$ indicates expectation under the subjective probability measure $P^i$ of Agent $i$. In both deep learning stages, we need to generate sample paths of $h_t$ under those probability measures. The initial beliefs about the hidden variable $\hat A_0$ are sampled from a normal distribution $N(m_i, v_i)$, $i \in \{1, \dots, N\}$. Notice that $v_i = \hat\Sigma_i(0)$.

To focus on the variation in the estimates' accuracy, we assume the means of the estimates are the same for all 3 agents and let the standard deviation vary. The mean is equal to the true return …

(6.4) ($m_1$, std$_1$) = (0 . , . ), ($m_2$, std$_2$) = (0 . , . ), ($m_3$, std$_3$) = (0 . , ),

where Agent 3 has the accurate estimate.

[Figure 2: top row, dollar amount in stock (Merton C and π for Agents 1 and 3); bottom row, wealth $X_t$ of Agents 1, 2 and 3 against time T (yrs).]

Figure 2. Sample paths of the dollar amount invested in the stock for Agent 1 and Agent 3, as well as the wealth processes $X_t$ for all three agents. The left column corresponds to the investors' response when the market parameters are $h_0 = 0.02$, $\bar\mu = 0$. The middle column corresponds to the investors' response when the market has $h_0 = 0.05$, $\bar\mu = 0.02$. The right column corresponds to the strategies and wealth processes of the investors when the true market parameters are $h_0 = 0.1$, $\bar\mu = 0.02$.

To generate sample paths under the subjective probability measure, we first sample $\hat A^i_0$ from the prescribed distribution, then proceed as follows:

$$\hat A^i_{t_{k+1}} = \hat A^i_{t_k} - \lambda\,(\hat A^i_{t_k} - \bar\mu)\,\Delta t + \sigma_a\,\Delta B_{t_k}, \quad k \in \{0, \dots, K-1\}. \tag{6.5}$$

The above equation requires the simulation of $\Delta B$. To obtain the stock prices under $P$, we next simulate $\Delta W$. Let $A$ be the accurate market return; we simulate according to (6.1) and (6.2) to get the stock prices $S_{t_k}$ in the objective world. The estimate is the projection onto the subjective view $\hat A^i$. The hidden state of the recurrent network $G_S$ at time $t_k$ depends on the stock path up to time $t_k$, as well as on $\hat A_0$. To get the return estimates for investor $i$, we then optimize the NNs by SGD on the mean square loss of the NN outputs against the subjective hidden state $\hat A^i$. The estimated return is the output of Stage I, and it is part of the neural network input at Stage II for solving the FBSDE.

Table 4 layout. Columns: Mean of (abs. value) strategies (Agent 1, Agent 2, Agent 3); Std of (abs. value) strategies (Agent 1, Agent 2, Agent 3); CV (Mean); Ratio of CV (PI / FI, C / NC). Rows, in each of the three panels: NC-FI, NC-PI, C-PI.

Table 4. Time series mean and standard deviation of the absolute value of the three agents' investment strategies under different market parameter sets for the linear Gaussian case. The return dynamics are given by (6.1)-(6.2) and the parameters by (6.3). The CV (coefficient of variation) is the time series standard deviation per unit of the mean, i.e., CV = Std / Mean. The ratio PI / FI is the ratio of the CVs of the cases PI-NC and FI-NC. The ratio C / NC is the ratio of the CVs of the cases PI-C and PI-NC. The top, middle and bottom panels correspond to $(h_0, \bar\mu)$ equal to (0.02, 0), (0.05, 0.02) and (0.1, 0.02), respectively.

Table 5 layout. Columns: Sharpe ratio (Agent 1, Agent 2, Agent 3, Social); VRR (Agent 1, Agent 2, Agent 3, Social). Rows, in each of the three panels: NC-FI, NC-PI, C-PI.

Table 5. The empirical Sharpe ratio and VRR for each agent as well as for the total wealth of the agents (social wealth). The top panel (rows 1-3) corresponds to the market parameters with initial return rate 0.02 and mean-reverting level 0. The middle panel (rows 4-6) corresponds to initial return 0.05 and mean-reverting level 0.02. The bottom panel (rows 7-9) is for the different information settings when the initial market return is 0.1, with mean-reverting level 0.02. In each market setting, the investors have an initial belief that is a constant equal to the true market return rate.

To facilitate the comparison, the numerical results for this section, both with and without competition (C and NC, respectively), are shown in a later section together with the heterogeneous-agents case with nonlinear returns.


6.3. Nonlinear filter.

The linear relation between the return rate and the stock fundamentals allows us to obtain an explicit solution. However, the assumption is restrictive and unrealistic. To find an investment strategy that is useful in practice, or to derive relevant asset pricing implications from the investment strategies, we need to consider the case of nonlinear $h$. For the numerical experiments, we focus on the following stock and hidden state dynamics:

$$\frac{dS_t}{S_t} = c\cdot\mathrm{sign}(A_t)\sqrt{|A_t|}\,dt + \sqrt{1-\rho^2}\,\sigma_S\,dW_t + \rho\,\sigma_S\,dB_t \quad \text{(observed)}, \tag{6.6}$$

$$dA_t = -\lambda\,(A_t - \bar\mu)\,dt + \sigma_a\sqrt{(A_t - a_l)(a_u - A_t)}\,dB_t \quad \text{(hidden)}, \tag{6.7}$$

for $\lambda, c, a_l, a_u \in \mathbb R$.

The numerical scheme for solving for the equilibrium investment strategy and value functions is similar to the one with linear filters. The difference is that $\hat A^i_t$ and $A_t$ are simulated according to the discretized version of the above dynamics instead of the Ornstein-Uhlenbeck process (6.1)-(6.2).

We next present numerical results in the case of nonlinear filters for heterogeneous market investors with parameters in (6.4). The base market parameters for (6.6)-(6.7) are

(6.8) c = 0 . , ρ = − . , σ_S = 0 . , λ = 1 , σ_a = 0 . , a_l = − . , a_u = 0 . .

Similar to the case of linear Gaussian returns, Figure 3 shows sample paths of the strategies and the wealth processes of the agents. The black dash-dotted line in the first row of the figure is the Merton strategy that assumes the return is a deterministic process, which we use as the benchmark. The difference between the actual strategies and the benchmark is the hedging demand for stochastic returns. As time approaches the end of the investment horizon, the hedging demand vanishes.

Figure 4 illustrates the effect of competition. The subfigures show the investment strategies of all three agents. Each subfigure includes the strategy of one investor with and without relative concerns, C and NC. The horizontal lines indicate the time series means of the strategies in the C and NC cases. With competition, the weight parameters are 0.2, 0.5 and 0.2, respectively, for the three agents, and it is apparent from the figure that the change in Agent 2's strategy is the largest among the three investors, owing to the largest competition weight.

Table 6 shows the time series statistics of the investment strategies of all investors with different market parameters and risk preferences under the nonlinear return dynamics. Both the mean and the Std are averages, over the B = 64 sample paths, of the per-path means and Stds. The CV is defined as before, as the standard deviation per unit of the mean, and we report the mean of the CVs of the three investors in the table. We then calculate the changes in the CVs and report the ratio as an indicator of the volatility of the strategies. Under the nonlinear dynamics, the CV for the first parameter set is significantly smaller in the partial information case (PI) than in the full information case (FI). For the other initial return rates and mean-reverting levels, the strategies under PI are similarly less volatile. Unlike in the linear filter experiments, competition can decrease the volatility of the strategies, since the ratio of the CVs for C and NC for the last set of parameters, $(h_0, \bar\mu) = (0.1, 0.02)$, is smaller than 1.

Table 7 reports the empirical Sharpe ratio and the VRR, which is defined as the mean return over the variance. The Sharpe ratio is higher in the full information case for the first two sets of market parameters, while for the last parameter set, the partial information case yields a higher Sharpe ratio. The same holds for the VRRs. The standard deviation of the wealth processes may be so large in the full information case that a high return cannot compensate for it, which induces a smaller empirical Sharpe ratio in the FI case. Whether or not competition increases the return …
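The discretized simulation of the nonlinear dynamics (6.6)-(6.7) used in this section can be sketched as follows. The sketch is illustrative only: the function name and all default parameter values except λ = 1 are assumptions, and clipping the hidden state to $[a_l, a_u]$ after each Euler step is a simple device chosen here to keep the square root well defined, not necessarily the authors' treatment.

```python
import numpy as np

def simulate_nonlinear_market(a0, mu_bar, c=0.5, rho=-0.5, sigma_s=0.2,
                              lam=1.0, sigma_a=0.5, a_l=-0.2, a_u=0.3,
                              T=0.5, K=50, n_paths=64, seed=0):
    """Euler paths of the hidden state (6.7) and the stock price (6.6).

    All defaults except lam are placeholder values (assumptions).
    """
    rng = np.random.default_rng(seed)
    dt = T / K
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, K))
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, K))
    A = np.empty((n_paths, K + 1))
    S = np.empty((n_paths, K + 1))
    A[:, 0] = a0
    S[:, 0] = 1.0
    for k in range(K):
        # (6.6): drift c * sign(A) * sqrt(|A|), correlated noise in W and B
        drift = c * np.sign(A[:, k]) * np.sqrt(np.abs(A[:, k]))
        noise = sigma_s * (np.sqrt(1.0 - rho**2) * dW[:, k] + rho * dB[:, k])
        S[:, k + 1] = S[:, k] * (1.0 + drift * dt + noise)
        # (6.7): bounded diffusion; the maximum guards against rounding errors
        vol = sigma_a * np.sqrt(np.maximum((A[:, k] - a_l) * (a_u - A[:, k]), 0.0))
        A_next = A[:, k] - lam * (A[:, k] - mu_bar) * dt + vol * dB[:, k]
        A[:, k + 1] = np.clip(A_next, a_l, a_u)  # assumption: clip to [a_l, a_u]
    return A, S

A, S = simulate_nonlinear_market(a0=0.05, mu_bar=0.02)
```

The diffusion coefficient vanishes at $a_l$ and $a_u$, so the hidden state, and with it the drift of the stock, stays bounded.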

[Figure 3: top row, dollar amount in stock (Merton C and π for Agents 1 and 3); bottom row, wealth $X_t$ of Agents 1, 2 and 3 against time T (yrs).]

Figure 3. Sample paths of the dollar amount invested in the stock for Agent 1 and Agent 3, as well as the wealth processes $X_t$ for all three agents when the return rate dynamics are nonlinear. The left column corresponds to the investors' response when the market parameters are $h_0 = 0.02$, $\bar\mu = 0$. The middle column corresponds to the investors' response when the market has $h_0 = 0.05$, $\bar\mu = 0.02$. The right column corresponds to the strategies and wealth processes of the investors when the true market parameters are $h_0 = 0.1$, $\bar\mu = 0.02$.

6.4. Heterogeneous prior beliefs: the nonlinear filter and comparisons.

We have described the computational algorithm of the estimation step, Stage I, in previous sections. The estimates enter the FBSDE as state variables and allow us to solve for the optimal investment strategies both with and without competition. Table 8 shows the time series means and standard deviations of the absolute value of the investment strategies of the 3 agents. As in the previous sections, we report the mean of the time series statistics over a sample of size B = 64. The bottom panel of Table 8 shows the differences of the statistics between the linear and nonlinear cases for the set of market parameters that we specified.

The top panel shows that competition increases the investment proportion and its volatility, since the ratio of the means in the competition and no-competition cases is greater than 1 for all agents, in both the homogeneous and the heterogeneous case. The same holds with nonlinear filters, as shown in the middle panel. A key observation is that the ratios are larger in the HT case than in the HM case for both linear and nonlinear return dynamics, which indicates that investors increase their investment proportion in absolute value. In other words, Agent 1 and Agent 2 follow the strategy of Agent 3, who has the most aggressive strategy. Agent 3 also increases the …

[Figure 4: dollar amount in stock against time T (yrs), from 0.0 to 0.5; one panel each for Agents 1, 2 and 3, showing π under C and NC.]

Figure 4. Sample paths of investors' strategies under nonlinear return rate dynamics. From left to right are the strategies of Agent 1, Agent 2 and Agent 3, respectively. The solid lines are the strategies when investors are under competition. The dash-dotted lines are the Merton strategies. The risk aversion parameters and competition weights for both cases are specified in Table 2. The horizontal lines are the averages of the strategies across time for the plotted sample paths.

Table 6 layout. Columns: Mean of (abs. value) strategies (Agent 1, Agent 2, Agent 3); Std of (abs. value) strategies (Agent 1, Agent 2, Agent 3); CV (Mean); Ratio of CV (PI / FI, C / NC). Rows, in each of the three panels: NC-FI, NC-PI, C-PI.

Table 6. Time series statistics of the absolute value of the three agents' investment strategies under different market parameter sets for the nonlinear case. The return dynamics are given by (6.6)-(6.7) and the parameter set by (6.8). The CV (coefficient of variation) is the time series standard deviation per unit of the mean, i.e., CV = Std / Mean. Mean of CVs is the average of the investors' CVs. The ratio PI / FI is the ratio of the (mean of) CVs of the cases PI-NC and FI-NC. The ratio C / NC is the ratio of the (mean of) CVs of the cases PI-C and PI-NC. The top, middle and bottom panels correspond to $(h_0, \bar\mu)$ equal to (0.02, 0), (0.05, 0.02) and (0.1, 0.02), respectively.

Table 7 layout. Columns: Sharpe ratio (Agent 0, Agent 1, Agent 2, Social); VRR (Agent 0, Agent 1, Agent 2, Social). Rows, in each of the three panels: NC-FI, NC-PI, C-PI.

Table 7. The empirical Sharpe ratio and VRR for each agent as well as for the social wealth under nonlinear return rate dynamics. The top panel (rows 1-3) corresponds to the market parameters with initial return rate 0.02 and mean-reverting level 0. The middle panel (rows 4-6) corresponds to initial return 0.05 and mean-reverting level 0.02. The bottom panel (rows 7-9) is for the different information settings when the initial market return is 0.1, with mean-reverting level 0.02. In each market setting, the investors have an initial belief that is a constant equal to the true market return rate. Agents' risk parameters are the base parameters.

The bottom panel is the difference between the top and the middle panel, i.e., the case L minus the case NL. The ratio is higher in the linear filter case, except for Agent 3 with heterogeneous investors, for whom the competition effect is more pronounced in the nonlinear case than in the linear case. Viewing the competition effect as an agent-market characteristic, the cross-market difference varies more across agents in the HT case (the column C / NC in the bottom panel). Therefore, heterogeneity in partial information affects the sensitivity of the competition effect to market characteristics, which in this particular case means the return process.

Notice that the information heterogeneity is only in the prior estimates. All agents follow the Bayesian learning procedure with unlimited information processing ability. The effect of heterogeneity is already pronounced.

| Panel | Case | Mean, Agent 1 | Mean, Agent 2 | Mean, Agent 3 | Std, Agent 1 | Std, Agent 2 | Std, Agent 3 | C / NC, Agent 1 | C / NC, Agent 2 | C / NC, Agent 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| L-HT | NC-PI | … | … | … | … | … | … | | | |
| | C-PI | … | … | … | … | … | … | … | … | … |
| L-HM | NC-PI | … | … | … | … | … | … | | | |
| | C-PI | … | … | … | … | … | … | … | … | … |
| NL-HT | NC-PI | … | … | … | … | … | … | | | |
| | C-PI | … | … | … | … | … | … | … | … | … |
| NL-HM | NC-PI | … | … | … | … | … | … | | | |
| | C-PI | … | … | … | … | … | … | … | … | … |
| Diff: HT | NC-PI | -0.735 | -0.888 | 1.661 | 1.566 | 2.753 | 4.038 | | | |
| | C-PI | -0.748 | -1.027 | 1.308 | 2.156 | 3.916 | 4.140 | 0.080 | 0.082 | -0.088 |
| Diff: HM | NC-PI | … | … | … | … | … | … | | | |
| | C-PI | … | … | … | … | … | … | … | … | … |

Table 8. The mean and standard deviation of the absolute value of the investment strategies of investors with heterogeneous initial beliefs, under both the linear and nonlinear return dynamics. L stands for linear filter, NL for nonlinear filter, and HM and HT stand for homogeneous and heterogeneous agents, respectively. In each section of the cases L and NL, the first two rows show the strategies for heterogeneous beliefs, while the results for homogeneous investors are included for comparison (the last two rows in each section). The base market parameter is ( h , ¯ µ ) = (0 . , . h = h , while differ in the variance parameters. For the linear case, the variances are 0 . 05, 0 . . 05, 0 . h . The bottom panel is the difference of the Means and Stds of the L and NL cases from the top and middle panels.

Conclusion and further remarks

In this paper, we consider an N-agent game in which the investors are utility maximizers. Each investor's utility depends on the amount by which her wealth outperforms the market average. Market investors can observe only the stock prices, not the state that drives the drift. First, we establish a fully-coupled forward-backward stochastic differential equation (FBSDE) that characterizes the N-agent investment decisions. For a bounded return process, we show that the FBSDE solution is unique. Therefore, for linear Gaussian or bounded nonlinear returns, we have the existence and uniqueness of the FBSDE solution. Hence, there is a unique Nash equilibrium for the game. The well-posedness of the FBSDE in the case of an unbounded return process is not readily available, because it requires higher-moment estimates of the solution components to meet the coupling condition. We leave it for future research. For the numerical scheme, we apply a novel deep learning approach to the system of equations. We first apply a deep-neural-network-based $L^2$ projection to obtain each investor's estimate of the asset return, and then design a deep FBSDE solver to find the value functions and the optimal controls simultaneously for all agents. The deep learning solution is compared with the PDE solution for the linear Gaussian return. The methodology developed in this paper, both the theoretical results and the numerical methods, has potential applications in stochastic control, in stochastic games and in the mean field setting. Moreover, in the present paper, the information heterogeneity is only in the prior estimates. All agents follow the Bayesian learning procedure with unconstrained information processing ability. The effect of heterogeneity is already pronounced. The case of agents with heterogeneous information capacity, and the corresponding asset pricing implications, are promising research directions that could lead to fruitful insights.

Acknowledgements

We are grateful for the financial support from the NSF of China (Nos. 11801099 and 11871364). Chao Deng also appreciates the financial support from the Ministry of Education of Humanities and Social Science project of China (No. 18YJC910005) and the Natural Science Foundation of Guangdong Province (No. 2017A030310575). Chao Zhou's work is also supported by Singapore MOE (Ministry of Education) AcRF Grants R-146-000-219-112, R-146-000-255-114 and R-146-000-271-112, as well as the French Ministry of Foreign Affairs and the Merlion programme. Xizhi Su acknowledges the financial support from the Centre for Quantitative Finance at NUS.

Appendices

Appendix A. Derivation of PDE solutions.

When the return process is a linear function of $(A_t)_{t\in[0,T]}$, we follow the market model (3.20) and (3.21) and solve the utility maximization problem using the PDE approach. The optimal control and the value function can be characterized by the HJB equation. When a unique classical solution to the PDE exists, we can apply Itô's formula to verify that the solution is the value function of the control problem. The optimal control is obtained as a byproduct. We now state the verification theorem for classical solutions.

Let $w$ be a function in $C^{1,2}([0,T]\times\mathbb R^N)$ that solves the HJB equation

$$\frac{\partial w}{\partial t}(t,x) + \sup_{a\in\mathcal A}\big[\mathcal L^a w(t,x) + f(x,a)\big] = 0, \quad (t,x)\in[0,T)\times\mathbb R^N,$$

$$w(T,x) = g(x), \quad x\in\mathbb R^N.$$

Verification theorem. Suppose there exists a measurable function $\hat a(t,x)$, $(t,x)\in[0,T]\times\mathbb R^N$, valued in $\mathcal A$ and attaining the supremum, i.e.,

$$\sup_{a\in\mathcal A}\big[\mathcal L^a w(t,x) + f(x,a)\big] = \mathcal L^{\hat a(t,x)} w(t,x) + f(x,\hat a(t,x)),$$

such that the SDE

$$dX_u = b(X_u, \hat a(u,X_u))\,du + \sigma(X_u, \hat a(u,X_u))\,dW_u$$

admits a unique solution, denoted by $\hat X^{t,x}_u$, $t\le u\le T$, with the initial condition $X_t = x$, and the process $\hat\alpha = \{\hat a(u,\hat X^{t,x}_u),\ t\le u\le T\}$ lies in $\mathcal A$. Then $w = v$, and $\hat\alpha$ is an optimal feedback control.

Partial information HJB equation. For CARA utility, the solution $V(t,x,y,\eta)$ is smooth. Hence the classical verification theorem applies, which states that if the HJB equation has a smooth solution, then the solution is the value function of the control problem. Let

$$V(t,x,y,\eta) = \sup_{\pi\in\mathcal A} E\Big[J(t)\,\Big|\,X_t = x,\ \tilde X_t = y,\ \hat\mu_t = \eta,\ \hat\Sigma(t) = \sigma\Big].$$

We omit the superscript $i$ when there is no ambiguity. Agent $i$'s value function $V(t,x,y,\eta)$ satisfies the HJB equation

$$V_t + \sup_{\pi_t}\Big\{\pi_t\eta V_x + \alpha^{-i}_t\eta V_y - \lambda(\eta-\bar\mu)V_\eta + \tfrac12\pi_t^2\sigma_S^2V_{xx} + \tfrac12(\alpha^{-i}_t)^2\sigma_S^2V_{yy} + \tfrac12\Big(\frac{\hat\Sigma(t)+\sigma_S\sigma_\mu\rho}{\sigma_S}\Big)^2V_{\eta\eta} + \alpha^{-i}_t\pi_t\sigma_S^2V_{xy} + \pi_t\big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)V_{x\eta} + \alpha^{-i}_t\big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)V_{y\eta}\Big\} = 0,$$

with terminal condition $V(T,x,y,\eta) = -e^{-\frac1\delta\left((1-\frac\theta N)x-\theta y\right)}$.

By the first order condition, the optimal $\pi_t$ is

$$\pi^*_t = -\frac{\eta V_x + \alpha^{-i}_t\sigma_S^2V_{xy} + \big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)V_{x\eta}}{\sigma_S^2V_{xx}}.$$

Substituting $\pi^*_t$ into the above equation, we obtain the PDE for the value function,

$$V_t + \alpha^{-i}_t\eta V_y - \lambda(\eta-\bar\mu)V_\eta + \tfrac12(\alpha^{-i}_t)^2\sigma_S^2V_{yy} + \tfrac12\Big(\frac{\hat\Sigma(t)+\sigma_S\sigma_\mu\rho}{\sigma_S}\Big)^2V_{\eta\eta} - \frac{\Big(\eta V_x + \alpha^{-i}_t\sigma_S^2V_{xy} + \big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)V_{x\eta}\Big)^2}{2V_{xx}\sigma_S^2} = 0,$$

with terminal condition $V(T,x,y,\eta) = -e^{-\frac1\delta\left((1-\frac\theta N)x-\theta y\right)}$. Make the ansatz $V(t,x,y,\eta) = -e^{-\frac1\delta\left((1-\frac\theta N)x-\theta y\right)}f(t,\eta)$. The PDE for $f(t,\eta)$ is given by

$$f_t + w\alpha^{-i}_t\eta f - \lambda(\eta-\bar\mu)f_\eta + \tfrac12w^2(\alpha^{-i}_t)^2\sigma_S^2f + \tfrac12\Big(\frac{\hat\Sigma(t)+\sigma_S\sigma_\mu\rho}{\sigma_S}\Big)^2f_{\eta\eta} - \frac{\Big(\eta f + w\alpha^{-i}_t\sigma_S^2f + \big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)f_\eta\Big)^2}{2\sigma_S^2f} = 0,$$

with $f(T,\eta) = 1$ and $w = \frac\theta\delta$.

Further simplification gives

$$f_t + \Big[w\alpha^{-i}_t\eta + \tfrac12w^2(\alpha^{-i}_t)^2\sigma_S^2 - \frac{\big(\eta + w\alpha^{-i}_t\sigma_S^2\big)^2}{2\sigma_S^2}\Big]f + \tfrac12\Big(\frac{\hat\Sigma(t)+\sigma_S\sigma_\mu\rho}{\sigma_S}\Big)^2f_{\eta\eta} - \lambda(\eta-\bar\mu)f_\eta - \frac{\big(\eta + w\alpha^{-i}_t\sigma_S^2\big)\big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)}{\sigma_S^2}f_\eta - \tfrac12\Big(\frac{\hat\Sigma(t)+\sigma_S\sigma_\mu\rho}{\sigma_S}\Big)^2\frac{f_\eta^2}{f} = 0,$$

with $f(T,\eta) = 1$.

Set the transformation $f(t,\eta) = e^{g(t,\eta)}$; then $g(t,\eta)$ satisfies the PDE

$$g_t + \tfrac12\Big(\frac{\hat\Sigma(t)+\sigma_S\sigma_\mu\rho}{\sigma_S}\Big)^2g_{\eta\eta} - \lambda(\eta-\bar\mu)g_\eta - \frac{\big(\eta + w\alpha^{-i}_t\sigma_S^2\big)\big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)}{\sigma_S^2}g_\eta + w\alpha^{-i}_t\eta + \tfrac12w^2(\alpha^{-i}_t)^2\sigma_S^2 - \frac{\big(\eta + w\alpha^{-i}_t\sigma_S^2\big)^2}{2\sigma_S^2} = 0,$$

with $g(T,\eta) = 0$.

For the above PDE, which is of second order in the variable $\eta$, we make the ansatz that the solution is quadratic in $\eta$, with coefficients given as integrals with respect to time:

$$g(t,\eta) = \int_t^T\Big[A(t,s)\eta^2 + B(t,s)\eta + C(t,s)\Big]ds,$$

where $A(t,s)$, $B(t,s)$ and $C(t,s)$ satisfy the following ODEs in $t$:

$$\dot A - 2\lambda A - \frac{2\big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)}{\sigma_S^2}A = 0,$$

$$\dot B - \lambda B - \frac{\hat\Sigma(t)+\sigma_S\sigma_\mu\rho}{\sigma_S^2}B + 2\lambda\bar\mu A + 2w\alpha^{-i}_t\big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)A = 0,$$

$$\dot C + \frac{\big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)^2}{\sigma_S^2}A + \lambda\bar\mu B + w\alpha^{-i}_t\big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)B = 0,$$

with $A(t,t) = -\frac{1}{2\sigma_S^2}$ and $B(t,t) = C(t,t) = 0$.
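As a small numerical sanity check on the ODE for $A$, the sketch below integrates it backward in $t$ with a classical Runge-Kutta scheme and compares the result with the elementary exponential solution of a linear ODE with constant coefficients. It rests on two assumptions that are not stated in the text: that the ODE reads $\dot A = 2\big(\lambda + (\hat\Sigma + \sigma_S\sigma_\mu\rho)/\sigma_S^2\big)A$ with $A(s,s) = -1/(2\sigma_S^2)$, and that $\kappa = \hat\Sigma + \sigma_S\sigma_\mu\rho$ is frozen at a constant. The function names and the numerical values of `kappa` and `sigma_s` are hypothetical.

```python
import math

def solve_A_backward(s, lam, kappa, sigma_s, n_steps=1000):
    """Integrate dA/dt = 2 * (lam + kappa / sigma_s**2) * A from t = s down to t = 0.

    kappa stands for Sigma_hat + sigma_S * sigma_mu * rho, held constant here
    (an assumption made for this check).
    """
    rate = 2.0 * (lam + kappa / sigma_s**2)
    h = -s / n_steps                     # negative step: backward in time
    a = -1.0 / (2.0 * sigma_s**2)        # boundary value A(s, s)
    for _ in range(n_steps):
        # classical fourth-order Runge-Kutta step for the linear ODE
        k1 = rate * a
        k2 = rate * (a + 0.5 * h * k1)
        k3 = rate * (a + 0.5 * h * k2)
        k4 = rate * (a + h * k3)
        a += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return a

def closed_form_A(t, s, lam, kappa, sigma_s):
    """Exponential solution of the same ODE for constant kappa."""
    return -math.exp(-2.0 * (lam + kappa / sigma_s**2) * (s - t)) / (2.0 * sigma_s**2)

numeric = solve_A_backward(0.5, lam=8.0, kappa=-0.01, sigma_s=0.2)
exact = closed_form_A(0.0, 0.5, lam=8.0, kappa=-0.01, sigma_s=0.2)
```

With a time-dependent $\hat\Sigma(t)$ the same loop applies once `rate` is evaluated at each stage time, and the exponent of the closed form becomes an integral over $[t,s]$.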

The solution to the ODE system is

$$A(t,s) = -\frac{1}{2\sigma_S^2}\,e^{-2\int_t^s\left(\lambda + \frac{\hat\Sigma(u)+\sigma_S\sigma_\mu\rho}{\sigma_S^2}\right)du}, \tag{7.1}$$

$$B(t,s) = l(t,s)\,e^{-\int_t^s\left(\lambda + \frac{\hat\Sigma(u)+\sigma_S\sigma_\mu\rho}{\sigma_S^2}\right)du}, \tag{7.2}$$

for

$$l(t,s) = -\frac{1}{\sigma_S^2}\int_t^s\Big(\lambda\bar\mu + w\alpha^{-i}_t\big(\hat\Sigma(u)+\sigma_S\sigma_\mu\rho\big)\Big)\,e^{-\int_u^s\left(\lambda + \frac{2(\hat\Sigma(m)+\sigma_S\sigma_\mu\rho)}{\sigma_S^2} - \frac{\hat\Sigma(m)+\sigma_S\sigma_\mu\rho}{\sigma_S^2}\right)dm}\,du, \tag{7.3}$$

and

$$C(t,s) = \int_t^s\bigg(\frac{\big(\hat\Sigma(u)+\sigma_S\sigma_\mu\rho\big)^2}{\sigma_S^2}A(u,s) + \Big(\lambda\bar\mu + w\alpha^{-i}_t\big(\hat\Sigma(u)+\sigma_S\sigma_\mu\rho\big)\Big)B(u,s)\bigg)du. \tag{7.4}$$

Moreover, the strategy $\pi^*$ is given by

$$\pi^*_t = \frac{\delta\eta + \big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)\,\delta\,g_\eta(t,\eta) + \alpha^{-i}_t\sigma_S^2\theta}{\sigma_S^2\big(1-\frac\theta N\big)} = \frac{\delta}{\sigma_S^2\big(1-\frac\theta N\big)}\bigg(\eta + \big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)\int_t^T\big(2A(t,s)\eta + B(t,s)\big)ds\bigg) + \frac{\theta}{1-\frac\theta N}\,\alpha^{-i}_t.$$

More explicitly, substituting (7.1) and (7.2),

$$\pi^*_t = \frac{\delta}{\sigma_S^2\big(1-\frac\theta N\big)}\bigg\{\eta + \big(\hat\Sigma(t)+\sigma_S\sigma_\mu\rho\big)\int_t^T\Big[-\frac{1}{\sigma_S^2}\,e^{-2\int_t^s\left(\lambda + \frac{\hat\Sigma(u)+\sigma_S\sigma_\mu\rho}{\sigma_S^2}\right)du}\,\eta + l(t,s)\,e^{-\int_t^s\left(\lambda + \frac{\hat\Sigma(u)+\sigma_S\sigma_\mu\rho}{\sigma_S^2}\right)du}\Big]ds\bigg\} + \frac{\theta}{1-\frac\theta N}\,\alpha^{-i}_t,$$

with $l(t,s)$ given by (7.3).

Appendix B.

The deep learning results with respect to different learning rates are shown in the figure below.

[Figure 5: top row, loss, loss (log scale) and loss over the last training epochs, for learning rates lr = 0.001, 0.002 and 0.003; middle row, initial Y for Agents 1, 2 and 3; bottom row, initial π for Agents 1, 2 and 3.]

Figure 5. The convergence of the FBSDE solutions with respect to the training epochs for different learning rates (constant, without decay). The top panel shows the loss, with the left two figures showing the loss on the ordinary scale and on the log scale, respectively. The right-most figure shows the loss for the last 2000 training epochs. The middle row shows the initial Y values corresponding to the components of the FBSDE solution in $\mathbb R^N$; from left to right, these are the first, second and third component, respectively. The last row shows the initial investment amount for Agent 1, Agent 2 and Agent 3, respectively, from left to right.

References

[1] Abel, A. B. (1990). Asset prices under habit formation and catching up with the Joneses. Technical report, National Bureau of Economic Research.
[2] Agarwal, V., Daniel, N. D., and Naik, N. Y. (2009). Role of managerial incentives and discretion in hedge fund performance. The Journal of Finance, 64(5):2221–2256.
[3] Antonelli, F. (1993). Backward-forward stochastic differential equations. The Annals of Applied Probability, pages 777–793.
[4] Basak, S. (2005). Asset pricing with heterogeneous beliefs. Journal of Banking & Finance, 29(11):2849–2881.
[5] Bielagk, J., Lionnet, A., and Reis, G. D. (2017). Equilibrium pricing under relative performance concerns. SIAM Journal on Financial Mathematics, 8(1):435–482.
[6] Björk, T., Davis, M. H., and Landén, C. (2010). Optimal investment under partial information. Mathematical Methods of Operations Research, 71(2):371–399.
[7] Brendle, S. (2006). Portfolio selection under incomplete information. Stochastic Processes and their Applications, 116(5):701–723.
[8] Brown, K. C., Harlow, W. V., and Starks, L. T. (1996). Of tournaments and temptations: An analysis of managerial incentives in the mutual fund industry. The Journal of Finance, 51(1):85–110.
[9] Brown, S. J., Goetzmann, W. N., and Park, J. (2001). Careers and survival: Competition and risk in the hedge fund and CTA industry. The Journal of Finance, 56(5):1869–1886.
[10] Carmona, R. and Laurière, M. (2019a). Convergence analysis of machine learning algorithms for the numerical solution of mean field control and games: I – the ergodic case. arXiv preprint arXiv:1907.05980.
[11] Carmona, R. and Laurière, M. (2019b). Convergence analysis of machine learning algorithms for the numerical solution of mean field control and games: II – the finite horizon case. arXiv preprint arXiv:1908.01613.
[12] Cox, J. C. and Huang, C. (1989). Optimal consumption and portfolio policies when asset prices follow a diffusion process. Journal of Economic Theory, 49(1):33–83.
[13] Cvitanić, J. and Karatzas, I. (1992). Convex duality in constrained portfolio optimization. The Annals of Applied Probability, 2(4):767–818.
[14] Dai, M., Jin, H., and Liu, H. (2011). Illiquidity, position limits, and optimal investment for mutual funds. Journal of Economic Theory, 146(4):1598–1630.
[15] Davis, M. H. and Norman, A. R. (1990). Portfolio selection with transaction costs. Mathematics of Operations Research, 15(4):676–713.
[16] DeMarzo, P. M., Kaniel, R., and Kremer, I. (2008). Relative wealth concerns and financial bubbles. The Review of Financial Studies, 21(1):19–50.
[17] Detemple, J. B. (1986). Asset pricing in a production economy with incomplete information. The Journal of Finance, 41(2):383–391.
[18] E, W., Han, J., and Jentzen, A. (2017). Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics, 5(4):349–380.
[19] Espinosa, G.-E. and Touzi, N. (2015). Optimal investment under relative performance concerns. Mathematical Finance, 25(2):221–257.
[20] Gennotte, G. (1986). Optimal portfolio choice under incomplete information. The Journal of Finance, 41(3):733–746.
[21] Gómez, J.-P. (2007). The impact of keeping up with the Joneses behavior on asset prices and portfolio choice. Finance Research Letters, 4(2):95–103.
[22] Han, J. and Long, J. (2020). Convergence of the deep BSDE method for coupled FBSDEs. Probability, Uncertainty and Quantitative Risk, 5(1):1–33.
[23] Horst, U. (2005). Stationary equilibria in discounted stochastic games with weakly interacting players. Games and Economic Behavior, 51(1):83–108.
[24] Hu, Y., Imkeller, P., and Müller, M. (2005). Utility maximization in incomplete markets. The Annals of Applied Probability, 15(3):1691–1712.
[25] Hu, Y. and Peng, S. (1995). Solution of forward-backward stochastic differential equations. Probability Theory and Related Fields, 103(2):273–283.
[26] Huré, C., Pham, H., and Warin, X. (2020). Some machine learning schemes for high-dimensional nonlinear PDEs. Math. Comput., 89:1547–1579.
[27] Karatzas, I., Lehoczky, J. P., and Shreve, S. E. (1987). Optimal portfolio and consumption decisions for a "small investor" on a finite horizon. SIAM Journal on Control and Optimization, 25(6):1557–1586.
[28] Karatzas, I. and Xue, X. (1991). A note on utility maximization under partial observations. Mathematical Finance, 1(2):57–70.
[29] Kramkov, D. and Schachermayer, W. (1999). The asymptotic elasticity of utility functions and optimal investment in incomplete markets. Annals of Applied Probability, pages 904–950.
[30] Lacker, D. and Zariphopoulou, T. (2019). Mean field and n-agent games for optimal investment under relative performance criteria. Mathematical Finance, 29(4):1003–1038.
[31] Lee, S. and Papanicolaou, A. (2016). Pairs trading of two assets with uncertainty in co-integration's level of mean reversion. International Journal of Theoretical and Applied Finance, 19(08):1650054.
[32] Liptser, R. S. and Shiryaev, A. N. (2013). Statistics of random processes II: Applications, volume 6. Springer Science & Business Media.
[33] Ma, J., Protter, P., and Yong, J. (1994). Solving forward-backward stochastic differential equations explicitly - a four step scheme. Probability Theory and Related Fields, 98(3):339–359.
[34] Ma, J., Wu, Z., Zhang, D., and Zhang, J. (2015). On well-posedness of forward–backward SDEs - a unified approach. The Annals of Applied Probability, 25(4):2168–2214.
[35] Magill, M. J. and Constantinides, G. M. (1976). Portfolio selection with transactions costs. Journal of Economic Theory, 13(2):245–263.
[36] Mania, M. and Santacroce, M. (2010). Exponential utility maximization under partial information. Finance and Stochastics, 14(3):419–448.
[37] Matoussi, A., Possamaï, D., and Zhou, C. (2015). Robust utility maximization in nondominated models with 2BSDE: the uncertain volatility model.

Mathematical Finance , 25(2):258–287.[38] Merton, R. C. (1975). Optimum consumption and portfolio rules in a continuous-time model.In

Stochastic Optimization Models in Finance , pages 621–661. Elsevier.[39] Papanicolaou, A. (2019). Backward SDEs for control with partial information.

MathematicalFinance , 29(1):208–248.[40] Pardoux, E. and Peng, S. (1992). Backward stochastic diﬀerential equations and quasilin-ear parabolic partial diﬀerential equations. In

Stochastic partial diﬀerential equations and theirapplications , pages 200–217. Springer.[41] Pham, H. and Quenez, M.-C. (2001). Optimal portfolio in partially observed stochastic volatil-ity models.

Annals of Applied Probability , pages 210–238.

[42] Pliska, S. R. (1986). A stochastic calculus model of continuous trading: optimal portfolios. Mathematics of Operations Research, 11(2):371–382.
[43] Qiu, Z. (2017). Equilibrium-informed trading with relative performance measurement. Journal of Financial and Quantitative Analysis, 52(5):2083–2118.
[44] Rieder, U. and Bäuerle, N. (2005). Portfolio optimization with unobservable Markov-modulated drift process. Journal of Applied Probability, 42(2):362–378.
[45] Rogers, L. (2003). Duality in constrained optimal investment and consumption problems: a synthesis. In Paris-Princeton Lectures on Mathematical Finance 2002, pages 95–131. Springer.
[46] Rouge, R. and El Karoui, N. (2000). Pricing via utility maximization and entropy. Mathematical Finance, 10(2):259–276.
[47] Sass, J. and Haussmann, U. G. (2004). Optimizing the terminal wealth under partial information: The drift process as a continuous time Markov chain. Finance and Stochastics, 8(4):553–577.
[48] Shreve, S. E. and Soner, H. M. (1994). Optimal investment and consumption with transaction costs. The Annals of Applied Probability, 4(3):609–692.
[49] Wang, G. and Wu, Z. (2008). Kalman–Bucy filtering equations of forward and backward stochastic systems and applications to recursive optimal control problems. Journal of Mathematical Analysis and Applications, 342(2):1280–1296.
[50] Yong, J. (1997). Finding adapted solutions of forward–backward stochastic differential equations: method of continuation. Probability Theory and Related Fields, 107(4):537–572.
[51] Zariphopoulou, T. (1994). Consumption-investment models with constraints. SIAM Journal on Control and Optimization, 32(1):59–85.
[52] Zhang, J. (2006). The wellposedness of FBSDEs.