TPLVM: Portfolio Construction by Student's t-process Latent Variable Model

Abstract

Optimal asset allocation is a key topic in modern finance theory. To realize the optimal asset allocation for an investor's risk aversion, various portfolio construction methods have been proposed. Recently, applications of machine learning have been growing rapidly in finance. In this article, we propose the Student's t-process latent variable model (TPLVM) to describe non-Gaussian fluctuations of financial time series with lower-dimensional latent variables. Subsequently, we apply the TPLVM to the minimum-variance portfolio as an alternative to existing nonlinear factor models. To test the performance of the proposed portfolio, we construct minimum-variance portfolios of global stock market indices based on the TPLVM and on the Gaussian process latent variable model (GPLVM). Comparing these portfolios, we confirm that the proposed portfolio outperforms the one based on the existing GPLVM.


arXiv [q-fin.PM] preprint

Yusuke Uchiyama

MAZIN Inc., 3-29-14 Nishi-Asakusa, Taito City, Tokyo 111-0035, Japan

Kei Nakagawa

Innovation Lab, Nomura Asset Management Co., Ltd., 1-11-1 Nihonbashi, Chuo-ku, Tokyo 103-8260, Japan

February 18, 2020

Keywords: Student's t-process · Latent variable model · Factor model · Portfolio theory · Global stock markets

Estimation of the covariance matrix of time series plays a dominant role in applications of modern financial theory. The optimization of the mean-variance portfolio, one of the pioneering works of modern finance theory [1], is based on the covariance matrix of the multi-dimensional time series of asset returns. Since asset returns are modelled by non-stationary stochastic processes, the covariance matrix should be estimated as a time-dependent symmetric matrix. In practice, we often estimate the covariance matrix by empirical time averaging, because complete information about the underlying probability space is lacking. It has been pointed out, however, that time averaging often causes serious estimation error in the covariance matrix when the number of assets is large [2, 3]. To overcome this problem, several inference methods have been proposed from the viewpoint of random matrix theory [4, 5].

With the aid of recently developed machine learning techniques, we can improve the accuracy of covariance matrix estimation [6, 7]. Furthermore, applications of machine learning techniques have been spreading in both theoretical and practical financial problems [8, 9]. Prediction of future prices is implemented by deep neural networks of various architectures [10, 11]. The Gaussian process is used as a model of the dynamics of the covariance matrix of multi-dimensional time series. In the literature on option pricing theory, the volatility of a risky asset is modelled by the Gaussian process [12]. In particular, the application of machine learning techniques to portfolio optimization has attracted the interest of both academia and industry [13, 14].

In the field of mathematical finance, stochastic volatility models have been utilized in estimating the dynamic covariance matrix of asset returns.
One of the most popular conditional volatility models is the generalized autoregressive conditional heteroscedasticity (GARCH) model [15], which describes the volatility clustering of asset returns. To introduce a time-varying correlation structure into these conditional volatility models, the dynamic conditional correlation (DCC) GARCH model has been proposed [16]. The parameters of the GARCH and DCC GARCH models can be estimated by the method of maximum likelihood.

On the other hand, in the machine learning literature, some kinds of latent variable models can be utilized to infer the dynamics of the covariance matrix. Recently, the Gaussian process latent variable model (GPLVM) has been applied to the problem of portfolio optimization, where latent variables are introduced as factors of asset returns. Namely, this model can be interpreted as a latent variable factor model [17].

Despite these practical applications, we should reconsider the assumptions and validity of using the GPLVM in finance, because the GPLVM assumes that observed data follow the Gaussian distribution. In most financial problems, the return of assets is regarded as an observed variable. It is well known that the fluctuations of asset returns follow non-Gaussian distributions [18]. To describe such fluctuations, several fat-tailed distributions have been presented and applied to financial time series. Thus, the GPLVM should be extended to fat-tailed distributions when we use it for financial problems.

In this paper, we propose the Student's t-process latent variable model (TPLVM) as an extension of the GPLVM. This model is based on the Student's t-distribution, which is a symmetric fat-tailed distribution. Since the Student's t-distribution converges to the Gaussian distribution in the limit of infinite degrees of freedom, the TPLVM includes the GPLVM as a special case. To use the TPLVM in practice, as with the GPLVM, we derive its predictive distribution in closed form and an estimator of the hyperparameters by variational inference in the Bayesian sense.

The remainder of this paper is organized as follows. Chapter 2 gives a brief introduction to the GPLVM, including the Gaussian process and the concept of kernel functions. In Chap. 3, we introduce the formulation of the TPLVM, which consists of the kernel functions, the predictive distribution and variational inference for estimating hyperparameters. As preparation for the financial application, we explain the basics of the factor model and portfolio optimization in Chap. 4. Chapter 5 implements portfolio optimization, where we compare the performance of the GPLVM and the TPLVM. Chapter 6 is dedicated to conclusions and future work.

The Gaussian process, a kind of stochastic process, is a non-parametric machine learning method [19, 20]. It was first introduced to describe random dynamics such as that of pollen fluctuating on a water surface, known as Brownian motion [21]. Without loss of generality, the argument of the Gaussian process can be extended from one-dimensional time to a multi-dimensional feature space. In this chapter, we provide a short review of the Gaussian process for multi-dimensional features as preparation for the proposed model.

For a sequence of input features $\{x_1, x_2, \cdots, x_n\}$, a stochastic process $f(\cdot)$ is a Gaussian process when the sequence of random variables $\{f(x_1), f(x_2), \cdots, f(x_n)\}$ is sampled from a multivariate Gaussian distribution. In general, the form of the multivariate Gaussian distribution is determined by its mean vector and covariance matrix. Likewise, the Gaussian process is specified by the mean and covariance functions of the input features. Thus, the Gaussian process is regarded as a representation of the infinite-dimensional Gaussian distribution.

The mean and covariance functions are defined as follows:

$$ m(x) = \mathbb{E}[f(x)], \tag{1} $$

$$ k(x, x') = \mathbb{E}[(f(x) - m(x))(f(x') - m(x'))], \tag{2} $$

where the operator $\mathbb{E}[\cdot]$ denotes expectation, and $m(\cdot)$ and $k(\cdot, \cdot)$ are the mean and covariance functions, respectively. The mean vector and covariance matrix of the Gaussian process for a given dataset are represented by

$$ m_i = m(x_i), \tag{3} $$

$$ K_{i,j} = k(x_i, x_j). \tag{4} $$

In this setting, the stochastic process $f(\cdot)$ is sampled from $\mathcal{N}(m(\cdot), K(\cdot, \cdot))$; that is, $f(\cdot)$ is a Gaussian process, expressed as $f \sim \mathcal{GP}(m, K)$. The covariance function is required to be symmetric and positive definite, and is thus also called a kernel function. In the Gaussian process literature, the covariance matrix is often called the kernel matrix. The mathematical characteristics of kernel functions are explained in [22].

Given an additional dataset $D_* = \{x'_1, x'_2, \cdots, x'_{n'}\}$, the corresponding outputs $\{y'_1, y'_2, \cdots, y'_{n'}\}$ can be predicted by the conditional Gaussian process with the prior dataset $D = \{(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)\}$. With the notations $X = [x_1, x_2, \cdots, x_n]^T$, $X_* = [x'_1, x'_2, \cdots, x'_{n'}]^T$ and $Y = [y_1, y_2, \cdots, y_n]^T$, the predictive distribution of the conditional Gaussian process is again a Gaussian process $\mathcal{GP}(f_*, K_*)$, where

$$ f_* = m_X + K_{X_*,X} K_{X,X}^{-1} Y, \tag{5} $$

$$ K_* = K_{X_*,X_*} - K_{X_*,X} K_{X,X}^{-1} K_{X,X_*}. \tag{6} $$

In Eqs. (5) and (6), it is seen that the covariance functions propagate the information about $D$ to $D_*$. Hence, the covariance functions play the dominant role in the use of the Gaussian process.

In the literature on big data analysis, it is often expected that observed variables can be explained by lower-dimensional latent variables. For this purpose, various methods of dimension reduction have been developed. One of the most popular methods is principal component analysis (PCA), which extracts latent variables by singular value decomposition. To extend PCA to nonlinear and random data, the Gaussian process latent variable model (GPLVM) has been proposed [23]. The GPLVM expresses the nonlinear relation between observed and latent variables through the covariance function. The randomness is assumed to originate from the Gaussian distribution.

To describe an observed variable $y \in \mathbb{R}^D$, we introduce a latent variable $x \in \mathbb{R}^Q$ with $Q < D$, and a nonlinear map $f : \mathbb{R}^Q \to \mathbb{R}^D$ with a $D$-dimensional noise $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ as

$$ y = f(x) + \epsilon. \tag{7} $$

For this latent variable model, we assume that the nonlinear map $f(\cdot)$ is sampled from the Gaussian process as $f \sim \mathcal{GP}(0, K)$. This model is known as the GPLVM. For the sake of brevity, we introduce notations for the sets of latent and observed variables as $X = [x_1, x_2, \cdots, x_N]^T$ and $Y = [y_1, y_2, \cdots, y_N]^T$.
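As a numerical illustration of the conditional Gaussian process in Eqs. (5) and (6), the following minimal sketch conditions a zero-mean process on three toy observations. The exponential kernel, the small diagonal `jitter`, the function names and the toy data are assumptions made for this example only and are not taken from the paper.

```python
import numpy as np


def exp_kernel(A, B, theta1=1.0, theta2=1.0):
    """Exponential kernel theta1 * exp(-||a - b|| / theta2) between rows of A and B."""
    dist = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return theta1 * np.exp(-dist / theta2)


def gp_conditional(X, Y, X_star, jitter=1e-10):
    """Zero-mean GP conditioned on (X, Y); returns mean and covariance at X_star."""
    K_xx = exp_kernel(X, X) + jitter * np.eye(len(X))  # jitter: numerical safeguard (assumption)
    K_sx = exp_kernel(X_star, X)
    K_ss = exp_kernel(X_star, X_star)
    f_star = K_sx @ np.linalg.solve(K_xx, Y)              # Eq. (5) with m = 0
    K_star = K_ss - K_sx @ np.linalg.solve(K_xx, K_sx.T)  # Eq. (6)
    return f_star, K_star


X = np.array([[0.0], [1.0], [2.5]])        # toy training inputs
Y = np.sin(X[:, 0])                        # toy training outputs
X_star = np.array([[0.5], [1.0], [4.0]])   # test inputs; 1.0 repeats a training input

f_star, K_star = gp_conditional(X, Y, X_star)
print("predictive mean:    ", f_star)
print("predictive variance:", np.diag(K_star))
```

At the repeated input the predictive mean reproduces the observed output and the predictive variance collapses to nearly zero, while the variance grows for the test input far from the data. This is the sense in which the covariance function propagates information from the observed dataset to new inputs.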
Assume that the columns of the observed matrix $Y \in \mathbb{R}^{N \times D}$ are independent and identically distributed samples from Gaussian distributions whose covariance is given by the covariance function of the latent variable matrix $X \in \mathbb{R}^{N \times Q}$. The probability density function of the GPLVM is then

$$ p(Y \mid X) = \frac{1}{(2\pi)^{ND/2} |K_{X,X}|^{D/2}} \exp\left( -\frac{1}{2} Y^T K_{X,X}^{-1} Y \right), \tag{8} $$

where, for the matrix-valued $Y$, the quadratic form is summed over the $D$ columns, i.e. it is understood as $\mathrm{tr}(Y^T K_{X,X}^{-1} Y)$. In the GPLVM, the hyperparameters of the covariance functions and the latent variables are inferred by existing methods such as gradient methods, variational inference and Markov chain Monte Carlo methods.

Student's t-process latent variable model

Student's t-process

The Gaussian process has diverse applications in computer science, robotics and other fields. However, it seems not to be directly applicable to problems in finance, because fluctuations of financial data follow non-Gaussian distributions with fat tails. It is thus necessary to extend the methods of the Gaussian process to non-Gaussian stochastic processes with fat tails.

For this purpose, the Student's t-process was proposed as a generalization of the Gaussian process [24]. This stochastic process follows the Student's t-distribution, whose tails show power-law behaviour. As with the Gaussian process, the Student's t-process is specified by its mean and covariance functions. Given the mean and covariance functions, the probability density function of the Student's t-process is defined as

$$ \mathcal{T}(m, K, \nu) = \frac{\Gamma\left(\frac{\nu + N}{2}\right)}{[(\nu - 2)\pi]^{N/2}\, \Gamma\left(\frac{\nu}{2}\right) |K|^{1/2}} \left[ 1 + \frac{1}{\nu - 2} (y - m)^T K^{-1} (y - m) \right]^{-\frac{\nu + N}{2}}, \tag{9} $$

where $\Gamma(\cdot)$ is the gamma function and the real parameter $\nu > 2$ is the degrees of freedom. In this setting, the stochastic process $f(\cdot)$ is the Student's t-process, expressed as $f \sim \mathcal{TP}(m, K; \nu)$.
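To make the fat-tail property of Eq. (9) concrete, the following sketch evaluates the log-density of Eq. (9) and of a Gaussian with the same mean and covariance at a typical point and at an extreme point. The function names, the 2×2 covariance matrix and the evaluation points are illustrative assumptions.

```python
import math

import numpy as np


def mvt_logpdf(y, m, K, nu):
    """Log of Eq. (9): multivariate Student's t with covariance K and nu > 2."""
    n = len(y)
    r = y - m
    beta = r @ np.linalg.solve(K, r)            # (y - m)^T K^{-1} (y - m)
    _, logdet = np.linalg.slogdet(K)
    return (math.lgamma((nu + n) / 2) - math.lgamma(nu / 2)
            - 0.5 * n * math.log((nu - 2) * math.pi)
            - 0.5 * logdet
            - 0.5 * (nu + n) * math.log1p(beta / (nu - 2)))


def mvn_logpdf(y, m, K):
    """Log-density of the multivariate Gaussian, for comparison."""
    n = len(y)
    r = y - m
    beta = r @ np.linalg.solve(K, r)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (n * math.log(2 * math.pi) + logdet + beta)


K = np.array([[1.0, 0.5], [0.5, 1.0]])   # toy covariance matrix
m = np.zeros(2)
typical = np.array([0.5, -0.5])          # close to the mean
extreme = np.array([6.0, -6.0])          # far in the tail

for nu in (3.0, 10.0, 1e6):
    print(f"nu={nu:g}: typical {mvt_logpdf(typical, m, K, nu):.4f}, "
          f"extreme {mvt_logpdf(extreme, m, K, nu):.4f}")
print(f"Gaussian: typical {mvn_logpdf(typical, m, K):.4f}, "
      f"extreme {mvn_logpdf(extreme, m, K):.4f}")
```

For small degrees of freedom the extreme point receives a far higher log-density under Eq. (9) than under the Gaussian, and for very large degrees of freedom the two log-densities nearly coincide.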
Note that the Student's t-process converges to the Gaussian process in the limit $\nu \to \infty$.

The conditional distribution of the Student's t-process can also be derived analytically and is given by a conditional Student's t-distribution. Namely, we can update the mean and covariance functions and the degrees of freedom from the conditional distribution. After cumbersome calculations, the update formulas for the mean and covariance functions and the degrees of freedom are derived as follows:

$$ m_* = m + K_{X_*,X} K_{X,X}^{-1} Y, \tag{10} $$

$$ K_* = \frac{\nu + \beta - 2}{\nu + N - 2} \left[ K_{X_*,X_*} - K_{X_*,X} K_{X,X}^{-1} K_{X,X_*} \right], \tag{11} $$

$$ \beta = (Y - m_X)^T K_{X,X}^{-1} (Y - m_X), \tag{12} $$

$$ \nu_* = \nu + N. \tag{13} $$

It is seen that the update formula for the covariance function explicitly depends on the number of observed variables $N$ and, through $\beta$, on the observed values, a property that does not appear in the case of the Gaussian process. In this sense, the Student's t-process is regarded as utilizing prior information more effectively than the Gaussian process.

Student's t-process latent variable model

To extend the GPLVM to stochastic processes following non-Gaussian distributions, we propose the Student's t-process latent variable model (TPLVM). Suppose an observed variable $y \in \mathbb{R}^D$ is explained by a low-dimensional latent variable $x \in \mathbb{R}^Q$ ($Q < D$) through a nonlinear map $f : \mathbb{R}^Q \to \mathbb{R}^D$ with $f \sim \mathcal{TP}(m, K; \nu)$. The TPLVM is then introduced as follows:

$$ p(Y \mid X) = \frac{\Gamma\left(\frac{\nu + D}{2}\right)}{[(\nu - 2)\pi]^{D/2}\, \Gamma\left(\frac{\nu}{2}\right) |K_{X,X}|^{1/2}} \left[ 1 + \frac{1}{\nu - 2} (Y - m_X)^T K_{X,X}^{-1} (Y - m_X) \right]^{-\frac{\nu + D}{2}}. \tag{14} $$

The nonlinear dependency on the latent matrix $X \in \mathbb{R}^{N \times Q}$ enters through the covariance matrix. It is expected that the TPLVM provides a robust estimation especially for observed data with large fluctuations, because the Student's t-distribution can accommodate samples that deviate far more from the mean than the Gaussian distribution allows.

As with the GPLVM, the latent variables and hyperparameters of the TPLVM can be estimated from its likelihood. The log-likelihood of the TPLVM is given as

$$ \log p(Y \mid X) = \log \Gamma\left(\frac{\nu + D}{2}\right) - \frac{D}{2} \log[(\nu - 2)\pi] - \log \Gamma\left(\frac{\nu}{2}\right) - \frac{1}{2} \log |K_{X,X}| - \frac{\nu + D}{2} \log\left[ 1 + \frac{1}{\nu - 2} (Y - m_X)^T K_{X,X}^{-1} (Y - m_X) \right]. \tag{15} $$

By means of existing optimization methods, we can estimate the latent variables, the hyperparameters of the covariance function and the degrees of freedom. However, it is known that the optimization of the covariance function with respect to the latent variables often induces numerical instability because of its complexity. Hence, we should carefully select the initial values of the optimization procedure and repeat it with diverse random seeds to avoid being trapped in local optima.

To overcome the shortcomings of the maximum-likelihood method, we utilize variational inference [25]. Instead of optimizing the log-likelihood in Eq. (15), we consider the posterior $p(X \mid Y) = p(Y \mid X) p(X) / p(Y)$ in the Bayesian sense. In solving the optimization problem with respect to the posterior, we approximate $p(X \mid Y)$ by $q(X)$. As a measure of the difference between two probability density functions, we introduce the Kullback-Leibler (KL) divergence as follows:

$$ \mathrm{KL}[q(X) \,\|\, p(X \mid Y)] = \int \log \frac{q(X)}{p(X \mid Y)}\, q(X)\, \mathrm{d}X. \tag{16} $$

With the use of Bayes' theorem, the KL divergence is alternatively represented as

$$ \mathrm{KL}[q(X) \,\|\, p(X \mid Y)] = -\int \log \frac{p(Y \mid X)\, p(X)}{q(X)}\, q(X)\, \mathrm{d}X + \log p(Y). \tag{17} $$

Since the second term on the right-hand side of Eq. (17) does not depend on $q(\cdot)$, minimizing the KL divergence amounts to maximizing the integral in the first term, which is known as the evidence lower bound (ELBO). The ELBO provides a lower bound of the evidence $\log p(Y)$ because the KL divergence is non-negative. Therefore, this procedure also raises a lower bound on the evidence, so that the observed data are fitted at the same time. Indeed, maximizing the ELBO gives the best explanation of the data attainable with the reduced dimension $Q$ of the latent variable.
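The log-likelihood of Eq. (15), which is the objective of the maximum-likelihood approach and also enters the ELBO through $p(Y \mid X)$, can be evaluated as in the following sketch. Several choices here are assumptions and not statements of the paper: the mean is set to zero, the quadratic form for matrix-valued $Y$ is read as a trace, the kernel is the exponential kernel, a small `jitter` is added for numerical stability, and the data are random toy values.

```python
import math

import numpy as np


def exp_kernel(X, theta1=1.0, theta2=1.0):
    """Exponential kernel matrix theta1 * exp(-||x_i - x_j|| / theta2)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return theta1 * np.exp(-dist / theta2)


def tplvm_log_likelihood(Y, X, nu, theta1=1.0, theta2=1.0, jitter=1e-8):
    """Eq. (15) with zero mean; the quadratic form is read as a trace (assumption)."""
    N, D = Y.shape
    K = exp_kernel(X, theta1, theta2) + jitter * np.eye(N)
    _, logdet = np.linalg.slogdet(K)
    beta = np.trace(Y.T @ np.linalg.solve(K, Y))
    return (math.lgamma((nu + D) / 2) - math.lgamma(nu / 2)
            - 0.5 * D * math.log((nu - 2) * math.pi)
            - 0.5 * logdet
            - 0.5 * (nu + D) * math.log1p(beta / (nu - 2)))


rng = np.random.default_rng(0)
N, D, Q = 20, 4, 1
X = rng.normal(size=(N, Q))   # toy latent variables
Y = rng.normal(size=(N, D))   # toy observations

ll = tplvm_log_likelihood(Y, X, nu=5.0)
print("log-likelihood:", ll)
print("with doubled observations:", tplvm_log_likelihood(2.0 * Y, X, nu=5.0))
```

In an actual fit, a gradient-based optimizer would maximize this quantity (or the ELBO built on it) over the latent matrix, the kernel hyperparameters and the degrees of freedom, restarting from several initial values as discussed above.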

Arbitrage pricing theory [26] assumes that the returns of an asset over $D$ days, $r_n \in \mathbb{R}^D$, are explained by the factor model

$$ r_n = \alpha_n + F \beta_n + \epsilon, \tag{18} $$

where $\alpha_n \in \mathbb{R}^D$ is an excess return, $\beta_n \in \mathbb{R}^Q$ is a vector of weight coefficients, $F \in \mathbb{R}^{D \times Q}$ is a factor matrix, and $\epsilon \in \mathbb{R}^D$ is an error term with zero mean and finite covariance. The factor model states that the return of the asset originates from the returns of $Q$ factors. In fact, without the excess return $\alpha_n$, the expected return of the factor model is derived as follows:

$$ \mathbb{E}[r_n] = \mathbb{E}[F] \beta_n. \tag{19} $$

The special case of this formula with only one factor is known as the capital asset pricing model, which is a cornerstone of modern finance theory [27].

The weight coefficients $\beta_n$ in the factor model in Eq. (18) can be interpreted as latent variables that explain the return of the asset. Based on this idea, we introduce a nonlinear factor model as

$$ r_n = f(\beta_n). \tag{20} $$

This model is regarded as a latent variable counterpart of the nonlinear factor model [10]. Here, we employ the Student's t-process as the model of the nonlinear mapping $f : \mathbb{R}^Q \to \mathbb{R}^D$. In other words, the nonlinear factor model in Eq. (20) is given by the TPLVM. The nonlinear correlation of the latent variable factors depends on the specific form of the covariance function of the TPLVM, and the predicted return of the asset can be inferred from the predictive distribution. Furthermore, the nonlinear factor model can be interpreted as a dimension reduction model when $Q < D$. Hence we can expect to obtain the essential lower-dimensional variable that explains the dynamics of the return of the asset.
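The linear factor model of Eqs. (18) and (19) can be illustrated by a small simulation. The sketch below sets the excess return to zero and uses hypothetical factor means, weight coefficients and noise levels chosen only for this example; the least-squares step is a standard way to recover the weights when the factors are observed and is not the estimation method of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D, Q = 250, 3                                    # days and factors (toy sizes)
factor_means = np.array([0.02, -0.01, 0.005])    # hypothetical E[F] for each factor
F = factor_means + 0.1 * rng.normal(size=(D, Q))
beta_n = np.array([1.2, 0.5, -0.3])              # hypothetical weight coefficients
eps = 0.05 * rng.normal(size=D)                  # zero-mean error term
r_n = F @ beta_n + eps                           # Eq. (18) with alpha_n = 0

expected_return = factor_means @ beta_n          # Eq. (19)
beta_hat, *_ = np.linalg.lstsq(F, r_n, rcond=None)

print("expected return from Eq. (19):", expected_return)
print("sample mean of simulated r_n: ", r_n.mean())
print("least-squares estimate of beta_n:", beta_hat)
```

The sample mean of the simulated returns differs from the time-averaged factor contribution only by the sample mean of the error term, which is the finite-sample counterpart of Eq. (19).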

Markowitz established modern portfolio theory on the mean-variance portfolio. In this theory, a portfolio consists of multiple asset classes such as stocks, bonds, currencies and commodities, with optimal allocations based on both the individual and the joint risk of the assets.

The mean-variance portfolio is designed as a constrained quadratic programming problem with the objective function

$$ w^T K w - \lambda (\mathbb{E}[r] - \mu), \tag{21} $$

where $w \in \mathbb{R}^D$ is the vector of portfolio weights, $K \in \mathbb{R}^{D \times D}$ is the covariance matrix of the returns, $\lambda$ is a Lagrange multiplier, $r$ is the return of the portfolio and $\mu$ is the target expected return of the portfolio. In practice, the expected return of the portfolio is quite hard to estimate. Therefore, the constraint on the expected return is often dropped, and the mean-variance portfolio is replaced by the minimum-variance portfolio with an empirically estimated covariance matrix.

In this section, we test the performance of the minimum-variance portfolio with the TPLVM by comparing it with that with the GPLVM. Before proceeding, we explain the experimental dataset of our performance test.

As the experimental data, we use the following global stock market indices: S&P 500 (US), S&P/TSX 60 (Canada), FTSE 100 (UK), CAC 40 (France), DAX (Germany), IBEX 35 (Spain), FTSE MIB (Italy), AEX (the Netherlands), OMX 30 (Sweden), SMI (Switzerland), Nikkei 225 (Japan), HKHSI (Hong Kong), ASX 200 (Australia), KOSPI (Korea), OBX (Norway), MSCI (Singapore). These stock indices are sampled monthly from June 1998 to June 2019 from the Bloomberg data platform. The statistics of the returns of the stock indices are shown in Table 1, which presents the mean (Mean), standard deviation (Std.), ratio of mean to standard deviation (R/R), skewness (Skew) and kurtosis (Kurtosis) of the returns.

With the use of the historical returns of the stock indices, we construct minimum-variance portfolios based on the GPLVM ($\mathrm{Port}_G$) and the TPLVM ($\mathrm{Port}_t$). The covariance matrix of each portfolio is estimated by the covariance function with 120 past samples. As the kernel function, we utilize the exponential kernel defined as

$$ k_{\mathrm{Exp}}(x, x') = \theta_1 \exp\left( -\theta_2^{-1} \| x - x' \| \right). \tag{22} $$
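The step from a kernel matrix to portfolio weights can be sketched as follows. The example builds a covariance matrix from the exponential kernel of Eq. (22) evaluated on one-dimensional latent variables and then computes minimum-variance weights. The latent values and hyperparameters are hypothetical, and the closed-form weights assume a fully invested portfolio with short positions allowed, a constraint set that the text does not specify.

```python
import numpy as np


def exp_kernel(B, theta1, theta2):
    """Eq. (22): theta1 * exp(-||b_i - b_j|| / theta2) for the rows of B."""
    dist = np.linalg.norm(B[:, None, :] - B[None, :, :], axis=-1)
    return theta1 * np.exp(-dist / theta2)


def min_variance_weights(K):
    """Minimize w^T K w subject to sum(w) = 1 (no other constraints assumed)."""
    ones = np.ones(K.shape[0])
    z = np.linalg.solve(K, ones)
    return z / z.sum()


latent = np.array([[-1.0], [-0.2], [0.3], [1.5]])   # hypothetical latent variables, Q = 1
K = exp_kernel(latent, theta1=0.04, theta2=1.0)     # hypothetical hyperparameters
w = min_variance_weights(K)

equal = np.full(len(w), 1.0 / len(w))
print("weights:", w)
print("variance (min-variance):", w @ K @ w)
print("variance (equal weight):", equal @ K @ equal)
```

Assets whose latent variables lie far from the others are less correlated with the rest under this kernel, so they tend to receive larger weights. With long-only or other constraints, a quadratic programming solver would replace the closed form.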

In Eq. (22), $\theta_l$ ($l = 1, 2$) are hyperparameters. For the sake of brevity, the dimension of the latent variables is fixed at $Q = 1$. Under these conditions, we compare the performance of $\mathrm{Port}_G$ and $\mathrm{Port}_t$ by the annualized return (Return), the annualized risk defined as the standard deviation of returns (Risk), and the return/risk ratio (R/R), defined as return divided by risk:

$$ \mathrm{Return} = \frac{12}{T} \sum_{t=1}^{T} R_t^P, \tag{23} $$

$$ \mathrm{Risk} = \sqrt{\frac{12}{T - 1} \sum_{t=1}^{T} (R_t^P - \mu^P)^2}, \tag{24} $$

$$ \mathrm{R/R} = \mathrm{Return} / \mathrm{Risk}. \tag{25} $$

Here, $R_t^P$ indicates the return of the GPLVM or TPLVM portfolio at time $t$, and $\mu^P = (1/T) \sum_{t=1}^{T} R_t^P$ denotes the average return of the GPLVM or TPLVM portfolio.

Table 1: Statistics of global market indices

            US           Canada       UK           France       Germany      Spain        Italy        Netherlands
Mean [%]    6.00         5.41         2.39         4.08         6.87         3.20         1.35         2.96
Std. [%]    14.93        14.92        13.62        18.12        21.13        20.66        21.71        19.13
R/R         0.40         0.36         0.18         0.23         0.33         0.15         0.06         0.15
Skew        -0.66        -0.92        -0.55        -0.38        -0.50        -0.17        0.03         -0.74
Kurtosis    5.23         7.36         4.53         4.52         6.12         4.96         4.80         5.88

            Sweden       Switzerland  Japan        Hong Kong    Australia    Korea        Norway       Singapore
Mean [%]    6.32         2.80         3.35         7.27         4.70         12.98        10.72        5.05
Std. [%]    19.51        14.68        19.24        23.46        12.40        28.80        21.49        21.71
R/R         0.32         0.19         0.17         0.31         0.38         0.45         0.50         0.23
Skew        -0.19        -0.73        -0.54        0.28         -0.69        1.39         -0.93        -0.26
Kurtosis    5.29         6.11         4.75         5.78         4.54         11.63        6.84         6.81

Table 2: Performance of $\mathrm{Port}_G$ and $\mathrm{Port}_t$

                                        Port G       Port t       Difference
Anterior half (Jun 2008 - Jun 2013)
  Return                                -4.89%       -2.63%       2.25%
  Risk                                  19.57%       18.33%       -1.24%
  R/R                                   -0.25        -0.14        0.11
Posterior half (Jul 2013 - Jun 2019)
  Return                                6.08%        6.30%        0.22%
  Risk                                  11.16%       10.56%       -0.60%
  R/R                                   0.54         0.60         0.05
Whole period (Jun 2008 - Jun 2019)
  Return                                0.64%        1.87%        1.23%
  Risk                                  15.92%       14.93%       -0.99%
  R/R                                   0.04         0.12         0.09

Table 2 shows the performance of the portfolios by comparing annualized return, risk and return/risk ratio. The sample period is separated into the anterior half period (Jun 2008 - Jun 2013) and the posterior half period (Jul 2013 - Jun 2019). Note that the anterior half period contains the global financial crisis of 2007-2008. As seen in this table, $\mathrm{Port}_t$ outperforms $\mathrm{Port}_G$ in both half periods. In particular, the difference in annualized return in the anterior half period is larger than that in the posterior half period. Market volatility fluctuated intensively during the global financial crisis, whereby the non-Gaussian nature of the global stock market clearly emerged. In such a situation, the TPLVM is a consistent model for describing the intermittent volatility fluctuations. Thus, we can construct a robust portfolio by the TPLVM-based minimum-variance portfolio.

In the literature on Bayesian machine learning, the Gaussian process has been developed and applied to diverse areas including finance. It is, however, well known that historical financial data follow non-Gaussian distributions. The Student's t-process has been proposed, as a generalization of the Gaussian process, to model observed data following non-Gaussian distributions with fat tails.

In this paper, we proposed the TPLVM by incorporating latent variables into the Student's t-process. The TPLVM can be used to reduce the number of explanatory variables following non-Gaussian distributions with fat tails. The nonlinear correlation of the TPLVM is modelled by prescribed kernel functions. The hyperparameters of the TPLVM can be determined by the method of maximum likelihood. As a more robust parameter optimization, we presented variational inference for the TPLVM, which utilizes the information in the prior distribution of the latent variables.

The problem of portfolio optimization has been studied in both academia and industry. We applied the TPLVM to portfolio optimization through the minimum-variance portfolio. To test the performance of the proposed portfolio, we carried out an empirical analysis on global stock market data and compared $\mathrm{Port}_G$ with $\mathrm{Port}_t$. It was shown that $\mathrm{Port}_t$ outperforms $\mathrm{Port}_G$ over the whole test period, because $\mathrm{Port}_t$ can capture the non-Gaussian nature of the global stock market, especially in the period of the global financial crisis.

The TPLVM can be applied to other risk-based portfolios such as risk parity [28], maximum risk diversification [29], and complex valued risk diversification [30]. These applications are expected to show high performance compared with conventional ones. In addition, the TPLVM can be modified into a latent variable dynamical model to capture the nature of historical volatility fluctuations. These directions of research are left for future work.

References

[1] Harry Markowitz. Portfolio selection.
The Journal of Finance, 7(1):77–91, 1952.
[2] Kei Nakagawa, Mitsuyoshi Imamura, and Kenichi Yoshida. Risk-based portfolios with large dynamic covariance matrices. International Journal of Financial Studies, 6(2):52, 2018.
[3] Robert F Engle, Olivier Ledoit, and Michael Wolf. Large dynamic covariance matrices. Journal of Business & Economic Statistics, 37(2):363–375, 2019.
[4] Olivier Ledoit, Michael Wolf, et al. Nonlinear shrinkage estimation of large-dimensional covariance matrices. The Annals of Statistics, 40(2):1024–1060, 2012.
[5] Olivier Ledoit and Michael Wolf. Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets Goldilocks. The Review of Financial Studies, 30(12):4349–4388, 2017.
[6] Xixian Chen, Michael R. Lyu, and Irwin King. Toward efficient and accurate covariance matrix estimation on compressed data. In Proceedings of the 34th International Conference on Machine Learning, pages 767–776. PMLR, 2017.
[7] Yue Wu, José Miguel Hernández Lobato, and Zoubin Ghahramani. Dynamic covariance models for multivariate financial time series. In Proceedings of the 30th International Conference on Machine Learning - Volume 28, pages 558–566. JMLR.org, 2013.
[8] George S Atsalakis and Kimon P Valavanis. Surveying stock market forecasting techniques–part II: Soft computing methods. Expert Systems with Applications, 36(3):5932–5941, 2009.
[9] Rodolfo C Cavalcante, Rodrigo C Brasileiro, Victor LF Souza, Jarley P Nobrega, and Adriano LI Oliveira. Computational intelligence and financial markets: A survey and future directions. Expert Systems with Applications, 55:194–211, 2016.
[10] Kei Nakagawa, Takumi Uchida, and Tomohisa Aoshima. Deep factor model. In ECML PKDD 2018 Workshops, pages 37–50. Springer, 2018.
[11] Kei Nakagawa, Tomoki Ito, Masaya Abe, and Kiyoshi Izumi. Deep recurrent factor model: interpretable non-linear and time-varying multi-factor model. arXiv preprint arXiv:1901.11493, 2019.
[12] Yue Wu, José Miguel Hernández-Lobato, and Zoubin Ghahramani. Gaussian process volatility model. In Advances in Neural Information Processing Systems 27, pages 1044–1052. Curran Associates, Inc., 2014.
[13] Weiwei Shen, Jun Wang, Yu-Gang Jiang, and Hongyuan Zha. Portfolio choices with orthogonal bandit learning. In Proceedings of the 24th International Conference on Artificial Intelligence, pages 974–980. AAAI Press, 2015.
[14] Qiang Song, Anqi Liu, and Steve Y. Yang. Stock portfolio selection using learning-to-rank algorithms with news sentiment. Neurocomputing, 264(15):20–28, 2017.
[15] Tim Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327, 1986.
[16] Robert Engle. Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business & Economic Statistics, 20(3):339–350, 2002.
[17] Rajbir S Nirwan and Nils Bertschinger. Applications of Gaussian process latent variable models in finance. In Proceedings of SAI Intelligent Systems Conference, pages 1209–1221. Springer, 2019.
[18] Benoit B Mandelbrot. The variation of certain speculative prices. In Fractals and Scaling in Finance, pages 371–418. Springer, 1997.
[19] Carl Edward Rasmussen. Gaussian processes in machine learning. In Summer School on Machine Learning, pages 63–71. Springer, 2003.
[20] Christopher KI Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning, volume 2. MIT Press, Cambridge, MA, 2006.
[21] A. Einstein. Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Annalen der Physik, 322(8):549–560, 1905.
[22] Thomas Hofmann, Bernhard Schölkopf, and Alexander J Smola. Kernel methods in machine learning. The Annals of Statistics, pages 1171–1220, 2008.
[23] Neil D Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. In Advances in Neural Information Processing Systems, pages 329–336, 2004.
[24] Amar Shah, Andrew Wilson, and Zoubin Ghahramani. Student-t processes as alternatives to Gaussian processes. In Artificial Intelligence and Statistics, pages 877–885, 2014.
[25] Andreas C Damianou, Michalis K Titsias, and Neil D Lawrence. Variational inference for latent variables and uncertain inputs in Gaussian processes. The Journal of Machine Learning Research, 17(1):1425–1486, 2016.
[26] Stephen A Ross. The arbitrage theory of capital asset pricing. In Handbook of the Fundamentals of Financial Decision Making: Part I, pages 11–30. World Scientific, 2013.
[27] Campbell R Harvey, Yan Liu, and Heqing Zhu. ... and the cross-section of expected returns. The Review of Financial Studies, 29(1):5–68, 2016.
[28] Edward Qian. Risk parity and diversification. The Journal of Investing, 20(1):119–127, 2011.
[29] Yves Choueifaty and Yves Coignard. Toward maximum diversification. The Journal of Portfolio Management, 35(1):40–51, 2008.
[30] Yusuke Uchiyama, Takanori Kadoya, and Kei Nakagawa. Complex valued risk diversification.