A New Parametrization of Correlation Matrices
Ilya Archakov (University of Vienna) and Peter Reinhard Hansen (University of North Carolina & Copenhagen Business School)

December 7, 2020
Abstract
We introduce a novel parametrization of the correlation matrix. The reparametrization facilitates modeling of correlation and covariance matrices by an unrestricted vector, where positive definiteness is an innate property. This parametrization can be viewed as a generalization of Fisher's Z-transformation to higher dimensions and has a wide range of potential applications. An algorithm for reconstructing the unique n × n correlation matrix from any vector in R^{n(n−1)/2} is provided, and we derive its numerical complexity.

Keywords:
Correlation Matrix, Covariance Modeling, Fisher Transformation.
JEL Classification:
C10; C22; C58

∗ We are grateful for many valuable comments made by Immanuel Bomze, Bo Honoré, Ulrich Müller, Georg Pflug, Werner Ploberger, Rogier Quaedvlieg, and Christopher Sims, as well as many conference and seminar participants.
† Address: University of North Carolina, Department of Economics, 107 Gardner Hall, Chapel Hill, NC 27599-3305.
arXiv preprint [econ.EM].

1 Introduction
We propose a new way to parametrize a covariance matrix that ensures positive definiteness without imposing additional restrictions. The central element of the parametrization is the matrix logarithmic transformation of the correlation matrix, log C, whose lower off-diagonal elements are stacked into the vector γ = γ(C). We show that this transformation defines a one-to-one correspondence between the set of n × n non-singular correlation matrices and R^{n(n−1)/2}, and we propose a fast algorithm for the computation of the inverse mapping. In the bivariate case, n = 2, γ(C) is identical to the Fisher transformation, and simulation results suggest that γ(C) inherits some of the attractive properties of the Fisher transformation when n > 2. An n × n covariance matrix can be expressed as a unique vector in R^{n(n+1)/2} that consists of the n log-variances and γ. This facilitates the modeling of covariance matrices in terms of an unrestricted vector in R^{n(n+1)/2}. In models with dynamic covariance matrices, such as multivariate GARCH models and stochastic volatility models, the parametrization offers a new way to structure multivariate volatility models. The vector representation also offers new ways to regularize large covariance matrices by imposing structure on γ. The new parametrization can further be used to specify distributions on the space of non-singular correlation matrices and covariance matrices, which could be useful in multivariate stochastic volatility models and Bayesian analysis.

It is convenient to reparametrize a covariance matrix as a vector that is unrestricted in R^d, and the literature has proposed several methods to this end, see Pinheiro and Bates (1996). These methods include the Cholesky decomposition, the spherical trigonometric transformation, transformations based on partial correlation vines, and methods based on the spectral representation, such as the matrix logarithm, see e.g. Kurowicka and Cooke (2003).
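Concretely, the transformation γ(C) = vecl(log C) introduced above takes only a few lines of code. The sketch below is a minimal Python illustration (the function name `gamma_of_C` is ours; the Web Appendix provides reference code in Julia, Matlab, Ox, Python, and R). For n = 2 it reproduces the Fisher transformation:

```python
import numpy as np
from scipy.linalg import logm

def gamma_of_C(C):
    """Stack the lower off-diagonal elements of log C into a vector."""
    G = logm(C).real                        # matrix logarithm of C
    return G[np.tril_indices_from(G, k=-1)] # vecl: lower off-diagonals

# For n = 2, gamma(C) coincides with the Fisher transformation of rho:
rho = 0.5
C2 = np.array([[1.0, rho], [rho, 1.0]])
fisher = 0.5 * np.log((1 + rho) / (1 - rho))
print(gamma_of_C(C2)[0], fisher)            # the two values coincide
```

For n = 3 the same function returns a vector in R^3, one element per correlation pair.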
The matrix logarithm has been used in the modeling of covariance matrices in Leonard and Hsu (1992) and Chiu et al. (1996). In GARCH and stochastic volatility models it was used in Kawakatsu (2006), Ishihara et al. (2016), and Asai and So (2015), and Bauer and Vorkink (2011) used the matrix logarithm for modeling and forecasting of realized covariance matrices. The transformation also emerges as a special case of the Box-Cox transformation, see Weigand (2014) for an application to realized covariance matrices.

We do not apply the matrix logarithm to covariance matrices, but to correlation matrices. Modeling the correlation matrix separately from the individual variances is commonly done in multivariate GARCH models, see e.g. Bollerslev (1990), Engle (2002), Tse and Tsui (2002), and Engle and Kelly (2012). The new parametrization can be used to define a new family of multivariate GARCH models. (Code for this algorithm, in Julia, Matlab, Ox, Python, and R, is provided in the Web Appendix.) We study properties of the algorithm for computing the inverse mapping, C(γ), in Section 5. We conclude and summarize in Section 6. All proofs are given in the Appendix, and additional results and computer code are collected in the Web Appendix, see Archakov and Hansen (2020).

We motivate the proposed method by considering a non-singular 2 × 2 covariance matrix, with variances σ_1^2 and σ_2^2 and the correlation ρ = σ_{12}/(σ_1 σ_2) ∈ (−1, 1). Consider the vector

v = (log σ_1^2, log σ_2^2, z(ρ))′, where z(ρ) = (1/2) log((1+ρ)/(1−ρ))

is the Fisher transformation. Because any v ∈ R^3 maps to a unique non-singular covariance matrix, this defines a one-to-one mapping between the non-singular 2 × 2 covariance matrices and R^3. The vector parametrization is convenient because a positive definite covariance matrix is guaranteed without imposing additional restrictions.

We seek a similar parametrization of covariance matrices when n >
2. Specifically, we seek a mapping such that: 1) any non-singular covariance matrix, Σ, maps to a unique vector v = ν(Σ) ∈ R^d; 2) any vector v ∈ R^d maps to a unique covariance matrix Σ = ν^{−1}(v); 3) the parametrization, v = ν(Σ), is "invariant" to the ordering of the variables that define Σ; and 4) the elements of v are easily interpretable.

The parametrization, v = (log σ_1^2, log σ_2^2, z(ρ))′, has all the above properties. The Cholesky representation is not invariant to the ordering of variables. The matrix logarithm transformation of the covariance matrix, log Σ, satisfies the first three properties, but the resulting elements are difficult to interpret, because they depend non-linearly on all elements of Σ. For n > 2, one might instead apply the Fisher transformation to each correlation individually, but the resulting vector is not variation free, because not every such vector maps back to a positive semidefinite matrix. For instance, the inverse Fisher transformation of −2, 0, and 2 will result in three correlations that, combined, will produce a "correlation matrix" with a negative eigenvalue.

Returning to the case with a 2 × 2 correlation matrix, C, the matrix logarithm is

log C = (1/2) [[ log(1−ρ^2), log((1+ρ)/(1−ρ)) ], [ log((1+ρ)/(1−ρ)), log(1−ρ^2) ]],

so that the off-diagonal element of log C is exactly the Fisher transformation, z(ρ).

In this paper, we propose to parametrize correlation matrices using the off-diagonal elements of log C, so that an n × n covariance matrix, Σ, is parametrized by the n log-variances and the n(n−1)/2 lower off-diagonal elements of log C, denoted by γ. We will show that this parametrization satisfies the first three objectives stated above. The fourth objective is partly satisfied, because n elements of v will correspond to the n individual variances, whereas the remaining elements parametrize the underlying correlation matrix. The Fisher transformation has attractive finite sample properties (it is variance stabilizing and skewness reducing), and γ is identical to the Fisher transformation when n = 2. Simulation results in the Web Appendix suggest that the off-diagonal elements of log C inherit some of these properties when n > 2.

We need to introduce some useful notation and terminology. The operator, diag(·), is used in two ways. When the argument is a vector, v = (v_1, . . . , v_n)′, then diag(v) denotes the n × n diagonal matrix with v_1, . . . , v_n along the diagonal, and when the argument is a square matrix, A ∈ R^{n×n}, then diag(A) extracts the diagonal of A and returns it as a column vector, i.e. diag(A) = (a_{11}, . . . , a_{nn})′ ∈ R^n. The matrix exponential is defined by e^A = Σ_{k=0}^∞ A^k/k! for any square matrix A. For any symmetric matrix, A, we have e^A = Q diag(e^{λ_1}, . . . , e^{λ_n}) Q′, where A = QΛQ′, with Q an orthonormal matrix, i.e. Q′Q = I, and Λ = diag(λ_1, . . . , λ_n), where λ_1, . . . , λ_n are the eigenvalues of A. The general definition of the matrix logarithm is more involved, see Higham (2008), but for a symmetric positive definite matrix we have log A = Q (log Λ) Q′, where log Λ = diag(log λ_1, . . .
, log λ_n).

We use vecl(A) to denote the vectorization of the lower off-diagonal elements of A. For a non-singular correlation matrix, C, we let G = log C denote the logarithmically transformed correlation matrix, and let F be the matrix of element-wise Fisher transformed correlations (whose diagonal is unspecified). The vector of correlation coefficients is denoted by ϱ = vecl C, and the corresponding elements of G and F are denoted by γ = vecl G and φ = vecl F, respectively.

Definition 1 (New Parametrization of Correlation Matrices). For a non-singular correlation matrix, C, we introduce the following parametrization: γ(C) := vecl(log C).

Because γ(C) discards the diagonal elements of log C, it is relevant to ask: Can C be reconstructed from γ alone? If so, is the reconstructed correlation matrix unique for all γ? To formalize this inversion problem, we introduce the following operator. For an n × n matrix, A, and any vector x ∈ R^n, we let A[x] denote the matrix A with x in place of its diagonal. So it follows that vecl(A) = vecl(A[x]) and that x = diag(A[x]).

Theorem 1.
For any real symmetric matrix, A ∈ R^{n×n}, there exists a unique vector, x* ∈ R^n, such that e^{A[x*]} is a correlation matrix.

This shows that any vector in R^{n(n−1)/2} maps to a unique correlation matrix, so that γ(C) is a one-to-one correspondence between C_n and R^{n(n−1)/2}, where C_n denotes the set of non-singular n × n correlation matrices. The inverse mapping, denoted C(γ), is therefore well defined. Next, we outline the structure of the proof of Theorem 1, because it provides intuition for the algorithm that is used to reconstruct C from γ.

Consider the mapping g: R^n → R^n, g(x) = x − log diag(e^{A[x]}), where the logarithm is applied element-wise to the vector of diagonal elements. Because e^{A[x]} is a correlation matrix if and only if all its diagonal elements are equal to one, the requirement is simply g(x*) = x*. So Theorem 1 is equivalent to the statement that g has a unique fixed point for any symmetric matrix A. This follows by showing the following result and applying the Banach fixed-point theorem.

Lemma 1.
The mapping g is a contraction for any symmetric matrix A.

The proof of Lemma 1 entails deriving the Jacobian for g, denoted ∇g, and showing that all its eigenvalues are less than one in absolute value. The largest eigenvalue of ∇g is, not surprisingly, key for the algorithm that reconstructs C from γ.

The mapping, γ(C), is invariant to a reordering of the variables that define C, in the sense that a permutation of the variables that define C will merely result in a permutation of the elements of γ. The formal statement is as follows.

Proposition 1.
Suppose that C_x = corr(X) and C_y = corr(Y), where the elements of X are a permutation of the elements of Y. Then the elements of γ_x = γ(C_x) are a permutation of the elements of γ_y = γ(C_y).

Singular correlation matrices with a known null space can be parametrized by applying the transformation to a full-rank principal sub-matrix. We do not explore this topic in this paper.

2.3 An Algorithm for Computing C(γ)

Evidently, the solution, x*, must be such that the diagonal elements of the matrix, e^{A[x*]}, are all equal to one. Equivalently, log diag(e^{A[x*]}) = 0 ∈ R^n, where the logarithm is applied element-wise to the vector of diagonal elements. This observation motivates the following iterative procedure for determining x*:

Corollary 1.
Consider the sequence,

x^{(k+1)} = x^{(k)} − log diag(e^{A[x^{(k)}]}), k = 0, 1, 2, . . . ,

with an arbitrary initial vector x^{(0)} ∈ R^n. Then x^{(k)} → x*, where x* is the solution in Theorem 1.

In practice we find that the simple algorithm proposed in Corollary 1 converges very fast. This is demonstrated in Section 5 for matrices with dimension up to n = 100. The result in Theorem 1 and the algorithm in Corollary 1 are easily adapted to a covariance matrix with known diagonal elements, as we show in Section 4.4.

3 The Asymptotic Distribution of γ̂

Next, we derive the asymptotic distributions of γ̂ and the vector of Fisher transformed correlations, φ̂, by deducing them from that of the empirical correlation matrix. Suppose that √T (Ĉ − C) →_d N(0, Ω) as T → ∞. The asymptotic covariance matrix, Ω = avar(vec(Ĉ)), will be singular, because Ĉ is symmetric and has constant diagonal elements. Convenient closed-form expressions for Ω are available in special cases, see e.g. Neudecker and Wesselman (1990), Nel (1985), and Browne and Shapiro (1986).

For the vector of correlation coefficients, ϱ̂ = vecl(Ĉ), it follows that √T (ϱ̂ − ϱ) →_d N(0, Ω_ϱ) as T → ∞, where Ω_ϱ = E_l Ω E_l′ and E_l is an elimination matrix, characterized by vecl[M] = E_l vec[M] for any n × n matrix M. For the element-wise Fisher transform, the asymptotic distribution reads

√T (φ̂ − φ) →_d N(0, Ω_φ), Ω_φ = D_c E_l Ω E_l′ D_c, (1)

where D_c = diag(1/(1−c_1^2), 1/(1−c_2^2), . . . , 1/(1−c_d^2)) and c_i is the i-th element of c = vecl(C) ∈ R^d, with d = n(n−1)/2. For the new parametrization, the asymptotic distribution is

√T (γ̂ − γ) →_d N(0, Ω_γ), Ω_γ = E_l A^{−1} Ω A^{−1}′ E_l′, (2)

where A is a Jacobian matrix, such that ∂vec(C) = A ∂vec(log C). The expression for A is given in the Appendix, see (A.1)-(A.2), and is taken from Linton and McCrorie (1995).

In a classical setting where Ĉ is computed from i.i.d. random vectors, the diagonal elements of Ω_φ are all equal to one.
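As an aside, the fixed-point iteration of Corollary 1 above takes only a few lines of code. The following is a minimal sketch of ours (not the reference implementation from the Web Appendix), using SciPy's matrix exponential and logarithm:

```python
import numpy as np
from scipy.linalg import expm, logm

def C_of_gamma(gamma, n, tol=1e-12, max_iter=10000):
    """Reconstruct the unique correlation matrix C with vecl(log C) = gamma,
    via the iteration x_{k+1} = x_k - log diag(exp(A[x_k])) of Corollary 1."""
    A = np.zeros((n, n))
    il = np.tril_indices(n, k=-1)
    A[il] = gamma
    A = A + A.T                       # symmetric; off-diagonals fixed by gamma
    x = np.zeros(n)                   # arbitrary starting value
    for _ in range(max_iter):
        np.fill_diagonal(A, x)        # A[x]: replace the diagonal by x
        step = np.log(np.diag(expm(A)))
        x = x - step                  # Corollary 1 update
        if np.max(np.abs(step)) < tol:
            break
    np.fill_diagonal(A, x)
    return expm(A)                    # diagonal is (numerically) all ones

# Round trip: C -> gamma -> C
C = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
gamma = logm(C).real[np.tril_indices(3, k=-1)]
C_rec = C_of_gamma(gamma, n=3)
print(np.max(np.abs(C_rec - C)))      # ~0: the original C is recovered
```

The round trip illustrates Theorem 1: the off-diagonal elements γ alone pin down the diagonal x*, and hence the full correlation matrix.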
That the diagonal elements of Ω_φ are all equal to one demonstrates the variance stabilizing property of the Fisher transformation. The transformation γ(C) is, evidently, not variance stabilizing when n >
2, except in special cases. However, it does appear to reduce skewness, which is another attribute of the Fisher transformation. The two expressions for the asymptotic variances, Ω_φ and Ω_γ, are not easily compared unless Ω is known. Here we will compare them in the situation where Ĉ is computed from X_i ∼ iid N(0, Σ), for four different choices of Σ. Scaling the elements of X_i does not affect the limit distributions for ϱ̂, φ̂, and γ̂, so we can, without loss of generality, focus on the case where Σ = C.

[Table 1 belongs here; its numerical entries are not recoverable from this extraction. The columns are: the correlation matrix Σ = C, avar(ϱ̂), avar(φ̂), avar(γ̂), acorr(ϱ̂) = acorr(φ̂), and acorr(γ̂).]

Table 1: Asymptotic covariance and correlation matrices for ϱ̂, φ̂ and γ̂, for four different correlation matrices. The diagonal elements of the asymptotic variance matrix for φ̂ are all one, so it is also the asymptotic correlation matrix for φ̂. Because φ̂ is based on an element-by-element transformation of the corresponding elements of ϱ̂, it is also the asymptotic correlation matrix for ϱ̂.

The asymptotic variance and correlation matrices for the three vectors, ϱ̂, φ̂ and γ̂, are reported in Table 1. The true correlation matrix is given in the first column of Table 1. The asymptotic variance of the correlation coefficient, ϱ̂_j, is (1 − ϱ_j^2)^2, which defines the diagonal elements of Ω_ϱ, and the element-wise Fisher transformation ensures that avar(φ̂_j) = 1 for all j. However, we observe a high degree of correlation across the elements of φ̂. The asymptotic correlation matrix for φ̂ is, in fact, identical to that of the empirical correlations, ϱ̂, because the Fisher transformation is an element-by-element transformation. Its Jacobian, D_c = ∂φ/∂ϱ, is therefore a diagonal matrix. Consequently, the asymptotic correlations are unaffected by the element-wise Fisher transformation, and acorr(ϱ̂) = acorr(φ̂). While the diagonal elements of Ω_φ are invariant to C, this is not the case for the diagonal elements of Ω_γ, but it is interesting to note that the asymptotic correlations between elements of γ̂ tend to be relatively small, and close to zero when the correlations in C are small. Simulation results in the Web Appendix suggest that the elements of γ̂ tend to be weakly correlated, and that γ(C) reduces skewness, as is the case for the Fisher transformation. Empirical results in Archakov et al. (2020) show that the empirical distribution of transformed realized correlation matrices is well approximated by a Gaussian distribution.
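The variance stabilization discussed above can be illustrated with a quick Monte Carlo check in the bivariate case, where γ(C) coincides with the Fisher transformation. The design below (ρ = 0.6, T = 500, 2,000 replications) is our own arbitrary choice, not one of the four designs in Table 1:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
rho, T, reps = 0.6, 500, 2000
cov = np.array([[1.0, rho], [rho, 1.0]])
z = lambda r: 0.5 * np.log((1 + r) / (1 - r))   # Fisher transformation

draws = np.empty(reps)
for s in range(reps):
    X = rng.multivariate_normal([0.0, 0.0], cov, size=T)
    rho_hat = np.corrcoef(X, rowvar=False)[0, 1]
    draws[s] = np.sqrt(T) * (z(rho_hat) - z(rho))

print(np.var(draws))   # close to 1, the asymptotic variance of the Fisher transform
```

Repeating this with other values of ρ leaves the variance of the scaled draws near one, which is the variance stabilizing property.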
While the elements of γ depend on the correlation matrix in a nonlinear way, some interesting correlation structures do carry over to the matrix G = log C, and hence to γ. First, we consider the case of an equicorrelation matrix and a block-equicorrelation matrix.

Proposition 2.
Suppose C is an equicorrelation matrix with correlation parameter ρ. Then all the off-diagonal elements of the matrix G = log C are identical and equal to

γ_c = −(1/n) log( (1−ρ) / (1+(n−1)ρ) ) = (1/n) log( 1 + nρ/(1−ρ) ) ∈ R, (3)

so that γ = γ_c ι, where ι ∈ R^{n(n−1)/2} is the vector of ones, ι = (1, . . . , 1)′.

This result, in conjunction with Theorem 1, establishes that γ_c is a one-to-one correspondence from the set of non-singular equicorrelation matrices to the real line, R, and the inverse mapping is given in closed form by

ρ(γ_c, n) = (1 − e^{−nγ_c}) / (1 + (n−1) e^{−nγ_c}).

It follows that ρ(γ_c, n) is confined to the interval (−1/(n−1), 1).

It is easy to verify that if C is a block diagonal matrix, with equicorrelation diagonal blocks and zero correlation across blocks, then log C will have the same block structure, and (3) can be used to compute the elements of γ. In the more general case where C is a block correlation matrix, it can be shown that the logarithmic transformation preserves the block structure, so that log C has the same block structure as C; this is used in Archakov et al. (2020) in a multivariate GARCH model. The transformation thus provides a simple way to model block correlation matrices. We illustrate this with an example of a 6 × 6 block correlation matrix, C, and its logarithm, log C. [The numerical entries of the example matrices are not recoverable from this extraction.]

Another interesting class of correlation matrices are the Toeplitz correlation matrices, which arise in some models, such as stationary time series models. For this case, log C is a bisymmetric matrix. Since C^α = e^{αG}, it is possible to obtain powers of C from γ. For instance, the inverse covariance matrix is given by Σ^{−1} = Λ^{−1} e^{−G} Λ^{−1}, where Λ = diag(σ_1, . . . , σ_n). The inverse is, for instance, of interest for computing the partial correlation coefficients and in portfolio choice problems.
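The closed-form results for the equicorrelation case, equation (3) and its inverse, are easy to verify numerically. The following sketch (our own, with n = 5 and ρ = 0.4 as arbitrary choices) does so:

```python
import numpy as np
from scipy.linalg import logm

n, rho = 5, 0.4
C = (1 - rho) * np.eye(n) + rho * np.ones((n, n))   # equicorrelation matrix
G = logm(C).real

# All off-diagonal elements of log C equal (1/n) * log(1 + n*rho/(1-rho)):
gamma_c = np.log(1 + n * rho / (1 - rho)) / n
off = G[np.tril_indices(n, k=-1)]
print(np.max(np.abs(off - gamma_c)))                # ~0

# The closed-form inverse mapping recovers rho from gamma_c:
rho_back = (1 - np.exp(-n * gamma_c)) / (1 + (n - 1) * np.exp(-n * gamma_c))
print(abs(rho_back - rho))                          # ~0
```

The same check applies block by block when C is block diagonal with equicorrelation blocks.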
Some estimation methods impose sparsity on Σ^{−1}. While it is not simple to impose sparsity on Σ^{−1} through γ, the new parametrization facilitates new ways to impose a parsimonious structure on Σ or Σ^{−1}, by imposing sparsity (or some other structure) on γ directly.

The Jacobian ∂ϱ/∂γ

Next we establish a result showing that ∂ϱ/∂γ = ∂vecl[C]/∂vecl[G] has a relatively simple expression. This is convenient for inference, such as the computation of standard errors, and for the construction of dynamic GARCH-type models, such as a score-driven model for γ = vecl G, see Creal et al. (2013), and for the construction of parameter stability tests, such as that of Nyblom (1989).

Proposition 3.
We have

∂ϱ/∂γ = E_l ( I − A E_d′ (E_d A E_d′)^{−1} E_d ) A (E_l + E_u)′,

where A = ∂vec C/∂vec G, and the matrices E_l, E_u and E_d are elimination matrices, such that vecl M = E_l vec M, vecl M′ = E_u vec M, and diag M = E_d vec M for any square matrix M of the same size as C.

The matrix, A, is the same matrix that appeared in the asymptotic distribution of γ̂, see (2). In the Web Appendix we compute ∂ϱ/∂γ for two correlation matrices: a 10 × 10 Toeplitz correlation matrix and one based on the empirical correlation matrix for the 10 industry portfolios in the Kenneth R. French data library. The two have a very similar structure.
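The Jacobian ∂ϱ/∂γ can also be obtained numerically, which is useful as a cross-check. The sketch below (our own construction, not the closed-form expression of Proposition 3 and not the Web Appendix code) builds ∂γ/∂ϱ by central differences and inverts it, using the fact that the mapping between ϱ and γ is one-to-one:

```python
import numpy as np
from scipy.linalg import logm

def gamma_of_rho(rho_vec, n):
    """gamma = vecl(log C) for the correlation matrix with vecl(C) = rho_vec."""
    C = np.eye(n)
    il = np.tril_indices(n, k=-1)
    C[il] = rho_vec
    C[(il[1], il[0])] = rho_vec           # mirror to the upper triangle
    return logm(C).real[il]

n, eps = 3, 1e-6
rho = np.array([0.5, 0.2, 0.3])           # vecl(C) of an example 3x3 matrix
d = rho.size                              # d = n(n-1)/2 = 3
J = np.zeros((d, d))                      # J = d(gamma)/d(rho)
for k in range(d):
    rp, rm = rho.copy(), rho.copy()
    rp[k] += eps
    rm[k] -= eps
    J[:, k] = (gamma_of_rho(rp, n) - gamma_of_rho(rm, n)) / (2 * eps)

drho_dgamma = np.linalg.inv(J)            # = d(rho)/d(gamma), a d x d matrix
print(drho_dgamma.shape)
```

For n = 2 this numerical Jacobian collapses to the scalar derivative of the inverse Fisher transformation, 1 − ρ^2, which provides a simple sanity check.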
4.4 Covariance Matrices with a Known Diagonal

Some of our results for correlation matrices apply equally to covariance matrices with known diagonal elements, and these could be useful in applications that involve the matrix logarithm of covariance matrices. In Corollary 2 we state the extensions to this situation.
Corollary 2.
For any real symmetric matrix, A ∈ R^{n×n}, and any vector, v ∈ R^n with strictly positive elements, there exists a unique vector, x* ∈ R^n, such that Σ = e^{A[x*]} is a covariance matrix with diagonal diag(Σ) = v. Moreover, x* = lim_{k→∞} x^{(k)}, where x^{(k+1)} = x^{(k)} + [log v − log diag(e^{A[x^{(k)}]})], for k = 0, 1, 2, . . . , with an arbitrary initial vector x^{(0)} ∈ R^n.

5 Properties of the Algorithm for the Inverse Mapping, C(γ)

The algorithm that reconstructs the correlation matrix, C, from γ converges exponentially fast, and its complexity is of order O(n^3 log n). This follows, as we show below, from the fact that the number of required iterations is of order log n, and because each iteration entails a matrix exponential evaluation, which is of order O(n^3), see e.g. Lu (1998).

Let K_δ = inf{k : ||x^{(k+1)} − x^{(k)}||_p ≤ δ} be the number of iterations required for convergence for some p-norm and some threshold δ > 0. From the contraction property it follows that

||x^{(k+1)} − x^{(k)}||_p ≤ L ||x^{(k)} − x^{(k−1)}||_p ≤ L^k ||x^{(1)} − x^{(0)}||_p, for k = 1, 2, . . . ,

where L ∈ [0, 1) is the Lipschitz constant given from the contraction. So the number of iterations, k, can be bounded from above by k ≤ c_L (log ||x^{(1)} − x^{(0)}||_p − log ||x^{(k)} − x^{(k−1)}||_p), where c_L = −1/log L > 0. Using ||x||_p ≤ (n · max_{1≤i≤n} |x_(i)|^p)^{1/p} = n^{1/p} ||x||_∞, we have

K_δ ≤ c_L ( (log n)/p + log ||x^{(1)} − x^{(0)}||_∞ − log δ ) = O(log n). (4)

Note that the number of required iterations may be more sensitive to the structure of C (through the Lipschitz constant) than to the dimension of C. The Lipschitz constant approaches one as C approaches singularity. The number of iterations is less sensitive to the choice of initial vector, x^{(0)}, but it is useful to know that the elements of x* are non-positive.

Lemma 2.
The diagonal elements of log C are non-positive for any C ∈ C_n.

The result in (4) is illustrated in Figure 1, where we recover the correlation matrix from γ using the algorithm in Corollary 1. The true C has a Toeplitz structure, C_ij = ρ^{|i−j|}, i, j = 1, . . . , n, for n = 3, . . . , 100 and three values of ρ. The number of iterations needed for ||x^{(k)} − x^{(k−1)}|| < δ, with δ proportional to √n, increases with the dimension at a rate that is consistent with log n. The number of iterations is also sensitive to the correlation structure: when C is almost singular (ρ close to one), convergence requires far more iterations, because near-singularity of C translates into a Lipschitz constant close to one. To illustrate the sensitivity to the starting value, we use 1,000 different starting values, x^{(0)}, where the elements of x^{(0)} are drawn independently from the negative half-normal distribution with scale σ = 10 (i.e. −|Z| with Z ∼ N(0, σ^2)).

Figure 1: Number of iterations needed for convergence at threshold δ, when C_ij = ρ^{|i−j|}, i, j = 1, . . . , n, for n = 3, . . . , 100, using random initial values, x^{(0)}. Black lines correspond to the average number of iterations required for convergence, and the shaded bands reflect the variation across the 1,000 random starting values.
Similarly, λ max and γ max , which are easier to compute, are also related to thenumber of iterations, albeit not as tightly as ν max .11igure 2: The number of iterations needed for convergence plotted against three characteristics of C .The left panels plots the number of iterations against − / log ν max ’ c L . The smallest eigenvalue of C (middle panels) and the largest | γ i | (right panels) are also useful indicators. In this paper, we have shown that the space of non-singular n × n correlation matrices is one-to-one with R n ( n − / . A non-singular covariance matrix can therefore be parametrized by the n (log-)variancesand the vector, γ ( C ), which has unrestricted domain in R n ( n − / . This opens new ways to modelcorrelation and covariance matrices where positive definiteness is an intrinsic property. For instance, inmultivariate GARCH models, as explored in Archakov et al. (2020). The transformation can be used tospecify probability distributions on correlation and covariance matrices. Any distribution on R n ( n − / induces a distribution on the space of positive definite correlation matrices, C . This could be used in12ultivariate stochastic volatility modeling, and defines a new approach to specifying Bayesian priorson C .We have derived results for the asymptotic distribution of γ ( ˆ C ). Much is known about the finitesample properties when n = 2, because γ ( C ) is identical to the Fisher transformation in this case.The Fisher transformation has variance stabilizing and skewness eliminating properties. The variancestabilizing property does not carry over to the case n >
2. However, simulation results suggest that it continues to have skewness reducing properties, and that the empirical distribution of γ(Ĉ) (in a classical setting) is well approximated by a Gaussian distribution even in small samples. Moreover, the elements of γ(Ĉ) tend to be weakly dependent, as suggested by the asymptotic results in Table 1. This makes the transformation potentially useful for regularization, see Pourahmadi (2011), and for inference. These attributes tend to deteriorate as C approaches singularity. This is not unexpected, because it is also true for the Fisher transformation when the correlation is close to ±1.

The inverse mapping, C(γ), is not given in closed form when n > 2, except in some special cases. Instead, we proposed a fast algorithm to evaluate C(γ), and showed that its numerical complexity is of order O(n^3 log n), where n × n is the dimension of C.

References
Archakov, I. and Hansen, P. R. (2020), 'Web-appendix for: A New Parametrization of Correlation Matrices', https://sites.google.com/site/peterreinhardhansen/.
Archakov, I., Hansen, P. R. and Lunde, A. (2020), 'A Multivariate Realized GARCH Model', Working Paper.
Asai, M. and So, M. (2015), 'Long memory and asymmetry for matrix-exponential dynamic correlation processes', Journal of Time Series Econometrics, 69–74.
Bauer, G. H. and Vorkink, K. (2011), 'Forecasting multivariate realized stock market volatility', Journal of Econometrics, 93–101.
Bauwens, L., Storti, G. and Violante, F. (2012), 'Dynamic conditional correlation models for realized covariance matrices', Working Paper (2012060).
Bollerslev, T. (1990), 'Modelling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH model', The Review of Economics and Statistics, 498–505.
Browne, M. and Shapiro, A. (1986), 'The asymptotic covariance matrix of sample correlation coefficients under general conditions', Linear Algebra and its Applications, 169–176.
Chiriac, R. and Voev, V. (2011), 'Modelling and forecasting multivariate realized volatility', Journal of Applied Econometrics, 922–947.
Chiu, T., Leonard, T. and Tsui, K.-W. (1996), 'The matrix-logarithmic covariance model', Journal of the American Statistical Association, 198–210.
Creal, D. D., Koopman, S. J. and Lucas, A. (2013), 'Generalized autoregressive score models with applications', Journal of Applied Econometrics, 777–795.
Engle, R. F. (2002), 'Dynamic Conditional Correlation: A Simple Class of Multivariate Generalized Autoregressive Conditional Heteroskedasticity Models', Journal of Business & Economic Statistics (3), 339–350.
Engle, R. and Kelly, B. (2012), 'Dynamic Equicorrelation', Journal of Business & Economic Statistics (2), 212–228.
Golosnoy, V., Gribisch, B. and Liesenfeld, R. (2012), 'The conditional autoregressive Wishart model for multivariate stock market volatility', Journal of Econometrics (1), 211–223.
Gorgi, P., Hansen, P. R., Janus, P. and Koopman, S. J. (2019), 'Realized Wishart-GARCH: A score-driven multi-asset volatility model', Journal of Financial Econometrics, 1–32.
Hansen, P. R., Lunde, A. and Voev, V. (2014), 'Realized beta GARCH: A multivariate GARCH model with realized measures of volatility', Journal of Applied Econometrics, 774–799.
Higham, N. J. (2008), Functions of Matrices: Theory and Computation, SIAM, Philadelphia.
Ishihara, T., Omori, Y. and Asai, M. (2016), 'Matrix exponential stochastic volatility with cross leverage', Computational Statistics & Data Analysis, 331–350.
Kawakatsu, H. (2006), 'Matrix exponential GARCH', Journal of Econometrics, 95–128.
Kurowicka, D. and Cooke, R. (2003), 'A parameterization of positive definite matrices in terms of partial correlation vines', Linear Algebra and its Applications, 225–251.
Leonard, T. and Hsu, J. S. J. (1992), 'Bayesian inference for a covariance matrix', Annals of Statistics, 1669–1696.
Linton, O. and McCrorie, J. R. (1995), 'Differentiation of an exponential matrix function: Solution', Econometric Theory, 1182–1185.
Liu, Q. (2009), 'On portfolio optimization: How and when do we benefit from high-frequency data?', Journal of Applied Econometrics (4), 560–582.
Lu, Y. Y. (1998), 'Exponential of symmetric matrices through tridiagonal reductions', Linear Algebra and its Applications, 317–324.
Nel, D. (1985), 'A matrix derivation of the asymptotic covariance matrix of sample correlation coefficients', Linear Algebra and its Applications, 137–145.
Neudecker, H. and Wesselman, A. (1990), 'The asymptotic variance matrix of the sample correlation matrix', Linear Algebra and its Applications, 589–599.
Noureldin, D., Shephard, N. and Sheppard, K. (2012), 'Multivariate high-frequency-based volatility (HEAVY) models', Journal of Applied Econometrics, 907–933.
Nyblom, J. (1989), 'Testing for the constancy of parameters over time', Journal of the American Statistical Association, 223–230.
Pinheiro, J. C. and Bates, D. M. (1996), 'Unconstrained parametrizations for variance-covariance matrices', Statistics and Computing, 289–296.
Pourahmadi, M. (2011), 'Covariance estimation: The GLM and regularization perspectives', Statistical Science, 369–387.
Tse, Y. K. and Tsui, A. K. C. (2002), 'A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations', Journal of Business and Economic Statistics, 351–362.
Weigand, R. (2014), 'Matrix Box-Cox models for multivariate realized volatility', Working Paper.

Appendix of Proofs

We prove g is a contraction by deriving its Jacobian, J(x), and showing that all its eigenvalues are less than one in absolute value. Since g(x) = x − log δ(x), where δ(x) = diag(e^{G[x]}), an intermediate step towards the Jacobian for g is to derive the Jacobian for δ(x). To simplify notation, we sometimes suppress the dependence on x for some terms; for instance, we sometimes write δ_i to denote the i-th element of the vector δ(x). It follows that

[J(x)]_{i,j} = ∂[g(x)]_i/∂x_j = 1{i=j} − (1/δ_i) ∂[δ(x)]_i/∂x_j,

so that J(x) = I − [D(x)]^{−1} H(x), where D(x) = diag(δ_1, . . . , δ_n) is a diagonal matrix and H(x) is the Jacobian matrix of δ(x), derived below.

Let G[x] = QΛQ′, where Λ is the diagonal matrix with the eigenvalues, λ_1, . . . , λ_n, of G[x], and Q is an orthonormal matrix (i.e. Q′ = Q^{−1}) with the corresponding eigenvectors. From Linton and McCrorie (1995), we have dvec e^{G[x]} = A(x) dvec G[x], where

A(x) = (Q ⊗ Q) Ξ (Q ⊗ Q)′, (A.1)

is an n^2 × n^2 matrix and Ξ is the n^2 × n^2 diagonal matrix with elements given by

ξ_ij = Ξ_{(i−1)n+j, (i−1)n+j} = e^{λ_i} if λ_i = λ_j, and ξ_ij = (e^{λ_i} − e^{λ_j})/(λ_i − λ_j) if λ_i ≠ λ_j, (A.2)

for i = 1, . . .
, n and j = 1 , . . . , n . Evidently, we have ξ ij = ξ ji , for all i and j . Moreover, A ( x ) is asymmetric positive definite matrix, because all the diagonal elements of Ξ are strictly positive.Our interest concerns δ ( x ) = diag[ e G [ x ] ] (a subset of the elements of vec[ e G [ x ] ]) so the Jacobianof δ ( x ), denoted H ( x ), is a principal sub-matrix of A ( x ), defined by the elements [ A ( x )] l,m , l, m =( i − n + i , for i = 1 , . . . , n . Thus[ H ( x )] i,j = ∂δ ( x ) i ∂x j = ( e i ⊗ e i ) (cid:0) Q ⊗ Q (cid:1) Ξ (cid:0) Q ⊗ Q ) ( e j ⊗ e j )= (cid:0) e i Q ⊗ e i Q (cid:1) Ξ (cid:0) Q e j ⊗ Q e j (cid:1) = (cid:0) Q i,. ⊗ Q i,. (cid:1) Ξ (cid:0) Q j,. ⊗ Q j,. (cid:1) (A.3)= n X k =1 n X l =1 q ik q jk q il q jl ξ kl , where e i is a n × i -th position and zeroes otherwise and Q i,. denotes the i -th row of Q .Interestingly, the Jacobian of g is such that J ( x ) ι = 0, so that the vector of ones, ι , is an eigenvectorof J ( x ) associated with the eigenvalue 0, i.e. J ( x ) has reduced rank. Because the i -th row of J ( x )15imes ι reads1 − n X j =1 δ i n X k =1 n X l =1 q ik q jk q il q jl ξ kl = 1 − δ i n X k =1 n X l =1 q ik q il ξ kl n X j =1 q jk q jl = 1 − δ i n X k =1 q ik ξ kk = 0 , due to P nk =1 q ik q jk = 1 { i = j } . Proof that g is a Contraction: Lemma 1 We now want to prove that the mapping g ( x ) is a contraction. In order to show this, it is sufficient todemonstrate that all eigenvalues of the corresponding Jacobian matrix J ( x ) are below one in absolutevalues for any real vector x . First we establish a number of intermediate results. Lemma A.1. ( i ) e y − y − > for all y = 0 , and ( ii ) 1 + e y − y ( e y − > for y = 0 .Proof. The first and second derivatives of f ( y ) = e y − y − f is strictly convex with uniqueminimum, f (0) = 0, which proves ( i ). Next we prove ( ii ). Now let f ( y ) = 1 + e y − y ( e y − f ( y ) = e y y − g ( y ), where g ( y ) = y − y + 2 − e − y , so that f ( y ) < y < f ( y ) > y >
0. Since lim y → f ( y ) = 0 (by l’Hospital’s rule) the result follows.From the definition, (A.2), it follows that ξ ij = ξ ii = ξ jj whenever λ i = λ j . When λ i = λ j we havethe following results for the elements of Ξ: Lemma A.2. If λ i < λ j , then ξ ii < ξ ij < ξ jj and ξ ij < ξ ii + ξ jj .Proof. From the definition, (A.2), ξ ij − ξ ii = e λj − e λi λ j − λ i − e λ i = e λ i ( e λj − λi − λ j − λ i −
1) = e λ i e λj − λi − − ( λ j − λ i ) λ j − λ i > i ). So are e λ i and λ j − λ i , which proves ξ ij > ξ ii .Analogously, ξ jj − ξ ij = e λ j − e λj − e λi λ j − λ i = e λ j (1 − − e λi − λj λ j − λ i ) = e λ j − ( λ i − λ j ) − e λi − λj λ j − λ i >
0, because allterms are positive, where we again used Lemma A.1 ( i ). Next, ξ ii + ξ jj − ξ ij = e λ i + e λ j − e λi − e λj λ i − λ j = e λ i (1 + e λ j − λ i − e λj − λi − λ j − λ i ) >
0, where the inequality follows by Lemma A.1. ii , because λ i = λ j . Lemma A.3. J ( x ) and ˜ J ( x ) = I − D − HD − have the same eigenvalues, where ˜ J ( x ) = n − X k =1 n X l = k ϕ kl (cid:16) D − u kl u kl D − (cid:17) , with u kl = Q · ,k (cid:12) Q · ,l ∈ R n and ϕ kl = ξ kk + ξ ll − ξ kl .Proof. For a vector y and a scalar ν , J y = νy ⇔ ˜ J w = νw , where y = D − w , because J = I − D − H = D − ( I − D − HD − ) D = D − ˜ J D . Next, we turn to the expression for ˜ J . First, note that P nk =1 q ik ξ kk = P nk =1 q ik e λ k = Q i, · e Λ Q i, · = [ e Q Λ Q ] ii = [ e G ] ii = δ i . So diagonal elements of ˜ J are given16y˜ J ii = 1 − H ii δ i = 1 δ i (cid:16) n X k =1 q ik ξ kk − n X k =1 n X l =1 q ik q il ξ kl (cid:17) = 1 δ i (cid:16) n X k =1 q ik ξ kk − n X k =1 q ik q ik ξ kk − n − X k =1 n X l = k q ik q il ξ kl (cid:17) = 1 δ i (cid:16) n X k =1 q ik ξ kk (1 − q ik ) − n − X k =1 n X l = k q ik q il ξ kl (cid:17) = 1 δ i (cid:16) n X k =1 q ik ξ kk n X l =1 l = k q il − n − X k =1 n X l = k q ik q il ξ kl (cid:17) = 1 δ i (cid:16) n − X k =1 n X l = k q ik q il ( ξ kk + ξ ll ) − n − X k =1 n X l = k q ik q il ξ kl (cid:17) = 1 δ i n − X k =1 n X l = k q ik q il ϕ kl , where we used (A.3). Similarly for the off-diagonal elements we have˜ J ij = − H ij p δ i δ j = − p δ i δ j n X k =1 n X l =1 q ik q jk q il q jl ξ kl = − p δ i δ j (cid:16) n X k =1 q ik q jk ξ kk + 2 n − X k =1 n X l = k q ik q jk q il q jl ξ kl (cid:17) = − p δ i δ j (cid:16) n X k =1 q ik q jk (cid:16) − n X l =1 l = k q il q jl (cid:17) ξ kk + 2 n − X k =1 n X l = k q ik q jk q il q jl ξ kl (cid:17) = − p δ i δ j (cid:16) − n − X k =1 n X l = k q ik q jk q il q jl ( ξ kk + ξ ll ) + 2 n − X k =1 n X l = k q ik q jk q il q jl ξ kl (cid:17) = 1 p δ i δ j n − X k =1 n X l = k q ik q jk q il q jl ϕ kl . In the derivations above we used that P nk =1 q ik q jk = 1 { i = j } , since Q Q = QQ = I . Proof of Lemma 1.
Because $A(x)$ is symmetric and positive definite, so is the principal sub-matrix, $H(x)$. Consequently, $M = D^{-1/2} H(x) D^{-1/2}$ is symmetric and positive definite. Thus, any eigenvalue, $\mu$, of $M$ is strictly positive. So if $\nu$ is an eigenvalue of $\tilde{J}(x) = I - D^{-1/2} H D^{-1/2}$, then $\nu = 1 - \mu$, where $\mu$ is an eigenvalue of $M$, from which it follows that all eigenvalues of $\tilde{J}$ are strictly less than 1.

Consider a quadratic form of $\tilde{J}$ with an arbitrary vector $z \in \mathbb{R}^n$. Using Lemma A.3, it follows that any such quadratic form is bounded from below by
\[
z' \tilde{J} z = \sum_{k=1}^{n-1} \sum_{l=k+1}^{n} \varphi_{kl}\, \big( z' D^{-1/2} u_{kl} u_{kl}' D^{-1/2} z \big)
= \sum_{k=1}^{n-1} \sum_{l=k+1}^{n} \varphi_{kl}\, \big( z' D^{-1/2} u_{kl} \big)^2 \geq 0,
\]
because $\varphi_{kl} > 0$ by Lemma A.2. Hence $\tilde{J}$ is positive semi-definite and $\nu_i \geq 0$, for all $i = 1, \ldots, n$. Finally, since $J(x)$ and $\tilde{J}(x)$ have the same eigenvalues, it follows that all eigenvalues of $J(x)$ lie within the interval $[0, 1)$, so that $g(x)$ is a contraction. $\square$

Proof of Theorem 1.
The Theorem is equivalent to the statement that, for any symmetric matrix $G$, there always exists a unique solution to $g(x) = x$. This follows from Lemma 1 and Banach's fixed-point theorem. $\square$

Proof of Proposition 1.
We have $Y = PX$, for some permutation matrix, $P$, so that $C_y = P C_x P'$. Let $C_x = Q \Lambda Q'$ be the spectral decomposition of $C_x$, such that $\log C_x = Q \log \Lambda\, Q'$, where $Q'Q = I$ and $\Lambda$ is a diagonal matrix. So $C_y = P C_x P' = P Q \Lambda Q' P'$, where $Q' P' P Q = Q' Q = I$. The first equality uses the fact that $P$ is a permutation matrix. Therefore, $C_y = (PQ) \Lambda (PQ)'$ is the spectral decomposition of $C_y$, and $\log C_y = (PQ) \log \Lambda\, (PQ)' = P [\log C_x] P'$. Next, let the $i$-th and $j$-th rows of $P$ be the $r$-th and $s$-th unit vectors, $e_r'$ and $e_s'$, respectively. Then we have $[\log C_y]_{ij} = [\log C_x]_{rs}$ and, by symmetry, $[\log C_y]_{ij} = [\log C_y]_{ji} = [\log C_x]_{rs} = [\log C_x]_{sr}$, which shows that $\gamma_y$ is simply a permutation of the elements in $\gamma_x$. $\square$

Proof of Proposition 2.
An equicorrelation matrix can be written as $C = (1 - \rho) I_n + \rho U_n$, where $I_n \in \mathbb{R}^{n \times n}$ is the identity matrix and $U_n \in \mathbb{R}^{n \times n}$ is a matrix of ones. Using the Sherman–Morrison formula, we can obtain the inverse, $C^{-1} = \tfrac{1}{1 - \rho}\big( I_n - \tfrac{\rho}{1 + (n-1)\rho} U_n \big)$, so that
\[
G = \log C = -\log\big(C^{-1}\big) = -\log\Big( \tfrac{1}{1 - \rho} I_n \Big) - \log\Big( I_n - \tfrac{\rho}{1 + (n-1)\rho} U_n \Big). \tag{A.4}
\]
Because the first term is a diagonal matrix, the off-diagonal elements of $G$ are determined only by the second term, which equals
\[
-\log\big(I_n - \varphi U_n\big) = -\sum_{k=1}^{\infty} (-1)^{k+1} \frac{(-\varphi U_n)^k}{k}
= -\frac{1}{n} \Big[ \sum_{k=1}^{\infty} (-1)^{k+1} \frac{(-n\varphi)^k}{k} \Big] U_n
= -\frac{1}{n} \log\big(1 - n\varphi\big)\, U_n, \tag{A.5}
\]
where $\varphi = \rho / (1 + (n-1)\rho)$ and we have used the fact that $U_n^k = n^{k-1} U_n$. It now follows that
\[
G_{ij} = \gamma_c = -\frac{1}{n} \log\Big( 1 - \frac{n\rho}{1 + (n-1)\rho} \Big) = -\frac{1}{n} \log \frac{1 - \rho}{1 + (n-1)\rho},
\]
for all $i$ and $j$, such that $i \neq j$. $\square$

Proof of Proposition 3.
From Theorem 1 it follows that the diagonal, $x = \operatorname{diag} G$, is fully characterized by the off-diagonal elements, $y = \operatorname{vecl} G$, and we may write $x = x(y)$. For the off-diagonal elements of the correlation matrix, $z = \operatorname{vecl} C$, we have $z = z(x, y) = z(x(y), y)$, since $C = e^G$, and it follows that
\[
\frac{dz(x,y)}{dy} = \frac{\partial z(x,y)}{\partial x} \frac{dx(y)}{dy} + \frac{\partial z(x,y)}{\partial y}. \tag{A.6}
\]
With $A(x,y) = d \operatorname{vec} C / d \operatorname{vec} G$ and the definitions of $E_l$ and $E_u$, the second term is given by:
\[
\frac{\partial z(x,y)}{\partial y} = E_l A(x,y) E_l' + E_l A(x,y) E_u'. \tag{A.7}
\]
The expression has two terms because a change in an element of $y$ affects two symmetric entries in the matrix $G$. Similarly, for the first part of the first term in (A.6) we have,
\[
\frac{\partial z(x,y)}{\partial x} = E_l A(x,y) E_d', \tag{A.8}
\]
and what remains is to determine $dx(y)/dy$. For this purpose we introduce $D(x,y) = \operatorname{diag}[e^{G(x,y)}] - \iota$, which implicitly defines the relation between $x$ and $y$. The requirement that $e^G$ is a correlation matrix is equivalent to $D(x,y) = 0$. Next, let $\frac{\partial D}{\partial x}$ and $\frac{\partial D}{\partial y}$ denote the Jacobian matrices of $D(x,y)$ with respect to $x$ and $y$, respectively. These Jacobian matrices have dimensions $n \times n$ and $n \times n(n-1)/2$, respectively, and can also be expressed in terms of the matrix $A(x,y)$, as follows:
\[
\frac{\partial D}{\partial x} = E_d A(x,y) E_d', \qquad \frac{\partial D}{\partial y} = E_d A(x,y) E_l' + E_d A(x,y) E_u'.
\]
Note that $\frac{\partial D}{\partial x}$ is a principal sub-matrix of the positive definite matrix $A$ and, hence, is an invertible matrix. Therefore, from the Implicit Function Theorem it follows that
\[
\frac{dx(y)}{dy} = -\Big( \frac{\partial D}{\partial x} \Big)^{-1} \frac{\partial D}{\partial y}
= -\big( E_d A(x,y) E_d' \big)^{-1} \big( E_d A(x,y) E_l' + E_d A(x,y) E_u' \big). \tag{A.9}
\]
The result now follows by inserting (A.7), (A.8) and (A.9) into (A.6). $\square$

Proof of Lemma 2.
We have $G = Q \log \Lambda\, Q'$, where $C = Q \Lambda Q'$ is the spectral decomposition of the correlation matrix. Thus a generic element of $G$ can be written as $G_{ij} = \sum_{k=1}^n q_{ik} q_{jk} \log \lambda_k$. By Jensen's inequality it follows that $G_{ii} = \sum_{k=1}^n q_{ik}^2 \log \lambda_k \leq \log\big( \sum_{k=1}^n q_{ik}^2 \lambda_k \big)$, where we used that $\sum_{k=1}^n q_{ik} q_{jk} = 1_{\{i=j\}}$, because $QQ' = I$. Finally, since $\sum_{k=1}^n q_{ik}^2 \lambda_k = C_{ii} = 1$, it follows that $G_{ii} \leq \log 1 = 0$. $\square$
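The contraction property of Lemma 1 and the unique fixed point of Theorem 1 are what make the reconstruction of $C$ from $\gamma$ operational: iterating $x \mapsto g(x) = x - \log \operatorname{diag}(e^{G[x]})$ converges to the unique diagonal that turns $e^G$ into a correlation matrix. The following is a minimal numerical sketch of this idea (our own illustration in Python with SciPy, not the authors' code; `corr_from_gamma` and all other names are ours), together with finite-difference checks of Lemma 1, of the $J(x)\iota = 0$ result, and of the equicorrelation formula in Proposition 2:

```python
import numpy as np
from scipy.linalg import expm, logm

def corr_from_gamma(gamma, n, tol=1e-12, max_iter=1000):
    """Recover the unique n x n correlation matrix C with vecl(log C) = gamma
    by iterating the contraction x -> g(x) = x - log diag(exp(G[x]))."""
    G = np.zeros((n, n))
    G[np.tril_indices(n, -1)] = gamma
    G = G + G.T                      # off-diagonals of log C, fixed by gamma
    x = np.zeros(n)                  # starting value for the diagonal of G
    for _ in range(max_iter):
        np.fill_diagonal(G, x)
        x_new = x - np.log(np.diag(expm(G)))
        if np.max(np.abs(x_new - x)) < tol:
            break
        x = x_new
    np.fill_diagonal(G, x_new)
    return expm(G)

# Round trip: C -> gamma = vecl(log C) -> C.
C = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
gamma = np.real(logm(C))[np.tril_indices(3, -1)]
C_rec = corr_from_gamma(gamma, 3)
print(np.max(np.abs(C_rec - C)))     # ~ 0: the mapping is one-to-one

# Lemma 1 and the J(x) iota = 0 result: a central finite-difference Jacobian
# of g at the fixed point x* = diag(log C) has eigenvalues in [0, 1) and
# annihilates the vector of ones.
G_off = np.zeros((3, 3))
G_off[np.tril_indices(3, -1)] = gamma
G_off = G_off + G_off.T

def g_map(x):
    Gm = G_off.copy()
    np.fill_diagonal(Gm, x)
    return x - np.log(np.diag(expm(Gm)))

x_star = np.diag(np.real(logm(C)))
h = 1e-6
J = np.column_stack([(g_map(x_star + h * e) - g_map(x_star - h * e)) / (2 * h)
                     for e in np.eye(3)])
ev = np.sort(np.linalg.eigvals(J).real)
print(ev)                            # all in [0, 1); smallest ~0 (J iota = 0)

# Proposition 2: every off-diagonal element of log C for an equicorrelation
# matrix equals -(1/n) log((1 - rho) / (1 + (n - 1) rho)).
n, rho = 5, 0.6
C_eq = (1 - rho) * np.eye(n) + rho * np.ones((n, n))
gamma_eq = np.real(logm(C_eq))[np.tril_indices(n, -1)]
gamma_c = -np.log((1 - rho) / (1 + (n - 1) * rho)) / n
print(np.max(np.abs(gamma_eq - gamma_c)))   # ~ 0
```

Convergence from the starting value $x = 0$ is guaranteed because the iteration is a contraction on all of $\mathbb{R}^n$; for moderate correlation levels a handful of iterations already reaches machine precision.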