A Canonical Representation of Block Matrices with Applications to Covariance and Correlation Matrices
Ilya Archakov^a and Peter Reinhard Hansen^b†
^a University of Vienna
^b University of North Carolina & Copenhagen Business School

December 7, 2020
Abstract
We obtain a canonical representation for block matrices. The representation facilitates simple computation of the determinant, the matrix inverse, and other powers of a block matrix, as well as the matrix logarithm and the matrix exponential. These results are particularly useful for block covariance and block correlation matrices, where evaluation of the Gaussian log-likelihood and estimation are greatly simplified. We illustrate this with an empirical application using a large panel of daily asset returns. Moreover, the representation paves new ways to regularize large covariance/correlation matrices and to test block structures in matrices.
Keywords:
Block Matrices, Block Covariance Matrix, Block Correlation Matrix, Equicorrelation, Covariance Regularization, Covariance Modeling, High Dimensional Covariance Matrices, Matrix Logarithm
JEL Classification:
C10; C22; C58

∗ The second author would like to thank the Department of Statistics and Operations Research at the University of Vienna for their hospitality during a visit in early 2020.
† Address: University of North Carolina, Department of Economics, 107 Gardner Hall, Chapel Hill, NC 27599-3305.

1 Introduction
We derive a canonical representation for a broad class of block matrices, which includes block covariance matrices. A special case of particular interest is that of block correlation matrices. The representation is a semi-spectral decomposition of block matrices, which are diagonalized with the exception of a single diagonal block, whose dimension is given by the number of blocks.

The canonical representation facilitates simple computation of several matrix functions, such as the matrix inverse, the matrix exponential, and the matrix logarithm. Consequently, the decomposition greatly simplifies the evaluation of Gaussian log-likelihood functions when the covariance matrix, or the correlation matrix, has a block structure.

We contribute to the literature on block correlation models by providing simple expressions for the inverse of any (invertible) block correlation matrix, as well as a simple expression for its determinant. This greatly eases the computational burden in the evaluation of a Gaussian (quasi-) log-likelihood function. The results apply to block correlation matrices with an arbitrary number of blocks. For block correlation matrices with two blocks, an expression for the inverse was obtained in Engle and Kelly (2012, lemma 2.3), and related results can be found in Viana and Olkin (1997).

As a preview of some of the results in this paper, consider the following $n \times n$ correlation matrix,
$$C = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \rho \\ \rho & \cdots & \rho & 1 \end{pmatrix}.$$
This correlation matrix is known as an equicorrelation matrix, and it is well known that its eigenvalues are $1 + \rho(n-1)$ and $1 - \rho$, where the latter has multiplicity $n-1$. This follows directly from the spectral decomposition,
$$Q'CQ = D = \begin{pmatrix} 1 + \rho(n-1) & 0 \\ 0 & (1-\rho) I_{n-1} \end{pmatrix}, \qquad (1)$$
where $Q$ is an orthonormal matrix, so that $Q'Q = I_n$. Here $I_n$ denotes the $n \times n$ identity matrix. The matrix $Q$ is given by $Q = (v_n, v_{n\perp})$, where $v_n$ is the $n$-dimensional vector $v_n = (1/\sqrt{n}, \ldots, 1/\sqrt{n})'$, and $v_{n\perp}$ is an $n \times (n-1)$ matrix that is orthogonal to $v_n$, i.e., $v_{n\perp}' v_n = 0$, and orthonormal, i.e., $v_{n\perp}' v_{n\perp} = I_{n-1}$. (When $n = 1$, $v_\perp$ is an empty $1 \times 0$ matrix, $v_\perp' v_\perp = \emptyset$ of dimension $0 \times 0$, and $v_\perp v_\perp' = 0$ of dimension $1 \times 1$.) It can now be verified that $QQ' = I$, so that $Q$ is orthonormal and $C = QQ'CQQ' = QDQ'$. The matrix $D$ is the canonical form of $C$, which is obtained via a rotation of $C$, where the rotation does not depend on $\rho$. In this example, where $K = 1$, $D$ coincides with the diagonal matrix of eigenvalues in the spectral decomposition of $C$.

In this paper, we derive a similar decomposition for a broad class of block matrices that includes block covariance matrices and block correlation matrices. In the general case with multiple blocks, $K \geq 2$, the canonical representation does not fully disentangle all eigenvalues, and some eigenvalues may be complex-valued. The canonical representation decomposes any block matrix into a $K \times K$ matrix and $n - K$ real-valued eigenvalues, where $K$ is the number of blocks. We can illustrate the general results with a $2 \times 2$ block correlation matrix,
$$C = \begin{pmatrix} C_{\rho_{11}} & \rho_{12} 1_{n_1 \times n_2} \\ \rho_{12} 1_{n_2 \times n_1} & C_{\rho_{22}} \end{pmatrix},$$
where $C_{\rho_{11}}$ and $C_{\rho_{22}}$ are equicorrelation matrices with correlations $\rho_{11}$ and $\rho_{22}$ and dimensions $n_1 \times n_1$ and $n_2 \times n_2$, respectively, and $1_{n_1 \times n_2}$ is the $n_1 \times n_2$ matrix whose elements are all equal to one. Now define
$$Q = \begin{pmatrix} v_{n_1} & 0 & v_{n_1\perp} & 0 \\ 0 & v_{n_2} & 0 & v_{n_2\perp} \end{pmatrix}.$$
For this block correlation matrix, we now have the following representation,
$$Q'CQ = \begin{pmatrix} 1+\rho_{11}(n_1-1) & \rho_{12}\sqrt{n_1 n_2} & 0 & 0 \\ \rho_{12}\sqrt{n_1 n_2} & 1+\rho_{22}(n_2-1) & 0 & 0 \\ 0 & 0 & (1-\rho_{11}) I_{n_1-1} & 0 \\ 0 & 0 & 0 & (1-\rho_{22}) I_{n_2-1} \end{pmatrix}. \qquad (2)$$
We denote the upper-left $2 \times 2$ matrix by $A$. In general, $A$ will be a $K \times K$ matrix whose eigenvalues are also eigenvalues of $C$. The general result for block matrices with $K$ blocks will be presented in Theorem 1, with a structure similar to that in (2). An important feature is that the matrix $Q$ does not depend on the elements of the block matrix, but is solely determined by the block partition, $(n_1, \ldots, n_K)$, where $n = n_1 + \cdots + n_K$.

The canonical representation is obtained for general block matrices that need not be symmetric, nor positive semidefinite. In fact, our results are applicable to non-square matrices. Block covariance matrices and block correlation matrices are interesting special cases. For block correlation matrices, the $A$-matrix, which emerges in (2), was previously established in Huang and Yang (2010) and Cadima et al. (2010), as we will discuss in Section 3. We derive additional results for block correlation matrices that simplify the evaluation of the log-likelihood function.

The rest of this paper is organized as follows. We present the main result in Section 2, where the canonical representation is established for a broad class of block matrices, along with related results for the matrix exponential, the matrix logarithm, and matrix powers, including the matrix inverse. In Section 3, we consider the special cases of block covariance matrices and block correlation matrices. Many of these results are useful for maximum likelihood estimation with a Gaussian log-likelihood function, as we show in Section 4. In Section 5, we apply the results to the estimation of block covariance matrices for a very large panel of daily stock returns. We conclude in Section 6, and all proofs are presented in the Appendix.

Let $B$ be a square $n \times n$ matrix. The extension to rectangular matrices, which is trivial, is addressed at the end of this section. The matrix $B$ is called a block matrix with block partition $n_1, \ldots, n_K$ if it can be expressed as
$$B = \begin{pmatrix} B_{[1,1]} & B_{[1,2]} & \cdots & B_{[1,K]} \\ B_{[2,1]} & B_{[2,2]} & & \vdots \\ \vdots & & \ddots & \\ B_{[K,1]} & \cdots & & B_{[K,K]} \end{pmatrix},$$
where $B_{[i,j]}$ is an $n_i \times n_j$ matrix with the following structure:
$$B_{[i,i]} = \begin{pmatrix} d_i & b_{ii} & \cdots & b_{ii} \\ b_{ii} & d_i & \ddots & \vdots \\ \vdots & \ddots & \ddots & b_{ii} \\ b_{ii} & \cdots & b_{ii} & d_i \end{pmatrix} \quad \text{and} \quad B_{[i,j]} = \begin{pmatrix} b_{ij} & \cdots & b_{ij} \\ \vdots & \ddots & \vdots \\ b_{ij} & \cdots & b_{ij} \end{pmatrix} \ \text{if } i \neq j, \qquad (3)$$
for some constants $d_i$ and $b_{ij}$, $i, j = 1, \ldots, K$. So the diagonal elements of the diagonal blocks, $B_{[i,i]}$, can take a different value than the off-diagonal elements, whereas all elements in an off-diagonal block, $B_{[i,j]}$, $i \neq j$, are identical.

We introduce the following notation that relates to orthogonal projections. Let $P_{[i,j]} = v_{n_i} v_{n_j}'$ be the $n_i \times n_j$ matrix whose elements are all equal to $1/\sqrt{n_i n_j}$. It is simple to verify that $P_{[i,k]} P_{[k,j]} = P_{[i,j]}$, and with $i = k = j$ it follows that $P_{[i,i]} P_{[i,i]} = P_{[i,i]}$, so that $P_{[i,i]}$ is a projection matrix. It then follows that $P^\perp_{[i,i]} = I_{n_i} - P_{[i,i]}$ is a projection matrix, and it can be verified that $P^\perp_{[i,i]} = v_{n_i\perp} v_{n_i\perp}'$, where the matrix $v_{n\perp}$ was characterized in the introduction. Finally, we define the $n \times n$ matrix
$$Q = \begin{pmatrix} v_{n_1} & \cdots & 0 & v_{n_1\perp} & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & v_{n_K} & 0 & \cdots & v_{n_K\perp} \end{pmatrix},$$
and observe that $Q$ is an orthonormal matrix, characterized by the identity $Q'Q = I$. The first $K$ columns of $Q$ can be used to form averages within each of the $K$ blocks, whereas the remaining columns of $Q$ capture "differences" within each block. The two sets of columns span orthogonal subspaces that correspond to distinct components of the block decomposition. Note that $Q$ is solely defined by the block partition, $n_1, \ldots, n_K$, and it is therefore invariant to the actual values taken by the elements in the block matrix.

Theorem 1.
Suppose that $B$ is a block matrix with block partition $n_1, \ldots, n_K$. Then
$$B_{[i,j]} = a_{ij} P_{[i,j]} + 1_{\{i=j\}} \lambda_i P^\perp_{[i,i]}, \quad \text{for } i, j = 1, \ldots, K,$$
where $a_{ij} = b_{ij}\sqrt{n_i n_j}$ for $i \neq j$, $a_{ii} = d_i + (n_i - 1) b_{ii}$, and $\lambda_i = d_i - b_{ii}$. Moreover, $B = QDQ'$, with
$$D = \begin{pmatrix} A & 0 & \cdots & 0 \\ 0 & \lambda_1 I_{n_1-1} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_K I_{n_K-1} \end{pmatrix}. \qquad (4)$$

The matrix $Q$ rotates $B$ into its canonical form, $D$. The first $K$ columns of $Q$ span an eigenspace of $B$, associated with the eigenvalues that $A$ and $B$ have in common. The last $n - K$ columns of $Q$ are the remaining eigenvectors of $B$.

Theorem 1 can be used to characterize properties of $B$ and simplifies the computation of some transformations of $B$, including the matrix logarithm of $B$, which is denoted by $\log B$. These results are stated in the following corollary:

Corollary 1. Suppose that $B$ is a block matrix as defined above. (i) The eigenvalues of $B$ are given by those of $A$ as well as $\lambda_i = d_i - b_{ii}$, $i = 1, \ldots, K$, so that $\det B = \det(A)\, \lambda_1^{n_1-1} \cdots \lambda_K^{n_K-1}$. (ii) $B$ is invertible if and only if $A$ is invertible and $d_i \neq b_{ii}$ for all $i = 1, \ldots, K$. (iii) The $q$-th power of the block matrix, $B^q$, is well-defined whenever $A^q$ and $\lambda_i^q$, $i = 1, \ldots, K$, are well-defined, in which case $B^q$ has the same block structure as $B$, with blocks given by
$$B^q_{[i,j]} = a^{(q)}_{ij} P_{[i,j]} + 1_{\{i=j\}} \lambda_i^q P^\perp_{[i,i]},$$
where $a^{(q)}_{ij}$ is the $ij$-th element of $A^q$, for $i, j = 1, \ldots, K$. (iv) The matrix exponential of $B$ has the same block structure as $B$, with blocks given by
$$\exp(B)_{[i,j]} = a^{\exp}_{ij} P_{[i,j]} + 1_{\{i=j\}} e^{\lambda_i} P^\perp_{[i,i]},$$
where $a^{\exp}_{ij}$ is the $ij$-th element of $\exp A$, for $i, j = 1, \ldots, K$. (v) If $\log A$ and $\log \lambda_i$, $i = 1, \ldots, K$, exist, then $\log B$ has the same block structure as $B$, with blocks given by
$$\log(B)_{[i,j]} = a^{\log}_{ij} P_{[i,j]} + 1_{\{i=j\}} \log \lambda_i\, P^\perp_{[i,i]},$$
where $a^{\log}_{ij}$ is the $ij$-th element of $\log A$.
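Theorem 1 and Corollary 1 lend themselves to a direct numerical check. The following sketch is ours, not part of the paper (Python with NumPy; the partition and the constants $d_i$, $b_{ij}$ are arbitrary): it assembles $B$ as in (3), constructs $Q$ from the block partition alone, and verifies the canonical form (4) and the determinant formula.

```python
import numpy as np

def build_Q(part):
    """Orthonormal Q determined by the block partition alone (Theorem 1)."""
    n, K = sum(part), len(part)
    Q = np.zeros((n, n))
    col, row = K, 0
    for k, nk in enumerate(part):
        Q[row:row + nk, k] = 1.0 / np.sqrt(nk)        # v_{n_k}: block averages
        # v_{n_k,perp}: orthonormal basis of the within-block complement
        M = np.eye(nk) - np.full((nk, nk), 1.0 / nk)  # projection P_perp
        U = np.linalg.svd(M)[0]
        Q[row:row + nk, col:col + nk - 1] = U[:, :nk - 1]
        col, row = col + nk - 1, row + nk
    return Q

part = [3, 2, 4]                                  # block partition (n_1, n_2, n_3)
K, n = len(part), sum(part)
rng = np.random.default_rng(0)
d = rng.normal(size=K)                            # diagonal constants d_i
b = rng.normal(size=(K, K))
b = (b + b.T) / 2                                 # constants b_ij (symmetric here)

# assemble the block matrix B as in (3)
off = np.concatenate(([0], np.cumsum(part)))
B = np.zeros((n, n))
for i in range(K):
    for j in range(K):
        B[off[i]:off[i+1], off[j]:off[j+1]] = b[i, j]
    B[off[i]:off[i+1], off[i]:off[i+1]] += (d[i] - b[i, i]) * np.eye(part[i])

Q = build_Q(part)
D = Q.T @ B @ Q                                   # canonical form (4)
A = D[:K, :K]
lam = d - np.diag(b)                              # lambda_i = d_i - b_ii

# Q'BQ is zero outside A and the lambda_i I_{n_i - 1} blocks
assert np.allclose(D[:K, K:], 0) and np.allclose(D[K:, :K], 0)
assert np.allclose(D[K:, K:], np.diag(np.repeat(lam, np.array(part) - 1)))
# a_ii = d_i + (n_i - 1) b_ii and a_ij = b_ij sqrt(n_i n_j)
assert np.allclose(np.diag(A), d + (np.array(part) - 1) * np.diag(b))
assert np.isclose(A[0, 1], b[0, 1] * np.sqrt(part[0] * part[1]))
# det B = det(A) * prod_i lambda_i^(n_i - 1)   (Corollary 1)
assert np.isclose(np.linalg.det(B),
                  np.linalg.det(A) * np.prod(lam ** (np.array(part) - 1)))
```

Any orthonormal basis of the within-block complement works for $v_{n_k\perp}$; the SVD of the projection matrix is just one convenient way to obtain one.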
It follows that $B^q$ is well-defined for all positive integers $q$, and the matrix inverse, $B^{-1}$, exists whenever $A$ is invertible and $\lambda_i \neq 0$ for all $i = 1, \ldots, K$, in which case $B^d$ is also well-defined for negative integers $d$. The logarithms, $\log A$ and $\log(d_k - b_{kk})$, exist provided that $A$ is invertible and $d_k - b_{kk} \neq 0$. This may result in a complex-valued solution to the matrix logarithm. If a real-valued solution is required, then the conditions are that $A$ is positive definite and that $d_k - b_{kk} > 0$ for $k = 1, \ldots, K$.

Many of the expressions can be simplified further in the special case where all block sizes are identical, so that $n_1 = n_2 = \cdots = n_K = m$, with $m = n/K$. In this situation, we have $B = A \otimes P + \Lambda \otimes P^\perp$, where $P$ is the $m \times m$ matrix with $1/m$ in all entries, $P^\perp = I_m - P$, and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_K)$. In this case, it follows that $h(B) = h(A) \otimes P + h(\Lambda) \otimes P^\perp$, where $h(\cdot)$ represents the matrix inverse, the matrix exponential, or the matrix logarithm, provided these are well-defined.

2.2 Rectangular Block Matrices

Suppose that $B$ has blocks, $B_{[i,j]} \in \mathbb{R}^{n_i \times n_j}$, as specified in (3), where $i = 1, \ldots, K_1$ and $j = 1, \ldots, K_2$, with $K_1 \neq K_2$, so that $B$ is a non-square matrix. Set $K = \max(K_1, K_2)$ and suppose that $K_1 > K_2$. Then, by appending blocks with zero elements to $B$, we obtain a square matrix, $\tilde{B} = (B, 0)$, which is a block matrix with block partition $n_1, \ldots, n_K$. Our results apply to $\tilde{B}$, so that it has the canonical form $\tilde{B} = QDQ'$, and $B = QD\tilde{Q}'$, where $\tilde{Q}'$ is made up of the first $n_1 + \cdots + n_{K_2}$ columns of $Q'$. If $K_2 > K_1$, we can instead define $\tilde{B} = (B', 0)'$, and the results follow similarly.

A block correlation matrix is characterized by correlation coefficients that form a block structure, so that the correlation between two variables is solely determined by the blocks to which the two variables belong.
This results in a correlation matrix with a common correlation coefficient within each block. Block correlation matrices offer a way to parameterize large covariance matrices in a parsimonious manner, and can be used to impose economically relevant structures that reduce the complexity of the covariance matrix. This structure is used in some multivariate GARCH models, see Engle and Kelly (2012) and Archakov et al. (2020).

An $n \times n$ block correlation matrix, $C$, with $K$ blocks, is a symmetric block matrix with blocks
$$C_{[i,i]} = \begin{pmatrix} 1 & \rho_{ii} & \cdots & \rho_{ii} \\ \rho_{ii} & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \rho_{ii} \\ \rho_{ii} & \cdots & \rho_{ii} & 1 \end{pmatrix} \quad \text{and, for } i \neq j, \quad C_{[i,j]} = \begin{pmatrix} \rho_{ij} & \cdots & \rho_{ij} \\ \vdots & \ddots & \vdots \\ \rho_{ij} & \cdots & \rho_{ij} \end{pmatrix}, \qquad (5)$$
where the $\rho_{ii}$ are the within-block correlations and the $\rho_{ij} = \rho_{ji}$, $i \neq j$, are the between-block correlations, for $i, j = 1, \ldots, K$. For $C$ to be a correlation matrix, we obviously need $\rho_{ij} \in [-1, 1]$ for all $i, j = 1, \ldots, K$. However, this alone is not sufficient to produce a valid correlation matrix, because negative eigenvalues can arise with some combinations of correlation coefficients, even if these are all strictly smaller than one.

The case with block equicorrelation matrices corresponds to the case where the diagonal elements of all diagonal blocks, $B_{[k,k]}$, equal $d_k = 1$, for all $k = 1, \ldots, K$. So Theorem 1 fully characterizes the set of correlation coefficients that yields a positive (semi-)definite correlation matrix. We formulate this result as a separate corollary. Note that the canonical form, (4), for $C$ in (5) is such that $A$ is symmetric with elements given by $a_{ij} = \rho_{ij}\sqrt{n_i n_j}$, for $i \neq j$, $a_{ii} = 1 + \rho_{ii}(n_i - 1)$, and $\lambda_i = 1 - \rho_{ii}$.

Corollary 2 (Block correlation matrices). Let $C$ be a block correlation matrix. Then
$$\det C = \det A \cdot \prod_{i=1}^K (1 - \rho_{ii})^{n_i - 1},$$
so that $C$ is a non-singular block correlation matrix if and only if $A$ is positive definite and $|\rho_{ii}| < 1$. In this case, both the inverse correlation matrix, $C^{-1}$, and the matrix logarithm, $\log C$, have the same block structure as $C$, with blocks given by
$$C^{-1}_{[i,j]} = a^{-}_{ij} P_{[i,j]} + 1_{\{i=j\}} \tfrac{1}{1-\rho_{ii}} P^\perp_{[i,i]} \quad \text{and} \quad \log(C)_{[i,j]} = \tilde{a}_{ij} P_{[i,j]} + 1_{\{i=j\}} \log(1-\rho_{ii}) P^\perp_{[i,i]},$$
respectively, where $a^{-}_{ij}$ is the $ij$-th element of $A^{-1}$ and $\tilde{a}_{ij}$ is the $ij$-th element of $\log A$.

The conditions for $C$ in (5) to be a (possibly singular) correlation matrix are that $A$ is positive semi-definite and $|\rho_{ii}| \leq 1$, whereas a non-singular correlation matrix requires that $A$ is positive definite and $|\rho_{ii}| < 1$. The matrix $A$ was previously obtained in Huang and Yang (2010, proposition 5) and in Cadima et al. (2010, theorem 3.1). The focus in Huang and Yang (2010) was on computational issues, which might explain why their paper is overlooked in much of the literature. Their results add valuable insight into the block-DECO model by Engle and Kelly (2012).
For instance, their results provide a simple way to evaluate whether a block matrix is a positive definite (or semidefinite) correlation matrix. The expression for the determinant of a correlation matrix in Corollary 2 is a simple implication of the eigenvalues derived in Huang and Yang (2010) and Cadima et al. (2010), whereas the expressions for the inverse and logarithmically transformed correlation matrices are new. (We were, until recently, also unaware of the results in Huang and Yang (2010) and Cadima et al. (2010). An anonymous referee (on a different paper than the present one) directed us to Roustant and Deville (2017), and we subsequently discovered the more detailed results in Huang and Yang (2010) and Cadima et al. (2010). Some of their results, e.g., Huang and Yang (2010, eq. 6), were rediscovered in Roustant and Deville (2017), who do not cite Huang and Yang (2010) or Cadima et al. (2010). In fact, none of the papers, Cadima et al. (2010), Huang and Yang (2010), Engle and Kelly (2012), and Roustant and Deville (2017), cite any of the others.)

3.1 Parametrization of Block Correlation Matrices

A new parametrization of correlation matrices was introduced in Archakov and Hansen (2020). The parametrization consists of the elements below the diagonal of $\log C$ (the matrix logarithm of $C$). Let $\varrho$ denote the vector with these $n(n-1)/2$ elements, where $n$ is the dimension of $C$. Archakov and Hansen (2020) showed that $\varrho = \varrho(C)$ is a one-to-one mapping between the set of non-singular correlation matrices and $\mathbb{R}^{n(n-1)/2}$.

For a block equicorrelation matrix, $C$, it follows (from Corollary 2) that $\log C$ has the same block structure as $C$. So, for $i \neq j$, all elements in $[\log C]_{[i,j]}$ are identical and given by $\tilde{a}_{ij}/\sqrt{n_i n_j}$, and the off-diagonal elements of the diagonal blocks, $[\log C]_{[k,k]}$, $k = 1, \ldots, K$, are all equal to $(\tilde{a}_{kk} - \log(1-\rho_{kk}))/n_k$, where the $\tilde{a}_{ij}$ are the elements of $\log A$.
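Corollary 2 and the block structure of $\log C$ can be illustrated numerically. The sketch below is ours, not from the paper (Python with NumPy; the partition and correlation values are arbitrary): it computes the matrix logarithm by eigendecomposition and checks that the blocks of $\log C$ are governed by $\log A$ as stated.

```python
import numpy as np

def sym_logm(M):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.log(w)) @ V.T

part = [3, 2]                                 # block partition (n_1, n_2)
rho = np.array([[0.6, 0.2],                   # rho_11, rho_12
                [0.2, 0.4]])                  # rho_21, rho_22
K, n = len(part), sum(part)

# assemble the block correlation matrix C as in (5)
off = np.concatenate(([0], np.cumsum(part)))
C = np.zeros((n, n))
for i in range(K):
    for j in range(K):
        C[off[i]:off[i+1], off[j]:off[j+1]] = rho[i, j]
np.fill_diagonal(C, 1.0)

# the K x K matrix A: a_ij = rho_ij sqrt(n_i n_j), a_ii = 1 + rho_ii (n_i - 1)
nv = np.array(part, dtype=float)
A = rho * np.sqrt(np.outer(nv, nv))
A[np.diag_indices(K)] = 1 + np.diag(rho) * (nv - 1)

# C is positive definite iff A is positive definite and |rho_ii| < 1
assert np.all(np.linalg.eigvalsh(A) > 0) and np.all(np.abs(np.diag(rho)) < 1)

# det C = det(A) * prod_i (1 - rho_ii)^(n_i - 1)
assert np.isclose(np.linalg.det(C),
                  np.linalg.det(A) * np.prod((1 - np.diag(rho)) ** (nv - 1)))

# every element of block [1,2] of log C equals a~_12 / sqrt(n_1 n_2)
L, a_log = sym_logm(C), sym_logm(A)
assert np.allclose(L[off[0]:off[1], off[1]:off[2]],
                   a_log[0, 1] / np.sqrt(nv[0] * nv[1]))
```

The same check applies to $C^{-1}$, with $\log A$ replaced by $A^{-1}$ and $\log(1-\rho_{ii})$ by $1/(1-\rho_{ii})$.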
Thus, the unique elements of $\log C$ are collected in the $K \times K$ matrix
$$\Lambda_n^{-1}\,[\log A - \log \Lambda_{1-\rho}]\,\Lambda_n^{-1} = \Lambda_n^{-1}\,[\log(\Lambda_n R \Lambda_n + \Lambda_{1-\rho}) - \log \Lambda_{1-\rho}]\,\Lambda_n^{-1},$$
where
$$R = \begin{pmatrix} \rho_{11} & \cdots & \rho_{1K} \\ \vdots & \ddots & \vdots \\ \rho_{K1} & \cdots & \rho_{KK} \end{pmatrix}, \qquad \Lambda_n = \begin{pmatrix} \sqrt{n_1} & & 0 \\ & \ddots & \\ 0 & & \sqrt{n_K} \end{pmatrix}, \qquad \Lambda_{1-\rho} = \begin{pmatrix} 1-\rho_{11} & & 0 \\ & \ddots & \\ 0 & & 1-\rho_{KK} \end{pmatrix}.$$

In this section, we focus on covariance and correlation matrices for normally distributed random variables. We derive simplified expressions for the corresponding log-likelihood functions that greatly reduce the computational burden when $n$ is large relative to $K$. We derive the maximum likelihood estimators, and provide a simple expression for the first derivatives of the log-likelihood function with respect to the unknown parameters (the scores).

We follow the conventional notation for covariances and variances: we write $\sigma_{ij}$ in place of $b_{ij}$, $i, j = 1, \ldots, K$, and $\sigma_k^2$ in place of $d_k$, $k = 1, \ldots, K$. Similarly, for correlation matrices we write $\rho_{ij}$ in place of $b_{ij}$, and observe that $d_k = 1$.

The density function of the multivariate Gaussian distribution with mean zero and an $n \times n$ covariance matrix, $\Sigma$, is $f(x) = (2\pi)^{-n/2}(\det\Sigma)^{-1/2}\exp(-\tfrac{1}{2}x'\Sigma^{-1}x)$. Suppose that $\Sigma$ has the block structure given by $(n_1, \ldots, n_K)$, so that it can be expressed as $\Sigma = QDQ'$, using the canonical representation. The corresponding log-likelihood function (multiplied by $-2$) can now be expressed as
$$-2\ell = n\log 2\pi + \log\det D + X'QD^{-1}Q'X,$$
where $D = \mathrm{diag}(A, \lambda_1 I_{n_1-1}, \ldots, \lambda_K I_{n_K-1})$, with $\lambda_i = \sigma_i^2 - \sigma_{i,i}$ and
$$a_{ij} = \begin{cases} \sigma_i^2 + (n_i-1)\sigma_{i,i} & \text{for } i = j, \\ \sigma_{i,j}\sqrt{n_i n_j} & \text{for } i \neq j. \end{cases}$$
So, if we define $Y = (y_0', y_1', \ldots, y_K')' = Q'X$, where $y_0$ is $K$-dimensional and $y_k$ is $(n_k-1)$-dimensional, $k = 1, \ldots, K$, then it follows that
$$-2\ell = n\log 2\pi + \log\det A + y_0'A^{-1}y_0 + \sum_{k=1}^K \Big( (n_k-1)\log\lambda_k + \frac{y_k'y_k}{\lambda_k} \Big). \qquad (6)$$
This expression of the log-likelihood function shows that the block structure yields a considerable simplification in the evaluation of the log-likelihood. Instead of inverting the $n \times n$ matrix $\Sigma$ and computing $\det\Sigma$, it suffices to invert the smaller $K \times K$ matrix, $A$, and evaluate its determinant. Moreover, the maximum likelihood estimator based on a random sample, $X_1, \ldots, X_N$, is easily expressed in terms of the transformed variables, $Y_1 = Q'X_1, \ldots, Y_N = Q'X_N$, as formulated in the following theorem.

Theorem 2.
Suppose that $X_1, \ldots, X_N$ are independent and identically distributed as $N(0, \Sigma)$, where $\Sigma$ is a block covariance matrix with block partition $n_1, \ldots, n_K$. Define the transformed variables, $Y_s = Q'X_s$, $s = 1, \ldots, N$, where $Y_s = (y_{0,s}', y_{1,s}', \ldots, y_{K,s}')'$, with $y_{0,s}$ being $K$-dimensional and $y_{k,s}$ being $(n_k-1)$-dimensional, $k = 1, \ldots, K$.

The maximum likelihood estimator of $\Sigma$ is given by $\hat{\Sigma} = Q\hat{D}Q'$, where $\hat{D} = \mathrm{diag}(\hat{A}, \hat{\lambda}_1 I_{n_1-1}, \ldots, \hat{\lambda}_K I_{n_K-1})$ with
$$\hat{A} = \frac{1}{N}\sum_{s=1}^N y_{0,s}y_{0,s}' \quad \text{and} \quad \hat{\lambda}_k = \frac{1}{N}\sum_{s=1}^N \frac{y_{k,s}'y_{k,s}}{n_k-1}, \quad k = 1, \ldots, K.$$

The maximum likelihood estimates of the individual parameters can be obtained directly from $\hat{A}$ and $\hat{\lambda}_k$, $k = 1, \ldots, K$. For $i \neq j$, it follows from the definition of $A$ that $\hat{\sigma}_{i,j} = \hat{a}_{ij}/\sqrt{n_i n_j}$. For $i = j$, we have $\hat{\sigma}_{i,i} = (\hat{a}_{ii} - \hat{\lambda}_i)/n_i$ and $\hat{\sigma}_i^2 = \hat{\lambda}_i + \hat{\sigma}_{i,i} = \tfrac{1}{n_i}\hat{a}_{ii} + \tfrac{n_i-1}{n_i}\hat{\lambda}_i$.

In the special case where a block has size one, we have $\Sigma_{[k,k]} = \sigma_k^2$ and $\sigma_{k,k}$ is obviously undefined. In this situation, the corresponding variables, $y_{k,s}$, $s = 1, \ldots, N$, are also undefined, and hence, so is $\hat{\lambda}_k$. Yet the expressions for the maximum likelihood estimators continue to be valid, including the expression for $\hat{\Sigma}$ in Theorem 2. If $n_k = 1$, then $\hat{\sigma}_k^2 = \hat{a}_{kk}$, while the expression for $\hat{\sigma}_{k,k}$ is undefined and can be ignored.

Estimation when the correlation matrix is assumed to have a block structure, as opposed to the covariance matrix, is similar. However, a block correlation matrix is entirely given by the $A$-matrix, and computation of the eigenvalues $\lambda_1, \ldots, \lambda_K$ is redundant.

Corollary 3.
Suppose that $X_1, \ldots, X_N$ are independent and identically distributed as $N_n(0, \Sigma)$, where $\Sigma = \Lambda_\sigma C \Lambda_\sigma$ with $\Lambda_\sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ and $C$ is a block correlation matrix with block partition $n_1, \ldots, n_K$. The maximum likelihood estimates of the diagonal elements of $\Sigma$ are given by $\hat{\sigma}_i^2 = N^{-1}\sum_{s=1}^N X_{i,s}^2$, for $i = 1, \ldots, n$. Next define $\tilde{X}_{i,s} = X_{i,s}/\hat{\sigma}_i$, and introduce the transformed variables, $\tilde{Y}_s = Q'\tilde{X}_s$, $s = 1, \ldots, N$, where $\tilde{Y}_s = (\tilde{y}_{0,s}', \tilde{y}_{1,s}', \ldots, \tilde{y}_{K,s}')'$, with $\tilde{y}_{0,s}$ being $K$-dimensional and $\tilde{y}_{k,s}$ being $(n_k-1)$-dimensional, $k = 1, \ldots, K$.

The maximum likelihood estimator of $C$ is given by $\hat{C} = Q\tilde{D}Q'$, where $\tilde{D} = \mathrm{diag}(\tilde{A}, \tilde{\lambda}_1 I_{n_1-1}, \ldots, \tilde{\lambda}_K I_{n_K-1})$ with
$$\tilde{A} = \frac{1}{N}\sum_{s=1}^N \tilde{y}_{0,s}\tilde{y}_{0,s}' \quad \text{and} \quad \tilde{\lambda}_k = \frac{n_k - \tilde{a}_{kk}}{n_k - 1}, \quad k = 1, \ldots, K.$$

So the estimate of $D$ can be obtained solely from $\tilde{A}$. For the individual correlations we have $\hat{\rho}_{i,j} = \tilde{a}_{ij}/\sqrt{n_i n_j}$, for $i \neq j$, and for $i = j$, we have $\hat{\rho}_{i,i} = (\tilde{a}_{ii} - \tilde{\lambda}_i)/n_i$.

The score of the log-likelihood function is often of separate interest. For instance, the score is used for the computation of robust standard errors, in Lagrange multiplier tests, and in tests for structural breaks, see, e.g., Nyblom (1989), as well as in dynamic models with time-varying parameters (the so-called score-driven models), see Creal et al. (2013). So we provide the expressions for the score in this context with a block covariance matrix.

Suppose that $\Sigma$ is a block covariance matrix, and consider its canonical representation $\Sigma = QDQ'$. Since $Q$ is entirely given by the block partition $(n_1, \ldots, n_K)$, and does not depend on the unknown parameters in $\Sigma$, the expressions for the partial derivatives are relatively simple.

Proposition 1.
Let $\Sigma = QDQ'$ be the canonical representation of $\Sigma$. Then $\partial(-2\ell)/\partial A = M = A^{-1} - A^{-1}y_0y_0'A^{-1}$, and for $k = 1, \ldots, K$, we have
$$\frac{\partial(-2\ell)}{\partial \sigma_k^2} = M_{k,k} + \Big( \frac{n_k-1}{\lambda_k} - \frac{y_k'y_k}{\lambda_k^2} \Big), \qquad \frac{\partial(-2\ell)}{\partial \sigma_{kk}} = (n_k-1)M_{k,k} - \Big( \frac{n_k-1}{\lambda_k} - \frac{y_k'y_k}{\lambda_k^2} \Big),$$
and, for $i \neq j$, we have
$$\frac{\partial(-2\ell)}{\partial \sigma_{ij}} = 2\sqrt{n_i n_j}\, M_{i,j}.$$

The Hessian could be derived similarly. In some applications, it might be preferable to parametrize the block covariance matrix with $A$ and $(\lambda_1, \ldots, \lambda_K)$. In this case, one can use $\partial(-2\ell)/\partial A = M$ and
$$\frac{\partial(-2\ell)}{\partial \lambda_k} = \frac{n_k-1}{\lambda_k} - \frac{y_k'y_k}{\lambda_k^2}, \quad k = 1, \ldots, K.$$

We proceed to illustrate how high-dimensional covariance matrices with a block structure are straightforward to estimate in practice. We estimated block structures for a large panel of assets for two calendar years, 2008 and 2013, using daily returns. We included all stocks in the CRSP database that could be matched with a unique ticker symbol and did not have any missing observations. This resulted in 3958 assets in 2008 and 2998 assets in 2013. The objective of this empirical application is to demonstrate that high-dimensional covariance matrices can be estimated with relatively few observations once block structures are imposed, and that the canonical representation makes it simple to evaluate the log-likelihood function and to obtain the maximum likelihood estimates. Given the well-known variation in conditional variances and covariances, our estimated covariance matrices should be viewed as estimates of the average covariance matrix for 2008 and 2013, rather than an accurate description of the data generating process.

We impose five nested structures on the correlation matrix, where the equicorrelation structure ($K = 1$) is the simplest and most restrictive model. The remaining four correlation models use block structures defined by the Sector, Group, Industry, and Sub-Industry categories, as classified by the Global Industry Classification Standard (GICS) in 2013.
The five specifications correspond to $K = 1$, $10$, $24$, $67$, and $151$, respectively, in 2008, and the same numbers of blocks in 2013, except for the Sub-Industry categories, which had $K = 146$.

We estimated the canonical correlation matrix using the results in Corollary 3. Thus, we first compute the estimates of the variances, $\hat{\sigma}_i^2 = \frac{1}{N}\sum_{t=1}^N X_{i,t}^2$, for each of the individual assets. Then we define the standardized variables, $\tilde{X}_{i,t} = X_{i,t}/\hat{\sigma}_i$, and $\tilde{y}_{0,t} \in \mathbb{R}^K$, whose $k$-th element is given by $\frac{1}{\sqrt{n_k}}\sum_{i=n_1+\cdots+n_{k-1}+1}^{n_1+\cdots+n_k} \tilde{X}_{i,t}$. From Corollary 3, we have $\hat{\rho}_{i,j} = \tilde{a}_{ij}/\sqrt{n_i n_j}$, for $i \neq j$, and $\hat{\rho}_{i,i} = (\tilde{a}_{ii} - \tilde{\lambda}_i)/n_i$, where $\tilde{A} = \frac{1}{N}\sum_{t=1}^N \tilde{y}_{0,t}\tilde{y}_{0,t}'$. Thus, the entire $n \times n$ covariance matrix with a block correlation structure is estimated by computing the estimates of the $n$ variances and the $K \times K$ matrix $\tilde{A}$. Given the high number of assets and just over 250 daily returns, the unrestricted sample covariance matrix would be singular, because most of its eigenvalues would be zero. Once a block structure is imposed, we can compute the inverse covariance matrix from the invertible $K \times K$ matrix, $\tilde{A}$, and it becomes simple to evaluate the log-likelihood function.

The empirical results are summarized in Table 1. We report the range of estimated correlations for each of the block structures, with the results for 2008 and 2013 reported separately. The range of estimated correlations, i.e., the interval between the smallest and the largest coefficient in the correlation matrix, obviously increases with the number of blocks in the correlation matrix, and the correlations are generally higher in 2008 than in 2013. These estimates are likely biased, because they entail cross-sectional averaging within each sector/group/industry as well as time averaging over a full calendar year. We also report the value of the maximized log-likelihood function (scaled by $-2/(nN)$) and the corresponding value of the Bayesian Information Criterion (BIC).
The minimum BIC is obtained with block structures based on Groups in both 2008 and 2013. The last column reports the number of correlation parameters, $K(K+1)/2$, for the specification with $K$ blocks, and while this number increases rapidly with $K$, the gains in the log-likelihood are relatively modest. Consequently, the BIC increases substantially once the number of blocks is defined by Industries and Sub-Industries. (The BIC adds the penalty $p\log(nN)$ to $-2\ell$, where $p$ is the number of free parameters. For comparison, the AIC, which uses the penalty $2p$, selects the most general specification in both years. It is well known that the AIC tends to favor more heavily parametrized models.)

The estimated block structures are illustrated in Figures 1 and 2. The upper panels of Figure 1 show the estimated correlation coefficients of assets within and between sectors. The lower panels are the estimates for the 24 groups, with the actual estimates indicated by color coding. A darker shade of red denotes a stronger correlation. The left panels are for 2008 and the right panels are for 2013. Figure 2 presents the estimated correlations using a block structure based on industries and sub-industries. We observe that the correlations were generally higher in 2008 than in 2013, in part because of the turmoil in the period leading up to, and following, the collapse of Lehman Brothers in late 2008. The block structure is perhaps more visible in 2008, which might be explained by the Global Financial Crisis having a differentiated impact on different sectors. For instance, Figure 1 shows that the correlations between the Energy (10), Materials (15), and Utilities (55) sectors were relatively high, while Financials (40) were relatively uncorrelated with other sectors in 2008. The partition by the GICS groups is shown in the lower panels, and Figure 2 presents the corresponding results for industries and sub-industries.
The number of blocks is too large for them to be listed individually; however, the industries and sub-industries are placed in ascending order in Figure 2 according to their GICS codes.
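The estimation steps used in this section (those of Corollary 3) are easy to replicate on simulated data. The sketch below is ours, not the authors' code; the block partition, the correlation values, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
part = [4, 3, 3]                          # block partition (n_1, n_2, n_3)
K, n, N = len(part), sum(part), 5000      # K blocks, dimension n, N observations

# simulate from a block correlation matrix (0.5 within blocks, 0.2 between)
off = np.concatenate(([0], np.cumsum(part)))
C = np.full((n, n), 0.2)
for k in range(K):
    C[off[k]:off[k+1], off[k]:off[k+1]] = 0.5
np.fill_diagonal(C, 1.0)
X = rng.multivariate_normal(np.zeros(n), C, size=N)   # N x n sample

# Corollary 3: variance estimates, standardization, block averages
sigma2_hat = (X ** 2).mean(axis=0)                    # sigma_i^2 estimates
Xt = X / np.sqrt(sigma2_hat)                          # standardized variables
y0 = np.stack([Xt[:, off[k]:off[k+1]].sum(axis=1) / np.sqrt(part[k])
               for k in range(K)], axis=1)            # N x K block averages
A_hat = y0.T @ y0 / N                                 # the K x K matrix

# rho_ij = a_ij / sqrt(n_i n_j); rho_ii = (a_ii - 1)/(n_i - 1),
# which equals (a_ii - lambda_i)/n_i with lambda_i = (n_i - a_ii)/(n_i - 1)
nv = np.array(part, dtype=float)
rho_hat = A_hat / np.sqrt(np.outer(nv, nv))
rho_hat[np.diag_indices(K)] = (np.diag(A_hat) - 1) / (nv - 1)
```

With 5000 observations the estimates recover the simulated correlations up to sampling error; only the $n$ variances and the $K \times K$ matrix need to be computed, regardless of $n$.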
We have derived a canonical representation of block matrices, which is particularly useful for covariance and correlation matrices. We derived a number of expressions that greatly simplify the computation of the log-likelihood function. We illustrated this in an empirical application, where we estimated the covariance matrix for nearly 4000 stock returns using daily returns from a single calendar year, i.e., just over 250 observations. Inverting the covariance matrix and evaluating the log-likelihood is straightforward once a block structure is imposed, where we used as many as $K = 151$ blocks, motivated by the Global Industry Classification Standard.

The canonical representation and the related results are potentially useful for regularizing large covariance matrices. For instance, one could shrink the sample correlation matrix towards a block correlation matrix, analogous to the way Ledoit and Wolf (2004) proposed to shrink towards the equicorrelation matrix. The canonical representation also paves new ways to testing block structures in covariance and correlation matrices. This predominantly amounts to testing a large number of zero-restrictions in the canonical representation. We identified a number of transformations that preserve the block structures, so testing of block structures could be based on any of these transformations, rather than the original matrix. For instance, block structures in a correlation matrix $C$ could be tested on the canonical representation of $\log C$. This is potentially interesting because of the connection between the logarithmically transformed correlation matrix and the Fisher transformation, see Archakov and Hansen (2020). Finally, the group assignments, and hence $K$, will be unknown in many empirical applications. The literature has therefore proposed various techniques that aim to determine the most appropriate block structure. It is possible that the canonical representation will be useful for this type of model selection problem.
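The shrinkage idea mentioned above can be sketched in a few lines (our illustration, not part of the paper; the shrinkage weight `alpha` is left as a free parameter, whereas Ledoit and Wolf (2004) derive an optimal weight for their target):

```python
import numpy as np

def shrink_to_block(C_sample, part, alpha):
    """Shrink a sample correlation matrix toward its block-averaged counterpart."""
    off = np.concatenate(([0], np.cumsum(part)))
    C_block = np.zeros_like(C_sample)
    for i in range(len(part)):
        for j in range(len(part)):
            blk = C_sample[off[i]:off[i+1], off[j]:off[j+1]]
            if i == j:
                nk = part[i]
                # average the off-diagonal entries -> within-block correlation
                r = (blk.sum() - np.trace(blk)) / (nk * (nk - 1)) if nk > 1 else 0.0
            else:
                r = blk.mean()                    # between-block correlation
            C_block[off[i]:off[i+1], off[j]:off[j+1]] = r
    np.fill_diagonal(C_block, 1.0)
    return alpha * C_block + (1 - alpha) * C_sample

# usage: pull a 5 x 5 sample correlation matrix halfway toward a 2-block target
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
C_shrunk = shrink_to_block(np.corrcoef(X, rowvar=False), [3, 2], alpha=0.5)
```

Since both the target and the sample correlation matrix have unit diagonals, the shrunk matrix does as well; a convex combination of the two is positive semidefinite whenever both are.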
Note that we are using the GICS classification as it was in 2013, when Real Estate (4040) was a part of the Financials Sector. A separate Real Estate Sector (60) was added to the GICS classification in 2016.

Appendix of Proofs

Proof of Theorem 1.
For $i \neq j$, we have $B_{[i,j]} = a_{ij}P_{[i,j]}$ with $a_{ij} = b_{ij}\sqrt{n_i n_j}$, since the elements of $P_{[i,j]}$ are all equal to $1/\sqrt{n_i n_j}$. For $i = j$, the diagonal elements differ from the off-diagonal elements by $\lambda_i = d_i - b_{ii}$, so that $B_{[i,i]} = b_{ii}n_i P_{[i,i]} + (d_i - b_{ii})I_{n_i}$. Since $I_{n_i} = P_{[i,i]} + P^\perp_{[i,i]}$, we have
$$B_{[i,i]} = (b_{ii}n_i + d_i - b_{ii})P_{[i,i]} + (d_i - b_{ii})P^\perp_{[i,i]} = a_{ii}P_{[i,i]} + \lambda_i P^\perp_{[i,i]}.$$
The canonical representation, (4), follows by verifying that $Q'BQ$ is equal to the block-diagonal matrix in (4). This follows from the identities $v_{n_i}'P_{[i,j]}v_{n_j} = 1$, $v_{n_i}'P_{[i,j]}v_{n_j\perp} = 0$, $v_{n_i\perp}'P_{[i,j]}v_{n_j\perp} = 0$, $v_{n_i}'P^\perp_{[i,i]}v_{n_i} = 0$, $v_{n_i}'P^\perp_{[i,i]}v_{n_i\perp} = 0$, and $v_{n_i\perp}'P^\perp_{[i,i]}v_{n_i\perp} = I_{n_i-1}$, and the fact that $Q'Q = I_n$, so that $Q^{-1} = Q'$, and hence $B = QQ'BQQ'$. This proves (4). □

Proof of Corollary 1.
The first result, for the eigenvalues of $B$ and the determinant of $B$, follows immediately from (4). The results for $f(B)$, where $f$ denotes the $q$-th matrix power, the matrix exponential, or the matrix logarithm, follow from $f(B) = Qf(D)Q'$ and the structure of $Q$, using identities such as $v_{n_i}v_{n_j}' = P_{[i,j]}$ and $v_{n_i\perp}v_{n_i\perp}' = P^\perp_{[i,i]}$. This completes the proof. □

Proof of Corollary 2.
It follows from Theorem 1 and Corollary 1 by setting $d_k = 1$ for all $k$. Some expressions can also be verified directly. For instance, one can verify the expression for $C^{-1}$ by noting that the diagonal blocks of $C C^{-1}$ are given by
$$(C C^{-1})_{[i,i]} = \sum_{k=1}^{K} a_{ik} a^{ki} P_{[i,k]} P_{[k,i]} + (1 - \rho_{ii}) \frac{1}{1 - \rho_{ii}} P_{[i,i]}^{\perp} P_{[i,i]}^{\perp} = \sum_{k=1}^{K} a_{ik} a^{ki} P_{[i,i]} + P_{[i,i]}^{\perp} = I,$$
where $a^{ki}$ denotes the $(k,i)$ element of $A^{-1}$, so that $\sum_{k=1}^{K} a_{ik} a^{ki} = 1$. Next, for $i \neq j$, we have
$$(C C^{-1})_{[i,j]} = \sum_{k=1}^{K} a_{ik} a^{kj} P_{[i,k]} P_{[k,j]} + \frac{a_{ij}}{1 - \rho_{jj}} P_{[i,j]} P_{[j,j]}^{\perp} + (1 - \rho_{ii})\, a^{ij} P_{[i,i]}^{\perp} P_{[i,j]} = \sum_{k=1}^{K} a_{ik} a^{kj} P_{[i,j]} = 0,$$
where we used that $P_{[i,k]} P_{[k,j]} = P_{[i,j]}$, that $P_{[i,j]} P_{[j,j]}^{\perp} = P_{[i,j]}(I_{n_j} - P_{[j,j]}) = 0$ and $P_{[i,i]}^{\perp} P_{[i,j]} = 0$, and that $\sum_{k=1}^{K} a_{ik} a^{kj} = 0$ for $i \neq j$. This completes the proof. □

Proof of Theorem 2.
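Before the formal argument below, the estimator can be sketched in code. This is our own illustration of the recipe (transform the data by $Q'$, estimate $A$ from the first $K$ coordinates and each $\lambda_k$ from the complement coordinates of block $k$); the complement basis, block sizes, and simulated data are arbitrary choices, and blocks of size one would need the special handling discussed in the proof.

```python
import numpy as np

def canonical_Q(sizes):
    """One valid choice of Q: normalized ones-vectors first, then any
    orthonormal basis of each block's complement of the ones-vector."""
    n, K = sum(sizes), len(sizes)
    Q = np.zeros((n, n))
    offs = np.cumsum([0] + list(sizes))
    col = K
    for i, ni in enumerate(sizes):
        rows = slice(offs[i], offs[i + 1])
        Q[rows, i] = 1.0 / np.sqrt(ni)
        if ni > 1:
            w, V = np.linalg.eigh(np.eye(ni) - np.ones((ni, ni)) / ni)
            Q[rows, col:col + ni - 1] = V[:, w > 0.5]
            col += ni - 1
    return Q

def block_cov_mle(X, sizes):
    """MLE of a block covariance matrix via the reparametrization: with
    Y_s = Q'X_s, estimate the K-by-K block A from the first K coordinates
    and each lambda_k from the n_k - 1 complement coordinates of block k."""
    N, K = X.shape[0], len(sizes)
    Q = canonical_Q(sizes)
    Y = X @ Q                                  # row s holds Y_s'
    A_hat = Y[:, :K].T @ Y[:, :K] / N
    lam_hat, col = [], K
    for ni in sizes:
        Yk = Y[:, col:col + ni - 1]
        lam_hat.append((Yk ** 2).sum() / (N * (ni - 1)))
        col += ni - 1
    D_hat = np.zeros((sum(sizes), sum(sizes)))
    D_hat[:K, :K] = A_hat
    D_hat[K:, K:] = np.diag(np.repeat(lam_hat, np.array(sizes) - 1))
    return Q @ D_hat @ Q.T

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5))              # hypothetical mean-zero data
S = block_cov_mle(X, [3, 2])                   # fitted block covariance matrix
```

The fitted matrix is a valid block covariance matrix by construction: within-block variances and covariances are constant, as are the between-block covariances.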
The expression, (6), shows that the log-likelihood function is made up of two terms,
$$-\frac{N}{2}\left[\log\det A + \operatorname{tr}\Big\{A^{-1}\,\frac{1}{N}\sum_{s=1}^{N} Y_{0,s} Y_{0,s}'\Big\}\right] \quad\text{and}\quad -\frac{N}{2}\sum_{k=1}^{K}(n_k - 1)\left[\log\lambda_k + \frac{\frac{1}{N}\sum_{s=1}^{N} Y_{k,s}' Y_{k,s}}{(n_k - 1)\lambda_k}\right].$$
It is well known that $\hat{A} = \frac{1}{N}\sum_{s=1}^{N} Y_{0,s} Y_{0,s}'$ maximizes the first term and that $\hat{\lambda}_k = \frac{1}{N(n_k - 1)}\sum_{s=1}^{N} Y_{k,s}' Y_{k,s}$ maximizes the corresponding element of the second term. Since $(A, \lambda_1, \ldots, \lambda_K)$ is merely a reparametrization of the elements of the block covariance matrix $\Sigma$, it follows that $\hat{\Sigma} = Q \hat{D} Q'$ is the maximum likelihood estimator of $\Sigma$. It is easy to verify that this result is also valid in the special case where one or more of the blocks are one-dimensional. In that case, $\sigma_{ii}$ is undefined, and so is $\hat{\lambda}_k$, but $\sigma_i^2$ is identified from the corresponding diagonal element of $A$, since $\hat{a}_{ii} = \sigma_i^2$ when $n_i = 1$. □

Proof of Corollary 3.
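The closing step of the argument below, $\tilde{\lambda}_i = (n_i - \tilde{a}_{ii})/(n_i - 1)$, is precisely what delivers a unit diagonal; a one-block numerical check with illustrative values:

```python
import numpy as np

# One block of size n_i: the diagonal of Q D Q' within the block equals
# a_ii/n_i + lambda_i * (n_i - 1)/n_i, so a unit diagonal requires
# lambda_i = (n_i - a_ii) / (n_i - 1).
ni = 4
a_ii = 1 + (ni - 1) * 0.35                  # i.e. rho_ii = 0.35 (illustrative)
lam_i = (ni - a_ii) / (ni - 1)

P = np.ones((ni, ni)) / ni                  # projection onto the ones-vector
block = a_ii * P + lam_i * (np.eye(ni) - P)
```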
We have $\det\Sigma = \det(\Lambda_\sigma C \Lambda_\sigma) = \prod_{i=1}^{n}\sigma_i^2 \det C = \prod_{i=1}^{n}\sigma_i^2 \det D$. So the log-likelihood function for the observation $X \in \mathbb{R}^n$ is
$$-2\ell(\sigma_1^2, \ldots, \sigma_n^2, C) = n\log 2\pi + \sum_{j=1}^{n}\log\sigma_j^2 + \log\det C + X'\Lambda_\sigma^{-1} C^{-1} \Lambda_\sigma^{-1} X,$$
and given a sample $X_1, \ldots, X_N$, the first-order condition for $\sigma_j^2$,
$$0 = \frac{N}{\sigma_j^2} - \sum_{t=1}^{N}\frac{1}{\sigma_j^4}\, X_t' e_j e_j' C e_j e_j' X_t = \frac{N}{\sigma_j^2} - \frac{1}{\sigma_j^4}\sum_{t=1}^{N} X_t' e_j e_j' X_t = \frac{N}{\sigma_j^2} - \frac{1}{\sigma_j^4}\sum_{t=1}^{N} X_{j,t}^2,$$
is invariant to $C$ (we used that $e_j' C e_j = 1$). So $\hat{\sigma}_j^2 = \frac{1}{N}\sum_{t=1}^{N} X_{j,t}^2$ is the maximum likelihood estimator of $\sigma_j^2$ regardless of the structure imposed on $C$. The concentrated log-likelihood function for the observation $X \in \mathbb{R}^n$ can be expressed as
$$-2\ell(C) \equiv -2\ell(\hat{\sigma}_1^2, \ldots, \hat{\sigma}_n^2, C) = n\log 2\pi + \sum_{j=1}^{n}\log\hat{\sigma}_j^2 + \log\det D + \tilde{X}' Q D^{-1} Q' \tilde{X},$$
and it follows from the proof of Theorem 2 that minimizing $N\log\det D + \sum_{t=1}^{N}\tilde{X}_t' Q D^{-1} Q' \tilde{X}_t$ is solved by the $D$-matrix whose elements are given by $\tilde{A} = \frac{1}{N}\sum_{t=1}^{N}\tilde{y}_{0,t}\tilde{y}_{0,t}'$ and $\tilde{\lambda}_k = \frac{1}{N(n_k - 1)}\sum_{t=1}^{N}\tilde{y}_{k,t}'\tilde{y}_{k,t}$, for $k = 1, \ldots, K$. For $Q\tilde{D}Q'$ to be a correlation matrix (have ones along its diagonal), we need $\tilde{a}_{ii} = 1 + (n_i - 1)\hat{\rho}_{ii}$ and $\tilde{\lambda}_i = 1 - \hat{\rho}_{ii}$, which implies $\tilde{\lambda}_i = \frac{n_i - \tilde{a}_{ii}}{n_i - 1}$. □

Proof of Proposition 1.
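The derivative formula in the proposition can be validated by central finite differences; the positive definite matrix $A$ and vector $y$ below are arbitrary choices of our own.

```python
import numpy as np

def g(A, y):
    """The A-dependent part of the (negative) log-likelihood."""
    return np.log(np.linalg.det(A)) + y @ np.linalg.solve(A, y)

rng = np.random.default_rng(3)
K = 3
R = rng.standard_normal((K, K))
A = R @ R.T + K * np.eye(K)          # an arbitrary positive definite A
y = rng.standard_normal(K)

Ainv = np.linalg.inv(A)
M = Ainv - Ainv @ np.outer(y, y) @ Ainv

# A central finite difference in each entry a_ij reproduces M_{j,i}
eps = 1e-6
for i in range(K):
    for j in range(K):
        Ap, Am = A.copy(), A.copy()
        Ap[i, j] += eps
        Am[i, j] -= eps
        num = (g(Ap, y) - g(Am, y)) / (2 * eps)
        assert abs(num - M[j, i]) < 1e-6
```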
Recall that $a_{kk} = \sigma_k^2 + (n_k - 1)\sigma_{kk}$, $a_{ij} = \sigma_{ij}\sqrt{n_i n_j}$ for $i \neq j$, and $\lambda_k = \sigma_k^2 - \sigma_{kk}$. It follows that
$$\frac{\partial(\log\det A + y' A^{-1} y)}{\partial a_{ij}} = \operatorname{tr}\{A^{-1}(e_i e_j')(I - A^{-1} y y')\} = e_j'(I - A^{-1} y y') A^{-1} e_i = M_{j,i}, \qquad M = A^{-1} - A^{-1} y y' A^{-1}.$$
From the expression (6), we find
$$\frac{\partial(-2\ell)}{\partial \sigma_k^2} = \frac{\partial(\log\det A + y' A^{-1} y)}{\partial a_{kk}} + \left(\frac{n_k - 1}{\lambda_k} - \frac{y_k' y_k}{\lambda_k^2}\right) = M_{k,k} + \left(\frac{n_k - 1}{\lambda_k} - \frac{y_k' y_k}{\lambda_k^2}\right),$$
$$\frac{\partial(-2\ell)}{\partial \sigma_{kk}} = (n_k - 1)\frac{\partial(\log\det A + y' A^{-1} y)}{\partial a_{kk}} - \left(\frac{n_k - 1}{\lambda_k} - \frac{y_k' y_k}{\lambda_k^2}\right) = n_k M_{k,k} - \frac{\partial(-2\ell)}{\partial \sigma_k^2},$$
and, for $i \neq j$, we find that
$$\frac{\partial(-2\ell)}{\partial \sigma_{ij}} = \sqrt{n_i n_j}\left(\frac{\partial(\log\det A + y' A^{-1} y)}{\partial a_{ij}} + \frac{\partial(\log\det A + y' A^{-1} y)}{\partial a_{ji}}\right) = 2\sqrt{n_i n_j}\, M_{i,j},$$
where we used that $M$ is symmetric. □

References

Archakov, I. and Hansen, P. R. (2020), 'A new parametrization of correlation matrices',
Econometrica, forthcoming.

Archakov, I., Hansen, P. R. and Lunde, A. (2020), 'A Multivariate Realized GARCH Model', Working Paper.

Cadima, J., Calheiros, F. L. and Preto, I. P. (2010), 'The eigenstructure of block-structured correlation matrices and its implications for principal component analysis', Journal of Applied Statistics, 577–589.

Creal, D. D., Koopman, S. J. and Lucas, A. (2013), 'Generalized autoregressive score models with applications', Journal of Applied Econometrics, 777–795.

Engle, R. and Kelly, B. (2012), 'Dynamic Equicorrelation', Journal of Business & Economic Statistics (2), 212–228.

Huang, J. and Yang, L. (2010), 'Correlation matrix with block structure and efficient sampling methods', Journal of Computational Finance, 81–94.

Ledoit, O. and Wolf, M. (2004), 'Honey, I shrunk the sample covariance matrix', Journal of Portfolio Management, 110–119.

Nyblom, J. (1989), 'Testing for the constancy of parameters over time', Journal of the American Statistical Association, 223–230.

Roustant, O. and Deville, Y. (2017), 'On the validity of parametric block correlation matrices with constant within and between group correlations', arXiv math.ST/1705.09793.

Viana, M. and Olkin, I. (1997), 'Correlation analysis of ordered observations from a block-equicorrelated multivariate normal distribution', in S. Panchapakesan and N. Balakrishnan, eds, 'Advances in Statistical Decision Theory and Applications', Birkhäuser, Boston.

Tables and Figures

Summary statistics of estimated block correlations

                  Mean    Std     Min     Q0.25   Q0.50   Q0.75   Max     −ℓ/(nN)   BIC/(nN)    K    K(K+1)/2

U.S. market in 2008 (3958 stocks and 253 days)
Equicorrelation   0.228   0       0.228   0.228   0.228   0.228   0.228   2.57829   2.57830     1        1
Sectors           0.269   0.073   0.162   0.191   0.252   0.381   0.521   2.53748   2.53824    10       55
Groups            0.253   0.059   0.119   0.177   0.253   0.321   0.521   2.52519   2.52933    24      300
Industries        0.263   0.074   0.088   0.172   0.258   0.362   0.659   2.51123   2.54266    67     2278
Sub-industries    0.273   0.095  -0.036   0.157   0.268   0.394   0.886   2.48261   2.64086   151    11476

U.S. market in 2013 (2998 stocks and 252 days)
Equicorrelation   0.165   0       0.165   0.165   0.165   0.165   0.165   2.66583   2.01128     1        1
Sectors           0.181   0.059   0.107   0.126   0.168   0.242   0.507   2.63742   1.99057    10       55
Groups            0.174   0.045   0.092   0.126   0.166   0.231   0.507   2.62850   1.98715    24      300
Industries        0.183   0.060   0.065   0.118   0.173   0.260   0.712   2.61095   2.00065    67     2278
Sub-industries    0.182   0.067  -0.040   0.108   0.175   0.268   0.810   2.58182   2.09290   146    10731

Table 1: Summary statistics for the estimated block correlation matrices.
Figure 1: Estimated correlations for a block structure based on GICS sectors (upper panels) and GICS groups (lower panels). Left panels are the estimates based on 253 daily returns in 2008, and right panels are the estimates based on 252 daily returns from 2013. The numbers to the left and below each plot are GICS codes for sectors or groups.