Correlation Based Principal Loading Analysis
Jan O. Bauer
Baden-Wuerttemberg Cooperative State University Mannheim
Mannheim, [email protected]

Abstract—Principal loading analysis is a dimension reduction method that discards variables which have only a small distorting effect on the covariance matrix. We complement principal loading analysis and propose to use a mix of both the correlation and the covariance matrix instead. Further, we suggest using rescaled eigenvectors and provide updated algorithms for all proposed changes.
Index Terms—Component Loading, Dimensionality Reduction, Matrix Perturbation Theory, Principal Component Analysis, Principal Loading Analysis
1 Introduction
Principal loading analysis (PLA) is a tool developed by [1] to reduce dimensions. Their method chooses a subset of observed variables by discarding the other variables based on the impact of the eigenvectors on the covariance matrix. While the method itself is new, some parts of PLA correspond with principal component analysis (PCA), a popular dimension reduction technique first formulated by [6] and [3]. Despite this intersection, however, the outcomes differ fundamentally: PCA yields a reduced set of variables by transforming the original variables, while PLA selects a subset of the original variables. Nonetheless, PLA partially adopts established concepts from PCA.

PLA is originally based on the covariance matrix. However, this comes at a price due to the lack of scale invariance of the covariance matrix. Hence, we propose to use both the covariance and the correlation matrix, each for different steps of PLA. Our contribution is therefore an adjusted PLA method. Further, we suggest rescaling the eigenvectors, and we provide simulations to find optimal cut-off values.

This article is organized as follows: Section 2 provides the notation needed for the remainder of this work. In Section 3, we recap PLA based on the covariance matrix. The focus of this article is Section 4, where we elaborate on the issue regarding the usage of the covariance matrix. Our main result is summarized in Corollary 4.1. In Section 5 and Section 6 we provide updated algorithms based on the correlation matrix, where the latter algorithm implements rescaled eigenvectors. Section 7 contains our conclusions for optimal threshold values from the simulation studies, and we also briefly cover simulation difficulties regarding PLA. Finally, we summarize our work in Section 8.
2 Setup
Let $x = (x_1 \cdots x_M)$ be an $N \times M$ sample containing $n \in \{1, \ldots, N\}$ observations of a random vector $X = (X_1 \cdots X_M)$ with covariance matrix $\Sigma = (\sigma_{i,j})$ for indices $i, j \in \{1, \ldots, M\}$ used throughout this work. We consider the case when the covariance matrix is slightly perturbed by a sparse matrix $E = (\varepsilon_{i,j})$ such that

$$\tilde{\Sigma} \equiv \Sigma + E.$$

$E$ is a technical construction and contains the small components we want to extract from $\tilde{\Sigma}$. Hence, $\varepsilon_{i,j} \neq 0 \Rightarrow \sigma_{i,j} = 0$. The sample counterpart of $\tilde{\Sigma}$ is of the form

$$\hat{\tilde{\Sigma}} \equiv (\hat{\tilde{\sigma}}_{i,j}) \equiv \Sigma + E + H_N$$

where $H_N = (\eta_{i,j|N})$ is a perturbation in the form of a random noise matrix. The noise is due to having only a finite number of observations in the sample. The correlation matrix and the sample correlation matrix are denoted by $\tilde{P}$ and $\hat{\tilde{P}}$ respectively.

We consider the eigendecomposition of $\hat{\tilde{\Sigma}}$ to be given by

$$\hat{\tilde{\Sigma}} \equiv \hat{\tilde{V}} \hat{\tilde{\Lambda}} \hat{\tilde{V}}^\top \tag{2.1}$$

with $\hat{\tilde{V}}^\top \hat{\tilde{V}} = I$, $\hat{\tilde{\Lambda}} = \operatorname{diag}(\hat{\tilde{\lambda}}_1, \ldots, \hat{\tilde{\lambda}}_M)$ and $\hat{\tilde{\lambda}}_1 \geq \ldots \geq \hat{\tilde{\lambda}}_M$. The eigenvectors $\hat{\tilde{V}} = (\hat{\tilde{v}}_1 \cdots \hat{\tilde{v}}_M)$ are ordered according to the respective eigenvalues. The eigendecomposition of $\tilde{\Sigma}$ is denoted analogously. The eigendecomposition of the sample correlation matrix is given by $\hat{\tilde{P}} = \hat{\tilde{U}} \hat{\tilde{\Omega}} \hat{\tilde{U}}^\top$ where $\hat{\tilde{U}}^\top \hat{\tilde{U}} = I$ and $\hat{\tilde{\Omega}} = \operatorname{diag}(\hat{\tilde{\omega}}_1, \ldots, \hat{\tilde{\omega}}_M)$.

When two blocks of random variables $X_{i_1}, \ldots, X_{i_I}$ and $X_{j_1}, \ldots, X_{j_J}$ are uncorrelated we write $(X_{i_1}, \ldots, X_{i_I}) \overset{c}{\perp\!\!\!\perp} (X_{j_1}, \ldots, X_{j_J})$. Further, we recap the definition of $\varepsilon$-uncorrelatedness provided by [1]:

Definition 2.1.
We say that two blocks of random variables $X_{i_1}, \ldots, X_{i_I}$ and $X_{j_1}, \ldots, X_{j_J}$ are $\varepsilon$-uncorrelated if $\tilde{\sigma}_{i_{\bar{i}}, j_{\bar{j}}} = \varepsilon_{i_{\bar{i}}, j_{\bar{j}}}$ for $\bar{i} \in \{1, \ldots, I\}$ and $\bar{j} \in \{1, \ldots, J\}$. We then write $(X_{i_1}, \ldots, X_{i_I}) \overset{\varepsilon}{\perp\!\!\!\perp} (X_{j_1}, \ldots, X_{j_J})$.

For practical purposes we define
$\mathcal{D} \subset \{1, \ldots, M\}$ such that $|\mathcal{D}| = M^*$ where $1 \leq M^* < M$. $\mathcal{D}$ will contain the indices of the variables $\{X_d\} \equiv \{X_d\}_{d \in \mathcal{D}}$ we consider to discard, and we introduce the shortcut $d^c \notin \mathcal{D}$ when referring to the case that $d^c \in \{1, \ldots, M\} \setminus \mathcal{D}$. In the same spirit, $\Delta$ with $|\Delta| = M^*$ will be used to index eigenvectors with respective eigenvalues linked to the $\{X_d\}$, and $\delta^c \notin \Delta$ refers to $\delta^c \in \{1, \ldots, M\} \setminus \Delta$. Further, the elements of any $(M \times 1)$ vector $\xi$ are denoted by $\xi = (\xi^{(1)} \cdots \xi^{(M)})^\top$.
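To make the setup concrete, the following minimal sketch (the matrices and sample size are illustrative assumptions of ours, not values from the paper) constructs a block-structured population covariance $\Sigma$, a sparse perturbation $E$, and a finite sample whose estimation noise plays the role of $H_N$:

```python
# Numerical sketch of the Section 2 setup with assumed, illustrative matrices.
import numpy as np

rng = np.random.default_rng(0)

Sigma = np.array([[2.0, 0.8, 0.0],      # two uncorrelated blocks:
                  [0.8, 1.5, 0.0],      # {X_1, X_2} and {X_3}
                  [0.0, 0.0, 0.5]])
E = np.array([[0.0, 0.0, 0.02],         # sparse, small cross-block entries,
              [0.0, 0.0, 0.01],         # making the blocks eps-uncorrelated
              [0.02, 0.01, 0.0]])
Sigma_tilde = Sigma + E

# A finite sample from N(0, Sigma_tilde); the sample covariance then equals
# Sigma + E + H_N, where H_N is the random estimation noise.
N = 1000
x = rng.multivariate_normal(np.zeros(3), Sigma_tilde, size=N)
Sigma_tilde_hat = np.cov(x, rowvar=False)
H_N = Sigma_tilde_hat - Sigma_tilde      # shrinks as N grows

lam, V = np.linalg.eigh(Sigma_tilde_hat) # eigendecomposition (2.1), ascending
print("noise H_N:\n", np.round(H_N, 3))
print("eigenvalues (descending):", np.round(lam[::-1], 3))
```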
3 Principal Loading Analysis

PLA is a tool for dimension reduction where a subset of existing variables is selected while the other variables are discarded. The intuition is that blocks of variables are discarded which distort the covariance matrix only slightly. It will turn out that those blocks are specified by $\hat{\tilde{\Sigma}}$. Firstly, we recap PLA and deepen the understanding of the explained variance in Subsection 3.1, and afterwards restate the procedure of PLA in Subsection 3.2.

We start to recap the method by assuming that only a single block, consisting of the variables $\{X_d\}$, is discarded. Those variables distort the covariance matrix only by a little if the $M^*$ rows indexed by $d \in \mathcal{D}$ of $M - M^*$ eigenvectors are small in absolute terms and if the contribution of those eigenvectors to the explained variance is large, hence if the contribution of the other eigenvectors is small.

We assume that the eigenvectors $\{\hat{\tilde{v}}_{\delta^c}\}_{\delta^c \notin \Delta}$ are the eigenvectors with small absolute elements $\{|\hat{\tilde{v}}_{\delta^c}^{(d)}|\}_{d \in \mathcal{D},\, \delta^c \notin \Delta}$. Consequently, the $\{\hat{\tilde{v}}_\delta\}_{\delta \in \Delta}$ contain small absolute elements $\{|\hat{\tilde{v}}_\delta^{(d^c)}| \leq \tau\}_{\delta \in \Delta,\, d^c \notin \mathcal{D}}$, where $\tau$ is a chosen threshold, since the eigenvectors are orthonormal due to the symmetry of $\hat{\tilde{\Sigma}}$. The percent contribution of the $\{\hat{\tilde{v}}_\delta\}_{\delta \in \Delta}$ to the explained variance of $\hat{\tilde{\Sigma}}$ is then given by

$$\Bigl(\sum_i \hat{\tilde{\lambda}}_i\Bigr)^{-1} \Bigl(\sum_{\delta \in \Delta} \hat{\tilde{\lambda}}_\delta \sum_{d \in \mathcal{D}} \bigl(\hat{\tilde{v}}_\delta^{(d)}\bigr)^2 + \sum_{\delta^c \notin \Delta} \hat{\tilde{\lambda}}_{\delta^c} \sum_{d \in \mathcal{D}} \bigl(\hat{\tilde{v}}_{\delta^c}^{(d)}\bigr)^2\Bigr) \tag{3.1}$$

which equals the contribution of the block containing the $\{X_d\}$. The intuition of (3.1) is as follows: considering $\tau = 0$, the expression reduces to

$$\Bigl(\sum_i \hat{\tilde{\lambda}}_i\Bigr)^{-1} \Bigl(\sum_{\delta \in \Delta} \hat{\tilde{\lambda}}_\delta\Bigr) \tag{3.2}$$

since $\hat{\tilde{v}}_{\delta^c}^{(d)} = \hat{\tilde{v}}_\delta^{(d^c)} = 0$ for all $d \in \mathcal{D}$, $d^c \notin \mathcal{D}$, $\delta \in \Delta$ and $\delta^c \notin \Delta$. Hence, it also holds that $1 = \|\hat{\tilde{v}}_\delta\|^2 = \sum_{d \in \mathcal{D}} (\hat{\tilde{v}}_\delta^{(d)})^2$. (3.2) then is the percent contribution of the linear combination of $\{X_d\}$ to the explained variance since

$$\operatorname{Var}\Bigl(\sum_{d \in \mathcal{D}} \tilde{v}_\delta^{(d)} X_d\Bigr) = \operatorname{Var}(\tilde{v}_\delta^\top X) = \tilde{v}_\delta^\top \tilde{\Sigma} \tilde{v}_\delta = \tilde{\lambda}_\delta$$

is the population contribution of the $\{X_d\}$ into the direction of $\tilde{v}_\delta$, and the population contribution of the $\{X_d\}$ in all directions is therefore given by $\sum_{\delta \in \Delta} \tilde{\lambda}_\delta$. Consequently, the sample counterpart is $\sum_{\delta \in \Delta} \hat{\tilde{\lambda}}_\delta$, which we divide by the overall explained variance $\sum_i \hat{\tilde{\lambda}}_i$ in order to obtain a percent contribution. Note that the elements in (3.1) are squared since the eigenvectors are normed to one, and therefore each squared element, say the $i$-th element, can be interpreted as the percent contribution of the corresponding random variable $X_i$ into the direction of the eigenvector [1].

Since usually $\tau \neq 0$ with $\tau$ small, however, (3.2) serves as a fairly good approximation of (3.1). This is due to the sparseness of $H_N$, which bounds $\max_j |\hat{\tilde{\lambda}}_j - \tilde{\lambda}_j| \leq \|H_N\|_F$ as shown by [1], and further because $\hat{\tilde{v}}_{\delta^c}^{(d)} \approx \hat{\tilde{v}}_\delta^{(d^c)} \approx 0$.

Variables cause small components in the eigenvectors when the variables are arranged in blocks such that

$$\underbrace{(X_1, \ldots, X_{\kappa_1})}_{\kappa_1\text{-many}} \overset{\varepsilon}{\perp\!\!\!\perp} \ldots \overset{\varepsilon}{\perp\!\!\!\perp} \underbrace{(X_{M - \kappa_L + 1}, \ldots, X_M)}_{\kappa_L\text{-many}}$$

which can be denoted as

$$\underbrace{\tilde{\Sigma}_1}_{\kappa_1 \times \kappa_1} \overset{\varepsilon}{\perp\!\!\!\perp} \ldots \overset{\varepsilon}{\perp\!\!\!\perp} \underbrace{\tilde{\Sigma}_L}_{\kappa_L \times \kappa_L} \tag{3.3}$$

to emphasize the block structure.¹ This follows since the population eigenvectors of $\Sigma$ are then of shape

$$\underbrace{\begin{pmatrix} *_{\kappa_1} \\ 0 \end{pmatrix}, \ldots, \begin{pmatrix} *_{\kappa_1} \\ 0 \end{pmatrix}}_{\kappa_1\text{-many}},\; \underbrace{\begin{pmatrix} 0 \\ *_{\kappa_2} \\ 0 \end{pmatrix}, \ldots, \begin{pmatrix} 0 \\ *_{\kappa_2} \\ 0 \end{pmatrix}}_{\kappa_2\text{-many}},\; \ldots,\; \underbrace{\begin{pmatrix} 0 \\ *_{\kappa_L} \end{pmatrix}, \ldots, \begin{pmatrix} 0 \\ *_{\kappa_L} \end{pmatrix}}_{\kappa_L\text{-many}} \tag{3.4}$$

where the $*_{\kappa_l}$ with $l \in \{1, \ldots, L\}$ are vectors of length $\kappa_l$ and the $0$ are vectors of suitable dimension containing zeros.

¹ Note that we can assume that the covariance matrix behaves in this convenient way because we can always obtain this structure using a permutation matrix.
The first $\kappa_1$ eigenvectors have (at least) $M - \kappa_1$ zero components, the following $\kappa_2$ eigenvectors have (at least) $M - \kappa_2$ zero components, and so on. The eigenvectors of $\hat{\tilde{\Sigma}}$ follow the same shape; however, they are slightly perturbed due to $E$ and distorted by the noise $H_N$.

PLA for discarding, say, $B$ blocks $\Sigma_{b_1}, \ldots, \Sigma_{b_B}$ with $\Sigma_{b_\beta} \overset{c}{\perp\!\!\!\perp} \Sigma_{b_l}$ for all $l \neq \beta$ and $\beta \in \{1, \ldots, B\}$ is then given by the following algorithm provided by [1].

Algorithm 3.1 (PLA). Discarding the variables corresponding to $\Sigma_{b_1}, \ldots, \Sigma_{b_B}$ according to PLA proceeds as follows:

1) Check if the eigenvectors of $\Sigma$ satisfy the required structure in (3.4) to discard $\Sigma_{b_1}, \ldots, \Sigma_{b_B}$.
2) Decide if $\Sigma_{b_1}, \ldots, \Sigma_{b_B}$ are relevant according to the explained variance of the realisations $\{x_d\}$ of their contained random variables $\{X_d\}$ by calculating (3.1) (or (3.2)).
3) Discard $\Sigma_{b_1}, \ldots, \Sigma_{b_B}$.
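A minimal sketch of how Algorithm 3.1 can be carried out for single-variable blocks; the covariance matrix and the threshold $\tau$ below are hypothetical choices of ours, and step 1 is simplified to scanning the rows of the eigenvector matrix:

```python
# Sketch of covariance-based PLA (Algorithm 3.1) for 1x1 blocks: a variable
# is a candidate if it loads above tau in exactly one eigenvector, and its
# relevance is the explained-variance contribution (3.1).
import numpy as np

def pla_covariance(S, tau=0.1):
    """S: (M x M) sample covariance matrix; tau: hypothetical cut-off."""
    M = S.shape[0]
    lam, V = np.linalg.eigh(S)
    lam, V = lam[::-1], V[:, ::-1]            # sort descending
    candidates = []
    for d in range(M):
        Delta = np.where(np.abs(V[d, :]) >= tau)[0]   # step 1 (simplified)
        if len(Delta) == 1:                   # X_d linked to one eigenvector
            # step 2: contribution (3.1) of the block D = {d}
            share = np.sum(lam * V[d, :] ** 2) / lam.sum()
            candidates.append((d, share))
    return candidates

S = np.array([[2.0, 0.8, 0.02],
              [0.8, 1.5, 0.01],
              [0.02, 0.01, 0.5]])             # illustrative sample covariance
for d, share in pla_covariance(S):
    print(f"variable {d}: explained-variance share {share:.3f}")
```

In this toy example only the weakly coupled third variable is flagged, with an explained-variance share of roughly 0.13, so step 3 would discard it if that share is deemed negligible.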
4 Issues When Using the Covariance Matrix

While [1] propose to check if the absolute elements of the eigenvectors of the covariance matrix are below a threshold $\tau$, we complement their results by providing an incentive to consider the usage of the correlation matrix instead. Our contribution is to show that the small elements of the eigenvectors converge towards zero when the $\{\operatorname{Var}(X_d)\}_{d \in \mathcal{D}}$ increase. This is a concern because the variance is not scale invariant, hence PLA might yield different results for the same but rescaled data set.

We consider to drop only a single block, say, $\Sigma_{b_1}$ containing $\{X_d\}$, where it is assumed for convenience that $\mathcal{D} = \{1, \ldots, M^*\}$. Again, we can assume this form of $\mathcal{D}$ because we can obtain this structure of the covariance matrix using a permutation matrix. The extension to the general case of discarding several blocks is analogous.

Theorem 4.1.
Let $\mathcal{D}$ and $\Delta$ be as introduced in Section 2, let $\hat{\tilde{\sigma}}_{jj} \equiv \hat{\widetilde{\operatorname{Var}}}(X_j)$ and let $i \in \{1, \ldots, M\}$. We assume that $\hat{\tilde{\lambda}}_\delta \neq 0$. For each $d \in \mathcal{D}$ there exists one $\delta \in \Delta$ such that

$$\frac{\partial |\hat{\tilde{v}}_\delta^{(i)}|}{\partial \hat{\widetilde{\operatorname{Var}}}(X_d)} < 0 \quad \text{and} \quad \frac{\partial |\hat{\tilde{v}}_\delta^{(d)}|}{\partial \hat{\widetilde{\operatorname{Var}}}(X_d)} > 0$$

for $i \neq d$ and as long as $|\hat{\tilde{v}}_\delta^{(d)}| \neq 1$ and $|\hat{\tilde{v}}_\delta^{(i)}| \neq 0$.

Proof. Let $i, m \in \{1, \ldots, M\}$ with $i \neq d$. From the trace

$$\operatorname{tr}(\hat{\tilde{\Sigma}}) = \sum_m \hat{\tilde{\lambda}}_m = \sum_m \hat{\tilde{\sigma}}_{mm} \equiv \sum_m \hat{\widetilde{\operatorname{Var}}}(X_m) \tag{4.1}$$

we can conclude that $\hat{\tilde{\lambda}}_\delta = \sum_m \hat{\widetilde{\operatorname{Var}}}(X_m) - \sum_{m \neq \delta} \hat{\tilde{\lambda}}_m$. If we consider now that $\hat{\widetilde{\operatorname{Var}}}(X_j)$ changes by, say, $\mu_j$,

$$\hat{\widetilde{\operatorname{Var}}}(X_j) \mapsto \hat{\widetilde{\operatorname{Var}}}(X_j) + \mu_j,$$

it holds that the eigenvalues

$$\hat{\tilde{\lambda}}_m \mapsto \hat{\tilde{\lambda}}_m + p_{mj} \mu_j \tag{4.2}$$

change as well, with $p_{mj} \in [0, 1]$ and $\sum_m p_{mj} = 1$ such that (4.1) is satisfied.

Further, from the eigendecomposition in (2.1) we can conclude that $\hat{\tilde{v}}_\delta = \hat{\tilde{\lambda}}_\delta^{-1} \hat{\tilde{\Sigma}} \hat{\tilde{v}}_\delta$ and hence from (4.1) that

$$\hat{\tilde{v}}_\delta^{(i)} = \frac{\sum_m \hat{\tilde{\sigma}}_{im} \hat{\tilde{v}}_\delta^{(m)}}{\hat{\tilde{\lambda}}_\delta} = \frac{\sum_m \hat{\tilde{\sigma}}_{im} \hat{\tilde{v}}_\delta^{(m)}}{\sum_m \hat{\tilde{\sigma}}_{mm} - \sum_{m \neq \delta} \hat{\tilde{\lambda}}_m} \tag{4.3}$$

$$\hat{\tilde{v}}_\delta^{(d)} = \frac{\sum_m \hat{\tilde{\sigma}}_{dm} \hat{\tilde{v}}_\delta^{(m)}}{\hat{\tilde{\lambda}}_\delta} = \frac{\hat{\tilde{\sigma}}_{dd} \hat{\tilde{v}}_\delta^{(d)} + \sum_{m \neq d} \hat{\tilde{\sigma}}_{dm} \hat{\tilde{v}}_\delta^{(m)}}{\sum_m \hat{\tilde{\sigma}}_{mm} - \sum_{m \neq \delta} \hat{\tilde{\lambda}}_m} \tag{4.4}$$

which can both be considered as functions of $\hat{\widetilde{\operatorname{Var}}}(X_d) \equiv \hat{\tilde{\sigma}}_{dd}$. We can now derive the partial derivatives.

Case I: starting with the case that $p_{\delta d} \in (0, 1]$, i.e. that $p_{\delta d} \neq 0$ and $\sum_{m \neq \delta} p_{md} < 1$, we obtain from (4.3), due to (4.1) and (4.2), that

$$\hat{\tilde{v}}_\delta^{(i)}\bigl(\hat{\widetilde{\operatorname{Var}}}(X_d) + \mu_d\bigr) \equiv \hat{\tilde{v}}_\delta^{(i)}\bigl(\hat{\tilde{\sigma}}_{dd} + \mu_d\bigr) = \frac{\sum_m \hat{\tilde{\sigma}}_{im} \hat{\tilde{v}}_\delta^{(m)}}{\mu_d \bigl(1 - \sum_{m \neq \delta} p_{md}\bigr) + \sum_m \hat{\tilde{\sigma}}_{mm} - \sum_{m \neq \delta} \hat{\tilde{\lambda}}_m} = \frac{\sum_m \hat{\tilde{\sigma}}_{im} \hat{\tilde{v}}_\delta^{(m)}}{\mu_d \bigl(1 - \sum_{m \neq \delta} p_{md}\bigr) + \hat{\tilde{\lambda}}_\delta} = \frac{\sum_m \hat{\tilde{\sigma}}_{im} \hat{\tilde{v}}_\delta^{(m)}}{p_{\delta d}\, \mu_d + \hat{\tilde{\lambda}}_\delta}. \tag{4.5}$$

Let $f(\mu_d) \equiv \bigl|\hat{\tilde{v}}_\delta^{(i)}\bigl(\hat{\widetilde{\operatorname{Var}}}(X_d) + \mu_d\bigr)\bigr| - \bigl|\hat{\tilde{v}}_\delta^{(i)}\bigl(\hat{\widetilde{\operatorname{Var}}}(X_d)\bigr)\bigr|$. From (4.3) and (4.5) we see that $\lim_{\mu_d \to 0} f(\mu_d) = 0$. Hence, the partial derivative is given by

$$\frac{\partial |\hat{\tilde{v}}_\delta^{(i)}|}{\partial \hat{\widetilde{\operatorname{Var}}}(X_d)} = \lim_{\mu_d \to 0} \frac{f(\mu_d)}{\mu_d} = \lim_{\mu_d \to 0} \frac{\partial f(\mu_d)}{\partial \mu_d} = -\frac{\bigl|\sum_m \hat{\tilde{\sigma}}_{im} \hat{\tilde{v}}_\delta^{(m)}\bigr|\, p_{\delta d}}{\hat{\tilde{\lambda}}_\delta\, |\hat{\tilde{\lambda}}_\delta|} < 0 \tag{4.6}$$

where we used L'Hospital's rule in the second step since $\lim_{\mu_d \to 0} \mu_d = 0$ as well as $\partial \mu_d / \partial \mu_d = 1$. The final step is an immediate result of the chain rule applied to the absolute value. The sign follows since $\hat{\tilde{\Sigma}}$ is positive semi-definite by construction, hence $\hat{\tilde{\lambda}}_\delta > 0$ because $\hat{\tilde{\lambda}}_\delta \neq 0$ is assumed.

To obtain the second result we conclude from $\|\hat{\tilde{v}}_\delta\| = 1$ that

$$\hat{\tilde{v}}_\delta^{(d)} = \sqrt{1 - \sum_{m \neq d} \bigl(\hat{\tilde{v}}_\delta^{(m)}\bigr)^2}. \tag{4.7}$$

If now $|\hat{\tilde{v}}_\delta^{(i)}| \neq 0$ decreases, which is the case if $\hat{\widetilde{\operatorname{Var}}}(X_d)$ increases since $\partial |\hat{\tilde{v}}_\delta^{(i)}| / \partial \hat{\widetilde{\operatorname{Var}}}(X_d) < 0$, then $|\hat{\tilde{v}}_\delta^{(d)}|$ increases due to (4.7). Hence $\partial |\hat{\tilde{v}}_\delta^{(d)}| / \partial \hat{\widetilde{\operatorname{Var}}}(X_d) > 0$.

Case II: when $p_{\delta d} = 0$ we obtain from (4.4) that

$$\hat{\tilde{v}}_\delta^{(d)}\bigl(\hat{\widetilde{\operatorname{Var}}}(X_d) + \mu_d\bigr) = \frac{\bigl(\hat{\widetilde{\operatorname{Var}}}(X_d) + \mu_d\bigr)\, \hat{\tilde{v}}_\delta^{(d)} + \sum_{m \neq d} \hat{\tilde{\sigma}}_{dm} \hat{\tilde{v}}_\delta^{(m)}}{\hat{\tilde{\lambda}}_\delta} \tag{4.8}$$

since $\hat{\tilde{\lambda}}_\delta$ does not change in $\hat{\widetilde{\operatorname{Var}}}(X_d)$. Analogously to Case I, let $g(\mu_d) \equiv \bigl|\hat{\tilde{v}}_\delta^{(d)}\bigl(\hat{\widetilde{\operatorname{Var}}}(X_d) + \mu_d\bigr)\bigr| - \bigl|\hat{\tilde{v}}_\delta^{(d)}\bigl(\hat{\widetilde{\operatorname{Var}}}(X_d)\bigr)\bigr|$. From (4.4) and (4.8) we see that $\lim_{\mu_d \to 0} g(\mu_d) = 0$. Hence, the partial derivative is given by

$$\frac{\partial |\hat{\tilde{v}}_\delta^{(d)}|}{\partial \hat{\widetilde{\operatorname{Var}}}(X_d)} = \lim_{\mu_d \to 0} \frac{g(\mu_d)}{\mu_d} = \lim_{\mu_d \to 0} \frac{\partial g(\mu_d)}{\partial \mu_d} = \frac{|\hat{\tilde{v}}_\delta^{(d)}|}{|\hat{\tilde{\lambda}}_\delta|} > 0,$$

following the same arguments as in (4.6) and due to the structure in (3.4). We obtain that $\partial |\hat{\tilde{v}}_\delta^{(i)}| / \partial \hat{\widetilde{\operatorname{Var}}}(X_d) < 0$ analogously to Case I when solving (4.7) for $\hat{\tilde{v}}_\delta^{(i)}$ instead of $\hat{\tilde{v}}_\delta^{(d)}$ and following the arguments above in reversed order, considering first that $|\hat{\tilde{v}}_\delta^{(d)}| \neq 1$ increases. □

The intuition behind Theorem 4.1 is that $\operatorname{Var}(X_d)$ is present in both the numerator and the denominator of $\hat{\tilde{v}}_\delta^{(d)}$, while $\operatorname{Var}(X_d)$ only enters the denominator of $\hat{\tilde{v}}_\delta^{(i)}$ for $i \neq d$ via $\hat{\tilde{\lambda}}_\delta$. Strictly speaking, a change of $\operatorname{Var}(X_d)$ also enters the numerator of $\hat{\tilde{v}}_\delta^{(i)}$ due to $\hat{\widetilde{\operatorname{Cov}}}(X_i, X_d) \equiv \hat{\tilde{\sigma}}_{id}$. Consider increasing or decreasing $X_d$ to $c \cdot X_d$ for a constant $c \in \mathbb{R}$, as might occur when changing scales. Then $\hat{\widetilde{\operatorname{Cov}}}(X_i, c \cdot X_d) = c \cdot \hat{\widetilde{\operatorname{Cov}}}(X_i, X_d)$. However, since $\hat{\widetilde{\operatorname{Var}}}(c \cdot X_d) = c^2 \cdot \hat{\widetilde{\operatorname{Var}}}(X_d)$, we can simply shorten the fraction to get rid of $c$ in the numerator. When considering a change in a variable's variance as a sum, as we did in the proof of Theorem 4.1, the change will not be present in the numerator since $\hat{\widetilde{\operatorname{Cov}}}(X_i, X_d + c) = \hat{\widetilde{\operatorname{Cov}}}(X_i, X_d)$. Further, the assumption that $\hat{\tilde{\lambda}}_\delta \neq 0$ is reasonable since this case barely occurs for a covariance matrix and is less strict than assuming positive definiteness.
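The signs in Theorem 4.1 can be checked numerically by a finite-difference experiment; the covariance matrix and the increment $\mu$ below are illustrative assumptions of ours:

```python
# Finite-difference check of Theorem 4.1: increasing Var(X_d) by mu should
# increase |v_delta^(d)| and decrease the other absolute elements of the
# eigenvector linked to X_d.
import numpy as np

S = np.array([[1.0, 0.3, 0.1],
              [0.3, 2.0, 0.1],
              [0.1, 0.1, 3.0]])      # illustrative covariance; d = 2
d, mu = 2, 0.5

def linked_eigvec(S, d):
    lam, V = np.linalg.eigh(S)
    j = np.argmax(np.abs(V[d, :]))   # eigenvector in which X_d loads most
    return np.abs(V[:, j])

before = linked_eigvec(S, d)
S_pert = S.copy()
S_pert[d, d] += mu                   # Var(X_d) -> Var(X_d) + mu
after = linked_eigvec(S_pert, d)
print("|v| before:", np.round(before, 4))
print("|v| after: ", np.round(after, 4))  # entry d grows, the others shrink
```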
Corollary 4.1. Let $\mathcal{D}$ and $\Delta$ be as introduced in Section 2 and $\hat{\tilde{\sigma}}_{jj} \equiv \hat{\widetilde{\operatorname{Var}}}(X_j)$. We assume that $\hat{\tilde{\lambda}}_\delta \neq 0$. For each $d \in \mathcal{D}$ there exists one $\delta \in \Delta$ such that for all $d^c \notin \mathcal{D}$

$$\frac{\partial |\hat{\tilde{v}}_\delta^{(d^c)}|}{\partial \hat{\widetilde{\operatorname{Var}}}(X_d)} < 0.$$

Corollary 4.1 shows the issue when using the covariance matrix for PLA. Since the covariance matrix is not scale invariant, the respective eigenvectors are not either: if the variance decreases, the elements that we check to lie under a certain threshold increase. Conversely, they decrease, and in fact converge towards zero, $\lim_{\hat{\widetilde{\operatorname{Var}}}(X_d) \to \infty} \hat{\tilde{v}}_\delta^{(d^c)} = 0$, if the variance increases.
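A small numerical illustration of Corollary 4.1 under assumed numbers: rescaling a weakly coupled variable $X_d$ to $c \cdot X_d$ inflates its variance, and the remaining absolute elements of the eigenvector linked to $X_d$ shrink towards zero, so a fixed threshold $\tau$ leads to scale-dependent decisions:

```python
# As c grows, the off-element |v^(dc)| of the eigenvector linked to X_d
# shrinks toward 0, purely because of the change of scale.
import numpy as np

base = np.array([[1.0, 0.3],
                 [0.3, 2.0]])   # X_d is the second variable, weakly coupled

for c in [1.0, 2.0, 5.0, 10.0]:
    D = np.diag([1.0, c])       # rescale X_d by c: Var(c*X_d) = c^2 Var(X_d)
    S = D @ base @ D
    lam, V = np.linalg.eigh(S)
    v = V[:, -1]                # eigenvector linked to X_d (top eigenvalue)
    print(f"c = {c:4.1f}  Var(X_d) = {S[1, 1]:6.1f}  |v^(dc)| = {abs(v[0]):.4f}")
```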
Corollary 4.2. Let $\mathcal{D}$ and $\Delta$ be as introduced in Section 2 and $\hat{\tilde{\sigma}}_{jj} \equiv \hat{\widetilde{\operatorname{Var}}}(X_j)$. We assume that $\hat{\tilde{\lambda}}_\delta \neq 0$. For each $d \in \mathcal{D}$ there exists one $\delta \in \Delta$ such that for all $i \neq \delta$

$$\frac{\partial |\hat{\tilde{v}}_i^{(d)}|}{\partial \hat{\widetilde{\operatorname{Var}}}(X_d)} < 0.$$

Proof.
The result follows from Theorem 4.1 and since the rows of $\hat{\tilde{V}}$ are normed to 1. □

When introducing PLA, [1] also considered checking not (only) the rows of the eigenvectors but (also) the columns. According to Corollary 4.2, we face the same problem when following the latter approach.

5 Using the Correlation Matrix
In this section, we introduce PLA based on the correlation matrix and address a concern regarding the eigenvalues. Since the correlation matrix $P$ is invariant to linear changes [5] and its eigenvectors have the same shape (3.4) as the eigenvectors of $\Sigma$, PLA based on $P$ is a natural choice.

However, there is a downside when it comes to calculating the explained variance. For simplicity, we consider using (3.2); note that the same issues hold for (3.1) as well. We denote the block structure of $\varepsilon$-uncorrelated random variables analogously to (3.3) by

$$\underbrace{\tilde{P}_1}_{\kappa_1 \times \kappa_1} \overset{\varepsilon}{\perp\!\!\!\perp} \ldots \overset{\varepsilon}{\perp\!\!\!\perp} \underbrace{\tilde{P}_L}_{\kappa_L \times \kappa_L}.$$

Since $\operatorname{tr}(P) = \sum_i \omega_i = \sum_i \rho_{ii} = M$ and $\operatorname{tr}(P_l) = \sum_{\delta_l} \omega_{\delta_l} = \sum_{m = M_{l-1}+1}^{M_l} \rho_{mm} = \kappa_l$, where $\delta_l$ indexes the eigenvalues corresponding to the eigenvectors linked to the random variables contained in $P_l$, the explained variance for any block $P_l$ is given by

$$\Bigl(\sum_i \omega_i\Bigr)^{-1} \sum_{\delta_l} \omega_{\delta_l} = \frac{\kappa_l}{M}.$$

Hence, the explained variance for any block $\hat{\tilde{P}}_l$ is approximately given by $\kappa_l / M$, since $E$ and $H_N$ are sparse. This means that we lose the information provided by $\hat{\tilde{\Lambda}}$ to evaluate the importance of each block, since we standardized each variable to unit variance. Therefore, we propose to use the eigenvectors of $\hat{\tilde{P}}$ to search for the blocks of concern and to use the eigenvalues of $\hat{\tilde{\Sigma}}$ to decide whether to discard or not. However, that $\hat{\tilde{\Sigma}}$ is not scale invariant is the price we pay when coming back to $\hat{\tilde{\Sigma}}$ to calculate the explained variance. Since this is a well-known concern in classic PCA, we refer to [2] and [4] for elaborate explanations.

Algorithm 5.1 (PLA based on the correlation matrix). Discarding the variables corresponding to $P_{b_1}, \ldots, P_{b_B}$ according to PLA based on the correlation matrix proceeds as follows:

1) Check if the eigenvectors of $P$ satisfy the required structure in (3.4) to discard $P_{b_1}, \ldots, P_{b_B}$.
2) Decide if $P_{b_1}, \ldots, P_{b_B}$ are relevant according to the explained variance of the realisations $\{x_d\}$ of their contained random variables $\{X_d\}$ by calculating (3.1) (or (3.2)).
3) Discard $P_{b_1}, \ldots, P_{b_B}$.
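A minimal sketch of Algorithm 5.1 for single-variable blocks, under simplifying assumptions of ours (illustrative data, hypothetical threshold $\tau$): the search step uses the eigenvectors of the sample correlation matrix, while the decision step evaluates the contribution (3.1) on the covariance eigendecomposition:

```python
# Correlation-based PLA (Algorithm 5.1), 1x1 blocks only: detect on P (scale
# invariant), decide on the explained variance computed from Sigma.
import numpy as np

def pla_correlation(x, tau=0.3):
    S = np.cov(x, rowvar=False)                 # sample covariance
    s = np.sqrt(np.diag(S))
    P = S / np.outer(s, s)                      # sample correlation
    _, U = np.linalg.eigh(P)                    # eigenvectors for step 1
    lam, V = np.linalg.eigh(S)
    lam, V = lam[::-1], V[:, ::-1]              # covariance spectrum for step 2
    for d in range(S.shape[0]):
        # step 1: X_d should load above tau in exactly one eigenvector of P
        if np.sum(np.abs(U[d, :]) >= tau) == 1:
            # step 2: contribution (3.1) of X_d on the covariance scale
            share = np.sum(lam * V[d, :] ** 2) / lam.sum()
            print(f"candidate X_{d}: explained-variance share {share:.3f}")

rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.8, 0.02],
                  [0.8, 1.5, 0.01],
                  [0.02, 0.01, 0.5]])           # third variable eps-uncorrelated
x = rng.multivariate_normal(np.zeros(3), Sigma, size=5000)
pla_correlation(x)
```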
6 Rescaled Eigenvectors

A minor addition to PLA is rescaling the eigenvectors. In this section we briefly cover this change and provide a modified algorithm.

Originally coming from PCA, the idea is to rescale the eigenvectors so that the maximum absolute value equals one [4], which is easily done by dividing by the largest absolute element. Hence, we do not check the elements of $\hat{\tilde{v}}_j$ but rather the elements of

$$\hat{\tilde{u}}_j \,/\, \max_i |\hat{\tilde{u}}_j^{(i)}|. \tag{6.1}$$

This modifies PLA to a more standardised procedure. We provide the algorithm when using rescaled eigenvectors of the correlation matrix; the change of the algorithm based on the covariance matrix is analogous.

Algorithm 6.1 (PLA based on rescaled eigenvectors of the correlation matrix). Discarding the variables corresponding to $P_{b_1}, \ldots, P_{b_B}$ according to PLA based on rescaled eigenvectors of the correlation matrix proceeds as follows:

1) Check if the rescaled eigenvectors (6.1) of $P$ satisfy the required structure in (3.4) to discard $P_{b_1}, \ldots, P_{b_B}$.
2) Decide if $P_{b_1}, \ldots, P_{b_B}$ are relevant according to the explained variance of the realisations $\{x_d\}$ of their contained random variables $\{X_d\}$ by calculating (3.1) (or (3.2)).
3) Discard $P_{b_1}, \ldots, P_{b_B}$.
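The rescaling (6.1) amounts to one line of code; the correlation matrix below is an illustrative assumption of ours:

```python
# Rescaled eigenvectors (6.1): divide each eigenvector by its largest
# absolute element so the maximal absolute loading equals one.
import numpy as np

P = np.array([[1.00, 0.46, 0.02],
              [0.46, 1.00, 0.01],
              [0.02, 0.01, 1.00]])       # illustrative correlation matrix
omega, U = np.linalg.eigh(P)
U_rescaled = U / np.abs(U).max(axis=0)   # column j divided by max_i |u_j^(i)|
print(np.round(U_rescaled, 3))           # every column now peaks at +-1
```

The benefit is that the same cut-off $\tau$ is compared against loadings on a common scale, regardless of how the mass of an eigenvector is spread across its block.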
7 Simulation Study

We conduct a simulation study in this section to evaluate the performance of PLA based on the rescaled eigenvectors of the correlation matrix for different threshold values. There is a concern regarding the simulation due to the perturbations $E$ and $H_N$, which we discuss as well.

Choosing the optimal cut-off value $\tau$ is crucial for PLA; however, finding such a value theoretically is rather difficult due to the fuzziness of step 2 of the algorithm [1]. Hence, we conducted a simulation study. We simulated the case of dropping $k \in \{1, \ldots, 5\}$ uncorrelated blocks of dimension $1 \times 1$, i.e. single variables, and the case of dropping a single uncorrelated $\kappa \times \kappa$ block with $\kappa \in \{2, \ldots, 6\}$. The population $X$ consisting of $M$ variables with 100 000 realisations was simulated $S = 10\,000$ times. Then, for each of the $S$ iterations a sample $x$ of size $N \in \{5000, 10\,000\}$ was drawn, and we conducted PLA according to Algorithm 6.1 for four increasing thresholds $\tau_1 < \tau_2 < \tau_3 < \tau_4$, with separate threshold grids for the single-variable case and for the block case. We considered cut-off values used in published studies as an orientation [7]. The resulting type I error probabilities are calculated as the share of iterations in which PLA did not lead to a consideration of a drop.

The concern when using a simulation study to find optimal thresholds is described in Theorem 4.2 in [1]. The theorem provides an intuition of the possible magnitude of the perturbations of $\Sigma$ that results in a drop. The result for the perturbations of $P$ is analogous. For completion, we restate the theorem as well as the proof.

Theorem 7.1.
Denote $\tilde{\lambda}_0 = \lambda_0 \equiv \infty$ and $\tilde{\lambda}_{M+1} = \lambda_{M+1} \equiv -\infty$. For $j \in \{1, \ldots, M\}$ it holds that

$$\frac{2^{3/2}\, \|E + H_N\|_F}{\min(\lambda_{j-1} - \lambda_j,\, \lambda_j - \lambda_{j+1})} < \tau \;\Rightarrow\; \|\varepsilon_j + \eta_{j|N}\|_\infty < \tau.$$

Proof.
From Corollary 1 in [8] we can conclude that $\|\varepsilon_j + \eta_{j|N}\|_2 \leq 2^{3/2}\, \|E + H_N\|_F / \min(\lambda_{j-1} - \lambda_j,\, \lambda_j - \lambda_{j+1})$, which yields our desired result since $\|\varepsilon_j + \eta_{j|N}\|_\infty \leq \|\varepsilon_j + \eta_{j|N}\|_2$. □

Hence, discarding depends on the size of $E$ and $H_N$, which enlarges the number of parameters that have to be simulated. In this work, however, we simulate the special case when $E = 0$ and focus on the influence of the sample noise reflected by the sample size $N$. For completion, we shall emphasise that Theorem 7.1 is not always feasible for $P$ without assuming that the eigenvalues are distinct, since the eigenvalues of $P$ are closer together in general. Nonetheless, the intuition behind the theorem is valid.

In Table 1 in Appendix A we see that the type I error for discarding single uncorrelated variables is small in most cases for all but the smallest threshold considered. Of course, the error probability decreases when the threshold increases. However, since one should expect the type II error to increase with larger thresholds, we recommend using the smallest cut-off that yields sufficient results.² Here $\tau \equiv \tau(N, M, k)$ is a function of the sample size, the number of variables and the number of uncorrelated variables, and can be adjusted according to those values. In an analogous manner, according to Table 2 the type I errors for a single uncorrelated block behave well for moderate thresholds, where $\tau \equiv \tau(N, M, \kappa)$ is a function of the sample size, the number of variables and the dimension of the uncorrelated block. We shall emphasize, however, that the choice of threshold depends on the data as well as on the purpose of the statistical analysis. Hence, choosing even smaller or wider cut-off values might be reasonable if larger type I or type II errors are tolerable.
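One cell of such a simulation study might be sketched as follows; the parameter values are assumptions of ours (in particular far fewer iterations than the $S = 10\,000$ used in the paper), and the drop criterion is simplified to requiring that the uncorrelated variable loads above $\tau$ in exactly one rescaled eigenvector:

```python
# Sketch of one simulation cell: estimate the type I error of Algorithm 6.1
# for k = 1 single uncorrelated variable, i.e. the share of iterations in
# which the rescaled loadings do not isolate X_M in one eigenvector.
import numpy as np

rng = np.random.default_rng(2)
M, N, S_iter, tau = 10, 5000, 200, 0.3   # assumed values, not the paper's grid

misses = 0
for _ in range(S_iter):
    # population: X_M uncorrelated with a dependent block X_1, ..., X_{M-1}
    A = rng.normal(size=(M - 1, M - 1))
    Sigma = np.zeros((M, M))
    Sigma[: M - 1, : M - 1] = A @ A.T + np.eye(M - 1)
    Sigma[M - 1, M - 1] = 1.0
    x = rng.multivariate_normal(np.zeros(M), Sigma, size=N)
    S = np.cov(x, rowvar=False)
    s = np.sqrt(np.diag(S))
    P = S / np.outer(s, s)
    _, U = np.linalg.eigh(P)
    U = U / np.abs(U).max(axis=0)        # rescaled eigenvectors (6.1)
    if np.sum(np.abs(U[M - 1, :]) >= tau) != 1:
        misses += 1                      # drop was not suggested

print(f"estimated type I error: {misses / S_iter:.3f}")
```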
² We considered a wider range of cut-off values during research. However, we only present thresholds that are suitable for practice. Further, the tails of the tables sufficiently indicate the decrease in performance for wider or tighter thresholds, which makes illustrating more extreme cut-off values dispensable.

8 Concluding Remarks
We propose to use both the covariance and the correlation matrix to conduct PLA. This is because the covariance matrix is not scale invariant, which may result in different outcomes of PLA for the same but rescaled data. Hence, we recommend using Algorithm 5.1 or Algorithm 6.1 instead. For the latter, the simulation study in Section 7 provides an orientation for the choice of the threshold $\tau$, both for the case of single uncorrelated variables and for the case of a block of uncorrelated variables.

References

[1] J. O. Bauer and B. Drabant, "Principal Loading Analysis," arXiv:2007.05215 [math.ST], 2020.
[2] B. Flury, A First Course in Multivariate Statistics, 1st ed. Springer Texts in Statistics, 1997.
[3] H. Hotelling, "Analysis of a Complex of Statistical Variables into Principal Components," Journal of Educational Psychology, vol. 24, no. 6, pp. 417–441, 1933.
[4] I. Jolliffe, Principal Component Analysis, 2nd ed. Springer Series in Statistics, 2002.
[5] I. Jolliffe and J. Cadima, "Principal component analysis: a review and recent developments," Phil. Trans. R. Soc. A, vol. 374, 2016.
[6] K. Pearson, "On Lines and Planes of Closest Fit to Systems of Points in Space," Philosophical Magazine, vol. 2, pp. 559–572, 1901.
[7] P. R. Peres-Neto, D. A. Jackson, and K. M. Somers, "Giving Meaningful Interpretation to Ordination Axes: Assessing Loading Significance in Principal Component Analysis," Ecology, vol. 84, no. 9, pp. 2347–2363, 2003.
[8] Y. Yu, T. Wang, and R. J. Samworth, "A Useful Variant of the Davis–Kahan Theorem for Statisticians," Biometrika, vol. 102, no. 2, pp. 315–323, 2015.

Appendix A
We provide the type I error rates for $k \in \{1, \ldots, 5\}$ single uncorrelated variables and for an uncorrelated $\kappa \times \kappa$ block with $\kappa \in \{2, \ldots, 6\}$ respectively. As specified in Section 7, the error probabilities are calculated as the share of iterations where PLA did not lead to a consideration of a drop.

TABLE 1: Type I error for $k \in \{1, 2, 3, 4, 5\}$ uncorrelated variables with sample size $N \in \{5000, 10\,000\}$ and thresholds $\tau_1 < \tau_2 < \tau_3 < \tau_4$
              N = 5000                             N = 10 000
M     k    τ_1     τ_2     τ_3     τ_4         τ_1     τ_2     τ_3     τ_4
20    1    0.0434  0.0079  0.0014  0.0000      0.0245  0.0041  0.0000  0.0000
40    1    0.0419  0.0053  0.0003  0.0000      0.0196  0.0014  0.0000  0.0000
60    1    0.0489  0.0058  0.0005  0.0000      0.0174  0.0023  0.0001  0.0001
80    1    0.0576  0.0065  0.0004  0.0001      0.0159  0.0010  0.0001  0.0000
100   1    0.0688  0.0080  0.0004  0.0000      0.0192  0.0012  0.0000  0.0000
120   1    0.0822  0.0090  0.0002  0.0001      0.0205  0.0012  0.0001  0.0000
140   1    0.0911  0.0119  0.0006  0.0002      0.0226  0.0008  0.0000  0.0000
160   1    0.1070  0.0131  0.0013  0.0001      0.0250  0.0010  0.0001  0.0000
180   1    0.1226  0.0140  0.0016  0.0000      0.0281  0.0019  0.0000  0.0000
200   1    0.1462  0.0176  0.0011  0.0000      0.0285  0.0025  0.0002  0.0000
20    2    0.1055  0.0364  0.0103  0.0011      0.0681  0.0232  0.0065  0.0009
40    2    0.1293  0.0297  0.0054  0.0013      0.0692  0.0151  0.0015  0.0000
60    2    0.1411  0.0291  0.0042  0.0005      0.0660  0.0105  0.0007  0.0000
80    2    0.1588  0.0299  0.0043  0.0005      0.0611  0.0087  0.0014  0.0002
100   2    0.1743  0.0318  0.0056  0.0002      0.0636  0.0069  0.0007  0.0000
120   2    0.2014  0.0337  0.0039  0.0003      0.0651  0.0086  0.0005  0.0002
140   2    0.2281  0.0385  0.0047  0.0007      0.0721  0.0080  0.0005  0.0000
160   2    0.2434  0.0438  0.0042  0.0003      0.0754  0.0088  0.0004  0.0000
180   2    0.2650  0.0507  0.0059  0.0005      0.0802  0.0069  0.0006  0.0000
200   2    0.2950  0.0507  0.0077  0.0003      0.0861  0.0095  0.0004  0.0001
20    3    0.1470  0.0546  0.0159  0.0037      0.0970  0.0397  0.0093  0.0022
40    3    0.1978  0.0511  0.0107  0.0017      0.1085  0.0270  0.0038  0.0006
60    3    0.2258  0.0508  0.0093  0.0010      0.1011  0.0178  0.0025  0.0006
80    3    0.2513  0.0513  0.0078  0.0008      0.1071  0.0156  0.0026  0.0000
100   3    0.2749  0.0532  0.0081  0.0007      0.1079  0.0157  0.0020  0.0003
120   3    0.3076  0.0587  0.0079  0.0013      0.1130  0.0131  0.0013  0.0001
140   3    0.3303  0.0626  0.0080  0.0008      0.1136  0.0139  0.0010  0.0003
160   3    0.3722  0.0687  0.0093  0.0007      0.1230  0.0156  0.0017  0.0002
180   3    0.3964  0.0771  0.0089  0.0008      0.1348  0.0155  0.0015  0.0001
200   3    0.4315  0.0845  0.0094  0.0010      0.1415  0.0136  0.0011  0.0001
20    4    0.1981  0.0807  0.0265  0.0074      0.1251  0.0501  0.0169  0.0037
40    4    0.2566  0.0745  0.0153  0.0032      0.1455  0.0361  0.0056  0.0015
60    4    0.2957  0.0678  0.0117  0.0029      0.1472  0.0299  0.0051  0.0006
80    4    0.3213  0.0777  0.0118  0.0022      0.1464  0.0254  0.0028  0.0004
100   4    0.3680  0.0777  0.0109  0.0027      0.1543  0.0228  0.0027  0.0002
120   4    0.3953  0.0824  0.0107  0.0014      0.1552  0.0224  0.0019  0.0002
140   4    0.4302  0.0924  0.0113  0.0018      0.1672  0.0216  0.0026  0.0008
160   4    0.4702  0.0949  0.0112  0.0002      0.1736  0.0201  0.0018  0.0000
180   4    0.4975  0.1044  0.0133  0.0018      0.1789  0.0218  0.0025  0.0000
200   4    0.5409  0.1168  0.0157  0.0012      0.1949  0.0186  0.0017  0.0003
20    5    0.2170  0.0977  0.0373  0.0103      0.1451  0.0667  0.0214  0.0055
40    5    0.3107  0.0961  0.0237  0.0050      0.1784  0.0547  0.0109  0.0019
60    5    0.3588  0.0956  0.0162  0.0033      0.1884  0.0375  0.0077  0.0007
80    5    0.4057  0.1005  0.0163  0.0023      0.1881  0.0347  0.0033  0.0005
100   5    0.4401  0.1035  0.0145  0.0019      0.1902  0.0298  0.0036  0.0001
120   5    0.4812  0.1071  0.0161  0.0018      0.2008  0.0267  0.0033  0.0002
140   5    0.5248  0.1117  0.0153  0.0017      0.2016  0.0243  0.0031  0.0004
160   5    0.5628  0.1234  0.0142  0.0014      0.2142  0.0263  0.0028  0.0001
180   5    0.5975  0.1328  0.0170  0.0013      0.2240  0.0254  0.0031  0.0001
200   5    0.6382  0.1477  0.0188  0.0017      0.2425  0.0276  0.0016  0.0004
Notes: the type I error is computed as the share of iterations where the $k$ variables have not been discarded.

TABLE 2: Type I error for a single uncorrelated $\kappa \times \kappa$ block with $\kappa \in \{2, 3, 4, 5, 6\}$, sample size $N \in \{5000, 10\,000\}$ and thresholds $\tau_1 < \tau_2 < \tau_3 < \tau_4$
              N = 5000                             N = 10 000
M     κ    τ_1     τ_2     τ_3     τ_4         τ_1     τ_2     τ_3     τ_4
20    2    0.0684  0.0256  0.0084  0.0011      0.0429  0.0178  0.0041  0.0008
40    2    0.0573  0.0165  0.0031  0.0012      0.0249  0.0064  0.0011  0.0004
60    2    0.0540  0.0143  0.0040  0.0009      0.0174  0.0036  0.0003  0.0001
80    2    0.0629  0.0143  0.0028  0.0003      0.0180  0.0021  0.0001  0.0001
100   2    0.0647  0.0149  0.0030  0.0006      0.0144  0.0035  0.0004  0.0001
120   2    0.0746  0.0176  0.0024  0.0003      0.0175  0.0030  0.0004  0.0000
140   2    0.0790  0.0188  0.0036  0.0003      0.0176  0.0015  0.0001  0.0000
160   2    0.0903  0.0219  0.0034  0.0005      0.0162  0.0025  0.0003  0.0001
180   2    0.0931  0.0238  0.0037  0.0006      0.0186  0.0027  0.0000  0.0000
200   2    0.1038  0.0239  0.0035  0.0009      0.0224  0.0032  0.0003  0.0000
20    3    0.0847  0.0325  0.0091  0.0016      0.0556  0.0209  0.0049  0.0011
40    3    0.0731  0.0213  0.0048  0.0013      0.0341  0.0063  0.0018  0.0004
60    3    0.0821  0.0254  0.0039  0.0009      0.0275  0.0049  0.0009  0.0003
80    3    0.0831  0.0215  0.0049  0.0007      0.0249  0.0053  0.0010  0.0000
100   3    0.0894  0.0213  0.0052  0.0006      0.0250  0.0055  0.0010  0.0003
120   3    0.0937  0.0217  0.0052  0.0016      0.0254  0.0048  0.0006  0.0001
140   3    0.1037  0.0236  0.0040  0.0008      0.0270  0.0047  0.0007  0.0001
160   3    0.1078  0.0254  0.0061  0.0008      0.0263  0.0054  0.0006  0.0001
180   3    0.1177  0.0264  0.0040  0.0010      0.0259  0.0039  0.0011  0.0002
200   3    0.1262  0.0283  0.0050  0.0008      0.0251  0.0052  0.0007  0.0001
20    4    0.1409  0.0658  0.0225  0.0073      0.0895  0.0388  0.0137  0.0037
40    4    0.1337  0.0449  0.0146  0.0030      0.0704  0.0223  0.0045  0.0005
60    4    0.1329  0.0442  0.0125  0.0036      0.0567  0.0152  0.0036  0.0003
80    4    0.1459  0.0397  0.0122  0.0018      0.0547  0.0125  0.0026  0.0006
100   4    0.1595  0.0461  0.0105  0.0023      0.0514  0.0116  0.0017  0.0003
120   4    0.1622  0.0496  0.0114  0.0023      0.0503  0.0111  0.0013  0.0002
140   4    0.1780  0.0491  0.0122  0.0030      0.0504  0.0113  0.0024  0.0003
160   4    0.2021  0.0493  0.0106  0.0020      0.0582  0.0099  0.0013  0.0002
180   4    0.2069  0.0490  0.0114  0.0017      0.0570  0.0087  0.0014  0.0004
200   4    0.2065  0.0548  0.0119  0.0021      0.0592  0.0100  0.0013  0.0003
20    5    0.1884  0.0990  0.0443  0.0146      0.1255  0.0698  0.0277  0.0074
40    5    0.1991  0.0778  0.0244  0.0060      0.1043  0.0342  0.0112  0.0025
60    5    0.1982  0.0766  0.0191  0.0054      0.0892  0.0280  0.0075  0.0011
80    5    0.2221  0.0722  0.0199  0.0051      0.0854  0.0258  0.0039  0.0008
100   5    0.2354  0.0740  0.0193  0.0040      0.0862  0.0215  0.0040  0.0010
120   5    0.2580  0.0783  0.0233  0.0036      0.0865  0.0207  0.0041  0.0007
140   5    0.2726  0.0792  0.0209  0.0033      0.0895  0.0187  0.0039  0.0008
160   5    0.2858  0.0895  0.0180  0.0038      0.0855  0.0188  0.0029  0.0004
180   5    0.2914  0.0867  0.0222  0.0040      0.0928  0.0201  0.0033  0.0004
200   5    0.3129  0.0907  0.0192  0.0038      0.0932  0.0191  0.0036  0.0005
20    6    0.2440  0.1413  0.0703  0.0245      0.1708  0.0938  0.0458  0.0178
40    6    0.2711  0.1194  0.0399  0.0136      0.1564  0.0598  0.0180  0.0056
60    6    0.2822  0.1073  0.0315  0.0104      0.1399  0.0429  0.0104  0.0031
80    6    0.3018  0.1082  0.0350  0.0080      0.1334  0.0336  0.0094  0.0014
100   6    0.3288  0.1137  0.0328  0.0077      0.1340  0.0352  0.0075  0.0012
120   6    0.3472  0.1197  0.0316  0.0081      0.1363  0.0318  0.0074  0.0008
140   6    0.3723  0.1210  0.0310  0.0092      0.1388  0.0296  0.0067  0.0007
160   6    0.3861  0.1281  0.0326  0.0071      0.1337  0.0312  0.0070  0.0009
180   6    0.3958  0.1404  0.0353  0.0065      0.1399  0.0314  0.0050  0.0004
200   6    0.4152  0.1389  0.0340  0.0070      0.1405  0.0305  0.0049  0.0008
Notes: the type I error is computed as the share of iterations where the block containing the $\kappa$ variables has not been discarded.