Eigenvector Component Calculation Speedup over NumPy for High-Performance Computing
Shrey Dabhi and Manojkumar Parmar

Robert Bosch Engineering and Business Solutions Private Limited, Bengaluru, India
Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad, India
HEC Paris, Jouy-en-Josas Cedex, France
[email protected], [email protected]
Abstract.
Applications related to artificial intelligence, machine learning, and system identification simulations essentially use eigenvectors. Calculating eigenvectors for very large matrices using conventional methods is compute-intensive and renders the applications slow. Recently, the Eigenvector-Eigenvalue Identity formula, promising a significant speedup, was identified. We study the algorithmic implementation of the formula against the existing state-of-the-art algorithms and their implementations to evaluate the performance gains. We provide a first-of-its-kind systematic study of the implementation of the formula. We demonstrate further improvements using high-performance computing concepts over the native NumPy eigenvector implementation, which uses LAPACK and BLAS.
Keywords:
Eigenvalues · Eigenvectors · Eigenspace · Symmetric matrix · Hermitian matrix · NumPy · LAPACK · BLAS · High-performance computation · Vectorization · Parallelization
(Accepted at ICRTC-2020, to be published in Springer LNNS.)

In mathematics, eigenvectors are fundamental to many matrix operations. They form the basis of all the dimensionality reduction operations in Artificial Intelligence and Machine Learning. Moreover, eigenvectors are of extreme importance in system simulation and in the calculation of many real-world phenomena. They are the key to analyzing the physics of rotating bodies, the stability of physical structures, the oscillations of vibrating bodies, computational biology, economics, etc.

In linear algebra, an eigenvector is the characteristic vector of a linear transformation. It is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue is the factor by which the eigenvector is scaled. Geometrically, an eigenvector corresponding to a real nonzero eigenvalue points in a direction in which it is stretched by the transformation, and the eigenvalue is the factor by which it is stretched. If the eigenvalue is negative, the direction is reversed. The computational cost of obtaining eigenvectors causes the computations, and thereby the applications, to slow down.

In the following sections, we discuss our approach and the processes we applied to reach our conclusions. In Section 2 we explain the concept of eigenvalues and eigenvectors in brief. We also discuss the drawbacks of conventional methods and the current state-of-the-art algorithms. We then review the existing implementations of the Eigenvector-Eigenvalue Identity formula to formulate our strategy for systematic improvement and evaluation. In Section 3 we shed light on the actual experimental setup and present algorithms for the baseline (given by Denton et al. [1][2]) and the most performant variants of the implementations out of all the variants that we tested.
We also present our thoughts on why exhibit Algorithm 2 gives the highest speedup over NumPy's implementation. In Section 4 we discuss the results of our experiments. In Section 5 we give directions for future researchers to improve on our work.
The eigenvalues of an n × n matrix A are the roots of the characteristic equation:

det(A − λI) = 0    (1)

where I is the n × n identity matrix. Eq. (1) has n solutions. The polynomial on the left-hand side of the characteristic equation is known as the characteristic polynomial. Hermitian matrices have real eigenvalues, i.e. the roots of Eq. (1) are real. A real symmetric matrix is simply a special case of a hermitian matrix. Traditionally, eigenvectors are calculated by substituting the value of λ into the characteristic equation. These polynomials are n-th-order polynomials, and solving them directly to calculate the eigenvectors can be very compute-intensive.

Currently, the state-of-the-art algorithm for the computation of eigenvectors of hermitian matrices is the divide and conquer algorithm proposed by Cuppen et al. [3]. There are only 2 known implementations of this algorithm: one written in FORTRAN [4] and another written in the C language [5]. This algorithm first converts the hermitian or real symmetric matrix into a tridiagonal matrix, and then recursively divides the matrix into 2 tridiagonal matrices until the lowest possible size is reached. It then calculates the eigenvectors for the smallest matrices and adds rank-1 corrections while accumulating the results, to compensate for the loss of information incurred during the division of the matrix. The original recursive algorithm was later improved upon by various researchers to increase its speed [6][7][8].

A very simple formula has recently been brought to light again, after being first discovered by Jacobi in 1834 and then subsequently rediscovered by many independent research teams throughout history. The formula was brought forward again by Denton et al. [1] and formally named the "Eigenvector-Eigenvalue Identity".
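As a quick numerical illustration of Eq. (1) (a minimal NumPy sketch we add here, not code from the paper), the roots of the characteristic polynomial of a small symmetric matrix coincide with the output of NumPy's dedicated Hermitian eigenvalue solver:

```python
import numpy as np

# Small real symmetric (hence hermitian) test matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Coefficients of the characteristic polynomial det(lambda*I - A),
# then its roots, i.e. the solutions of Eq. (1).
coeffs = np.poly(A)
roots = np.sort(np.roots(coeffs).real)

# The dedicated hermitian solver returns the same values.
eigs = np.sort(np.linalg.eigvalsh(A))

print(roots)  # [1. 3.]
print(eigs)   # [1. 3.]
```

Solving the polynomial directly (np.roots) is exactly the "traditional" route the text describes; eigvalsh is the LAPACK-backed alternative.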
They have traced the appearance of identity (2) throughout history and, to prevent further disappearance and rediscovery of the identity, decided to formalize it. Various mathematical proofs of the identity are covered in [1]; however, the computational gains have not been investigated systematically. The formal definition of the identity is given below.

If A is an n × n hermitian matrix with eigenvalues λ_1(A), ..., λ_n(A) and i, j = 1, ..., n, then the j-th component v_{i,j} of a unit eigenvector v_i associated with the eigenvalue λ_i(A) is related to the eigenvalues λ_1(M_j), ..., λ_{n−1}(M_j) of the minor M_j of A, formed by removing the j-th row and column, by the formula

|v_{i,j}|^2 = \frac{\prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M_j))}{\prod_{k=1, k \neq i}^{n} (\lambda_i(A) - \lambda_k(A))}    (2)

In its current form, identity (2) does not provide the sign of the components of the eigenvectors, only their magnitude. This makes it unsuitable for applications requiring directions. But, as mentioned in [1], it is possible to infer the directions of the components through various methods. It has been suggested by Ashok et al. [9] that for small matrices the directions of eigenvectors can often be inferred by direct inspection of the eigenvector equation Av_i = λ_i(A)v_i. In general, we can apply Eq. (2) on multiple bases to retrieve the direction.

The basic implementation of the identity formula (Eq. (2)) was provided by Denton et al. [1][2]. We improved upon this implementation by vectorizing, parallelizing, and adding batched processing capabilities. Despite the expectations set by Amdahl's law, we did not achieve a tangible speedup in execution time in certain cases, due to the possible overhead of thread management.

In this paper, we focus on implementing the identity formula (2) as efficiently as possible and attempt to match the accuracy, speed, and robustness of the state-of-the-art implementations.
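Identity (2) can be verified numerically in a few lines of NumPy (an illustrative sketch; the variable names are our own, not those of [2]):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                     # random real symmetric (hermitian) matrix

eig_A, V = np.linalg.eigh(A)          # eigenvalues ascending; eigenvectors in columns

i, j = 2, 3                           # check the component v_{i,j} = V[j, i]
M_j = np.delete(np.delete(A, j, axis=0), j, axis=1)   # minor: drop j-th row and column
eig_M = np.linalg.eigvalsh(M_j)

numerator = np.prod(eig_A[i] - eig_M)                  # prod_k (lambda_i(A) - lambda_k(M_j))
denominator = np.prod(np.delete(eig_A[i] - eig_A, i))  # prod_{k != i} (lambda_i(A) - lambda_k(A))

print(np.isclose(numerator / denominator, V[j, i] ** 2))  # True
```

Note that only the squared magnitude V[j, i] ** 2 is recovered, which is exactly the sign limitation discussed above.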
The majority of the popular machine learning and linear algebra libraries use the low-level FORTRAN implementations from the BLAS packages and LAPACK under the hood for eigendecomposition. These libraries just provide high-level wrappers which abstract away a lot of complexity and provide sensible defaults for the required hyper-parameters, to achieve the desired speed and accuracy from the eigendecomposition algorithms of LAPACK.
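For instance, a single high-level call such as the following (a minimal sketch of that wrapper layer, not code from the paper) hides the underlying LAPACK routine entirely; NumPy's eigh dispatches internally to LAPACK's divide-and-conquer drivers (dsyevd for real symmetric, zheevd for complex hermitian input):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# One call: tridiagonalization, the divide and conquer solve, and the
# back-transformation all happen inside LAPACK, not in Python.
w, v = np.linalg.eigh(A)

# Each returned pair satisfies A v_i = lambda_i v_i (columns of v).
print(np.allclose(A @ v, v * w))  # True
```

The caller never chooses block sizes, workspace arrays, or drivers; those are the abstracted defaults the text refers to.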
The baseline implementation of the identity formula Eq. (2), as provided by Denton et al. [2], is quite slow, as it repeatedly computes eigenvalues of the same set of matrices, implying that it does not provide any advantage over the other existing algorithms. The pseudo-code for the baseline implementation is provided in exhibit Algorithm 1. We used concepts of high-performance computing to improve on it. We conceived, designed, and implemented 5 variations of the algorithm in addition to the baseline version. The comparison of the performance of each version on 3 different tasks is discussed in the results section. The pseudo-code for the best performing variant is given in exhibit Algorithm 2.

(Amdahl's Law gives the theoretical speedup in the execution time of a routine, at a fixed workload, that can be expected of a system whose resources are improved. In our case, we increase the computational power of the system by allowing our routine to utilize multiple CPU cores in parallel.)

Charis et al. [10] provide an alternative implementation of formula (2). They calculate only one particular eigenvector of the given matrix. A similar implementation in Python does not provide improvements over NumPy [11][12]. We then decided to test the speedup in the calculation of one of the components of a particular eigenvector of the given matrix.
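The baseline of exhibit Algorithm 1 (below) transcribes directly into Python. The following is our own sketch, not the authors' code; it returns the squared magnitude, consistent with Eq. (2), and assumes distinct eigenvalues:

```python
import numpy as np

def eigen_component_baseline(matrix, i, j):
    """Squared magnitude of the i-th component of the eigenvector belonging to
    the j-th smallest eigenvalue, via Eq. (2). Loop-based, as in Algorithm 1."""
    n = matrix.shape[0]
    minor = np.delete(np.delete(matrix, i, axis=0), i, axis=1)
    matrix_ev = np.linalg.eigvalsh(matrix)   # eigenvalues of A (ascending)
    minor_ev = np.linalg.eigvalsh(minor)     # eigenvalues of the minor M_i
    numerator = 1.0
    for k in range(n - 1):                   # the minor has n - 1 eigenvalues
        numerator *= matrix_ev[j] - minor_ev[k]
    denominator = 1.0
    for k in range(n):
        if k != j:
            denominator *= matrix_ev[j] - matrix_ev[k]
    return numerator / denominator

# Cross-check against the eigenvectors computed by NumPy.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
_, V = np.linalg.eigh(A)
print(np.isclose(eigen_component_baseline(A, 0, 1), V[0, 1] ** 2))  # True
```

Note the two eigvalsh calls per component: recomputing them for every (i, j) pair is precisely the redundancy criticized above.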
Algorithm 1: Baseline pseudo-algorithm

Function EigenComponentBaseline(matrix, i, j):
    n ← matrix.shape
    minor ← del(matrix, i)
    matrixEV ← EigenValues(matrix)
    minorEV ← EigenValues(minor)
    numerator ← 1.0
    for k = 0; k < n − 1; k ← k + 1 do
        numerator ← numerator ∗ (matrixEV[j] − minorEV[k])
    end
    denominator ← 1.0
    for k = 0; k < n; k ← k + 1 do
        if k ≠ j then
            denominator ← denominator ∗ (matrixEV[j] − matrixEV[k])
        end
    end
    component ← numerator ÷ denominator
    return component

The experiments to measure the speedup were carried out on a high-performance workstation running the Windows 10 operating system. It is equipped with 32 GiB of random access memory and an Intel Xeon® CPU E3-1270 v6 @ 3.8 GHz, with 4 physical cores and 8 logical cores. We used Python 3.7.6 compiled with MSC v.1916 for 64-bit processors. The computation was run 10 times for each matrix size using each variation of the implementation. The execution time was measured using the cProfile utility provided in Python's standard library. The mean values over the 10 runs are reported.

For calculating all the components of all the eigenvectors, the baseline implementation is the slowest: for an n × n matrix, the algorithm calls the eigenvalue routine n times, as shown in exhibit Algorithm 1. (One of the reasons can be the lack of native parallel for loops in Python.)
Algorithm 2: Optimized pseudo-algorithm

Function EigenComponentOptimized(matrix, i, j, batchSize):
    n ← matrix.shape
    minor ← del(matrix, i)
    matrixEV ← EigenValues(matrix)
    eigenValue ← matrixEV[j]
    matrixEV ← del(matrixEV, j)
    minorEV ← EigenValues(minor)
    batches, nBatch ← PrepareBatches(matrixEV, minorEV, batchSize)
    for k = 0; k < nBatch; k ← k + 1 do
        tNumer[k], tDenom[k] ← dispatch(BatchProcessor(batches[k]))
    end
    component ← 1.0
    for k = 0; k < nBatch; k ← k + 1 do
        component ← component ∗ (join(tNumer[k]) ÷ join(tDenom[k]))
    end
    return component

By vectorizing the algorithm we can reduce the number of eigenvalue calls, thereby providing a speedup. We expected that parallel computation would reduce the execution time further, but the possible overhead of thread creation and management leads instead to an increase in the running time.

The baseline and vectorized versions have limited performance for very large matrices (of the order of 150 × 150 and greater). One of the reasons could be the intermediate calculations leading to a datatype underflow or overflow. We batched all the calculations to combat this issue, and also checked whether computing all the batches in parallel would lead to any significant performance gains.

LAPACK's implementation of Cuppen's algorithm [3], used by NumPy [11][12], only returns the complete set of all the eigenvectors. Hence, we further explored whether we could benefit from the calculation of single vectors, or of a single component of any one vector.
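The vectorization and batching ideas above can be sketched in NumPy as follows (our illustrative take on the approach, without the parallel dispatch; the function and variable names are our own). The eigenvalue-difference factors are formed as whole arrays, and numerator and denominator factors are consumed in interleaved chunks so that neither running product under- or overflows on its own:

```python
import numpy as np

def eigen_component_batched(matrix, i, j, batch_size=64):
    matrix_ev = np.linalg.eigvalsh(matrix)
    minor = np.delete(np.delete(matrix, i, axis=0), i, axis=1)
    minor_ev = np.linalg.eigvalsh(minor)
    num_factors = matrix_ev[j] - minor_ev                 # n - 1 numerator factors
    den_factors = np.delete(matrix_ev[j] - matrix_ev, j)  # n - 1 denominator factors
    component = 1.0
    # Interleaving keeps each partial ratio near 1, unlike one long
    # running product of only-numerator or only-denominator terms.
    for start in range(0, num_factors.size, batch_size):
        sl = slice(start, start + batch_size)
        component *= np.prod(num_factors[sl]) / np.prod(den_factors[sl])
    return component

A = np.diag([1.0, 2.0, 5.0, 9.0]) + 0.1   # small symmetric test matrix
_, V = np.linalg.eigh(A)
print(np.isclose(eigen_component_batched(A, 1, 2, batch_size=2), V[1, 2] ** 2))  # True
```

Each chunk of the loop corresponds to one BatchProcessor task in Algorithm 2; running the chunks on worker threads and joining the partial products gives the parallel variant.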
In Fig. 1(a) and 1(b), identity refers to the batched vectorized implementation of Eq. (2) and identity parallelized refers to exhibit Algorithm 2. In all the figures, the X-axis represents the size of the matrix for which we are calculating the eigendecomposition and the Y-axis represents the time taken in seconds.

Fig. 1(a) shows the speedup over NumPy's implementation for the calculation of a single component of a particular eigenvector of the given matrix. Fig. 1(b) shows the performance of our implementation against the current state of the art, i.e. NumPy, for calculating a complete eigenvector of the given matrix. Fig. 1(c) and Fig. 1(d) show how we applied the concepts of high-performance computing step by step to improve the performance of the formula. We achieved a speedup with each iteration of optimization, and ended up with exhibit Algorithm 2 as the most optimized and performant implementation of formula (2).

Fig. 1: Comparison of the execution time of all the variants of the algorithm and LAPACK's implementation of Cuppen's algorithm. (a) For one of the components of an eigenvector. (b) For all the components of an eigenvector. (c) For all the eigenvectors. (d) For all the eigenvectors.

From Table 1 we can infer that, by using Eq. (2) and exhibit Algorithm 2, we can achieve up to a 4.5× speedup over NumPy as the size of the matrix increases.

The formula is more efficient than the state-of-the-art algorithms only when we need a few components of a particular eigenvector. (The LAPACK routine always computes the entire set of eigenvectors.) Currently, only applications such as web indexing, web search, signal pre-processing, etc., which require only partial eigenvectors, can benefit from the implementation of this identity.
Table 1: Time in seconds for calculating 1 eigenvector component

Size of matrix    NumPy         Exhibit Alg. 2
2                 0.000057      0.000233
502               0.170962      0.119722
1002              1.100800      0.595935
1502              4.165680      1.603579
2002              9.522480      3.212001
2502              17.632952     5.707374
3002              30.342656     9.522745
3502              47.404600     11.886900
4002              69.955400     16.775200
4502              98.680800     22.943500
5002              134.324200    30.547200
5502              177.629000    39.741700
6002              229.338600    50.682400
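The speedup figures follow directly from the two timing columns of Table 1; a small arithmetic check, added here for clarity:

```python
# Ratios of the NumPy column to the Exhibit Alg. 2 column of Table 1
# (size, NumPy time in s, Alg. 2 time in s).
rows = [
    (502, 0.170962, 0.119722),
    (1002, 1.100800, 0.595935),
    (6002, 229.338600, 50.682400),
]
for size, numpy_t, alg2_t in rows:
    print(size, round(numpy_t / alg2_t, 2))  # speedup grows with matrix size, ~4.5x at 6002
```

The ratio increases monotonically down the table, which is why the speedup is reported as a function of matrix size.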
It might be possible to further improve the performance if the formula is implemented in a lower-level language like C or FORTRAN, to reduce overheads. LAPACK itself is written in FORTRAN, which is one reason it achieves such high performance. The LAPACK algorithms also benefit from clever machine-level multi-threading, which is extremely limited in a higher-level language like Python; despite that, we get a glimpse of the potential improvements, as seen in Fig. 1(a).
The Eigenvector-Eigenvalue Identity formula theoretically holds the promise of a speedup over the current state-of-the-art algorithms for the calculation of eigenvectors. We tried to realize this promise in an optimized algorithm. Our implementation can achieve up to a 4.5× speedup over NumPy for applications requiring partial eigenvectors. We are of the opinion that this algorithm can provide a further speedup if implemented in a lower-level language.
Acknowledgment
We want to thank Sri Krishnan V, Mohan B V, Palak Pareshkumar Sukharamwala and Tanya Motwani from Robert Bosch Engineering and Business Solutions Private Limited, India, for their valuable comments, contributions and continued support to the project. We are grateful to all the experts for providing us with their valuable insights and informed opinions, ensuring the completeness of our study.
References
1. Denton, P.B., Parke, S.J., Tao, T., Zhang, X.: Eigenvectors from Eigenvalues: a survey of a basic identity in linear algebra (2019), https://arxiv.org/abs/1908.03795
2. Denton, P.: PeterDenton/Eigenvector-Eigenvalue-Identity (Jan 2020), https://github.com/PeterDenton/Eigenvector-Eigenvalue-Identity
3. Cuppen, J.J.M.: A divide and conquer method for the symmetric tridiagonal eigenproblem. Numerische Mathematik 36, 177–195 (1981)
8. Tisseur, F., Dongarra, J.: A parallel divide and conquer algorithm for the symmetric eigenvalue problem on distributed memory architectures. SIAM Journal on Scientific Computing (6), 2223–2236 (1999). https://doi.org/10.1137/S1064827598336951
9. Mukherjee, A.K., Datta, K.K.: Two new graph-theoretical methods for generation of eigenvectors of chemical graphs. Proceedings of the Indian Academy of Sciences - Chemical Sciences (6), 499–517 (Dec 1989). https://doi.org/10.1007/BF02880817
10. Gyurgyik, C.: cgyurgyik/eigenvectors-from-eigenvalues (Dec 2019), https://github.com/cgyurgyik/eigenvectors-from-eigenvalues/
11. Oliphant, T.E.: A guide to NumPy, vol. 1. Trelgol Publishing USA (2006)
12. van der Walt, S., Colbert, S.C., Varoquaux, G.: The NumPy array: A structure for efficient numerical computation. Computing in Science & Engineering (2), 22–30 (March 2011). https://doi.org/10.1109/MCSE.2011.37