Error correction in fast matrix multiplication and inverse
Daniel S. Roche
Computer Science Department, United States Naval Academy
Annapolis, Maryland, U.S.A.
February 8, 2018
Abstract
We present new algorithms to detect and correct errors in the product of two matrices, or the inverse of a matrix, over an arbitrary field. Our algorithms do not require any additional information or encoding other than the original inputs and the erroneous output. Their running time is softly linear in the number of nonzero entries in these matrices when the number of errors is sufficiently small, and they also incorporate fast matrix multiplication so that the cost scales well when the number of errors is large. These algorithms build on the recent result of Gasieniec, Levcopoulos, Lingas, Pagh, and Tokuyama [2017] on correcting matrix products, as well as existing work on verification algorithms, sparse low-rank linear algebra, and sparse polynomial interpolation.
Efficiently and gracefully handling computational errors is critical in modern computing. Such errors can result from short-lived hardware failures, communication noise, buggy software, or even malicious third-party servers.

The first goal for fault-tolerant computing is verification. Freivalds [1979] presented a linear-time algorithm to verify the correctness of a single matrix product. Recently, efficient verification algorithms for a wide range of computational linear algebra problems have been developed [Kaltofen, Nehring, and Saunders, 2011, Dumas and Kaltofen, 2014, Dumas, Kaltofen, Thomé, and Villard, 2016, Dumas, Lucas, and Pernet, 2017].

A higher level of fault tolerance is achieved through error correction, which is the goal of the present study. This is a strictly stronger model than verification, where we seek not only to identify when a result is incorrect, but also to compute the correct result if it is "close" to the given, incorrect one. Some recent error-correction problems considered in the computer algebra literature include Chinese remaindering [Goldreich, Ron, and Sudan, 1999, Khonji, Pernet, Roch, Roche, and Stalinski, 2010, Böhm, Decker, Fieker, and Pfister, 2015], sparse polynomial interpolation [Comer, Kaltofen, and Pernet, 2012, Kaltofen and Yang, 2013], and linear system solving [Boyer and Kaltofen, 2014, Kaltofen, Pernet, Storjohann, and Waddell, 2017].

A straightforward way to correct errors in transmission would be to use an error-correcting code, writing the result with some redundancy to support transmission over a noisy channel. But this approach requires extra bandwidth, is limited to a fixed number of errors, and does not account for malicious alterations or mistakes in the computation itself.

Instead, we will use the information in the problem itself to correct errors. This is always possible by re-computing the result, and so the goal is always to correct a small number of errors in (much) less time than it would take to do the entire computation again. Our notion of error correction makes it possible to correct an unbounded number of mistakes that occur either in computation or transmission, either at random or by a malicious party.

Some other applications of error correction have nothing to do with alterations or mistakes. One example is sparse matrix multiplication, which can be solved with an error-correction algorithm by simply assuming the "erroneous" product matrix is zero. Another example is computing a result over Z using Chinese remaindering modulo small primes, where the entries have varying bit lengths. The small entries are determined modulo the first few primes, and the remaining large entries are found more quickly by applying an error correction algorithm using the (known) small entries.
Matrix multiplication with errors

We consider two computational error correction problems. The first problem is the same as in Gasieniec et al. [2017], correcting a matrix product. Given $A, B, C \in \mathbb{F}^{n \times n}$, the task is to compute the unique error matrix $E \in \mathbb{F}^{n \times n}$ such that $AB = C - E$.

Algorithm 5 solves this problem using
$$\tilde{O}\bigl(t + k \cdot \min\bigl(\lceil t/r \rceil,\; n / \min(r, k/r)^{3-\omega}\bigr)\bigr)$$
field operations, where

- $t = \|A\| + \|B\| + \|C\|$ is the number of nonzero entries in the input matrices, at most $O(n^2)$;
- $k = \|E\|$ is the number of errors in the given product $C$; and
- $r \le n$ is the number of distinct rows (or columns) in which errors occur, whichever is larger.

An even more detailed complexity statement, incorporating rectangular matrices and a failure probability, can be found in Theorem 7.1. To understand the implications of this complexity, consider six cases, as summarized in Table 1. The cost of our algorithm depends on the locations of the errors.

If there are a constant number of errors per row or column, we can use sparse multiplication to achieve softly-linear complexity $\tilde{O}(t + n + k)$. This is similar to [Gasieniec et al., 2017] except that we do not require the input matrices to be dense.

If $k > n$ errors are spread out among all rows and columns, the worst-case cost using dense multiplication is $\tilde{O}(k^{\omega-2}\, n^{4-\omega})$. As the number of errors grows to $n^2$, this cost approaches that of normal matrix multiplication without error correction.

In the other extreme, if the errors are isolated to a square (but not necessarily contiguous) submatrix, then the worst-case cost using dense multiplication is $\tilde{O}(t + k^{(\omega-1)/2}\, n)$. Again this scales to $n^\omega$ as $k \to n^2$, but the cost is better for $k < n^2$. This situation may also make sense in the context of distributed computing, where each node in a computing cluster might be assigned to compute one submatrix of the product.

Our algorithm is never (asymptotically) slower than that of [Gasieniec et al., 2017] or the $O(n^\omega)$ cost of naïve recomputation, but it is faster in many cases: for example when the inputs are sparse, when there are many errors, and when the errors are compactly located.
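To spell out how the compact-errors exponent arises, here is the instantiation of the cost bound above; this is a worked derivation in our notation, not a passage from the original text.

```latex
% k errors confined to a sqrt(k) x sqrt(k) submatrix: r = sqrt(k),
% so min(r, k/r) = sqrt(k), and the dense branch of the bound gives
\[
  k \cdot \frac{n}{\min(r,\,k/r)^{3-\omega}}
  \;=\; k \cdot \frac{n}{k^{(3-\omega)/2}}
  \;=\; k^{(\omega-1)/2}\, n ,
\]
% which equals n^omega when k = n^2, matching naive recomputation.
```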
Matrix inverse with errors

The second problem we consider is that of correcting computational errors in a matrix inverse. Formally, given $A, B \in \mathbb{F}^{n \times n}$, the task is to compute the unique error matrix $E \in \mathbb{F}^{n \times n}$ such that $A^{-1} = B + E$.

Algorithm 6 shows how to solve this problem using
$$\tilde{O}\bigl(t + nk / \min(r, k/r)^{3-\omega} + r^\omega\bigr)$$
field operations, where the parameters $t, k, n$ are the same as before, and $r$ is the number of rows or columns which contain errors, whichever is smaller. This is the same complexity as our algorithm for matrix multiplication with errors, plus an additional $r^\omega$ term.

The additional $r^\omega$ term makes the situation here less fractured than with the multiplication algorithm. In the case of spread-out errors, the cost is $\tilde{O}(t + nk + k^\omega)$, which becomes simply $\tilde{O}(n^2 + k^\omega)$ when the input matrices are dense. The case of compact errors is better, and the complexity is just $\tilde{O}(t + nk^{(\omega-1)/2})$, the same as that of our multiplication algorithm.

As before, the cost of this algorithm approaches the $O(n^\omega)$ cost of the naïve inverse computation as the number of errors $k$ grows to $n^2$.

              Spread-out errors ($r = \min(n,k)$)                    Compact errors ($r = \sqrt{k}$)
              $k \le n$             $k > n$
    Sparse    $t + k + n$           $k\lceil t/n\rceil + n\sqrt{k}$   $t + k + n$
    Dense     $t + kn$              $k^{\omega-2}\, n^{4-\omega}$     $t + k^{(\omega-1)/2}\, n$

Table 1: Soft-oh cost of Algorithm 5 in different cases. The two rows indicate whether sparse or dense rectangular multiplication is used as a subroutine.
Domains
The algorithms we present work over any field and require only that a field element of order larger than $n$ can be found. Because our algorithms explicitly take $O(n)$ time to compute the powers of this element anyway, the usual difficulties with efficiently generating high-order elements in finite fields do not arise.

Over the integers Z or rationals Q, the most efficient approach would be to work modulo a series of primes and use Chinese remaindering to recover the result. A similar approach could be used over polynomial rings.

Over the reals R or complex numbers C, our algorithms work only under the unrealistic assumption of infinite precision. We have not considered the interesting question of how to recover from the two types of errors (noise and outliers) that may occur in finite precision.

Background

Let $\omega$ be a constant between 2 and 3 such that two $n \times n$ matrices can be multiplied using $O(n^\omega)$ field operations. In practice, $\omega = 3$ for dimensions $n$ up to a few hundred, after which Strassen-Winograd multiplication is used, giving $\omega = \log_2 7 < 2.81$. The best asymptotic algorithm currently gives $\omega < 2.373$ [Le Gall, 2014].

Even though the practical value of $\omega$ is larger than the asymptotic one, matrix multiplication routines have been the subject of intense implementation development for decades, and highly-tuned software is readily available for a variety of architectures [Dumas, Giorgi, and Pernet, 2008, Wang, Zhang, Zhang, and Yi, 2013]. Running times based on $n^\omega$ have practical as well as theoretical significance; the $\omega$ indicates where an algorithm is able to take advantage of fast low-level matrix multiplication routines.

Special routines for multiplying sparse matrices have also been developed. Writing $t$ for the number of nonzero entries in the input, the standard row- or column-wise sparse matrix multiplication costs $O(tn)$ field operations. Yuster and Zwick [2005] improved this to $\tilde{O}(n^2 + t^{0.7} n^{1.2})$. Later work also incorporates the number $k$ of nonzeros in the product matrix. Lingas [2009] gave an output-sensitive algorithm with running time $\tilde{O}(n^2 k^{\omega/2 - 1})$. Amossen and Pagh [2009] have another with complexity $\tilde{O}(t^{2/3} k^{2/3} + t^{0.862} k^{0.408})$, an improvement when the outputs are not too dense. Pagh [2013] developed a different approach with complexity $\tilde{O}(t + kn)$; that paper also includes a nice summary of the state of the art. Our multiplication with errors algorithm can also be used for sparse multiplication, and it provides a small improvement when the input and output are both not too sparse.

The main predecessor to our work is the recent algorithm of Gasieniec et al. [2017], which can correct up to $k$ errors in the product of two $n \times n$ matrices using $\tilde{O}(n^2 + kn)$ field operations. Their approach is based in part on the earlier work of Pagh [2013] and makes clever use of hashing and fast Fourier transforms in order to achieve the stated complexity.

Computing the inverse of a matrix has the same asymptotic complexity $O(n^\omega)$ as fast matrix multiplication [Bunch and Hopcroft, 1974]. In practice, using recent versions of the FFPACK library [Dumas et al., 2008] on an Intel desktop computer, we find that a single $n \times n$ matrix inversion is only about 33% more expensive than a matrix-matrix product of the same size.
Multiplication with errors

A high-level summary of our multiplication algorithm is as follows:

1. Determine which rows in the product contain errors.
2. Write the unknown sparse matrix of errors as $E = C - AB$. Remove the rows with no errors from $E$, $C$, and $A$, so we have $E' = C' - A'B$.
3. Treating the rows of $E'$ as sparse polynomials, use structured linear algebra and fast matrix multiplication to evaluate each row polynomial at a small number of points.
4. Use sparse polynomial interpolation to recover at least half of the rows from their evaluations.
5. Update $C$ and iterate $O(\log k)$ times to recover all $k$ errors.

To determine the rows of $E$ which are nonzero in Step 1, we apply a simple variant of Freivalds' verification algorithm: for a random vector $v$, the nonzero entries of $Ev$ are probably in the same positions as the nonzero rows of $E$ itself.

The evaluation/interpolation approach that follows is reminiscent of that in Gasieniec et al. [2017], with two important differences. First, using sparse polynomial techniques rather than hashing and FFTs means the complexity scales linearly with the number of nonzeros $t$ in the input rather than the dense matrix size $n^2$. Second, by treating the rows separately rather than recovering all errors at once, we are able to incorporate fast matrix multiplication so that our worst-case cost when all entries are erroneous is never more than $O(n^\omega)$, the same as that of naïve recomputation.

Section 4 provides details on the Monte Carlo algorithm to determine the rows and columns where errors occur (Step 1 above), Section 5 presents a variant of sparse polynomial interpolation that suits our needs for Step 4, and Section 6 explains how to perform the row polynomial evaluations in Step 3. A certain rectangular matrix multiplication in this step turns out to dominate the overall complexity. We connect all these pieces of the multiplication algorithm in Section 7.
Inverse with errors

Our algorithm for correcting errors in a matrix inverse is based on the multiplication algorithm, but with two complications. Writing $A$ for the input matrix, $B$ for the erroneous inverse, and $E$ for the (sparse) matrix of errors in $B$, we have $A^{-1} = B + E$; multiplying this identity by $A$ on either side gives
$$AE = I - AB, \qquad EA = I - BA.$$
We want to compute $Ev$ for a random vector $v$ in order to determine which rows of $E$ are nonzero. This is no longer possible directly, but using the second formulation above, we can compute $EAv$. Because $A$ must be nonsingular, $Av$ has the same distribution as a random vector, so this approach still works.

The second complication is that removing zero rows of $E$ does not change the dimension of the right-hand side, but only removes corresponding columns from $A$ on the left-hand side. We make use of the recent result of Cheung, Kwok, and Lau [2013], which shows how to quickly find a maximal set of linearly independent rows in a sparse matrix. This allows us to find a small submatrix $X$ of $A$ such that $XE' = I' - A'B$, with $E'$, $I'$, $A'$ being compressed matrices as before. Because the size of $X$ depends on the number of errors, it can be inverted much more quickly than re-computing the entire inverse of $A$.

The dominating cost in most cases is the same rectangular matrix product as needed in matrix multiplication with error correction. However, here we also have the extra cost of finding $X$ and computing its inverse, which dominates the complexity when there are a moderate number of errors and their locations are spread out.

This algorithm, which depends on the subroutines of Sections 4 to 6, is presented in detail in Section 8.

Notation

The soft-oh notation $\tilde{O}(\cdot)$ is the same as the usual big-oh notation but ignoring sub-logarithmic factors: $f \in \tilde{O}(g)$ if and only if $f \in O(g \log^{O(1)} g)$, for runtime functions $f$ and $g$.

We write N for the set of nonnegative integers and $\mathbb{F}_q$ for the finite field with $q$ elements.

For a finite set $S$, the number of elements in $S$ is denoted $\#S$, and $\mathcal{P}(S)$ is the powerset of $S$, i.e., the set of subsets of $S$.

The number of nonzero entries in a matrix $M \in \mathbb{F}^{m \times n}$ is written as $\|M\|$. Note that $\|M\| \le mn$, and if $M$ has rank $k$ then $\|M\| \ge k$.

For a univariate polynomial $f \in \mathbb{F}[x]$, $\mathrm{supp}(f) \subseteq$ N denotes the set of exponents of nonzero terms of $f$. Note that $\#\mathrm{supp}(f) \le \deg f + 1$.

We assume that the number of field operations to multiply two (dense) polynomials in $\mathbb{F}[x]$ with degrees less than $n$ is $\tilde{O}(n)$. This is justified by the generic algorithm of Cantor and Kaltofen [1991], which gives $O(n \log n \log\log n)$, or more recent results of Harvey, van der Hoeven, and Lecerf [2017], which improve this to $O(n \log n\, 8^{\log^* n})$ for finite fields.

As discussed earlier, we take $\omega$ with $2 \le \omega \le 3$ to be any feasible exponent of matrix multiplication. By applying fast matrix multiplication in blocks, it is also possible to improve the speed of rectangular matrix multiplication in a straightforward way. In particular:
Fact 3.1. The product of any $m \times \ell$ matrix times an $\ell \times n$ matrix can be found using $O\bigl(m\ell n / \min(m, \ell, n)^{3-\omega}\bigr)$ ring operations.
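To illustrate the blocking argument behind Fact 3.1, here is a minimal numpy sketch; it is ours, not from the paper, it works over floats for simplicity (the paper works over an arbitrary field), and `square_mul` stands in for any $O(d^\omega)$ square multiplication routine.

```python
import numpy as np

def rect_matmul(A, B, square_mul=np.matmul):
    """Multiply an (m x l) by an (l x n) matrix by tiling both into
    d x d blocks with d = min(m, l, n) and calling a fast square
    routine on each pair of blocks.  With an O(d^omega) square_mul,
    the total cost is (m*l*n / d^3) * d^omega = m*l*n / d^(3-omega)
    operations, as in Fact 3.1 (padding hidden in the big-oh)."""
    m, l = A.shape
    l2, n = B.shape
    assert l == l2
    d = min(m, l, n)
    M, L, N = (-(-m // d) * d, -(-l // d) * d, -(-n // d) * d)  # round up
    Ap = np.zeros((M, L)); Ap[:m, :l] = A
    Bp = np.zeros((L, N)); Bp[:l, :n] = B
    C = np.zeros((M, N))
    for i in range(0, M, d):
        for k in range(0, L, d):
            for j in range(0, N, d):
                C[i:i+d, j:j+d] += square_mul(Ap[i:i+d, k:k+d],
                                              Bp[k:k+d, j:j+d])
    return C[:m, :n]
```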
Note that it might be possible to improve the bound of Fact 3.1 for certain values of $m, \ell, n$ using Le Gall [2012], but that complexity is much harder to state and we have not investigated such an approach.

We restate two elementary facts on sparse matrix multiplication.
Fact 3.2. For any $M \in \mathbb{F}^{m \times n}$ and $v \in \mathbb{F}^n$, the matrix-vector product $Mv$ can be computed using $O(\|M\|)$ field operations.

Corollary 3.3. For any $A \in \mathbb{F}^{m \times \ell}$ and $B \in \mathbb{F}^{\ell \times n}$, their product $AB$ can be computed using $O(\|A\| \cdot n)$ field operations.

Finding nonzero rows

Our algorithms for error correction in matrix product and inverse computation both begin by determining which rows and columns contain errors. This is accomplished in turn by applying a random row or column vector to the unknown error matrix $E$.

For now, we treat the error matrix $E$ as a black box, i.e., an oracle matrix which we can multiply by a vector on the right or left-hand side. The details of the black box construction differ for the matrix multiplication and inverse problems, so we delay those until later.

The idea is to apply a row or column vector of random values to the unknown matrix $E$, and then examine which entries in the resulting vector are zero. This is exactly the same idea as the classic Monte Carlo verification algorithm of Freivalds [1979], which we extend to determine which rows or columns in a matrix are nonzero. This black-box procedure is detailed in Algorithm 1, FindNonzeroRows.
Algorithm 1: FindNonzeroRows($V \mapsto MV$, $\epsilon$)

Input: Black box for right-multiplication by an unknown matrix $M \in \mathbb{F}^{m \times n}$, and error bound $\epsilon \in$ R with $0 < \epsilon < 1$
Output: Set $J \subseteq \{0, \dots, m-1\}$ that, with probability at least $1 - \epsilon$, contains the index of every nonzero row in $M$

1. $\ell \leftarrow \lceil \log_{\#\mathbb{F}}(m/\epsilon) \rceil$
2. $V \leftarrow$ matrix in $\mathbb{F}^{n \times \ell}$ with uniform random entries from $\mathbb{F}$
3. $B \leftarrow MV$ using the black box
4. return the indices of the rows of $B$ which are not all zeros
The running time of FindNonzeroRows depends entirely on the size of $\mathbb{F}$ and the cost of performing one black-box apply over the field $\mathbb{F}$. The correctness depends on the parameter $\epsilon$.

Lemma 4.1. FindNonzeroRows always returns a list of nonzero row indices in $M$. With probability at least $1 - \epsilon$, it returns the index of every nonzero row in $M$.
Proof. Consider a single row $u^T$ of $M$. If $u$ is zero, then the corresponding row of $MV$ will always be zero; hence every index returned must be a nonzero row.

Next assume $u^T$ has at least one nonzero entry, say at index $i$, and consider one column $v \in \mathbb{F}^n$ of the random matrix $V$. Whatever the other entries in $u$ and $v$ are, there is exactly one value for the $i$'th entry of $v$ such that $u^T v = 0$. So the probability that $u^T v = 0$ is $1/\#\mathbb{F}$, and by independence the probability that $u^T V = 0^{1 \times \ell}$ is $1/(\#\mathbb{F})^\ell$.

Because $M$ has at most $m$ nonzero rows, the union bound tells us that the probability that any one of the corresponding rows of $MV$ is zero is at most $m/(\#\mathbb{F})^\ell$. From the definition of $\ell$, this ensures a failure probability of at most $\epsilon$.

Because we mostly have in mind the case that $\mathbb{F}$ is a finite field, we assume in FindNonzeroRows and in the preceding proof that it is possible to sample uniformly from the entire field $\mathbb{F}$. However, the same results would hold if sampling from any fixed subset of $\mathbb{F}$ (perhaps increasing the size of $\ell$ if the subset is small).
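A minimal runnable sketch of this procedure, assuming a prime field Z/pZ and plain list-of-lists matrices; the function names are ours, not the paper's, and `apply_M` is the black box.

```python
import math, random

def find_nonzero_rows(apply_M, m, n, p, eps):
    """Algorithm 1 (FindNonzeroRows) over the prime field Z/pZ.
    apply_M(V) must return the m x ell product M*V mod p for an
    n x ell matrix V, without revealing M itself.  Every index
    returned is a nonzero row of M; with probability >= 1 - eps,
    every nonzero row is returned."""
    ell = max(1, math.ceil(math.log(m / eps, p)))  # ceil(log_|F|(m/eps))
    V = [[random.randrange(p) for _ in range(ell)] for _ in range(n)]
    B = apply_M(V)
    return [i for i in range(m) if any(x % p for x in B[i])]

# toy check: M has nonzero rows 0 and 2
p, M = 101, [[0, 5, 0], [0, 0, 0], [7, 0, 0]]
box = lambda V: [[sum(M[i][j] * V[j][t] for j in range(3)) % p
                  for t in range(len(V[0]))] for i in range(3)]
print(find_nonzero_rows(box, 3, 3, p, 0.01))   # [0, 2] w.h.p.
```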
Sparse polynomial interpolation

Our algorithms use techniques from sparse polynomial interpolation to find the locations and values of erroneous entries in a matrix product or inverse.

Consider an unknown univariate polynomial $f \in \mathbb{F}[x]$ with degree less than $n$ and at most $s$ nonzero terms. The nonzero terms of $f$ can be uniquely recovered from the evaluations $f(1), f(\theta), \dots, f(\theta^{2s-1})$ at $2s$ consecutive powers of a suitable element $\theta$ [Ben-Or and Tiwari, 1988, Kaltofen and Yagati, 1989].

Doing this efficiently in our context requires a few variations to the typical algorithmic approach. While the state of the art seen in recent papers such as [Javadi and Monagan, 2010, Kaltofen, 2010, Arnold, Giesbrecht, and Roche, 2015, Huang and Gao, 2017] achieves softly-linear complexity in terms of the sparsity, there is either a higher dependency on the degree, a restriction to certain fields, or both.

Here, we show how to take advantage of two unique aspects of the problem in our context. First, we will interpolate a batch of sparse polynomials at once, on the same evaluation points, which allows for some useful precomputation. Secondly, the maximum degree of any polynomial we interpolate is $n$, the column dimension of the unknown matrix, and we are allowed linear time in $n$.

The closest situation in the literature is that of van der Hoeven and Lecerf [2013], who show how to recover a sparse polynomial if the exponents are known. In our case, we can tolerate a much larger set containing the possible exponents, by effectively amortizing the size of this exponent set over the group of batched sparse interpolations.

Recall the general outline of Prony's method for sparse interpolation of $f \in \mathbb{F}[x]$ from the evaluations $(f(\theta^i))_{0 \le i < 2s}$:

1. Find the minimum polynomial $\Gamma \in \mathbb{F}[z]$ of the sequence of evaluations.
2. Find the roots $v_1, \dots, v_t$ of $\Gamma$, where $t = \deg \Gamma \le s$.
3. Compute the discrete logarithm base $\theta$ of each $v_i$ to determine the exponents of $f$.
4. Solve a transposed Vandermonde system built from the $v_i$ and the evaluations to recover the coefficients of $f$.

Steps 1 and 4 can be performed over any field using fast multipoint evaluation and interpolation in $\tilde{O}(s)$ field operations [Kaltofen and Yagati, 1989]. However, the other steps depend on the field, and in particular Step 3 can be problematic over large finite fields.

To avoid this issue, we first pre-compute all possible roots $\theta^e$ of any minpoly $\Gamma_i$ in any of the batch of $r$ sparse interpolations that will be performed. Although the set of possible roots is larger than the degree of any single minpoly $\Gamma_i$, in our case the set is always bounded by the dimension of the matrix, so performing this precomputation once is worthwhile.

While fast root-finding procedures over many fields have already been developed, we present a simple but efficient root-finding procedure for our specific case that has the advantage of running in softly-linear time over any coefficient field. The general idea is to first compute a coprime basis (also called a gcd-free basis) for the minpolys $\Gamma_i$, then perform multi-point evaluation with the precomputed list of possible roots.

The details of the procedure FindRoots are in Algorithm 2 below. In the algorithm, we use a product tree constructed from a list of $t$ polynomials. This is a binary tree of height $O(\log t)$, with the original polynomials at the leaves, and where every internal node is the product of its two children. In particular, the root of the tree is the product of all $t$ polynomials. The total size of a product tree, and the cost of computing it, is softly-linear in the total size of the input polynomials; see Borodin and Munro [1975] for more details.
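A minimal sketch of the product tree and the remainder-tree multipoint evaluation it enables, over Z/pZ; this is our illustration, not the paper's code, and it uses a quadratic `poly_mul` for brevity where the stated complexities assume softly-linear polynomial multiplication.

```python
def poly_mul(f, g, p):
    """Multiply coefficient lists (low to high degree) mod p."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % p
    return h

def poly_mod(f, g, p):
    """Remainder of f modulo g over Z/pZ (g nonzero, p prime)."""
    f = [c % p for c in f]
    inv = pow(g[-1], p - 2, p)           # inverse of leading coefficient
    while len(f) >= len(g):
        c = f[-1] * inv % p
        if c:
            shift = len(f) - len(g)
            for i, b in enumerate(g):
                f[shift + i] = (f[shift + i] - c * b) % p
        f.pop()                          # leading term is now eliminated
        while f and f[-1] == 0:
            f.pop()
    return f

def product_tree(leaves, p):
    """Bottom-up product tree: level 0 holds the input polynomials and
    each node above is the product of its (at most two) children."""
    tree = [leaves]
    while len(tree[-1]) > 1:
        prev = tree[-1]
        tree.append([poly_mul(prev[i], prev[i + 1], p) if i + 1 < len(prev)
                     else prev[i] for i in range(0, len(prev), 2)])
    return tree

def multipoint_eval(f, points, p):
    """Evaluate f at all points mod p by pushing f down the remainder
    tree built over the linear factors (z - a)."""
    tree = product_tree([[(-a) % p, 1] for a in points], p)
    rems = [poly_mod(f, tree[-1][0], p)]
    for level in reversed(tree[:-1]):
        rems = [poly_mod(rems[j // 2], g, p) for j, g in enumerate(level)]
    return [(r[0] if r else 0) for r in rems]

# f(z) = z^2 + 1 at the points 2, 3, 5 modulo 101: prints [5, 10, 26]
print(multipoint_eval([1, 0, 1], [2, 3, 5], 101))
```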
Lemma 5.1. Given a set $Q \subseteq \mathbb{F}$ of size $\#Q = c$ and a list of $r$ polynomials $\Gamma_1, \dots, \Gamma_r \in \mathbb{F}[z]$, each with degree at most $s$, Algorithm 2 FindRoots correctly determines all roots of all the $\Gamma_i$'s which lie in $Q$, using $\tilde{O}(rs + c)$ field operations.

Algorithm 2: FindRoots($Q$, $[\Gamma_1, \dots, \Gamma_r]$)

Input: Set of possible roots $Q \subseteq \mathbb{F}$ and $r$ polynomials $\Gamma_1, \dots, \Gamma_r \in \mathbb{F}[z]$
Output: A list of sets $[R_1, \dots, R_r] \in \mathcal{P}(Q)^r$ such that $R_i$ consists of exactly the roots of $\Gamma_i$ in $Q$, for every $1 \le i \le r$

1. $\{g_0, \dots, g_{t-1}\} \leftarrow$ coprime basis of $\{\Gamma_i\}_{1 \le i \le r}$ using Algorithm 18.1 of Bernstein [2005]
   /* Compute product tree from the bottom up */
2. for $j \leftarrow 0, 1, \dots, t-1$ do $M_{0,j} \leftarrow g_j$
3. for $i \leftarrow 1, 2, \dots, \lceil \log t \rceil$ do
4.   for $j \leftarrow 0, 1, \dots, \lceil t/2^i \rceil - 1$ do
5.     $M_{i,j} \leftarrow M_{i-1,2j} \cdot M_{i-1,2j+1}$
   /* Find roots of basis elements */
6. $S_{\lceil \log t \rceil + 1,\, 0} \leftarrow Q$
7. for $i \leftarrow \lceil \log t \rceil, \lceil \log t \rceil - 1, \dots, 0$ do
8.   for $j \leftarrow 0, 1, \dots, \lceil t/2^i \rceil - 1$ do
9.     Evaluate $M_{i,j}$ at each point in $S_{i+1, \lfloor j/2 \rfloor}$ using fast multipoint evaluation
10.    $S_{i,j} \leftarrow$ roots of $M_{i,j}$ found in the previous step
   /* Determine roots of original polynomials */
11. Compute exponents $(e_{i,j})_{1 \le i \le r,\, 0 \le j < t}$ such that $\Gamma_i = g_0^{e_{i,0}} \cdots g_{t-1}^{e_{i,t-1}}$ for all $i$, using Algorithm 21.2 of Bernstein [2005]
12. for $i \leftarrow 1, 2, \dots, r$ do
13.   $R_i \leftarrow \bigcup_{0 \le j < t,\ e_{i,j} \ge 1} S_{0,j}$

Proof. Algorithms 18.1 and 21.2 of Bernstein [2005] are deterministic and compute (respectively) a coprime basis and then a factorization of the $\Gamma_i$'s according to the basis elements.

The polynomials $M_{i,j}$ form a standard product tree over the polynomials in the coprime basis. For any element of the product tree, its roots must be a subset of the roots of its parent node. So the multi-point evaluations on Line 9 determine, for every polynomial in the tree, which elements of $Q$ are roots of that polynomial.

The cost of computing the coprime basis using Algorithm 18.1 in [Bernstein, 2005] is $\tilde{O}(rs)$, with at least a $\log(rs)$ factor. This gives the first term in the complexity (and explains our use of soft-oh notation throughout). This cost dominates the cost of constructing the product tree and factoring the $\Gamma_i$'s on Line 11.

Using the standard product tree algorithm [Borodin and Munro, 1975], the cost of multi-point evaluation of a degree-$n$ polynomial at each of $n$ points is $\tilde{O}(n)$ field operations. In our algorithm, this happens at each level down the product tree. At each of the $\lceil \log t \rceil \in O(\log(rs))$ levels there are at most $c$ points (since the polynomials at each level are relatively prime). The total degree at each level is at most $\deg \prod_i \Gamma_i \le rs$. If $rs > c$, then this cost is already bounded by that of determining the coprime basis. Otherwise, the multi-point evaluations occur in $O(c/(rs))$ batches of $rs$ points each, giving the second term in the complexity statement.

We are now ready to present the full algorithm for performing simultaneous sparse interpolation on $r$ polynomials based on $2s$ evaluations each.
Algorithm 3: MultiSparseInterp($r, n, s, S, \theta, Y$)

Input: Bounds $r, n, s \in$ N, a set $S \subseteq \{0, \dots, n-1\}$ of possible exponents, a high-order element $\theta \in \mathbb{F}$, and evaluations $Y_{i,j} \in \mathbb{F}$ such that $f_i(\theta^j) = Y_{i,j}$ for all $1 \le i \le r$ and $0 \le j < 2s$
Output: Nonzero coefficients and corresponding exponents of $f_i \in \mathbb{F}[x]$, or FAIL indicating failure, for each $1 \le i \le r$

1. for $i \leftarrow 1, 2, \dots, r$ do
2.   $\Gamma_i \leftarrow$ MinPoly($Y_{i,0}, \dots, Y_{i,2s-1}$) using the fast Berlekamp-Massey algorithm
3. $R_1, \dots, R_r \leftarrow$ FindRoots($\{\theta^d \mid d \in S\}$, $[\Gamma_1, \dots, \Gamma_r]$)
4. for $i \leftarrow 1, 2, \dots, r$ do
5.   if $\#R_i \ne \deg \Gamma_i$ then
6.     $f_i \leftarrow$ FAIL
7.   else
8.     $t_i \leftarrow \#R_i$
9.     $\{e_{i,1}, \dots, e_{i,t_i}\} \leftarrow \{d \in S \mid \theta^d \in R_i\}$
10.    $[a_{i,1}, \dots, a_{i,t_i}] \leftarrow$ solution of the transposed Vandermonde system built from the roots $[\theta^{e_{i,1}}, \dots, \theta^{e_{i,t_i}}]$ and the right-hand vector $[Y_{i,0}, \dots, Y_{i,t_i-1}]$
11.    $f_i \leftarrow a_{i,1} x^{e_{i,1}} + \cdots + a_{i,t_i} x^{e_{i,t_i}}$
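To make Line 10 concrete, this is the system being solved for row $i$, written out in our notation with $v_j = \theta^{e_{i,j}}$:

```latex
\[
\begin{pmatrix}
 1           & 1           & \cdots & 1 \\
 v_1         & v_2         & \cdots & v_{t_i} \\
 \vdots      &             &        & \vdots \\
 v_1^{t_i-1} & v_2^{t_i-1} & \cdots & v_{t_i}^{t_i-1}
\end{pmatrix}
\begin{pmatrix} a_{i,1} \\ a_{i,2} \\ \vdots \\ a_{i,t_i} \end{pmatrix}
=
\begin{pmatrix} Y_{i,0} \\ Y_{i,1} \\ \vdots \\ Y_{i,t_i-1} \end{pmatrix},
\]
% the transpose of the Vandermonde matrix on the v_j; row j encodes
% f_i(theta^j) = sum_j' a_{i,j'} v_{j'}^j, and such systems are solvable
% in softly-linear time [Kaltofen and Yagati, 1989].
```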
Theorem 5.2. Let $\mathbb{F}$ be a field with an element $\theta \in \mathbb{F}$ whose multiplicative order is at least $n$, and suppose $f_1, \dots, f_r \in \mathbb{F}[x]$ are unknown univariate polynomials. Given bounds $n, s \in$ N, a set $S \subseteq$ N of possible exponents, and evaluations $f_i(\theta^j)$ for all $1 \le i \le r$ and $0 \le j < 2s$, Algorithm 3 MultiSparseInterp requires $\tilde{O}(rs + \#S \log n)$ field operations in the worst case.

The algorithm correctly recovers every polynomial $f_i$ such that $\#\mathrm{supp}(f_i) \le s$, $\mathrm{supp}(f_i) \subseteq S$, and $\deg f_i < n$.
Proof. From Lemma 5.1, the cost of FindRoots is $\tilde{O}(rs + \#S)$. This dominates the cost of the minpoly and transposed Vandermonde computations, which are both $\tilde{O}(rs)$ [Kaltofen and Yagati, 1989]. Using binary powering, computing the set $\{\theta^d \mid d \in S\}$ on Line 3 requires $O(\#S \log n)$ field operations.

If $f_i$ has at most $t_i \le s$ nonzero terms, then the minpoly $\Gamma_i$ computed on Line 2 is a degree-$t_i$ polynomial with exactly $t_i$ distinct roots. If $\mathrm{supp}(f_i) \subseteq S$, then by Lemma 5.1, FindRoots correctly finds all these roots. Because these are the only terms in $f_i$, solving the transposed Vandermonde system correctly determines all corresponding coefficients; see [Kaltofen and Yagati, 1989] or [van der Hoeven and Lecerf, 2013] for details.

Evaluating the rows of the error matrix

Consider the matrix multiplication with errors problem, recovering a sparse error matrix $E \in \mathbb{F}^{r \times n}$ according to the equation $E = C - AB$. In this section, we show how to treat each row of $E$ as a sparse polynomial and evaluate each of these at consecutive powers of a high-order element $\theta$, as needed for the sparse interpolation algorithm from Section 5. The same techniques will also be used in the matrix inverse with errors algorithm.

Consider a column vector of powers of an indeterminate, $x = [1, x, x^2, \dots, x^{n-1}]^T$. The matrix-vector product $Ex$ then consists of $r$ polynomials, each of degree less than $n$, with a 1-1 correspondence between the nonzero terms of the polynomials and the nonzero entries in the corresponding rows of $E$.

Evaluating every polynomial in $Ex$ at a point $\theta \in \mathbb{F}$ is the same as multiplying $E$ by the vector $v = [1, \theta, \theta^2, \dots, \theta^{n-1}]^T$. Evaluating each polynomial in $Ex$ at the first $2s$ powers $1, \theta, \theta^2, \dots, \theta^{2s-1}$ allows for the unique recovery of all $s$-sparse rows, according to Theorem 5.2. This means multiplying $E$ times an evaluation matrix $V \in \mathbb{F}^{n \times 2s}$ such that $V_{i,j} = \theta^{ij}$.

Consider a single row vector $u \in \mathbb{F}^{1 \times n}$ with $c$ nonzero entries. Multiplying $uV$ means evaluating a $c$-sparse polynomial at the first $2s$ powers of $\theta$; we can view it as removing the $n - c$ zero entries from $u$ to give $\bar{u} \in \mathbb{F}^{1 \times c}$, removing the same $n - c$ rows from $V$ to get $\bar{V} \in \mathbb{F}^{c \times 2s}$, and then computing the product $\bar{u}\bar{V}$. This evaluation matrix with removed rows, $\bar{V}$, is actually a transposed Vandermonde matrix built from the entries $\theta^i$ for each index $i$ of a nonzero entry in $u$.

An explicit algorithm for this transposed Vandermonde matrix application is given in Section 6.2 of Bostan, Lecerf, and Schost [2003]; they show that the complexity of computing $\bar{u}\bar{V}$ is $\tilde{O}(c + s)$ field operations, and it essentially amounts to a power series inversion and a product tree traversal. (See also Section 5.1 of van der Hoeven and Lecerf [2013] for a nice description of this approach to fast evaluation of a sparse polynomial at consecutive powers.) We repeat this for each row of a given matrix $M$ to compute $MV$.

Now we are ready to show how to compute $EV = (C - AB)V$ efficiently, using the approach just discussed to compute $CV$ and $BV$ before multiplying $A$ times $BV$ and performing a subtraction. As we will show, the rectangular multiplication of $A$ times $BV$ is the only step which is not softly-linear time, and it dominates the complexity.
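A naive sketch of evaluating one sparse row at consecutive powers of $\theta$; this is a quadratic $O(cs)$ stand-in of ours for the softly-linear transposed-Vandermonde apply of Bostan, Lecerf, and Schost [2003].

```python
def sparse_row_evals(row, theta, s, p):
    """Evaluate the sparse polynomial whose coefficients are given by
    `row` (a dict: column index -> value) at 1, theta, ..., theta^(2s-1)
    over Z/pZ.  The j-th output is sum_e row[e] * theta^(e*j) mod p."""
    step = {e: pow(theta, e, p) for e in row}  # theta^e per nonzero column
    cur = {e: 1 for e in row}                  # theta^(e*j), starting at j=0
    out = []
    for _ in range(2 * s):
        out.append(sum(c * cur[e] for e, c in row.items()) % p)
        cur = {e: cur[e] * step[e] % p for e in row}
    return out

# 3*x^2 at the powers 2^0, 2^1 of theta = 2 (s = 1): prints [3, 12]
print(sparse_row_evals({2: 3}, 2, 1, 101))
```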
Algorithm 4: DiffEval($A, B, C, \theta, s$)

Input: Matrices $A \in \mathbb{F}^{r \times \ell}$, $B \in \mathbb{F}^{\ell \times n}$, $C \in \mathbb{F}^{r \times n}$, high-order element $\theta \in \mathbb{F}$, and integer $s \in$ N
Output: Matrix $Y = (C - AB)V \in \mathbb{F}^{r \times 2s}$, where $V$ is the $n \times 2s$ evaluation matrix given by $(\theta^{ij})_{0 \le i < n,\ 0 \le j < 2s}$

1. Pre-compute $\theta^j$ for all indices $j$ of nonzero columns in $B$ or $C$
2. $Y_C \leftarrow CV$ using the pre-computed $\theta^j$'s and fast transposed Vandermonde applications, row by row
3. $Y_B \leftarrow BV$ similarly
4. $Y_{AB} \leftarrow A \cdot Y_B$ using dense or sparse multiplication, whichever is faster
5. return $Y_C - Y_{AB}$
Lemma 6.1. Algorithm 4 DiffEval always returns the correct matrix of evaluations $(C - AB)V$ and uses
$$\tilde{O}\bigl(\|B\| + \|C\| + n + s \cdot \min\bigl(\|A\| + r,\; r\ell/\min(r, s, \ell)^{3-\omega}\bigr)\bigr)$$
field operations.

Proof. Correctness comes from the discussion above and the correctness of the algorithm in Section 6.2 of Bostan et al. [2003], which is used on Lines 2 and 3.

The cost of pre-computing all powers $\theta^j$ using binary powering is $O(n \log n)$ field operations, since the number of nonzero columns in $B$ or $C$ is at most $n$. Using these precomputed values, each transposed Vandermonde apply costs $\tilde{O}(c + s)$, where $c$ is the number of nonzero entries in the current row. Summing over all rows of $C$ and $B$ gives $\tilde{O}(\|B\| + \|C\| + rs + \ell s)$ for these steps.

The most significant step is the rectangular multiplication of $A \in \mathbb{F}^{r \times \ell}$ times $Y_B \in \mathbb{F}^{\ell \times 2s}$. Using sparse multiplication, the cost is $O(\|A\| \cdot s)$, by Corollary 3.3. Using dense multiplication, the cost is $O\bigl(r\ell s / \min(r, \ell, s)^{3-\omega}\bigr)$ according to Fact 3.1. Because all the relevant parameters $\|A\|, s, r, \ell$ are known before the multiplication, we assume that the underlying software makes the best choice among these two algorithms.

Note that $rs$ and $\ell s$ are dominated by the cost of the dense multiplication. In the sparse multiplication case, the situation is more subtle when $\|A\| < \ell$. But in this case, we just suppose that the rows of $BV$ which are not used in the sparse multiplication $A(BV)$ are never computed, so the complexity is as stated.

The multiplication algorithm

We now have all the necessary components to present the complete multiplication algorithm, Algorithm 5 MultiplyEC.
The general idea is to repeatedly use FindNonzeroRows to determine the locations of the nonzeros in $E$, then DiffEval and MultiSparseInterp to recover at least half of the nonzero rows at each step.

Setting the target sparsity of each recovered row to $s = \lceil 2k/r \rceil$, where $r$ is the number of nonzero rows, ensures that at least half of the rows are recovered at each step. To get around the fact that the number of errors $k$ is initially unknown, we start with an initial guess of $k = 1$, and double this guess whenever fewer than half of the remaining rows are recovered.

The procedure is probabilistic of the Monte Carlo type only because of the Monte Carlo subroutine FindNonzeroRows used to determine which rows are erroneous. The other parts of the algorithm, computing evaluations and recovering nonzero entries, are deterministic based on the bounds they are given.
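The counting argument behind this choice of $s$, spelled out: if $k'$ errors remain, spread over $r$ erroneous rows, then

```latex
\[
  \#\{\text{rows with more than } s \text{ errors}\}
  \;<\; \frac{k'}{s}
  \;\le\; \frac{k'}{2k'/r}
  \;=\; \frac{r}{2},
\]
```

so at least half of the remaining erroneous rows are $s$-sparse and are therefore recovered by MultiSparseInterp.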
Theorem 7.1.
With probability at least $1 - \epsilon$, Algorithm 5 MultiplyEC finds all errors in $C$ and uses
$$\tilde{O}\Bigl(m + \bigl\lceil \log_{\#\mathbb{F}}(1/\epsilon) \bigr\rceil (n + t) + k \cdot \min\bigl(\lceil t/r \rceil,\; \ell / \min(\ell, r, k/r)^{3-\omega}\bigr)\Bigr)$$
field operations, where $k$ is the actual number of errors in the given product $C$. Otherwise, it uses $\tilde{O}\bigl(m\ell n / \min(m, \ell, n)^{3-\omega}\bigr)$ field operations and may return an incorrect result.

Proof. The only probabilistic parts of the algorithm are the calls to FindNonzeroRows, which can only fail by incorrectly returning too few rows. If this never happens, then each iteration through the while loop either (a) discovers that the value of $k$ was too small and increases it, or (b) recovers half of the remaining rows or columns of $E$. So the total number of iterations of the while loop, if the calls to FindNonzeroRows are never incorrect, is at most $\lceil \log k \rceil + \lceil \log m \rceil + \lceil \log n \rceil$. By the union bound, from the way that $\epsilon'$ is computed and the fact that there are two calls to FindNonzeroRows on every iteration, the probability that FindNonzeroRows is never incorrect is at least $1 - \epsilon$.

In this case, the rest of the correctness and the running time follow directly from Lemmas 4.1 and 6.1 and Theorem 5.2.

If a call to FindNonzeroRows returns an incorrect result, two bad things can happen. First, if $J$ is too small during some iteration, all of the evaluations may be incorrect, leading to $k$ being increased. In extremely unlucky circumstances, this may happen $O(\log(mn))$ times until the product is naïvely recomputed on Line 19, leading to the worst-case running time.

The second bad thing that can occur when a call to FindNonzeroRows fails is that it may incorrectly report that all errors have been found, leading to the (at most $\epsilon$) chance that the algorithm returns an incorrect result.
Algorithm 5: MultiplyEC($A, B, C, \theta, \epsilon$)

Input: Matrices $A \in \mathbb{F}^{m \times \ell}$, $B \in \mathbb{F}^{\ell \times n}$, $C \in \mathbb{F}^{m \times n}$, high-order element $\theta \in \mathbb{F}$, and error bound $0 < \epsilon < 1$
Output: Matrix $E \in \mathbb{F}^{m \times n}$ such that, with probability at least $1 - \epsilon$, $AB = C - E$

1. $k \leftarrow 1$
2. $E \leftarrow 0^{m \times n}$
3. $\epsilon' \leftarrow \epsilon / (2\lceil \log(mn) \rceil + 2)$
4. $J \leftarrow$ FindNonzeroRows($V \mapsto CV - A(BV)$, $\epsilon'$)
5. while $\#J \ge 1$ do
6.   if $\#$FindNonzeroRows($V \mapsto (V^T(C - E) - (V^T A)B)^T$, $\epsilon'$) $> \#J$ then
7.     Transpose $C$ and $E$, swap and transpose $A$ and $B$, and replace $J$ accordingly
8.   $r \leftarrow \#J$
9.   $A' \leftarrow$ submatrix of $A$ from the rows in $J$
10.  $C' \leftarrow$ submatrix of $(C - E)$ from the rows in $J$
11.  $s \leftarrow \lceil 2(k - \|E\|)/r \rceil$
12.  $Y \leftarrow$ DiffEval($A', B, C', \theta, s$)
13.  $f_1, \dots, f_r \leftarrow$ MultiSparseInterp($r, n, s, \{0, \dots, n-1\}, \theta, Y$)
14.  for $i \leftarrow 1, 2, \dots, r$ do
15.    set the $(J_i, e)$'th entry of $E$ to $c$ for each term $c x^e$ of $f_i$
16.  $J \leftarrow$ FindNonzeroRows($V \mapsto (C - E)V - A(BV)$, $\epsilon'$)
17.  if $\#J > r/2$ then
18.    $k \leftarrow 2k$
19.    if $k \ge n \cdot \#J$ then return $C - AB$ computed directly
20.  foreach $i \in J$ do
21.    clear the entries added to row $i$ of $E$ in this iteration
22. return $E$
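For contrast, here is a runnable baseline of ours, not Algorithm 5 itself: it reuses the `find_nonzero_rows` sketch from Section 4 and then simply recomputes the erroneous rows, with no sparse interpolation. It already localizes the work to about $\#J \cdot \ell n$ operations, which is what the evaluation/interpolation machinery above improves upon when the erroneous rows are sparse.

```python
def matmul(X, Y, p):
    """Dense matrix product mod p (lists of lists)."""
    return [[sum(x * Y[k][j] for k, x in enumerate(row)) % p
             for j in range(len(Y[0]))] for row in X]

def correct_rows_naively(A, B, C, p, eps):
    """Find the erroneous rows of C, then recompute just those rows
    of A*B.  Returns E with A*B == C - E (mod p), w.h.p."""
    m, n = len(C), len(C[0])
    def apply_M(V):                      # black box V |-> (C - A B) V
        CV = matmul(C, V, p)
        ABV = matmul(A, matmul(B, V, p), p)
        return [[(x - y) % p for x, y in zip(r1, r2)]
                for r1, r2 in zip(CV, ABV)]
    E = [[0] * n for _ in range(m)]
    for i in find_nonzero_rows(apply_M, m, n, p, eps):
        good = matmul([A[i]], B, p)[0]   # recompute row i of A*B
        E[i] = [(C[i][j] - good[j]) % p for j in range(n)]
    return E
```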
The inverse algorithm

As discussed in Section 2, our algorithm for correcting errors in a matrix inverse follows mostly the same outline as that for correcting a matrix product, with two important changes. These are the basis for the next two lemmas.
Lemma 8.1. For any matrices $E, A \in \mathbb{F}^{n \times n}$ where $A$ is invertible, a call to FindNonzeroRows($V \mapsto EAV$, $\epsilon$) correctly returns the nonzero rows of $E$ with probability at least $1 - \epsilon$.

Proof. Recall from the proof of Lemma 4.1 that the correctness of FindNonzeroRows depends on applying a matrix $V \in \mathbb{F}^{n \times \ell}$ of random entries to the unknown matrix $E$.

In the current formulation, instead of applying the random matrix $V$ directly to $E$, the product $AV$ is applied. But because $A$ is nonsingular, there is a 1-1 correspondence between the set of all matrices $V \in \mathbb{F}^{n \times \ell}$ and the set $\{AV \mid V \in \mathbb{F}^{n \times \ell}\}$. That is, applying a random matrix to $EA$ is the same as applying a random matrix to $E$ itself, and therefore the same results hold.
Lemma 8.2. Given any rank-$r$ matrix $A \in \mathbb{F}^{n \times r}$, it is possible to compute a matrix $X \in \mathbb{F}^{r \times r}$ formed from a subset of the rows of $A$, along with its inverse $X^{-1}$, using $\tilde{O}(\|A\| + r^\omega)$ field operations.

Proof. This is a direct consequence of [Cheung et al., 2013, Theorem 2.11], along with the classic algorithm of [Bunch and Hopcroft, 1974] for fast matrix inversion. Alternatively, one could use the approach of [Storjohann and Yang, 2015] to compute the lexicographically minimal set of linearly independent rows in $A$, as well as a representation of the inverse, in the same running time.

The resulting algorithm for matrix inversion with errors is presented in Algorithm 6, InverseEC.
Theorem 8.3. With probability at least $1 - \epsilon$, Algorithm 6 InverseEC finds all errors in $B$ and uses
$$\tilde{O}\bigl(\lceil \log_{\#\mathbb{F}}(1/\epsilon) \rceil\, t + nk/\min(r, k/r)^{3-\omega} + r^\omega\bigr)$$
field operations, where $k$ is the actual number of errors in the given inverse $B$. Otherwise, it uses $\tilde{O}(n^\omega)$ field operations and may return an incorrect result.

Proof. In this algorithm, we make use of two different formulas for $E$:
$$EA = I - BA, \qquad (8.1)$$
$$AE = I - AB. \qquad (8.2)$$
The first formula (8.1) is used to determine the nonzero rows of $E$ on Lines 4 and 19, and the second formula (8.2) is used to evaluate the rows of $XE'$ on Line 14.
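To see why the compressed system evaluated on Line 14 is valid, here is a short derivation in our notation:

```latex
% E is nonzero only on the rows indexed by J, so AE = A_{*,J} E_{J,*}.
% Pick r linearly independent rows J' of A' = A_{*,J} (Lemma 8.2);
% then X = A_{J',J} is invertible, and restricting AE = I - AB to the
% rows J' gives
\[
  X\,E_{J,*} \;=\; I_{J',*} - A_{J',*}\,B,
  \qquad\text{so}\qquad
  E_{J,*} \;=\; X^{-1}\bigl(I_{J',*} - A_{J',*}\,B\bigr),
\]
% which DiffEval evaluates on Line 14 and the X^{-1} apply on Line 15
% uncompresses.
```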
Algorithm 6: InverseEC($A, B, \theta, \epsilon$)

Input: Matrices $A, B \in \mathbb{F}^{n \times n}$, high-order element $\theta \in \mathbb{F}$, and error bound $0 < \epsilon < 1$
Output: Matrix $E \in \mathbb{F}^{n \times n}$ such that, with probability at least $1 - \epsilon$, $A^{-1} = B + E$

1. $k \leftarrow 1$
2. $E \leftarrow 0^{n \times n}$
3. $\epsilon' \leftarrow \epsilon / (2\lceil \log n \rceil + 2)$
4. $J \leftarrow$ FindNonzeroRows($V \mapsto V - B(AV)$, $\epsilon'$)
5. while $\#J \ge 1$ do
6.   if $\#$FindNonzeroRows($V \mapsto (V^T - (V^T A)(B + E))^T$, $\epsilon'$) $> \#J$ then
7.     Transpose $A$, $B$, and $E$, and replace $J$ accordingly
8.   $r \leftarrow \#J$
9.   $A' \leftarrow$ submatrix of $A$ from the columns in $J$
10.  $J' \leftarrow$ set of $r$ linearly independent rows of $A'$, according to Lemma 8.2
11.  $I', A'' \leftarrow$ submatrices of $I$ and $A$ from the rows chosen in $J'$
12.  $X^{-1} \leftarrow$ inverse of the submatrix $X$ of $A'$ from the rows in $J'$, according to Lemma 8.2
13.  $s \leftarrow \lceil 2(k - \|E\|)/r \rceil$
14.  $Y' \leftarrow$ DiffEval($A'', B, I', \theta, s$)
15.  $Y \leftarrow X^{-1} Y'$
16.  $f_1, \dots, f_r \leftarrow$ MultiSparseInterp($r, n, s, \{0, \dots, n-1\}, \theta, Y$)
17.  for $i \leftarrow 1, 2, \dots, r$ do
18.    set the $(J_i, e)$'th entry of $E$ to $c$ for each term $c x^e$ of $f_i$
19.  $J \leftarrow$ FindNonzeroRows($V \mapsto V - (B + E)(AV)$, $\epsilon'$)
20.  if $\#J > r/2$ then
21.    $k \leftarrow 2k$
22.    if $k \ge n \cdot \#J$ then return $A^{-1} - B$ computed directly
23.  foreach $i \in J$ do
24.    clear the entries added to row $i$ of $E$ in this iteration
25. return $E$
The main difference in this algorithm compared to MultiplyEC is the need to find $r$ linearly independent rows of $A'$ to constitute the matrix $X$, and to compute its inverse $X^{-1}$. According to Lemma 8.2, these together cost $\tilde{O}(\|A\| + r^\omega)$, which gives the additional term in the complexity statement. Note that this also eliminates the utility of sparse multiplication in all cases, simplifying the complexity statement somewhat compared to that of MultiplyEC.

The rest of the proof is identical to that of Theorem 7.1.
References
Rasmus Resen Amossen and Rasmus Pagh. Faster join-projects and sparse matrix multiplications. In Proceedings of the 12th International Conference on Database Theory, ICDT '09, pages 121–126, New York, NY, USA, 2009. ACM.

Andrew Arnold, Mark Giesbrecht, and Daniel S. Roche. Faster sparse multivariate polynomial interpolation of straight-line programs. Journal of Symbolic Computation, 2015.

Michael Ben-Or and Prasoon Tiwari. A deterministic algorithm for sparse multivariate polynomial interpolation. In Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, STOC '88, pages 301–309, New York, NY, USA, 1988. ACM.

Daniel J. Bernstein. Factoring into coprimes in essentially linear time. Journal of Algorithms, 54(1):1–30, 2005.

Janko Böhm, Wolfram Decker, Claus Fieker, and Gerhard Pfister. The use of bad primes in rational reconstruction. Math. Comp., 84(296):3013–3027, 2015.

A. Borodin and I. Munro. The Computational Complexity of Algebraic and Numeric Problems. Number 1 in Elsevier Computer Science Library; Theory of Computation Series. American Elsevier Pub. Co., New York, 1975.

A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, ISSAC '03, pages 37–44. ACM, 2003.

Brice Boyer and Erich L. Kaltofen. Numerical linear system solving with parametric entries by error correction. In Proceedings of the 2014 Symposium on Symbolic-Numeric Computation, SNC '14, pages 33–38. ACM, 2014.

James R. Bunch and John E. Hopcroft. Triangular factorization and inversion by fast matrix multiplication. Mathematics of Computation, 28(125):231–236, 1974.

David G. Cantor and Erich Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.

Ho Yee Cheung, Tsz Chiu Kwok, and Lap Chi Lau. Fast matrix rank algorithms and applications. J. ACM, 60(5):31:1–31:25, October 2013.

Matthew T. Comer, Erich L. Kaltofen, and Clément Pernet. Sparse polynomial interpolation and Berlekamp/Massey algorithms that correct outlier errors in input values. In Proceedings of the 37th International Symposium on Symbolic and Algebraic Computation, ISSAC '12, pages 138–145, New York, NY, USA, 2012. ACM.

Jean-Guillaume Dumas and Erich Kaltofen. Essentially optimal interactive certificates in linear algebra. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC '14, pages 146–153. ACM, 2014.

Jean-Guillaume Dumas, Pascal Giorgi, and Clément Pernet. Dense linear algebra over word-size prime fields: the FFLAS and FFPACK packages. ACM Trans. Math. Softw., 35:19:1–19:42, October 2008.

Jean-Guillaume Dumas, Erich Kaltofen, Emmanuel Thomé, and Gilles Villard. Linear time interactive certificates for the minimal polynomial and the determinant of a sparse matrix. In Proceedings of the ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '16, pages 199–206. ACM, 2016.

Jean-Guillaume Dumas, David Lucas, and Clément Pernet. Certificates for triangular equivalence and rank profiles. In Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '17, pages 133–140. ACM, 2017.

Rūsiņš Freivalds. Fast probabilistic algorithms. In Jiří Bečvář, editor, Mathematical Foundations of Computer Science 1979, pages 57–69. Springer Berlin Heidelberg, 1979.

Leszek Gasieniec, Christos Levcopoulos, Andrzej Lingas, Rasmus Pagh, and Takeshi Tokuyama. Efficiently correcting matrix products. Algorithmica, 79(2):428–443, October 2017.

Oded Goldreich, Dana Ron, and Madhu Sudan. Chinese remaindering with errors. In Proceedings of the Thirty-first Annual ACM Symposium on Theory of Computing, STOC '99, pages 225–234. ACM, 1999.

David Harvey, Joris van der Hoeven, and Grégoire Lecerf. Faster polynomial multiplication over finite fields. J. ACM, 63(6):52:1–52:23, January 2017.

Joris van der Hoeven and Grégoire Lecerf. On the bit-complexity of sparse polynomial and series multiplication. Journal of Symbolic Computation, 50:227–254, 2013.

Qiao-Long Huang and Xiao-Shan Gao. Faster deterministic sparse interpolation algorithms for straight-line program multivariate polynomials. CoRR, abs/1709.08979, 2017. URL http://arxiv.org/abs/1709.08979.

Seyed Mohammad Mahdi Javadi and Michael Monagan. Parallel sparse polynomial interpolation over finite fields. In Proceedings of the 4th International Workshop on Parallel and Symbolic Computation, PASCO '10, pages 160–168, New York, NY, USA, 2010. ACM.

Erich Kaltofen and Lakshman Yagati. Improved sparse multivariate polynomial interpolation algorithms. In P. Gianni, editor, Symbolic and Algebraic Computation, volume 358 of Lecture Notes in Computer Science, pages 467–474. Springer Berlin / Heidelberg, 1989.

Erich L. Kaltofen. Fifteen years after DSC and WLSS2: What parallel computations I do today [invited lecture at PASCO 2010]. In Proceedings of the 4th International Workshop on Parallel and Symbolic Computation, PASCO '10, pages 10–17, New York, NY, USA, 2010. ACM.

Erich L. Kaltofen and Zhengfeng Yang. Sparse multivariate function recovery from values with noise and outlier errors. In Proceedings of the 38th International Symposium on Symbolic and Algebraic Computation, ISSAC '13, pages 219–226. ACM, 2013.

Erich L. Kaltofen, Michael Nehring, and B. David Saunders. Quadratic-time certificates in linear algebra. In Proceedings of the 36th International Symposium on Symbolic and Algebraic Computation, ISSAC '11, pages 171–176. ACM, 2011.

Erich L. Kaltofen, Clément Pernet, Arne Storjohann, and Cleveland Waddell. Early termination in parametric linear system solving and rational function vector recovery with error correction. In Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '17, pages 237–244. ACM, 2017.

Majid Khonji, Clément Pernet, Jean-Louis Roch, Thomas Roche, and Thomas Stalinski. Output-sensitive decoding for redundant residue systems. In Proceedings of the 2010 International Symposium on Symbolic and Algebraic Computation, ISSAC '10, pages 265–272. ACM, 2010.

François Le Gall. Faster algorithms for rectangular matrix multiplication. In Proceedings of the 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS '12, pages 514–523, October 2012.

François Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC '14, pages 296–303, New York, NY, USA, 2014. ACM.

Andrzej Lingas. A fast output-sensitive algorithm for boolean matrix multiplication. In Amos Fiat and Peter Sanders, editors, Algorithms - ESA 2009, pages 408–419. Springer Berlin Heidelberg, 2009.

Rasmus Pagh. Compressed matrix multiplication. ACM Trans. Comput. Theory, 5(3):9:1–9:17, August 2013.

Arne Storjohann and Shiyun Yang. A relaxed algorithm for online matrix inversion. In Proceedings of the 2015 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '15, pages 339–346, New York, NY, USA, 2015. ACM.

Qian Wang, Xianyi Zhang, Yunquan Zhang, and Qing Yi. AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '13, pages 25:1–25:12. ACM, 2013.

Raphael Yuster and Uri Zwick. Fast sparse matrix multiplication. ACM Trans. Algorithms, 1(1):2–13, July 2005. doi:10.1145/1077464.1077466.