Group-Theoretic Partial Matrix Multiplication
Richard Strong Bowen, Bo Chen, Hendrik Orem, Martijn van Schaardenburg
Harvey Mudd College
Abstract
A generalization of recent group-theoretic matrix multiplication algorithms to an analogue of the theory of partial matrix multiplication is presented. We demonstrate that the added flexibility of this approach can in some cases improve upper bounds on the exponent of matrix multiplication yielded by group-theoretic full matrix multiplication. The group theory behind our partial matrix multiplication algorithms leads to the problem of maximizing a quantity representing the "fullness" of a given partial matrix pattern. This problem is shown to be NP-hard, and two algorithms, one optimal and another non-optimal but polynomial-time, are given for solving it.

1 Introduction
In 1969, Volker Strassen showed that the naïve algorithm for square matrix multiplication, which takes O(n^3) time to multiply matrices of dimension n, is not optimal [8]; the algorithm he presented multiplied matrices in O(n^{log_2 7}) ≈ O(n^{2.81}) time. Together with the simple lower bound of Ω(n^2) on the number of operations needed to multiply n × n matrices, Strassen's result originated the problem of determining the "best possible" exponent of matrix multiplication. To be precise, if M(n) is the number of field operations in characteristic 0 required to multiply two n × n matrices, Strassen made the first step towards determining

    ω = inf{r ∈ R | M(n) = O(n^r)},

the exponent of matrix multiplication.

Gradual improvements were made to the upper bound on ω. In 1990, Coppersmith and Winograd [4] showed that ω < 2.38, a bound which remains the world record. A promising group-theoretic approach was presented by Cohn and Umans in 2003 [3]. They described an embedding of matrices into a group algebra that would allow for fast convolution via a Fourier transform, in much the same way that polynomials can be multiplied efficiently by embedding them in a group algebra, applying an FFT and then performing the convolution in the frequency domain. The challenge was to find an appropriate group together with three subsets which serve to index the matrix entries in the embedding. Using this method, Cohn et al. [2] tied the record of ω < 2.376.

Determining ω is a long-standing open problem in theoretical computer science. It is widely believed that ω = 2, but no progress has been made on the best known upper bound in nearly two decades.

In this paper, we generalize the results of Cohn et al., which only deal with full matrix multiplication, to a theory of group-theoretic partial matrix multiplication and use this approach to prove bounds on ω. In particular, Theorem 2.12 states that

    ω ≤ 3 log(Σ_i d_i^ω) / log f(A),

where the d_i are the character degrees of the chosen group and f(A) represents, roughly, the amount of information computed in the product of two partial matrices of a particular "pattern."

The group theory behind our partial matrix multiplication algorithm leads to an additional computational challenge, namely optimizing the quantity f(A) given a set of possible patterns. We show this problem to be NP-hard, and describe a non-optimal but polynomial-time algorithm, as well as an exponential-time algorithm, for solving it. In a particular case, we show how to improve an upper bound on ω obtained in [2] by using the greater generality of group-theoretic partial matrix multiplication.

2 Full and Partial Group-Theoretic Matrix Multiplication
Our main theorems describe an algorithm for multiplying matrices using triples of subsets not satisfying the triple product property (see Definition 2.4). Some entries must be set to zero, and then partial matrix multiplications are performed. This section introduces the original group-theoretic algorithm of Cohn and Umans [3], as well as the notion of 'aliasing', the motivation for our focus on partial matrix multiplication.
Definition 2.1. If S, T, U are ordered subsets of a group G, then the Cohn-Umans algorithm [3] for matrix multiplication computes the product of matrices M and N of dimensions |S| × |T| and |T| × |U|, respectively, as follows. Index the rows of M by S^{-1}, the columns of M by T, the rows of N by T^{-1}, and the columns of N by U. Then let

    f_M = Σ_{i,j} M_{i,j} s_i^{-1} t_j   and   f_N = Σ_{j,k} N_{j,k} t_j^{-1} u_k.

Compute f_P = f_M f_N, and assign to P_{i,k} the coefficient of s_i^{-1} u_k in f_P.

Theorem 2.2.
The Cohn-Umans algorithm computes, in position (i, k) of the product matrix, the sum of all terms M_{i',j} N_{j',k'}, where s_{i'}^{-1} t_j t_{j'}^{-1} u_{k'} = s_i^{-1} u_k.

Proof.
Every term in f_P is a product of a term in f_M with a term in f_N. The s_i^{-1} u_k term is exactly the sum of all terms (z m)(z' n), where z, z' ∈ C, m ∈ S^{-1}T and n ∈ T^{-1}U, and mn = s_i^{-1} u_k. But this is exactly the sum in the statement of the theorem. ∎

Corollary 2.3.
The Cohn-Umans algorithm is correct if and only if for all s, s' ∈ S, t, t' ∈ T, u, u' ∈ U, we have that ss'^{-1} tt'^{-1} uu'^{-1} = e implies s = s', t = t', u = u'.

Proof. This result follows from the previous theorem, since then s_{i'}^{-1} t_j t_{j'}^{-1} u_{k'} = s_i^{-1} u_k implies i = i', j = j', k = k', meaning that entry (i, k) of the product only contains terms formed by multiplying entry (i, j) by entry (j, k) in the left and right factor matrices, respectively. ∎
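The algorithm of Definition 2.1 is concrete enough to run directly. The following sketch is a minimal Python model we wrote for illustration (the helper names `mul`, `inv`, and `cohn_umans` are our own): it multiplies in the group algebra of the dihedral group D_6 = ⟨x, y | x^6 = y^2 = 1, xy = yx^{-1}⟩, encoding y^r x^k as the pair (r, k). The index sets below satisfy the condition of Corollary 2.3 (we checked this by brute force), so the output agrees with the ordinary matrix product.

```python
# Sketch of the Cohn-Umans algorithm (Definition 2.1) in D_6.
# Elements y^r x^k are encoded as pairs (r, k) with r in {0, 1}, k mod 6.

N = 6  # order of the rotation x

def mul(g, h):
    # (y^r x^k)(y^s x^m) = y^(r+s) x^(k*(-1)^s + m), since x^k y = y x^-k
    (r, k), (s, m) = g, h
    return ((r + s) % 2, ((-k if s else k) + m) % N)

def inv(g):
    r, k = g
    return g if r else (0, -k % N)  # reflections are involutions

def cohn_umans(S, T, U, M, Nmat):
    # Embed M and Nmat in the group algebra, convolve, and read off
    # the coefficient of s_i^-1 u_k as entry (i, k) of the product.
    fM, fN, fP = {}, {}, {}
    for i, s in enumerate(S):
        for j, t in enumerate(T):
            g = mul(inv(s), t)
            fM[g] = fM.get(g, 0) + M[i][j]
    for j, t in enumerate(T):
        for k, u in enumerate(U):
            g = mul(inv(t), u)
            fN[g] = fN.get(g, 0) + Nmat[j][k]
    for g, a in fM.items():
        for h, b in fN.items():
            gh = mul(g, h)
            fP[gh] = fP.get(gh, 0) + a * b
    return [[fP.get(mul(inv(s), u), 0) for u in U] for s in S]

e, y = (0, 0), (1, 0)
S = [e, y]                        # {1, y}
T = [e, (1, 2), (0, 3), (1, 5)]   # {1, y x^2, x^3, x y}
U = [e, (1, 4)]                   # {1, y x^4}

M = [[1, 2, 3, 4], [5, 6, 7, 8]]
Nmat = [[1, 0], [2, 1], [0, 3], [1, 1]]
exact = [[sum(M[i][j] * Nmat[j][k] for j in range(4)) for k in range(2)]
         for i in range(2)]
print(cohn_umans(S, T, U, M, Nmat) == exact)  # True: no aliasing occurs
```

These are the same sets used in Example 2.5 below; because the triple product property holds, every term of the convolution lands in the correct coefficient.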
The property in 2.3 is called the triple product property [3].
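The triple product property can be checked mechanically for small sets. A minimal brute-force sketch, written for illustration (the names `has_tpp`, `mul`, and `inv` are our own, and the stand-in group here is the additive group Z_8):

```python
# Brute-force check of the triple product property, in the form of
# Corollary 2.3: s s'^-1 t t'^-1 u u'^-1 = e must force s = s', t = t',
# u = u'.
from itertools import product

def has_tpp(mul, inv, e, S, T, U):
    for s, s2 in product(S, S):
        for t, t2 in product(T, T):
            for u, u2 in product(U, U):
                q = mul(mul(mul(s, inv(s2)), mul(t, inv(t2))),
                        mul(u, inv(u2)))
                if q == e and not (s == s2 and t == t2 and u == u2):
                    return False
    return True

# The cyclic group Z_8, written additively.
n = 8
mul = lambda a, b: (a + b) % n
inv = lambda a: (-a) % n

print(has_tpp(mul, inv, 0, [0, 1], [0, 2], [0, 4]))  # True
print(has_tpp(mul, inv, 0, [0, 1], [0, 1], [0, 1]))  # False: (1-0)+(0-1)+(0-0) = 0
```

Abelian examples like these are useful only as test cases: it is known that abelian groups yield no nontrivial bounds on ω through this approach [3].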
Example 2.5.
The following sets in D_6 = ⟨x, y | x^6 = y^2 = 1, xy = yx^{-1}⟩ have the triple product property:

    S = {1, y},   T = {1, yx^2, x^3, xy},   U = {1, yx^4}.

S, T, and U can be used to index the product of a full 2 × 4 matrix by a full 4 × 2 matrix.

The Cohn-Umans approach reduces the problem of bounding ω to that of searching for groups with a good combination of character degrees and subsets satisfying the triple product property. It is, however, unnecessary to require that the group element index sets produce a fully correct product. Even when terms in the group algebra multiplication incorrectly appear in an entry of the product matrix due to a violation of the triple product property by our chosen subsets S, T, and U (we call this phenomenon aliasing to emphasize the analogy to the usual Fourier transform in signal processing), these index sets will still compute the correct product in the case where one of the input entries contributing to each aliasing term contains a zero.

In the next section, we show how to apply the classical theory of partial matrix multiplication to the group-theoretic framework developed by Cohn et al. We will present bounds on ω realizable through subsets which may or may not satisfy the triple product property; in a special case, we can show that our algorithm yields strictly stronger results than the original Cohn-Umans full matrix multiplication algorithm. For a specific family of constructions satisfying the triple product property, the associated bound on ω can be improved by adding a single element to each of the sets, as described in Section 4. This means that the additional information computed by increasing the matrix dimensions outweighs the information lost due to the partial nature of the larger multiplication.

Definition 2.6.
If S, T, U are subsets of a group G, the set of all triples ((i, j), (j', k), (i', k')) where s_i^{-1} t_j t_{j'}^{-1} u_k = s_{i'}^{-1} u_{k'} and i ≠ i', j ≠ j', or k ≠ k' is called the set of aliasing triples, A.

Aliasing sets can be visualized as sets of lines representing the triples, as shown in Figure 1. Each line is broken up into two pieces: the first runs from the left factor matrix to the right factor matrix and represents which pair of input entries combine to produce an incorrect term in the product; the second runs from the right factor matrix to the product, indicating where the incorrect term appears.

Definition 2.7.
The left aliasing set of a set of aliasing triples A is

    {x : there exist y, z such that (x, y, z) ∈ A}.

The right aliasing set and the product aliasing set are defined analogously. The left aliasing set is the set of indices in the left factor matrix in Figure 1 that are the endpoints of one of the lines.

Figure 1: A visualization of the aliasing set in Example 2.10, where the input matrices are the left and middle rectangles, and the output is on the right. A triple ((i, j), (j', k), (i', k')) corresponds to a pair of lines from (i, j) in the left factor matrix to (j', k) in the right, and from (j', k) in the right factor matrix to (i', k') in the product; the set of all (i, j) which is the start of a line in the diagram is the left aliasing set.

It is impossible to have only one of i ≠ i', j ≠ j', k ≠ k' (if, for example, only i ≠ i' held, then we would have s_i^{-1} u_k = s_{i'}^{-1} u_k, and hence s_i = s_{i'}). Thus, an incorrect term in the Cohn-Umans algorithm will only occur by having at least two of:
1. being in the wrong row given its first multiplicand,
2. being in the wrong column given its second multiplicand, or
3. having its multiplicands coming from different positions in their respective row and column.

Definition 2.8.
Let A be a set of aliasing triples for S, T, U ⊆ G. We say that I and J cover A if I and J are subsets of the indices of entries of an |S| × |T| and a |T| × |U| matrix, respectively, such that for all a in A, either the first entry of a is in I or the second is in J. If M and N are |S| × |T| and |T| × |U| matrices such that M_i = 0 for every index i in I, and similarly for N and J, we say that M and N realize I and J.

Theorem 2.9.
Let G be a group and let S, T, U be indexing sets with aliasing set A. Let M, N be matrices of size |S| × |T| and |T| × |U|, respectively, and let I, J be subsets of the indices that cover A. If M, N realize I, J, then the Cohn-Umans algorithm correctly computes the partial matrix product MN.

Proof.
By Theorem 2.2, the extra terms arise from entries in the input matrices with indices in the aliasing set A. Thus setting the entries corresponding to elements of I and J to zero sets the coefficient on each incorrect term to zero, yielding the correct product of the partial matrices M, N. ∎

Example 2.10. Consider our earlier example in D_6, with a change to the last element of T:

    S = {1, y},   T = {1, yx^2, x^3, x},   U = {1, yx^4}.

This triple has aliasing set

    A = {((2,4),(3,2),(1,1)), ((2,4),(3,1),(1,2)), ((1,4),(3,2),(2,1)), ((1,4),(3,1),(2,2))},

as depicted in Figure 1. The first element of A describes the indices in the product s_2^{-1} t_4 t_3^{-1} u_2 = s_1^{-1} u_1 that erroneously form an extra term in the top left corner of the product matrix. Thus, using these sets, the Cohn-Umans algorithm correctly computes these types of partial matrix multiplication:

    [ a_{1,1} a_{1,2} a_{1,3} 0 ]   [ b_{1,1} b_{1,2} ]
    [ a_{2,1} a_{2,2} a_{2,3} 0 ] × [ b_{2,1} b_{2,2} ]
                                    [ b_{3,1} b_{3,2} ]
                                    [ b_{4,1} b_{4,2} ]

and

    [ a_{1,1} a_{1,2} a_{1,3} a_{1,4} ]   [ b_{1,1} b_{1,2} ]
    [ a_{2,1} a_{2,2} a_{2,3} a_{2,4} ] × [ b_{2,1} b_{2,2} ]
                                          [ 0       0       ]
                                          [ b_{4,1} b_{4,2} ].

The aliasing triples are visually depicted in Figure 1.

We will now introduce a function that will be an integral part of our partial matrix multiplication algorithm. It computes the number of ones in a tensor of partial matrix multiplication, which intuitively means the amount of information computed by this partial multiplication. Its importance will become clear in the next theorem.
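The aliasing set of Example 2.10 can be recomputed mechanically from Definition 2.6. The sketch below is our own illustration (reusing the (r, k) encoding of y^r x^k in D_6 and the hypothetical helper name `aliasing`); it enumerates the aliasing triples and confirms that zeroing the last column of the left factor covers all of them:

```python
# Enumerate the aliasing triples (Definition 2.6) for the modified sets
# of Example 2.10 in D_6, with y^r x^k encoded as (r, k).
from itertools import product

N = 6

def mul(g, h):
    (r, k), (s, m) = g, h
    return ((r + s) % 2, ((-k if s else k) + m) % N)

def inv(g):
    r, k = g
    return g if r else (0, -k % N)

e = (0, 0)
S = [e, (1, 0)]                   # {1, y}
T = [e, (1, 2), (0, 3), (0, 1)]   # {1, y x^2, x^3, x}  (last element changed)
U = [e, (1, 4)]                   # {1, y x^4}

def aliasing(S, T, U):
    idx = lambda X: list(enumerate(X, start=1))
    A = []
    for (i, s), (j, t), (j2, t2), (k, u) in product(idx(S), idx(T),
                                                    idx(T), idx(U)):
        lhs = mul(mul(inv(s), mul(t, inv(t2))), u)
        for (i2, s2), (k2, u2) in product(idx(S), idx(U)):
            # record the triple unless it is the trivial equality
            if lhs == mul(inv(s2), u2) and (i, j, k) != (i2, j2, k2):
                A.append(((i, j), (j2, k), (i2, k2)))
    return A

A = aliasing(S, T, U)
print(len(A))  # 4: the four triples listed in Example 2.10
# Zeroing column 4 of the left factor (the first entries) covers A:
print(all(first in {(1, 4), (2, 4)} for first, _, _ in A))  # True
```

By Theorem 2.9, realizing that cover (the first partial pattern displayed above) makes the Cohn-Umans output agree with the partial product.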
Definition 2.11.
Let A be a set of aliasing triples and let I and J cover A. The function f(I, J) is equal to

    Σ_i k_i n_i,

where k_i is the number of entries in the i-th column of the left factor matrix which do not appear in I, and n_i is the number of entries in the i-th row of the right factor matrix which do not appear in J. Finally, f(A) is

    f(A) = max{f(I, J) | I and J cover A}.

f is a measure of how much computation is being done by a partial matrix multiplication. Notice that if there is no zeroing in a multiplication of an m × n matrix by an n × p matrix, then taking I and J both empty gives f ≥ mnp (and it is easy to see that f = mnp).

The following theorem is used to derive many of our results; it provides a bound on ω given subsets which need not satisfy the triple product property. For this proof, it is sufficient to consider only matrices of complex numbers. Note that in the special case where the aliasing set is empty (that is, S, T, U have the triple product property), f(A) = |S||T||U| and our bound recovers Theorem 1.8 in [2]. The proof mimics that of Theorem 4.1 in [3], and uses some of its terminology.

Theorem 2.12.
Let S, T, U ⊆ G with aliasing triples A, and suppose G has character degrees {d_i}. Then

    ω ≤ 3 log(Σ_i d_i^ω) / log f(A).

Proof.
Let t be the tensor of partial matrix multiplication corresponding to I, J, the patterns which maximize f. It is clear that

    t ≤ C[G] ≅ ⊕_i ⟨d_i, d_i, d_i⟩

(similar to Theorem 2.3 in [3]). Then the l-th tensor power of t satisfies

    t^{⊗l} ≤ ⊕_{i_1,…,i_l} ⟨d_{i_1}⋯d_{i_l}, d_{i_1}⋯d_{i_l}, d_{i_1}⋯d_{i_l}⟩.

By the definition of ω, each ⟨d_{i_1}⋯d_{i_l}, d_{i_1}⋯d_{i_l}, d_{i_1}⋯d_{i_l}⟩ has rank at most C(d_{i_1}⋯d_{i_l})^{ω+ε} for some constant C and for all ε > 0. So, taking the rank of both sides gives

    R(t)^l ≤ C (Σ_i d_i^{ω+ε})^l,

using Proposition 15.1 in [1]. Since this is true for all ε > 0, it holds for ε = 0 by continuity:

    R(t)^l ≤ C (Σ_i d_i^ω)^l.

Taking l-th roots as l → ∞ gives

    R(t) ≤ Σ_i d_i^ω.

By Theorem 4.1 in [7], f(A)^{ω/3} ≤ R(t), and therefore ω ≤ 3 log(Σ_i d_i^ω) / log f(A). ∎

3 Algorithms for Aliasing Structure
In the study of aliasing, the following problem comes up: there is a pattern A, and one wishes to find the value f(A) by trying various I, J. This problem is NP-hard; this section describes the worst-case exponential-time algorithm we use to solve it exactly, as well as a polynomial-time algorithm used to find a reasonable solution.

We first give a polynomial-time algorithm for finding covering sets
I, J. This is not an approximation algorithm in the complexity-theoretic sense; it is merely a "pretty good" algorithm which we found useful in research. Instead of finding the cover which maximizes f, we find the cover which zeros the fewest entries. Viewing the entries in the factor matrices as vertices in a bipartite graph, and the pairs in the aliasing set as edges, it is clear that we desire a minimum vertex cover. By König's theorem, this is equivalent to finding a maximum matching (for an excellent explanation of the associated algorithm, see [5]), which can be found efficiently in bipartite graphs with the Hopcroft-Karp algorithm [6].

When computing f by exhaustive search, one must choose, for each aliasing triple, whether to satisfy it by zeroing the left entry or the right entry. After each choice, however, one can compute the current value of f as if the only triples in A were those already assigned a zero. Making further choices will only lower this value of f, so if the computed value is below the already known best value, the entire search tree can be pruned. In pseudocode:

    procedure maximum_f(A)
        best = 0; S = new Stack
        S.push(new Frame(I = {}, J = {}, unassigned = A))
        while S is not empty
            frame = S.pop()
            if f(frame.I, frame.J) <= best then continue   // prune: f only decreases
            if frame.unassigned is empty then best = f(frame.I, frame.J); continue
            a = remove a triple from frame.unassigned
            frame1 = copy(frame); frame1.I.append(left entry of a)
            frame2 = copy(frame); frame2.J.append(right entry of a)
            S.push(frame1, frame2)
        return best

4 An Improved Construction

In this section we present an improvement over a construction presented in [2].

4.1 A construction satisfying the triple product property

Let H = C_n × C_n × C_n, let G = H^2 ⋊ S_2, and let H_i < H be the subgroup isomorphic to C_n in the i-th coordinate. By z we mean the generator of S_2, which acts on H^2 by swapping the two coordinates, and by e_H we mean the identity element of H. We write elements of G as (a, b)z^j where a, b ∈ H and j = 0 or 1. Define, for i ∈ {1, 2,
3}, the subsets of G

    S_i = {(a, b)z^j | a ∈ H_i \ {e_H}, b ∈ H_{i+1}, j = 0 or 1},

where subscripts are taken mod 3. Finally, we let S = S_1, T = S_2, U = S_3. By [2], Lemma 2.1,
S, T, U have the triple product property. Note that |S| = |T| = |U| = 2n(n−1), and so f = 8n^3(n−1)^3. This construction gives ω ≤ 2.9088 at n = 17.

4.2 Adding an element to each set

Let S_i be as defined in the previous section, and let S'_i = S_i ∪ {(e_H, e_H)}. Let S' = S'_1, T' = S'_2, U' = S'_3, and let A be the associated aliasing set, shown graphically in Figure 2. We find that A can be partitioned into three easily analyzed categories:

(a) "Bottom aliasing" occurs in the rows of the product which are not indexed by the identity. All aliasing of this type can be covered by zeroing some (n−1)^2 entries in the (e_H, e_H) column of R.

(b) "Top-Easy aliasing" occurs in the (e_H, e_H) row of the product. These are entirely covered by zeroing (n−1)^2 entries of the (e_H, e_H) column of L.

(c) "Top-Hard aliasing" also occurs in the (e_H, e_H) row of the product. The distinction is in the manner in which they arise. Aliasing in this category can be covered by two things: the same entries which cover Top-Easy aliasing, combined with an additional 2n(n−
1) entries in the (e_H, e_H) column of R.

This decomposition is depicted in Figure 3. There exists a pair I, J, with the (n−1)^2 elements in the first column of L and the entire first column of R, that covers A. Thus

    f ≥ (2n(n−1))^3 + (2n(n−1))^2 + 2n(n−1)[2n(n−1) − (n−1)^2 + 1],

which is strictly greater than the value f = 8n^3(n−1)^3 attained by S, T, U. For n = 17, we achieve a strictly better bound on ω: the new sets compute everything computed by S, T, U, and then some more. Thus, by relaxing the restriction on
S, T, U, we strictly increased the amount of computation done, without increasing the work necessary (since G is constant).

Figure 2: A visualization of the aliasing in the construction introduced in Section 4.2. In this case, n = 2.

Figure 3: A visualization of the three types of aliasing in the construction given in Section 4.2: (a) Bottom Aliasing, (b) Top-Easy Aliasing, (c) Top-Hard Aliasing.

5 Hardness of Computing f

Often we are confronted with this problem: given some triple of subsets, find the best way to handle the aliasing in the factor matrices and obtain the best bound on ω, i.e., the best f(I, J). We show this problem is computationally hard.

Consider the problem PARTIAL-TENSOR-ONES: given the dimensions of two matrices m, n, p, a set of pairs A = {((a_i, b_i), (c_i, d_i))}, and an integer k, are there I and J covering A such that f(I, J) ≥ k? (This is the problem of maximizing the dot product when all the aliasing is to be taken care of by the left and right matrices.) We show that PARTIAL-TENSOR-ONES is NP-complete via a reduction from INDEPENDENT-SET, a well-known NP-complete problem.

Theorem 5.1.
PARTIAL-TENSOR-ONES is NP-complete.

Proof. An instance of INDEPENDENT-SET consists of (some encoding of) a graph and an integer k. Let G = (V, E) be this graph. We will generate an instance of PARTIAL-TENSOR-ONES. Let m = p = 1 and n = |V|, so that the left factor is a 1 × |V| row vector and the right factor is a |V| × 1 column vector. For each edge (v_i, v_j), add the constraints ((1, i), (j, 1)) and ((1, j), (i, 1)).

Suppose there is an independent set of size k. Then there is an I, J such that f(I, J) = k: for each v_i in the independent set, allow (1, i) and (i, 1) to be free, and zero all other entries in the two vectors. It is clear that f(I, J) = k, and every constraint is fulfilled because the constraints correspond exactly to the edges, so no two free entries appear in the same constraint.

Conversely, from an aliasing pattern with f(I, J) ≥ k, we can construct an independent set of the same size. If any (1, i) is left free by I while (i, 1) is zeroed by J, modify I to set (1, i) to zero. Then the value of f is unchanged, but the entries in each pair (1, i), (i, 1) are either both free or both zero. This is the sort of aliasing pattern one gets from the previous reduction, and we can easily run the argument of the previous paragraph backwards to find an independent set in G of size at least k.

Since there is an independent set of size k if and only if there are some I, J covering the constraints with f(I, J) ≥ k, and the reduction is clearly polynomial-time, PARTIAL-TENSOR-ONES is NP-hard.

To show that PARTIAL-TENSOR-ONES is in NP, we must exhibit a polynomial-sized certificate which can be checked in polynomial time. Given an instance of PARTIAL-TENSOR-ONES, a certificate can be a list of three symbols, L, R, or B, one for each constraint, indicating whether that constraint is satisfied by zeroing on the left, on the right, or in both. This is clearly polynomial in the size of the input. To check the certificate, one only needs to check two conditions: first, that it is consistent, that is, that no pair of constraints on the same entry of a matrix constrains it to be both free and zero, which can be done with a number of checks quadratic in the number of constraints; and second, that f(I, J) ≥ k, which can be done by making a list of the rows and columns with zeroed entries and, for each of these, the number of non-free entries in that row or column. Then f can be computed from this easily, in time proportional to the number of constraints as well. So the certificate can be verified in polynomial time. Therefore, PARTIAL-TENSOR-ONES is NP-complete. ∎

Remark: We have not shown, in the reduction, a group (and appropriate subsets) which provides the appropriate aliasing. So, any polynomial-time algorithm to find the best aliasing pattern from a given group and triple of subsets must either use more group theory, or show that P = NP.
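The reduction in the proof above can be exercised end-to-end on a small graph by brute force. A sketch, written for illustration (the helper names `max_f`, `instance`, and `max_independent_set` are our own; `max_f` evaluates f as in Definition 2.11 for the 1 × |V| by |V| × 1 shape used in the reduction):

```python
# Exercising the reduction of Theorem 5.1: each edge {i, j} contributes
# the constraint pairs ((1, i), (j, 1)) and ((1, j), (i, 1)), stored
# below simply as (i, j) and (j, i).  The best f over all covers then
# equals the size of a maximum independent set.

def max_f(V, pairs):
    # Best f(I, J) over all covers: for this shape, f counts the indices
    # i whose entries (1, i) and (i, 1) are both left free.
    best = 0
    for bits_l in range(2 ** V):
        L = {i + 1 for i in range(V) if bits_l >> i & 1}      # free on the left
        for bits_r in range(2 ** V):
            R = {j + 1 for j in range(V) if bits_r >> j & 1}  # free on the right
            # a pair (i, j) is covered unless both of its entries stay free
            if all(i not in L or j not in R for i, j in pairs):
                best = max(best, len(L & R))
    return best

def instance(edges):
    return [(i, j) for i, j in edges] + [(j, i) for i, j in edges]

def max_independent_set(V, edges):
    best = 0
    for bits in range(2 ** V):
        W = {i + 1 for i in range(V) if bits >> i & 1}
        if all(i not in W or j not in W for i, j in edges):
            best = max(best, len(W))
    return best

edges = [(1, 2), (2, 3), (3, 4)]  # the path graph on four vertices
print(max_f(4, instance(edges)), max_independent_set(4, edges))  # 2 2
```

The two brute-force searches agree, as the proof predicts: the doubly-free indices of any symmetrized cover form an independent set, and vice versa.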
6 Conclusion

We have shown that an analogue of the algorithm described in [3] can be applied to indexing sets that do not satisfy the triple product property, and we have provided some techniques for addressing the resulting optimization problems. In particular, we take sets satisfying the property and modify them in a small way to achieve an improved upper bound on ω. As the group-theoretic approach is known to tie the best-known upper bound, this suggests a possible path to improving upon the current record.

Acknowledgments

We would like to thank our research advisors, Professors Michael Orrison and Nicholas Pippenger of Harvey Mudd College Mathematics, for the opportunity to work on this project, as well as their assistance and collaborative efforts throughout the summer. Professor Chris Umans of Caltech Computer Science was also very generous with his time and supportive of our research efforts. We are grateful for the help of Claire Connelly in setting up our computing environment and allowing us to use hex, a parallel-computing system operated by Harvey Mudd College's Department of Mathematics and the Quantitative Life Sciences Center. hex was made available through a grant from the W.M. Keck Foundation. Our research was funded in part by National Science Foundation grant CCF 0430656.

References

[1] P. Bürgisser, M. Clausen, and M.A. Shokrollahi.
Algebraic Complexity Theory. Springer, 1997.

[2] H. Cohn, R. Kleinberg, B. Szegedy, and C. Umans. Group-theoretic algorithms for matrix multiplication. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, pages 23–25, 2005.

[3] H. Cohn and C. Umans. A group-theoretic approach to fast matrix multiplication. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, pages 438–449, 2003.

[4] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9(3):251–280, 1990.

[5] J.M. Harris, J.L. Hirst, and M.J. Mossinghoff. Combinatorics and Graph Theory. Springer, 2000.

[6] J.E. Hopcroft and R.M. Karp. An n^{5/2} algorithm for maximum matchings in bipartite graphs. SIAM Journal on Computing, 2:225–231, 1973.

[7] A. Schönhage. Partial and total matrix multiplication. SIAM Journal on Computing, 10:434–455, 1981.

[8] V. Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13:354–356, 1969.