Barriers for Rectangular Matrix Multiplication
Matthias Christandl
University of Copenhagen [email protected]
François Le Gall
Nagoya University [email protected]
Vladimir Lysikov
University of Copenhagen [email protected]
Jeroen Zuiddam
Institute for Advanced Study [email protected]
Abstract.
We study the algorithmic problem of multiplying large matrices that are rectangular. We prove that the method that has been used to construct the fastest algorithms for rectangular matrix multiplication cannot give optimal algorithms. In fact, we prove a precise numerical barrier for this method. Our barrier improves the previously known barriers, both in the numerical sense and in its generality. We prove our result using the asymptotic spectrum of tensors. More precisely, we crucially make use of two families of real tensor parameters with special algebraic properties: the quantum functionals and the support functionals. In particular, we prove that any lower bound on the dual exponent of matrix multiplication α via the big Coppersmith–Winograd tensors cannot exceed 0.625.
1. Introduction
Given two large matrices, how many arithmetic operations, additions and multiplications, are required to compute their matrix product? The high school algorithm for multiplying two square matrices of shape n × n costs roughly n³ arithmetic operations. On the other hand, we know that at least n² operations are required. Denoting by ω the optimal exponent of n in the number of operations required by any arithmetic algorithm, we thus have 2 ≤ ω ≤ 3. What is the value of ω? Since Strassen published his matrix multiplication algorithm in 1969 we know that ω ≤ 2.81 [Str69]. Over the years, more constructions of faster matrix multiplication algorithms, relying on insights involving direct sum algorithms, approximative algorithms and asymptotic induced matchings, led to the current upper bound ω ≤ 2.373 [CW90, Sto10, Wil12, LG14].

In applications the matrices to be multiplied are often very rectangular instead of square; see the examples in [LU18]. For any nonnegative real p, given an n × ⌈n^p⌉ matrix and an ⌈n^p⌉ × n matrix, how many arithmetic operations are required? Denoting, similarly as in the square case, by ω(p) the optimal exponent of n in the number of operations required by any arithmetic algorithm, we a priori have the bounds max(2, 1+p) ≤ ω(p) ≤ 2+p. (Formally speaking, ω(p) is the infimum over all real numbers b so that the product of any n × ⌈n^p⌉ matrix and any ⌈n^p⌉ × n matrix can be computed in O(n^b) arithmetic operations. Of course, ω = ω(1), and if ω = 2, then ω(p) = max(2, 1+p).) What is the value of ω(p)? Parallel to the developments in upper bounding ω, the trivial upper bound ω(p) ≤ 2+p was improved drastically over the years for the various regimes of p [HP98, KZHP08, LG12, LU18]. The best lower bound on ω(p), however, has remained max(2, 1+p).

So the matrix multiplication exponent ω characterises the complexity of square matrix multiplication and, for every nonnegative real p, the rectangular matrix multiplication exponent ω(p) characterises the complexity of rectangular matrix multiplication. Coppersmith proved that there exists a value 0 < p < 1 such that ω(p) = 2 [Cop82]. The largest p such that ω(p) = 2 is denoted by α. We will refer to α as the dual matrix multiplication exponent. The algorithms constructed in [LU18] give the currently best bound α > 0.31389. If α = 1, then of course ω = 2. In fact, 2ω + ωα ≤ 6 (Remark 3.20). Thus we study ω(p) not only to understand rectangular matrix multiplication, but also as a means to prove ω = 2. The value of α appears explicitly in various applications, for example in the recent work on solving linear programs [CLS19] and empirical risk minimization [LSZ19].

The goal of this paper is to understand why current techniques have not closed the gap between the best lower bound on ω(p) and the best upper bound on ω(p), and to thus understand where to find faster rectangular matrix multiplication algorithms. We prove a barrier for current techniques to give much better upper bounds than the current ones. Our work gives a very precise picture of the limitations of current techniques used to obtain the best upper bounds on ω(p) and the best lower bounds on α. Our ideas apply as well to n × ⌈n^p⌉ by ⌈n^p⌉ × ⌈n^q⌉ matrix multiplication for different p and q. We focus on p = q for simplicity.
1.1. How matrix multiplication algorithms are constructed. To understand the current techniques that we prove barriers for, we explain, on a high level, how the fastest algorithms for matrix multiplication are constructed. An algorithm for matrix multiplication should be thought of as a reduction of the "matrix multiplication problem" to the natural "unit problem" that corresponds to multiplying numbers,

matrix multiplication problem ≤ unit problem.

Mathematically, problems correspond to families of tensors. Several different notions of reduction are used in this context. We will discuss tensors and reductions in more detail later. In practice, the fastest matrix multiplication algorithms, for square or rectangular matrices, are obtained by a reduction of the matrix multiplication problem to some intermediate problem and a reduction of the intermediate problem to the unit problem,

matrix multiplication problem ≤ intermediate problem ≤ unit problem.

The intermediate problems that have been used so far to obtain the best upper bounds on ω(p) correspond to the so-called small and big Coppersmith–Winograd tensors cw_q and CW_q. Depending on the intermediate problem and the notion of reduction, we prove a barrier on the best upper bound on ω(p) that can be obtained in the above way. Before we say something about our new barrier, we discuss the history of barriers for matrix multiplication.

1.2. History of matrix multiplication barriers. We call a lower bound for all upper bounds on ω or ω(p) that can be obtained by some method a barrier for that method. We give a high-level historical account of barriers for square and rectangular matrix multiplication.

Ambainis, Filmus and Le Gall [AFLG15] were the first to prove a barrier in the context of matrix multiplication. They proved that a variety of methods applied to the Coppersmith–Winograd intermediate tensors (which gave the best upper bounds on ω) cannot give ω = 2 and in fact cannot give ω ≤ 2.3725.

Alman and Vassilevska Williams [AW18a, AW18b] proved barriers for a notion of reduction called monomial degeneration, extending the realm of barriers beyond the scope of the Ambainis et al. paper. They prove that some collections of intermediate tensors, including the Coppersmith–Winograd intermediate tensors, cannot be used to prove ω = 2. Their analysis is based on studying the so-called asymptotic independence number of the intermediate problem (also called monomial asymptotic subrank). This paper also for the first time studies barriers for rectangular matrix multiplication, for 0 ≤ p ≤ 1 and monomial degeneration. For example, they prove an explicit upper bound strictly below 1 on the best α obtainable via monomial degeneration and the intermediate tensors CW_q. Blasiak et al. [BCC+17a, BCC+17b] proved barriers for the group-theoretic approach to matrix multiplication. Further barriers for square matrix multiplication were obtained by Christandl, Vrana and Zuiddam [CVZ19] via the notion of irreversibility, and by Alman [Alm19] for the universal method.

1.3. Our results. We prove new barriers for rectangular matrix multiplication using the quantum functionals and support functionals. We first set up a general barrier framework that encompasses all previously used notions of reduction, and then numerically compute barriers for the degeneration notion of reduction and the Coppersmith–Winograd intermediate problems. We also discuss barriers for "mixed" intermediate problems, which covers a method used by, for example, Coppersmith [Cop97]. We will explain our barrier in more detail in the language of tensors, but first we will give a numerical illustration of the barriers.

1.3.1. Numerical illustration of the barriers. For the popular intermediate tensor CW₆ our barrier for upper bounds on ω(p) via degeneration looks as follows. In Fig. 1, the horizontal axis goes over all p ∈ [0, 2]. The blue line is the upper bound on ω(p) obtained via the tensors CW_q as in [LG12].
The yellow line is the barrier and the red line is the best lower bound max{2, 1+p} on ω(p). (In [LG12] the best upper bounds on ω(p) are obtained using CW_q with q = 5 for small p, q = 6 for intermediate p, and q = 7 for large p.)

Figure 1. The blue line is the upper bound on ω(p) obtained via the tensors CW_q as in [LG12], where p ∈ [0, 2] is on the horizontal axis; the yellow line is our barrier for upper bounds on ω(p) via degeneration and the intermediate tensor CW₆; the red line is the lower bound max{2, 1+p} on ω(p).

How about the barrier for CW_q for other values of q? To see what happens there, we give in Fig. 2 the barrier for several values of q in terms of the dual matrix multiplication exponent α. (We recall that α is the largest value of p such that ω(p) = 2.) For q = 6 this barrier corresponds to the smallest value of p in Fig. 1 where the yellow line goes above 2.

Figure 2. The blue points are the lower bound on α obtained via CW_q as in [LG12] for a range of small q, the yellow points are our barrier for the best lower bound on α obtainable via degeneration and the intermediate tensor CW_q, and the red points are the best upper bound on α, namely 1. The best lower bound α > 0.3029 is attained at q = 5. Any lower bound on α using degeneration and CW_q, for any q, cannot exceed 0.625, the highest yellow point.

Our results give that the best lower bound on α obtainable with degenerations via CW_q, for any q, cannot exceed 0.625. (This value corresponds to the highest yellow point in Fig. 2.) Recall that the currently best lower bound is α > 0.31389 [LU18]. Compared to [AW18a] our barriers are more general, numerically higher, and apply not only for 0 ≤ p ≤ 1 but also for p ≥ 1. For example, for the big Coppersmith–Winograd tensors our barrier applies to degeneration, which is a strictly more general reduction than the monomial degeneration considered in [AW18a], and is numerically stronger.

1.4. Our barrier in the language of tensors. Let us continue the discussion that we started in Section 1.1 of how algorithms are constructed, but now in the language of tensors. The goal is to explain our barrier in more detail.

As we mentioned, algorithms correspond to reductions from the matrix multiplication problem to some natural unit problem, and the problems correspond to tensors. Let F be our base field. (The value of ω(p) may in fact depend on the characteristic of the base field.) A tensor is a trilinear map F^{n₁} × F^{n₂} × F^{n₃} → F. The problem of multiplying an ℓ × m matrix and an m × n matrix corresponds to the matrix multiplication tensor

⟨ℓ, m, n⟩ = Σ_{i=1}^{ℓ} Σ_{j=1}^{m} Σ_{k=1}^{n} x_{ij} y_{jk} z_{ki}.

The unit problem corresponds to the family of diagonal tensors

⟨n⟩ = Σ_{i=1}^{n} x_i y_i z_i.

There are several notions of reduction that one can consider, but the following is the most natural one. For two tensors S and T we say S is a restriction of T, and write S ≤ T, if there are three linear maps A, B, C of appropriate formats such that S is obtained from T by precomposing with A, B and C, that is, S = T ∘ (A, B, C). A very important observation (see, e.g., [BCS97] or [Blä13]) is that any matrix multiplication algorithm corresponds to an inequality ⟨ℓ, m, n⟩ ≤ ⟨r⟩. Square matrix multiplication algorithms look like ⟨n, n, n⟩ ≤ ⟨r⟩, and rectangular matrix multiplication algorithms, of the form that we study, look like ⟨n, n, ⌈n^p⌉⟩ ≤ ⟨r⟩.
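As a concrete illustration of the correspondence between algorithms and restrictions to diagonal tensors, the following minimal sketch (assuming NumPy; the helper name matmul_tensor and the index conventions are ours) builds ⟨2, 2, 2⟩ as an array and checks Strassen's rank-7 decomposition, which witnesses ⟨2, 2, 2⟩ ≤ ⟨7⟩:

```python
# Build <l,m,n> as a 3-dimensional 0/1 array and verify Strassen's seven
# rank-one terms, i.e. the restriction <2,2,2> <= <7>.
import itertools
import numpy as np

def matmul_tensor(l, m, n):
    """<l,m,n>: entry ((i,j),(j,k),(k,i)) is 1; pairs are flattened row-major."""
    T = np.zeros((l * m, m * n, n * l), dtype=int)
    for i, j, k in itertools.product(range(l), range(m), range(n)):
        T[i * m + j, j * n + k, k * l + i] = 1
    return T

# Rows are the coefficients of Strassen's seven products in the bases
# x = (A11, A12, A21, A22), y = (B11, B12, B21, B22), z = (z11, z12, z21, z22),
# where the z-slot is indexed by pairs (k, i).
U = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]])
V = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]])
W = np.array([[1, 0, 0, 1], [0, 1, 0, -1], [0, 0, 1, 1], [1, 1, 0, 0],
              [-1, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]])

S = np.einsum('ra,rb,rc->abc', U, V, W)      # sum of 7 rank-one tensors
assert np.array_equal(S, matmul_tensor(2, 2, 2))
```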
In general, faster algorithms correspond to having a smaller r on the right-hand side. In fact, if ⟨n, n, n⟩ ≤ ⟨n^{c+o(1)}⟩ then ω ≤ c, and similarly, for any p ≥ 0, if ⟨n, n, ⌈n^p⌉⟩ ≤ ⟨n^{c+o(1)}⟩ then ω(p) ≤ c. For example, if ⟨n, n, n²⟩ ≤ ⟨n^{c+o(1)}⟩ then ω(2) ≤ c.

Next we utilise a natural product structure on matrix multiplication tensors, which is well known as the fact that block matrices can be multiplied block-wise. For tensors S and T one naturally defines a Kronecker product S ⊗ T generalizing the matrix Kronecker product. The matrix multiplication tensors then multiply like ⟨n₁, n₂, n₃⟩ ⊗ ⟨m₁, m₂, m₃⟩ = ⟨n₁m₁, n₂m₂, n₃m₃⟩ and the diagonal tensors multiply like ⟨n⟩ ⊗ ⟨m⟩ = ⟨nm⟩.

We can thus say: if ⟨2, 2, 4⟩^{⊗n} ≤ ⟨2⟩^{⊗cn+o(n)}, then ω(2) ≤ c. We now think of our problem as the problem of determining the optimal asymptotic rate of transformation from ⟨2⟩ to ⟨2, 2, 4⟩. Of course we can do similarly for values of p other than p = 2, if we deal carefully with p that are non-integer. For clarity we will in this section stick to p = 2.
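The multiplicativity of matrix multiplication tensors under ⊗ can be checked directly on small examples. A small sanity check (assuming NumPy; matmul_tensor is the helper from the previous sketch, restated here so the block is self-contained, and tensor_kron is our name):

```python
# Verify <2,2,2> ⊗ <2,2,2> = <4,4,4>: form the Kronecker product and regroup
# the paired indices (i1,i2), (j1,j2), (k1,k2) so both sides use one convention.
import itertools
import numpy as np

def matmul_tensor(l, m, n):
    T = np.zeros((l * m, m * n, n * l), dtype=int)
    for i, j, k in itertools.product(range(l), range(m), range(n)):
        T[i * m + j, j * n + k, k * l + i] = 1
    return T

def tensor_kron(S, dS, T, dT):
    """Kronecker product of S = <l1,m1,n1> and T = <l2,m2,n2>, returned in the
    index convention of matmul_tensor(l1*l2, m1*m2, n1*n2)."""
    l1, m1, n1 = dS
    l2, m2, n2 = dT
    S6 = S.reshape(l1, m1, m1, n1, n1, l1)   # split each axis into its index pair
    T6 = T.reshape(l2, m2, m2, n2, n2, l2)
    P = np.einsum('ijJkKI,pqQrRP->ipjqJQkrKRIP', S6, T6)
    return P.reshape(l1*l2*m1*m2, m1*m2*n1*n2, n1*n2*l1*l2)

A = matmul_tensor(2, 2, 2)
assert np.array_equal(tensor_kron(A, (2, 2, 2), A, (2, 2, 2)),
                      matmul_tensor(4, 4, 4))
```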
In practice, as mentioned before, algorithms are obtained by reductions via intermediate problems. This works as follows. Let T be any tensor, the intermediate tensor. Then clearly, if

⟨2, 2, 4⟩^{⊗n} ≤ T^{⊗an+o(n)} ≤ ⟨2⟩^{⊗abn+o(n)},   (1)

then ω(2) ≤ ab. The barrier we prove is a lower bound on ab depending on T and the notion of reduction used in the inequality ⟨2, 2, 4⟩^{⊗n} ≤ T^{⊗an+o(n)}, which in this section we take to be restriction.

We obtain the barrier as follows. Imagine that F is a map from the set of tensors to the nonnegative real numbers that is ≤-monotone, ⊗-multiplicative and ⟨n⟩-normalised, meaning that for any tensors S and T the following holds: if S ≤ T then F(S) ≤ F(T); F(S ⊗ T) = F(S)·F(T); and F(⟨n⟩) = n. We apply F to both sides of the first inequality in (1) to get F(⟨2, 2, 4⟩) ≤ F(T)^a and so

log F(⟨2, 2, 4⟩) / log F(T) ≤ a.

Let G be another map from tensors to reals that is ≤-monotone, ⊗-multiplicative and ⟨n⟩-normalised. We apply G to both sides of the second inequality in (1) to get G(T) ≤ 2^b and so log G(T) ≤ b. We conclude that

(log F(⟨2, 2, 4⟩) / log F(T)) · log G(T) ≤ ab.

Our barrier is thus

max_{F,G} (log F(⟨2, 2, 4⟩) / log F(T)) · log G(T) ≤ ab,

where the maximisation is over the ≤-monotone, ⊗-multiplicative and ⟨n⟩-normalised maps from tensors to reals.

For tensors over the complex numbers, we know a family of ≤-monotone, ⊗-multiplicative and ⟨n⟩-normalised maps from tensors to reals: the quantum functionals. For tensors over other fields, we know a family of maps with slightly weaker properties that are still sufficient to prove the barrier: the support functionals.
Theorem. Upper bounds on ω(p) obtained via the intermediate tensor T are at least

max_{F,G} ( log(F(⟨2,1,1⟩) · F(⟨1,2,1⟩) · F(⟨1,1,2⟩)^p) / log F(T) ) · log G(T),

where the maximisation is over all support functionals, or over all quantum functionals.

See Theorem 3.13 for the precise statement of the result and Section 1.3.1 for illustrations.
We discussed that, in practice, the best upper bound on, say, ω(2) is obtained by a chain of inequalities of the form

⟨2, 2, 4⟩^{⊗n} ≤ T^{⊗an+o(n)} ≤ ⟨2⟩^{⊗abn+o(n)}.   (2)

We utilised this structure to obtain the barrier. A closer look reveals that the methods used in practice have even more structure. Namely, they give an inequality that also has diagonal tensors on the left-hand side:

⟨2⟩^{⊗cn} ⊗ ⟨2, 2, 4⟩^{⊗n} ≤ T^{⊗an+o(n)} ≤ ⟨2⟩^{⊗abn+o(n)}.   (3)

Part of the tensor ⟨2⟩^{⊗abn+o(n)} on the far right-hand side acts as a catalyst, since ⟨2⟩^{⊗cn} is returned on the far left-hand side. We obtain better barriers when we have a handle on the amount of catalyticity c that is used in the method (see the schematic Fig. 3), again by applying maps F and G to both sides of the two inequalities and deducing a lower bound on ab. The precise statement appears in Theorem 3.13.

Figure 3. This is the graph from Fig. 1 with arrows that indicate the influence of catalyticity. Roughly speaking, the barrier for CW₆ (the yellow line) moves upwards when more catalyticity is used.

In Section 2 we discuss in more detail the methods that are used to construct rectangular matrix multiplication algorithms and the different notions of reduction. In Section 3 we introduce and prove our barriers in the form of a general framework, dealing formally with non-integer p. We also discuss how to analyse "mixed" intermediate tensors. In Section 4 we discuss how to compute the barriers explicitly using the support functionals, and we compute them for the Coppersmith–Winograd tensors CW_q.
2. Algorithms
At the core of the methods that give the best upper bounds on ω(p) lies the following theorem, which can be proven using the asymptotic sum inequality for rectangular matrix multiplication [LR83] and the monotonicity of ω(p).

Theorem 2.1.
Let m ≥ n^p. If R̃(⟨n, n, m⟩^{⊕s}) ≤ r, then s · n^{ω(p)} ≤ r.

Here ⊕ denotes the naturally defined direct sum for tensors. The rank R(T) of a tensor T is the smallest number n such that T ≤ ⟨n⟩, or equivalently, the smallest number n such that T(x, y, z) = Σ_{i=1}^{n} u_i(x) v_i(y) w_i(z) where the u_i, v_i, w_i are linear. The asymptotic rank R̃(T) is defined as the limit lim_{n→∞} R(T^{⊗n})^{1/n}, which equals the infimum inf_n R(T^{⊗n})^{1/n}, since tensor rank is submultiplicative under ⊗ and bounded.

Equivalently, phrased in the language of the introduction, for m ≥ n^p, if

⟨s⟩^{⊗k} ⊗ ⟨n, n, m⟩^{⊗k} ≤ ⟨r⟩^{⊗k+o(k)},   (4)

then s · n^{ω(p)} ≤ r. In practice, the upper bound R̃(⟨n, n, m⟩^{⊕s}) ≤ r is obtained from a restriction ⟨s⟩^{⊗k} ⊗ ⟨n, n, m⟩^{⊗k} ≤ T^{⊗ak+o(k)} for some intermediate tensor T, together with an upper bound on R̃(T). The restriction in this inequality may be replaced by other types of reductions, which we discuss next.

Reductions. We say S is a monomial restriction of T, and write S ≤_M T, if S can be obtained from T by setting some variables to zero. We say S is a monomial degeneration of T, and write S ⊴_M T, if S can be obtained from T by multiplying the variables by integer powers of ε so that S appears as the coefficient of the lowest ε-degree. Strassen's application of the laser method uses monomial degenerations, and the modification of Coppersmith and Winograd [CW90] uses combinatorial restrictions, where the variables zeroed out are chosen using a certain combinatorial gadget (a Salem–Spencer set). Degeneration is a very general reduction that generalises the above reductions. We say S is a degeneration of T, and write S ⊴ T, if S appears in the lowest ε-degree in T(A(ε)x, B(ε)y, C(ε)z) for some linear maps A(ε), B(ε), C(ε) whose matrices have coefficients that are Laurent polynomials in ε. Restriction ≤ is the special case of degeneration where the Laurent polynomials are constant.
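To see that degeneration is strictly more general than restriction, consider the standard example (from the algebraic complexity literature, not specific to this paper) of the tensor W = x₁y₁z₂ + x₁y₂z₁ + x₂y₁z₁, which has rank 3, so W ≤ ⟨2⟩ is false, yet W ⊴ ⟨2⟩. A minimal sketch, assuming SymPy:

```python
# <2> precomposed with A(eps)x = (x1 + eps*x2, x1), B(eps)y = (y1 + eps*y2, y1),
# C(eps)z = (z1/eps + z2, -z1/eps) equals W + (higher order in eps), so W
# appears in the lowest eps-degree, i.e. W is a degeneration of <2>.
import sympy as sp

eps, x1, x2, y1, y2, z1, z2 = sp.symbols('eps x1 x2 y1 y2 z1 z2')

T = ((x1 + eps*x2) * (y1 + eps*y2) * (z1 + eps*z2) - x1*y1*z1) / eps
W = sp.expand(T).subs(eps, 0)
print(W)   # x1*y1*z2 + x1*y2*z1 + x2*y1*z1
```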
Coppersmith–Winograd intermediate tensors. All improvements on ω(p) since Coppersmith and Winograd use the Coppersmith–Winograd tensors CW_q, defined by

CW_q(x, y, z) = x₀y₀z_{q+1} + x₀y_{q+1}z₀ + x_{q+1}y₀z₀ + Σ_{i=1}^{q} (x₀yᵢzᵢ + xᵢy₀zᵢ + xᵢyᵢz₀),

as intermediate tensors. Degeneration methods give R̃(CW_q) = q + 2.
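For concreteness, the definition of CW_q can be turned directly into an array; a minimal sketch, assuming NumPy (the function name CW is ours):

```python
# CW_q as a (q+2) x (q+2) x (q+2) array with indices 0, 1, ..., q+1 in each slot;
# its support has the 3q+3 elements used in the computations of Section 4.
import numpy as np

def CW(q):
    T = np.zeros((q + 2, q + 2, q + 2), dtype=int)
    for i in range(1, q + 1):
        T[0, i, i] = T[i, 0, i] = T[i, i, 0] = 1   # x0*yi*zi + xi*y0*zi + xi*yi*z0
    T[0, 0, q + 1] = T[0, q + 1, 0] = T[q + 1, 0, 0] = 1
    return T

assert len(np.argwhere(CW(5))) == 3 * 5 + 3
```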
Mixed Coppersmith–Winograd tensors. Coppersmith [Cop97] combines CW_q tensors with different q's to upper bound ω(p). We show how to use the barrier in this situation in the full version. The best upper bounds in [LG12, LU18] do not mix q's.
3. Barriers
Let ≤ denote restriction on tensors as defined in the introduction. We remark that everything we discuss in this section also holds if ≤ is replaced with degeneration, monomial degeneration or monomial restriction.

Let F : {tensors} → ℝ≥0 be a map from all tensors to the nonnegative reals. The statements we prove in this section hold for F with certain special properties. Two families that satisfy the properties are the quantum functionals (which we will not explicitly use in this paper; we refer to [CVZ18] for the definition) and the (upper) support functionals. For concreteness, we will think of F as the support functionals. We will define the support functionals in the next section. For now, we will use the following properties.

Lemma 3.1 (Strassen [Str91]). Any support functional F is (i) ≤-monotone, (ii) ⊗-submultiplicative, (iii) mamu-⊗-multiplicative: F is ⊗-multiplicative for any two matrix multiplication tensors, (iv) ⊕-additive, and (v) at most R̃.

More is known about the support functionals than Lemma 3.1. For example, they are multiplicative not only on the matrix multiplication tensors, but also on a larger family of tensors called oblique tensors.
Remark 3.2.
The statements in this section can be proven more generally for certain preorders ≤ (including degeneration, monomial degeneration and monomial restriction) and certain maps F : {tensors} → ℝ≥0. Here for concreteness we discuss everything in terms of restriction and the support functionals. A precise discussion will appear in the full version.

Non-integer p. Recall that p is a nonnegative real number. To deal with p that are not integer we will define a notational shorthand. We first observe the following.

Lemma 3.3.
Let m ≥ n^p. Suppose that a ≥ 2 is an integer such that a^p is an integer. Then F(⟨n, n, m⟩) ≥ F(⟨a, a, a^p⟩)^{log_a n}.

Proof.
For every rational number s/t < log_a n we have

F(⟨n, n, m⟩) = F(⟨n, n, m⟩^{⊗t})^{1/t} = F(⟨n^t, n^t, m^t⟩)^{1/t} ≥ F(⟨a^s, a^s, a^{ps}⟩)^{1/t} = F(⟨a, a, a^p⟩)^{s/t}.

Taking the supremum over rationals s/t < log_a n proves the claim.

From Lemma 3.3 it follows that log_a F(⟨a, a, a^p⟩) is the same for any a with integer power a^p. We introduce a notation for dealing with this value without referring to the set of possible values of a.

Definition 3.4.
We introduce a formal symbol ⟨2, 2, 2^p⟩ for each real p ≥ 0, which we call a quasitensor. If p = log_a b for integers a and b, then we define F(⟨2, 2, 2^p⟩) = 2^{log_a F(⟨a, a, a^p⟩)}. Otherwise, we define F(⟨2, 2, 2^p⟩) = inf{ F(⟨2, 2, 2^P⟩) : P ≥ p, P = log_a b }. If 2^p is an integer, then the values of F on ⟨2, 2, 2^p⟩ as a tensor and as a quasitensor coincide. Thus we identify the quasitensor ⟨2, 2, 2^p⟩ with the tensor ⟨2, 2, 2^p⟩ when the latter exists.

Using this notation, Lemma 3.3 can be rephrased as follows.

Lemma 3.5. If m ≥ n^p, then F(⟨n, n, m⟩) ≥ F(⟨2, 2, 2^p⟩)^{log n}.

Lemma 3.6. F(⟨2, 2, 2^p⟩) = F(⟨2,1,1⟩) · F(⟨1,2,1⟩) · F(⟨1,1,2⟩)^p.

Proof. We have F(⟨a,1,1⟩) = F(⟨2,1,1⟩)^{log a}, because if log a ≤ b/c then a^c ≤ 2^b and F(⟨a,1,1⟩)^c ≤ F(⟨2,1,1⟩)^b, and if log a ≥ b/c then F(⟨a,1,1⟩)^c ≥ F(⟨2,1,1⟩)^b. Analogous results hold for ⟨1,a,1⟩ and ⟨1,1,a⟩. Suppose p = log_a b. Then

log F(⟨2, 2, 2^p⟩) = (1/log a) · log F(⟨a, a, b⟩) = (1/log a) · log[ F(⟨a,1,1⟩) F(⟨1,a,1⟩) F(⟨1,1,b⟩) ] = log F(⟨2,1,1⟩) + log F(⟨1,2,1⟩) + p · log F(⟨1,1,2⟩).

For arbitrary p the result follows by a continuity argument.

Lemma 3.7. If m = n^{p+o(1)}, then (1/log n) · log F(⟨n, n, m⟩) = log F(⟨2, 2, 2^p⟩) + o(1).

Proof. We have F(⟨n, n, m⟩) = F(⟨n,1,1⟩) F(⟨1,n,1⟩) F(⟨1,1,m⟩) and so

(1/log n) · log F(⟨n, n, m⟩) = log F(⟨2,1,1⟩) + log F(⟨1,2,1⟩) + log_n(m) · log F(⟨1,1,2⟩) = log F(⟨2, 2, 2^p⟩) + o(1).

The T-method. For any tensor T we define the notion of a T-method for upper bounds on ω(p) as follows.

Definition 3.8 (T-method). Suppose R̃(T) ≤ r. Suppose we are given a collection of inequalities ⟨n, n, m⟩^{⊕s} ≤ T^{⊗k} with n^p ≤ m. Then Theorem 2.1 gives the upper bound ω(p) ≤ ω̂(p), where ω̂(p) = inf{ k·log_n r − log_n s } and the infimum is taken over all k, n, s appearing in the collection of inequalities. We then say ω̂(p) is obtained by a T-method.

We say that the T-method is κ-catalytic if the set of values of n is unbounded, the bound ω̂(p) is not attained on any one reduction of the method (so ω̂(p) = lim inf{ k·log_n r − log_n s }), and in any reduction we have s ≥ Cn^κ for some constant C.
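The arithmetic in Definition 3.8 is easy to make concrete; a minimal sketch (the helper name omega_hat is ours; the example treats Strassen's restriction ⟨2,2,2⟩ ≤ ⟨7⟩ as a single reduction with k = 1, n = 2, s = 1, r = 7):

```python
# The bound of Definition 3.8 for one reduction <n,n,m>^(⊕s) <= T^(⊗k)
# with Rtilde(T) <= r: omega(p) <= k*log_n(r) - log_n(s).
import math

def omega_hat(k, n, s, r):
    return k * math.log(r, n) - math.log(s, n)

print(omega_hat(1, 2, 1, 7))   # 2.807..., i.e. omega <= log2(7)
```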
Theorem 3.9. Any upper bound ω̂(p) on ω(p) obtained by a T-method satisfies

ω̂(p) ≥ log F(⟨2, 2, 2^p⟩) · ( log R̃(T) / log F(T) ).

Moreover, if the method is κ-catalytic, then

ω̂(p) ≥ log F(⟨2, 2, 2^p⟩) · ( log R̃(T) / log F(T) ) + κ ( log R̃(T) / log F(T) − 1 ).

Proof.
It is enough to prove the inequality for one reduction T^{⊗k} ≥ ⟨n, n, m⟩^{⊕s} with m ≥ n^p, which gives the upper bound ω̂(p) = k·log_n R̃(T) − log_n s. Using Lemma 3.5 and superadditivity of F, we have

F(⟨n, n, m⟩^{⊕s}) ≥ s · F(⟨n, n, m⟩) ≥ s · F(⟨2, 2, 2^p⟩)^{log n}.

Therefore k·log_n F(T) ≥ log_n F(T^{⊗k}) ≥ log F(⟨2, 2, 2^p⟩) + log_n s. For ω̂(p) we get

( ω̂(p) + log_n s ) / ( log F(⟨2, 2, 2^p⟩) + log_n s ) ≥ ( k·log_n R̃(T) ) / ( k·log_n F(T) ) = log R̃(T) / log F(T).

Since F(T) ≤ R̃(T), we have ω̂(p) + log_n s ≥ log F(⟨2, 2, 2^p⟩) + log_n s and therefore

ω̂(p) / log F(⟨2, 2, 2^p⟩) ≥ ( ω̂(p) + log_n s ) / ( log F(⟨2, 2, 2^p⟩) + log_n s ).

If the method is κ-catalytic, then log_n s ≥ κ + O(1/log n), and as n → ∞ we have

( ω̂(p) + κ ) / ( log F(⟨2, 2, 2^p⟩) + κ ) ≥ log R̃(T) / log F(T).

This concludes the proof.
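Theorem 3.9 is itself easy to evaluate; a minimal sketch (the function name is ours, and the numbers in the example call are hypothetical, chosen only to illustrate the formula):

```python
def t_method_barrier(logF_mm, logF_T, logRt_T, kappa=0.0):
    """The bound of Theorem 3.9 (logs base 2): given log F(<2,2,2^p>),
    log F(T) and log Rtilde(T), any omega_hat(p) from a kappa-catalytic
    T-method is at least logF_mm*rho + kappa*(rho - 1), where rho is the
    ratio log Rtilde(T) / log F(T)."""
    rho = logRt_T / logF_T
    return logF_mm * rho + kappa * (rho - 1)

print(t_method_barrier(logF_mm=2.0, logF_T=2.5, logRt_T=3.0, kappa=1.0))  # hypothetical values
```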
The asymptotic T-method. To cover the methods that are used in practice we need the following notion.

Definition 3.10 (Asymptotic T-method). Let T be a tensor. Suppose R̃(T) ≤ r. Suppose we are given a collection of inequalities ⟨n, n, m⟩^{⊕s} ≤ T^{⊗k} where the values of n are unbounded and m ≥ f(n) for some function f(n) = n^{p+o(1)}. Then ω(p) is at most ω̂(p), where ω̂(p) = lim inf{ k·log_n r − log_n s } and the limit is taken over all k, n, s appearing in the collection of inequalities as n → ∞. We say ω̂(p) is obtained by an asymptotic T-method.

We say that the asymptotic T-method is κ-catalytic if in any inequality we have s ≥ Cn^κ for some constant C.

Remark 3.11.
This class of methods works because each reduction T^{⊗k} ≥ ⟨n, n, m⟩^{⊕s} gives an upper bound ω(q) ≤ k·log_n r − log_n s, where q = log_n m ≥ log_n f(n) → p. As the function ω(p) is continuous [LR83], we get the required bound on ω(p) in the limit.

Remark 3.12.
The usual descriptions of the laser method applied to rectangular matrix multiplication result in an asymptotic method, because the construction involves an approximation of a certain probability distribution by a rational probability distribution. As a result of this approximation, the matrix multiplication tensor constructed may have format slightly smaller than ⟨n, n, n^p⟩.

Theorem 3.13.
Any upper bound ω̂(p) obtained by an asymptotic T-method satisfies

ω̂(p) ≥ log F(⟨2, 2, 2^p⟩) · ( log R̃(T) / log F(T) ),

and for κ-catalytic methods,

ω̂(p) ≥ log F(⟨2, 2, 2^p⟩) · ( log R̃(T) / log F(T) ) + κ ( log R̃(T) / log F(T) − 1 ).

Proof.
Suppose T^{⊗k} ≥ ⟨n, n, m⟩^{⊕s}. Then ω̂_{k,s,n,m} = k·log_n R̃(T) − log_n s is an upper bound on ω(p + o(1)). Then, as in Theorem 3.9, we have

( ω̂_{k,s,n,m} + log_n s ) / ( log_n F(⟨n, n, m⟩) + log_n s ) ≥ log R̃(T) / log F(T).

Because F(T) ≤ R̃(T), both fractions are greater than 1, and for 0 ≤ A ≤ log_n s it is true that

( ω̂_{k,s,n,m} + A ) / ( log_n F(⟨n, n, m⟩) + A ) ≥ ( ω̂_{k,s,n,m} + log_n s ) / ( log_n F(⟨n, n, m⟩) + log_n s ).

As n → ∞, we have log_n F(⟨n, n, m⟩) ≥ log F(⟨2, 2, 2^p⟩) + o(1) and, if the method is κ-catalytic, then log_n s ≥ κ + o(1). The upper bound ω̂(p) given by the method is the limit lim inf ω̂_{k,s,n,m}. Taking n → ∞, we get the required inequalities.

Mixed methods. Coppersmith [Cop97] uses a combination of Coppersmith–Winograd tensors of different format to get an upper bound on the rectangular matrix multiplication exponent. More specifically, he considers a sequence of tensors of the form CW_{q₁}^{⊗n} ⊗ CW_{q₂}^{⊗⌊cn⌋} for suitable q₁, q₂ and a constant 0 < c < 1. Our analysis applies to tensor sequences of this kind because their asymptotic behaviour is similar to sequences of the form T^{⊗n}, in the sense of the following two lemmas.

Lemma 3.14.
Let S₁ and S₂ be some tensors. Given functions f₁, f₂ : ℕ → ℕ such that fᵢ(n) = aᵢn + o(n) for some positive real numbers a₁, a₂, define a sequence of tensors Tₙ = S₁^{⊗f₁(n)} ⊗ S₂^{⊗f₂(n)}. Then for each F the sequence F(Tₙ)^{1/n} is bounded from above.

Proof. We have

F(Tₙ)^{1/n} = F(S₁^{⊗f₁(n)} ⊗ S₂^{⊗f₂(n)})^{1/n} ≤ F(S₁)^{f₁(n)/n} · F(S₂)^{f₂(n)/n}.

The right-hand side converges to F(S₁)^{a₁} F(S₂)^{a₂} and, therefore, is bounded.

Lemma 3.15.
Let S₁ and S₂ be some tensors. Given functions f₁, f₂ : ℕ → ℕ such that fᵢ(n) = aᵢn + o(n) for some positive real numbers a₁, a₂, define a sequence of tensors Tₙ = S₁^{⊗f₁(n)} ⊗ S₂^{⊗f₂(n)}. Then the sequence R̃(Tₙ)^{1/n} converges.

Proof. For this, we need Strassen's spectral characterisation of the asymptotic rank [Str88]. Strassen defines the asymptotic spectrum of tensors X as the set of all ≤-monotone, ⊗-multiplicative, ⊕-additive maps ξ from tensors to the positive reals such that ξ(u ⊗ v ⊗ w) = 1 for nonzero vectors u, v, w. Then X can be made into a compact Hausdorff topological space such that the evaluation map ξ ↦ ξ(T) is continuous for every T, and R̃(T) = max_{ξ∈X} ξ(T). For ξ ∈ X we have

ξ(Tₙ)^{1/n} = ξ(S₁^{⊗f₁(n)} ⊗ S₂^{⊗f₂(n)})^{1/n} = ξ(S₁)^{f₁(n)/n} · ξ(S₂)^{f₂(n)/n} → ξ(S₁)^{a₁} ξ(S₂)^{a₂}.

Because of compactness of X this convergence is uniform in ξ. Therefore

R̃(Tₙ)^{1/n} = ( max_{ξ∈X} ξ(Tₙ) )^{1/n} → max_{ξ∈X} ξ(S₁)^{a₁} ξ(S₂)^{a₂}.

Definition 3.16.
A sequence of tensors {Tₙ} is called almost exponential if the sequence R̃(Tₙ)^{1/n} converges and F(Tₙ)^{1/n} is bounded for each F. Abusing notation, we write R̃({Tₙ}) := lim R̃(Tₙ)^{1/n} and F({Tₙ}) := lim sup F(Tₙ)^{1/n}.

Definition 3.17 (Asymptotic mixed method). Let {Tₙ} be an almost exponential sequence of tensors with R̃({Tₙ}) ≤ r. Suppose we are given a collection of inequalities ⟨n, n, m⟩^{⊕s} ≤ T_k where the values of n are unbounded and m ≥ f(n) for some f(n) = n^{p+o(1)}. Then ω(p) is at most ω̂(p) = lim inf{ k·log_n r − log_n s }, where the limit is taken over all k, n, s appearing in the collection of inequalities as n → ∞. We say that ω̂(p) is obtained by an asymptotic mixed {Tₙ}-method.

We say that the asymptotic mixed {Tₙ}-method is κ-catalytic if in each inequality we have s ≥ Cn^κ for some constant C.

Lemma 3.18.
Asymptotic mixed methods give true upper bounds on ω(p).

Proof. Note that for a fixed tensor T_k there are only a finite number of restrictions ⟨n, n, m⟩^{⊕s} ≤ T_k possible, as the left tensor is of format sn² × snm × snm, which should be no greater than the format of T_k. Thus, because in an asymptotic mixed method the set of values of n is unbounded, so is the set of values of k. For one restriction ⟨n, n, m⟩^{⊕s} ≤ T_k we have the inequality s · n^{ω(log_n m)} ≤ R̃(T_k), that is, ω(log_n m) ≤ log_n R̃(T_k) − log_n s. Since log_n m = p + o(1), ω is a continuous function, and R̃(T_k) = (R̃({T_k}) + o(1))^k, we get in the limit the required inequality.

Theorem 3.19.
Any upper bound ω̂(p) obtained by an asymptotic mixed {Tₙ}-method satisfies

ω̂(p) ≥ log F(⟨2, 2, 2^p⟩) · ( log R̃({Tₙ}) / log F({Tₙ}) ),

and for κ-catalytic methods,

ω̂(p) ≥ log F(⟨2, 2, 2^p⟩) · ( log R̃({Tₙ}) / log F({Tₙ}) ) + κ ( log R̃({Tₙ}) / log F({Tₙ}) − 1 ).

Proof. Recall that for a fixed T_k the number of possible restrictions ⟨n, n, m⟩^{⊕s} ≤ T_k is finite, as the left-hand side tensor has format sn² × snm × snm, which should be no greater than that of T_k. Therefore, as n tends to infinity, so does k. Consider now one restriction ⟨n, n, m⟩^{⊕s} ≤ T_k. It gives the upper bound ω̂_{k,s,n,m} := log_n R̃(T_k) − log_n s on ω(p + o(1)). As in the previous theorems, we have

( ω̂_{k,s,n,m} + log_n s ) / ( log_n F(⟨n, n, m⟩) + log_n s ) ≥ log R̃(T_k) / log F(T_k)

and

( ω̂_{k,s,n,m} + A ) / ( log_n F(⟨n, n, m⟩) + A ) ≥ ( ω̂_{k,s,n,m} + log_n s ) / ( log_n F(⟨n, n, m⟩) + log_n s )

for any A such that 0 ≤ A ≤ log_n s. Consider the behaviour of the involved quantities as n and k tend to infinity. Since m ≥ n^{p+o(1)}, we have log_n F(⟨n, n, m⟩) ≥ log F(⟨2, 2, 2^p⟩) + o(1). For a catalytic method, we can choose A = κ + o(1) such that log_n s ≥ A, and in general we set A = 0. Since R̃(T_k)^{1/k} = R̃({T_k}) + o(1) and F(T_k)^{1/k} ≤ F({T_k}) + o(1), we have

log R̃(T_k) / log F(T_k) ≥ log R̃({T_k}) / log F({T_k}) + o(1).

And finally, lim inf ω̂_{k,s,n,m} is ω̂(p). In the limit, we get the required inequalities.

Barriers for lower bounds on α. The barriers for the lower bounds on the dual matrix multiplication exponent α follow from the barriers for upper bounds on ω(p). A method can prove the lower bound α ≥ α̂ on α if it can prove ω(α̂) = 2. For our barrier this means that

log F(⟨2, 2, 2^{α̂}⟩) · ( log R̃(T) / log F(T) ) ≤ 2

for all F. Using Lemma 3.6, we get that any lower bound α̂ obtained by an asymptotic T-method satisfies

α̂ ≤ ( 2 · log F(T) / log R̃(T) − log F(⟨2,1,1⟩) − log F(⟨1,2,1⟩) ) / log F(⟨1,1,2⟩)

for all F such that log F(⟨1,1,2⟩) ≠ 0.
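In the same spirit as the earlier sketches, the displayed bound can be packaged as a function (name ours, logs base 2):

```python
def alpha_barrier(logF_T, logRt_T, logF_211, logF_121, logF_112):
    """Any alpha_hat obtained by an asymptotic T-method satisfies, for every
    admissible F with log F(<1,1,2>) != 0:
    alpha_hat <= (2*log F(T)/log Rtilde(T) - log F(<2,1,1>) - log F(<1,2,1>))
                 / log F(<1,1,2>)."""
    return (2 * logF_T / logRt_T - logF_211 - logF_121) / logF_112
```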
Remark 3.20. We note in passing that the matrix multiplication exponent ω and the dual exponent α are related via the inequality 2ω + ωα ≤ 6. Namely, from ⟨⌈n^α⌉, n, n⟩ ≤ ⟨n^{2+o(1)}⟩, ⟨n, ⌈n^α⌉, n⟩ ≤ ⟨n^{2+o(1)}⟩ and ⟨n, n, ⌈n^α⌉⟩ ≤ ⟨n^{2+o(1)}⟩ it follows that ⟨n^{2+α}, n^{2+α}, n^{2+α}⟩ ≤ ⟨n^{6+o(1)}⟩. Therefore, ω ≤ 6/(2 + α), and the claim follows.

4. Numerical computation of barriers

We will in this section show how to numerically evaluate the barrier of Theorem 3.13. We will compute explicit values for the Coppersmith–Winograd tensors.
Our main tool is a family of maps called the upper support functionals, introduced by Strassen in [Str91]. To define them, we will use the following notation. For n ∈ ℕ let [n] := {1, 2, ..., n}. For any finite set A let P(A) be the set of probability vectors on A. For finite sets A₁, A₂, A₃ and P ∈ P(A₁ × A₂ × A₃), let Pᵢ ∈ P(Aᵢ) be the i-th marginal of P for i ∈ [3]. Let H(P) denote the Shannon entropy of P. Let F^{n₁×n₂×n₃} be the set of 3-tensors of dimension n₁ × n₂ × n₃, viewed as 3-dimensional arrays. For T ∈ F^{n₁×n₂×n₃} let supp(T) ⊆ [n₁] × [n₂] × [n₃] be the support of T.

Let T ∈ F^{n₁×n₂×n₃} and let θ = (θ₁, θ₂, θ₃) ∈ P([3]). Define

ζ^θ(T) = min_{S ≅ T} max_{P ∈ P(supp(S))} 2^{Σ_{i∈[3]} θᵢ H(Pᵢ)},   (5)

where S goes over all tensors that can be obtained from T by a basis transformation, that is, S = T ∘ (A, B, C) where A, B, C are invertible linear maps. The map ζ^θ is called the upper support functional.

Lemma 4.1. ζ^θ(⟨a, b, c⟩) = a^{θ₁+θ₃} b^{θ₁+θ₂} c^{θ₂+θ₃}.

Proof.
One verifies this by a direct computation. See also [Str91].

We obtain from Theorem 3.13 and Lemma 3.6 that any upper bound ω̂(p) on ω(p) obtained by asymptotic T-methods must satisfy

ω̂(p) ≥ ( log ζ^θ(⟨2, 2, 2^p⟩) / log ζ^θ(T) ) · log R̃(T),

which gives

ω̂(p) ≥ max_θ ( 2θ₁ + θ₂ + θ₃ + p(θ₂ + θ₃) ) / log ζ^θ(T) · log R̃(T).   (6)

Before we talk about computations for CW_q we briefly discuss the standard way to make use of symmetry in the optimisation problems that we need to solve. We will be interested in computing

max_{P ∈ P(supp(CW_q))} Σ_{i=1}^{3} θᵢ H(Pᵢ).   (7)

Recall that the support of CW_q is

supp(CW_q) = { (i, i, 0), (i, 0, i), (0, i, i) : i ∈ [q] } ∪ { (0, 0, q+1), (0, q+1, 0), (q+1, 0, 0) }.

The symmetric group S_q acts naturally on the support of CW_q by permuting the label set [q]. Suppose P is feasible for (7). Then π·P for any π ∈ S_q is feasible as well and has the same value. Thus

(1/|S_q|) Σ_{π∈S_q} π·P   (8)

is feasible and has at least the same value, by concavity of H. We may thus assume that P is constant on the six orbits of supp(CW_q) under the action of S_q, which are the sets
{(i, i, 0) : i ∈ [q]}, {(i, 0, i) : i ∈ [q]}, {(0, i, i) : i ∈ [q]}, {(0, 0, q+1)}, {(0, q+1, 0)}, and {(q+1, 0, 0)}. The same reasoning applies when CW_q is replaced by any tensor with symmetry.

Computations for CW_q. Taking into account the symmetry derived in Section 4.3, let P be the probability distribution that gives probability p₁ to each (0, i, i), probability p₂ to each (i, 0, i), probability p₃ to each (i, i, 0), and probability r₁ to (q+1, 0, 0), probability r₂ to (0, q+1, 0) and probability r₃ to (0, 0, q+1), where p₁, p₂, p₃, r₁, r₂, r₃ ≥ 0 and qp₁ + qp₂ + qp₃ + r₁ + r₂ + r₃ = 1. The marginal probability vectors are

P₁ = (qp₁ + r₂ + r₃, p₂ + p₃, ..., p₂ + p₃, r₁)   (9)
P₂ = (qp₂ + r₁ + r₃, p₁ + p₃, ..., p₁ + p₃, r₂)   (10)
P₃ = (qp₃ + r₁ + r₂, p₁ + p₂, ..., p₁ + p₂, r₃).   (11)

Writing out the Shannon entropy (with the convention 0·log 0 = 0), we have

H(P₁) = −(qp₁ + r₂ + r₃) log(qp₁ + r₂ + r₃) − q(p₂ + p₃) log(p₂ + p₃) − r₁ log r₁   (12)
H(P₂) = −(qp₂ + r₁ + r₃) log(qp₂ + r₁ + r₃) − q(p₁ + p₃) log(p₁ + p₃) − r₂ log r₂   (13)
H(P₃) = −(qp₃ + r₁ + r₂) log(qp₃ + r₁ + r₂) − q(p₁ + p₂) log(p₁ + p₂) − r₃ log r₃   (14)

and

log ζ^θ(CW_q) ≤ max_{p_j, r_j} Σ_{i=1}^{3} θᵢ H(Pᵢ),   (15)

where the maximum is over p₁, p₂, p₃, r₁, r₂, r₃ ≥ 0 with qp₁ + qp₂ + qp₃ + r₁ + r₂ + r₃ = 1. We know that R̃(CW_q) = q + 2. The barrier we get for CW_q is

ω̂(p) ≥ max_θ ( 2θ₁ + (p+1)(θ₂ + θ₃) ) / log ζ^θ(CW_q) · log(q + 2)   (16)
      ≥ max_θ ( 2θ₁ + (p+1)(θ₂ + θ₃) ) / ( max_{p_j, r_j} Σ_{i=1}^{3} θᵢ H(Pᵢ) ) · log(q + 2),   (17)

which is easy to evaluate numerically. As an illustration, we give in Table 1 the barriers for upper bounds on ω(2) via asymptotic CW_q-methods for small q, obtained by numerical optimisation. Optimal values were obtained for θ with θ₂ = θ₃.

Table 1. Barriers for upper bounds on ω(2) via asymptotic CW_q-methods for small q. The columns list q, the barrier, and the optimising θ₁ and θ₂ = θ₃.
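The inner maximisation in (17) is a concave problem (for fixed θ) under a linear constraint, restricted to the symmetric distributions as justified above, so it is straightforward to evaluate. The following sketch (assuming NumPy and SciPy; function names ours) estimates the barrier for ω(2) for small q, maximising over θ with θ₂ = θ₃ as in Table 1, and also evaluates the induced barrier on α from the preceding section; the latter values can be compared against the bound 0.625 quoted in the abstract.

```python
# Numerical evaluation of (17) for p = 2, and of the alpha barrier, for CW_q.
import numpy as np
from scipy.optimize import minimize

def entropy(v):
    v = np.array([t for t in v if t > 1e-12])
    return float(-(v * np.log2(v)).sum())

def inner_max(theta, q):
    """max_P sum_i theta_i H(P_i) over symmetric P, upper bounding
    log zeta^theta(CW_q) as in (15); concave objective, linear constraint."""
    def neg(x):
        p1, p2, p3, r1, r2, r3 = x
        H1 = entropy([q*p1 + r2 + r3] + [p2 + p3] * q + [r1])
        H2 = entropy([q*p2 + r1 + r3] + [p1 + p3] * q + [r2])
        H3 = entropy([q*p3 + r1 + r2] + [p1 + p2] * q + [r3])
        return -(theta[0]*H1 + theta[1]*H2 + theta[2]*H3)
    cons = ({'type': 'eq',
             'fun': lambda x: q*(x[0] + x[1] + x[2]) + x[3] + x[4] + x[5] - 1},)
    x0 = [1/(6*q)]*3 + [1/6]*3              # interior feasible starting point
    res = minimize(neg, x0, bounds=[(0, 1)]*6, constraints=cons, method='SLSQP')
    return -res.fun

def barriers(q, p=2.0):
    best_omega, best_alpha = 0.0, 1.0
    for t1 in np.linspace(0.001, 0.999, 199):
        theta = (t1, (1 - t1)/2, (1 - t1)/2)
        D = inner_max(theta, q)             # denominator of (17)
        num = 2*theta[0] + (p + 1)*(theta[1] + theta[2])
        best_omega = max(best_omega, num / D * np.log2(q + 2))
        # omega(alpha) = 2 forces, for every theta,
        # 2*theta_1 + (alpha+1)*(theta_2+theta_3) <= 2*D/log2(q+2):
        a = (2*D/np.log2(q + 2) - 2*theta[0]) / (theta[1] + theta[2]) - 1
        best_alpha = min(best_alpha, a)
    return best_omega, best_alpha

for q in range(1, 9):
    bo, ba = barriers(q)
    print(q, round(bo, 4), round(ba, 4))
```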
Acknowledgements

MC and VL were supported by VILLUM FONDEN via the QMATH Centre of Excellence under Grant No. 10059 and the European Research Council (Grant Agreement No. 818761). FLG was supported by JSPS KAKENHI Grants Nos. JP15H01677, JP16H01705, JP16H05853, JP19H04066 and by the MEXT Quantum Leap Flagship Program (MEXT Q-LEAP) Grant No. JPMXS0118067394. JZ was supported by the National Science Foundation under Grant No. DMS-1638352.
References

[AFLG15] Andris Ambainis, Yuval Filmus, and François Le Gall. Fast matrix multiplication: limitations of the Coppersmith–Winograd method (extended abstract). In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC 2015), pages 585–593, 2015. arXiv:1411.5414.

[Alm19] Josh Alman. Limits on the universal method for matrix multiplication. In Proceedings of the 34th Computational Complexity Conference (CCC 2019), pages 12:1–12:24, 2019. arXiv:1812.08731.

[AW18a] Josh Alman and Virginia Vassilevska Williams. Further limitations of the known approaches for matrix multiplication. In Proceedings of the 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), pages 25:1–25:15, 2018. arXiv:1712.07246.

[AW18b] Josh Alman and Virginia Vassilevska Williams. Limits on all known (and some unknown) approaches to matrix multiplication. In Proceedings of the 59th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2018), pages 580–591, 2018. arXiv:1810.08671.

[BCC+17a] Jonah Blasiak, Thomas Church, Henry Cohn, Joshua A. Grochow, Eric Naslund, William F. Sawin, and Chris Umans. On cap sets and the group-theoretic approach to matrix multiplication. Discrete Anal., 2017. arXiv:1605.06702.

[BCC+17b] Jonah Blasiak, Thomas Church, Henry Cohn, Joshua A. Grochow, and Chris Umans. Which groups are amenable to proving exponent two for matrix multiplication? arXiv, 2017. arXiv:1712.02302.

[BCS97] Peter Bürgisser, Michael Clausen, and M. Amin Shokrollahi. Algebraic complexity theory, volume 315 of Grundlehren Math. Wiss. Springer-Verlag, Berlin, 1997.

[Blä13] Markus Bläser. Fast Matrix Multiplication. Number 5 in Graduate Surveys. Theory of Computing Library, 2013.

[CLS19] Michael B. Cohen, Yin Tat Lee, and Zhao Song. Solving linear programs in the current matrix multiplication time. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC 2019), pages 938–942, 2019. arXiv:1810.07896.

[Cop82] Don Coppersmith. Rapid multiplication of rectangular matrices. SIAM J. Comput., 11(3):467–471, 1982.

[Cop97] Don Coppersmith. Rectangular matrix multiplication revisited. J. Complexity, 13(1):42–49, 1997.

[CVZ18] Matthias Christandl, Péter Vrana, and Jeroen Zuiddam. Universal points in the asymptotic spectrum of tensors. In Proceedings of the 50th Annual ACM Symposium on Theory of Computing (STOC 2018), pages 289–296, 2018. arXiv:1709.07851.

[CVZ19] Matthias Christandl, Péter Vrana, and Jeroen Zuiddam. Barriers for fast matrix multiplication from irreversibility. In Proceedings of the 34th Computational Complexity Conference (CCC 2019), pages 26:1–26:17, 2019. arXiv:1812.06952.

[CW90] Don Coppersmith and Shmuel Winograd. Matrix multiplication via arithmetic progressions. J. Symb. Comput., 9(3):251–280, 1990.

[HP98] Xiaohan Huang and Victor Y. Pan. Fast rectangular matrix multiplication and applications. J. Complexity, 14(2):257–299, 1998.

[KZHP08] ShanXue Ke, BenSheng Zeng, WenBao Han, and Victor Y. Pan. Fast rectangular matrix multiplication and some applications. Science in China Series A: Mathematics, 51(3):389–406, 2008.

[LG12] François Le Gall. Faster algorithms for rectangular matrix multiplication. In Proceedings of the 53rd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2012), pages 514–523, 2012. arXiv:1204.1111.

[LG14] François Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation (ISSAC 2014), pages 296–303, 2014. arXiv:1401.7714.

[LR83] Grazia Lotti and Francesco Romani. On the asymptotic complexity of rectangular matrix multiplication. Theor. Comput. Sci., 23:171–185, 1983.

[LSZ19] Yin Tat Lee, Zhao Song, and Qiuyi Zhang. Solving empirical risk minimization in the current matrix multiplication time. In Conference on Learning Theory (COLT 2019), pages 2140–2157, 2019. arXiv:1905.04447.

[LU18] François Le Gall and Florent Urrutia. Improved rectangular matrix multiplication using powers of the Coppersmith–Winograd tensor. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2018), pages 1029–1046, 2018. arXiv:1708.05622.

[Sto10] Andrew James Stothers. On the complexity of matrix multiplication. PhD thesis, University of Edinburgh, 2010. http://hdl.handle.net/1842/4734.

[Str69] Volker Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13(4):354–356, 1969.

[Str88] Volker Strassen. The asymptotic spectrum of tensors. J. Reine Angew. Math., 384:102–152, 1988.

[Str91] Volker Strassen. Degeneration and complexity of bilinear maps: some asymptotic spectra. J. Reine Angew. Math., 413:127–180, 1991.

[Wil12] Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith–Winograd (extended abstract). In Proceedings of the 44th Annual ACM Symposium on Theory of Computing (STOC 2012), pages 887–898, 2012.

Matthias Christandl, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen Ø, Denmark. Email: [email protected]
François Le Gall, Nagoya University, Furocho, Chikusaku, Nagoya, Aichi 464-8602, Japan. Email: [email protected]

Vladimir Lysikov, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen Ø, Denmark. Email: [email protected]

Jeroen Zuiddam, Institute for Advanced Study, Princeton, NJ, USA. Email: [email protected]