Kronecker Products, Low-Depth Circuits, and Matrix Rigidity
Josh Alman ∗ February 25, 2021
Abstract
For a matrix M and a positive integer r, the rank-r rigidity of M is the smallest number of entries of M which one must change to make its rank at most r. There are many known applications of rigidity lower bounds to a variety of areas in complexity theory, but fewer known applications of rigidity upper bounds. In this paper, we use rigidity upper bounds to prove new upper bounds in a few different models of computation. Our results include:

• For any d > 1, and over any field F, the N × N Walsh-Hadamard transform has a depth-d linear circuit of size O(d · N^{1+0.96/d}). This circumvents a known lower bound of Ω(d · N^{1+1/d}) for circuits with bounded coefficients over C [Pud00], by using coefficients of magnitude polynomial in N. Our construction also generalizes to linear transformations given by a Kronecker power of any fixed 2 × 2 matrix.

• The N × N Walsh-Hadamard transform has a linear circuit of size ≤ (1.81 + o(1)) N log₂ N, improving on the bound of ≈ 1.88 N log₂ N which one obtains from the standard fast Walsh-Hadamard transform.

• A new rigidity upper bound, showing that the following classes of matrices are not rigid enough to prove circuit lower bounds using Valiant's approach:
– for any field F and any function f : {0,1}^n → F, the matrix V_f ∈ F^{2^n × 2^n} given by, for any x, y ∈ {0,1}^n, V_f[x, y] = f(x ∧ y), and
– for any field F and any fixed-size matrices M_1, ..., M_n ∈ F^{q×q}, the Kronecker product M_1 ⊗ M_2 ⊗ ··· ⊗ M_n.
This generalizes recent results on non-rigidity, using a simpler approach which avoids needing the polynomial method.

• New connections between recursive linear transformations like Fourier and Walsh-Hadamard transforms, and circuits for matrix multiplication.

∗ Harvard University. [email protected]. Supported by a Michael O. Rabin postdoctoral fellowship.
Introduction
For a matrix M and a positive integer r, the rank-r rigidity of M, denoted R_M(r), is the smallest number of entries of M which one must change to make its rank at most r. Matrix rigidity was introduced by L. Valiant [Val77] as a tool for proving low-depth circuit lower bounds. He showed that for any family {M_N}_{N∈ℕ} of matrices with M_N ∈ F^{N×N}, if R_{M_N}(O(N/log log N)) ≥ N^{1+ε} for any fixed ε > 0, then the linear transformation which takes as input a vector x ∈ F^N and outputs M_N x cannot be computed by an arithmetic circuit of size O(N) and depth O(log N). We say M_N is Valiant-rigid if it satisfies this rigidity lower bound. It remains a major open problem to prove that any explicit family of matrices cannot be computed by circuits of size O(N) and depth O(log N), and one of the most-studied approaches to this problem is to try to construct an explicit family of Valiant-rigid matrices.

Many researchers have subsequently shown that rigidity lower bounds for explicit matrices, both in this parameter regime and others, would lead to new lower bounds in a variety of areas, including in arithmetic complexity, communication complexity, Boolean circuit complexity, and cryptography. We refer the reader to [Lok09] for more on the background and known applications of matrix rigidity. However, despite 40+ years of efforts, and plenty of known applications, there are no known fully explicit constructions of rigid matrices.

A recent line of work [AW17, DE19, DL19] has instead shown that a number of families of explicit matrices are in fact not Valiant-rigid, including the Walsh-Hadamard transform [AW17] and the discrete Fourier transform [DL19]. These had been some of the most-studied candidate rigid matrices, which are now ruled out for proving lower bounds using this approach. This raises the question: Do these rigidity upper bounds imply any other interesting upper bounds? Although there are many results showing that rigid matrices imply a variety of lower bounds, there are few known connections showing that rigidity upper bounds would yield new algorithms or circuits.

In this paper, we give new upper bounds in a few different models which make use of recent rigidity upper bounds. Some of them apply rigidity upper bounds directly, while others are inspired by the proof techniques of recent rigidity upper bounds.
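Since rigidity is the central quantity throughout, it may help to pin down R_M(r) concretely. The following small Python sketch (our own illustration, not from the paper) computes R_M(r) exactly for tiny 0/1 matrices over GF(2), where a useful change is exactly a bit flip, by brute force over all sets of changed entries:

```python
from itertools import combinations

def gf2_rank(mat):
    # Rank over GF(2) via an XOR basis; rows are encoded as bitmasks.
    basis = []
    for row in mat:
        x = int("".join(str(b) for b in row), 2)
        for b in basis:
            x = min(x, x ^ b)
        if x:
            basis.append(x)
    return len(basis)

def rigidity_gf2(mat, r):
    # R_M(r): fewest entry changes (bit flips over GF(2)) making rank(M) <= r.
    n, m = len(mat), len(mat[0])
    cells = [(i, j) for i in range(n) for j in range(m)]
    for k in range(n * m + 1):
        for changed in combinations(cells, k):
            flipped = [row[:] for row in mat]
            for i, j in changed:
                flipped[i][j] ^= 1
            if gf2_rank(flipped) <= r:
                return k

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert rigidity_gf2(I3, 3) == 0   # already rank 3
assert rigidity_gf2(I3, 2) == 1   # zero out one diagonal entry
assert rigidity_gf2(I3, 1) == 2   # sparse matrices are never very rigid
```

The last assertion illustrates a theme used repeatedly below: a matrix with few nonzero entries per row can have its rank collapsed cheaply, which is why rigidity upper bounds and sparse factorizations are so closely linked.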
We begin by studying linear circuits for computing a linear transformation M ∈ F^{N×N}. These are circuits in which the inputs are the N entries of a vector x ∈ F^N, the outputs must be the N entries of Mx, and each gate computes an F-linear combination of its inputs. We focus on low-depth circuits with unbounded fan-in gates, so we measure their size by the number of wires in the circuit. A special type of linear circuit which we focus on is a synchronous linear circuit, in which the inputs to each gate must all have the same depth. One can see that a synchronous linear circuit of size s and depth d for M corresponds to d matrices M_1, ..., M_d such that M = M_1 × ··· × M_d and nnz(M_1) + ··· + nnz(M_d) = s, where nnz(A) denotes the number of nonzero entries in matrix A. A depth-d linear circuit can be converted into a depth-d synchronous linear circuit with a multiplicative size blowup of only d.

(Footnote: We say {M_N}_{N∈ℕ} with M_N ∈ F^{N×N} is explicit if there is an algorithm which, on input N, outputs M_N in poly(N) deterministic time.)

Rigidity upper bounds naturally give depth-2 linear circuit constructions. Indeed, it is not hard to see that any M ∈ F^{N×N} has a depth-2 linear circuit of size O(N · rank(M)), and a depth-1 linear circuit of size O(nnz(M)), and hence, for any r, a depth-2 linear circuit of size O(N · r + R_M(r)). Thus, for instance, letting H_n denote the N × N Walsh-Hadamard transform for N = 2^n, using the bound R_{H_n}(N^{1−Θ(ε²/log(1/ε))}) ≤ N^{1+ε} for any ε > 0 from [AW17], we get that there is a δ > 0 such that H_n has a depth-2 linear circuit of size O(N^{2−δ}). However, there is actually a smaller and simpler circuit known for H_n. Using an approach similar to the fast Walsh-Hadamard transform, we can see that for any d, H_n has a depth-d synchronous linear circuit of size only O(d · N^{1+1/d}). (The circuit involves, at each depth, computing N^{1−1/d} independent copies of the N^{1/d} × N^{1/d} Walsh-Hadamard transform H_{n/d}.) Thus, H_n has a depth-2 circuit of size only O(N^{1.5}), which is much better than O(N^{2−δ}). Despite a fair bit of work by the author, it is unclear how to use the rigidity upper bound of [AW17] to improve on O(N^{1.5}).

Nonetheless, we are able to construct smaller circuits for H_n, as well as for any other family of transforms defined as the Kronecker powers of a fixed matrix, by making use of new, different rigidity upper bounds for H_n. For a fixed 2 × 2 matrix

M = [ a  b ]
    [ c  d ]

over a field F, the family of Kronecker powers of M, denoted by M^{⊗n} ∈ F^{2^n × 2^n}, is defined recursively by M^{⊗1} = M and, for n ≥ 1,

M^{⊗(n+1)} = [ a · M^{⊗n}   b · M^{⊗n} ]
             [ c · M^{⊗n}   d · M^{⊗n} ].

For instance, the 2^n × 2^n Walsh-Hadamard transform H_n is defined as H_n := H_1^{⊗n}, where

H_1 := [ 1   1 ]
       [ 1  −1 ].

Kronecker powers arise naturally in many settings. For instance, when M = [[1, 1], [1, ω]] for some element ω ∈ F, the linear transformation M^{⊗n} corresponds to evaluating an n-variate multilinear polynomial over F on all inputs in {1, ω}^n. Our main result is as follows:

Theorem 1.1.
Let F be any field, and let M ∈ F^{2×2} be any matrix over F. There is a universal constant ε > 0 such that, for any positive integers n, d, the linear transformation M^{⊗n} ∈ F^{N×N} for N = 2^n has a depth-d synchronous linear circuit of size 2^ε · d · N^{1+(1−ε)/d}. When M = H_1, so that M^{⊗n} is the Walsh-Hadamard transform H_n, we can improve the bound to ε > 0.04.

Our new result shows that H_n has a depth-2 linear circuit of size only O(N^{1.48}), and more generally improves the size of a depth-d linear circuit for H_n, or for the nth Kronecker power of any fixed 2 × 2 matrix, when d < o(log n). When d divides n, we can improve the upper bound to d · N^{1+(1−ε)/d}, removing the 2^ε factor. This construction may be of practical interest, as it improves on the previous bound of d · N^{1+1/d} even for small constant values of N and d.

Theorem 1.1 is also particularly interesting when compared to a lower bound of Pudlák [Pud00] against low-depth linear circuits with bounded coefficients for computing H_n over C. Recall that in a linear circuit over C, each gate computes a C-linear combination of its inputs. For a positive real number c, we say the circuit has c-bounded coefficients if, for each gate, the coefficients of the linear combination are complex numbers of magnitude at most c. Motivated by the fact that the best known linear circuits for many important linear transformations, including the Walsh-Hadamard transform and the discrete Fourier transform, use only 1-bounded coefficients (prior to this paper), a line of work [Mor73, Cha94, Lok01, NW96, Pud00, BL04, Raz02] (see also [Lok09, Section 3.3]) has shown strong, often tight lower bounds for linear circuits with bounded coefficients. Pudlák [Pud00] showed that the aforementioned circuit of depth d and size O(d · N^{1+1/d}) is optimal for bounded-coefficient circuits:

Theorem 1.2 ([Pud00]). Any depth-d synchronous linear circuit with c-bounded coefficients for computing the Walsh-Hadamard transform H_n ∈ C^{N×N} for N = 2^n has size ≥ d · N^{1+1/d}/c².
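To make the O(d · N^{1+1/d}) construction concrete: when d divides n, H_n factors into d sparse layers, H_n = A_1 ⋯ A_d with A_j = I_{q^{j−1}} ⊗ H_{n/d} ⊗ I_{q^{d−j}} for q = 2^{n/d}, and each layer has exactly N^{1+1/d} nonzero entries (wires). A small numpy check of this standard factorization (our own illustration, not code from the paper):

```python
import numpy as np

def hadamard(m):
    # The 2^m x 2^m Walsh-Hadamard transform H_m as a dense matrix.
    H = np.array([[1]])
    for _ in range(m):
        H = np.kron(np.array([[1, 1], [1, -1]]), H)
    return H

n, d = 4, 2                      # N = 16, depth 2
q = 2 ** (n // d)                # each layer acts on blocks of size q
Hm = hadamard(n // d)

# Layer j applies H_{n/d} along one block of coordinates, identity elsewhere.
layers = [np.kron(np.eye(q ** j, dtype=int),
                  np.kron(Hm, np.eye(q ** (d - 1 - j), dtype=int)))
          for j in range(d)]

product = np.linalg.multi_dot(layers) if d > 1 else layers[0]
assert np.array_equal(product, hadamard(n))       # the layers compute H_n
for A in layers:
    assert np.count_nonzero(A) == q ** (d + 1)    # = N^(1 + 1/d) wires each
```

Summing over the d layers gives the d · N^{1+1/d} wires discussed above; Theorem 1.2 says no bounded-coefficient circuit can beat this, while Theorem 1.1 does so using coefficients of magnitude poly(N).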
Our Theorem 1.1 circumvents this lower bound by using large coefficients. Indeed, we will see that over F = C, we use coefficients which are integers of magnitude up to N^{O(1)}. That said, it should be noted that, since our coefficients are only O(log N)-bit integers, the additional time required to do the arithmetic for the coefficients of our circuit is still negligible compared to the circuit size savings in any reasonable model of computation.

To our knowledge, this is the first non-trivial upper bound surpassing one of the aforementioned bounded-coefficient lower bounds. This shows that using larger coefficients can make a substantial difference in the circuit size required, even when computing the linear transformation of a matrix whose entries are all in {−1, 1}. At the same time, it is interesting to note that our Theorem 1.1 works over any field, even a constant-sized finite field like F_2 where there are no 'large' coefficients. One could have imagined that overcoming bounded-coefficient lower bounds, when possible, requires using an infinite field and large coefficients, but at least in this setting, that is not the case.

Our proof of Theorem 1.1 begins with a new general framework for designing smaller low-depth circuits for recursively-defined families of matrices like H_n. We show that a nontrivial synchronous circuit construction for any fixed matrix in the family leads to a smaller circuit for every matrix in the family.

Lemma 1.3.
Let M ∈ F^{q×q} be a q × q matrix over any field F, and suppose there are matrices A_1, ..., A_d such that M = Π_{j=1}^d A_j and nnz(A_i) ≤ q^c for all i ∈ [d]. Then, for every positive integer n, letting N = q^n, the N × N matrix M^{⊗n} has a depth-d synchronous linear circuit of size O(N^c).

Lemma 1.3 follows by simply calculating how taking a Kronecker power changes the given circuit for M, but it is nonetheless conceptually interesting: in order to design a small circuit for the entire family of matrices M^{⊗n}, it suffices to design one for any fixed matrix in the family. Lemma 1.3 is similar to the approach for designing matrix multiplication algorithms spearheaded by Strassen [Str69], where an identity for quickly multiplying fixed-size matrices implies asymptotic improvements for multiplying matrices of any sizes. Our proof was inspired by this, as Kronecker products also play a central role in the definition and study of matrix multiplication tensors.

We then use rigidity upper bounds for the q × q matrix M to construct fixed upper bounds. One can see by concatenating the two parts of a non-rigidity expression for M that, for any rank r, we can find matrices B, C with M = B × C, nnz(B) = q(r + 1), and nnz(C) = q · r + R_M(r). We can 'symmetrize' this construction using a Kronecker product trick, then apply Lemma 1.3 to yield:

Lemma 1.4.
Let M ∈ F^{q×q} be a q × q matrix over any field F, let 1 ≤ r ≤ q be any rank, and define

c := 2 + log_q((r + 1) · (r + R_M(r)/q)).

Then, for any positive integer n, setting N = q^n, the N × N matrix M^{⊗n} has a depth-d synchronous circuit of size O(d · N^{c/d}).

Lemma 1.4 shows that rigidity upper bounds for H_m for a fixed m can give nontrivial low-depth circuit upper bounds for H_n for all n. Unfortunately, we cannot simply substitute in the rigidity upper bound of [AW17] to prove our result. Indeed, to achieve c < 3 for the q × q matrix H_m for q = 2^m, it is not hard to see that we need r < √q. By comparison, the bound from [AW17] is primarily interesting for higher ranks r > q^{1−ε′} for small ε′ >
0. Other known constructions, including those from probabilistic polynomials [AW15], do not seem to give a nontrivial bound here either. Instead, to prove our upper bound, we use a new rigidity upper bound for H_n for rank r = 1, and more specifically, Theorem 1.1 ultimately follows from a new construction we give for the 16 × 16 matrix H_4 showing that R_{H_4}(1) ≤ 96. We also prove a variant of Lemma 1.3 showing that any nontrivial factorization of a fixed matrix M, even an unbalanced one, yields asymptotically smaller circuits for M^{⊗n} for all n:

Lemma 1.5.
Let M ∈ F^{q×q} be a q × q matrix over any field F, and suppose there are matrices A_1, ..., A_d such that M = Π_{j=1}^d A_j, where the factorization is nontrivial in the sense that Π_{j=1}^d nnz(A_j) ≤ q^{d+c′} for some c′ < 1. Then, for every positive integer n, letting N = q^n, the N × N matrix M^{⊗n} has a depth-d synchronous circuit of size O(d · N^{1+c/d}) for a constant c < 1 which depends only on c′.

Note that one could achieve c′ = 1 in Lemma 1.5 trivially by picking A_1 = M and A_2 = ··· = A_d = I_q, the q × q identity matrix. Lemma 1.5 shows that any construction which improves on this at all leads to an asymptotically smaller circuit for M^{⊗n}. While Lemma 1.3 required that each A_i has nnz(A_i) < q^{1+1/d}, Lemma 1.5 instead only requires that the geometric mean of all the nnz(A_i) is less than q^{1+1/d}. However, it results in a slightly worse final size bound, which is why we use Lemma 1.3 to prove Theorem 1.1.

It is natural to ask next whether our techniques can be used to overcome other bounded-coefficient lower bounds. We discuss a few more:
Unbounded-Depth Circuits for H_n. Pudlák [Pud00] also showed a lower bound against unbounded-depth bounded-coefficient synchronous linear circuits for computing H_n.

Theorem 1.6 ([Pud00]). Any synchronous linear circuit with c-bounded coefficients for computing the Walsh-Hadamard transform H_n ∈ C^{N×N} for N = 2^n has size ≥ (e · log_e(2)/c²) · N log₂ N.

For c = 1 (as is the case in all previous circuits for H_n), this gives a lower bound of e · log_e(2) · N log₂ N ≈ 1.88 · N log₂ N. This is known to be tight, as optimizing over d in the usual fast Walsh-Hadamard transform gives a matching upper bound. In fact, we give a new construction which also beats this lower bound, although only by a constant factor.

Theorem 1.7.
Let F be any field, and let M ∈ F^{2×2} be any matrix over F. There is a constant ε > 0 such that, for any positive integer n, the linear transformation M^{⊗n} ∈ F^{N×N} for N = 2^n has a synchronous linear circuit of size (1 − ε + o(1)) · e · log_e(2) · N log₂ N. When M = H_1, so that M^{⊗n} is the Walsh-Hadamard transform H_n, we can improve the bound to ε > 0.04.

It is no coincidence that our bounds on ε in Theorem 1.7 are the same as those in Theorem 1.1: we prove Theorem 1.7 by introducing a gadget which increases the depth in Theorem 1.1 but removes the additional unwanted 2^ε term in the circuit size (which would otherwise impact our constant-factor savings), and then optimizing over all choices of d.

Of course, it would be much more exciting to design a circuit of size o(N log N) for H_n, but that is currently beyond our techniques. That said, we believe Theorem 1.7 gives the first improvement of any kind on the standard fast Hadamard transform for computing H_n, and we are optimistic that further improvements are possible.

Circuits for the Fourier Transform.
Pudlák showed that both Theorem 1.2 and Theorem 1.6 also hold for the discrete Fourier transform F_N ∈ C^{N×N}. Can our approach be used to beat these lower bounds as well? We remark that F_N is actually too rigid for our approach using Lemma 1.4 to apply to overcome this bound. Interestingly, the rigidity lower bound we use to show this is not the asymptotically best known bound of R_{F_N}(r) ≥ Ω((N²/r) · log(N/r)), but instead the bound R_{F_N}(r) ≥ (N − r)²/(r + 1) [Shp99], which has better known constant factors for small r.

It should be noted that we do not rule out the existence of o(d · N^{1+1/d})-size depth-d linear circuits for F_N, or even rule out that Lemma 1.3 could be used to construct such circuits. However, an approach different from our non-rigidity approach would be needed to give the nontrivial construction needed by Lemma 1.3.

Matrix Multiplication.
Raz [Raz02] showed that any bilinear circuit with bounded coefficients for computing the product of two N × N matrices over C requires size Ω(N² log N). This is not known to be tight: the best known circuit for N × N × N matrix multiplication has size N^{ω+o(1)}, where ω ≤ 2.373 [Wil12, LG14, AW21] is the matrix multiplication exponent. That said, as we will discuss soon in more detail in Section 1.4, there is a strong connection between this lower bound and the aforementioned bounded-coefficient lower bounds: if one could surpass Raz's lower bound and design an o(N² log N) size circuit for matrix multiplication, it would lead to linear circuits of size o(N log N) for both the N × N discrete Fourier transform and the N × N Walsh-Hadamard transform, as well as many related linear transformations.
Our next upper bound is a new non-rigidity result, which generalizes and sheds new light on the non-rigidity of the Walsh-Hadamard transform [AW17]. We focus on two families of matrices M which generalize H_n.

1. Matrices M ∈ F^{q^n × q^n} of the form M = ⊗_{i=1}^n M_i for positive integers q, n and any matrices M_1, ..., M_n ∈ F^{q×q} (where ⊗ denotes the Kronecker product). Kronecker power matrices like H_n which we discussed earlier are of this form with M_1 = M_2 = ··· = M_n, but here we also allow for different choices of the matrices M_1, ..., M_n.

2. Matrices M ∈ F^{q^n × q^n} whose entries are given by, for x, y ∈ {0, 1, ..., q−1}^n,

M[x, y] = f(max{x[1], y[1]}, max{x[2], y[2]}, max{x[3], y[3]}, ..., max{x[n], y[n]})

for any function f : {0, 1, ..., q−1}^n → F. For instance, H_n is of this form with q = 2 when f is the parity function, but we also allow for more complicated choices of f.

(Footnote: Morgenstern [Mor73] first showed such a result for linear circuits which need not be synchronous, with slightly lower leading constant factors.)

Theorem 1.8. Any matrix of either of the above forms with q ≤ O(log n) is not Valiant-rigid. More precisely, setting N = q^n, any such M satisfies, for any sufficiently small ε > 0:

R_M(N^{1 − q^{−O(q)} · ε²/log(1/ε)}) ≤ N^{1+ε}.

The constant hidden by the O in Theorem 1.8 is not too small; for instance, we show that when q = 2, any such M has R_M(O(N^{0.98})) < o(N²).

Theorem 1.8 shows that it was not just a 'coincidence' that H_n is not rigid, but in fact a number of big families of matrices generalizing H_n are also not rigid. It, of course, rules out the Valiant-rigidity approach for proving circuit lower bounds for any of these linear transformations. We now discuss the two families of matrices in some more detail.

1. Aside from being a natural generalization of H_n, Kronecker products like this are ubiquitous in many areas of computational science (see e.g. [VL00]).
The non-rigidity of these matrices is also interesting compared with our observation, which we discuss in detail in the upcoming Section 1.4, that if there were Valiant-rigid matrices in this family for any fixed n and growing q, then we would get a lower bound for N × N × N^{n−1} matrix multiplication. By comparison, Theorem 1.8 shows there are no Valiant-rigid matrices in this family for fixed q and growing n. The difference between this family of matrices when n is growing versus when q is growing is not unlike the difference between the families of Walsh-Hadamard transforms and Fourier transforms (which are both Hadamard matrices, for different choices of which of the two defining parameters is growing). Perhaps the techniques of [DL19] for showing that Fourier transforms are not rigid could help to approach this other setting.

2. As noticed by [AW17], matrices of this form for different choices of the function f : {0, 1, ..., q−1}^n → F arise frequently in fine-grained complexity, especially in the case q = 2. In fact, the best known algorithms for a number of different problems have used, as their key insight, the fact that this type of matrix M is not rigid, including for the Orthogonal Vectors problem [AWY14] (for f = AND), All-Pairs Shortest Paths [Wil14] (also for f = AND), and Hamming Nearest Neighbors [AW15, ACW16] (for f = MAJORITY). These algorithms all use the 'polynomial method' to show that M is not rigid in a low-rank, high-error regime, but it is unclear how to extend them to less structured functions f.
By comparison, Theorem 1.8 shows that M is not rigid in a higher-rank, lower-error regime, and it applies to any function f. In fact, in addition to the aforementioned algorithms, all the prior work on showing that matrices of interest are not Valiant-rigid [AW17, DE19, DL19] has used the polynomial method. For instance, the previous proof of the non-rigidity of the Walsh-Hadamard transform [AW17] critically used the fact that the corresponding function f = PARITY has low-degree polynomial approximations (which are correct on most inputs) over any field. Our rigidity upper bound does not use the polynomial method (at least explicitly), and applies to any function f without any restriction on how well it can be approximated by polynomials. In other words, this central property of f that was used by prior work is actually unnecessary for proving that M is not Valiant-rigid.

Our proof of Theorem 1.8 in the case q = 2 is actually quite simple, and it simplifies the previous proof of the non-rigidity of the Walsh-Hadamard transform. Inspired by Dvir and Liu [DL19], who frequently make use of the fact that the product of a constant number of matrices which are not Valiant-rigid is, itself, not Valiant-rigid (see Lemma 2.11 below), we begin by noticing that any matrix M from either of the two families can be written as

M = D × R_n × D′ × R_n × D′′,    (1)

where D, D′, D′′ ∈ F^{2^n × 2^n} are three carefully-chosen diagonal matrices (which are evidently not Valiant-rigid), and R_n ∈ {0, 1}^{2^n × 2^n} is the disjointness matrix, given by R_n := R^{⊗n} where

R := [ 1  1 ]
     [ 1  0 ].

Thus, to show that any such M is not Valiant-rigid, it suffices to show that R_n is not Valiant-rigid. However, this is not too difficult, since R_n is a fairly sparse matrix to begin with! Indeed, R_n is a 2^n × 2^n matrix, but it has only 3^n nonzero entries.
Moreover, most of these nonzero entries are concentrated in a few rows and columns: for each integer 0 ≤ k ≤ n, the matrix R_n has (n choose k) rows (or columns) with 2^k nonzero entries. Using standard bounds on binomial coefficients, we thus see that, by removing only the 2^{n(1−Θ(ε²/log(1/ε)))} densest rows and columns of R_n, we are left with a matrix with only 2^{n·ε} nonzero entries per row or column. Since changing a single row or column of a matrix is a rank-1 update, this shows that R_n is not Valiant-rigid, as desired.

Extending this result to larger q is quite a bit more involved. Let us focus for now on family 1 of matrices above (Kronecker products of n different q × q matrices); the proof for family 2 is similar. We will proceed by induction on q. Our starting point is the remark that any q × q matrix M_i can be written as the sum of a q × q rank-1 matrix J_i and a (q − 1) × (q −
1) matrix L_i (padded with a row and column of 0s). For instance, in the case q = 3 we have (assuming the top-left entry a is nonzero):

[ a  b  c ]   [ a   b      c    ]   [ 0   0          0        ]
[ d  e  f ] = [ d   bd/a   cd/a ] + [ 0   e − bd/a   f − cd/a ]
[ g  h  i ]   [ g   bg/a   cg/a ]   [ 0   h − bg/a   i − cg/a ].

We have now written M_i = J_i + L_i, and we know that ⊗_{i=1}^n J_i is not Valiant-rigid (in fact, it has rank 1), and ⊗_{i=1}^n L_i is not Valiant-rigid, even when thought of as a (q−1)^n × (q−1)^n matrix, by the inductive hypothesis. This does not imply that ⊗_{i=1}^n M_i is not Valiant-rigid on its own, however, because there are cross-terms:

⊗_{i=1}^n M_i = ⊗_{i=1}^n (J_i + L_i) = Σ_{K ⊆ {1,2,...,n}} ⊗_{i=1}^n ([i ∈ K] ? L_i : J_i).

(Here, we are using ([i ∈ K] ? L_i : J_i) as the ternary operator, which equals L_i when i ∈ K, and equals J_i when i ∉ K.) For any particular K, the matrix M_K := ⊗_{i=1}^n ([i ∈ K] ? L_i : J_i) can be seen as the Kronecker product of a q^{n−|K|} × q^{n−|K|} matrix of rank 1 and a q^{|K|} × q^{|K|} matrix which, by the inductive hypothesis, is not Valiant-rigid. It can be shown (see e.g. [DL19, Section 6]) that the Kronecker product of matrices which are not Valiant-rigid is itself not Valiant-rigid, and hence that M_K is not Valiant-rigid. However, this is still not sufficient: we have now only expressed M as the sum of 2^n matrices which are not Valiant-rigid, but whose sum might still be.

We instead first perform a number of low-rank updates to M to simplify the problem. We first subtract away all the matrices M_K for which |K| is not close to (q − 1)n/q. Next, we remove all rows and columns corresponding to x ∈ {0, 1, ..., q−1}^n for which nnz(x) is not close to (q − 1)n/q. Finally, we observe that each remaining row of M only intersects with a nonzero row of q^{O(ε·n)} different choices of remaining matrices M_K (compared with 2^n before). Hence, the fact that each M_K is not Valiant-rigid implies our desired non-rigidity, as the sparsity per row is now only multiplied by q^{O(ε·n)}.
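Equation (1) can be checked concretely in the Walsh-Hadamard case. The diagonal entries below are one valid choice that we solved for by hand for H_1 (the text only asserts that some carefully-chosen diagonals exist); Kronecker-powering the 2 × 2 identity then gives Equation (1) for every H_n:

```python
import numpy as np

R = np.array([[1, 1], [1, 0]])            # disjointness matrix R_1
H1 = np.array([[1, 1], [1, -1]])          # Walsh-Hadamard H_1

# H_1 = D x R x D' x R x D'' for these hand-solved, illustrative diagonals:
D, Dp, Dpp = np.diag([-1, 1]), np.diag([1, -2]), np.diag([1, -1])
assert np.array_equal(D @ R @ Dp @ R @ Dpp, H1)

def kpow(A, n):
    # n-th Kronecker power of A
    P = np.array([[1]])
    for _ in range(n):
        P = np.kron(P, A)
    return P

# The factorization, and the sparsity of R_n, persist under Kronecker powers:
n = 3
Rn = kpow(R, n)
assert np.array_equal(kpow(D, n) @ Rn @ kpow(Dp, n) @ Rn @ kpow(Dpp, n),
                      kpow(H1, n))
assert np.count_nonzero(Rn) == 3 ** n     # R_n has only 3^n nonzero entries
```

The second pair of assertions is exactly the mechanism described above: the mixed-product property turns a 2 × 2 identity into an identity for all n, and all the density of H_n is pushed into the two copies of the sparse matrix R_n.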
We have, of course, glossed over many important and intricate aspects of the proof; we refer the reader to Section 7 for the details.

We briefly remark that the techniques for manipulating Kronecker products used by Dvir and Liu [DL19] do not appear sufficient to prove our Theorem 1.8. They observed that the Kronecker product of matrices M_1, ..., M_n which are not Valiant-rigid is itself not Valiant-rigid. In particular, they begin with a decomposition M_i = J_i + L_i where J_i has low rank, like in our setting, but they further assume that L_i is very sparse. In our case, M_1, ..., M_n are arbitrary matrices, and may all be very rigid on their own, and so a more intricate argument seems necessary.

As we previously mentioned, Raz [Raz02] showed that any bilinear circuit with bounded coefficients for computing the product of two N × N matrices over C requires size Ω(N² log N). A key insight behind Raz's lower bound is that, for a fixed matrix A ∈ F^{N×N}, the following two problems are equivalent:

• Given as input a matrix B ∈ F^{N×N}, output the matrix A × B.
• Given as input a vector b ∈ F^{N²}, output the linear transformation (I_N ⊗ A)b.

In particular, if one could show that there is any matrix A ∈ F^{N×N} for which the linear transformation I_N ⊗ A ∈ F^{N²×N²} does not have O(N²) size circuits, then N × N × N matrix multiplication does not have O(N²) size circuits. One intriguing avenue toward showing this is to show that there exists an A ∈ F^{N×N} such that I_N ⊗ A is Valiant-rigid. In contrast with the usual setting in matrix rigidity, here, to show a lower bound against a particular problem (matrix multiplication), it suffices to show that there exists a rigid matrix among a large family of matrices.
(Roughly, Raz's lower bound is proved by showing there exists an A ∈ F^{N×N} such that I_N ⊗ A has a high value of a variant of rigidity which corresponds to bounded-coefficient circuits.)

We take this observation further, showing that there is a much larger family of matrices for which a circuit lower bound would imply lower bounds for matrix multiplication. The key idea is the following algorithm for using matrix multiplication to compute linear transformations defined by Kronecker products (which is not very difficult to prove, and is likely folklore):

Proposition 1.9.
For any field F, and any fixed positive integer k, suppose that N × N × N^{k−1} matrix multiplication over F has an arithmetic circuit of size o(N^k log N). Then, the N × N Fourier transform, the N × N Walsh-Hadamard transform, and any transform which can be written as the Kronecker product of k different N^{1/k} × N^{1/k} size matrices, have arithmetic circuits of size o(N log N).

Applying Proposition 1.9 with k = 2, we see that if one shows there are any matrices A, B ∈ F^{N×N} such that A ⊗ B ∈ F^{N²×N²} requires circuits of size Ω(N² log N) (perhaps making use of a proof that A ⊗ B is Valiant-rigid, or in some other way), then N × N × N matrix multiplication requires circuits of size Ω(N² log N). By comparison, even for very simple matrices of the form A ⊗ B, such as the N² × N² discrete Fourier transform or Walsh-Hadamard transform, the best known circuit size is only Θ(N² log N).

(Footnote: Actually, showing that A ⊗ B is Valiant-rigid would only prove an ω(N²) lower bound against O(log N)-depth circuits for N × N × N matrix multiplication. Normally, an O(log N) depth restriction on circuits for N × N × N matrix multiplication is not very limiting, since it is known that arithmetic circuits for matrix multiplication can be converted into logarithmic-depth circuits with only an O(N^ε) blowup in size for any ε > 0. However, in our setting, where the resulting lower bounds are only for size Ω(N² log N), this N^ε term may be non-negligible.)

Proposition 1.9 becomes more exciting from an algorithmic perspective as we consider larger k. For k = 2, the upper bound of o(N² log N) needed for N × N × N matrix multiplication is quite far from the best known bound of O(N^{2.373}). However, as k grows, the exponent is known to approach k as well:

Proposition 1.10 ([HP98]). For every field F and integer k > 2, there is a circuit of size O(N^{k·log_{k−1}(k)}) for performing N × N × N^{k−1} matrix multiplication. Here, the O is hiding a function of k.
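The reduction behind Proposition 1.9, in the k = 2 case, is essentially the classical 'vec trick': with row-major vectorization, (A ⊗ B) · vec(X) = vec(A X Bᵀ), so applying a Kronecker-product transform costs two matrix products rather than one large matrix-vector product. A quick numpy check (our own illustration; larger k iterates this with rectangular products):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
A = rng.integers(-3, 4, (N, N))
B = rng.integers(-3, 4, (N, N))
x = rng.integers(-3, 4, N * N)

direct = np.kron(A, B) @ x                          # one (N^2 x N^2) transform
via_mm = (A @ x.reshape(N, N) @ B.T).reshape(-1)    # two N x N x N products
assert np.array_equal(direct, via_mm)
```

In particular, a circuit of size s for N × N × N matrix multiplication yields a linear circuit of size O(s) for every transform A ⊗ B, which is the contrapositive view taken above: a strong enough lower bound for any such A ⊗ B transfers to matrix multiplication.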
Note that the exponent is

k · log_{k−1}(k) = k + O(1/log k).

In fact, working through the details (see Section 8 below), we find that for a slightly super-constant choice of k = log N/log log N, a circuit of size O(N^{k·log_{k−1}(k)}) for N × N × N^{k−1} matrix multiplication would lead to an o(N log N) time algorithm for the N × N Fourier transform and the N × N Walsh-Hadamard transform. Unfortunately, this is not exactly what is guaranteed to us by Proposition 1.10; we only know there is such a circuit of size f(k) · N^{k·log_{k−1}(k)} for some function f. When k is super-constant, the term f(k), which is usually part of the leading constant in fast matrix multiplication algorithms, becomes relevant and may swamp our other savings. We show in Section 8 below that any bound f(k) < o(log k) would suffice to speed up the N × N Fourier transform and the N × N Walsh-Hadamard transform. The growth of f(k) in fast rectangular matrix multiplication algorithms is typically not the focus of study, as one typically thinks of k as a constant, but it may warrant further investigation!

For our last new upper bound, we remark that some ideas in the proof of Theorem 1.8 can be used to extend certain algorithms for the Orthogonal Vectors problem (which corresponds to the disjointness matrix R_n) to a more general class of problems. Recall that in the Orthogonal Vectors problem, we are given as input m vectors from {0,1}^d, and the goal is to determine whether there is a pair which is orthogonal (over Z).
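The fast algorithm in this regime can be sketched in a few lines: bucket the m input vectors into a table of size 2^d, apply the linear transformation R_d, which factors into d sparse layers (one per coordinate, since R_d = R^{⊗d}), and read off for each input how many inputs are disjoint from it. This is our own illustrative implementation of that folklore idea:

```python
def r_transform(v, d):
    # w = R_d @ v in O(d * 2^d) additions, where R_d = R^{(x) d}, R = [[1,1],[1,0]];
    # the result satisfies w[s] = sum of v[t] over all t with s AND t == 0.
    v = list(v)
    for i in range(d):
        bit = 1 << i
        for s in range(1 << d):
            if not s & bit:
                v[s], v[s | bit] = v[s] + v[s | bit], v[s]
    return v

def count_orthogonal_pairs(vectors, d):
    # Number of ordered pairs (s, t) of input vectors with s AND t == 0,
    # in O(m + d * 2^d) time instead of the straightforward O(m^2 * d).
    cnt = [0] * (1 << d)
    for s in vectors:
        cnt[s] += 1
    w = r_transform(cnt, d)
    return sum(c * w[s] for s, c in enumerate(cnt) if c)

vecs = [0b011, 0b100, 0b100, 0b110]           # two orthogonal unordered pairs
assert count_orthogonal_pairs(vecs, 3) == 4   # counted here as 4 ordered pairs
```

Replacing the single transform by the five-factor product of Equation (1), with the function f folded into the middle diagonal, gives the O(m + (d + T) · 2^d) generalization discussed next.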
Equivalently, we are given as input m row and column indices into the matrix R_d, and we want to determine whether there are any 1s in the corresponding submatrix. This can be solved in O(m² · d) time (and even faster when d ≤ O(log m) [AWY14]), but in the regime when m ≥ Ω̃(2^{d/2}), there is a faster folklore algorithm running in time only O(m + d · 2^d). In fact, this latter algorithm corresponds directly to the fact that the linear transformation R_d can be computed in time O(d · 2^d).

Using Equation (1), we can extend this to a more general class of problems, defined as follows. Let f : {0,1}^d → F be a function which can be evaluated in time T. Then, given as input a set S ⊆ {0,1}^d of size |S| = m, there is an algorithm running in time O(m + (d + T) · 2^d) for computing, for all s ∈ S, the sum Σ_{t∈S} f(s[1] ∧ t[1], s[2] ∧ t[2], ..., s[d] ∧ t[d]). When f = NOR, this algorithm counts the number of pairs of Orthogonal Vectors. However, other functions f correspond to other interesting tasks. For instance, when f is a threshold function (such as MAJORITY), this algorithm counts the number of pairs of points which share a certain number of 1s in common, which is a basic nearest-neighbor-search problem, in time O(m + d · 2^d). This improves on the more straightforward O(m² · d) time algorithm for this problem when 2^d = o(m²).

(Footnote: The only work proving something like a bound on f(k) that the author is aware of is Williams' [Wil14] analysis of Coppersmith's [Cop82] rectangular matrix multiplication algorithm. He shows the algorithm for N × N^{0.172} × N matrix multiplication has a running time of only N² polylog(N), compared to the bound of O(N^{2+ε}) for any ε > 0.)

1.6 Other Related Work

Rigidity Upper Bounds from Low-Depth Circuit Upper Bounds.
Our results discussed in Section 1.1 above show how rigidity upper bounds for a matrix M can be used to construct small low-depth circuits for M. Relatedly, Pudlák [Pud94] showed a type of converse: that low-depth circuit upper bounds can be used to show rigidity upper bounds.

Proposition 1.11 ([Pud94, Proposition 2]). For any field F, positive integers r, d, reals c, ε > 0, and M ∈ F^{N×N}, if M has a depth-d linear circuit of size O(d · N^{1+c/d}), then R_M(ε · N) ≤ (d/ε)^d · N^c.

Although this can be combined with our Theorem 1.1 to prove rigidity upper bounds for H_n and other Kronecker power matrices, the resulting bounds are weaker than what we prove in Theorem 1.8 using a different approach, and do not suffice to prove that these matrices are not Valiant-rigid. Perhaps there is a different way to reconcile the two?

Data Structures and Rigidity
Rigidity upper bounds are known to give rise to data structure bounds: Dvir, Golovnev, and Weinstein [DGW19] recently showed this for static data structures, and Natarajan Ramamoorthy and Rashtchian [NRR20] showed this for systematic linear data structures.
Small Depth Circuit Lower Bounds
The best-known lower bounds on the size of a depth-2 linear circuit for computing an explicit N × N linear transformation are only Ω(N · (log N / log log N)^2) for efficient error-correcting codes over constant-size finite fields [GHK+13], and Ω(N log^2 N / log log N) for matrices arising from super-concentrator graphs over larger fields [RTS00]. Two recent lower bounds were also shown for less-explicit matrices: Kumar and Volk [KV19] constructed a matrix in time exp(N^{Θ(1)}), over a field of size exp(N^{Θ(1)}), which requires depth-d circuits of size N^{1+1/(2d)}. With Chen [AC19], we construct a matrix in P^{NP} which has {0, 1} entries over any fixed-size finite field and which requires depth-2 circuits of size Ω(N · (log N)^{3/2−δ}) for any δ > 0. In other words, the known techniques are far from proving that any of the depth-d upper bounds presented here, which are of the form O(N^{1+(1−ε)/d}) for somewhat small constants ε > 0, are tight.
Other Circuit Models for Matrices
Circuit models other than linear circuits have also been studied for computing matrices in certain settings. For instance, when working with matrices over a semigroup (like the OR semigroup) or a semiring (like the SUM semiring) instead of a field, one can consider circuits where the gates compute sums from that semigroup or semiring instead. See, for instance, the book by Jukna and Sergeev which studies these models in detail [JS13]. These models have applications to areas like communication complexity, and the techniques for constructing circuits in these models often apply to the linear circuit model as well. For instance, we remark in Section 4.4 below that a construction by Jukna and Sergeev for the disjointness matrix R_n, which takes advantage of both the recursive definition and the sparsity of R_n, leads to a better upper bound for low-depth circuits for R_n than we are able to prove using our rigidity approach.

In Section 2, we introduce the notions and notation we will use, and we present a number of basic tools for working with Kronecker products and linear circuits. We then prove Theorem 1.1 in Sections 3 and 4: we prove Lemma 1.3 and Lemma 1.4 in Section 3, and then we study low-rank rigidity upper bounds for a number of families of matrices in Section 4. In Sections 5-7 we prove Theorem 1.8: we prove that R_n is not Valiant-rigid in Section 5, we show how to express other matrices of interest in terms of R_n in Section 6, and we give our extension to Kronecker products of larger matrices (the q > 2 case) in Section 7.

For a positive integer n, we write [n] := {1, 2, . . . , n} and [n]_0 := {0, 1, . . . , n − 1}. By default, we use zero-based numbering for the indices of matrices, meaning, for any set S, positive integers n, m, matrix M ∈ S^{n×m}, i ∈ [n]_0 and j ∈ [m]_0, we write M[i, j] for the corresponding entry of M.
That said, if S_n, S_m are sets of sizes |S_n| = n and |S_m| = m, we may sometimes say that the rows and columns of M are indexed by S_n and S_m, respectively. In this case, we implicitly define bijections f_{S_n} : S_n → [n]_0 and f_{S_m} : S_m → [m]_0, and then for s_n ∈ S_n and s_m ∈ S_m we write M[s_n, s_m] := M[f_{S_n}(s_n), f_{S_m}(s_m)].

For any field F, positive integers n_A, n_B, m_A, m_B, and matrices A ∈ F^{n_A×m_A}, B ∈ F^{n_B×m_B}, the Kronecker product of A and B, denoted A ⊗ B, is the matrix A ⊗ B ∈ F^{(n_A·n_B)×(m_A·m_B)}, whose rows and columns are indexed by [n_A]_0 × [n_B]_0 and [m_A]_0 × [m_B]_0, respectively, and whose entries are given by

A ⊗ B[(i_A, i_B), (j_A, j_B)] := A[i_A, j_A] · B[i_B, j_B].

The Kronecker product is not commutative in general; however, there are always permutation matrices P ∈ {0, 1}^{(n_A·n_B)×(n_A·n_B)} and P′ ∈ {0, 1}^{(m_A·m_B)×(m_A·m_B)}, which depend only on n_A, n_B, m_A, and m_B, such that A ⊗ B = P × (B ⊗ A) × P′. For a matrix A and positive integer n, we write A^{⊗n} to denote the Kronecker product of n copies of A, i.e., A^{⊗1} = A and A^{⊗n} = A^{⊗(n−1)} ⊗ A.

We will need some additional notation for dealing with more complicated Kronecker products. For positive integers n, q, matrices A, B ∈ F^{q×q}, and sets S_A ⊆ [n] and S_B = [n] \ S_A, we write A^{⊗S_A} ⊗ B^{⊗S_B} for the matrix in F^{q^n×q^n} given by, for i, j ∈ [q]_0^n,

A^{⊗S_A} ⊗ B^{⊗S_B}[i, j] := Π_{ℓ ∈ S_A} A[i[ℓ], j[ℓ]] · Π_{ℓ ∈ S_B} B[i[ℓ], j[ℓ]].

Similarly, if A ∈ F^{q×q} and B ∈ F^{q^{|S_B|}×q^{|S_B|}} then we write A^{⊗S_A} ⊗ B^{⊗S_B} for the matrix in F^{q^n×q^n} given by, for i, j ∈ [q]_0^n,

A^{⊗S_A} ⊗ B^{⊗S_B}[i, j] := Π_{ℓ ∈ S_A} A[i[ℓ], j[ℓ]] · B[i|_{S_B}, j|_{S_B}].
Here, 'i|_{S_B}' denotes i restricted to the coordinates of S_B. In addition to using ⊗ to denote the Kronecker product of matrices, we will use × to denote the (usual) product of matrices, and for emphasis, we will use · to denote the product of field elements.

2.1.3 Matrix Sparsity and Rigidity

For a matrix A ∈ F^{a_1×a_2}, its sparsity, written nnz(A), denotes the number of non-zero entries in A. We similarly define its row sparsity, nnz_r(A), to be the maximum number of non-zero entries in a row of A, and its column sparsity, nnz_c(A), to be the maximum number of non-zero entries in a column of A. Some basic properties we will use are that, for any A ∈ F^{a_1×a_2} and B ∈ F^{b_1×b_2}:

• nnz(A ⊗ B) = nnz(A) · nnz(B),
• nnz_r(A ⊗ B) = nnz_r(A) · nnz_r(B),
• if a_2 = b_1 then nnz_r(A × B) ≤ nnz_r(A) · nnz_r(B),
• if a_2 = b_1 then nnz(A × B) ≤ nnz(A) · nnz_r(B), and
• if D ∈ F^{a_1×a_1} is a diagonal matrix, then nnz(D × A) ≤ nnz(A) and nnz_r(D × A) ≤ nnz_r(A).

For a matrix A ∈ F^{a×a} and a nonnegative integer r, we write R_A(r) to denote the rank-r rigidity of A over F, which is the minimum number of entries of A which must be changed to other values in F to make its rank at most r. In other words:

R_A(r) := min_{B ∈ F^{a×a}, rank(A+B) ≤ r} nnz(B).

The definition of R_A(r) depends on the field F, which we will explicitly mention when it is not clear from context. We similarly define the rank-r row/column rigidity of A, denoted R^{rc}_A(r), to be the minimum number of entries which must be changed per row or column of A to make its rank at most r, i.e.

R^{rc}_A(r) := min_{B ∈ F^{a×a}, rank(A+B) ≤ r} max{nnz_r(B), nnz_c(B)}.

It follows that, for any positive integer r, and any A ∈ F^{a×a}, we have R_A(r) ≤ a · R^{rc}_A(r).

2.1.4 Families of Matrices

• The family of Walsh-Hadamard transforms, H_n ∈ {−1, 1}^{2^n×2^n}, is defined by

H_1 = [[1, 1], [1, −1]]

and for n ∈ N, H_n = H_1^{⊗n}.
• The family of Disjointness matrices, R_n ∈ {0, 1}^{2^n×2^n}, is defined by

R_1 = [[1, 1], [1, 0]]

and for n ∈ N, R_n = R_1^{⊗n}.

• The family of Fourier transforms, F_N ∈ C^{N×N}, is defined by picking ω_N := e^{2πi/N} to be a primitive N-th root of unity, then setting F_N[i, j] = ω_N^{i·j}.

For k ∈ N we write I_k to denote the k × k identity matrix.

• A diagonal matrix D ∈ F^{N×N} is any matrix such that, if i ≠ j, then D[i, j] = 0. D has full rank if and only if D[i, i] ≠ 0 for all i.

• A weighted permutation matrix Π ∈ F^{N×N} is a matrix with exactly one nonzero entry in each row and each column. A permutation matrix is a weighted permutation matrix in which each nonzero entry is 1.

An arithmetic circuit over a field F is a circuit whose inputs are variables and constants from F, and whose gates compute the product or the sum over F of their inputs. A linear circuit over F is a circuit whose inputs are variables from F, and whose gates compute F-linear combinations of their inputs. The depth of a circuit is the length (number of edges) of the longest path from an input to an output. The size might either be measured by number of gates, or number of wires. For a field F and matrix A ∈ F^{q_1×q_2}, we say that a circuit C computes the linear transformation A (or simply 'computes A') if C has q_2 inputs and q_1 outputs, such that on input x ∈ F^{q_2}, the output of C is A × x.

In a synchronous linear circuit, the inputs to each gate must all have the same depth. A synchronous linear circuit C of depth d for a matrix A corresponds to matrices A_1, . . . , A_d such that A = Π_{j=1}^d A_j, and the size (number of wires) of C is given by Σ_{j=1}^d nnz(A_j). Any depth-d linear circuit can be converted into a depth-d synchronous linear circuit for the same linear transformation with at most an O(d) multiplicative blow-up in the size. In this paper, O(d) will typically be negligible, so we will focus on synchronous linear circuits.

The binary entropy function H : [0, 1] → [0, 1] is defined by

H(p) := −p · log_2(p) − (1 − p) · log_2(1 − p),

where we take 0 · log_2(0) = 0. For every integer n > 0 and p ∈ (0, 1) such that p · n is an integer,

2^{n·H(p)} / (n + 1) ≤ (n choose p·n) ≤ 2^{n·H(p)}.

We will make use of the following calculations:
Lemma 2.2.
For any integer q > 1 and any real 0 < δ < 1/q − 1/(q+1) we have:

1. H(1/q) = log_2(q) − ((q−1)/q) · log_2(q−1),

2. H(1/q + δ) − H(1/q) ≤ δ · log_2(q−1) − δ^2 · q^2/((q−1) · log_e(4)) + O(δ^3), and

3. H(1/q) − H(1/q − δ) ≤ δ · log_2(q−1) + δ^2 · q^2/((q−1) · log_e(4)) + O(δ^3).

Proof. (1) is a simple rearrangement of the definition:

H(1/q) = (1/q) · log_2(q) + ((q−1)/q) · log_2(q/(q−1)) = log_2(q) − ((q−1)/q) · log_2(q−1).

To prove (2), start by writing

H(1/q + δ) − H(1/q) = ∫_{1/q}^{1/q+δ} H′(z) dz = ∫_{1/q}^{1/q+δ} log_2((1−z)/z) dz.

Since log_2((1−z)/z) is convex, we can bound this above using the midpoint value by

δ · log_2((1 − (1/q + δ/2))/(1/q + δ/2)) = δ · log_2(q−1) − δ^2 · q^2/((q−1) · log_e(4)) + O(δ^3),

where the last step is the Taylor expansion at δ = 0. Similarly, (3) follows by

H(1/q) − H(1/q − δ) ≤ δ · log_2((1 − (1/q − δ/2))/(1/q − δ/2)) = δ · log_2(q−1) + δ^2 · q^2/((q−1) · log_e(4)) + O(δ^3).

We now give a number of basic tools which will be of use throughout our proofs.
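Two of these basic tools, the mixed-product property and the factorization of a Kronecker product into per-coordinate sparse factors (Proposition 2.3 and Lemma 2.6 below), are easy to sanity-check numerically. A minimal pure-Python sketch (the helper names are ours, chosen for illustration):

```python
# Check: (A (x) B) x (C (x) D) = (A x C) (x) (B x D), and the factorization of
# A (x) B (x) C into per-coordinate factors, each sparse in one coordinate.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    ra, ca, rb, cb = len(A), len(A[0]), len(B), len(B[0])
    return [[A[i // rb][j // cb] * B[i % rb][j % cb]
             for j in range(ca * cb)] for i in range(ra * rb)]

def identity(k):
    return [[int(i == j) for j in range(k)] for i in range(k)]

A = [[1, 1], [1, -1]]   # H_1
B = [[1, 1], [1, 0]]    # R_1
C = [[2, 3], [5, 7]]
D = [[1, 0], [4, 1]]

# mixed-product property:
lhs = matmul(kron(A, B), kron(C, D))
rhs = kron(matmul(A, C), matmul(B, D))

# per-coordinate factorization of A (x) B (x) C (in the shape of Lemma 2.6):
full = kron(kron(A, B), C)
f1 = kron(A, identity(4))                     # A in coordinate 1
f2 = kron(kron(identity(2), B), identity(2))  # B in coordinate 2
f3 = kron(identity(4), C)                     # C in coordinate 3
prod = matmul(matmul(f1, f2), f3)
```

Each factor here has only q · q^{n−1} nonzero entries rather than q^{2n}, which is the starting point for the sparse factorizations constructed in Section 3.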
Proposition 2.3 (The mixed-product property). Let F be any field, and let A ∈ F^{a_1×a_2}, B ∈ F^{b_1×b_2}, C ∈ F^{c_1×c_2}, D ∈ F^{d_1×d_2} be any matrices over F with a_2 = c_1 and b_2 = d_1. Then,

(A ⊗ B) × (C ⊗ D) = (A × C) ⊗ (B × D).

Proposition 2.4.
For any field F, any positive integers a, b, and any matrices A ∈ F^{a×a} and B ∈ F^{b×b}, we have rank(A ⊗ B) = rank(A) · rank(B).

Proposition 2.5.
For any field F, integers d_1, d_2, d_3, d_4, and matrices X_1 ∈ F^{d_1×d_2}, X_2 ∈ F^{d_2×d_3}, X_3 ∈ F^{d_1×d_4}, and X_4 ∈ F^{d_4×d_3}, we have

X_1 × X_2 + X_3 × X_4 = (X_1 | X_3) × (X_2 ; X_4),

where we are writing '|' to denote horizontal matrix concatenation, and ';' to denote stacking X_2 on top of X_4.

Lemma 2.6.
For any field F, positive integers q, n, and matrices M_1, . . . , M_n ∈ F^{q×q}, we have

⊗_{i=1}^n M_i = Π_{i=1}^n ( M_i^{⊗{i}} ⊗ (I_{q^{n−1}})^{⊗[n]\{i}} ).   (2)

Proof.
We proceed by induction on n. The base case n = 1 is true since then the right-hand side of Equation (2) is simply equal to M_1. For the inductive step, we see that

⊗_{i=1}^n M_i = (⊗_{i=1}^{n−1} M_i) ⊗ M_n
= ( Π_{i=1}^{n−1} M_i^{⊗{i}} ⊗ (I_{q^{n−2}})^{⊗[n−1]\{i}} ) ⊗ M_n
= ( ( Π_{i=1}^{n−1} M_i^{⊗{i}} ⊗ (I_{q^{n−2}})^{⊗[n−1]\{i}} ) × I_{q^{n−1}} ) ⊗ (I_q × M_n)
= ( ( Π_{i=1}^{n−1} M_i^{⊗{i}} ⊗ (I_{q^{n−2}})^{⊗[n−1]\{i}} ) ⊗ I_q ) × (I_{q^{n−1}} ⊗ M_n)   (by Proposition 2.3)
= ( Π_{i=1}^{n−1} ( (M_i^{⊗{i}} ⊗ (I_{q^{n−2}})^{⊗[n−1]\{i}}) ⊗ I_q ) ) × (I_{q^{n−1}} ⊗ M_n)
= ( Π_{i=1}^{n−1} M_i^{⊗{i}} ⊗ (I_{q^{n−1}})^{⊗[n]\{i}} ) × (I_{q^{n−1}} ⊗ M_n)
= Π_{i=1}^n M_i^{⊗{i}} ⊗ (I_{q^{n−1}})^{⊗[n]\{i}},

as desired.

Definition 2.7.
For any field F, positive integer q, and matrix M ∈ F^{q×q}, we say M is an outer-1 matrix if, for all i, j ∈ {0, 1, . . . , q−1} with i = 0 or j = 0 (or both) we have M[i, j] = 1. We similarly say M is an outer-0 matrix if we have M[i, j] = 0 for all such i, j, and an outer-nonzero matrix if we have M[i, j] ≠ 0 for all such i, j.

Lemma 2.8.
For any field F, positive integer q, and outer-nonzero matrix M ∈ F^{q×q}, there are

• an outer-1 matrix M′ ∈ F^{q×q}, and
• two invertible diagonal matrices D, D′ ∈ F^{q×q},

such that M = D × M′ × D′.

Proof. We first define the diagonal matrices G, G′ ∈ F^{q×q} by: for i ∈ {0, 1, . . . , q−1}, set G[i, i] = 1/M[i, 0] and G′[i, i] = M[0, 0]/M[0, i]. These are well-defined and invertible since M is an outer-nonzero matrix. Let M′ = G × M × G′; we can see that for any i ∈ {0, 1, . . . , q−1} we have M′[i, 0] = M[i, 0] · G[i, i] · G′[0, 0] = M[i, 0] · (1/M[i, 0]) · (M[0, 0]/M[0, 0]) = 1, and for any j ∈ {0, 1, . . . , q−1} we have M′[0, j] = M[0, j] · G[0, 0] · G′[j, j] = M[0, j] · (1/M[0, 0]) · (M[0, 0]/M[0, j]) = 1, so M′ is an outer-1 matrix. Finally we can pick D = G^{−1} and D′ = G′^{−1} so that M = D × M′ × D′.

Lemma 2.9.
For any field F, positive integers n, q, and outer-nonzero matrices M_1, . . . , M_n ∈ F^{q×q}, there are

• outer-1 matrices M′_1, . . . , M′_n ∈ F^{q×q}, and
• two invertible diagonal matrices D, D′ ∈ F^{q^n×q^n},

such that ⊗_{ℓ=1}^n M_ℓ = D × (⊗_{ℓ=1}^n M′_ℓ) × D′.

Proof. By Lemma 2.8, for each ℓ ∈ [n], there are invertible diagonal matrices D_ℓ, D′_ℓ ∈ F^{q×q} and an outer-1 matrix M′_ℓ ∈ F^{q×q} such that M_ℓ = D_ℓ × M′_ℓ × D′_ℓ. Then, by Proposition 2.3,

⊗_{ℓ=1}^n M_ℓ = ⊗_{ℓ=1}^n (D_ℓ × M′_ℓ × D′_ℓ) = (⊗_{ℓ=1}^n D_ℓ) × (⊗_{ℓ=1}^n M′_ℓ) × (⊗_{ℓ=1}^n D′_ℓ).

We can thus pick D = ⊗_{ℓ=1}^n D_ℓ and D′ = ⊗_{ℓ=1}^n D′_ℓ as desired.

Lemma 2.10.
For any field F, positive integers q, r, and matrices A, B, D, D′ ∈ F^{q×q} such that D and D′ are invertible diagonal matrices with A = D × B × D′, we have that R_A(r) = R_B(r).

Proof. By definition of R_B(r), there are matrices L, S ∈ F^{q×q} such that rank(L) ≤ r, nnz(S) ≤ R_B(r), and B = L + S. It follows that A = D × L × D′ + D × S × D′. Since multiplying on the left or right by a full-rank diagonal matrix does not change the rank or sparsity of a matrix, this expression shows that R_A(r) ≤ R_B(r). A symmetric argument also shows that R_A(r) ≥ R_B(r) as desired.

The next Lemma, which shows that the product of non-rigid matrices is also non-rigid, was also used by [DL19, Lemma 2.18].

Lemma 2.11.
For any field F, positive integers q, r, and matrices A, B, C, D ∈ F^{q×q} with D a diagonal matrix and C = A × D × B, we have that R^{rc}_C(2r) ≤ R^{rc}_A(r) · R^{rc}_B(r).

Proof. Let s_A := R^{rc}_A(r) and s_B := R^{rc}_B(r). Write A = L_A + S_A and B = L_B + S_B where L_A, L_B, S_A, S_B ∈ F^{q×q} are matrices with rank(L_A) ≤ r, rank(L_B) ≤ r, nnz_r(S_A) ≤ s_A, nnz_c(S_A) ≤ s_A, nnz_r(S_B) ≤ s_B, and nnz_c(S_B) ≤ s_B. We have that

C = (L_A + S_A) × D × (L_B + S_B) = L_A × D × (L_B + S_B) + S_A × D × L_B + S_A × D × S_B.

The first two matrices in the right-hand side, L_A × D × (L_B + S_B) and S_A × D × L_B, both have rank at most r, since L_A and L_B have rank at most r. The third, M := S_A × D × S_B, has both

nnz_r(M) ≤ nnz_r(S_A) · nnz_r(S_B), and nnz_c(M) ≤ nnz_c(S_A) · nnz_c(S_B).

It follows that

max{nnz_r(M), nnz_c(M)} ≤ max{nnz_r(S_A) · nnz_r(S_B), nnz_c(S_A) · nnz_c(S_B)} ≤ max{nnz_r(S_A), nnz_c(S_A)} · max{nnz_r(S_B), nnz_c(S_B)} ≤ s_A · s_B.

This expression thus shows that R^{rc}_C(2r) ≤ s_A · s_B as desired.

3 Framework for Designing Small Circuits from Non-Rigidity
We first note that an upper bound for a fixed matrix in a family of Kronecker products leads to one for the entire family.
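Before the formal statements, here is a small pure-Python illustration (ours) of the whole pipeline on the 4 × 4 Walsh-Hadamard transform: a rank-1-plus-4-entries rigidity split (one valid choice of split, used here only as an example), the resulting depth-2 factorization in the shape of Lemma 3.2 below, and its Kronecker powering in the shape of Lemma 3.1 below.

```python
# Rigidity split H2 = L + S with rank(L) = 1 and nnz(S) = 4, then the
# Lemma-3.2-shaped factorization H2 = (S | B') x (I_4 ; C'), and finally
# H2 (x) H2 = (B (x) B) x (C (x) C) by the mixed-product property.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    ra, ca, rb, cb = len(A), len(A[0]), len(B), len(B[0])
    return [[A[i // rb][j // cb] * B[i % rb][j % cb]
             for j in range(ca * cb)] for i in range(ra * rb)]

H2 = [[1,  1,  1,  1],
      [1, -1,  1, -1],
      [1,  1, -1, -1],
      [1, -1, -1,  1]]

u = [1, -1, -1, -1]                    # column factor B' of the rank-1 part
v = [-1, 1, 1, 1]                      # row factor C' of the rank-1 part
L = [[ui * vj for vj in v] for ui in u]                 # rank 1
S = [[H2[i][j] - L[i][j] for j in range(4)] for i in range(4)]
nnz_S = sum(1 for row in S for x in row if x != 0)      # 4 changed entries

Bmat = [S[i] + [u[i]] for i in range(4)]                          # 4 x 5
Cmat = [[int(i == j) for j in range(4)] for i in range(4)] + [v]  # 5 x 4
check_depth2 = matmul(Bmat, Cmat)                       # equals H2

check_power = matmul(kron(Bmat, Bmat), kron(Cmat, Cmat))  # equals H2 (x) H2
```

The two factors stay sparse under Kronecker powering (nnz is multiplicative), which is exactly why a single fixed-size rigidity bound propagates to the whole family.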
Lemma 3.1.
For any field F, fixed positive integers q, t, d, and matrix M ∈ F^{q×q}, suppose M^{⊗t} = Π_{j=1}^d B_j for matrices B_j with nnz(B_j) = b_j for all j ∈ [d]. Then, for all positive integers n and j ∈ [d] there are matrices A_{n,j} with nnz(A_{n,j}) < b_j^{1+n/t} and M^{⊗n} = Π_{j=1}^d A_{n,j}. If t divides n, the upper bound can be further reduced to nnz(A_{n,j}) ≤ b_j^{n/t}.

Proof. Assuming t divides n, we will show there are matrices A_{n,j} with nnz(A_{n,j}) = b_j^{n/t} and M^{⊗n} = Π_{j=1}^d A_{n,j}. If t does not divide n, we can instead apply this construction for the next multiple n′ > n of t, and then pick the appropriate submatrix of M^{⊗n′} to get M^{⊗n}; we will thus have nnz(A_{n,j}) ≤ b_j^{n′/t} < b_j^{1+n/t}.

Now, assuming t divides n, we can simply write M^{⊗n} = (Π_{j=1}^d B_j)^{⊗n/t} = Π_{j=1}^d B_j^{⊗n/t}, and pick A_{n,j} := B_j^{⊗n/t}, which has nnz(A_{n,j}) = nnz(B_j^{⊗n/t}) = nnz(B_j)^{n/t} = b_j^{n/t}, as desired.

Next, we observe that rigidity upper bounds can be used to give depth-2 synchronous circuit upper bounds.

Lemma 3.2.
For any field F, fixed positive integers r, q, and matrix M ∈ F^{q×q}, there are matrices B ∈ F^{q×(q+r)} and C ∈ F^{(q+r)×q} such that M = B × C, nnz(B) ≤ q · r + R_M(r), and nnz(C) ≤ q · (r + 1).

Proof. By definition of rigidity, we can write M = L + S for matrices L, S ∈ F^{q×q} with rank(L) ≤ r and nnz(S) = R_M(r). In particular, there are matrices B′ ∈ F^{q×r} and C′ ∈ F^{r×q} such that L = B′ × C′. By Proposition 2.5, our desired matrix decomposition is thus

M = (S | B′) × (I_q ; C′),

where the right-hand factor stacks I_q on top of C′. We have nnz(B) = nnz(S) + nnz(B′) ≤ R_M(r) + q · r, and nnz(C) = nnz(I_q) + nnz(C′) ≤ q + q · r.

Remark 3.3.
Applying Lemma 3.2 to M^T instead of M, we can alternatively obtain B ∈ F^{q×(q+r)} and C ∈ F^{(q+r)×q} such that M = B × C, nnz(B) ≤ q · (r + 1), and nnz(C) ≤ q · r + R_M(r). In other words, we can choose either B or C to have the higher sparsity.

Finally, we show how to 'symmetrize' the construction of Lemma 3.2 to extend it to small circuits of any depth d ≥ 2.

Theorem 3.4.
For any field F, positive integers r, q, and matrix M ∈ F^{q×q}, let

c := log_q((r + 1) · (r + R_M(r)/q)).

Then, for every positive integers n, d, setting N = q^n, the matrix M^{⊗n} ∈ F^{N×N} can be written as M^{⊗n} = Π_{j=1}^d A_{n,j} for matrices A_{n,j} with nnz(A_{n,j}) ≤ q^{1−c/d} · N^{1+c/d}. If d divides n, the upper bound can be further reduced to nnz(A_{n,j}) ≤ N^{1+c/d}.

Proof. Using Lemma 3.2 and Remark 3.3, there are matrices
B, B′, C, C′ such that M = B × C = C′ × B′, nnz(B) = nnz(B′) = q · r + R_M(r), and nnz(C) = nnz(C′) = q · (r + 1). We thus have the following d ways to write M as a product of d matrices:

M = B × C × I_q × I_q × · · · × I_q × I_q
M = I_q × B × C × I_q × · · · × I_q × I_q
M = I_q × I_q × B × C × · · · × I_q × I_q
...
M = I_q × I_q × I_q × I_q × · · · × B × C
M = C′ × I_q × I_q × I_q × · · · × I_q × B′.

Applying Proposition 2.3, there are thus permutation matrices P_j, P′_j for each j ∈ [d] such that we can write M^{⊗d} as:

M^{⊗d} = (P_1 × (B ⊗ C′ ⊗ I_{q^{d−2}}) × P′_1) × Π_{j=2}^{d−1} (P_j × (B ⊗ C ⊗ I_{q^{d−2}}) × P′_j) × (P_d × (B′ ⊗ C ⊗ I_{q^{d−2}}) × P′_d).

Since nnz(B) = nnz(B′) and nnz(C) = nnz(C′), this is expressing M^{⊗d} as a product of d matrices, each of which has sparsity

nnz(B ⊗ C ⊗ I_{q^{d−2}}) = nnz(B) · nnz(C) · nnz(I_{q^{d−2}}) = (q · r + R_M(r)) · (q · (r + 1)) · q^{d−2}.

Assume first that d divides n. Applying Lemma 3.1, it follows that the matrix M^{⊗n} can be written as M^{⊗n} = Π_{j=1}^d A_{n,j} for matrices A_{n,j} with

nnz(A_{n,j}) ≤ ((q · r + R_M(r)) · (q · (r + 1)) · q^{d−2})^{n/d} = q^n · ((r + R_M(r)/q) · (r + 1))^{n/d} = q^{n·(1+c/d)} = N^{1+c/d},

where N = q^n so that M^{⊗n} ∈ F^{N×N}, and c := log_q((r + 1) · (r + R_M(r)/q)), as desired.

Next, consider when d does not divide n. Let n′ be the largest integer less than n such that d divides n′, and let k = n − n′, so 1 ≤ k < d. By the above argument, there are matrices A_{n′,1}, . . . , A_{n′,d} such that M^{⊗n′} = Π_{j=1}^d A_{n′,j} and nnz(A_{n′,j}) ≤ q^{n′·(1+c/d)}. For each 1 ≤ ℓ ≤ k we can also write M = Π_{j=1}^d ([j = ℓ] ? M : I_q). Combining these k + 1 expressions together, again using Proposition 2.3, it follows that there are permutation matrices P_j, P′_j for each j ∈ [d] such that

M^{⊗n} = Π_{j=1}^k (P_j × (A_{n′,j} ⊗ M ⊗ I_{q^{k−1}}) × P′_j) × Π_{j=k+1}^d (P_j × (A_{n′,j} ⊗ I_{q^k}) × P′_j).

We can calculate that nnz(A_{n′,j} ⊗ M ⊗ I_{q^{k−1}}) ≤ q^{n′·(1+c/d)+k+1} ≤ q^{1−c/d} · q^{n·(1+c/d)}, and similarly nnz(A_{n′,j} ⊗ I_{q^k}) ≤ q^{1−c/d} · q^{n·(1+c/d)}, which concludes the proof like before.

In the proof of Theorem 3.4, we made use of Remark 3.3 that our fixed upper bound from non-rigidity can be made symmetric. For fixed upper bounds designed in other ways, this may not be the case. Below in Section 10, we will nonetheless show that any nontrivial fixed upper bound can be used to prove a result similar to Theorem 3.4. For now, in this section and the next, we will focus specifically on our upper bounds from non-rigidity.

In this subsection, we remark that we can remove the q^{1−c/d} factor from the circuit size in Theorem 3.4 in exchange for a slight increase in depth (but not total size):

Corollary 3.5.
For any field F, positive integers r, q, and matrix M ∈ F^{q×q}, let c := log_q((r + 1) · (r + R_M(r)/q)). Then, for every positive integers n, d with d ≤ o(n), setting N = q^n, the matrix M^{⊗n} ∈ F^{N×N} has a synchronous linear circuit of size (1 + o(1)) · d · q^{n·(1+c/d)}.

Proof. Let n′ be the integer in the range n ≥ n′ > n − d such that d divides n′, and let k = n − n′. Applying Theorem 3.4 to M^{⊗n′}, we see that it has a synchronous circuit of size d · q^{n′·(1+c/d)}. Thus, M^{⊗n′} ⊗ I_{q^k} has a synchronous circuit of size d · q^{n′·(1+c/d)} · q^k = d · q^{n·(1+c/d)}/q^{k·c/d}. Next, again by applying Theorem 3.4, but this time for depth k, we see that M^{⊗k} has a synchronous circuit of size k · q^{k+c}, and so I_{q^{n′}} ⊗ M^{⊗k} has a synchronous circuit of size q^{n′} · k · q^{k+c} = k · q^{n+c}. Hence, since M^{⊗n} = M^{⊗n′} ⊗ M^{⊗k} = (M^{⊗n′} ⊗ I_{q^k}) × (I_{q^{n′}} ⊗ M^{⊗k}), it follows that M^{⊗n} has a synchronous circuit of size

d · q^{n·(1+c/d)}/q^{k·c/d} + k · q^{n+c} = q^{n·(1+c/d)} · ( d/q^{kc/d} + k/q^{c·(n/d−1)} ) ≤ (1 + o(1)) · d · q^{n·(1+c/d)}.

Corollary 3.6.
For any field F, positive integers r, q, and matrix M ∈ F^{q×q}, let c := log_q((r + 1) · (r + R_M(r)/q)). Then, for every positive integer n, setting N = q^n, the matrix M^{⊗n} ∈ F^{N×N} has a synchronous linear circuit of size (c · e · log_e(2) + o(1)) · N · log_2(N).

Proof. We will apply Corollary 3.5 with d = c · log_e(N). The resulting circuit size is

(1 + o(1)) · d · q^{n·(1+c/d)} = (1 + o(1)) · c · log_e(N) · N · e = (c · e · log_e(2) + o(1)) · N · log_2(N).

In this section, we study the rank-1 rigidities of a number of families of matrices. We will find that many matrices of interest have fairly low rank-1 rigidity. These constructions can be combined with the results of the previous section to prove our main results.

4.1 Kronecker Power Matrices
Lemma 4.1.
For any field F and any outer-1 matrix M ∈ F^{2×2}, we have R_{M^{⊗3}}(1) ≤ 23.

Proof. Since M is an outer-1 matrix, there is an ω ∈ F such that

M = [[1, 1], [1, ω]].

We can index entries of M^{⊗3} by vectors x, y ∈ {0, 1}^3, so that M^{⊗3}[x, y] = ω^{⟨x,y⟩_Z}. Consider the matrix L ∈ F^{8×8} given by

L[x, y] = ω^{−1} if x = y = (0, 0, 0),
L[x, y] = 1 if x = (0, 0, 0) and y ≠ (0, 0, 0),
L[x, y] = 1 if x ≠ (0, 0, 0) and y = (0, 0, 0),
L[x, y] = ω if x ≠ (0, 0, 0) and y ≠ (0, 0, 0).

L has rank 1 (it is the outer product of the vectors u, v ∈ F^8 with u[x] = ω^{[x ≠ (0,0,0)]} and v[y] = ω^{[y ≠ (0,0,0)] − 1}), and we can see that L[x, y] = M^{⊗3}[x, y] unless:

• x = y = (0, 0, 0), or
• x ≠ (0, 0, 0) and y ≠ (0, 0, 0) with ⟨x, y⟩_Z ≠ 1.

We can count that:

• When x has exactly one 1 (3 choices of x), there are 3 choices of y ≠ (0, 0, 0) with ⟨x, y⟩_Z = 0.
• When x has exactly two 1s (3 choices of x), there is 1 choice of y ≠ (0, 0, 0) with ⟨x, y⟩_Z = 0, and 2 choices with ⟨x, y⟩_Z = 2.
• When x = (1, 1, 1), there are 3 choices of y with ⟨x, y⟩_Z = 2, and 1 choice with ⟨x, y⟩_Z = 3.

Overall, L and M^{⊗3} differ in 1 + 3 · 3 + 3 · 3 + 4 = 23 entries.

Lemma 4.2.
For any field F and any matrix M ∈ F^{2×2}, we have R_{M^{⊗3}}(1) ≤ 23.

Proof. By Lemma 2.9 and Lemma 2.10, it is sufficient to consider the case when M is an outer-1 matrix. The result then follows from Lemma 4.1.

Theorem 4.3.
For any field F, matrix M ∈ F^{2×2}, and positive integers d, n > 0, the matrix M^{⊗n} ∈ F^{N×N} for N = 2^n has a depth-d linear circuit of size O(d · N^{1+(1−ε)/d}) for some constant ε > 0.015.

Proof. Applying Theorem 3.4 with M^{⊗3}, q = 8, and r = 1, combined with the rigidity bound of Lemma 4.2, shows that M^{⊗n} can be written as a product of d matrices, each with at most 8^{1−c/d} · N^{1+c/d} nonzero entries, for

c = log_q((r + 1) · (r + R_M(r)/q)) ≤ log_8(2 · (1 + 23/8)) < 0.985 = 1 − ε.

Corollary 4.4.
For any field F, matrix M ∈ F^{2×2}, and positive integer n > 0, the matrix M^{⊗n} ∈ F^{N×N} for N = 2^n has a synchronous linear circuit of size ((1 − ε) · e · log_e(2) + o(1)) · N log_2 N for some constant ε > 0.015.

Proof. Apply Corollary 3.6 with the same rigidity bound of Lemma 4.2.

4.2 Walsh-Hadamard Transform
Lemma 4.5.
Over any field F with ch(F) ≠ 2, we have R_{H_2}(1) = 4.

Proof. First, to see that R_{H_2}(1) ≤ 4, we can verify that

H_2 = [[1, 1, 1, 1], [1, −1, 1, −1], [1, 1, −1, −1], [1, −1, −1, 1]]
    = [[−1, 1, 1, 1], [1, −1, −1, −1], [1, −1, −1, −1], [1, −1, −1, −1]]
    + [[2, 0, 0, 0], [0, 0, 2, 0], [0, 2, 0, 0], [0, 0, 0, 2]].

This is the sum of a rank-1 matrix (where each row after the first is the negation of the first row), and a matrix with 4 nonzero entries, as desired.

The bound R_{H_2}(1) ≥ 4 follows from the known lower bound R_{H_n}(r) ≥ 4^{n−1}/r [Mid05, DW06], but we prove it here for completeness using the simple proof strategy of [Mid05]. Recall that we can write H_2 as a block matrix as

H_2 = [[H_1, H_1], [H_1, −H_1]].

Each copy of H_1 has rank 2, so we must change at least one entry in each copy to drop the rank of the whole matrix to 1. Since there are four disjoint copies, we must change at least four entries.

Lemma 4.6.
Over any field F, we have R_{H_3}(1) ≤ 22.

Proof. We use the same construction as in Lemma 4.2, with ω = −1 so that M^{⊗3} = H_3. In this case, there is one more correct entry than in the general case, since when x = y = (1, 1, 1) we have M^{⊗3}[x, y] = ω^3 and L[x, y] = ω, but these are equal when ω = −1, so the number of errors is only 23 − 1 = 22.

Lemma 4.7.
Over any field F, we have R_{H_4}(1) ≤ 96.

Proof. In the proof of Lemma 4.5, we showed there is a matrix A ∈ {−1, 1}^{4×4} which differs from H_2 in 4 entries, and which has rank 1 over any field. Let B = A^{⊗2} ∈ {−1, 1}^{16×16}. We have that rank(B) = rank(A)^2 = 1. Indexing the rows and columns of H_4 by {0, 1, 2, 3}^2, and the rows and columns of H_2 by {0, 1, 2, 3}, we see that for a, b, c, d ∈ {0, 1, 2, 3} we have

B[(a, b), (c, d)] / H_4[(a, b), (c, d)] = (A[a, c] · A[b, d]) / (H_2[a, c] · H_2[b, d]).

This will equal 1 (and hence the [(a, b), (c, d)] entries of B and H_4 will be equal) whenever either:

• A[a, c] = H_2[a, c] and A[b, d] = H_2[b, d], which happens for (16 − 4)^2 = 144 values of a, b, c, d ∈ {0, 1, 2, 3}, or
• A[a, c] ≠ H_2[a, c] and A[b, d] ≠ H_2[b, d] (since all these values are in {−1, 1}), which happens for 4^2 = 16 values of a, b, c, d ∈ {0, 1, 2, 3}.

Thus, B only differs from H_4 in 16^2 − 144 − 16 = 96 entries, as desired.
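All three of these counts (the 4 changed entries from Lemma 4.5, and the error counts 22 and 96) are small enough to confirm by exhaustive comparison; a pure-Python check (helper names ours):

```python
# Exhaustively verify the error counts behind R_{H_3}(1) <= 22 and
# R_{H_4}(1) <= 96, using the rank-1 approximations from the proofs above.

def kron(A, B):
    ra, ca, rb, cb = len(A), len(A[0]), len(B), len(B[0])
    return [[A[i // rb][j // cb] * B[i % rb][j % cb]
             for j in range(ca * cb)] for i in range(ra * rb)]

def diff_count(A, B):
    return sum(1 for i in range(len(A)) for j in range(len(A[0]))
               if A[i][j] != B[i][j])

H1 = [[1, 1], [1, -1]]
H3 = kron(kron(H1, H1), H1)
# Rank-1 L from Lemma 4.1 with w = -1: entry is -1 when x and y are both
# zero or both nonzero, and +1 otherwise.
L3 = [[-1 if (x == 0) == (y == 0) else 1 for y in range(8)] for x in range(8)]
errors_H3 = diff_count(H3, L3)            # expected: 22

H2 = kron(H1, H1)
u = [1, -1, -1, -1]
v = [-1, 1, 1, 1]
A = [[ui * vj for vj in v] for ui in u]   # rank 1, in {-1, 1}^{4x4}
errors_H2 = diff_count(H2, A)             # expected: 4
H4 = kron(H2, H2)
B16 = kron(A, A)                          # rank 1, 16 x 16
errors_H4 = diff_count(H4, B16)           # expected: 96
```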
Remark 4.8.
I verified using a brute-force search that Lemma 4.6 and Lemma 4.7 are tight over any field F with ch(F) ≠ 2. I unfortunately haven't found more enlightening proofs of these facts.

Theorem 4.9. For any field F and positive integers d, n > 0, the matrix H_n ∈ F^{N×N} for N = 2^n has a depth-d linear circuit of size ≤ O(d · N^{1+(1−ε)/d + O(d/n)}) for some constant ε > 0.04.

Proof. Applying Theorem 3.4 with H_4 = H_2 ⊗ H_2, q = 16, and r = 1, combined with the rigidity bound of Lemma 4.7, shows that H_n = H_1^{⊗n} has a depth-d linear circuit of the claimed size (the N^{O(d/n)} factor accounts for n not necessarily being a multiple of 4), for

c = log_q((r + 1) · (r + R_M(r)/q)) ≤ log_16(2 · (1 + 96/16)) < 0.952 = 1 − ε.

Corollary 4.10.
For any field F and positive integer n > 0, the matrix H_n ∈ F^{N×N} for N = 2^n has a synchronous linear circuit of size ((1 − ε) · e · log_e(2) + o(1)) · N log_2 N for some constant ε > 0.04.

Proof. Apply Corollary 3.6 with the same rigidity bound of Lemma 4.7.
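For concreteness, the exponents produced by these results can be computed directly from the formula c = log_q((r + 1) · (r + R_M(r)/q)) with r = 1; a quick script of ours, assuming the rank-1 rigidity bounds 23 and 96 from Lemmas 4.2 and 4.7:

```python
# Compute the constants behind Theorems 4.3/4.9 and Corollaries 4.4/4.10
# from the single formula c = log_q((r+1) * (r + R_M(r)/q)) with r = 1.
import math

def c_value(q, rigidity, r=1):
    return math.log((r + 1) * (r + rigidity / q), q)

c_generic = c_value(8, 23)     # any 2x2 M, via R_{M^{(x)3}}(1) <= 23, q = 2^3
c_hadamard = c_value(16, 96)   # via R_{H_4}(1) <= 96, q = 2^4

# Leading constant of the ~N log N synchronous circuit from Corollary 3.6:
nlogn_const = c_hadamard * math.e * math.log(2)
# nlogn_const is roughly 1.79, within the <= (1.81 + o(1)) N log N bound
# stated in the abstract.
```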
4.3 Fourier Transform

In order to use the approach of Theorem 3.4 to prove that the N × N Fourier transform F_N has depth-d circuits of size O(N^{1+c/d}) for some c < 1, we would need it to be the case that, for some positive integers N > r > 0, we have

log_N((r + 1) · (r + R_{F_N}(r)/N)) < 1.

We next remark that known rigidity lower bounds for F_N show that this is never the case. In fact, the proof extends to any Vandermonde matrix.

Proposition 4.11.
For any positive integers N > r ≥ 1, the N × N Fourier transform matrix F_N has

(r + 1) · (r + R_{F_N}(r)/N) ≥ N.

Proof.
Shparlinski [Shp99] shows that R_{F_N}(r) ≥ (N − r)^2/(r + 1); for completeness, we prove this below in Lemma 4.13. It then follows that:

(r + 1) · (r + R_{F_N}(r)/N) ≥ (r + 1) · (r + (N − r)^2/((r + 1) · N)) = (1/N) · (N^2 + r^2 + N·r·(r − 1)) ≥ (1/N) · N^2 = N.

We next prove a Lemma which we will need in the proof of Shparlinski's rigidity lower bound.
Lemma 4.12.
For any positive integers N > r ≥ 1, any integer 0 ≤ k < N − r, and any S ⊆ [N]_0 of size |S| = r, let M_{k,S} be the r × r submatrix of F_N consisting of the rows {k, k + 1, k + 2, . . . , k + r − 1} and the columns of S. Then, M_{k,S} has full rank.

Proof. Indexing the rows of M_{k,S} by [r]_0 and the columns by S, we have for j ∈ [r]_0 and s ∈ S that M_{k,S}[j, s] = ω_N^{(k+j)·s} = ω_N^{k·s} · (ω_N^s)^j, where ω_N = e^{2πi/N} ∈ C is a primitive N-th root of unity; scaling column s by the nonzero factor ω_N^{−k·s}, which does not change the rank, we may take M_{k,S}[j, s] = (ω_N^s)^j. Assume to the contrary that M_{k,S} does not have full rank. Thus, there is a nontrivial linear combination of its rows summing to zero. This means that there are a_0, a_1, . . . , a_{r−1} ∈ C, which are not all 0, such that, for each s ∈ S, we have

Σ_{j=0}^{r−1} a_j · (ω_N^s)^j = 0.

In other words, the r different values {ω_N^s | s ∈ S} are all roots of the polynomial p(z) = Σ_{j=0}^{r−1} a_j · z^j. However, p is a nonzero polynomial of degree at most r − 1, so it cannot have r roots, a contradiction.

Lemma 4.13 ([Shp99]). For any positive integers N > r ≥ 1, we have R_{F_N}(r) ≥ (N − r)^2/(r + 1).

Proof. Suppose that one can change t entries of F_N to make its rank at most r. For k ∈ [N − r]_0, let t_k be the number of changes which are in rows {k, k + 1, k + 2, . . . , k + r}. Since each change contributes to at most r + 1 of the t_k values, we have that Σ_{k=0}^{N−r−1} t_k ≤ (r + 1) · t. Thus, by the pigeonhole principle, there must be a k* ∈ [N − r]_0 such that t_{k*} ≤ (r + 1) · t/(N − r). Let S ⊆ [N]_0 be the columns of F_N such that none of the changes in rows {k*, k* + 1, k* + 2, . . . , k* + r} is in a column of S. It must be that |S| ≤ r, since otherwise, by Lemma 4.12 (applied with r + 1 in place of r), the matrix M_{k*,S} has rank r + 1 and we did not make any changes to it. On the other hand, by definition, |S| ≥ N − t_{k*} ≥ N − (r + 1) · t/(N − r). It follows that r ≥ N − (r + 1) · t/(N − r), which rearranges to the desired t ≥ (N − r)^2/(r + 1).

4.4 The Disjointness Matrix

Recall the Disjointness matrix R_n ∈ F^{N×N} from Section 2.1.4. The approach of Theorem 3.4 can be used to prove that R_n has depth-d linear circuits of size N^{1+(1−ε)/d}. However, since R_n is very sparse (it has nnz(R_n) = 3^n ≤ N^{1.585}), it is almost immediate that it has depth-d circuits of size O(N^{1+c/d}) for c = log_2(3) − 1 < 0.585. In this subsection we show that a construction of Jukna and Sergeev yields the even better bound c < 0.55.

Lemma 4.14 ([JS13, Lemma 4.2]). Let t = log_2(1 + √2) < 1.28. For any field F and positive integer n, there are matrices A_n, B_n ∈ F^{2^n×2^n} with nnz(A_n), nnz(B_n) ≤ O(2^{t·n}) such that R_n = A_n × B_n.

Proof. We show how to partition the 1s of R_n into squares (all-1s combinatorial rectangles with the same number of rows and columns) and rectangles (all-1s combinatorial rectangles with twice as many rows as columns). Our partition is defined recursively. Let s_n be the sum of the side-lengths of the squares in the partition of R_n, and let r_n be the sum of the shorter side-lengths of the rectangles. For

R_1 = [[1, 1], [1, 0]],

we can see that s_1 = r_1 = 1.
Next, from the recursive definition
$$R_n := \begin{pmatrix} R_{n-1} & R_{n-1} \\ R_{n-1} & 0 \end{pmatrix},$$
we see that the three copies of any $s \times s$ square in $R_{n-1}$ can be partitioned into an $s \times s$ square and a $2s \times s$ rectangle in $R_n$, and the three copies of any $2s \times s$ rectangle in $R_{n-1}$ can be partitioned into a $2s \times s$ rectangle and a $2s \times 2s$ square in $R_n$. It follows that we get the recurrence
$$\begin{pmatrix} s_n \\ r_n \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix} \times \begin{pmatrix} s_{n-1} \\ r_{n-1} \end{pmatrix}.$$
Since the matrix $\begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}$ has eigenvalues $1 \pm \sqrt{2}$, it follows that $s_n, r_n \le O((1+\sqrt{2})^n)$. We have thus written the 1s of $R_n$ as a disjoint sum of combinatorial rectangles whose side-lengths sum to $O((1+\sqrt{2})^n) = O(2^{t \cdot n})$, from which the result follows.

Following the same construction as Theorem 3.4, we get:

Proposition 4.15.
For any field $F$ and any positive integers $n, d$, let $N = 2^n$ and let $c = 2(\log_2(1+\sqrt{2}) - 1) < 0.544$. There are $d$ matrices $A_{n,1}, \ldots, A_{n,d}$ such that $R_n = \prod_{j=1}^{d} A_{n,j}$ and $\mathrm{nnz}(A_{n,j}) \le O(N^{1+c/d})$ for all $j \in [d]$.

Recall that $R_1 := \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ and $R_n := R_1^{\otimes n}$. For $x, y \in \{0,1\}^n$, we can equivalently define:
$$R_n[x,y] = \begin{cases} 0 & \text{if there is an } \ell \in [n] \text{ such that } x[\ell] = y[\ell] = 1, \\ 1 & \text{otherwise.} \end{cases}$$
For positive integers $k \le n$, write $\binom{n}{\le k} := \sum_{j=0}^{k} \binom{n}{j}$ and $\binom{n}{< k} := \sum_{j=0}^{k-1} \binom{n}{j}$. Recall (Lemma 2.2) that if $k \le n/2$, then
$$\binom{n}{< k} \le \binom{n}{\le k} \le 2^{n \cdot H(k/n)},$$
where $H(p)$ is the binary entropy function.

Lemma 5.1. For any positive integers $k \le n$, we can remove $\binom{n}{< k}$ rows and columns from $R_n$ so that every row and column of the resulting matrix has at most $\binom{n-k}{\le n-2k}$ nonzero entries.

Proof. Row $x$ of $R_n$ has a 1 in exactly the columns $y$ whose support is disjoint from that of $x$, so after removing the $\binom{n}{< k}$ rows and columns indexed by strings of Hamming weight less than $k$, each remaining row $x$ has nonzero entries only in columns $y$ with $\mathrm{nnz}(y) \ge k$ and $\mathrm{nnz}(x) + \mathrm{nnz}(y) \le n$, of which there are at most $\binom{n-k}{\le n-2k}$. The columns are symmetric.

Theorem 5.2. For any field $F$, positive integer $n$, and $a \in (0, 1/2)$, we have $\mathcal{R}^{rc}_{R_n}\big(2 \cdot \binom{n}{\le an}\big) \le \binom{(1-a)n}{\le (1-2a)n}$.

Let $a^*$ be the solution in $(0, 1/2]$ to $(1-a) \cdot H\big((1-2a)/(1-a)\big) = 1/2$. Then, for any $a > a^*$, it follows that $\binom{(1-a)n}{\le (1-2a)n} < o(2^{n/2})$.

Definition 6.1. For any field $F$, positive integer $n$, and function $f : \{0,1\}^n \to F$, let $V_f \in F^{2^n \times 2^n}$ denote the matrix which is given by, for $x, y \in \{0,1\}^n$, $V_f[x,y] := f(x \vee y)$, where '$x \vee y$' denotes the bit-wise OR of $x$ and $y$.

Definition 6.2. For any field $F$, positive integer $n$, and function $f : \{0,1\}^n \to F$, let $a_f \in F^{2^n}$ denote the vector with, for $z \in \{0,1\}^n$, the entry $a_f[z] := f(z)$. Let $b_f \in F^{2^n}$ be the vector $b_f := R_n^{-1} \times a_f$. Let $D_f \in F^{2^n \times 2^n}$ be the diagonal matrix of the entries of $b_f$, meaning for $z \in \{0,1\}^n$, we have $D_f[z,z] := b_f[z]$.

Lemma 6.3. For any field $F$, positive integer $n$, and function $f : \{0,1\}^n \to F$, we have $V_f = R_n \times D_f \times R_n$.

Proof.
Recall that for $x, y \in \{0,1\}^n$, we have $R_n[x,y] = 1$ if $\langle x,y\rangle_{\mathbb{Z}} = 0$, and $R_n[x,y] = 0$ otherwise, where $\langle x,y\rangle_{\mathbb{Z}}$ denotes the inner product over the integers. Hence, for $x, y \in \{0,1\}^n$:
$$(R_n \times D_f \times R_n)[x,y] = \sum_{z \in \{0,1\}^n} R_n[x,z] \cdot D_f[z,z] \cdot R_n[z,y] = \sum_{\substack{z \in \{0,1\}^n \\ \langle x,z\rangle_{\mathbb{Z}} = \langle z,y\rangle_{\mathbb{Z}} = 0}} D_f[z,z] = \sum_{\substack{z \in \{0,1\}^n \\ \langle (x \vee y), z\rangle_{\mathbb{Z}} = 0}} D_f[z,z]$$
$$= \sum_{\substack{z \in \{0,1\}^n \\ \langle (x \vee y), z\rangle_{\mathbb{Z}} = 0}} b_f[z] = \sum_{z \in \{0,1\}^n} R_n[(x \vee y), z] \cdot b_f[z] = (R_n \times b_f)[(x \vee y)] = a_f[(x \vee y)] = f(x \vee y) = V_f[x,y],$$
as desired.

Lemma 6.4. For any field $F$, positive integer $n$, and outer-1 matrices $M_1, \ldots, M_n \in F^{2 \times 2}$, there is a function $f : \{0,1\}^n \to F$ and permutation matrices $\Pi_n, \Pi'_n \in F^{2^n \times 2^n}$ such that
$$\bigotimes_{i=1}^{n} M_i = \Pi_n \times V_f \times \Pi'_n.$$

Proof. For each $i \in [n]$, let $\omega_i \in F$ be the element such that $M_i = \begin{pmatrix} 1 & 1 \\ 1 & \omega_i \end{pmatrix}$. Further define $M'_i \in F^{2 \times 2}$ by $M'_i = \begin{pmatrix} \omega_i & 1 \\ 1 & 1 \end{pmatrix}$. $M'_i$ is a permutation of the rows and columns of $M_i$, so it suffices to prove the result for $\bigotimes_{i=1}^n M'_i$ instead of $\bigotimes_{i=1}^n M_i$. For $i \in [n]$, letting $g_i : \{0,1\} \to F$ be defined by $g_i(0) = \omega_i$ and $g_i(1) = 1$, we see that $M'_i = V_{g_i}$. Thus, defining $f : \{0,1\}^n \to F$ by
$$f(z[1], \ldots, z[n]) = \prod_{i=1}^{n} g_i(z[i]),$$
it follows that $\bigotimes_{i=1}^n M'_i = V_f$, as desired.

Lemma 6.5. For any field $F$, positive integer $n$, and outer-nonzero matrices $M_1, \ldots, M_n \in F^{2 \times 2}$, there is a function $f : \{0,1\}^n \to F$ and weighted permutation matrices $\Pi_n, \Pi'_n \in F^{2^n \times 2^n}$ such that
$$\bigotimes_{i=1}^{n} M_i = \Pi_n \times V_f \times \Pi'_n.$$

Proof. By Lemma 2.9, there are outer-1 matrices $M'_1, \ldots, M'_n \in F^{2 \times 2}$ and invertible diagonal matrices $D, D' \in F^{2^n \times 2^n}$ such that $\bigotimes_{i=1}^n M_i = D \times \big(\bigotimes_{i=1}^n M'_i\big) \times D'$. The result then follows by applying Lemma 6.4 to $\bigotimes_{i=1}^n M'_i$.

Theorem 6.6. For any field $F$ and positive integer $n$, let $M \in F^{2^n \times 2^n}$ be a matrix of either of the following forms:
• $M = V_f$ for any function $f : \{0,1\}^n \to F$, or
• $M = \bigotimes_{\ell=1}^{n} M_\ell$ for any matrices $M_1, \ldots, M_n \in F^{2 \times 2}$.
Then, for any $a \in (0, 1/2)$, we have $\mathcal{R}^{rc}_M\big(4 \cdot \binom{n}{\le an}\big) \le \binom{(1-a)n}{\le (1-2a)n}^2$.

Theorem 7.1.
For any field $F$, positive integer $q \ge 2$, matrices $M_1, \ldots, M_n \in F^{q \times q}$, and sufficiently small $\varepsilon > 0$, the Kronecker product $M := \bigotimes_{\ell=1}^{n} M_\ell \in F^{N \times N}$ for $N = q^n$ has
$$\mathcal{R}^{rc}_M\big(N^{1 - O(2^{-q}\, q \log(q) \cdot \varepsilon^2/\log^2(1/\varepsilon))}\big) \le N^{\varepsilon},$$
where the $O$ hides a universal constant. In particular, if $q \le O(\log n)$, then $M$ is not Valiant-rigid.

In the remainder of this section, we prove Theorem 7.1. We proceed by induction on $q$. The base case $q = 2$ was given by Theorem 6.6. Suppose $q > 2$, and that the result is known already for $q - 1$.

We may assume that $M_\ell \in F^{q \times q}$ is an outer-nonzero matrix for all $\ell \in [n]$, since our proof below will only use the pattern of nonzero entries of the matrix, similar to the proof of Theorem 6.6. By Lemma 2.9, we may further assume without loss of generality that $M_\ell \in F^{q \times q}$ is an outer-1 matrix for all $\ell \in [n]$. For nonnegative integers $i$, let $J_i \in F^{q^i \times q^i}$ denote the $q^i \times q^i$ matrix whose entries are all 1s. There are thus outer-0 matrices $A_1, \ldots, A_n \in F^{q \times q}$ such that $M_\ell = J_1 + A_\ell$ for each $\ell \in [n]$. For each subset $K \subseteq [n]$, let $A_K := \bigotimes_{\ell \in K} A_\ell$. This is the Kronecker product of $|K|$ different $(q-1) \times (q-1)$ matrices, padded with $(q^{|K|} - (q-1)^{|K|})$ rows and columns of 0s.
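The expansion of $M = \bigotimes_{\ell} (J_1 + A_\ell)$ into a sum over subsets $K \subseteq [n]$ (Equation $(*)$ in this proof) is just distributivity of the Kronecker product over addition, and can be sanity-checked directly. A small pure-Python sketch, where the concrete entries of the outer-0 matrices $A_\ell$ are arbitrary placeholder values:

```python
# Sanity check of the expansion of ⊗_l (J_1 + A_l) into a sum over
# subsets K ⊆ [n], with factor A_l in position l for l in K and J_1
# elsewhere. Pure Python over the integers; the A_l entries are
# arbitrary placeholders.
from itertools import combinations

def kron(X, Y):
    """Kronecker product of two square matrices given as nested lists."""
    return [[X[i][j] * Y[k][m] for j in range(len(X)) for m in range(len(Y))]
            for i in range(len(X)) for k in range(len(Y))]

def kron_list(ms):
    out = [[1]]
    for m in ms:
        out = kron(out, m)
    return out

def add(X, Y):
    return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(X, Y)]

q, n = 3, 3
J1 = [[1] * q for _ in range(q)]
# outer-0 matrices: first row and first column are all zeros
A = [[[0, 0, 0],
      [0, l + 1, 2],
      [0, 1, l + 2]] for l in range(n)]

lhs = kron_list([add(J1, A[l]) for l in range(n)])
rhs = kron_list([J1] * n)  # the K = empty-set term
for size in range(1, n + 1):
    for K in combinations(range(n), size):
        rhs = add(rhs, kron_list([A[l] if l in K else J1 for l in range(n)]))
assert lhs == rhs  # distributivity of the Kronecker product over addition
print("verified: all 2^%d subset terms sum to the Kronecker product" % n)
```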
By the inductive hypothesis, for every $\varepsilon > 0$, setting $\varepsilon' = O(2^{-(q-1)}(q-1)\log(q-1) \cdot \varepsilon^2/\log^2(1/\varepsilon))$, there are matrices $L_K, S_K \in F^{q^{|K|} \times q^{|K|}}$ such that:
• $A_K = L_K + S_K$,
• $\mathrm{rank}(L_K) \le (q-1)^{|K| \cdot (1-\varepsilon')}$, and
• for a given row $x \in [q]^{|K|}$ of $S_K$:
– if there is any $i \in [|K|]$ such that $x[i] = 0$, then every entry of row $x$ of $S_K$ is 0,
– otherwise, there are at most $(q-1)^{|K| \cdot \varepsilon}$ nonzero entries in row $x$ of $S_K$
(and similarly for a given column of $S_K$), and thus $\mathrm{rank}(S_K) \le (q-1)^{|K|}$.

Now we can expand $M$:
$$M = \bigotimes_{\ell=1}^{n} M_\ell = \bigotimes_{\ell=1}^{n} (J_1 + A_\ell) = \sum_{K \subseteq [n]} A_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K} \qquad (*)$$
$$= \sum_{K \subseteq [n]} L_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K} + \sum_{K \subseteq [n]} S_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K}$$
(where $X^{\otimes K} \otimes J_1^{\otimes [n] \setminus K}$ denotes the Kronecker product with the factors of $X$ placed in the coordinates in $K$ and a factor $J_1$ in each coordinate of $[n] \setminus K$; this is a row and column permutation of $X \otimes J_{n-|K|}$). The first sum is low-rank:
$$\mathrm{rank}\Big(\sum_{K \subseteq [n]} L_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K}\Big) \le \sum_{K \subseteq [n]} \mathrm{rank}\big(L_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K}\big) = \sum_{K \subseteq [n]} \mathrm{rank}(L_K) \cdot \mathrm{rank}\big(J_1^{\otimes (n-|K|)}\big) = \sum_{K \subseteq [n]} \mathrm{rank}(L_K)$$
$$\le \sum_{K \subseteq [n]} (q-1)^{|K| \cdot (1-\varepsilon')} = \sum_{k=0}^{n} \binom{n}{k} \cdot (q-1)^{k \cdot (1-\varepsilon')} = \big(1 + (q-1)^{1-\varepsilon'}\big)^n = q^{n \cdot (1-\varepsilon'')},$$
where $\varepsilon''$ is given by
$$\varepsilon'' := \frac{\log\big(q/((q-1)^{1-\varepsilon'} + 1)\big)}{\log(q)} = \varepsilon' \cdot \frac{(q-1)\log(q-1)}{q\log(q)} + O(\varepsilon'^2).$$

It remains to show that the second matrix, $\sum_{K \subseteq [n]} S_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K}$, is not rigid. We partition it into three parts, for some $\delta > 0$ to be chosen later, where $a := (q-1)/q$:
$$\sum_{K \subseteq [n]} S_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K} = \sum_{\substack{K \subseteq [n] \\ |K| < (a-\delta) n}} S_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K} + \sum_{\substack{K \subseteq [n] \\ |K| > (a+\delta) n}} S_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K} + \sum_{\substack{K \subseteq [n] \\ (a-\delta) n \le |K| \le (a+\delta) n}} S_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K}.$$
We will show that the first and second parts are low-rank, and that the third part is non-rigid.
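The rank and sparsity bounds for these three parts lean repeatedly on the entropy estimate $\binom{n}{\le k} \le 2^{n \cdot H(k/n)}$ for $k \le n/2$ (Lemma 2.2). A quick numeric sanity check of that estimate, in pure Python:

```python
# Numeric check of the binomial-entropy bound binom(n, <=k) <= 2^(n*H(k/n))
# for k <= n/2, used throughout the rank and sparsity bounds in this section.
from math import comb, log2

def H(p):
    """Binary entropy function."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

for n in (10, 25, 40):
    for k in range(0, n // 2 + 1):
        lhs = sum(comb(n, j) for j in range(k + 1))
        # tiny additive slack guards against float rounding at k = 0, n/2
        assert lhs <= 2 ** (n * H(k / n)) + 1e-9
print("entropy bound verified")
```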
For the first, we bound similarly to before (and using Lemma 2.2 to bound $H$) that:
$$\mathrm{rank}\Big(\sum_{\substack{K \subseteq [n] \\ |K| < (a-\delta) n}} S_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K}\Big) \le \sum_{\substack{K \subseteq [n] \\ |K| < (a-\delta) n}} \mathrm{rank}\big(S_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K}\big) \le \sum_{k=0}^{(a-\delta) n} \binom{n}{k} \cdot (q-1)^k \le ((a-\delta) n) \cdot \binom{n}{(a-\delta) n} \cdot (q-1)^{(a-\delta) n}$$
$$\le O(n) \cdot 2^{H(a-\delta) \cdot n} \cdot (q-1)^{(a-\delta) n} = O(n) \cdot 2^{H(1/q+\delta) \cdot n} \cdot (q-1)^{(a-\delta) n} \le 2^{(\log(q) - a\log(q-1) + \delta\log(q-1) - \Theta(q \cdot \delta^2)) \cdot n} \cdot (q-1)^{(a-\delta) n}$$
$$= 2^{(\log(q) - \Theta(q \cdot \delta^2)) \cdot n} = q^{n(1 - \Theta(\delta^2 q/\log(q)))}.$$
We can almost identically bound the rank of the second part by:
$$\mathrm{rank}\Big(\sum_{\substack{K \subseteq [n] \\ |K| > (a+\delta) n}} S_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K}\Big) \le O(n) \cdot 2^{H(1/q-\delta) \cdot n} \cdot (q-1)^{(a+\delta) n} \le q^{n(1 - \Theta(\delta^2 q/\log(q)))}.$$

Finally, it remains to consider the third part:
$$B := \sum_{\substack{K \subseteq [n] \\ (a-\delta) n \le |K| \le (a+\delta) n}} S_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K}.$$
We will show that after a small number of rows and columns of $B$ are removed, it is a sparse matrix. Since changing one row or column of a matrix is a rank-1 update, this will show that $B$ is not rigid and complete our proof.

The rows and columns we remove are those corresponding to $x \in [q]^n$ with $\mathrm{nnz}(x) \ge (a+\delta) \cdot n$. The number of these rows and columns is
$$\sum_{k=(a+\delta) n}^{n} \binom{n}{k} \cdot (q-1)^{k},$$
which is again upper bounded by $q^{n(1 - \Theta(\delta^2 q/\log(q)))}$, similarly to the previous two sums.

Finally, let us show that there are not many nonzero entries remaining in any row or column of $B$. Consider a row $x \in [q]^n$ that we did not remove, meaning $\mathrm{nnz}(x) < (a+\delta) \cdot n$. Suppose, for some $K \subseteq [n]$ with $(a-\delta) n \le |K| \le (a+\delta) n$, that $S_K^{\otimes K} \otimes J_1^{\otimes [n] \setminus K}$ has nonzero entries in row $x$. That means there cannot be any $\ell \in K$ such that $x[\ell] = 0$.
The number of choices for $K$ is hence at most
$$\sum_{k=(a-\delta) n}^{(a+\delta) n} \binom{\mathrm{nnz}(x)}{k} \le (2\delta n) \cdot \binom{(a+\delta) n}{(a-\delta) n} \le O(n) \cdot 2^{(a+\delta) \cdot H(2\delta/(a+\delta)) \cdot n} \le 2^{\Theta(\delta \cdot \log(1/\delta)) \cdot n}.$$
For each such $K$, how many nonzero entries does it contribute to row $x$? A simple upper bound is $\mathrm{nnz}_r(S_K) \cdot \mathrm{nnz}_r(J_1^{\otimes (n-|K|)})$, but we can get a better bound by noting that many of the columns with those nonzero entries have been removed. Indeed, for a $y \in [q]^n$, the entry $B[x,y]$ will be nonzero and not removed earlier only if:
• $\mathrm{nnz}(y) < (a+\delta) \cdot n$, and
• $S_K[x|_K, y|_K] \neq 0$.
In particular, this latter condition requires that $\mathrm{nnz}(y|_K) = |K|$, which means only $(a+\delta) \cdot n - |K| \le 2\delta n$ entries of $y|_{[n] \setminus K}$ may be nonzero. There are thus:
• at most $(q-1)^{|K| \cdot \varepsilon}$ choices for $y|_K$, by definition of $S_K$, and
• at most $\binom{n-|K|}{2\delta n} \cdot (q-1)^{2\delta n}$ choices for $y|_{[n] \setminus K}$, because at most $2\delta n$ of its entries may be nonzero.
The total number of such $y$ is thus at most
$$(q-1)^{|K| \cdot \varepsilon} \cdot \binom{n-|K|}{2\delta n} \cdot (q-1)^{2\delta n} \le (q-1)^{(a+\delta) n \cdot \varepsilon} \cdot \binom{(1/q+\delta) n}{2\delta n} \cdot (q-1)^{2\delta n}$$
$$\le O(n) \cdot 2^{n \cdot (\varepsilon (a+\delta) \log(q-1) + (1/q+\delta) H(2\delta/(1/q+\delta)) + 2\delta \log(q-1))} \le O(n) \cdot 2^{n \cdot ((a\varepsilon + 2\delta + \varepsilon\delta)\log(q-1) + 2\delta \log((1/q+\delta)/(2\delta)))}$$
$$\le 2^{n \cdot (a\varepsilon \log(q-1) + 2\delta \log(1/\delta) + O(\delta))} = q^{n \cdot \left(\frac{(q-1)\log(q-1)}{q\log(q)} \varepsilon + 2\delta \log(1/\delta)/\log(q) + O(\delta)\right)}.$$

In summary, $M$ can be written as the sum of a matrix of rank at most
$$q^{n \cdot \left(1 - \frac{(q-1)\log(q-1)}{q\log q} \varepsilon' + O(\varepsilon'^2)\right)} + q^{n \cdot (1 - \Theta(\delta^2 q/\log(q)))},$$
and a matrix with row/column sparsity at most
$$q^{n \cdot \left(\frac{(q-1)\log(q-1)}{q\log(q)} \varepsilon + 2\delta \log(1/\delta)/\log(q) + O(\delta)\right)}.$$
Let $c = \frac{(q-1)\log(q-1)}{q\log(q)} \varepsilon$, so that
$$\varepsilon' = O\big(2^{-(q-1)}(q-1)\log(q-1) \cdot \varepsilon^2/\log^2(1/\varepsilon)\big) = O\Big(2^{-q}\, \frac{q^2\log^2(q)}{(q-1)\log(q-1)} \cdot c^2/\log^2(1/c)\Big),$$
and pick $\delta$ such that $c = \delta \log(1/\delta)/\log(q)$.
This shows, as desired, that $\mathcal{R}^{rc}_M\big(N^{1 - O(2^{-q}\, q \log(q) \cdot c^2/\log^2(1/c))}\big) \le N^{c}$.

Theorem 7.2. For any field $F$, positive integer $q \ge 2$, and function $f : \{0,1,\ldots,q-1\}^n \to F$, define the matrix $V_f \in F^{q^n \times q^n}$ by, for $x, y \in \{0,1,\ldots,q-1\}^n$,
$$V_f[x,y] = f\big(\max\{x[0],y[0]\}, \max\{x[1],y[1]\}, \max\{x[2],y[2]\}, \ldots, \max\{x[n-1],y[n-1]\}\big).$$
For any sufficiently small $\varepsilon > 0$, the matrix $V_f \in F^{N \times N}$ for $N = q^n$ has
$$\mathcal{R}^{rc}_{V_f}\big(N^{1 - O(2^{-q}\, q \log(q) \cdot \varepsilon^2/\log^2(1/\varepsilon))}\big) \le N^{\varepsilon},$$
where the $O$ hides a universal constant. In particular, if $q \le O(\log n)$, then $V_f$ is not Valiant-rigid.

Proof. Just like in the proof of Theorem 7.1, we proceed by induction on $q$. The base case $q = 2$ was given by Theorem 6.6. Suppose $q > 2$, and that the result is known already for $q - 1$.

For a set $T \subseteq [n]$, we define $g_T : [q-1]^{|T|} \to [q]^n$ as follows. Let $t_1, t_2, \ldots, t_{|T|}$ be an enumeration of the elements of $T$. Then, for $z \in [q-1]^{|T|}$ and $i \in [n]$ we define:
$$g_T(z)[i] := \begin{cases} 0 & \text{if } i \notin T, \\ z[j] + 1 & \text{if } i = t_j \in T. \end{cases}$$
For every set $S \subseteq [n]$, we define the function $f_S : \{0,1,\ldots,q-2\}^{|S|} \to F$ as, for any $z \in [q]^n$,
$$f_S(z) = \sum_{T \subseteq S} (-1)^{|S|-|T|} \cdot f(g_T(z)),$$
where we evaluate $g_T$ on the restriction of $z$ to the coordinates of $T$, shifted down by 1.

I now claim that
$$V_f = \sum_{S \subseteq [n]} V_{f_S}^{\otimes S} \otimes J_1^{\otimes [n] \setminus S}.$$
Once I show this, we can simply substitute it in for Equation $(*)$ in the proof of Theorem 7.1, and the remainder of the proof is exactly the same (with $A_K$ replaced by $V_{f_K}$ throughout).

For $z \in [q]^n$, let $S_z \subseteq [n]$ be the set of indices $i$ with $z[i] \neq 0$. Notice that, for $x, y \in [q]^n$, letting $z \in [q]^n$ be the entry-wise max of $x$ and $y$, we have that:
$$\sum_{S \subseteq [n]} \big(V_{f_S}^{\otimes S} \otimes J_1^{\otimes [n] \setminus S}\big)[x,y] = \sum_{S \subseteq [n]} \big([S \subseteq S_z] \;?\; f_S(z) : 0\big).$$
It thus suffices to show that for all $z \in [q]^n$, we have $\sum_{S \subseteq S_z} f_S(z) = f(z)$. We can verify this by using inclusion-exclusion:
$$\sum_{S \subseteq S_z} f_S(z) = \sum_{S \subseteq S_z} \sum_{T \subseteq S} (-1)^{|S|-|T|} \cdot f(g_T(z)) = \sum_{T \subseteq S_z} \sum_{T \subseteq S \subseteq S_z} (-1)^{|S|-|T|} \cdot f(g_T(z)) = \sum_{T \subseteq S_z} f(g_T(z)) \cdot \sum_{T \subseteq S \subseteq S_z} (-1)^{|S|-|T|}$$
$$= \sum_{T \subseteq S_z} f(g_T(z)) \cdot \sum_{k=0}^{|S_z|-|T|} \binom{|S_z|-|T|}{k} \cdot (-1)^k = f(g_{S_z}(z)) = f(z).$$
Here, we used the fact that $\sum_{k=0}^{m} \binom{m}{k} \cdot (-1)^k = 0$ unless $m = 0$.

Note that Theorem 7.2 also holds with 'max' replaced with 'min', as this corresponds to appropriately permuting the truth table of $f$.

8 Kronecker Products and Matrix Multiplication

Definition 8.1. For any field $F$ and positive integers $m, n, p$, let $MM_F(m,n,p)$ denote the smallest size of an arithmetic circuit for computing the product of an $m \times n$ matrix and an $n \times p$ matrix over $F$. For instance, $MM_F(n,n,n) \le n^{\omega+o(1)}$, where $\omega \le 2.373$ [Wil12, LG14] is the matrix multiplication exponent.

Lemma 8.2. For any field $F$, positive integers $q, N$, and matrix $M \in F^{q \times q}$, the linear transformation $M \otimes I_N$ can be computed by an arithmetic circuit of size $MM_F(q, q, N)$.

Proof. Computing $(M \otimes I_N) \times v$ for a vector $v \in F^{q \cdot N}$ is equivalent to computing $M \times v_\ell$ for all $N$ of the vectors $v_1, \ldots, v_N \in F^q$ whose concatenation gives $v$. This, in turn, is equivalent to multiplying $M \times (v_1 \,|\, v_2 \,|\, \cdots \,|\, v_N)$, which can be done with a circuit of size $MM_F(q, q, N)$ as desired.

Lemma 8.3. For any field $F$, positive integers $q, n, k$ such that $k$ divides $n$, and matrices $M_1, \ldots, M_n \in F^{q \times q}$, the linear transformation $M := \bigotimes_{\ell=1}^{n} M_\ell \in F^{q^n \times q^n}$ can be computed by an arithmetic circuit of size $k \cdot MM_F(q^{n/k}, q^{n/k}, q^{n \cdot (k-1)/k})$.

Proof. For each $\ell \in \{0, 1, \ldots, k-1\}$, define the matrix $M'_\ell \in F^{q^{n/k} \times q^{n/k}}$ by
$$M'_\ell := \bigotimes_{i=1}^{n/k} M_{i + \ell \cdot n/k}.$$
Hence,
$$\bigotimes_{\ell=0}^{k-1} M'_\ell = \bigotimes_{i=1}^{n} M_i = M.$$
Applying Lemma 2.6 to the $M'_\ell$ matrices shows that, in order to compute $M$, it suffices to compute $k$ linear transformations, where the $\ell$th, for $\ell \in \{0, 1, \ldots, k-1\}$, is a permutation of the rows and columns of $M'_\ell \otimes I_{q^{n \cdot (k-1)/k}}$. By Lemma 8.2, each can be computed by an arithmetic circuit of size $MM_F(q^{n/k}, q^{n/k}, q^{n \cdot (k-1)/k})$, as desired.

Corollary 8.4. Suppose that, for every integer $k > 2$, we have $MM_F(n, n, n^{k-1}) \le o(n^k \log n)$. Then, for any field $F$, fixed positive integer $q$, positive integer $n$, and matrices $M_1, \ldots, M_n \in F^{q \times q}$, the linear transformation $M := \bigotimes_{\ell=1}^{n} M_\ell \in F^{N \times N}$ (with $N = q^n$) can be computed by an arithmetic circuit of size $o(N \log N)$.

Proof. Applying Lemma 8.3, we see that $M$ can be computed by an arithmetic circuit of size $k \cdot MM_F(q^{n/k}, q^{n/k}, q^{n \cdot (k-1)/k})$. By assumption, this is $k \cdot o\big((q^{n/k})^k \log(q^{n/k})\big) = o(N \log N)$, as desired.

In fact, as $k$ gets large, it is known that the exponent of $MM_F(n, n, n^{k-1})$ approaches the desired $k$:

Proposition 8.5 ([HP98]). For every field $F$ and integer $k > 2$, we have $MM_F(n, n, n^{k-1}) \le O\big(n^{k \cdot \log_{k-1}(k)}\big)$. Here, the $O$ is hiding a function of $k$. Note that the exponent is
$$k \cdot \log_{k-1}(k) = k + O\Big(\frac{1}{\log k}\Big).$$

Proof sketch. This follows from [HP98, Equation (7.1)]. In the notation of their Equation (7.1), using $q = r = k$ and a small $\beta > 0$, we find that $\omega(1,1,k) < (k+1) \cdot \log_k(k+1)$. The result then follows by applying Schönhage's theorem [Sch81], using the notation of [HP98, Theorem 2.1] with $\varepsilon = (k+1) \cdot \log_k(k+1) - \omega(1,1,k)$, which is a function of only $k$.

Unfortunately, in order to combine Proposition 8.5 with Corollary 8.4 to construct an arithmetic circuit of size $o(N \log N)$, we would need to pick $k = \Omega(\log N/\log\log N)$ in order for the non-leading term from $MM_F(n, n, n^{k-1})$ (i.e. $(N^{1/k})^{O(1/\log k)} = N^{O(1/(k \log k))}$) to be negligible.
However, in that case, the $O$ in Proposition 8.5 is hiding a growing function of $N$, which swamps our savings unless that growing function is relatively small:

Corollary 8.6. Let $f(k)$ be the constant factor hidden in Proposition 8.5, and suppose that $f(k) < o(\log k)$. Then, for any field $F$, fixed positive integer $q$, positive integer $n$, and matrices $M_1, \ldots, M_n \in F^{q \times q}$, the linear transformation $M := \bigotimes_{\ell=1}^{n} M_\ell \in F^{N \times N}$ (with $N = q^n$) can be computed by an arithmetic circuit of size $o(N \log N)$.

Proof. Applying Lemma 8.3 with $k = \log N/\log\log N$, the resulting circuit size upper bound is $O(k \cdot f(k) \cdot N) < o(k \cdot \log k \cdot N) = o(N \log N)$.

In this section, we focus on the complexity of linear transformations using arithmetic circuits in which each gate has fan-in 2. This is often the best model for counting the exact number of arithmetic operations needed to compute a given linear transformation.

Lemma 9.1. For any field $F$ and positive integer $n$, let $M \in F^{2^n \times 2^n}$ be a matrix of either of the following forms:
• $M = V_f$ for any function $f : \{0,1\}^n \to F$, or
• $M = \bigotimes_{\ell=1}^{n} M_\ell$ for any matrices $M_1, \ldots, M_n \in F^{2 \times 2}$.
Then, $M \in F^{N \times N}$ (with $N = 2^n$) can be computed by an arithmetic circuit with $N \log N$ addition gates and $O(N)$ multiplication gates.

Proof. By Lemma 6.3 and Lemma 6.5, any such $M$ can be written as the product of three diagonal matrices and two copies of $R_n$. It thus suffices to show that each copy of $R_n$ has an arithmetic circuit with $(N/2) \cdot \log N$ addition gates. By Lemma 2.6, to compute $R_n$, it suffices to compute $\log N$ different copies of $A := R_1 \otimes I_{N/2}$. In $A$, half the rows have two 1s, which can be computed by a single addition gate, and the other half of the rows have a single 1 and don't need any gates to compute (we just output one of the inputs).
Thus, in total, $A$ needs $N/2$ addition gates, so $R_n$ needs $(N/2) \cdot \log N$ addition gates, and the two copies of $R_n$ together need $N \log N$ addition gates, as desired.

In fact, we can make this algorithm uniform, since the relevant diagonal matrices can all also be constructed by evaluating $R_n$:

Lemma 9.2. For any field $F$, positive integer $n$, and function $f : \{0,1\}^n \to F$, letting $N = 2^n$, suppose there is an algorithm that outputs the truth table of $f$ (i.e. evaluates $f$ on all $N$ inputs from $\{0,1\}^n$) in time $T$. Let $M$ be the time to perform a multiplication over $F$, and $A$ be the time to perform an addition or subtraction over $F$. Then, there is an algorithm which, given as input $x \in F^N$, outputs $V_f \times x$ in time $O(T + A \cdot N \log N + M \cdot N)$.

For $f = AND$, this corresponds to the algorithm for the Orthogonal Vectors problem with $n$ vectors in dimension $d$ with running time $O(n + d \cdot 2^d)$. We hence get the same running time for any such problem for a function $f : \{0,1\}^d \to F$.

In Section 3 we showed how to convert a rigidity upper bound for a matrix $M$ into a low-depth circuit upper bound for $M^{\otimes n}$. A key intermediate step was that from a circuit upper bound for $M$ itself, one can take Kronecker powers to get a circuit for $M^{\otimes n}$ for any $n$. In this section, we generalize this to show that if $M \in F^{q \times q}$ has a nontrivial construction $M = B_1 \times B_2 \times \cdots \times B_d$ where $\prod_{i=1}^{d} \mathrm{nnz}(B_i) < q^{d+1}$, then this can still give a nontrivial circuit upper bound for $M^{\otimes n}$ of depth $d$ and size $O(q^{n(1+(1-\varepsilon)/d)})$, even if $\mathrm{nnz}(B_i)$ is greater than $q^{1+1/d}$ for some of the $i$. Note that we can achieve $\prod_{i=1}^{d} \mathrm{nnz}(B_i) = q^{d+1}$ by picking $B_1 = M$ and $B_2 = \cdots = B_d = I_q$. This more general result was not needed in our construction in Section 3, since the constructions from non-rigidity were naturally symmetric, but it could be useful for designing upper bounds in other ways.

Lemma 10.1. For any field $F$ and positive integers $q, d$, and matrix $M \in F^{q \times q}$, suppose there are real numbers $a_1, \ldots, a_d \ge 1$ such that, for any positive integer $n$, the matrix $M^{\otimes n}$ can be written as $M^{\otimes n} = A_{n,1} \times A_{n,2} \times \cdots \times A_{n,d}$ for some matrices with $\mathrm{nnz}(A_{n,\ell}) = O(q^{a_\ell \cdot n})$ for all $\ell \in [d]$. Let $j^* = \mathrm{argmax}_{j \in [d]}\, a_j$, and let
$$a := 1 + \frac{a_{j^*} - 1}{1 + d \cdot a_{j^*} - \sum_{j=1}^{d} a_j}.$$
Then, for any positive integer $n$, we can write $M^{\otimes n} = B_{n,1} \times B_{n,2} \times \cdots \times B_{n,d}$ for some matrices with $\mathrm{nnz}(B_{n,j}) = O(q^{a \cdot n})$ for all $j \in [d]$.
In particular, if $\big(\sum_{j=1}^{d} a_j\big)/d < 1 + 1/d$, then $a < 1 + 1/d$.

Proof. We first need one piece of notation: for matrices $S, T$ of the same dimensions, and a Boolean predicate $P$, we write $(P \;?\; S : T)$ to denote the matrix
$$(P \;?\; S : T) := \begin{cases} S & \text{if } P \text{ is true}, \\ T & \text{if } P \text{ is false}. \end{cases}$$
Let $b, b_1, \ldots, b_d$ be positive real numbers which sum to 1, to be determined. By assumption, for each $j \in [d]$, there is a matrix $A_{bn,j}$ with $\mathrm{nnz}(A_{bn,j}) = O(q^{b \cdot a_j \cdot n})$, and $M^{\otimes bn} = \prod_{j=1}^{d} A_{bn,j}$. We can hence write:
$$M^{\otimes n} = M^{\otimes bn} \otimes \bigotimes_{\ell=1}^{d} M^{\otimes b_\ell \cdot n} = \Big(\prod_{j=1}^{d} A_{bn,j}\Big) \otimes \bigotimes_{\ell=1}^{d} \prod_{j=1}^{d} \big([j = \ell] \;?\; M^{\otimes b_\ell \cdot n} : I_{q^{b_\ell \cdot n}}\big)$$
$$= \prod_{j=1}^{d} \Big(A_{bn,j} \otimes \bigotimes_{\ell=1}^{d} \big([j = \ell] \;?\; M^{\otimes b_\ell \cdot n} : I_{q^{b_\ell \cdot n}}\big)\Big) = \prod_{j=1}^{d} P_j \times \big(A_{bn,j} \otimes M^{\otimes b_j n} \otimes I_{q^{n(1 - b - b_j)}}\big) \times P'_j,$$
for appropriate permutation matrices $P_j, P'_j$ for each $j \in [d]$, by Proposition 2.3. We will pick
$$B_{n,j} := P_j \times \big(A_{bn,j} \otimes M^{\otimes b_j n} \otimes I_{q^{n(1 - b - b_j)}}\big) \times P'_j,$$
so it is indeed the case that $M^{\otimes n} = B_{n,1} \times B_{n,2} \times \cdots \times B_{n,d}$. Let us now bound $\mathrm{nnz}(B_{n,j})$:
$$\mathrm{nnz}(B_{n,j}) = \mathrm{nnz}\big(P_j \times (A_{bn,j} \otimes M^{\otimes b_j n} \otimes I_{q^{n(1 - b - b_j)}}) \times P'_j\big) = \mathrm{nnz}\big(A_{bn,j} \otimes M^{\otimes b_j n} \otimes I_{q^{n(1 - b - b_j)}}\big)$$
$$= \mathrm{nnz}(A_{bn,j}) \cdot \mathrm{nnz}(M^{\otimes b_j n}) \cdot \mathrm{nnz}(I_{q^{n(1 - b - b_j)}}) \le O(q^{b \cdot a_j \cdot n}) \cdot q^{2 b_j n} \cdot q^{n(1 - b - b_j)} = O\big(q^{(1 + b_j + (a_j - 1) \cdot b) \cdot n}\big).$$
We pick
$$b := \frac{1}{1 + \sum_{j=1}^{d} (a_{j^*} - a_j)},$$
and for all $j \in [d]$, we pick $b_j := (a_{j^*} - a_j) \cdot b$, so that $b + \sum_{j=1}^{d} b_j = 1$.
Hence, for every $j \in [d]$, we have from the calculation above that
$$\mathrm{nnz}(B_{n,j}) \le O\big(q^{(1 + b_j + (a_j - 1) \cdot b) \cdot n}\big) = O\big(q^{(1 + (a_{j^*} - a_j) \cdot b + (a_j - 1) \cdot b) \cdot n}\big) = O\big(q^{(1 + (a_{j^*} - 1) \cdot b) \cdot n}\big),$$
as desired.

For the 'in particular' sentence of the lemma statement: suppose $\big(\sum_{j=1}^{d} a_j\big)/d = 1 + c/d$ for some $0 \le c < 1$. It follows that
$$a = 1 + \frac{a_{j^*} - 1}{1 + d \cdot a_{j^*} - d - c}.$$
The derivative of this expression with respect to $a_{j^*}$ is $(1-c)/(a_{j^*} d - c - d + 1)^2$, which is always nonnegative, so for a fixed $c$, the value of $a$ is maximized when $a_{j^*}$ is as large as possible. Since $a_j \ge 1$ for all $j \in [d]$, we must have that
$$a_{j^*} = \sum_{j=1}^{d} a_j - \sum_{j \in [d],\, j \neq j^*} a_j \le (d + c) - (d - 1) = c + 1.$$
We therefore have that
$$a \le 1 + \frac{(c+1) - 1}{1 + d \cdot (c+1) - d - c} = 1 + \frac{c}{c(d-1) + 1} < 1 + \frac{1}{d},$$
as desired.

When the matrix $M$ is symmetric (i.e. satisfies $M = M^T$), we can get an improved exponent (by improving on the choice of $a_{j^*}$):

Lemma 10.2. For any field $F$ and positive integers $q, d$, and matrix $M \in F^{q \times q}$ with $M = M^T$, suppose there are real numbers $a_1, \ldots, a_d \ge 1$ such that, for any positive integer $n$, the matrix $M^{\otimes n}$ can be written as $M^{\otimes n} = A_{n,1} \times A_{n,2} \times \cdots \times A_{n,d}$ for some matrices with $\mathrm{nnz}(A_{n,\ell}) = O(q^{a_\ell \cdot n})$ for all $\ell \in [d]$. Define $a_{j^*} := \max_{j \in [d]} (a_j + a_{d+1-j})/2$, and let
$$a := 1 + \frac{a_{j^*} - 1}{1 + d \cdot a_{j^*} - \sum_{j=1}^{d} a_j}.$$
Then, for any positive integer $n$, we can write $M^{\otimes n} = B_{n,1} \times B_{n,2} \times \cdots \times B_{n,d}$ for some matrices with $\mathrm{nnz}(B_{n,j}) = O(q^{a \cdot n})$ for all $j \in [d]$.
In particular, if $\big(\sum_{j=1}^{d} a_j\big)/d < 1 + 1/d$, then $a < 1 + 1/d$.

Proof. We can write
$$M^{\otimes n} = M^{\otimes n/2} \otimes \big(M^{\otimes n/2}\big)^T = \Big(\prod_{j=1}^{d} A_{n/2,j}\Big) \otimes \Big(\prod_{j=1}^{d} A^T_{n/2,\, d+1-j}\Big) = \prod_{j=1}^{d} \Big(A_{n/2,j} \otimes A^T_{n/2,\, d+1-j}\Big).$$
The result then follows by applying Lemma 10.1 to this new expression of $M^{\otimes n}$ as a product of $d$ matrices, since for $j \in [d]$, we have
$$\mathrm{nnz}\big(A_{n/2,j} \otimes A^T_{n/2,\, d+1-j}\big) = \mathrm{nnz}\big(A_{n/2,j}\big) \cdot \mathrm{nnz}\big(A_{n/2,\, d+1-j}\big) \le O\big(q^{(a_j + a_{d+1-j}) \cdot n/2}\big).$$

Acknowledgements

I would like to thank Amol Aggarwal, Chi-Ning Chou, Ben Edelman, Alexander Golovnev, DD Liu, Jon Schneider, Leslie Valiant, Virginia Vassilevska Williams, and Ryan Williams for helpful discussions throughout this project. I'd especially like to thank Virginia Vassilevska Williams for pointing out Proposition 8.5 to me, and anonymous reviewers for many helpful comments.

References

[AC19] Josh Alman and Lijie Chen. Efficient construction of rigid matrices using an NP oracle. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 1034–1055. IEEE, 2019.

[ACW16] Josh Alman, Timothy M. Chan, and Ryan Williams. Polynomial representations of threshold functions and algorithmic applications. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 467–476. IEEE, 2016.

[AW15] Josh Alman and Ryan Williams. Probabilistic polynomials and Hamming nearest neighbors. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 136–150. IEEE, 2015.

[AW17] Josh Alman and Ryan Williams. Probabilistic rank and matrix rigidity. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 641–652, 2017.

[AW21] Josh Alman and Virginia Vassilevska Williams. A refined laser method and faster matrix multiplication. In SODA, 2021.

[AWY14] Amir Abboud, Ryan Williams, and Huacheng Yu. More applications of the polynomial method to algorithm design. In Proceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms, pages 218–230. SIAM, 2014.

[BL04] Peter Bürgisser and Martin Lotz. Lower bounds on the bounded coefficient complexity of bilinear maps. Journal of the ACM (JACM), 51(3):464–482, 2004.

[Cha94] Bernard Chazelle. A spectral approach to lower bounds. In Proceedings 35th Annual Symposium on Foundations of Computer Science, pages 674–682. IEEE, 1994.

[Cop82] Don Coppersmith.
Rapid multiplication of rectangular matrices. SIAM Journal on Computing, 11(3):467–471, 1982.

[DE19] Zeev Dvir and Benjamin L. Edelman. Matrix rigidity and the Croot-Lev-Pach lemma. Theory of Computing, 15(8):1–7, 2019.

[DGW19] Zeev Dvir, Alexander Golovnev, and Omri Weinstein. Static data structure lower bounds imply rigidity. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 967–978, 2019.

[DL19] Zeev Dvir and Allen Liu. Fourier and circulant matrices are not rigid. In 34th Computational Complexity Conference (CCC 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.

[DW06] Ronald de Wolf. Lower bounds on matrix rigidity via a quantum argument. In International Colloquium on Automata, Languages, and Programming, pages 62–71. Springer, 2006.

[GHK+12] Anna Gál, Kristoffer Arnsfelt Hansen, Michal Koucký, Pavel Pudlák, and Emanuele Viola. Tight bounds on computing error-correcting codes by bounded-depth circuits with arbitrary gates. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 479–494, 2012.

[HP98] Xiaohan Huang and Victor Y. Pan. Fast rectangular matrix multiplication and applications. Journal of Complexity, 14(2):257–299, 1998.

[JS13] Stasys Jukna and Igor Sergeev. Complexity of linear boolean operators. Foundations and Trends® in Theoretical Computer Science, 9(1):1–123, 2013.

[KV19] Mrinal Kumar and Ben Lee Volk. Lower bounds for matrix factorization. arXiv preprint arXiv:1904.01182, 2019.

[LG14] François Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th international symposium on symbolic and algebraic computation, pages 296–303, 2014.

[Lok00] Satyanarayana V. Lokam. On the rigidity of Vandermonde matrices. Theoretical Computer Science, 237(1-2):477–483, 2000.

[Lok01] Satyanarayana V. Lokam. Spectral methods for matrix rigidity with applications to size–depth trade-offs and communication complexity. Journal of Computer and System Sciences, 63(3):449–473, 2001.

[Lok09] Satyanarayana V. Lokam.
Complexity lower bounds using linear algebra. Foundations and Trends® in Theoretical Computer Science, 4(1–2):1–155, 2009.

[Mid05] Gatis Midrijanis. Three lines proof of the lower bound for the matrix rigidity. arXiv preprint cs/0506081, 2005.

[Mor73] Jacques Morgenstern. Note on a lower bound on the linear complexity of the fast Fourier transform. Journal of the ACM (JACM), 20(2):305–306, 1973.

[NRR20] Sivaramakrishnan Natarajan Ramamoorthy and Cyrus Rashtchian. Equivalence of systematic linear data structures and matrix rigidity. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.

[NW96] Noam Nisan and Avi Wigderson. Lower bounds on arithmetic circuits via partial derivatives. Computational Complexity, 6(3):217–234, 1996.

[Pud94] Pavel Pudlák. Communication in bounded depth circuits. Combinatorica, 14(2):203–216, 1994.

[Pud00] Pavel Pudlák. A note on the use of determinant for proving lower bounds on the size of linear circuits. Information Processing Letters, 74(5-6):197–201, 2000.

[Raz02] Ran Raz. On the complexity of matrix product. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pages 144–151, 2002.

[RTS00] Jaikumar Radhakrishnan and Amnon Ta-Shma. Bounds for dispersers, extractors, and depth-two superconcentrators. SIAM Journal on Discrete Mathematics, 13(1):2–24, 2000.

[Sch81] Arnold Schönhage. Partial and total matrix multiplication. SIAM Journal on Computing, 10(3):434–455, 1981.

[Shp99] Igor E. Shparlinski. Private communication, cited in [Lok00], 1999.

[Str69] Volker Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13(4):354–356, 1969.

[Val77] Leslie G. Valiant. Graph-theoretic arguments in low-level complexity. In International Symposium on Mathematical Foundations of Computer Science, pages 162–176. Springer, 1977.

[VL00] Charles F. Van Loan. The ubiquitous Kronecker product. Journal of Computational and Applied Mathematics, 123(1-2):85–100, 2000.

[Wil12] Virginia Vassilevska Williams.
Multiplying matrices faster than Coppersmith-Winograd. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 887–898, 2012.

[Wil14] Ryan Williams. Faster all-pairs shortest paths via circuit complexity. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, 2014.