Recent Progress on Matrix Rigidity
C. Ramya*
Tata Institute of Fundamental Research, Mumbai, India
September 22, 2020
Abstract

The concept of matrix rigidity was introduced by Valiant (and independently by Grigoriev) in the context of computing linear transformations. A matrix is rigid if it is far (in terms of Hamming distance) from any matrix of low rank. Although we know that rigid matrices exist, obtaining explicit constructions of rigid matrices has remained a long-standing open question. This decade has seen tremendous progress towards understanding matrix rigidity. In the past, several matrices such as Hadamard matrices and Fourier matrices were conjectured to be rigid. Very recently, many of these matrices were shown to have low rigidity. Further, several explicit constructions of rigid matrices in classes such as E^NP and P^NP were obtained recently. Among other things, matrix rigidity has found striking connections to areas as disparate as communication complexity, data structure lower bounds and error-correcting codes. In this survey, we present a selected set of results that highlight recent progress on matrix rigidity and its remarkable connections to other areas in theoretical computer science.

* [email protected]. Research supported by a fellowship of the DAE, Government of India.
1 Introduction

The concept of matrix rigidity was introduced by Valiant [33] in the context of computing linear transformations by arithmetic circuits and was also studied independently by Grigoriev in [18]. The rigidity of a matrix A ∈ F^{n×n} for rank r over F (denoted by R^F_A(r)) is the minimum number of entries to be changed in A so that the rank of A becomes at most r. More formally,

R^F_A(r) ≜ min_C { sparsity(C) | C ∈ F^{n×n}, rank(A + C) ≤ r },

where the sparsity of a matrix C denotes the number of non-zero entries in C. A matrix is rigid if it is far, in terms of Hamming distance, from any low-rank matrix. Matrix rigidity is an interesting and intriguing concept in that it intertwines a combinatorial property, the sparsity of a matrix, with an algebraic property, namely its rank. For instance, the rigidity of the n × n identity matrix I_n for rank r is exactly n − r. Trivially, for any matrix A ∈ F^{n×n} and for any r ≤ n, the rigidity of A is at most n². In fact, it is not difficult to observe that for any matrix A ∈ F^{n×n} and any r ≤ n, R^F_A(r) ≤ (n − r)². Moreover, over finite fields most matrices have high rigidity (rigidity close to this upper bound). Further, over infinite fields, for every choice of n there exists an n × n matrix A such that R_A(r) = (n − r)² for every r. Although the existence of rigid matrices is quite straightforward, the major goal is to prove a super-linear lower bound on the rigidity of explicit n × n matrices. We say a sequence of matrices {A_n}_{n∈N} is explicit if there exists a deterministic algorithm that on input n (in unary) outputs A_n in time poly(n). The following question was posed by Valiant in [33] and has remained a tantalizing open problem:

Question 1.1. Does there exist an explicit sequence of matrices (A_n)_{n∈N} with entries in F such that R^F_{A_n}(εn) = Ω(n^{1+δ}) for some ε, δ > 0?
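For intuition, the definition above can be evaluated by brute force on tiny matrices over F_2. The following is a minimal illustrative sketch (the search space is exponential, so this is feasible only for very small n); it confirms, for instance, that the rigidity of the identity matrix I_n for rank r is exactly n − r:

```python
from itertools import combinations

def gf2_rank(M):
    """Rank of a 0/1 matrix over F_2, by Gaussian elimination on bitmask rows."""
    n = len(M[0])
    rows = [int("".join(map(str, row)), 2) for row in M]
    rank = 0
    for col in range(n - 1, -1, -1):
        pivot = next((i for i, row in enumerate(rows) if (row >> col) & 1), None)
        if pivot is None:
            continue
        p = rows.pop(pivot)
        rows = [row ^ p if (row >> col) & 1 else row for row in rows]
        rank += 1
    return rank

def rigidity(M, r):
    """R_A(r) over F_2: the fewest entry changes bringing rank(M) down to <= r."""
    n = len(M)
    if gf2_rank(M) <= r:
        return 0
    cells = [(i, j) for i in range(n) for j in range(n)]
    for s in range(1, n * n + 1):
        for support in combinations(cells, s):
            N = [row[:] for row in M]
            for i, j in support:
                N[i][j] ^= 1  # over F_2, every change is a bit flip
            if gf2_rank(N) <= r:
                return s

I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
assert rigidity(I4, 3) == 1
assert rigidity(I4, 2) == 2  # matches R_{I_n}(r) = n - r
```

Changing one entry changes the rank by at most one, so at least n − r changes are necessary; zeroing n − r diagonal ones shows they suffice.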
As mentioned earlier, Question 1.1 has connections to arithmetic circuits computing linear transformations. The study of linear transformations is central to linear algebra, and linear transformations such as the Discrete Fourier Transform are of practical importance. A linear circuit is a directed acyclic graph C where every gate is either an input gate or computes a linear combination of its inputs. The size of a linear circuit is the number of edges in it, and the depth of a linear circuit is the length of the longest path from an input gate to the output gate. Valiant [33] observed that if a linear transformation is computable by a small-size, small-depth linear circuit, then the corresponding transformation matrix does not have high rigidity. In other words, for any A ∈ F^{n×n}, if R_A(εn) ≥ n^{1+δ} for some ε, δ > 0, then any linear circuit computing the linear transformation A : x ↦ A·x must have either size Ω(n log log n) or depth Ω(log n). Thus, rigidity lower bounds imply super-linear size lower bounds on linear circuits of logarithmic depth. This brings us to the following question, a variant of Question 1.1:

Question 1.2.
Does there exist an explicit sequence of matrices (A_n)_{n∈N} with entries in F such that R^F_{A_n}(n / log log n) = Ω(n^{1+δ}) for some δ > 0?

The earliest works on matrix rigidity were due to Valiant [33] and Razborov [29]. The connections between communication complexity of boolean functions and matrix rigidity were first explored by Razborov [29]. Whenever we think of matrices in the communication complexity setting, the most natural candidates are communication matrices of boolean functions. Consider the two-party communication model with two parties Alice and Bob who want to jointly compute a boolean function f : {0,1}^n × {0,1}^n → {0,1} where the input is partitioned between the two parties. For any boolean function f : {0,1}^n × {0,1}^n → {0,1}, the communication matrix M_f is a 2^n × 2^n matrix whose rows and columns are indexed by strings in {0,1}^n and M_f[x, y] = f(x, y) for all x, y ∈ {0,1}^n. Razborov [29, 34] considered the complexity class PH^cc, the communication complexity analogue of the polynomial hierarchy (see [16] for a formal definition of PH^cc), and showed that for any function f in PH^cc, R_{M_f}(2^{(log n/δ)^c}) ≤ δ · 2^{2n} for every constant δ > 0 and some constant c, where M_f is the 2^n × 2^n communication matrix. Thus, lower bounds on the rigidity of explicit matrices immediately imply communication complexity lower bounds, a long-standing open question. This leads us to the following question, which is quite similar to Question 1.1 except for the parameters:

Question 1.3.
For N = 2^n, does there exist an explicit sequence of matrices (A_N)_{N∈N} with entries in F such that R^F_{A_N}(2^{(log n/δ)^c}) ≥ δ · 2^{2n} for some δ > 0?

Although we have not been able to obtain decisive answers to any of these questions, there has been considerable progress towards understanding Questions 1.1, 1.2 and 1.3 in recent years. In fact, several interesting matrix families were conjectured to be rigid:

"Many candidate matrices are conjectured to have rigidity as high as in Valiant's question. Examples include Fourier transform matrices, Hadamard matrices, Cauchy matrices, Vandermonde matrices, incidence matrices of projective planes, etc." —Page 15, [26].

In this article, we survey some of the recent developments on the non-rigidity of some of the matrix families conjectured above. In particular, we review the following results:

• Non-rigidity of the Walsh-Hadamard matrix by Alman and Williams [2];
• Non-rigidity of generalized Hadamard matrices due to Dvir and Liu [11];
• Non-rigidity of certain matrices associated with functions over finite fields [9]; and
• Non-rigidity of Fourier and circulant matrices [11].

Even though we are currently far away from answering Questions 1.1, 1.2 and 1.3, several semi-explicit constructions of rigid matrices were obtained quite recently. In this regard, we survey the following results:

• Rigidity of random Toeplitz matrices by Goldreich and Tal [15];
• Sub-exponential time constructions of rigid matrices [22]; and
• Explicit rigid matrices in the class P^NP based on constructions of probabilistically checkable proofs (PCPs) [1, 3].

However, the parameters in the above mentioned results differ from one another. The first two of the above results are aimed towards answering Question 1.1, while the construction in [1] is in the spirit of answering Question 1.3.

Despite consistent efforts in obtaining rigid matrices, answering Question 1.1 seems to be a distant dream. This difficulty is justified by connections between explicit constructions of rigid matrices and other hard problems in theoretical computer science, such as explicit constructions of error-correcting codes, communication complexity lower bounds, and data structure lower bounds. In this regard, we discuss in detail the following recent connections between matrix rigidity and data structure lower bounds as well as linear codes:

• Proving data structure lower bounds is a fundamental open problem in theoretical computer science. A major goal has been to understand time-space tradeoffs: in the static setting, how does one optimize space so that data structure queries can be answered quickly? In [10], the authors show that a super-logarithmic lower bound on the query time of a linear data structure with linear space implies an answer to Question 1.2, where the matrix of high rigidity is constructible in the class P^NP. (Constructions of rigid matrices in P^NP are now available to us via PCPs.)

• In the theory of error-correcting codes, linear codes are particularly useful.
One can verify that asymptotically good codes yield generator matrices of high rigidity. We review a result by Dvir [7] which states that if the generating matrix of a locally decodable code is not rigid, then it defines a locally self-correctable code with rate close to one.

Before we delve into proving upper and lower bounds on matrix rigidity, let us investigate the computational complexity of computing the rigidity of a given matrix. Consider the problem
RIGID(A, F, s, r) of deciding if R^F_A(r) ≤ s, given a matrix A ∈ F^{n×n} and s, r ∈ Z⁺. Note that we can guess a matrix S ∈ F^{n×n} of sparsity at most s and test if rank(A − S) ≤ r.

• Over finite fields, this problem is in the class NP and, in fact, RIGID(A, F_q, s, r) is known to be NP-complete [5].
• Given a matrix A ∈ R^{n×n}, we can brute-force over all supports of at most s entries and test whether there is a setting of these s entries to real numbers such that the rank of the resulting matrix is at most r. Deciding the latter is an instance of computing the minimum rank of a pattern matrix, which lies in the class ∃R (the existential theory of the reals) and hence in PSPACE. Thus, RIGID(A, R, s, r) is in PSPACE (where the underlying computational model can handle real numbers).
• RIGID(A, Q, s, r) is not known to be decidable.

In the parameterized regime, RIGID(A, F_q, s, r) is known to be fixed-parameter tractable over finite fields F_q. The computational complexity of several variants of RIGID(A, F, s, r) has been studied extensively in [27].

Organization of the article.
The rest of the article is organized as follows. In Section 2, we begin with some basic facts on rigidity. The goal of Section 2 is to present certain important classical progress made towards understanding rigidity, so that the subsequent sections are more accessible to the reader. For details on the statements and proofs in Section 2, we refer the reader to the excellent survey by Satya Lokam [26] and the references therein. In Section 3, we review recent constructions of matrices that are rigid to the largest extent currently possible. As mentioned earlier, several well-known families of matrices were recently shown not to be rigid, and we survey these results in Section 4. In Sections 5 and 6 we investigate the connections between rigid matrices, static data structure lower bounds and error-correcting codes. We conclude with some open problems in Section 7.
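Before turning to formal statements, the R^F_A(r) ≤ (n − r)² upper bound quoted in the introduction can be seen in action numerically. The following is a minimal sketch (assuming numpy is available) that forces rank at most r by overwriting only the bottom-right (n − r) × (n − r) block, i.e., by zeroing out a Schur complement:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 8, 3
A = rng.standard_normal((n, n))
# For a generic A, the leading r x r block A11 is invertible.
A11, A12 = A[:r, :r], A[:r, r:]
A21 = A[r:, :r]
B = A.copy()
# Overwrite the bottom-right (n-r) x (n-r) block so that every row of B
# becomes a linear combination of the first r rows: this forces the
# Schur complement A22 - A21 A11^{-1} A12 to be zero.
B[r:, r:] = A21 @ np.linalg.inv(A11) @ A12
changes = np.count_nonzero(B != A)
assert changes <= (n - r) ** 2
assert np.linalg.matrix_rank(B, tol=1e-8) <= r
```

Every row below the first r equals (A21 A11^{-1}) applied to the block row [A11 | A12], so the rank collapses to r after at most (n − r)² changes, exactly as in the generic upper bound.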
To begin with, we prove a straight-forward upper bound on the rigidity of any matrix.
Lemma 2.1.
Let A be an n × n matrix with entries from F. For any r ≤ n, R^F_A(r) ≤ (n − r)².

Proof. Let A ∈ F^{n×n}. If rank(A) ≤ r then R_A(r) = 0. Therefore, assume rank(A) > r. Then A contains a non-singular r × r minor, and the rows and columns of A can be permuted so that this minor occupies the top-left corner. Write A in block form with this r × r block A_11 in the top-left corner, A_12 to its right, A_21 below it, and the (n − r) × (n − r) block A_22 in the bottom-right corner (see Figure 1). Let R_1, ..., R_r denote the first r rows of A and let R′_1, ..., R′_{n−r} denote the rows of [A_21 | A_22]. Since A_11 is invertible, there exist constants α_{i1}, ..., α_{ir} in F such that for all i ∈ [n − r], the restriction of R′_i to the first r columns equals α_{i1} R_1 + ... + α_{ir} R_r restricted to the first r columns.

Figure 1: Rigidity upper bound on a matrix A.

Now, by altering the entries of the sub-matrix A_22 based on the values of the α's, we can ensure that every row of A is a linear combination of the first r rows [A_11 | A_12], implying that the rank of A becomes r. Since A_22 ∈ F^{(n−r)×(n−r)}, we get R^F_A(r) ≤ (n − r)².

In fact, over finite fields, for most matrices the above rigidity upper bound is tight.

Lemma 2.2.
Let F_q be a finite field. The fraction of n × n matrices over F_q with rigidity at most c₂(n − r)²/log n for rank r is at most O(2^{−n}).

Proof. For any matrix A ∈ F_q^{n×n}, R_A(r) ≤ s if A = S + L where rank(L) ≤ r and S has at most s non-zero entries. Therefore, we count the number of matrices over F_q expressible as such a sum. The number of matrices over F_q of rank at most r is at most q^{2nr − r²}, and the number of matrices over F_q of sparsity at most s is at most (n² choose s)·q^s ≤ q^{s + s·log_q n²}. Thus, the number of n × n matrices over F_q with rigidity at most s for rank r is at most q^{2nr − r² + s + s·log_q n²}. When r ≤ n − c₁√n and 0 ≤ s < c₂(n − r)²/log n for suitable constants c₁, c₂, this count is at most O(2^{−n})·q^{n²}.

As a corollary, the fraction of n × n matrices over F_q with rigidity Ω((n − r)²/log n) for rank r approaches 1. Hence, almost all matrices over F_q have rigidity Ω((n − r)²/log n) for rank r. Similarly, over infinite fields one can show that for every choice of n there exists an n × n matrix A such that R_A(r) = (n − r)² for any r.

The best known lower bound on the rigidity of explicit matrices over finite fields is Ω((n²/r)·log(n/r)) for log n ≤ r ≤ n/2, due to Friedman [14]. The best known lower bound on the rigidity of explicit matrices over arbitrary fields is also Ω((n²/r)·log(n/r)) for log n ≤ r ≤ n/2, due to Shokrollahi, Spielman, and Stemann [30]. The lower bounds in [30] apply to any totally regular matrix and use the following combinatorial approach, called the untouched minor argument:
Consider a matrix A almost all of whose minors have rank Ω(r). Then, even after changing a few entries of A, there is at least one minor of A that is "untouched" and the rank of A remains Ω(r). Thus, in order to reduce the rank of A below r, every such minor of A must be altered, requiring a large number of entries of A to be changed.

Matrices all of whose minors are full-rank are called totally regular matrices. A standard example of a totally regular matrix is the Cauchy matrix C = {c_ij}_{i,j∈[n]}, c_ij = 1/(x_i + y_j), for 2n distinct elements x_1, ..., x_n, y_1, ..., y_n ∈ F.

Theorem 2.3.
Let M be any totally regular n × n matrix and log n ≤ r ≤ n/2. Then, R_M(r) = Ω((n²/r)·log(n/r)).

Proof Sketch. Let M be any totally regular matrix. For the sake of contradiction, assume that R_M(r) = o((n²/r)·log(n/r)). Then the rank of M can be reduced to r by altering o((n²/r)·log(n/r)) entries of M. The entries of M can be viewed as a bipartite graph G_M = (U, V, E) with |U| = |V| = n, where (u, v) ∈ E(G_M) if and only if the entry M_{u,v} was not altered in reducing the rank of M. Intuitively, as G_M has many edges, it is likely that G_M has a reasonably large complete bipartite subgraph. If fewer than (n²/r)·log(n/r) entries were changed in M, then G_M has at least n² − (n²/r)·log(n/r) edges when r ≤ n/2. To find the subgraph, we appeal to the Zarankiewicz problem in extremal graph theory, which asks for the maximum number of edges in a bipartite graph forbidding a reasonably large complete bipartite subgraph. If r ≥ log n, then any bipartite graph with at least n² − (n²/r)·log(n/r) edges has a complete bipartite subgraph K_{r+1,r+1}. This immediately implies that there is an (r+1) × (r+1) sub-matrix M′ of M that remains untouched. As M is totally regular, rank(M′) = r + 1 > r, a contradiction. Hence, R_M(r) = Ω((n²/r)·log(n/r)). □

However, the untouched minor argument has its own limitations and cannot be improved to obtain lower bounds strong enough to answer Question 1.2 (see Section 2.2.1 in [26] for more details).

Recall that the major goal here is to prove an Ω(n^{1+δ}) lower bound on the rigidity of n × n matrices for rank εn for some ε, δ > 0. In fact, we now describe Ω(n²) lower bounds on the rigidity of certain matrices for rank Ω(n) from [25, 24].

Theorem 2.4. Let p_1, p_2, ..., p_{n²} be n² distinct primes. Let P be the n × n matrix given by P_ij = √p_ij. Then, R_P(r) ≥ n(n − cr) for an absolute constant c; in particular, R_P(r) = Ω(n²) for rank r ≤ εn with ε sufficiently small.

Proof Sketch. The proof is an algebraic argument based on the Shoup-Smolensky dimension of a matrix. The Shoup-Smolensky dimension of a matrix P of order nr (denoted by SSD_{nr}(P)) is the dimension of the vector space (over Q) spanned by the set of products of nr distinct entries of P. Trivially, SSD_{nr}(P) is at most (n² choose nr). Since the entries of P are square roots of distinct primes, products of distinct subsets of entries are linearly independent over Q, so the entries behave like algebraically independent elements. Hence, a lower bound of (n² − R_P(r) choose nr) on SSD_{nr}(P) follows by considering only the products of entries left untouched by an optimal rank-reducing change. Comparing this lower bound with an upper bound on the Shoup-Smolensky dimension of any (sparse + rank-r) matrix yields the theorem. □

Note that the matrix M in Theorem 2.3 is explicit while the matrix P in Theorem 2.4 is not. Hence, on the one hand, from Theorem 2.3 we have explicit matrices that are not-so-rigid, and on the other hand, in Theorem 2.4 we have matrices, such as the ones constructed from distinct primes, that are highly rigid but not explicit.

At first thought, it is intriguing to note Valiant's claim that if we answer Question 1.2 in the affirmative, then the linear transformation corresponding to the matrix A_n cannot be computed by linear circuits of size O(n) and depth O(log n). Similar to the proof of Theorem 2.3, Valiant's argument from [33], which we now outline, is also graph-theoretic.

Theorem 2.5. If A ∈ F^{n×n} has a linear circuit of size s and depth d, then R_A(εs/log d) ≤ n · 2^{d/2^ε} for every positive integer ε.

Proof Sketch. The proof is based on a graph-theoretic argument showing that by removing a few edges, the length of every path in a directed acyclic graph can be halved. That is, from any directed acyclic graph having s edges and (every) path length bounded by d, by removing at most s/log d edges we can ensure that every path has length at most d/2. Let C be a linear circuit of size s and depth d computing the matrix A ∈ F^{n×n}. By repeating the above edge removal process ε times, we remove at most εs/log d edges in total, and every path in the remaining circuit has length at most d/2^ε. As C is a linear circuit, the linear function computed at each output gate of C is a linear combination of the functions computed at the removed edges and of the input gates still reachable by the surviving short paths. This implies that A = S + L, where rank(L) ≤ εs/log d and every row of S has at most 2^{d/2^ε} non-zero entries. □

By setting s = n log log n and d = log n in Theorem 2.5, we can immediately conclude that for any A ∈ F^{n×n}, if R_A(εn) = Ω(n^{1+δ}) for some ε, δ > 0, then any linear circuit computing the linear transformation A : x ↦ A·x must have either size Ω(n log log n) or depth Ω(log n). In essence, a positive answer to Question 1.1 implies super-linear size lower bounds on linear circuits of logarithmic depth.

3 Explicit constructions of Rigid Matrices
In this section, we review semi-explicit constructions of rigid matrices, starting with constructions in the class E^NP and proceeding towards constructions in P^NP. We use the term semi-explicit to broadly refer to matrices that require worse than polynomial time to construct.

3.1 Rigidity of random Toeplitz matrices

Observe that a random matrix is rigid with high probability. Goldreich and Tal [15] showed that in order to obtain rigid matrices, it is enough to look only inside the space of random Toeplitz matrices.

Let a_{−(n−1)}, ..., a_{n−1} be 2n − 1 elements of F_2. A Toeplitz matrix T ∈ F_2^{n×n} is given by T_ij = a_{j−i} for all i, j ∈ [n]. A Hankel matrix H ∈ F_2^{n×n} is given by H_ij = a_{i+j}, where a_2, ..., a_{2n} are in F_2. The matrices T and H below are examples of 3 × 3 Toeplitz and Hankel matrices respectively:

T = [ a_0 a_1 a_2 ; a_{−1} a_0 a_1 ; a_{−2} a_{−1} a_0 ]  and  H = [ a_2 a_3 a_4 ; a_3 a_4 a_5 ; a_4 a_5 a_6 ].

A matrix in F_2^{n×n} is a random Toeplitz (resp., Hankel) matrix if T_ij = a_{j−i} (resp., H_ij = a_{i+j}) where a_{−(n−1)}, ..., a_{n−1} (resp., a_2, ..., a_{2n}) are bits in {0, 1} chosen independently and uniformly at random. Goldreich and Tal [15] show that with high probability, a random Toeplitz matrix (resp., Hankel matrix) in F_2^{n×n} is rigid. Observe that a Hankel matrix is the mirror image of a Toeplitz matrix; hence, rigidity of Hankel matrices translates directly to rigidity of Toeplitz matrices. In this section, we prove the following result from [15]:

Theorem 3.1.
Let H ∈ F_2^{n×n} be a random Hankel matrix. For every r ∈ [√n, n/32], R_H(r) = Ω(n³/(r²·log n)) with probability 1 − o(1).

Proof Sketch. The high-level idea is to come up with a procedure TEST_{s,r}(H) which, when given as input a Hankel matrix H ∈ F_2^{n×n}, does the following:

(1) If H = S + L with sparsity(S) ≤ s and rank(L) ≤ r, then reject H.
(2) If H is a random Hankel matrix, then accept H with probability 1 − o(1).

If we succeed in obtaining such a test, then for a random Hankel matrix H ∈ F_2^{n×n}, TEST_{s,r}(H) accepts H with probability 1 − o(1). Then with probability 1 − o(1), R_H(r) = Ω(n³/(r²·log n)) when r ∈ [√n, n/32] and s = Θ(n³/(r²·log n)). □

The design of
TEST_{s,r}(H) depends on the following simple observation: if H is not rigid, then some sub-matrix of H is super-sparse up to low rank, and this sub-matrix witnesses the non-rigidity of H.

Observation 1. Let H ∈ F_2^{n×n} be a Hankel matrix such that H = S + L for some S, L ∈ F_2^{n×n} with sparsity(S) ≤ s and rank(L) ≤ r. Then, for any partition of H into (n/2r)² sub-matrices of dimension 2r × 2r each, there exist a 2r × 2r sub-matrix H′ of H and matrices S′, L′ ∈ F_2^{2r×2r} such that H′ = S′ + L′, sparsity(S′) ≤ s·(2r/n)² and rank(L′) ≤ r.

Based on Observation 1, we design
TEST_{s,r}(H):

Input: Hankel matrix H ∈ F_2^{n×n}
(1) Partition H into (n/2r)² sub-matrices of dimension 2r × 2r each.
(2) Set s′ = s·(2r/n)².
(3) for every such sub-matrix H′ of H do
(4)   for every s′-sparse matrix S′ in F_2^{2r×2r} do
(5)     if rank(H′ − S′) ≤ r then
(6)       reject H
(7) Accept H

If the given Hankel matrix H in F_2^{n×n} is not rigid, then by Observation 1, line (6) of the algorithm is reached for some s′-sparse matrix S′, and TEST_{s,r}(H) rejects H. Now, it remains to show that TEST_{s,r}(H) accepts a random Hankel matrix with high probability. To complete the proof, we show that on input H ∈ F_2^{n×n} that is a random Hankel matrix, TEST_{s,r}(H) rejects H with probability o(1):

Pr_H[TEST_{s,r}(H) rejects H] = Pr[∃H′ ∃S′ with sparsity(S′) ≤ s′ s.t. rank(H′ − S′) ≤ r]
≤ (n/2r)² · ((2r)² choose ≤ s′) · Pr[rank(H′ − S′) ≤ r].   (3.2)

Now, for a moment, assume that Pr[rank(H′ − S′) ≤ r] is quite low (i.e., Pr[rank(H′ − S′) ≤ r] ≤ 2^{−n/16}). Plugging this into Equation (3.2), we get:

Pr_H[TEST_{s,r}(H) rejects H] ≤ (n/2r)² · ((2r)² choose ≤ s′) · 2^{−n/16}.

When s = n³/(160·r²·log n), we have s′ = n/(40·log n), and as √n ≤ r ≤ n/32, Pr_H[TEST_{s,r}(H) rejects H] is o(1). So condition (2) of the proof outline is satisfied by TEST_{s,r}.

In the rest of this subsection, we show that Pr[rank(H′ − S′) ≤ r] ≤ 2^{−n/16} when H is carefully partitioned.¹ There are several ways of partitioning H into 2r × 2r sub-matrices. For instance, one straightforward way would be to tile H by contiguous blocks of dimension 2r × 2r (the area shaded in solid grey in Figure 2). However, such a contiguous H′ is determined by only 4r − 1 random elements of F_2. Since we know that a random matrix in F_2^{n×n} has high rank with high probability, intuitively we want H′ to see a large number of random bits so that rank(H′ − S′) is low with low probability. By using a cleverer partitioning of the Hankel matrix H, we can obtain sub-matrices H′ that see Θ(n) random bits and show that Pr[rank(H′ − S′) ≤ r] ≤ 2^{−n/16}. Partition H into (n/2r)² sub-matrices of dimension 2r × 2r each, such that each sub-matrix H′ has 2r consecutive columns and 2r rows that are at a distance n/2r apart, as shown in Figure 2. As H is Hankel, every row in H′ sees n/2r fresh random bits, and H′ sees Θ(n) random elements of F_2. Such a sub-matrix is said to be an n/2r-Hankel matrix.

¹An arbitrary partition of H may not work; we need to carefully partition H so that the probability bounds go through.

Figure 2: Partition of H into 2r × 2r sub-matrices.

Let R_1, ..., R_{2r} be the rows of H′ − S′. If rank(H′ − S′) ≤ r, then there exists a basis B = {R_{i_1}, ..., R_{i_r}} such that every row in H′ − S′ is spanned by a linear combination of the row-vectors in B. Let J be the set of rows in H′ − S′ that are not in B. In fact, if rank(H′ − S′) ≤ r, by a greedy procedure we can compute a basis B such that for every row R_j in J, R_j ∈ span{R_k | R_k ∈ B, k < j}.
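As an aside, the spread-out sub-matrices used in this partition are easy to generate and inspect. The following toy sketch (parameters chosen small for illustration, not matching the theorem's exact range) builds a random Hankel matrix from its 2n − 1 defining bits and counts how many distinct Hankel coefficients a_k a spread sub-matrix touches, compared with a contiguous block:

```python
import random

def random_hankel(n, rng):
    # A random Hankel matrix is specified by i.i.d. bits a_2, ..., a_{2n},
    # with H[i][j] = a_{i+j} (1-indexed rows and columns).
    a = {k: rng.randrange(2) for k in range(2, 2 * n + 1)}
    H = [[a[i + j] for j in range(1, n + 1)] for i in range(1, n + 1)]
    return a, H

n, r = 64, 8
rng = random.Random(0)
a, H = random_hankel(n, rng)
gap = n // (2 * r)
# Spread sub-matrix: 2r rows spaced n/2r apart, with 2r consecutive columns.
spread_rows = [1 + t * gap for t in range(2 * r)]
cols = list(range(1, 2 * r + 1))
spread_coeffs = {i + j for i in spread_rows for j in cols}
# Contiguous 2r x 2r block: determined by only 4r - 1 coefficients.
contig_coeffs = {i + j for i in range(1, 2 * r + 1) for j in cols}
assert len(contig_coeffs) == 4 * r - 1   # 31 for r = 8
assert len(spread_coeffs) >= n           # Theta(n) fresh random bits
```

The contrast is exactly the point of the clever partition: a contiguous block is governed by O(r) random bits, while the spread block is governed by Θ(n) of them.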
Let E denote the event that for every row R_j in J, R_j ∈ span{R_k | R_k ∈ B, k < j} (i.e., R_j is in the linear span of the rows above it). Then,

Pr_H[rank(H′ − S′) ≤ r] ≤ Pr[∃B such that event E holds] ≤ (2r choose ≤ r) · Pr[for a fixed B, event E holds] ≤ 2^{2r} · Pr[for a fixed B, event E holds].

That is, for a fixed set B of rows, we want to estimate the probability that every row not in B is spanned by the rows of B occurring above it in the matrix H′ − S′. Let J′ be a set of rows not in the basis of H′ − S′ that are sufficiently far apart: let J′ = {R_{j_1}, ..., R_{j_t}} ⊆ J be such that the distance between R_{j_p} and R_{j_{p+1}} is at least 4r²/n for all p ∈ [t − 1]. Note that t = |J′| ≥ |J|/(4r²/n) ≥ n/4r. Now, if event E holds, then every row in J′ is spanned by the rows of B above it. Let E_ℓ be the event that row R_{j_ℓ} in J′ is spanned by {R_k | R_k ∈ B, k < j_ℓ}. Now, supposing that for any ℓ ∈ [t], Pr[E_ℓ | E_1, E_2, ..., E_{ℓ−1}] ≤ 2^{−r}, we get the following:

Pr_H[rank(H′ − S′) ≤ r] ≤ 2^{2r} · Pr[for a fixed B, event E_1 ∩ E_2 ∩ ... ∩ E_t holds] ≤ 2^{2r} · ∏_ℓ Pr[E_ℓ | E_1, E_2, ..., E_{ℓ−1}] ≤ 2^{2r} · 2^{−rt} ≤ 2^{−n/16}.

The last inequality follows as r ≤ n/32 and t ≥ n/4r. In the remaining part of this subsection, we show that for a fixed basis B, Pr[E_ℓ | E_1, E_2, ..., E_{ℓ−1}] ≤ 2^{−r} for any ℓ ∈ [t]. For this, we refer to Figure 3.

Figure 3: Estimating Pr[E_ℓ | E_1, E_2, ..., E_{ℓ−1}]: the blocks N_1, N_2 of determined and fresh coefficients around the rows R_{j_{ℓ−1}} and R_{j_ℓ}.

Pr[E_ℓ | E_1, E_2, ..., E_{ℓ−1}] is the probability that there exists a linear combination of the rows in {R_k | R_k ∈ B, k < j_ℓ} such that R_{j_ℓ} = Σ_{k < j_ℓ} α_k R_k. As the α_k's are from F_2, there are at most 2^{|B|} ≤ 2^r many linear combinations. Now, we need to estimate the probability that for a fixed linear combination of rows in {R_k | R_k ∈ B, k < j_ℓ}, we have R_{j_ℓ} = Σ_{k < j_ℓ} α_k R_k. Once α_1, ..., α_{j_ℓ−1} are fixed, the elements in the block N_1 are determined, and block N_1 along with α_1, ..., α_{j_ℓ−1} completely determines N_2. This way, once the linear combination is fixed, R_{j_ℓ} is forced to equal a fixed row-vector in {0, 1}^{2r}, even though its entries involve fresh random bits. Therefore, Pr[E_ℓ | E_1, E_2, ..., E_{ℓ−1}] ≤ 2^r · 2^{−2r} ≤ 2^{−r} for any ℓ ∈ [t].

Remark 3.3.
For r = o(n/(log n · log log n)), Theorem 3.1 yields an asymptotically better lower bound than the current best rigidity lower bound of Ω((n²/r)·log(n/r)) for rank r. ♦

Remark 3.4.
A random n × n Toeplitz matrix can be specified using 2n − 1 random bits. Hence, Theorem 3.1 gives a construction of rigid matrices in the complexity class E^NP. ♦

3.2 Construction of rigid matrices in sub-exponential time

Having constructed rigid matrices in the class E^NP, in this section we discuss the following result of [22], which gives an explicit family of rigid matrices constructible in sub-exponential time.

Theorem 3.5.
Let F_q be a finite field and E be an extension of F_q of degree at most exp(O(n^{1−1/d}·log n)). There exists a family of matrices (A_n)_{n∈N}, with entries in E, constructible in time exp(n^{1−Ω(1/d)}) such that any linear circuit over F̄_q of depth d computing A_n has size at least Ω(n^{1+1/d²}).

Here, F̄_q denotes the algebraic closure of F_q. The results in [22] work over any field F; in this article, we only consider the case when F is a finite field.

For any matrix A ∈ F^{n×n}, if A = S + L where sparsity(S) ≤ s and rank(L) ≤ r, then the linear transformation x ↦ A·x can be computed by a linear circuit of depth 2 and size 2nr + s. Hence, the following corollary of Theorem 3.5, which gives a family of rigid matrices constructible in sub-exponential time, is not very difficult to observe.

Corollary 3.6. Let F be any field. There exists a family of matrices (A_n)_{n∈N} constructible in time 2^{o(n)} such that R_{A_n}(n^{1/4−ε}) = Ω(n^{5/4}) for any constant ε > 0.

The rest of this subsection is devoted to the proof of Theorem 3.5. Following the standard template for proving arithmetic circuit lower bounds, the proof of Theorem 3.5 proceeds by designing a complexity measure that is low for matrices computable by low-depth linear circuits of small size, while exhibiting explicit matrices for which the measure is large. Here, we use
Shoup-Smolensky dimension of matrices as a complexity measure.The elements of any extension E of field F are univariate polynomials over F of ap-propriate degree and can be viewed as a vector of coefficients. Now, we formally definethe Shoup-Smolensky dimension of a matrix: Definition 3.7 (Shoup-Smolensky dimension) . Let F be any field and E an extension of F . LetM ∈ E n × n . For any t ∈ N , P t ( M ) = (cid:26) ∏ ( a , b ) ∈ T M ab | T ∈ ( [ n ] × [ n ] t ) (cid:27) is the set of products of tdistinct entries of M. The Shoup-Smolensky dimension of M of order t (denoted by
SSD t ( M ) )is the dimension of the space spanned by the set P t ( M ) over F . ♦ The Shoup-Smolensky dimension of M of order t is denoted by SSD t ( M ) and is pre-cisely dim F ( span ( P t ( M )) . First, we show that the Shoup-Smolensky dimension of matri-ces computable by small-size and small-depth circuits is fairly low. Lemma 3.8.
Let M ∈ E^{n×n} be computable by a linear circuit C of size s and depth d. Then, for any t ≤ n²/4 such that s ≥ dt, SSD_t(M) ≤ (2e·s/dt)^{dt}.

Proof. Let the matrix M ∈ E^{n×n} be computable by a linear circuit C of size s and depth d with layers L_1, ..., L_{d+1}. Then M = P_1 ⋯ P_d, where P_ℓ is the adjacency matrix of the graph of C between layers L_ℓ and L_{ℓ+1}. Then,

M_ij = (P_1 ⋯ P_d)_{ij} = Σ_{k_1,...,k_{d−1}} [ (P_1)_{i,k_1} · ∏_{ℓ=2}^{d−1} (P_ℓ)_{k_{ℓ−1},k_ℓ} · (P_d)_{k_{d−1},j} ].   (3.9)

As C has size s, the total number of non-zero entries in all of P_1, ..., P_d is at most s. From Equation (3.9), each entry of M ∈ E^{n×n} is a sum of monomials of degree at most d in the entries of the matrices P_1, ..., P_d. Hence, every element of P_t(M) is a sum of monomials of degree at most dt in at most s entries. Thus,

SSD_t(M) ≤ (s + dt choose dt) ≤ (e(s + dt)/dt)^{dt} ≤ (2e·s/dt)^{dt},

using s ≥ dt in the last step.

Now, we want to construct G ∈ E^{n×n} whose Shoup-Smolensky dimension is large. For any t ∈ N, clearly SSD_t(M) ≤ (n² choose t), and we want SSD_t(G) ≥ (n² choose t). A simple way to achieve the maximum possible dimension for SSD_t(G) is to take G_ij = y^{e_ij}, where the set {e_11, ..., e_nn} ⊆ N of size n² is such that the sum of any t of its elements is always distinct. As every element of P_t(G) is the product of t entries of G, we then have SSD_t(G) ≥ (n² choose t).

Recall that in the end we want to construct a family of matrices in sub-exponential time. For this, we require G ∈ E^{n×n} to be constructible in sub-exponential time (i.e., time n^{O(t)}, which is sub-exponential when t = n^{1−1/d}). This in turn implies that the entries of G should be monomials of degree n^{O(t)} and that every entry should be constructible in time n^{O(t)}. In summary, for any t ∈ N, we require a set S ⊆ N satisfying the following conditions:

1. |S| = n² and every subset of S of size t has a distinct sum;
2.
S can be constructed in time n O ( t ) ; and3. The maximum value of any element in S is at most n O ( t ) .To begin with, consider the following natural candidate set S (cid:48) = {
1, 2, . . . , 2 n − } forthe set S . Clearly, | S (cid:48) | = n and every subset of S (cid:48) of size t has a distinct sum but S (cid:48) doesnot satisfy condition ( ) . A natural next step is to go modulo a prime p so that set S = { a mod p | a ∈ S (cid:48) } satisfies conditions (1)-(3). To ensure (1), intuitively we want p to bequite large. To ensure (2), we need p to be not too large so that we can search for such a p and construct S in time n O ( t ) . 14n particular, we want a prime p such that for any two sets T , T (cid:48) ⊆ S with | T | = | T (cid:48) | , σ t = ∑ a ∈ T a and σ t (cid:48) = ∑ a ∈ T (cid:48) a are different. That is, p does not divide ∏ T , T (cid:48) ⊆ S | T | = | T (cid:48) | ( σ t − σ t (cid:48) ) whichis at most ( n ) n O ( t ) as every element of S is at most 2 n and there are n O ( t ) subsets of S ofsize t . Thus, by the prime number theorem , there are at most log (( n ) n O ( t ) ) distinct primesdividing ∏ T , T (cid:48) ⊆ S | T | = | T (cid:48) | ( σ t − σ t (cid:48) ) . This proves the existence of such a prime p and hence the exis-tence of such a set S ⊆ N for any t ∈ N satisfying conditions (1)-(3). With set S in hand,we now complete the proof of Theorem 3.5. Proof of Theorem . Let F q be any finite field. Let t = n − d and S = { e , . . . , e nn } be theset constructed above. For the matrix G n ∈ ( F [ y ]) n × n given by G ij = y e ij where each e ij ∈ S , we have SSD t ( G ) ≥ ( n t ) . So far, we have constructed a matrix G n ∈ ( F [ y ]) n × n . Sincewe want to obtain a matrix G ∈ ( E ) n × n we need to project y to some value preserving theShoup-Smolensky dimension. For any D , an irreducible polynomial g ( z ) of degree D + F q can be constructed in deterministic time poly ( D , | F q | ) [31]. Let α be the root of g ( z ) that is in E (cid:44) F q [ z ] / (cid:104) g ( z ) (cid:105) . Define A n (cid:44) G n | y = α . 
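As an aside, the modular construction of the set $S$ is easy to check concretely. The following Python sketch (the sizes $m$ and $t$ are small illustrative stand-ins for $n^2$ and $t$; it is not part of the construction in the survey) brute-forces the smallest prime $p$ that keeps all $t$-subset sums of $S' = \{2^0, \ldots, 2^{m-1}\}$ distinct after reduction modulo $p$:

```python
from itertools import combinations

def distinct_subset_sums(S, t):
    """True iff all t-element subsets of S have pairwise distinct sums."""
    sums = [sum(c) for c in combinations(S, t)]
    return len(sums) == len(set(sums))

def is_prime(p):
    return p >= 2 and all(p % d for d in range(2, int(p ** 0.5) + 1))

m, t = 12, 3                       # illustrative stand-ins for n^2 and t
S1 = [1 << i for i in range(m)]    # S' = {1, 2, 4, ..., 2^(m-1)}
assert distinct_subset_sums(S1, t)

# Search for the smallest prime p such that S = S' mod p still has m
# distinct elements and all t-subset sums remain distinct.
p = 2
while not (is_prime(p)
           and len(set(a % p for a in S1)) == m
           and distinct_subset_sums([a % p for a in S1], t)):
    p += 1

S = [a % p for a in S1]
print(p, max(S), max(S1))
```

The loop is guaranteed to terminate, since any prime exceeding the largest $t$-subset sum of $S'$ trivially works; the counting argument above shows that asymptotically a far smaller prime, of magnitude $n^{O(t)}$, already suffices.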
Clearly, by the properties of the set $S$ constructed, any element of $P_t(A_n)$ is $\alpha^m$ where $m \le t\cdot n^{O(t)}$, and every element of $P_t(A_n)$ corresponds to a distinct power of $\alpha$. Thus, by fixing $D = 2t\cdot n^{O(t)}$, as $\{\alpha^0, \alpha^1, \ldots, \alpha^D\}$ are linearly independent over $\mathbb{F}_q$, we get $SSD_t(A_n) = SSD_t(G_n) \ge \binom{n^2}{t}$. Now, if $A_n$ is computable by a depth-$d$, size-$s$ linear circuit, then by Lemma 3.8,
$$(2es/(dt))^{dt} \ge \binom{n^2}{t}.$$
If $s < n^{1+1/d^2}/2$, the above inequality contradicts standard binomial estimates. Hence $s = \Omega(n^{1+1/d^2})$. □

There have been recent constructions of semi-explicit rigid matrices based on a striking connection between rigid matrices and probabilistically checkable proofs. Informally, a probabilistically checkable proof (PCP) for a language $L$ is a proof or certificate for membership of $x$ in $L$ such that, by probabilistically querying very few locations of the proof, the verifier can always be convinced of this fact when $x \in L$, while if $x \notin L$ then with high probability the verifier will reject the proof. For a more formal definition and the huge body of work revolving around PCPs, see [19] and references therein.

Along these lines, there have been two results, one due to Alman and Chen [1] and the other by Bhangale et al. [3], both of which are geared towards constructing PCPs with nice properties that aid the construction of semi-explicit rigid matrices. We begin by stating the following construction from [1]:
Theorem 3.10.
There exists a family of matrices $\{A_n\}$, $A_n \in \mathbb{F}_q^{n\times n}$, constructible in $P^{NP}$, and a constant $\delta > 0$, such that for all $\varepsilon > 0$, $R^{\mathbb{F}_q}_{A_n}\big(2^{(\log n)^{1/4-\varepsilon}}\big) \ge \delta\cdot n^2$.

Very recently, Bhangale et al. [3] obtained the following strengthening of the parameters in the above theorem:
Theorem 3.11.
There is a constant $\delta \in (0,1)$ such that there is an FNP machine that, for infinitely many $n$, on input $1^n$ outputs an $n\times n$ matrix $W_n$ in $\mathbb{F}_2^{n\times n}$ such that $R_{W_n}\big(2^{\log n/\Omega(\log\log n)}\big) \ge \delta\cdot n^2$.

Remark 3.12. If you are wondering what the class FNP is: it is the function version of the class NP. A relation $R(x,y)$ is in FNP if there exists a non-deterministic polynomial-time Turing machine $M$ that on input $x$ outputs $y$ such that $R(x,y) = 1$, or rejects when no such $y$ exists. ♦

Alman and Chen also provide a strengthening of the parameters (i.e., of $R^{\mathbb{F}_q}_{A_n}(2^{(\log n)^{1/4-\varepsilon}}) \ge \delta\cdot n^2$) in Theorem 3.10 by assuming that NQP ⊄ P/poly and using the easy witness lemma. However, note that the statement of Theorem 3.11 is an unconditional strengthening of Theorem 3.10. In this sub-section, we will see a proof sketch of Theorem 3.11 and carefully delineate the connections between rigid matrices and probabilistically checkable proofs. We begin with the three main hammers needed to prove Theorem 3.11:

1. There exists a unary language $L \in NTIME(2^n) \setminus NTIME(2^n/n)$. This is essentially the non-deterministic time hierarchy theorem from [36].
2. A faster algorithm, developed in [1], to compute the sparsity of a given low-rank matrix $M$. Observe that given any $n\times n$ matrix $M$, the sparsity can be computed in time $n^2$. However, if $M$ has low rank $r$ then it admits a product decomposition $M = A\cdot B$ with $A, B$ having dimensions $n\times r$ and $r\times n$ respectively. The faster algorithm in [1], when given as input the matrices $A, B$, computes the sparsity of the matrix $M = A\cdot B$ in time $n^{2-\Omega(1/\log r)}$ for all $r = n^{o(1)}$.
3. $NTIME(2^n)$ has PCPs with "nice" properties. See Theorem 3.21 for the actual statement.

Before we sketch the proof of Theorem 3.11, we review the correspondence between PCPs and constraint satisfaction problems (CSPs). The PCP verifier $V$ can be viewed as an instance $\Phi$, which is a set of functions $(\varphi_1, \varphi_2, \ldots, \varphi_m)$ on a set $V$ of $n$ variables. Each $\varphi_i : \{0,1,\ldots,t-1\}^q \to \{0,1\}$ is a constraint or clause whose arity is $q$. The probabilistically checkable proof $\pi$ is an assignment $\bar a \in \{0,\ldots,t-1\}^n$ to the $n$ variables. The query complexity of the verifier $V$ is the arity $q$ of the constraints. The randomness complexity of the verifier $V$ is the logarithm of the number of constraints (i.e., $\log m$). We say that an assignment $\bar a$ satisfies constraint $\varphi_i$ if $\varphi_i(\bar a) = 1$. Let $\mathrm{val}(\Phi)$ denote the maximum over all assignments $\bar a$ of the fraction of clauses satisfied by $\bar a$ (i.e., $\max_{\bar a} (\sum_{i=1}^m \varphi_i(\bar a))/m$). The soundness error $s$ of the PCP upper-bounds $\mathrm{val}(\Phi)$ when the input is not in the language.

As a first step, let us try to construct a high-rank matrix using the three ingredients mentioned above. Note that constructing a high-rank matrix is a trivial problem. The goal, however, is to construct a matrix that has high rank even when a few entries are perturbed. The overall idea is to show that the length-$2^n$ witnesses (viewed as $2^{n/2}\times 2^{n/2}$ matrices) for the unary language $L \in NTIME(2^n) \setminus NTIME(2^n/n)$ cannot all be of low rank, so that there will exist high-rank matrices infinitely often. As $NTIME(2^n)$ has a PCP with some nice properties, there exists a verifier $V$ that randomly queries locations in the $N\times N$ witness matrix $W$ and runs a decision predicate to decide whether $1^n \in L$ or not. Here $N = 2^{n/2}$.

Let us assume for the moment that the decision predicate used by the verifier is the CSP instance $\Phi$ consisting of the $N^2$ constraints $\varphi_1 = x_1, \varphi_2 = x_2, \ldots, \varphi_{N^2} = x_{N^2}$. A proof is an assignment of 0's and 1's to the variables $x_1, x_2, \ldots, x_{N^2}$, which can be viewed as an $N\times N$ matrix $W$. As $L \in NTIME(2^n)$, for every $1^n \in L$ there exists a witness $W_n$ that certifies the membership of $1^n$ in $L$. We begin with the following claim, which asserts that the witness matrices $W_n$ cannot all be of low rank, so that high-rank matrices exist infinitely often.

Claim 3.13.
The $N\times N$ witness matrices $W_n$ corresponding to $1^n \in L$ cannot all have low rank.

Proof of Claim 3.13. Suppose not: the $N\times N$ witness matrix $W_n$ corresponding to every $1^n \in L$ is always of low rank (say rank $r$). Then $W_n = A\cdot B$ for some $A \in \{0,1\}^{N\times r}$, $B \in \{0,1\}^{r\times N}$. Consider the following algorithm for $L$ in $NTIME(2^n/n)$:

Algorithm 1: Algorithm for $L$ in $NTIME(2^n/n)$.
Input: $1^n$
Output: Decide if $1^n \in L$ or not
1. Guess a low-rank representation $A \in \{0,1\}^{N\times r}$ and $B \in \{0,1\}^{r\times N}$ for $W = A\cdot B$.
2. Compute $\mathrm{sparsity}(W)$ using $A$, $B$ and the fast algorithm in (2).
3. Accept $1^n$ iff $\mathrm{sparsity}(W) > s\cdot N^2$.

Now, we need to argue that the above algorithm correctly decides $L$ using the PCP verifier, and also investigate its running time. Note that by (3), there is a PCP verifier for $L$ with soundness error $s$ and decision predicate $\Phi$. (Although the PCP verifier has interesting properties, we will not need them at the moment.) Recall that we assumed the clauses of $\Phi$ are just variables. Hence, the number of clauses satisfied in $\Phi$ by any witness $W_n$ is exactly $\mathrm{sparsity}(W_n)$.

$1^n \in L$ ⇒ ∃ proof $\pi$ such that the PCP verifier accepts $\pi$ with prob. $> s$
⇒ ∃ proof $\pi$ such that the fraction of clauses satisfied in $\Phi$ is $> s$
⇒ ∃ proof $\pi$ such that the number of clauses satisfied in $\Phi$ is $> s\cdot N^2$
⇒ ∃ $W_n \in \{0,1\}^{N\times N}$ such that $\mathrm{sparsity}(W_n) > s\cdot N^2$
⇒ Algorithm 1 accepts in line 3.

(We consider $N = 2^{n/2}$ for simplicity of the argument; the actual proof length is $2^n\cdot\mathrm{poly}(n)$.)

$1^n \notin L$ ⇒ ∀ proofs $\pi$, the PCP verifier accepts $\pi$ with prob. $< s$
⇒ ∀ proofs $\pi$, the fraction of clauses satisfied in $\Phi$ is $< s$
⇒ ∀ proofs $\pi$, the number of clauses satisfied in $\Phi$ is $< s\cdot N^2$
⇒ ∀ $W_n \in \{0,1\}^{N\times N}$, $\mathrm{sparsity}(W_n) < s\cdot N^2$
⇒ Algorithm 1 rejects in line 3.

We will use the assumption that every witness $W_n$ corresponding to $1^n \in L$ has low rank to argue about the running time. By using non-determinism to guess the low-rank matrices $A, B$ and by using the fast algorithm for computing the sparsity of a low-rank matrix, we can ensure that $L \in NTIME(2^n/n)$ for a suitable choice of rank $r = 2^{\log N/\Omega(\log\log N)}$, which is a contradiction to the fact that $L \in NTIME(2^n) \setminus NTIME(2^n/n)$. □

Note that one simplifying assumption is that every clause of the $q$-CSP corresponding to the
PCP is just a variable (a MAX-1-LIN instance, so to speak). Now, let us relax this assumption a bit by assuming that the CSP corresponding to the PCP verifier is a set of $M^2$ clauses on $N^2$ variables, each of the form $(x_a \oplus x_b)$. As there are $M^2$ clauses, without loss of generality we can assume that every clause in $\Phi$ is indexed by a pair of indices $i, j \in [M]$. Similarly, as there are $N^2$ variables, we assume that every variable is indexed by a pair $a_1, a_2 \in [N]$. Let $c_{ij}$ be a clause for some $i, j \in [M]$; then $c_{ij} = (x_{a_1,a_2} \oplus x_{b_1,b_2})$ where $a_1, a_2, b_1, b_2 \in [N]$. Let us call such an instance, where every clause satisfies the above property, a MAX-2-LIN instance. In the previous case, when every clause was a variable, we had that the number of clauses in $\Phi$ satisfied by a witness $W$ is exactly the sparsity of the witness viewed as a matrix. In the case when each clause is an XOR of two variables, we need to relate the number of clauses in $\Phi$ satisfied by a witness $W$ to the sparsity of a low-rank matrix. For this purpose, we define two matrices $Q_1, Q_2 \in \{0,1\}^{M\times M}$ by:
$$Q_1[i,j] = W[a_1, a_2] \qquad (3.14)$$
$$Q_2[i,j] = W[b_1, b_2] \qquad (3.15)$$
where $c_{ij} = (x_{a_1,a_2} \oplus x_{b_1,b_2})$ is a clause in $\Phi$. That is, $Q_1[i,j]$ and $Q_2[i,j]$ contain the assignment (according to witness $W$) to the first and second variables of the clause $c_{ij}$. Now, it is easy to observe that the number of clauses in the MAX-2-LIN instance $\Phi$ satisfied by a witness $W$ is the sparsity of the matrix $(Q_1 + Q_2) \bmod 2$.

Observation 2.
Let $L \in NTIME(2^n) \setminus NTIME(2^n/n)$. Assume the $N\times N$ witness matrix $W_n$ corresponding to every $1^n \in L$ has low rank. Let $Q_1, Q_2 \in \{0,1\}^{M\times M}$ be the matrices obtained from $W$ as given in Equations (3.14) and (3.15). If $(Q_1+Q_2)$ has a low-rank representation, then (by guessing the low-rank representation for $(Q_1+Q_2)$) we can follow the outline of Algorithm 1 to show $L \in NTIME(2^n/n)$, which is a contradiction. This implies that $W$ has high rank.

The question that remains is: if $W$ has a low-rank representation, does $(Q_1+Q_2)$ have a low-rank representation? The answer to this question is yes if the MAX-2-LIN instance $\Phi$ mentioned above satisfies a specific property. Suppose there exist matrices $A_1, A_2, B_1, B_2$ such that
$$Q_1 = A_1\cdot W\cdot A_2 \qquad (3.16)$$
$$Q_2 = B_1\cdot W\cdot B_2 \qquad (3.17)$$
are satisfied. Now, observe that if $W = A\cdot B$ then
$$Q_1 + Q_2 = A_1\cdot W\cdot A_2 + B_1\cdot W\cdot B_2 = A_1\cdot(A\cdot B)\cdot A_2 + B_1\cdot(A\cdot B)\cdot B_2 = \begin{bmatrix} A_1 A & B_1 A \end{bmatrix}\cdot\begin{bmatrix} B A_2 \\ B B_2 \end{bmatrix} = \tilde A\cdot\tilde B \qquad (3.18)$$
where $\tilde A := [A_1 A \;\; B_1 A]$ and $\tilde B$ is the vertical stack of $B A_2$ and $B B_2$. That is, if $W$ has a low-rank (rank-$r$) representation admitting a decomposition $W = A\cdot B$, then $Q_1 + Q_2$ has a representation $\tilde A\cdot\tilde B$ with $\mathrm{rank}(\tilde A\cdot\tilde B) \le 2r$. Now, by using the same algorithmic strategy as before, we can construct an $NTIME(2^n/n)$ algorithm for $L$:

Algorithm 2:
Algorithm for $L$ in $NTIME(2^n/n)$.
Input: $1^n$
Output: Decide if $1^n \in L$ or not
1. Guess a low-rank representation $A \in \{0,1\}^{N\times r}$ and $B \in \{0,1\}^{r\times N}$ for $W = A\cdot B$.
2. Use the PCP verifier for $L$ to compute matrices $A_1, A_2, B_1, B_2$ of appropriate dimensions.
3. Using Equation (3.18), compute the matrices $\tilde A, \tilde B$.
4. Calculate the sparsity of $\tilde A\cdot\tilde B$.
5. Accept if and only if $\mathrm{sparsity}(\tilde A\cdot\tilde B) > s\cdot M^2$.

By an argument similar to the previous case, we can conclude that the above algorithm correctly decides $L$. However, we have the following few caveats. We will address them one by one.

1. How do we compute the matrices $A_1, A_2, B_1, B_2$ in time $2^{\gamma n}$ for some suitably chosen $\gamma > 0$? Since $L \in NTIME(2^n)$, there exists a PCP verifier for $L$, which is used by [3] to devise a procedure that, given a row index $i$ of $A_1$ (respectively $A_2, B_1, B_2$), computes the non-zero column entries of the $i$-th row in time $2^{\gamma n}$. Now, using the algorithm from [1] to compute the sparsity of $\tilde A\cdot\tilde B$, we can ensure that the above algorithm is in $NTIME(2^n/n)$.

2. We have shown that if there exist matrices $A_1, A_2, B_1, B_2$ such that Equations (3.16) and (3.17) hold, then the witness matrix $W$ must be of high rank infinitely often. But what does it mean to say that there exist matrices $A_1, A_2, B_1, B_2$ such that Equations (3.16) and (3.17) are satisfied? What structural requirement does this impose on the MAX-2-LIN instance $\Phi$? It is not very difficult to note that the existence of $A_1, A_2, B_1, B_2$ such that Equations (3.16) and (3.17) hold is the same as placing the restriction that for any clause $c_{ij} = (x_{a_1(i,j),a_2(i,j)} \oplus x_{b_1(i,j),b_2(i,j)})$,
$$a_1(i,j) = a_1(i) \text{ and } a_2(i,j) = a_2(j) \qquad (3.19)$$
$$b_1(i,j) = b_1(i) \text{ and } b_2(i,j) = b_2(j) \qquad (3.20)$$
Any CSP instance $\Phi$ satisfying Equations (3.19) and (3.20) is said to be rectangular. In fact, rectangularity can be extended to arbitrary $q$-CSPs. A $q$-CSP is rectangular if, whenever the $(i,j)$-th constraint in the CSP on $N^2$ variables and $M^2$ clauses involves the $q$ variables $x_{t_1(i,j)}, \ldots, x_{t_q(i,j)}$, then for any $\ell \in [q]$ the function $t_\ell : [M]\times[M] \to [N]\times[N]$ is a product of functions $a_\ell : [M] \to [N]$ and $b_\ell : [M] \to [N]$. Furthermore, we say a PCP is rectangular if the corresponding CSP is rectangular.

3.
Recall that we set out to prove that $W$ is a rigid matrix, but we have shown that $W$ has high rank. What are the "nice" properties of the PCP that enable us to ensure that even if a few entries of $W$ are changed, the rank of the matrix $W$ remains high?

We have a unary language $L \in NTIME(2^n)$ and a PCP verifier that decides $L$ by using the corresponding CSP instance $\Phi$. That is, if $1^n \in L$ then there is a witness $W_n$ (an assignment to the variables in $\Phi$) that satisfies at least a $c$ fraction (say 75%) of the clauses in $\Phi$. Similarly, if $1^n \notin L$ then any witness $W_n$ (assignment to the variables in $\Phi$) satisfies at most an $s$ fraction (say 51%) of the clauses in $\Phi$. Note that $0 < s < c \le 1$. Now, we want to show that $W$, when viewed as an $N\times N$ matrix, has high rigidity.

Proof Sketch of Theorem 3.11. Let $L \in NTIME(2^n)$ be a unary language such that $L \notin NTIME(2^n/n)$. For the sake of contradiction, assume that for every $1^n \in L$ the witness $W$ is close (say $\delta$-close) to a low-rank matrix $W'$. That is, by changing at most $\delta\cdot N^2$ entries in the matrix $W$ we can get the matrix $W'$. Now, as $W'$ has low rank, we can follow the outline of Algorithm 2 by guessing the low-rank representation of $W'$ (instead of guessing the low-rank representation of $W$) in line 1. In order to argue that the algorithm correctly decides $L$, we use the completeness and soundness error corresponding to the PCP verifier for $L$.

If $1^n \in L$ then there is a witness $W_n$ that satisfies at least a $(c-\delta)$ fraction of the clauses in $\Phi$, and when $1^n \notin L$ any witness $W_n$ satisfies at most an $s$ fraction of the clauses in $\Phi$. By setting $\delta = (c-s)/3$, we get that when $1^n \in L$ there is a witness $W_n$ satisfying more than $s\cdot M^2$ clauses in $\Phi$, and when $1^n \notin L$ any witness $W_n$ satisfies at most $s\cdot M^2$ clauses in $\Phi$. This gap in the number of clauses satisfied by $W_n$ can be used by Algorithm 2 in line 5 to distinguish between the YES and NO instances.

Since the PCP verifier randomly queries locations in the proof, it is possible that the verifier queries exactly those locations in which the proofs $W$ and $W'$ differ. Note that if every proof location is equally likely to be queried by the PCP verifier, then the probability that the verifier queries exactly the wrong locations in $W'$ is small. A PCP whose verifier has such a property is said to be smooth.

For every step in Algorithm 2 to yield the desired outcome, observe that we have to prove the existence of short, efficient, smooth, rectangular PCPs for $NTIME(2^n)$. These are the "nice" properties that we expect the PCP for $NTIME(2^n)$ to have. The existence of such PCPs for $NTIME(2^n)$ is the major contribution of [3]. We state this formally in Theorem 3.21 below without giving the proof. From the above discussion, by choosing $r = 2^{\log N/\Omega(\log\log N)}$ we can ensure that $L \in NTIME(2^n/n)$, which is a contradiction. Hence, $W_n$ is rigid infinitely often, and the construction can be carried out in FNP. □

Theorem 3.21.
Let $L$ be a language in $NTIME(2^n)$. For all constants $s \in (0, 1/2)$ and $\tau \in (0,1)$, there exists a constant-query, smooth and $\tau$-almost rectangular PCP for $L$ over the Boolean alphabet with soundness error $s$, proof length at most $2^n\cdot\mathrm{poly}(n)$ and verifier running time at most $2^{O(\tau n)}$.

The above theorem is a very informal statement of the PCP construction in [3]. The exact statement mentioning all the parameters of the PCP can be found in Theorem 8.2 of [3]. We do not include a proof of the above theorem here and refer the interested reader to Sections 4 through 8 of [3].

4.
Why should the predicate corresponding to the PCP verifier be a MAX-2-LIN predicate?

For example, the CSP could be MAXCUT (which is also an example of a MAX-2-LIN instance) in directed graphs. For a discussion of the case when the decision predicate of the PCP verifier (which is equivalent to the CSP instance) is MAXCUT, see Section 1.3 in [3]. One interesting observation is that Algorithm 2 for $L$ is in $NTIME(2^n/n)$ even if the predicate corresponding to the PCP verifier is a MAX-$q$-LIN predicate, where $q$ is a constant. This is because in the case of MAX-$q$-LIN every clause is the XOR of $q$ variables and, similar to Equations (3.16) and (3.17), there exist $q$ matrices $Q_1, \ldots, Q_q$ such that $Q_1 = A_1\cdot W\cdot A_2$, $Q_2 = B_1\cdot W\cdot B_2$, $Q_3 = C_1\cdot W\cdot C_2$, and so on up to $Q_q$. As long as $q$ is a constant, these matrices can be computed using the PCP verifier in line 2 of Algorithm 2, and $L \in NTIME(2^n/n)$. The argument then proceeds as in the MAX-2-LIN case.

Now, all we need to do to answer Question 4 is to reduce from an arbitrary MAX-$q$-CSP to a MAX-$q'$-LIN for some constant $q'$, preserving the gap between $(c, s)$, where $c, s$ are the completeness and soundness guarantees. Note that we have a PCP verifier for $L$ whose predicate is a MAX-$q$-CSP instance that verifies the proof $\tilde A\cdot\tilde B$. For those familiar with Håstad's 3-bit PCP for NP: recall that for every $\delta > 0$ and $L \in NP$, there exists a PCP verifier $V$ that reads 3 bits of the proof $\pi$, choosing positions $(i_1, i_2, i_3)$ and a bit $b \in \{0,1\}$ according to some distribution, and accepting iff $(\pi_{i_1} \oplus \pi_{i_2} \oplus \pi_{i_3}) = b$. Further, $V$ has completeness $1-\delta$ and soundness $1/2+\delta$. Along similar lines, in our case $L \in NEXP$, we want to compute the acceptance probability of the verifier for $\tilde A\cdot\tilde B$. In [3], the authors carefully design matrices $\tilde A_1, \ldots, \tilde A_{q'}, \tilde B_1, \ldots, \tilde B_{q'}$ such that the acceptance probability of the verifier $V$ for $\tilde A\cdot\tilde B$ is at most the acceptance probability of the verifier $V$ for $(\tilde A_1\cdot\tilde B_1) \oplus \cdots \oplus (\tilde A_{q'}\cdot\tilde B_{q'})$. Relating the acceptance probability of the MAX-$q$-CSP instance to the acceptance probability of the MAX-$q'$-LIN instance requires Fourier analysis. (In [3], the authors actually prove the existence of short, efficient, smooth, almost-rectangular PCPs with a randomness-oblivious property.)
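The MAX-2-LIN bookkeeping above, Equations (3.14)-(3.18), can be sanity-checked numerically. The following Python sketch (the sizes and the random rectangular instance are illustrative choices, not taken from [3]) verifies that the number of satisfied XOR clauses equals the sparsity of $(Q_1+Q_2) \bmod 2$, and that a rank-$r$ factorization $W = A\cdot B$ yields the factorization $\tilde A\cdot\tilde B$ of $Q_1+Q_2$:

```python
import numpy as np

rng = np.random.default_rng(0)
N1, M1 = 4, 6   # the instance has N1^2 variables and M1^2 clauses

# Rectangular clause structure: c_ij = x_{a1(i),a2(j)} XOR x_{b1(i),b2(j)}
a1, a2 = rng.integers(0, N1, M1), rng.integers(0, N1, M1)
b1, b2 = rng.integers(0, N1, M1), rng.integers(0, N1, M1)

W = rng.integers(0, 2, (N1, N1))   # a witness, viewed as an N1 x N1 matrix

# 0/1 selection matrices realizing Q1 = A1.W.A2 and Q2 = B1.W.B2
A1 = np.eye(N1, dtype=int)[a1]       # row i picks row a1(i) of W
A2 = np.eye(N1, dtype=int)[a2].T     # column j picks column a2(j) of W
B1 = np.eye(N1, dtype=int)[b1]
B2 = np.eye(N1, dtype=int)[b2].T
Q1, Q2 = A1 @ W @ A2, B1 @ W @ B2

# Satisfied clauses are exactly the nonzero entries of (Q1 + Q2) mod 2
satisfied = sum(int(W[a1[i], a2[j]] ^ W[b1[i], b2[j]])
                for i in range(M1) for j in range(M1))
assert satisfied == int(np.count_nonzero((Q1 + Q2) % 2))

# A rank-r factorization W = A.B gives Q1 + Q2 = [A1.A | B1.A] . [B.A2 ; B.B2]
r = 2
A, B = rng.integers(0, 2, (N1, r)), rng.integers(0, 2, (r, N1))
W2 = A @ B
At = np.hstack([A1 @ A, B1 @ A])     # M1 x 2r
Bt = np.vstack([B @ A2, B @ B2])     # 2r x M1
assert np.array_equal(At @ Bt, A1 @ W2 @ A2 + B1 @ W2 @ B2)
```

In particular, $\tilde A$ has only $2r$ columns, so $\mathrm{rank}(\tilde A\cdot\tilde B) \le 2r$, which is exactly what Algorithm 2 exploits.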
In [21], the authors attempt to construct rigid matrices using an approach based on algebraic geometry, which we discuss in this subsection. The rigid matrices demonstrated in [21] have the same shortcoming as those of [24, 25], in the sense that these matrices are not as explicit as we would want them to be, although their rigidity matches the upper bound in Lemma 2.1. However, the construction of rigid matrices based on ideas from elimination theory is quite insightful.
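A fact used repeatedly below is that $\mathrm{rank}(M) \le r$ exactly when every $(r+1)\times(r+1)$ minor of $M$ vanishes; this is what turns rank conditions into polynomial equations. A small numerical sketch (the sizes and the tolerance are arbitrary choices for illustration):

```python
import numpy as np
from itertools import combinations

def all_minors_vanish(M, r, tol=1e-9):
    """True iff every (r+1) x (r+1) minor of M is (numerically) zero,
    i.e. rank(M) <= r."""
    n = M.shape[0]
    if r >= n:
        return True
    return all(abs(np.linalg.det(M[np.ix_(rows, cols)])) < tol
               for rows in combinations(range(n), r + 1)
               for cols in combinations(range(n), r + 1))

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 5))  # rank 2
assert all_minors_vanish(M, 2)        # every 3x3 minor vanishes
assert not all_minors_vanish(M, 1)    # some 2x2 minor does not
assert np.linalg.matrix_rank(M) == 2
```

Each minor is a polynomial in the entries of $M$, so "rank at most $r$" cuts out an algebraic variety; this is the starting point of the construction below.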
Theorem 3.22.
Let $p_{11}, \ldots, p_{nn}$ be $n^2$ distinct primes, each greater than $2n^{2n^2}$, and let $\zeta_{ij}$ be the primitive root of unity of order $p_{ij}$ (i.e., $\zeta_{ij} = e^{2\pi i/p_{ij}}$). Let $A \in K^{n\times n}$ be the matrix given by $A[i,j] = \zeta_{ij}$, where $K = \mathbb{Q}(\zeta_{11}, \ldots, \zeta_{nn})$. Then $R^K_A(r) = (n-r)^2$.

Proof Sketch of Theorem 3.22. The proof involves the following observations as basic building blocks:
(1) The set of $n\times n$ matrices of rigidity at most $s$ for rank $r$ has dimension at most $n^2 - (n-r)^2 + s$ when viewed as an algebraic variety.
(2) By using (1), prove the existence of a non-zero polynomial $g$ of not-so-large degree in the elimination ideals associated with matrices of rigidity at most $s$.
(3) As the matrix $A$ has as entries primitive roots of unity of high order, $A$ cannot satisfy any polynomial $g$ with such a degree upper bound (i.e., $g(A) \ne 0$).

Notation 3.23.
• For any matrix $A \in \mathbb{F}^{n\times n}$, $\mathrm{Supp}(A)$ denotes the set of positions $(i,j)$ in $A$ where there are non-zero entries.
• A pattern $\pi$ denotes a subset of positions $\{(i,j) \mid i,j \in [n]\}$ in the matrix $A$. For any pattern $\pi$, let $S(\pi)$ be the set of $n\times n$ matrices $A$ over $\mathbb{F}$ that are supported only on positions in $\pi$ (i.e., $\mathrm{Supp}(A) \subseteq \pi$).
For a fixed pattern $\pi$, denote by $RIG(n, r, \pi, \mathbb{F})$ the set of matrices in $\mathbb{F}^{n\times n}$ whose rank can be reduced to $r$ by changing only the locations indexed by $\pi$. We will drop $\mathbb{F}$ when the field is clear from the context, for ease of notation. ♦

Step (1): Let $\pi$ be a fixed pattern of size $s$. By the definition of matrix rigidity, for every matrix $A \in RIG(n,r,\pi)$ there exists a matrix $C_\pi \in \mathbb{F}^{n\times n}$ with $\mathrm{Supp}(C_\pi) \subseteq \pi$ and $\mathrm{rank}(A + C_\pi) \le r$. The first observation to make is that both of these conditions, $\mathrm{Supp}(C_\pi) \subseteq \pi$ and $\mathrm{rank}(A + C_\pi) \le r$, can be expressed via polynomial equations (support can be expressed via simple linear equations, and rank being at most $r$ can be expressed by all $(r+1)\times(r+1)$ minors of $A + C_\pi$ being 0). That is, $RIG(n,r,\pi)$ is the solution set of a system of finitely many polynomial equations in variables $x_1, \ldots, x_{n^2}, t_1, \ldots, t_s$. Hence, $RIG(n,r,\pi)$ is an affine algebraic variety, and so is $RIG(n,r,\le s) = \bigcup_{\pi : |\pi| = s} RIG(n,r,\pi)$, the set of $n\times n$ matrices of rigidity at most $s$ for rank $r$. Thus, it makes sense to talk about the dimension of $RIG(n,r,\le s)$ as an affine algebraic variety. Now, we analyse upper and lower bounds on the dimension of $RIG(n,r,\le s)$.

Upper bound on $\dim(RIG(n,r,\le s))$: Clearly, $\dim(RIG(n,r,\le s)) \le n^2$. By the definition of rigidity, there is a natural map $\Phi$ from the product of the rank-$r$ $n\times n$ matrices and $S(\pi)$ to $RIG(n,r,\pi)$ (i.e., $\Phi((A, C_\pi)) = A + C_\pi$). As mentioned earlier, the set of rank-$r$ $n\times n$ matrices as well as $S(\pi)$ form affine algebraic varieties. Note that $\dim(S(\pi)) = s$ for any pattern $\pi$ of size $s$. Also, by an argument similar to Lemma 2.1, the dimension of the variety corresponding to rank-$r$ $n\times n$ matrices is $n^2 - (n-r)^2$. Putting this all together, $\dim(RIG(n,r,\pi)) \le n^2 - (n-r)^2 + s$, since $\Phi$ is surjective.

Lower bound on $\dim(RIG(n,r,\le s))$: First, let us try to understand the elimination ideals associated with matrices of low rigidity. For any pattern $\pi$ with $|\pi| = s$, let $T_\pi$ denote the $n\times n$ matrix with variables $y_1, \ldots, y_s$ as entries in the $s$ positions indexed by $\pi$. It is clear that for any $n\times n$ matrix $X$ with entries $x_1, \ldots, x_{n^2}$, the fact that $\mathrm{rank}(X + T_\pi) \le r$ is the same as saying that all $(r+1)\times(r+1)$ minors of the matrix $X + T_\pi$ vanish. Then, denoting by $I(n,r,\pi)$ the ideal generated by the $(r+1)\times(r+1)$ minors of the matrix $X + T_\pi$, we get that $I(n,r,\pi) \subseteq \mathbb{F}[x_1,\ldots,x_{n^2},y_1,\ldots,y_s]$. It is not difficult to observe that $RIG(n,r,\pi) = \psi(V(I(n,r,\pi)))$, where $\psi$ is the projection map that projects away the $s$ variables $y_1, \ldots, y_s$. Let us define the elimination ideal $EI(n,r,\pi) := I(n,r,\pi) \cap \mathbb{F}[x_1,\ldots,x_{n^2}]$. By the Closure Theorem of elimination theory, $\overline{\psi(V(I(n,r,\pi)))} = V(EI(n,r,\pi))$. Hence, $\dim(RIG(n,r,\pi)) = \dim(V(EI(n,r,\pi)))$ and $\dim(RIG(n,r,\le s)) = \max_{k \le s,\, \pi} \dim(V(EI(n,r,\pi)))$. The authors in [21] demonstrate a pattern $\pi$ of size $k \le s$ such that $\dim(V(EI(n,r,\pi))) \le n^2 - (n-r)^2 + s$, thus obtaining a lower bound on $\dim(RIG(n,r,\le s))$.

Step (2):
From Step (1), proving that a matrix $A$ has rigidity $(n-r)^2$ for rank $r$ is the same as showing that $A \notin RIG(n,r,\le (n-r)^2 - 1)$. This is in essence the same as proving that $A \notin RIG(n,r,\pi)$ for any pattern $\pi$ with $|\pi| = (n-r)^2 - 1$. Given that $\overline{RIG(n,r,\pi)} = V(EI(n,r,\pi))$, we want to show $A \notin V(EI(n,r,\pi))$ for any pattern $\pi$ with $|\pi| = (n-r)^2 - 1$. In other words, our goal is to prove the existence of a non-zero, fairly-low-degree polynomial $g \in EI(n,r,\pi)$ such that $g(A) \ne 0$. But what if $EI(n,r,\pi) = \langle 0\rangle$? To rule this out, observe that $\dim(V(EI(n,r,\pi))) < n^2$ for any pattern $\pi$ with $|\pi| = (n-r)^2 - 1$, hence $EI(n,r,\pi) \ne \langle 0\rangle$ by Hilbert's Nullstellensatz. In particular, the authors in [21] use the effective Nullstellensatz theorem of [6], which is as follows:

Theorem 3.24. Let $Z = \{z_1, \ldots, z_m\}$ and $I = \langle f_1, \ldots, f_p\rangle \subseteq \mathbb{F}[Z]$ be such that the maximum degree of any of the $f_i$'s is $d$. Let $Z'$ be a subset of $\ell$ of the $Z$ variables. If $I \cap \mathbb{F}[Z'] \ne \langle 0\rangle$, then there exists a non-zero polynomial $g \in I \cap \mathbb{F}[Z']$ such that $g = \sum_{i\in[p]} f_i g_i$, where $g_i \in \mathbb{F}[Z]$ and $\deg(g_i f_i) \le d^{\ell}(d^{\ell} + 1)$.

In our setting, $Z = \{x_1, \ldots, x_{n^2}, y_1, \ldots, y_s\}$, $Z' = \{x_1, \ldots, x_{n^2}\}$, $I = I(n,r,\pi)$ and $d \le r+1 \le n$. Then there exists a polynomial $g$ in $I(n,r,\pi) \cap \mathbb{F}[x_1,\ldots,x_{n^2}]$ of degree less than $2n^{2n^2}$, where $\pi$ is a pattern of size $(n-r)^2 - 1$. This shows that there is a polynomial $g \in EI(n,r,\pi)$ of degree $< 2n^{2n^2}$.

Step (3):
Let $A \in K^{n\times n}$ be the matrix in the statement of the theorem, $A[i,j] = \zeta_{ij}$ where $K = \mathbb{Q}(\zeta_{11}, \ldots, \zeta_{nn})$. It is not very difficult to show that $g(A) \ne 0$, as the entries of $A$ are primitive roots of unity whose prime orders exceed the degree of $g$, so they cannot satisfy any non-zero polynomial relation of such low degree. This completes the proof of Theorem 3.22.

Remark 3.25. Along the lines of analysing the degree of the polynomial $g$ in the above result, Kumar and Volk in [22] reduce the upper bound on the degree of such a polynomial to $\mathrm{poly}(n)$. That is, there is a polynomial $P$ on $n^2$ variables of degree at most $\mathrm{poly}(n)$ such that any matrix $M \in \mathbb{F}^{n\times n}$ with $R_M(n/100) \le n^2/100$ satisfies $P(M) = 0$. ♦

In this section, we survey some of the recent developments on mathematical techniques involved in proving the non-rigidity of some of the matrix families that were previously conjectured to be rigid. To begin with, let us focus on the upper bounds on the rigidity of the $2^n\times 2^n$ Walsh-Hadamard matrix proved by Alman and Williams in [2].
The Walsh-Hadamard matrix $H_n$ is a $2^n\times 2^n$ matrix whose rows and columns are indexed by vectors in $\{0,1\}^n$ (in lexicographic order). The entries of $H_n$ are given by $H_n[x,y] = (-1)^{\langle x,y\rangle}$ for any $x, y \in \{0,1\}^n$, where $\langle x,y\rangle$ denotes the inner product of the vectors $x$ and $y$. Alman and Williams in [2] prove the following upper bound on the rigidity of the Walsh-Hadamard matrix:

Theorem 4.1.
Let $\mathbb{F}$ be any field. For every $\varepsilon \in (0, 1/2)$, $R_{H_n}\big(2^{n(1-f(\varepsilon))}\big) \le 2^{n(1+\varepsilon)}$, where $f(\varepsilon) = \Theta(\varepsilon^2/\log(1/\varepsilon))$.

Proof Sketch of Theorem 4.1. The idea behind the proof is to approximate $H_n$ by the truth-table matrix of a sparse polynomial (say $M'$): sparsity gives a trivial upper bound on the rank of $M'$, and $H_n$ being close to $M'$ implies that the rigidity of $H_n$ is low. In order to provide more clarity, we delineate the approach to prove Theorem 4.1, which has two broad steps:

1. Approximate $H_n$ by the truth-table matrix of a sparse polynomial. That is, construct a sparse polynomial $p : \{0,1\}^{2n} \to \mathbb{R}$ such that the $2^n\times 2^n$ matrix $M_p$ given by $M_p[x,y] := p(x,y)$ (for all $x, y \in \{0,1\}^n$) agrees with $H_n$ on most entries. If $p$ has sparsity $2^{n-\Omega(\varepsilon^2 n)}$ then $\mathrm{rank}(M_p) \le 2^{n-\Omega(\varepsilon^2 n)}$, since each monomial of $p$ contributes a rank-one matrix. Although $M_p$ has low rank this way, $H_n$ does not agree with $M_p$ on all entries: the construction of $p$ only guarantees agreement on those entries where $\langle x,y\rangle \in [2\varepsilon n, (1/2+\varepsilon)n]$, for some $\varepsilon \in (0,1/2)$. Thus, $\mathrm{sparsity}(H_n - M_p)$ could be large (which we tackle in Step 2).

2. Construct a matrix $M'$ from $M_p$ that agrees with $H_n$ on far more entries than $M_p$ does, but has rank comparable to that of $M_p$. Obtain $M'$ from $M_p$ such that $M'[x,y] = H_n[x,y]$ whenever:
(i) $M'[x,y] = M_p[x,y]$ (i.e., $\langle x,y\rangle \in [2\varepsilon n, (1/2+\varepsilon)n]$); or
(ii) one of $x$ or $y$ has an unbalanced fraction of 1's (i.e., when $|x| \notin [(1/2-\varepsilon)n, (1/2+\varepsilon)n]$ or $|y| \notin [(1/2-\varepsilon)n, (1/2+\varepsilon)n]$).

Now, $M'$ disagrees with $H_n$ only on entries indexed by elements in
$$D = \big\{(x,y) \in \{0,1\}^n\times\{0,1\}^n \;\big|\; |x| \in [(1/2-\varepsilon)n, (1/2+\varepsilon)n],\; |y| \in [(1/2-\varepsilon)n, (1/2+\varepsilon)n],\; \langle x,y\rangle \notin [2\varepsilon n, (1/2+\varepsilon)n]\big\}.$$

To complete the proof of Theorem 4.1, we estimate the size of $D$. Observe that for any $(x,y) \in \{0,1\}^n\times\{0,1\}^n$ such that $|x|, |y| \in [(1/2-\varepsilon)n, (1/2+\varepsilon)n]$, the inner product $\langle x,y\rangle$ has value at most $(1/2+\varepsilon)n$. In order to estimate $|D|$, it suffices to count, for a given $x \in \{0,1\}^n$ with $|x| \in [(1/2-\varepsilon)n, (1/2+\varepsilon)n]$, the number of vectors $y \in \{0,1\}^n$ with $|y| \in [(1/2-\varepsilon)n, (1/2+\varepsilon)n]$ and $\langle x,y\rangle < 2\varepsilon n$, which is given by
$$\sum_{i=(1/2-\varepsilon)n}^{(1/2+\varepsilon)n} \;\sum_{j=0}^{2\varepsilon n} \binom{|x|}{j}\binom{n-|x|}{i-j}.$$
Using the fact that $\varepsilon \in (0,1/2)$ and $(1/2-\varepsilon)n \le |x|, |y| \le (1/2+\varepsilon)n$, we get $|D| \le 2^{n(1+O(\varepsilon\log(1/\varepsilon)))}$. □
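The object of Theorem 4.1 is easy to generate and inspect. The sketch below (the dimension $n$ is an arbitrary small choice) builds $H_n$ from the inner-product definition, checks it against the equivalent Kronecker-power construction, and confirms that $H_n$ has full rank, so any rank reduction must be paid for with changed entries:

```python
import numpy as np
from functools import reduce
from itertools import product

n = 4
rows = list(product((0, 1), repeat=n))   # {0,1}^n in lexicographic order

# H_n[x, y] = (-1)^{<x, y>}
H = np.array([[(-1) ** sum(xi * yi for xi, yi in zip(x, y)) for y in rows]
              for x in rows])

# Equivalently, H_n is the n-fold Kronecker power of [[1, 1], [1, -1]]
H2 = np.array([[1, 1], [1, -1]])
K = reduce(np.kron, [H2] * n)

assert np.array_equal(H, K)
assert H.shape == (2 ** n, 2 ** n)
assert np.linalg.matrix_rank(H) == 2 ** n   # full rank 2^n
```

Non-rigidity in the sense of Theorem 4.1 says that this full rank is fragile: a relatively sparse set of entry changes suffices to collapse the rank well below $2^n$.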
In the rest of this section we discuss in detail the steps outlined above. To discuss Step 1, consider the following lemma, which uses multivariate polynomial interpolation over the integers to construct a sparse polynomial agreeing with the matrix $H_n$ on many entries.

Lemma 4.2. Let $\mathbb{F}$ be a field and $\varepsilon \in (0, 1/2)$. There exists a $2n$-variate multilinear polynomial $p(\bar x, \bar y)$ of sparsity at most $2^{n-\Omega(\varepsilon^2 n)}$ such that for any $x, y \in \{0,1\}^n$ with $\langle x,y\rangle \in [2\varepsilon n, (1/2+\varepsilon)n]$, $p(\bar x, \bar y) = H_n[\bar x, \bar y] = (-1)^{\langle \bar x, \bar y\rangle}$.

Proof.
First, we construct an n-variate polynomial q(z_1, . . . , z_n) with integer coefficients such that for any ā ∈ {0,1}^n with |ā| ∈ [2εn, (1/2 + ε)n], q(ā) = (−1)^{|ā|}.

Claim 4.3. Given integers c_1, c_2, . . . , c_r, there exists an n-variate polynomial q(z_1, . . . , z_n) of degree r−1 that agrees with c_1, . . . , c_r on boolean inputs of Hamming weight k+1, . . . , k+r, for any k ≤ n−r. That is, q(ā) = c_i for any ā ∈ {0,1}^n such that |ā| = k+i.

Proof of Claim. Let us consider the most natural construction of such an n-variate polynomial q: {0,1}^n → Z of degree r−1. Then,

q(z_1, . . . , z_n) = Σ_{i=0}^{r−1} b_i · Σ_{α ∈ {0,1}^n, |α| = i} Π_{j=1}^{n} z_j^{α_j},    (4.4)

where b_0, . . . , b_{r−1} are in Z. Observe that for any ā ∈ {0,1}^n, q(ā) = Σ_{i=0}^{r−1} b_i · (|ā| choose i). Whenever |ā| = k+i, we want q(ā) = c_i. That is, for every i ∈ [r], we need

b_0 · (k+i choose 0) + b_1 · (k+i choose 1) + ··· + b_{r−1} · (k+i choose r−1) = c_i.

This gives us the following matrix equation:

[ (k+1 choose 0)  (k+1 choose 1)  ···  (k+1 choose r−1) ]   [ b_0     ]   [ c_1 ]
[ (k+2 choose 0)  (k+2 choose 1)  ···  (k+2 choose r−1) ] · [ b_1     ] = [ c_2 ]
[      ⋮                ⋮          ⋱         ⋮          ]   [   ⋮     ]   [  ⋮  ]
[ (k+r choose 0)  (k+r choose 1)  ···  (k+r choose r−1) ]   [ b_{r−1} ]   [ c_r ]

It is not difficult to note that the r × r matrix of binomial coefficients in the above equation is invertible. Hence there exists a vector b = (b_0, b_1, . . . , b_{r−1}) satisfying the above matrix equation, which in turn completes the proof of Claim 4.3. □

Now, we proceed with the proof of Lemma 4.2. Observe that by setting r = (1/2 − ε)n + 1, k = 2εn − 1 and c_i = (−1)^{k+i} (so that k + r = (1/2 + ε)n) in Claim 4.3, we get a polynomial q(z_1, . . . , z_n) over Z of degree (1/2 − ε)n satisfying the required properties. (By taking the coefficients of q modulo an appropriate m, we can get a polynomial q over a field F satisfying the required properties.) In the remaining part of the proof, we show how to use the above claim to obtain the 2n-variate polynomial p(x_1, . . . , x_n, y_1, . . . , y_n) as required. A natural candidate is p(x_1, . . . , x_n, y_1, . . . , y_n) ≜ q(x_1y_1, . . . , x_ny_n). For any x, y ∈ {
0, 1 } n , q ( ¯ a ) = r − ∑ i = b i ( | ¯ a | i ) . Whenever | ¯ a | = k + i , we want q ( ¯ a ) = c i . That is, for every i ∈ [ r − ] , we need b (cid:18) k + i (cid:19) + b (cid:18) k + i (cid:19) + · · · + b r − (cid:18) k + ir − (cid:19) = c i This gives us the following matrix equation: ( k + ) ( k + ) · · · ( k + r − )( k + ) ( k + ) · · · ( k + r − ) ... ... ... ... ( k + r ) ( k + r ) · · · ( k + rr − ) · b b ... b r − = c c ... c r It is not difficult to note that the r × r matrix in the above equation with binomial coeffi-cients as entries is invertible. Hence there exists a vector b = ( b b · · · b r − ) satisfying theabove matrix equation which in turn completes the proof of Claim 4.3. (cid:3) Now, we proceed with the proof of Lemma 4.2. Observe that by setting r = ( − ε ) n + k = ε n − c i = ( − ) k + i ( k + r = ( + ε ) n ) in Claim 4.3, we get thepolynomial q ( z , . . . , z n ) over Z of degree ( − ε ) n satisfying the required properties.26By taking coefficients of q modulo an appropriate m , we can get a polynomial q over afield F satisfying the required properties.) In the remaining part of the proof, we showhow to use the above lemma to obtain the 2 n -variate polynomial p ( x , . . . , x n , y , . . . , y n ) as required. A semi-natural candidate for p ( x , . . . , x n , y , . . . , y n ) (cid:44) q ( x y , . . . , x n y n ) . Forany x , y ∈ {
0, 1}^n such that ⟨x, y⟩ ∈ [2εn, (1/2 + ε)n], we immediately have p(x̄, ȳ) = q(z̄) where |z̄| = ⟨x̄, ȳ⟩. Hence, by the construction of q in Claim 4.3, p(x̄, ȳ) = (−1)^{⟨x, y⟩}. Since we are only interested in x, y in {0,1}^n, we can make p multilinear by setting x_i² = x_i and y_i² = y_i for all i ∈ [n]. In p, the variables x_i and y_i are tied together whenever they occur. Thus, we can view p(x_1, . . . , x_n, y_1, . . . , y_n) as a multilinear polynomial of degree (1/2 − ε)n in the n variables w_1, . . . , w_n, where w_i = x_iy_i, and sparsity(p) ≤ Σ_{i=0}^{(1/2−ε)n} (n choose i) ≤ 2^{n−Ω(ε²n)}.

Now, we move on to Step 2 of the proof of Theorem 4.1 outlined above. To obtain M′ from M_p, we correct those rows of M_p indexed by {x ∈ {0,1}^n : |x| ∉ [(1/2 − ε)n, (1/2 + ε)n]} and those columns of M_p indexed by {y ∈ {0,1}^n : |y| ∉ [(1/2 − ε)n, (1/2 + ε)n]}. Since we want rank(M′) to be comparable to rank(M_p), the idea is to construct a matrix M′ such that M′ ≜ M_p − (M_1 + ··· + M_t), where rank(M_i) ≤ 1 for i ∈ [t] and t ≤ 2^{n−Ω(ε²n)}. For every row indexed by x ∈ {
0, 1}^n with |x| ∉ [(1/2 − ε)n, (1/2 + ε)n], let the matrix M_r ∈ F^{2^n × 2^n} be given by M_r[x, y] = M_p[x, y] − H_n[x, y] for all y ∈ {0,1}^n, with every other row of M_r zero. Similarly, for every column indexed by y ∈ {0,1}^n with |y| ∉ [(1/2 − ε)n, (1/2 + ε)n], let the matrix M_c ∈ F^{2^n × 2^n} be given by M_c[x, y] = M_p[x, y] − H_n[x, y] for all x ∈ {0,1}^n, with every other column of M_c zero. Observe that each such matrix M_r (resp., M_c) has a single non-zero row (resp., column) and hence rank at most 1. Note that t = 2 · |{v ∈ {
0, 1}^n : |v| ∉ [(1/2 − ε)n, (1/2 + ε)n]}|
= 2 · [ Σ_{i=0}^{(1/2−ε)n} (n choose i) + Σ_{i=(1/2+ε)n}^{n} (n choose i) ]
= 4 · Σ_{i=0}^{(1/2−ε)n} (n choose i)
≤ 4 · 2^{n−Ω(ε²n)}.

Therefore, rank(M′) ≤ rank(M_p) + 4 · 2^{n−Ω(ε²n)} ≤ 2^{n(1−f(ε))}, as rank(M_p) ≤ sparsity(p) ≤ 2^{n−Ω(ε²n)} from Step 1. Further, on every row, M′ differs from H_n on at most 2^{O(ε log(1/ε))n} entries, so in total on at most |D| ≤ 2^{n(1+O(ε log(1/ε)))} entries. Hence, after rescaling ε, R_{H_n}(2^{n(1−f(ε))}) ≤ 2^{n(1+ε)}. □

Now, we review a recent result of Dvir and Edelman [9] on the non-rigidity of certain matrices (based on functions over finite fields) using the
Croot-Lev-Pach Lemma. Let F_q be any finite field. For any function f: F_q^n → F_q, let M_f be the q^n × q^n matrix given by M_f[I, J] = f(I + J) for any I, J ∈ F_q^n. (The term function matrix used here is non-standard terminology, used to indicate that there is a specific function associated with these matrices.) In the following subsection, we discuss a result from [9] proving an upper bound on the rigidity of the function matrix M_f.

Theorem 4.5. Let f: F_q^n → F_q be any function. For any ε > 0 and n sufficiently large, there exists an ε′ > 0 such that R_{M_f}(q^{n(1−ε′)}) ≤ q^{n(1+ε)}.

The above theorem says that for any function f: F_q^n → F_q and any ε > 0, the matrix M_f has rigidity at most q^{n(1+ε)} for rank q^{n(1−ε′)}, where the rank is over F_q.

Proof Sketch of Theorem 4.5. The proof is elegant and involves the following two steps:

1. Approximate f: F_q^n → F_q by a polynomial p: F_q^n → F_q of low degree (d = (1−δ)n(q−1) for some δ > 0). By approximating the function f by the polynomial p, we mean |{x ∈ F_q^n : p(x) ≠ f(x)}| ≤ q^{nε}.

2. Show that for any polynomial p: F_q^n → F_q of sufficiently low degree (d being (1−δ)n(q−1)), rank(M_p) ≤ q^{n(1−ε′)} for some ε′ > 0 depending on δ and ε.

From Steps 1 and 2, we can infer that M_f = S + L, where S = M_f − M_p and L = M_p. From Step 1, the function f and the polynomial p differ on at most q^{nε} inputs, implying that S has at most q^{nε} non-zero entries in every row and column. Hence, sparsity(S) ≤ q^{n(1+ε)}. From Step 2, rank(L) ≤ q^{n(1−ε′)} for some ε′ > 0. Thus, R_{M_f}(q^{n(1−ε′)}) ≤ q^{n(1+ε)}. □

We now delve into the details of Steps 1 and 2. The set of all functions {f | f: F_q^n → F_q}, denoted F(q, n), is a vector space of dimension q^n with basis {x_1^{a_1} x_2^{a_2} ··· x_n^{a_n} : 0 ≤ a_i ≤ q−1}. Let F_d(q, n) be the set of polynomials of degree at most d in F_q[x_1, . . . , x_n]. F_d(q, n) is a subspace of F(q, n) with basis {x_1^{a_1} ··· x_n^{a_n} : 0 ≤ a_i ≤ q−1, Σ_i a_i ≤ d}. Any function f: F_q^n → F_q can be viewed as a vector v in F(q, n).

To begin with, we show that for any d, by changing the vector v in F(q, n) (corresponding to the function f) on at most dim(F(q, n)) − dim(F_d(q, n)) many coordinates, we can obtain a vector u in F_d(q, n) (corresponding to a polynomial p of degree at most d). To complete the proof of Step 1, we then obtain a lower bound of q^n − q^{nε} on dim(F_d(q, n)) when d = (1−δ)n(q−1).

Let m ≜ dim(F(q, n)) (i.e., m = q^n) and r ≜ dim(F_d(q, n)). As F_d(q, n) is a subspace of F(q, n), there is an m × r matrix M of rank r such that F_d(q, n) is the image of the linear transformation defined by M. The aim here is to construct, for any vector v ∈ F(q, n), a vector u in F_d(q, n) that differs from v on at most m − r coordinates. In other words, for any vector v in F(q, n), we want to construct a vector u agreeing with v on r coordinates such that u is in the image of the transformation defined by M (i.e., u = My for some y ∈ F_q^r). As rank(M) = r, there exist rows R_{i_1}, . . . , R_{i_r} that span the row-space of M. A natural attempt is to construct a partial vector ū by setting u_{i_j} ≜ v_{i_j} for every j ∈ [r]. To complete ū, observe that the matrix A with rows R_{i_1}, . . .
, R_{i_r} has full rank, implying that there is a unique x satisfying Ax = ū. As the rows of A span the rows of M, the remaining coordinates of u can be fixed using the matrix M and the vector x (namely, u ≜ Mx). This implies that the vector u is in F_d(q, n) and agrees with v on the r coordinates i_1, . . . , i_r.

Now, for d = (1−δ)n(q−1), we want a lower bound of q^n − q^{nε} on dim(F_d(q, n)). For this, we bound the number of basis monomials excluded from F_d(q, n), i.e., the size of the set {x_1^{a_1} ··· x_n^{a_n} : 0 ≤ a_i ≤ q−1, Σ_i a_i > d}. Consider the map ϕ: x_1^{a_1} ··· x_n^{a_n} ↦ x_1^{(q−1)−a_1} ··· x_n^{(q−1)−a_n}. Clearly, ϕ is a bijection on monomials, and for any monomial w with deg(w) > d, deg(ϕ(w)) ≤ n(q−1) − d ≤ δn(q−1), since d = (1−δ)n(q−1). Hence, estimating q^n − dim(F_d(q, n)) is the same as estimating |{x_1^{a_1} ··· x_n^{a_n} : 0 ≤ a_i ≤ q−1, Σ_i a_i ≤ δn(q−1)}|. Now, by writing each x_i^{a_i} as a product x_{i,1} x_{i,2} ··· x_{i,a_i} of fresh variables, it suffices to count the number of multilinear monomials of degree at most δn(q−1) in n(q−1) variables. Therefore, q^n − dim(F_d(q, n)) ≤ Σ_{i=0}^{δn(q−1)} (n(q−1) choose i) ≤ 2^{n(q−1)·H(δ)}, where H is the binary entropy function. By choosing δ small enough (as a function of q and ε), we get q^n − dim(F_d(q, n)) ≤ q^{nε}, i.e., dim(F_d(q, n)) ≥ q^n − q^{nε}.

Now, we move on to Step 2. We use the Croot-Lev-Pach Lemma to obtain an upper bound on the rank of the matrix M_p when the degree of p is small enough.

Remark 4.6 (The Cap Set Problem). Consider the space Z_3^n. The cap set problem is to understand the maximum size of a cap set, a subset A of Z_3^n that does not contain pairwise distinct elements a, b and c that lie on a line (i.e., a + b = 2c). That is, we want to find the size of the largest set A in Z_3^n that does not contain an arithmetic progression of the form {x, x + r, x + 2r} for some r ≠ 0. A trivial upper bound on |A| is 3^n. By using the polynomial method introduced by Croot, Lev and Pach in [4], it was shown that any cap set A has size at most c^n for some constant c < 3.
For more on this problem, see the blog posts [32, 17] and references therein. ♦

We now state and prove the Croot-Lev-Pach Lemma, completing the proof of Step 2.
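Before the formal statement, the rank bound can be sanity-checked on a toy instance. In the sketch below, q = 3, n = 2 and d = 2 are illustrative choices and rank_mod is a small helper written for the demo; the point is just that the q^n × q^n matrix of a degree-d polynomial has rank at most 2 · dim(F_{d/2}(q, n)), while a generic function matrix has full rank:

```python
from itertools import product

def rank_mod(M, p):
    """Rank of an integer matrix over the prime field F_p (Gauss-Jordan)."""
    M = [[x % p for x in row] for row in M]
    rank = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(rank, len(M)) if M[i][c]), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][c], p - 2, p)
        M[rank] = [x * inv % p for x in M[rank]]
        for i in range(len(M)):
            if i != rank and M[i][c]:
                M[i] = [(a - M[i][c] * b) % p for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank

q, n, d = 3, 2, 2
pts = list(product(range(q), repeat=n))          # F_q^n, q**n points

def p_val(v):                                    # a fixed polynomial of degree d = 2
    x1, x2 = v
    return (1 + x1 + 2 * x2 + x1 * x2 + x1 * x1) % q

Mp = [[p_val(tuple((a + b) % q for a, b in zip(I, J))) for J in pts] for I in pts]

dim_half = sum(1 for e in product(range(q), repeat=n) if sum(e) <= d // 2)  # = 3
assert rank_mod(Mp, q) <= 2 * dim_half           # CLP bound: rank <= 6 out of 9

# Contrast: the matrix of a point indicator (not low degree) has full rank q**n.
delta = [[1 if all((a + b) % q == 0 for a, b in zip(I, J)) else 0 for J in pts]
         for I in pts]
assert rank_mod(delta, q) == q ** n
print("rank(M_p) =", rank_mod(Mp, q), "<= 2*dim(F_{d/2}) =", 2 * dim_half)
```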
Lemma 4.7. Let p be a polynomial in F_d(q, n) and let M_p be the q^n × q^n matrix given by M_p[I, J] = p(I + J) for all I, J ∈ F_q^n. Then, rank(M_p) ≤ 2 · dim(F_{d/2}(q, n)).

Proof. Let p be a polynomial in F_d(q, n), so deg(p) ≤ d. We show that for any x, y ∈ F_q^n, p(x + y) = Σ_{i=1}^{R} f_i(x) · g_i(y), where R ≤ 2 · dim(F_{d/2}(q, n)) and the functions f_1, . . . , f_R and g_1, . . . , g_R are fixed (they do not depend on the points x, y ∈ F_q^n). This immediately implies that M_p = Σ_{i=1}^{R} M_i, where each M_i is the outer product of two vectors in F_q^t (t = q^n) and rank(M_i) ≤ 1. Therefore, rank(M_p) ≤ R ≤ 2 · dim(F_{d/2}(q, n)).

As the polynomial p is in F_d(q, n), there exist coefficients α_{I,J} ∈ F_q (depending on p) such that for any x, y ∈ F_q^n,

p(x + y) = Σ_{I, J ⊆ [n], |I|+|J| ≤ d} α_{I,J} · x_I · y_J,

where for any I, J ⊆ [n], x_I = Π_{i∈I} x_i and y_J = Π_{j∈J} y_j. For every I, J ⊆ [n] with |I| + |J| ≤ d, we have either |I| ≤ d/2 or |J| ≤ d/2. Then,

p(x + y) = Σ_{I ⊆ [n], |I| ≤ d/2} x_I · Σ_{J ⊆ [n], |J| ≤ d−|I|} α_{I,J} y_J + Σ_{J ⊆ [n], |J| ≤ d/2} y_J · Σ_{I ⊆ [n], d/2 < |I| ≤ d−|J|} α_{I,J} x_I.    (4.8)

Let m be the number of subsets of {1, . . . , n} of size at most d/2; note that m = Σ_{i=0}^{d/2} (n choose i) = dim(F_{d/2}(q, n)). Let {S_1, . . . , S_m} be the subsets of {1, . . . , n} of size at most d/2. We now define vectors f̄ and ḡ of 2m functions as follows:

• for i ∈ [m], f_i(x) = x_{S_i} and g_i(y) = Σ_{J ⊆ [n], |J| ≤ d−|S_i|} α_{S_i,J} y_J;
• for i ∈ [m], f_{i+m}(x) = Σ_{I ⊆ [n], d/2 < |I| ≤ d−|S_i|} α_{I,S_i} x_I and g_{i+m}(y) = y_{S_i}.

Clearly, p(x + y) = ⟨f̄, ḡ⟩. Hence, p(x + y) = Σ_{i=1}^{R} f_i(x) · g_i(y) with R ≤ 2m = 2 · dim(F_{d/2}(q, n)), and rank(M_p) ≤ 2 · dim(F_{d/2}(q, n)). □

Now, we need to estimate dim(F_{d/2}(q, n)). For this, we need an upper bound on the number of monomials in F(q, n) of degree at most d/2, which is q^n · Pr_w[deg(w) ≤ d/2] for a uniformly random monomial w; for d = (1−δ)n(q−1), a Chernoff bound gives Pr_w[deg(w) ≤ d/2] ≤ exp(−Ω(δ²n)). Therefore, dim(F_{d/2}(q, n)) ≤ q^n · q^{−Ω(δ²n/log q)} ≤ q^{n(1−ε′)} for some ε′ > 0. This completes the proof sketch of Theorem 4.5. □

Recently, Dvir and Liu [11] obtained rigidity upper bounds for generalized Hadamard matrices, which were conjectured to be rigid. In the following subsection, we survey this upper bound from [11]. For the whole of this subsection we will deal with a weaker notion of rigidity. A matrix A ∈ F^{n×n} has weak rigidity at most s for rank r if the rank of A can be reduced to r by changing at most s entries in every row and every column of A. The weak rigidity of a matrix A for rank r is denoted by WR_A(r). (In [11], the authors use the term regular rigidity; for ease, we use the term weak rigidity.) Observe that this is a weaker notion than the matrix rigidity we have seen so far, as R_A(r) ≤ n·s whenever WR_A(r) ≤ s. The generalized Hadamard matrix H_{d,n} is the d^n × d^n matrix given by H_{d,n}[I, J] = ω^{I·J} for I, J ∈ Z_d^n, where ω = e^{2πi/d} is a primitive d-th root of unity. One of the results of [11] is that generalized Hadamard matrices are not weakly rigid over C; note that this is stronger than just saying that generalized Hadamard matrices are not rigid over C.

Theorem 4.9. Let d, n be positive integers. For any ε ∈ (
0, 0.1) and n ≥ d(log d)/ε², there exists an ε′ = Ω(ε²/(d log d)) such that WR_{H_{d,n}}(d^{n(1−ε′)}) ≤ d^{nε}.

In order to proceed with the proof of Theorem 4.9, we need to introduce a few notations and make some preliminary observations. For any I = (i_1, i_2, . . . , i_n) ∈ Z_d^n, we denote by x^I the monomial x_1^{i_1} x_2^{i_2} ··· x_n^{i_n}. Let f: Z_d^n → C be any function. We can associate with the function f:

(i) a polynomial P_f ∈ C[x_1, . . . , x_n] given by P_f ≜ Σ_{I ∈ Z_d^n} f(I) · x^I; and
(ii) a d^n × d^n matrix M_f given by M_f[I, J] ≜ f(I + J) for I, J ∈ Z_d^n.

It is reasonable to expect interesting connections between the matrix M_f and the polynomial P_f, which we pen down in the following observation:

Observation 3. Let f: Z_d^n → C be any function. If the polynomial P_f has r roots in the set {(ω^{i_1}, . . . , ω^{i_n}) : (i_1, . . . , i_n) ∈ Z_d^n}, then rank(M_f) = d^n − r.

The proof of the above observation is based on the simple fact that H_{d,n} · M_f · H_{d,n} is a d^n × d^n diagonal matrix whose [I, I]-th diagonal entry is d^n · P_f(ω^I), where ω^I denotes the tuple (ω^{i_1}, ω^{i_2}, . . . , ω^{i_n}) for any I = (i_1, i_2, . . . , i_n) ∈ Z_d^n. Now, with Observation 3 in hand, we sketch the proof of Theorem 4.9.
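Observation 3 and the diagonalization fact behind it can be checked numerically. In the minimal numpy sketch below (illustrative sizes d = 3, n = 2), the function f is manufactured from a prescribed evaluation vector t = (P_f(ω^I))_I with several zero entries, so that the rank drop is visible; this construction is an assumption of the demo, not part of [11]:

```python
import numpy as np
from itertools import product

d, n = 3, 2
pts = list(product(range(d), repeat=n))           # Z_d^n
w = np.exp(2j * np.pi / d)                        # primitive d-th root of unity
H = np.array([[w ** sum(a * b for a, b in zip(I, J)) for J in pts] for I in pts])

# Prescribe the evaluations t_I = P_f(w^I) (five of them zero) and recover f
# from them, using (H f)_I = P_f(w^I) and H^{-1} = d^{-n} * conj(H).
t = np.array([0, 0, 0, 0, 1, 2, 0, 1, 1], dtype=complex)
f = np.conj(H) @ t / d ** n

M = np.array([[f[pts.index(tuple((a + b) % d for a, b in zip(I, J)))]
               for J in pts] for I in pts])       # M_f[I, J] = f(I + J)

D = H @ M @ H
assert np.allclose(D, np.diag(np.diag(D)))        # H * M_f * H is diagonal
assert np.allclose(np.diag(D), d ** n * t)        # entries are d^n * P_f(w^I)
rank = np.linalg.matrix_rank(M)
assert rank == d ** n - 5                         # rank(M_f) = d^n - #roots
print("rank(M_f) =", rank)
```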
Proof Sketch of Theorem 4.9.
The proof proceeds in two steps:

1. Rescale the rows and columns of H_{d,n} to obtain a matrix H′_{d,n} such that there exists a symmetric function f: Z_d^n → C with M_f = H′_{d,n}. The rows and columns of H_{d,n} are uniformly rescaled in such a way that WR_{H_{d,n}}(r) = WR_{M_f}(r) for any r.

2. For any ε ∈ (0, 0.1) and any symmetric function f: Z_d^n → C, by changing f on at most d^{nε} many values, obtain a symmetric function f′: Z_d^n → C such that rank(M_{f′}) ≤ d^{n(1−ε′)}, where ε′ = Ω(ε²/(d log d)). The upper bound on rank(M_{f′}) follows from Observation 3, as the polynomial P_{f′} has many roots in {(ω^{i_1}, . . . , ω^{i_n}) : (i_1, . . . , i_n) ∈ Z_d^n}.

Proof of Step 1: Let H_{d,n} be the d^n × d^n generalized Hadamard matrix. Let µ ∈ C be such that µ² = ω. For every I, J ∈ Z_d^n, the d^n × d^n matrix H′_{d,n} is obtained from H_{d,n} by multiplying every element of the I-th row by µ^{I·I} and every element of the J-th column by µ^{J·J}. Now, we define f: Z_d^n → C as f(I) = µ^{i_1² + i_2² + ··· + i_n²} for any I = (i_1, i_2, . . . , i_n) ∈ Z_d^n. Observe that f is a symmetric function, and for any I, J ∈ Z_d^n,

M_f[I, J] = f(I + J) = µ^{(i_1+j_1)² + (i_2+j_2)² + ··· + (i_n+j_n)²} = µ^{I·I} · ω^{I·J} · µ^{J·J} = H′_{d,n}[I, J].

(The notation ω^{[I]} would be more appropriate, as (ω^{i_1}, ω^{i_2}, . . . , ω^{i_n}) is a tuple; however, we write ω^I for ease of notation.) Note that the function f is well-defined and WR_{H_{d,n}}(r) = WR_{M_f}(r) for any r. Now, in the following step, we will modify the function f so that the resulting polynomial satisfies the hypothesis of Observation 3.

Proof of Step 2:
Given any symmetric function f: Z_d^n → C, by changing f on a "small" set T of values in Z_d^n, we want to construct a symmetric function f′: Z_d^n → C such that the polynomial P_{f′} vanishes on a "large" set of points {ω^I : I ∈ S} with S ⊆ Z_d^n. The sets S and T are defined as follows:

1. Let m = n(1−ε)/d, and let S be the set of tuples (i_1, i_2, . . . , i_n) ∈ Z_d^n such that i_1 = ··· = i_m = 0, i_{m+1} = ··· = i_{2m} = 1, . . . , i_{(d−1)m+1} = ··· = i_{dm} = d−1 (the last n − dm = εn coordinates are unconstrained).

2. Let T ⊆ Z_d^n be the set of tuples with at least n(1−ε) many zeros.

Having defined the sets S and T, the following lemma (which we will prove later) ensures that we can use Observation 3 to complete the proof of Theorem 4.9.

Lemma 4.10. Let f: Z_d^n → C be any symmetric function. By changing f only on values in T, we can construct a symmetric function f′: Z_d^n → C such that P_{f′}(ω^I) = 0 for every I ∈ S.

Now, assuming Lemma 4.10, let us complete the proof of Theorem 4.9. Note that M_f = (M_f − M_{f′}) + M_{f′}, and we bound rank(M_{f′}) and sparsity(M_f − M_{f′}). By Lemma 4.10, P_{f′} vanishes on the set {ω^I : I ∈ S}. Hence P_{f′} has at least |S| = d^{n−dm} = d^{nε} many roots in {(ω^{i_1}, . . . , ω^{i_n}) : (i_1, . . . , i_n) ∈ Z_d^n}, as m = n(1−ε)/d. However, as f′ is a symmetric function, the polynomial P_{f′} vanishes not only on {ω^I : I ∈ S} but also on ω^J for every tuple J ∈ Z_d^n obtained by permuting the entries of some I ∈ S. That is, P_{f′}(ω^J) = 0 for all J ∈ perm(S) = ∪_{I∈S} perm(I), where perm(I) denotes the set of distinct tuples obtained by permuting the entries of I = (i_1, . . . , i_n).

Thus, by Observation 3, rank(M_{f′}) is at most the number of tuples in Z_d^n that are not in perm(S), and estimating rank(M_{f′}) amounts to estimating the size of Z_d^n \ perm(S). A tuple I ∈ Z_d^n is in perm(S) if and only if every a ∈ {
0, 1, . . . , d−1} appears at least m times. Then, rank(M_{f′}) is bounded by the number of tuples in Z_d^n in which some a ∈ {0, 1, . . . , d−1} appears fewer than m times. Let τ be a uniformly random tuple in Z_d^n, let i ∈ {0, . . . , d−1}, and let X_i be the random variable denoting the number of times i appears in τ. Then Pr[X_i < (1−ε)n/d] ≤ e^{−ε²n/(2d)}, and so Pr[τ ∉ perm(S)] ≤ d·e^{−ε²n/(2d)}. Hence the size of Z_d^n \ perm(S) is at most d^n · d · e^{−ε²n/(2d)}. Thus, when n > d(log d)/ε², the size of Z_d^n \ perm(S) is at most d^{n(1−ε′)} for ε′ = Ω(ε²/(d log d)). This immediately implies that rank(M_{f′}) ≤ d^{n(1−ε′)}.

To upper bound sparsity(M_f − M_{f′}), it is enough to estimate |T|, the number of tuples in Z_d^n with at least n(1−ε) many zeros. Let τ be a uniformly random tuple in Z_d^n and let X be the random variable denoting the number of zeros in τ. Then Pr[τ ∈ T] = Pr[X ≥ n(1−ε)] ≤ e^{−n·D((1−ε) ∥ 1/d)} ≤ d^{−n(1−ε)} when ε ∈ (0, 0.1), where D(· ∥ ·) is the Kullback-Leibler divergence. Hence |T| = d^n · Pr[τ ∈ T] ≤ d^{nε}. Since M_f and M_{f′} differ only on entries (I, J) with I + J ∈ T, the matrix M_f − M_{f′} has at most |T| ≤ d^{nε} non-zero entries in every row and column. Thus, by changing at most d^{nε} entries of M_f in every row and column, the rank of M_f drops to d^{n(1−ε′)}, implying that WR_{H_{d,n}}(d^{n(1−ε′)}) ≤ d^{nε}. □

We now turn to the proof of Lemma 4.10. Let f: Z_d^n → C be any symmetric function and let T ⊆ Z_d^n be the set of tuples with at least n(1−ε) many zeros. As we want to change f only on tuples in T, we set f′(J) = f(J) for all J ∉ T; and as we want f′: Z_d^n → C to be symmetric, f′ must take the same value on any two tuples that are permutations of each other. Since we do not know what values to assign to f′ on tuples in T, the most natural approach is to treat these values as unknowns, impose P_{f′}(ω^I) = 0 for I ∈ S as constraints, and show that the resulting system has a solution:

P_{f′}(ω^I) = Σ_{J∈T} f′(J) ω^{I·J} + Σ_{J′∉T} f(J′) ω^{I·J′} = 0 for all I ∈ S.

Let us consider the equivalence classes obtained by permuting the entries of tuples in S and in T, and denote by rep(S) = {I_1, . . . , I_ℓ} and rep(T) = {J_1, . . . , J_k} sets obtained by picking one representative from each equivalence class of S and T respectively. Now, we define a system of linear equations with {f′(J_j) : j ∈ [k]} as the unknowns, labelled a_1, . . .
, a_k:

Σ_{j=1}^{k} a_j Σ_{J ∈ perm(J_j)} ω^{I·J} + Σ_{J′∉T} f(J′) ω^{I·J′} = 0 for all I ∈ S.

Since f: Z_d^n → C is a symmetric function, it suffices to consider the following set of linear equations:

Σ_{j=1}^{k} a_j Σ_{J ∈ perm(J_j)} ω^{I_i·J} + Σ_{J′∉T} f(J′) ω^{I_i·J′} = 0 for all I_i ∈ rep(S).

That is,

Σ_{j=1}^{k} a_j Σ_{J ∈ perm(J_j)} ω^{I_i·J} = − Σ_{J′∉T} f(J′) ω^{I_i·J′} for all I_i ∈ rep(S).

Let M be the ℓ × k coefficient matrix given by M_{ij} = Σ_{J ∈ perm(J_j)} ω^{I_i·J}. In order to show that the above non-homogeneous system of linear equations has a solution, it is enough to show that the column space of M is all of C^ℓ. That is, for each i_0 = 1, . . . , ℓ, we require constants a_1, . . . , a_k such that:

Σ_{j=1}^{k} a_j M_{i_0 j} ≠ 0,    (4.11)

Σ_{j=1}^{k} a_j M_{i′j} = 0 for all i′ ≠ i_0.    (4.12)

Fix i_0 ∈ [ℓ] in Equations (4.11) and (4.12). We need a_1, . . . , a_k such that

Σ_{j=1}^{k} a_j Σ_{J ∈ perm(J_j)} ω^{I_{i_0}·J} ≠ 0,    (4.13)

Σ_{j=1}^{k} a_j Σ_{J ∈ perm(J_j)} ω^{I_i·J} = 0 for all i ≠ i_0.    (4.14)

Clearly, from equations (4.13) and (4.14), this is equivalent to constructing an n-variate polynomial

P(x_1, . . . , x_n) = Σ_{j=1}^{k} a_j Σ_{J ∈ perm(J_j)} x^J

that vanishes on ω^{I_i} for every i ∈ [ℓ] with i ≠ i_0, but does not vanish on ω^{I_{i_0}}.

However, for any tuple I = (i_1, . . . , i_n) in S, the first dm entries are fixed. Let I′ be the sub-tuple (i_{dm+1}, . . . , i_n) of I, and let I′(j) denote the j-th entry of the tuple I′. Thus, we want an (n−dm)-variate polynomial Q(x_{dm+1}, . . . , x_n) = P(1, . . . , 1, ω, . . . , ω, . . . , ω^{d−1}, . . . , ω^{d−1}, x_{dm+1}, . . . , x_n) that vanishes on ω^{I′_i} if and only if i ≠ i_0. Let

Q(x_{dm+1}, . . . , x_n) = Σ_{I′ ∈ perm(I′_{i_0})} ((x_{dm+1}^d − 1)/(x_{dm+1} − ω^{I′(1)})) ··· ((x_n^d − 1)/(x_n − ω^{I′(n−dm)})).

The proof of Theorem 4.9 is complete with the following claim:

Claim 4.15. Let Q(x_{dm+1}, . . . , x_n) be the polynomial defined above. Then,
(i) Q(x_{dm+1}, . . . , x_n) vanishes on ω^{I′_i} if and only if i ≠ i_0; and
(ii) Q(x_{dm+1}, . . . , x_n) is of the form P(1, . . . , 1, ω, . . . , ω, . . . , ω^{d−1}, . . . , ω^{d−1}, x_{dm+1}, . . . , x_n).

We do not include a proof of Claim 4.15 here, but the above properties of the polynomial Q are not hard to verify. (Strictly, for every i_0 we get a polynomial Q_{i_0} depending on the tuple I_{i_0}; for ease of notation, we drop the subscript.) The above discussion proves that for any symmetric function f: Z_d^n → C, for every ε ∈ (0, 0.1) and sufficiently large n, WR_{M_f}(d^{n(1−ε′)}) ≤ d^{nε} for some ε′ > 0 (ε′ is a function of d and ε). We now extend this rigidity upper bound to matrices M_f corresponding to functions that are not symmetric.

Theorem 4.16. Let f: Z_d^n → C be any function. For any ε ∈ (
0, 0.1) and n ≥ d(log d)/ε², there exists an ε′ = Ω(ε²/(d log d)) such that WR_{M_f}(d^{n(1−ε′)}) ≤ d^{nε}.

The proof of Theorem 4.16 is immediate from Theorem 4.9, the following property of Hadamard matrices (Lemma 4.17, whose proof is straightforward from the definitions of the matrices H_{d,n} and M_f and the polynomial P_f), and a simple tool that reduces the task of proving non-rigidity of a matrix B to proving non-rigidity of a matrix A that diagonalizes it (Lemma 4.18).

Lemma 4.17. Let f: Z_d^n → C be any function. Then D = H_{d,n} · M_f · H_{d,n} is a d^n × d^n diagonal matrix with D[I, I] = d^n · P_f(ω^I), where ω = e^{2πi/d} is a primitive d-th root of unity.

Lemma 4.18.
Let B = A*DA (respectively B = ADA), where A* is the conjugate transpose of A and D is a diagonal matrix. If WR_A(r) ≤ s then WR_B(2r) ≤ s².

Proof. If WR_A(r) ≤ s then A = S + L, where rank(L) ≤ r and S has at most s non-zero entries in every row and column. Then,

B − S*DS = B − S*DS + A*DS − A*DS
         = A*DA − S*DS + A*DS − A*DS    [∵ B = A*DA]
         = A*D(A − S) + (A* − S*)DS,

so B = S*DS + A*D(A − S) + (A* − S*)DS. The matrix S*DS has at most s² non-zero entries in each row and column, as S has at most s non-zero entries in every row and column. Further, rank(A*D(A − S) + (A* − S*)DS) ≤ 2r, as rank(A − S) ≤ r and rank(A* − S*) ≤ r. Therefore, WR_B(2r) ≤ s². □

The proof of Theorem 4.16 is immediate, though we sketch it here for the sake of completeness.

Proof of Theorem 4.16. By Lemma 4.17, M_f = H_{d,n}^{−1} · D · H_{d,n}^{−1}, where H_{d,n}^{−1} = d^{−n} · conj(H_{d,n}) is a scalar multiple of the entrywise conjugate of H_{d,n} and hence has the same weak rigidity as H_{d,n}. From Theorem 4.9, we have WR_{H_{d,n}}(d^{n(1−ε′)}) ≤ d^{nε} for any ε ∈ (0, 0.1) and some ε′ > 0. This immediately implies that WR_{M_f}(d^{n(1−ε′)}) ≤ d^{nε} (after adjusting ε and ε′ by constant factors) from Lemma 4.18. □

A brief note on non-rigidity of Fourier and Circulant matrices.
Although understanding the rigidity of generalized Hadamard matrices is of independent interest, Theorem 4.9 also acts as a building block in showing that Fourier matrices are not rigid, which is the main theorem of [11]. As the Fourier matrix F_d is the d × d matrix H_{d,1}, the generalized Hadamard matrix H_{d,n} equals the n-fold tensor product F_d ⊗ F_d ⊗ ··· ⊗ F_d. Even though we do not include the proof of non-rigidity of Fourier matrices, which is quite involved, among other basic blocks it uses Theorem 4.9 as well as the following interesting lemma, which analyses the weak rigidity of the tensor product of two matrices:

Lemma 4.19. Let A ∈ F^{m×m} and B ∈ F^{n×n}. Then for any r_1 ≤ m and r_2 ≤ n, WR_M(r_1 n + r_2 m) ≤ WR_A(r_1) · WR_B(r_2), where M = A ⊗ B.

Proof. Suppose WR_A(r_1) ≤ s_1 and WR_B(r_2) ≤ s_2. Then there exist S_1, S_2 of appropriate dimensions, with at most s_1 (resp. s_2) non-zero entries in every row and column, such that rank(A + S_1) ≤ r_1 and rank(B + S_2) ≤ r_2. Now, we want to argue about the rank of M − (S_1 ⊗ S_2):

M − (S_1 ⊗ S_2) = (A ⊗ B) − (S_1 ⊗ S_2)
              = (A ⊗ B) + (S_1 ⊗ B) − (S_1 ⊗ B) − (S_1 ⊗ S_2)
              = (A + S_1) ⊗ B − S_1 ⊗ (B + S_2).

Thus, rank(M − (S_1 ⊗ S_2)) ≤ r_1 n + r_2 m, and every row and column of S_1 ⊗ S_2 has at most s_1 s_2 non-zero entries. □

In [11], the authors also prove that circulant matrices are not rigid. Let c_0, . . . , c_{n−1} ∈ F. A matrix C_n ∈ F^{n×n} is said to be circulant if

C_n = [ c_0      c_{n−1}  ···  c_1 ]
      [ c_1      c_0      ···  c_2 ]
      [  ⋮        ⋮       ⋱    ⋮  ]
      [ c_{n−1}  c_{n−2}  ···  c_0 ]

Observe that a circulant matrix is a special case of a Toeplitz matrix. Dvir and Liu [11] prove that for sufficiently large n, C_n is not rigid. Hence, although the rigidity lower bound for Toeplitz matrices in Theorem 3.1 is reasonable for much smaller n (as noted in Remark 3.3), it is impossible to improve it to match the lower bound asked for in Question 1.2.

Remark 4.20.
In [11], the matrix M_f is given by M_f[I, J] = f(I + J) for I, J ∈ Z_d^n. However, the argument also works for M_f[I, J] = f(I − J) for I, J ∈ Z_d^n, as the two definitions differ only up to a permutation of rows/columns, giving the same rigidity bounds. Further, Theorem 4.16 extends the results of [9] to the field of complex numbers, and the result of [2] to arbitrary d, while the result in [2] is for d = 2. ♦

Given a database X of n elements {x_1, . . . , x_n}, an (s, t)-data structure for X is a way to store X in s memory cells so that any query concerning X can be answered effectively in time t. Let Q = {q_1, . . . , q_m} be a set of m queries on X (usually m = poly(n)). The time to answer a query is the number of cells accessed; computation on the accessed cells is free. There are two trivial static data structures for any problem:

(i) Pre-compute the answers to all queries in Q and store them, using space poly(n) as |Q| = poly(n). In this case, any query in Q can be answered in constant time.

(ii) Store the entire database X in memory using n memory cells and, for every query in Q, compute the answer by performing a linear scan over the memory (as the answer to a query may depend on all inputs). In this case, both space and time are linear.

In this regard, one major goal is to understand time-space tradeoffs. That is, can we get better (sub-linear) upper bounds on query time against linear space for static data structures? Standard counting arguments show that for most data structure problems, either time is Ω(|X|) or space is Ω(|Q|). Further, there exist explicit static data structure problems such that any data structure that uses space O(n) requires time Ω(log n) to answer queries in Q, where |Q| = poly(n) (see [28, 23] for details). This brings us to the following question:

Question 5.1.
Does there exist an explicit data structure problem P such that any (O(n), t)-data structure for P requires t = ω(log n)?

The above question is quite challenging, and the difficulty of proving explicit data structure lower bounds is justified, as data structures correspond to circuits with arbitrary gates. See Figure 4 for a pictorial representation of the following discussion. An (s, t)-data structure for a database X containing n field elements {x_1, . . . , x_n} can be viewed as a depth-2 circuit whose leaf gates are the elements of X. The middle layer consists of s gates of unbounded fan-in representing the s memory cells, and the top layer consists of m gates representing the queries q_1, . . . , q_m in Q. As the data structure is allowed to take time t on any query q ∈ Q, the fan-in of the gates in the top layer is bounded by t. The mapping of the elements of X to the s memory cells can be viewed as a function P: F^n → F^s (P stands for pre-processing function), and the map from memory cells to the top-layer query gates can be viewed as a function Q: F^s → F^m (Q stands for query function). Note that this correspondence between an (s, t)-data structure for X and an m-output depth-2 circuit of width s with arbitrary gates and top fan-in t holds only when the queries in Q are non-adaptive.

[Figure 4: Data structure viewed as a depth-2 circuit with arbitrary gates: leaves x_1, . . . , x_n, a middle layer of memory cells c_1, . . . , c_s, and a top layer of query gates q_1, . . . , q_m, each of fan-in at most t.]
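For linear data structures, the depth-2 circuit view is exactly a factorization M = Q·P with a t-row sparse Q (this is formalized in Observation 4 below). A minimal sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, t, m = 6, 4, 2, 8          # database size, cells, query time, #queries

P = rng.integers(-3, 4, size=(s, n))        # pre-processing map: inputs -> cells
Q = np.zeros((m, s), dtype=np.int64)        # each query reads at most t cells
for row in Q:
    cells = rng.choice(s, size=t, replace=False)
    row[cells] = rng.integers(1, 4, size=t)

M = Q @ P                                    # the query matrix this DS answers
X = rng.integers(-5, 6, size=n)              # a database

memory = P @ X                               # preprocessing: store s derived values
answers = Q @ memory                         # each answer reads only t cells
assert np.array_equal(answers, M @ X)        # equals the inner products <R_i, X>
assert all(np.count_nonzero(row) <= t for row in Q)   # Q is t-row sparse
print("all", m, "queries answered from", t, "of", s, "cells each")
```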
Remark 5.2.
Throughout this section, query time is measured by the number of cells probed, where each cell is capable of holding multiple bits. This measure was introduced by Yao in [35]. There is yet another interesting data structure model, the bit-probe model introduced in [13], in which query time is measured by the number of bits accessed to answer the query. In this article, we will work with the cell-probe model. ♦

This correspondence between data structures and circuits with arbitrary gates hints that proving data structure lower bounds is considerably hard. Hence, it is reasonable to place certain restrictions on the data structure to get better lower bounds. In this regard, Dvir et al. in [10] consider static data structures with the following restrictions:

• The database X = {x_1, . . . , x_n} contains elements from F.
• The data structure can perform only linear operations on the database X. That is, P: F^n → F^s and Q: F^s → F^m are linear functions.

In this case, the m queries {q_1, . . . , q_m} in Q can be viewed as the m rows R_1, . . . , R_m of a matrix M ∈ F^{m×n}. Whenever query q_i is raised, the data structure returns the inner product ⟨R_i, X⟩, an element of F (here X = (x_1 x_2 ··· x_n) is viewed as a vector). A data structure for the set of queries in Q using space at most s and query time at most t, with P, Q linear, is called an (s, t)-linear data structure for M. In [10], the authors demonstrate a connection between the answers to Question 1.2 and Question 5.1; in particular, Dvir et al. prove Theorem 5.3 below.

For the rest of this section, we need a notion of rigidity weaker than matrix rigidity, called row-rigidity. The row-rigidity of a matrix M for rank r (denoted by RR_M(r)) is at most s if the rank of M can be reduced to r by changing at most s entries in every row. Row-rigidity is weaker than rigidity and stronger than weak rigidity.
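The gap between total rigidity and the per-row measure is already visible on the identity matrix, a toy example computed here over the reals: reducing rank(I_n) to r requires n − r changes in total, yet only one change in any single row suffices.

```python
import numpy as np

n, r = 4, 2
A = np.eye(n)
C = np.zeros((n, n))
C[2, 2] = C[3, 3] = -1                  # zero out two diagonal entries

assert np.linalg.matrix_rank(A + C) == r
assert np.count_nonzero(C) == n - r     # total changes: rigidity R_I(r) = n - r
assert max(np.count_nonzero(row) for row in C) == 1   # row-rigidity RR_I(r) <= 1

# Changing k entries can lower the rank by at most k, so n - r total changes
# are also necessary; per row, however, a single change suffices.
print("rank reduced to", r, "with", n - r, "changes, at most 1 per row")
```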
A matrix M is t-row sparse if every row of M has at most t non-zero entries. Theorem 5.3.
Let ε, δ > 0 be constants and let M ∈ F^{m×n} be a matrix such that there is no (n/(1 − ε), (log n)^c)-linear data structure for M. Then for some n′ ≥ α · (log n)^{c−1} there exists a matrix M′ ∈ F^{m×n′} such that RR_{M′}(εn′) ≥ (log n)^{c−1}. In fact, M′ is a sub-matrix of M, and when M is explicit, M′ is in P^NP.

Remark 5.4.
Although the above theorem relates data structure lower bounds to the rigidity of rectangular matrices, an analogous theorem also holds in the case of square matrices (see Theorem 2 in [16] for the exact statement). In fact, a query lower bound of t for linear-space data structures translates to a row-rigidity lower bound of t/log n. ♦

In the rest of this section, we provide the reader intuition as to why this connection between static linear data structure lower bounds and matrix rigidity holds, and we sketch the details of the proof. We begin with the following simple observation (whose proof intuitively follows from Figure 4):
Observation 4.
Let there be an (s, t)-linear data structure for M ∈ F^{m×n}. Then M = Q · P, where Q ∈ F^{m×s} is a t-row-sparse matrix and P ∈ F^{s×n}.

Now, we discuss a linear-algebraic characterization of the existence of efficient data structures. Let M ∈ F^{m×n} be such that M = Q · P, where Q ∈ F^{m×s} is a t-row-sparse matrix and P ∈ F^{s×n}. If we denote by V the column space of the matrix M, then there exists a subspace U := colspace(Q) of F^m such that V ⊆ U and U is a t-sparse vector space (as Q is a t-row-sparse matrix). This leads us to the definition of the outer-dimension of a vector space. Informally, the outer-dimension of a vector space V is the dimension of the smallest t-sparse vector space containing V. More formally, we define the outer-dimension of a vector space V with respect to sparsity parameter t (denoted by OuterDim_V(t)) as min_U { dim(U) | V ⊆ U, U is t-sparse }. In this article, for ease of notation, we write OuterDim_M(t) for the outer-dimension of the vector space V, where V is colspace(M).

From the above discussion and Observation 4, it is clear that if there is an (s, t)-linear data structure for M, then OuterDim_M(t) ≤ s. Now, consider the converse. If OuterDim_M(t) ≤ s for some matrix M ∈ F^{m×n}, then by definition there exists a t-sparse subspace U ⊆ F^m of dimension at most s such that V ⊆ U (here, V = colspace(M)). Let Q ∈ F^{m×s} be a t-row-sparse matrix such that U is colspace(Q). As V ⊆ U, every column of M can be expressed as a linear combination of the columns of Q. Hence M = Q · P, where Q ∈ F^{m×s} is t-row-sparse and P is a matrix in F^{s×n}. From the circuit view of data structures mentioned earlier, this immediately gives an (s, t)-data structure for M. Hence, the outer-dimension of a matrix M characterizes the existence of an efficient linear data structure for M:

Observation 5.
There is an (s, t)-linear data structure for M if and only if OuterDim_M(t) ≤ s.

Recall that the goal is to understand the connection between matrix rigidity and data structures. Similar to the characterization of efficient data structures by low outer-dimension, we give a linear-algebraic characterization of rigid matrices. Let M ∈ F^{m×n} be a matrix that is not row-rigid (i.e., RR_M(r) ≤ t). Then there exist matrices S, L ∈ F^{m×n} with M = S + L such that every row of S has at most t non-zero entries and rank(L) ≤ r. Let V := colspace(M), U := colspace(S) and W := colspace(L); then V ⊆ U + W. Observe that U is a t-sparse vector space and that U + V ⊆ U + W. Thus,

dim(U + V) ≤ dim(U + W),
dim(U) + dim(V) − dim(U ∩ V) ≤ dim(U) + dim(W) − dim(U ∩ W) ≤ dim(U) + dim(W),
dim(U ∩ V) ≥ dim(V) − dim(W) ≥ rank(M) − r.

Hence, whenever the row-rigidity of a matrix M for rank r is at most t, there exists a t-sparse vector space U that intersects colspace(M) in a large number of dimensions. This precisely leads us to the definition of the inner-dimension of a vector space. (A vector space U ⊆ F^m is t-sparse if it can be expressed as the column space of a matrix that is t-row-sparse.) The inner-dimension of a vector space V with respect to sparsity parameter t (denoted by InnerDim_V(t)) is max_U { dim(U ∩ V) | dim(U) ≤ dim(V), U is t-sparse }. In this article, for ease of notation, we write InnerDim_M(t) for the inner-dimension of the vector space V, where V is colspace(M). Before we move on, we make a remark on the complexity of computing the inner-dimension of a given matrix (we will use this to prove Theorem 5.3).

Observation 6.
Let InnerDim(M, d, t) denote the problem of deciding whether InnerDim_M(t) ≥ d. It is not very difficult to observe that InnerDim(M, d, t) is in NP. Let V = colspace(M), so that dim(V) = rank(M). Given as witness a t-row-sparse matrix N ∈ F^{m×n}, the NP algorithm A verifies that dim(U ∩ V) ≥ d, where U = colspace(N). That is, A computes dim(U) + dim(V) − dim(U + V) = rank(N) + rank(M) − rank([N M]) (where [N M] is the matrix obtained by placing N and M side by side) and tests whether this quantity is at least d. This verification can be done in polynomial time, implying that InnerDim(M, d, t) ∈ NP.

From the preceding discussion, it is clear that if M is not a row-rigid matrix, then M has high inner-dimension. In fact, the converse is also true. Suppose InnerDim_M(t) > rank(M) − r for some r. Then, by definition, there exists a t-sparse vector space U ⊆ F^m with dim(U) ≤ dim(V) and dim(U ∩ V) > rank(M) − r, where V is colspace(M). This means that there exists a subspace W ⊆ F^m with dim(W) < r such that V ⊆ U + W. As U is a t-sparse vector space, there is a t-row-sparse matrix A such that the columns of A span the space U. Since V ⊆ U + W, there is a matrix B of rank less than r satisfying M = AT + B for some T ∈ GL(n, F). As T is invertible, MT^{−1} = A + BT^{−1}, and the rank of MT^{−1} can be reduced to r by changing at most t entries in each row. Thus RR_M(r) ≤ t, as rank(MT^{−1}) = rank(M).

At the end of the above discussion on the inner-dimension of spaces, we observe the following:

Observation 7.
Let M ∈ F^{m×n} be a matrix. Then RR_M(r) > t if and only if InnerDim_M(t) ≤ rank(M) − r.

In summary: there is no efficient (s, t)-linear data structure for M if and only if M has high outer-dimension, and M is a strongly rigid matrix if and only if M has low inner-dimension. Hence, in order to prove Theorem 5.3, it is enough to show that high outer-dimension of a matrix M implies the existence of a sub-matrix of M having low inner-dimension.

Proof Sketch of Theorem 5.3.
We begin with the following claim, which states that matrices of large outer-dimension have large enough sub-matrices of small inner-dimension.
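The NP verifier of Observation 6 rests on the identity dim(U ∩ V) = rank(N) + rank(M) − rank([N M]) for U = colspace(N) and V = colspace(M). This identity can be checked numerically; the following is a toy example over GF(2) written for this discussion (it does not appear in the survey).

```python
# Check dim(U ∩ V) = rank(N) + rank(M) - rank([N M]) over GF(2),
# where U = colspace(N), V = colspace(M) and [N M] places N and M
# side by side (so colspace([N M]) = U + V).

def rank_gf2(M):
    """Rank of a 0/1 matrix over GF(2) by Gaussian elimination."""
    rows = [r[:] for r in M]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

M = [[1, 0],
     [0, 1],
     [1, 1]]      # V = colspace(M), dimension 2
N = [[1, 0],
     [1, 0],
     [0, 1]]      # U = colspace(N), dimension 2
NM = [rn + rm for rn, rm in zip(N, M)]   # the concatenation [N M]

dim_meet = rank_gf2(N) + rank_gf2(M) - rank_gf2(NM)
assert dim_meet == 1                     # U ∩ V = span{(1,1,0)}
```

The verifier needs only three rank computations and hence runs in polynomial time; guessing the t-row-sparse witness N is what places the problem in NP.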
Claim 5.5.
Let t, k ∈ Z^+, ε ∈ (0, 1) and M ∈ F^{m×n}. If OuterDim_M(tk + nε^k) ≥ n/(1 − ε), then for some n′ ≥ nε^k there exists an m × n′ sub-matrix M′ of M, computable in P^NP, such that InnerDim_{M′}(t) ≤ rank(M′) − εn′.

Let us complete the proof of Theorem 5.3 assuming Claim 5.5. Let ε, δ > 0 and let M ∈ F^{m×n}. Suppose there is no (n/(1 − ε), (log n)^c)-linear data structure for the matrix M. Then by Observation 5 we know that OuterDim_M((log n)^c) > n/(1 − ε). Observe that by setting k = log(n/t)/log(1/ε) and t = (log n)^{c−1} · log(1/ε), we get nε^k = nε^{log(n/t)/log(1/ε)} = t and hence tk + nε^k = (k + 1)t ≤ (log n)^c. This implies that OuterDim_M(tk + nε^k) > n/(1 − ε) for the values of t, k chosen above. Now, by Claim 5.5, for some n′ ≥ nε^k there exists an m × n′ sub-matrix M′ of M, computable in P^NP, such that InnerDim_{M′}(t) ≤ rank(M′) − εn′. From Observation 7 we get RR_{M′}(εn′) ≥ (log n)^{c−1}. □

Now, let us briefly sketch the proof of Claim 5.5. We begin by observing that matrices with large inner-dimension have a decomposition property that can be computed efficiently given access to an NP oracle. That is, given a matrix M ∈ F^{m×n} with InnerDim_M(t) ≥ rank(M) − r, we can obtain matrices A ∈ F^{m×n}, B ∈ F^{n×n}, C ∈ F^{r×n} and M′ ∈ F^{m×r} such that A is t-row-sparse, M′ is a sub-matrix of M, and M = A · B + M′ · C. Over large enough finite fields F, such a decomposition can be obtained in polynomial time given an oracle computing the inner-dimension of a matrix. As InnerDim(M, d, t) ∈ NP by Observation 6, this decomposition can be computed in P^NP.

Given the above decomposition property, we argue Claim 5.5 by contradiction: if all the relevant sub-matrices of M had large inner-dimension, then M would have small outer-dimension. That is, suppose OuterDim_M(tk + nε^k) ≥ n/(1 − ε) and InnerDim_M(t) ≥ rank(M) − εn (here r = εn). Then, by the decomposition property, M = A · B + M′ · C for some A ∈ F^{m×n}, B ∈ F^{n×n}, C ∈ F^{r×n} and M′ ∈ F^{m×r}, where A is t-row-sparse and M′ is a sub-matrix of M.
Further, if M′ also does not have the requisite inner-dimension, then by recursively applying the decomposition procedure we get:

M = A · B + M′ · C
  = AB + A′B′C + M″C′C                    [∵ M′ = A′B′ + M″C′]
  = AB + A′B′C + (A″B″ + M‴C″)C′C         [∵ M″ = A″B″ + M‴C″]
  = AB + A′B′C + A″B″C′C + M‴C″C′C,

that is, M = [A  A′  A″  M‴] · [B ; B′C ; B″C′C ; C″C′C], where the second factor denotes the four blocks stacked one below the other. Assuming none of M, M′, M″, ... has low inner-dimension, after k steps of the decomposition procedure we obtain

M = [A_0  A_1  ···  A_{k−1}  M_k] · [N_0 ; N_1 ; ··· ; N_k],

where A_0, ..., A_{k−1} are all t-row-sparse matrices, M_k ∈ F^{m×nε^k}, and N_0, ..., N_k are obtained from the B's and C's accordingly. It is not difficult to observe that the above decomposition gives M = P · Q, where P has at most tk + nε^k non-zero entries in each row, as the k matrices A_0, ..., A_{k−1} have at most t non-zero entries per row and M_k has at most nε^k columns. Further, note that each matrix N_i has dimension nε^i × n. Hence Q ∈ F^{s×n}, where s = n(1 + ε + ε² + ··· + ε^k), which is less than n/(1 − ε) for any positive integer k and ε ∈ (0, 1). Thus, from the definition of outer-dimension, OuterDim_M(tk + nε^k) < n/(1 − ε), which is a contradiction. (End of Claim 5.5) □

Coding theory essentially deals with detecting and correcting errors in messages transmitted over a noisy channel, thereby ensuring reliable communication. Suppose there are two parties, Alice and Bob, and Alice wants to send a message m ∈ {0,1}^k to Bob. Alice encodes the message m using an encoding function E : {0,1}^k → {0,1}^n and sends the word c = E(m) ∈ {0,1}^n over a transmission channel that could potentially be noisy. Here the word c = E(m) ∈ {0,1}^n is called the codeword. Let C denote the set of all possible codewords in {0,1}^n. Now, Bob receives a word c′ ∈ {0,1}^n, called the received word, and uses a decoding function D : {0,1}^n → {0,1}^k to obtain m′ = D(c′). In an ideal channel with no noise, c′ = c. In other cases, if Bob is able to identify the codeword c from the received word c′, then he can recover the message m as D(c). One intuitive way to enable this is to design the encoding algorithm to repeat the message m several times (here n ≫ k). This redundancy in the codeword is captured by the value k/n, called the rate of the code (denoted by R(C)). For any code C, R(C) ≤ 1. The distance between two codewords is another important parameter, namely the Hamming distance (denoted by Δ) between them. Observe that as the distance between codewords increases, it becomes unlikely to confuse one codeword for another, which intuitively helps detect errors in the received word. The relative distance δ(C) of a code C is d/n, where d = min_{c₁≠c₂ ∈ C} Δ(c₁, c₂). An immediate question would be to understand the optimal trade-off between R(C) and δ(C). There is a huge body of work revolving around this question, and we refer the reader to [19] for more details. In this article, we will be particularly interested in linear codes.

An [n, k, d]_q linear code is one where the set C of codewords is a linear subspace of F_q^n of dimension k and the distance of the code is d. Observe that every codeword of a linear code can be obtained as a linear combination of the rows of a k × n generator matrix G_C. Now that we have associated matrices with codes, it is natural to ask how rigid the generator matrices of codes are. To begin with, we demonstrate a connection between coding theory (asymptotically good codes) and matrix rigidity.

6.1 Rigidity of generator matrices of asymptotically good codes

Asymptotically good codes are families of codes whose rate and relative distance are both constant in the asymptotic sense.
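As a concrete illustration of the rate and distance parameters defined above, consider the [7, 4, 3]_2 Hamming code, a standard textbook example (it is not one of the codes analysed in this survey). Its rate and minimum distance can be computed by brute force from a generator matrix in standard form [I_4 | A]:

```python
# Rate and minimum distance of the [7,4,3] binary Hamming code,
# computed by enumerating all 2^4 codewords of the linear code.
from itertools import product

# generator matrix in standard form [I_4 | A]
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
k, n = len(G), len(G[0])

def encode(m):
    """Codeword of message m: the GF(2) combination sum_i m_i * (row i)."""
    return tuple(sum(mi * g for mi, g in zip(m, col)) % 2 for col in zip(*G))

codewords = {encode(m) for m in product([0, 1], repeat=k)}
# for a linear code, distance = minimum weight of a non-zero codeword
d = min(sum(c) for c in codewords if any(c))
assert (k, n, d) == (4, 7, 3)   # rate 4/7, relative distance 3/7
```

For a linear code the minimum distance equals the minimum Hamming weight of a non-zero codeword, which is what the enumeration exploits.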
Definition 6.1.
A family of codes C = {C_i}_{i≥1}, C_i = [n_i, k_i, d_i]_q, is said to be asymptotically good if there exist constants R, δ > 0 such that lim_{i→∞} k_i/n_i ≥ R and lim_{i→∞} d_i/n_i ≥ δ. ♦

Using algebraic geometry codes [20], we can prove the existence of asymptotically good error-correcting codes. We state the lemma about the existence of asymptotically good error-correcting codes without giving a proof; for a proof, see Theorem 2.81 in [20].
Lemma 6.2.
Let F_q be a finite field. For infinitely many n, there exists an [n, n/2, d]_q code with rate 1/2 and relative distance at least 1/2 − 1/(√q − 1).

For the above code, let G denote the generator matrix; it can be brought to the standard form G_C = [I_{n/2} | A], where I_{n/2} is the (n/2) × (n/2) identity matrix and A is an (n/2) × (n/2) matrix. In the following theorem, we prove that the matrix A has high rigidity over F_q:

Theorem 6.3.
Let A ∈ F_q^{n/2 × n/2} be the matrix obtained from the standard form of the generator matrix of the [n, n/2, (1/2 − ε)n] code for ε = 1/(√q − 1), as in Lemma 6.2. Then R_A(r) = Ω((n²/r) log(n/r)) for εn ≤ r ≤ n/4.

Proof. Let εn ≤ r ≤ n/4 and let A′ be a 2(r+1) × (r+1) sub-matrix of A. We claim that rank(A′) ≥ r + 1. Suppose not, i.e., rank(A′) < r + 1. Then there exists a codeword of weight n/2 − (r+1) < n/2 − εn. Hence the minimum distance of the code is less than (1/2 − ε)n, a contradiction. This implies that every 2(r+1) × (r+1) sub-matrix of A has rank at least r + 1. Now, by following an argument similar to the untouched-minor argument, we get the required lower bound.
Remark 6.4.
Although matrices of high rigidity can be obtained from generator matrices of asymptotically good linear codes, [8] obtained a distribution D of matrices such that for G ∼ D, G generates a good linear code but with high probability R_G(r) ≤ O((n²/r) · log(n/r)) for every r. ♦

Next, we review a result of Dvir [7], which states that if the generating matrix G_C of any locally decodable code C is not row-rigid, then there exists a locally self-correctable code C′ with rate close to 1. The focus of this subsection is to review the connection between locally decodable / locally self-correctable codes and matrix rigidity, which is the main result of [7]. Informally, a locally decodable code (LDC) is an error-correcting code that enables probabilistically decoding a particular symbol of the message by querying a small number of locations of the corresponding codeword, even when the codeword is corrupted in a few locations, while a locally self-correctable code (LCC) is an error-correcting code that enables probabilistically decoding symbols of the codeword rather than of the message, which can be viewed as self-correcting the corrupted codeword. For any vector v, we denote by w(v) the Hamming weight of v. We give the formal definitions below:

Definition 6.5 (Locally decodable code). A (q, δ, ε)-LDC is a linear map C : F_p^n → F_p^m such that there is a randomized decoding algorithm D : F_p^m × [n] → F_p that on input (c + u, i) queries at most q locations in c + u and recovers, with probability at least 1 − ε, the i-th symbol of the message x from c + u, where c = C(x) and w(u) ≤ δ · m (i.e., the codeword c is corrupted in at most δ · m locations). ♦

Definition 6.6 (Locally self-correctable code).
A (q, δ, ε)-LCC is a linear map C′ : F_p^n → F_p^m such that there is a randomized (self-correcting) algorithm D′ : F_p^m × [m] → F_p that on input (c + u, i) queries at most q locations in c + u and recovers, with probability at least 1 − ε, the i-th symbol of the codeword c from c + u, where w(u) ≤ δ · m (i.e., the codeword c is corrupted in at most δ · m locations). ♦

We say an error-correcting code C is explicit if every entry of its generator matrix can be obtained in deterministic polynomial time. It is interesting to note the following explicit construction of locally decodable codes from [7], which will be useful for our purpose. We do not prove this construction here (for a proof, see Corollary 3.3 in [7]).

Theorem 6.7.
For any ε, a > 0, there exists an explicit family of codes C_n : F_p^n → F_p^m such that C_n is an (n^a, δ, ε)-LDC with m = O(n) and δ = δ(ε) > 0.

We now state the main theorem of [7], showing that if the generating matrix G_C of any locally decodable code C is not row-rigid, then there exists a locally self-correctable code C′ with dimension close to n. We first give a sketch of the proof and then move on to the details.

Theorem 6.8.
Let C : F_p^n → F_p^m be a (q, δ, ε) locally decodable code whose generator matrix G_C has RR_{G_C}(r) ≤ s. Then, for any ρ > 0, there exists a (q′, δ′, ε)-LCC C′ (a subspace of F_p^n) with q′ = qs, δ′ = (ρδ)/s and the dimension of C′ being n(1 − ρ) − r.

Proof Sketch. Suppose G_C has row-rigidity at most s for rank r. Then G_C = S + L, where rank(L) is low and every row of S has at most s non-zero entries. Since rank(L) is low, a natural candidate for an LCC C′ of sufficiently large dimension is nullspace(L): when C′ = nullspace(L), the dimension of C′ is n − rank(L), which is large (as rank(L) is low). We need to ensure that C′ is (q′, δ′, ε) locally self-correctable. That is, to decode the i-th symbol of a codeword c which is corrupted in at most δ′ · n locations, we need a decoding algorithm D′ that on input (c + v, i) (with w(v) ≤ δ′ · n) outputs c_i with probability 1 − ε. Observe that for every c ∈ C′, C(c) = G_C · c = S · c + L · c = S · c, as L · c = 0 for c ∈ C′ = nullspace(L). Now, it suffices to invoke the local decoding algorithm for the LDC C with (C(c) + v′, i) as input, where v′ := S · v. Here the weight of v′ is small, as the matrix S is s-row-sparse. The algorithm D that locally decodes the LDC C returns c_i with probability 1 − ε by querying a small number of locations, as C(c) + v′ = S(c + v). (For technical reasons we cannot quite work with the matrix S; instead we will use a slightly modified matrix S′ obtained from S in Observation 8.) □

We now explain all the details mentioned in the above proof idea. We will need the following simple observation: for any row-sparse matrix, the columns can also be made fairly sparse without increasing the rank by much.
The proof appeals to the intuition that if too many columns of an s-row-sparse matrix are dense, then we can find a row that is not s-sparse.

Observation 8.
Let ρ > 0 and let A ∈ F^{m×n} be any matrix with RR_A(r) ≤ s (i.e., A = S + L, where rank(L) ≤ r and S is s-row-sparse). Then A = S′ + L′, where rank(L′) ≤ r + ρ · n and every column of S′ has at most (s · m)/(ρ · n) non-zero entries.

Proof of Observation 8. The number of non-zero entries in S is at most s · m. Hence, for any ρ > 0, the number of columns of S with more than (s · m)/(ρ · n) non-zero entries is at most ρ · n. Let C_{i_1}, C_{i_2}, ..., C_{i_j}, with j ≤ ρ · n, be the columns of S with more than (s · m)/(ρ · n) non-zero entries. Let S′ be the matrix obtained by replacing the columns C_{i_1}, ..., C_{i_j} in S with all-zero vectors, and let L′ be the matrix obtained by adding the column vector C_{i_l} to the i_l-th column of L for every l ∈ [j]. Then A = L′ + S′, where rank(L′) ≤ r + ρ · n, every row of S′ has at most s non-zero entries, and every column of S′ has at most (s · m)/(ρ · n) non-zero entries. □

Now, we complete the proof of Theorem 6.8.
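The column-sparsification step of Observation 8 can be sketched in a few lines (a toy illustration written for this discussion, not code from [7]): the dense columns of an s-row-sparse matrix are moved into the low-rank part, whose rank therefore grows by at most the number of columns moved.

```python
# Column sparsification: given an s-row-sparse S (part of A = S + L),
# zero out every column of S with more than col_bound non-zeros and
# collect those columns in D; then S = S' + D, every surviving column
# of S' is sparse, and rank(D) <= number of dense columns.

def sparsify_columns(S, col_bound):
    m, n = len(S), len(S[0])
    Sp = [row[:] for row in S]            # S': sparse rows AND columns
    D = [[0] * n for _ in range(m)]       # the removed dense columns
    for j in range(n):
        if sum(1 for i in range(m) if S[i][j] != 0) > col_bound:
            for i in range(m):
                Sp[i][j], D[i][j] = 0, S[i][j]
    return Sp, D

# a 2-row-sparse S whose first column is dense
S = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 0]]
Sp, D = sparsify_columns(S, col_bound=2)
assert all(sum(1 for i in range(4) if Sp[i][j]) <= 2 for j in range(4))
assert all(S[i][j] == Sp[i][j] + D[i][j] for i in range(4) for j in range(4))
```

In Observation 8 the bound is col_bound = (s · m)/(ρ · n), so at most ρ · n columns can exceed it, and folding them into L increases its rank by at most ρ · n.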
Proof.
Let C : F_p^n → F_p^m be a (q, δ, ε)-LDC and let G_C ∈ F^{m×n} be its generator matrix. Suppose G_C has row-rigidity at most s for rank r. Then, by Observation 8, G_C = S′ + L′, where rank(L′) ≤ r + ρ · n, every row of S′ has at most s non-zero entries, and every column of S′ has at most (s · m)/(ρ · n) non-zero entries. Let C′ := nullspace(L′). Then the dimension of C′ as a subspace of F_p^n is n − rank(L′) ≥ n − (r + ρ · n) = n(1 − ρ) − r.

It remains to show that C′ is a (q′, δ′, ε)-LCC, where q′ = qs and δ′ = ρδ/s. In particular, we need a randomized algorithm D′ : F_p^n × [n] → F_p that decodes (with probability at least 1 − ε) a particular symbol of a codeword c ∈ F_p^n that is corrupted in at most δ′ · n locations, by querying at most q′ locations of the corrupted codeword. Since C is a (q, δ, ε)-LDC, we have at our disposal a randomized algorithm D : F_p^m × [n] → F_p that decodes (with probability at least 1 − ε) a particular symbol of a message x ∈ F_p^n by querying at most q locations of the corresponding codeword C(x), which may be corrupted in at most δ · m locations. The main idea is to make D′ run D on appropriate inputs. Note that D can correct message symbols only when the codeword is corrupted in at most δ · m locations. The input to D′ is (c + v, i), where i ∈ [n], c ∈ F_p^n and v ∈ F_p^n with w(v) ≤ δ′ · n. The idea is to encode c ∈ F_p^n using the LDC C and then use the decoding algorithm D on C(c) to recover the i-th symbol c_i of the codeword c. Let v′ = S′ · v, a vector in F_p^m.
Observe the following:
• The weight of the vector v′ is at most δ · m, since w(v) ≤ δ′ · n and every column of S′ has at most (s · m)/(ρ · n) non-zero entries (indeed, w(v′) ≤ w(v) · (s · m)/(ρ · n) ≤ (ρδ/s) · n · (s · m)/(ρ · n) = δ · m).
• D(C(c) + v′, i) outputs c_i (the i-th symbol of c ∈ F_p^n) with probability 1 − ε by making at most q queries to C(c) + v′.
• For every c ∈ C′, C(c) + v′ = S′ · c + v′ = S′ · (c + v). As S′ has at most s non-zero entries in every row, D′ makes at most qs queries overall before returning c_i.
Thus, C′ is a (q′, δ′, ε)-LCC, where q′ = qs and δ′ = ρδ/s. □

This article is entirely based on the problem of matrix rigidity and its multiple connections to other central problems in theoretical computer science, such as static data structure lower bounds, error-correcting codes and communication complexity. By now, the reader is probably convinced of the harsh reality of rigid matrices. We conclude by mentioning a few open questions:

1. One of the foremost open problems is to answer Valiant's Question 1.2, or even Razborov's Question 1.3, by constructing explicit matrices of high rigidity. We have thus far been able to obtain explicit constructions of rigid matrices only in the class P^NP.

2. One of the matrix families that we have not analysed so far is the incidence matrices of projective planes from the conjecture on Page 2. In [12], the authors show that the monotone rigidity of incidence matrices of projective planes is αn for rank α√n (for some α > 0), where monotone rigidity means that only non-zero entries can be changed to reduce the rank of A. Obtaining upper or lower bounds on the (general) rigidity of such matrices remains largely open.

3. On the computational front, what is the complexity of RIGID(A, Q, s, r)?

4. The matrix factorization problem is seemingly the dual of matrix rigidity, where the goal is to construct an explicit matrix that cannot be expressed as a product of sparse matrices. That is, we want an explicit matrix A ∈ F^{n×n} such that if A = A_1 · A_2 ··· A_d, then sparsity(A_i) = Ω(n^{1+δ}) for some i ∈ [d] and δ > 0. The best known lower bound for matrix factorization is Ω(n · λ_d(n)) for some slowly-growing function λ_d(n). In [22], the authors obtain Ω(n²) lower bounds for matrix factorization when d = 2 and the A_i's are symmetric or invertible matrices. It would be interesting to study the matrix factorization problem for other special matrices as well as in full generality.

5. In connection with error-correcting codes, can we obtain explicit constructions of good linear error-correcting codes whose generator matrices have low rigidity? A standard methodology is to use techniques from the derandomization toolkit to derandomize the result of [8] mentioned in Remark 6.4.

Acknowledgements
I am grateful to Ramprasad Saptharishi for introducing me to the concept of matrix rigidity. I thank Ramprasad Saptharishi, Anamay Tengse and Prerona Chatterjee for numerous technical discussions on the various papers presented in this article. I thank Prahladh Harsha for providing several clarifications on the results in Subsection 3.3.
References

[1] Josh Alman and Lijie Chen. Efficient Construction of Rigid Matrices Using an NP Oracle. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, pages 1034–1055. IEEE Computer Society, 2019. doi:10.1109/FOCS.2019.00067.

[2] Josh Alman and R. Ryan Williams. Probabilistic rank and matrix rigidity. In Hamed Hatami, Pierre McKenzie, and Valerie King, editors,
Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 641–652. ACM, 2017. doi:10.1145/3055399.3055484.

[3] Amey Bhangale, Prahladh Harsha, Orr Paradise, and Avishay Tal. Rigid Matrices From Rectangular PCPs.
CoRR, abs/2005.03123, 2020. URL: https://arxiv.org/abs/2005.03123, arXiv:2005.03123.

[4] Ernie Croot, Vsevolod F. Lev, and Péter Pál Pach. Progression-free sets in Z_4^n are exponentially small. Annals of Mathematics, 185(1):331–337, 2017. URL: https://annals.math.princeton.edu/2017/185-1/p07.

[5] Amit Jayant Deshpande. Sampling-Based Algorithms for Dimension Reduction.
PhD Thesis , 2007. URL: https://dspace.mit.edu/bitstream/handle/1721.1/38935/166267550-MIT.pdf;sequence=2 .[6] Alicia Dickenstein, Noaôr Fitchas, Marc Giusti, and Carmen Sessa. The member-ship problem for unmixed polynomial ideals is solvable in single exponential time.
Discrete Applied Mathematics, 33(1):73–94, 1991. doi:10.1016/0166-218X(91)90109-A.

[7] Zeev Dvir. On Matrix Rigidity and Locally Self-correctable Codes.
Comput. Complex., 20(2):367–388, 2011. doi:10.1007/s00037-011-0009-1.

[8] Zeev Dvir. On the non-rigidity of generating matrices of good codes (written by Oded Goldreich). 2016.

[9] Zeev Dvir and Benjamin L. Edelman. Matrix rigidity and the Croot-Lev-Pach Lemma.
Theory of Computing, 15(8):1–7, 2019. doi:10.4086/toc.2019.v015a008.

[10] Zeev Dvir, Alexander Golovnev, and Omri Weinstein. Static data structure lower bounds imply rigidity. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 967–978, 2019. doi:10.1145/3313276.3316348.

[11] Zeev Dvir and Allen Liu. Fourier and Circulant Matrices Are Not Rigid. In 34th Computational Complexity Conference, CCC 2019, pages 17:1–17:23, 2019. doi:10.4230/LIPIcs.CCC.2019.17.

[12] Zeev Dvir, Shubhangi Saraf, and Avi Wigderson. Improved rank bounds for design matrices and a new proof of Kelly's theorem.
Forum of Mathematics, Sigma, 2, 2014.

[13] Peter Elias and Richard A. Flower. The Complexity of Some Simple Retrieval Problems.
J. ACM , 22(3):367–379, 1975. doi:10.1145/321892.321899 .[14] Joel Friedman. A note on matrix rigidity.
Combinatorica , 13(2):235–239, 1993. doi:10.1007/BF01303207 .[15] Oded Goldreich and Avishay Tal. Matrix rigidity of random Toeplitz matrices.
Computational Complexity, 27(2):305–350, Jun 2018. doi:10.1007/s00037-016-0144-9.

[16] Mika Göös, Toniann Pitassi, and Thomas Watson. The Landscape of Communication Complexity Classes.
Comput. Complex., 27(2):245–304, 2018. doi:10.1007/s00037-018-0166-6.

[17] W. T. Gowers. Reflections on the recent solution of the cap-set problem I. URL: https://gowers.wordpress.com/2016/05/19/reflections-on-the-recent-solution-of-the-cap-set-problem-i/.

[18] D. Yu. Grigoriev. Using the notions of separability and independence for proving lower bounds on the circuit complexity (in Russian). Notes of the Leningrad branch of the Steklov Mathematical Institute, Nauka, 1976.

[19] Prahladh Harsha. A Course on PCPs, codes and inapproximability. 2007.

[20] Tom Høholdt, Jacobus H. van Lint, and Ruud Pellikaan. Algebraic geometry codes. 1998. URL: https://people.csail.mit.edu/dmoshkov/courses/codes/lec7-AG-codes.pdf.

[21] Abhinav Kumar, Satyanarayana V. Lokam, Vijay M. Patankar, and Jayalal Sarma. Using Elimination Theory to Construct Rigid Matrices.
Comput. Complex., 23(4):531–563, 2014. doi:10.1007/s00037-013-0061-0.

[22] Mrinal Kumar and Ben Lee Volk. Lower bounds for matrix factorization. In Shubhangi Saraf, editor, 35th Computational Complexity Conference, CCC 2020, volume 169 of LIPIcs, pages 5:1–5:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPIcs.CCC.2020.5.

[23] Kasper Green Larsen. On Range Searching in the Group Model and Combinatorial Discrepancy.
SIAM J. Comput. , 43(2):673–686, 2014. doi:10.1137/120865240 .[24] Satyanarayana V. Lokam. On the rigidity of Vandermonde matrices.
Theor. Comput.Sci. , 237(1-2):477–483, 2000. doi:10.1016/S0304-3975(00)00008-6 .[25] Satyanarayana V. Lokam. Quadratic Lower Bounds on Matrix Rigidity. In
Theoryand Applications of Models of Computation, Third International Conference, TAMC 2006,Beijing, China, May 15-20, 2006, Proceedings , volume 3959 of
Lecture Notes in ComputerScience , pages 295–307. Springer, 2006. doi:10.1007/11750321\_28 .[26] Satyanarayana V. Lokam. Complexity Lower Bounds using Linear Algebra.
Foun-dations and Trends in Theoretical Computer Science , 4(1-2):1–155, 2009. doi:10.1561/0400000011 .[27] Meena Mahajan and Jayalal Sarma. On the Complexity of Matrix Rank and Rigidity.
Theory Comput. Syst., 46(1):9–26, 2010. doi:10.1007/s00224-008-9136-8.

[28] Rina Panigrahy, Kunal Talwar, and Udi Wieder. Lower Bounds on Near Neighbor Search via Metric Expansion. In 51st Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, pages 805–814. IEEE Computer Society, 2010. doi:10.1109/FOCS.2010.82.

[29] Alexander A. Razborov. On rigid matrices (in Russian). 1989. URL: http://people.cs.uchicago.edu/~razborov/files/rigid.pdf.

[30] Mohammad Amin Shokrollahi, Daniel A. Spielman, and Volker Stemann. A Remark on Matrix Rigidity.
Inf. Process. Lett., 64(6):283–285, 1997. doi:10.1016/S0020-0190(97)00190-7.

[31] Victor Shoup. New Algorithms for Finding Irreducible Polynomials over Finite Fields. In 29th Annual Symposium on Foundations of Computer Science, White Plains, New York, USA, 24-26 October 1988, pages 283–290. IEEE Computer Society, 1988. doi:10.1109/SFCS.1988.21944.

[32] Terence Tao. Open question: best bounds for cap sets. URL: https://terrytao.wordpress.com/2007/02/23/open-question-best-bounds-for-cap-sets/.

[33] Leslie G. Valiant. Graph-Theoretic Arguments in Low-Level Complexity. In Jozef Gruska, editor, Mathematical Foundations of Computer Science 1977, 6th Symposium, Tatranska Lomnica, Czechoslovakia, September 5-9, 1977, Proceedings, volume 53 of
Lecture Notes in Computer Science , pages 162–176. Springer, 1977. doi:10.1007/3-540-08353-7\_135 .[34] Henning Wunderlich. On a Theorem of Razborov.
Computational Complexity ,21(2):431–477, 2012. doi:10.1007/s00037-011-0021-5 .[35] Andrew Chi-Chih Yao. Should Tables Be Sorted?
J. ACM, 28(3):615–628, 1981. doi:10.1145/322261.322274.

[36] Stanislav Žák. A Turing machine time hierarchy. Theoretical Computer Science, 26(3):327–333, 1983. doi:10.1016/0304-3975(83)90015-4.