Impossibility Results for Grammar-Compressed Linear Algebra
Amir Abboud∗, Arturs Backurs†, Karl Bringmann‡, Marvin Künnemann§
∗ IBM Almaden Research Center, [email protected]. † Toyota Technological Institute at Chicago, [email protected]; supported by an NSF Grant CCF-2006806. ‡ Saarland University and Max Planck Institute for Informatics, Saarland Informatics Campus, [email protected]; this work is part of the project TIPEA that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 850979). § Max Planck Institute for Informatics, Saarland Informatics Campus, [email protected].

Abstract
To handle vast amounts of data, it is natural and popular to compress vectors and matrices. When we compress a vector from size N down to size n ≪ N, it certainly becomes easier to store and transmit efficiently, but does it also become easier to process? In this paper we consider lossless compression schemes, and ask if we can run our computations on the compressed data as efficiently as if the original data was that small. That is, if an operation has time complexity T(input-size), can we perform it on the compressed representation in time T(n) rather than T(N)? We consider the most basic linear algebra operations: inner product, matrix-vector multiplication, and matrix multiplication. In particular, given two compressed vectors, can we compute their inner product in time O(n)? Or perhaps we must decompress first and then multiply, spending Ω(N) time?

The answer depends on the compression scheme. While for simple ones such as Run-Length Encoding (RLE) the inner product can be done in O(n) time, we prove that this is impossible for compressions from a richer class: essentially n² or even larger runtimes are needed in the worst case (under complexity assumptions). This is the class of grammar-compressions, containing most popular methods such as the Lempel-Ziv family. These schemes are more compressing than the simple RLE, but alas, we prove that performing computations on them is much harder.

1 Introduction

The idea of using compression to speed up computations can be found in any domain that deals with large-scale data, and ML is no exception. By exploiting redundancies and various forms of structure in a piece of data, compression algorithms such as zip can reduce its size from N down to n, where n ≪ N. The data becomes cheaper to store, access, transmit, and perhaps also to analyze. Can we run our ML tools on the compressed data, without decompressing it first, and make the computation times proportional to n rather than N? Since most ML algorithms boil down to large amounts of basic algebraic operations such as multiplications of vectors and matrices, with inner product as the atomic operation, the most basic question in this context is:

Main Question.
Given two N-dimensional vectors, each in a compressed form of size n ≪ N, can we compute their inner product in Õ(n) time rather than O(N)? (We use the notation Õ(n) = n · N^{o(1)} for near-linear time, hiding small terms such as log factors.)

The answer, of course, depends on the compression scheme that we use. There seems to be an inherent tension: more complex schemes have higher compression rates but are harder to analyze without decompression.

First, let us clarify that our interest is in exact computations and lossless compressions, even though lossy techniques such as dimensionality reduction [16] are widely used by the ML community. In many cases, e.g. when performing a basic algebraic operation within a larger pipeline, even a small amount of error could add up to make the final result unintelligible. Recent years have seen a growing interest in exploring the potential of lossless compression for speeding up ML [35, 83, 59, 65]. An inspiring result was honorably mentioned at NeurIPS 2019 [65]: an N × d matrix A can be compressed down to a matrix A′ with only O(d²) rows such that the optimal solutions of Least-Mean-Squares (LMS) instances are exactly the same on A and A′. This is an example where, for a specific task (LMS solvers), a specific compression scheme (designed by the authors) leads to a solution in time T(n) rather than T(N), giving a 100x speedup on benchmark data; it makes one wonder if this approach can work in a more general setting.

For rather simple compression methods, the answer to our question is positive. A recent Communications of the ACM article [35] exhibits Compressed Linear Algebra [32, 33, 34], a compression scheme for vectors and matrices that uses simple techniques such as Run-Length Encoding (RLE) and allows for fast computations on the compressed data, with impressive experimental results when integrated into ML systems. The RLE encoding of a vector simply replaces runs of values by tuples indicating the value and the length of the run; e.g. the binary vector 00011111000 gets encoded as (0,3)(1,5)(0,3). Given two vectors encoded in this way with size n_RLE, a simple one-pass algorithm can compute their inner product in O(n_RLE) time. Before that, there were many algorithms for exploiting succinct encodings of sparse vectors [78, 56, 52]; e.g. by simply listing the nonzero locations, the binary vector 0100001000 gets encoded as (2, 7). These encodings allow for a linear-time inner product computation as well.
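To make the one-pass claim concrete, here is a minimal sketch (ours, not from the paper; names are hypothetical) of the inner product over two RLE-encoded vectors of equal dimension, given as lists of (value, run-length) pairs. Each iteration fully consumes at least one run, so the total time is O(n_RLE).

```python
# Sketch: one-pass inner product of two RLE-encoded vectors.
# Each vector is a list of (value, run_length) pairs of equal total length.
def rle_inner_product(u, v):
    total = 0
    i = j = 0        # current run index in u and in v
    ru = rv = 0      # how much of the current run is still unconsumed
    while i < len(u) and j < len(v):
        if ru == 0:
            ru = u[i][1]
        if rv == 0:
            rv = v[j][1]
        overlap = min(ru, rv)                  # coordinates covered by both runs
        total += u[i][0] * v[j][0] * overlap   # contribution of the overlap
        ru -= overlap
        rv -= overlap
        if ru == 0:
            i += 1                             # run of u fully consumed
        if rv == 0:
            j += 1                             # run of v fully consumed
    return total

# 00011111000 and 00000110000, encoded as runs:
u = [(0, 3), (1, 5), (0, 3)]
v = [(0, 5), (1, 2), (0, 4)]
assert rle_inner_product(u, v) == 2            # common 1s at positions 6 and 7
```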
However, these simple methods are often not very compressing. At the other end of the spectrum, we have the heavy-duty and time-tested family of Grammar-Compressions [54] that includes the Lempel-Ziv family (LZ77, LZ78, LZW, etc.) [58, 91, 86], Byte-Pair Encoding [82], dictionary methods, and others [69, 63]. These compressions are used in ubiquitous applications such as zip, Snappy, GIF, PNG, the built-in Unix utility compress, and even in PDF. Their compression rates are often on a whole different level compared to RLE; e.g. the current draft of this paper reduces from 10KB to 4KB with zip, but RLE has no effect. See Table 1 and [35, Table 1] for empirical data showing the quantitative potential of these methods for some standard ML datasets.
What all these more elaborate compression techniques have in common is that they essentially (up to low-order terms [76]) encode a string (or vector) by a Straight-Line Program (SLP): a restricted kind of context-free grammar that can only produce one string. In more detail, an SLP is defined over some alphabet Σ, say {0, 1}, and it is a set of replacement rules (or productions) of a very simple form: a rule is either a symbol in Σ or the concatenation of two previous rules (under some fixed ordering of the rules). The last replacement rule is the sequence defined by the SLP. For example, we can compress the sequence 01010101 with the rules S₁ → 0; S₂ → 1; S₃ → S₁S₂; S₄ → S₃S₃; S₅ → S₄S₄, and S₅ corresponds to the sequence 01010101. See Figure 1. For some strings this can give an exponential compression, e.g. the sequence (01)^{N/2} requires only O(log N) rules; note that its RLE has size N. While finding the smallest SLP for a given string is NP-Hard, it can be approximated either by the above practical methods or provably up to logarithmic factors [76, 20, 79, 48, 50].

Figure 1: (a) An SLP generating the sequence 01010101. (b) The corresponding parse tree. (c) The acyclic graph corresponding to the SLP.
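For illustration, the following sketch (ours, not from the paper) spells out this rule format in Python and expands the example SLP. Note that full expansion takes Θ(N) time, which is exactly what compressed computation tries to avoid; only the length computation runs in O(n).

```python
# An SLP in the rule format of the text: rule i is either a terminal
# symbol or a pair (j, k) with j, k < i, meaning "rule j followed by rule k".
# The last rule defines the string.
def expand(rules):
    # Full decompression: Theta(N) time and space, for testing only.
    out = []
    for r in rules:
        out.append(r if isinstance(r, str) else out[r[0]] + out[r[1]])
    return out[-1]

def produced_length(rules):
    # The decompressed length N is computable in O(n) time, without expanding.
    lens = []
    for r in rules:
        lens.append(1 if isinstance(r, str) else lens[r[0]] + lens[r[1]])
    return lens[-1]

# S1 -> 0; S2 -> 1; S3 -> S1 S2; S4 -> S3 S3; S5 -> S4 S4:
rules = ["0", "1", (0, 1), (2, 2), (3, 3)]
assert expand(rules) == "01010101" and produced_length(rules) == 8
```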
Thus, the holy grail in this context is to perform algebraic operations in T(compression-size) time even when the vectors are compressed with zip or one of the other heavy-duty grammar compressions; that is, without unzipping them first. Ideally, we would implement a "zip-inner-product" function that takes two zip files encoding vectors and computes the inner product in near-linear time (which may not even be enough time to unzip them). A recent paper titled "When LZW meets ML" [59] makes partial progress towards this goal: the inner product can be computed efficiently on their tuple-oriented coding, where each coordinate is grammar-compressed separately, but not the vector as a whole. This makes their method less compressing since, unlike with zip, the size of the encoding is always at least the dimensionality of the vectors.

Main Question (Restated). Given two N-dimensional vectors, each grammar-compressed down to size n ≪ N, can we compute their inner product in Õ(n) time rather than O(N)?

While efficiently analyzing these grammars may seem like a daunting task, a large body of works over the last three decades has equipped us with an ingenious toolbox exactly for this purpose. It turns out that many important problems can indeed be solved surprisingly faster than the decompress-then-solve bound, e.g. in pattern matching [71, 53, 11, 36, 18, 61, 40, 45, 49]. This gives hope for a positive answer to our question, and that many ML computations could be sped up by operating on grammar-compressions. These algorithms typically look at the parse trees that have N leaves but only n distinctly labelled internal nodes (see Figure 1), and traverse them starting from the root down, while attempting to only spend time proportional to the depth of the tree per distinct label. Using tricks that restructure the grammar to make the tree balanced, the depth can be upper bounded by O(log N), making the total time O(n log N). To learn more about this subfield of algorithm design, we refer the reader to the surveys [90, 57, 39, 81, 41, 73, 77, 64, 80].

Alas, our main result is a negative resolution to the main question above. We apply the tools of theoretical computer science, and the recently blossoming field of fine-grained complexity, in order to shed light on the mathematical foundations of Compressed Linear Algebra. We prove new hardness reductions showing cases where the time to compute the inner product must be large (under popular complexity assumptions) even when the vectors have very small grammar compressions. For example, there are N-dimensional vectors with grammar-compressions of size n = O(N^{1/4}) where the inner product must take Ω̃(n²) time to compute. The consequences for other settings such as matrix-vector multiplication are further explained below. This creates a strong separation between grammar-compressions, where we prove an Ω̃(n²) lower bound, and RLE, where an O(n) algorithm exists. This formally justifies the use of simpler methods in ML systems, and guides researchers away from searching for an ultra-efficient "zip-inner-product" function.

Fine-Grained Complexity
Negative results are paramount to the success of any scientific discipline. The most prominent framework for proving such results in computer science is the theory of NP-Hardness, where one proves that a problem cannot be solved in polynomial time unless P = NP, which would imply breakthrough algorithms for famously hard problems such as SAT and Subset Sum. Without this theory, countless hours would have been wasted by algorithm designers trying to come up with provable, worst-case, polynomial-time algorithms for NP-Hard problems. Due to the increase in data sizes of recent years, the ethos of this theory that "efficient = polynomial" has become obsolete, and a more demanding attitude where "efficient = linear" has arisen. By replacing the polynomial reductions of NP-Hardness with more efficient ones (often linear), fine-grained complexity can prove hardness results even for problems that have polynomial-time algorithms. Exemplary results show that linear or subquadratic algorithms for certain problems, which admit quadratic-time algorithms, would refute popular assumptions (conjectures that are similar to but stronger than P ≠ NP) and have breakthrough consequences for famously hard problems. A central example is the 3SUM conjecture: "No algorithm can decide, in subquadratic O(n^{2−ε}) time, if there are three numbers that sum to zero among a given set of n numbers." (The more standard notation is n^{2−o(1)}, which indicates an Ω(n^{1.99...}) lower bound no matter how close to 2 we go; that is, only mildly subquadratic algorithms are possible, e.g. by shaving log factors.) A recent survey on fine-grained complexity [89] cites dozens of papers, mainly in computational geometry [38] but also in other fields [72, 85, 7, 8, 21, 55, 10, 43], that prove 3SUM-Hardness results showing that their algorithms are optimal up to a refutation of this conjecture. In this paper, we prove the first 3SUM-Hardness results in ML, as far as we are aware. The 3SUM assumption and its generalizations that we use in the theorems below are formally defined and discussed in Section 2.

Vector Inner Product
Our first and main result is a reduction from 3SUM to compressed inner product of two vectors, negatively resolving our main question.
Theorem 1.1.
Assuming the 3SUM conjecture, the inner product of two N-dimensional vectors that are grammar-compressed to size n = Θ(N^{1/4}) cannot be computed in O(n^{2−ε}) time, for any ε > 0.

Moreover, we strengthen and generalize this result in several ways. First, we address the dependence between n and N: could it be that for more or less compressed vectors the picture is different? Using a stronger variant of the 3SUM conjecture, the same lower bound of n² holds even when n = N^{1/3}, and therefore our result can be stated as an Ω̃(N^{2/3}) lower bound, which is quite close to the trivial upper bound of O(N). Moreover, by a (highly nontrivial) boosting of our reduction, in Section 3 we establish an Ω̃(N^{1/3}) lower bound with n = N^ε for any ε ≤ 1/3. That is, when the vectors are highly compressed, even n^{100} time is not sufficient; this is in stark contrast to the case of RLE-compressed vectors, where O(n) is always possible.

Matrix-Vector Multiplication
Next, we consider the problem of computing the product M · v of an N × N matrix M, where each row is compressed to size O(n), with an N-dimensional vector v that is compressed to size n. Perhaps computing these N inner products as a batch can be done faster than computing each separately. Alas, by another significant boosting of our reduction, we prove that this is also impossible. While if the encoding is with RLE the product can be computed in O(Nn) time, which is linear in the representation size of the matrix and thus optimal, it turns out that for grammar compressions Ω̃(Nn²) is required. The proof is in Section 4.

Theorem 1.2.
Assuming the 3SUM conjecture, the product of an N × N matrix, where each row is grammar-compressed to size n = Θ(N^{1/5}), with an N-dimensional vector that is grammar-compressed to size n, cannot be computed in O(Nn^{2−ε}) time, for any ε > 0.

Matrix Multiplication
Finally, we consider matrix multiplication of compressed matrices C “ A ¨ B .There are multiple ways to compress an N ˆ N matrix: we might compress each row or each column, sothat the compression size is N ¨ n , or treat the whole matrix as an N -dimensional vector and compressit to size n . Each way may lead to a different time complexity, but no matter which way we choose,the first question to ask, and that will determine the time we can hope for, is: what is the output size?The na¨ıve answer is that the matrix C has size N ˆ N , but since A and B are compressed, shouldn’t weexpect C to also be representable with a small grammar of size n ! N ? Unlike the above questions thatdeal with computation time, this is an information-theoretic question, and in Section 5 we give strong andunconditional negative answers: the matrix C cannot be grammar-compressed to size o p N { log N q even when A and B are strongly compressible. Moreover, some of our results hold even when A and B have verysmall RLE encodings. Therefore, our results should be of interest to the compressed linear algebra projectbeyond grammar-compressions. We remark that some complexity assumption is necessary for proving the kind of results we are interested, since uncon-ditionally proving even very weak lower bounds on the time complexity such as Ω p n ` ε q and even for NP-Hard problems likeSAT (not to mention inner product) is far beyond current techniques [12]. Strictly speaking, such a conditional lower bound of Ω p n q for highly compressible inputs can already be proven by com-bining a known N , namely a bound of Ω p N ǫ q for some non-explicit, possibly tiny ǫ . Our lower bounds always give an explicit, reasonably largevalue for ǫ . echnical Remarks While the tools for proving NP-Hardness results for grammar-compressed data areold [64], they only apply in the unrealistic setting where n “ log N , and we are interested in more fine-grained results. Only recently, a FOCS paper by the authors [2] introduced the techniques for proving suchlower bounds. This previous work focused on combinatorial pattern matching problems and the currentwork extends it to the setting of linear algebra. Our results establish the hardness even of the simplestsetting of binary vectors and matrices over t , u . This setting is particularly studied due to its connectionto graphs, where grammar compressions have also received a lot of attention [66, 67]. Moreover, we showthat even deciding if the inner product is 0 or ě ℓ distancebetween two vectors is also easy. Like almost all results in fine-grained complexity [89], our lower boundsare against both deterministic and randomized algorithms.Finally, we remark that our lower bounds are for the most basic setting of worst-case instances. Extendingthem to average-case results, showing that instances that come from certain natural distributions are alsohard, is an open question. However, notice that even if the original vectors come from a natural distribution,the distribution of the grammar representations will be completely different (and probably far from natural).Therefore, exploiting the structure of non-worst-case instances seems far beyond current reach in this context. 
There have been a few recent works showing fine-grained complexity results for machine learning problems. In particular, [14] showed that the classic algorithm of Viterbi, which computes the most likely path in a Hidden Markov Model that results in a given sequence of observations, is essentially optimal assuming certain complexity-theoretic hypotheses. Another work [13] showed conditional hardness results for multiple empirical risk minimization problems such as kernel support vector machines, kernel ridge regression, and training the final layer of a neural network. Furthermore, there are many works that show hardness for problems that are used in the machine learning literature. This includes conditional lower bounds for kernel low-rank approximation [68], closest pair and its variants [9, 75, 88, 24, 29, 28], maximum inner product [6, 22, 23], earth mover's distance (a.k.a. Wasserstein metric) [74], and dynamic time warping distance [3, 17].

Further contexts in which lossless compressions are used for ML applications, where the primary focus is on aspects other than increasing algorithmic performance, include compressing and accelerating models for deployment on resource-constrained devices (see [44, 26]; e.g., lossless compressions are used to compress weights after a quantization step) and implementing the principle of minimum description length for feature learning (see [70]).

Outside of ML, the idea of improving efficiency by operating on (losslessly) compressed data is well-established in databases [1, 25, 87, 46], and is gaining traction also in bioinformatics [84].

2 Preliminaries
As described in Section 1, a grammar compression of a sequence (or a vector) is an SLP that produces the sequence. In our proofs we will use the following simple observation about SLPs.
Proposition 2.1.
Let G be an SLP with start symbol S that generates a sequence s. For any α ∈ ℕ, we can compute an SLP G′ that generates the α-fold repetition of s, i.e., s^α = s s ··· s (α times), and has size |G| + O(log α), in time O(|G′|).

Proof sketch. Using O(log α) repeated squaring rules S_i → S_{i−1} S_{i−1} and S_0 → S, we obtain non-terminals S_0, ..., S_{⌊log α⌋} generating s^{2^i} for i ∈ {0, ..., ⌊log α⌋}. It is straightforward to combine these non-terminals, according to the binary representation of α, to generate s^α using only O(log α) additional non-terminals.

Using this property, we can often compress sequences much more efficiently than run-length encoding alone could: e.g., repetitive patterns like (01)^n can be encoded using only Θ(log n) bits instead of Θ(n). Indeed, our constructions crucially exploit repeated applications of this property to compress hard instances to very small sizes.
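A sketch of this construction in the rule format from the introduction (ours; the expand helper is repeated here only for testing): append the squaring rules, then combine them along the binary representation of α.

```python
# Proposition 2.1 as code: from an SLP for s, build an SLP for s^alpha
# with O(log alpha) extra rules (repeated squaring + binary combination).
def repeat(rules, alpha):
    rules = list(rules)
    squares = [len(rules) - 1]                     # rule generating s^(2^0)
    for _ in range(alpha.bit_length() - 1):
        rules.append((squares[-1], squares[-1]))   # s^(2^i) -> s^(2^(i+1))
        squares.append(len(rules) - 1)
    acc = None
    for i, bit in enumerate(reversed(bin(alpha)[2:])):
        if bit == "1":                             # combine the needed powers of two
            if acc is None:
                acc = squares[i]
            else:
                rules.append((acc, squares[i]))
                acc = len(rules) - 1
    return rules

def expand(rules):                                 # as in the earlier sketch
    out = []
    for r in rules:
        out.append(r if isinstance(r, str) else out[r[0]] + out[r[1]])
    return out[-1]

# (01)^5 from an SLP for "01" plus O(log 5) extra rules:
assert expand(repeat(["0", "1", (0, 1)], 5)) == "01" * 5
```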
The Complexity Assumptions As discussed in Section 1, the impossibility results in fine-grained complexity are based on certain popular conjectures. One of the central ones concerns the 3SUM problem, which has a few equivalent formulations (up to linear-time transformations [31]); for example, instead of requiring a + b = c, one can fix a target t and ask for a + b + c = t. We will mostly use the following.

Definition 2.2 (The 3SUM Problem). Given three sets A, B, C of m integers in {1, ..., U}, decide if there is a triple a ∈ A, b ∈ B, c ∈ C such that a + b = c.

It is a simple exercise (that is often given in interviews) to come up with an O(m²) time algorithm, and despite decades of effort, only mildly subquadratic bounds of the form O(m²/log^c m), for a small constant c, are known [15, 51, 42, 37, 19].
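For concreteness, the quadratic algorithm is just the following (a sketch, not from the paper):

```python
# The simple O(m^2) (expected time) algorithm for 3SUM: hash C, try all pairs.
def three_sum(A, B, C):
    targets = set(C)
    return any(a + b in targets for a in A for b in B)

assert three_sum({1, 5}, {2, 8}, {4, 7}) is True     # 5 + 2 = 7
assert three_sum({1, 5}, {2, 8}, {4, 6}) is False
```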
The 3SUM Conjecture. No algorithm can solve the 3SUM problem in O(m^{2−ε}) time, for any ε > 0.

The case of an arbitrary universe size U is equivalent to the case where U = O(m³ log m) (see Lemma B.1 in [5] for a proof). Therefore, we will assume this bound on U. When U becomes too small, the problem becomes easy due to an O(m + U log U) algorithm using the Fast Fourier Transform [27]. However, the problem is conjectured to be hard even when U = Θ(m²), and this is referred to as the Strong 3SUM Conjecture [10, 2]. This stronger assumption allows us to strengthen our lower bounds by reducing N. Second, the hardness of the more general kSUM problem is also used as a complexity assumption [4, 2]. In the formulation that we will use, we are given k sets A_1, ..., A_k of m integers in {1, ..., U} where U = Θ(m^{⌈k/2⌉}), and are asked to decide if there are k numbers, one from each set, such that a_1 + ··· + a_{k−1} = a_k. The Strong kSUM conjecture states that this cannot be done in O(m^{⌈k/2⌉−ε}) time, for any ε > 0. We will use this assumption to prove lower bounds even when n is much smaller than N. Third, 3SUM and the other hardness assumptions in fine-grained complexity are conjectured to be true even against randomized algorithms that succeed with high probability. This is important since some of our reductions are randomized.

3 Vector Inner Product

In this section we present the proof of Theorem 1.1 by giving a reduction from 3SUM to the inner product of compressed vectors. A slightly weaker conditional lower bound of Ω̃(N^{2/5}) for vectors compressible to n = N^{1/5} can be extracted from the proof of Theorem 5.11 in [2]. We use similar tricks, but a different and more optimized construction, to obtain a stronger conditional lower bound of Ω̃(N^{1/2}) already on less compressible vectors with n = N^{1/4}. Technically, the novelty is that we manage to encode two sets (A and B) into one vector of length O(mU) rather than O(m²U). This new construction is crucial for the extensions we show – we do not see how to prove any lower bound for matrix-vector multiplication without building on this new construction.

Proof (of Theorem 1.1).
Given an instance of 3SUM, that is, three sets A, B, C of m integers in {1, ..., U}, we show how to construct vectors v_{A+B}, v_C ∈ {0,1}^N with N = Θ(mU log³ m) such that: (1) v_{A+B} · v_C ≥ 1 if and only if there are a ∈ A, b ∈ B, c ∈ C with a + b = c, (2) both vectors have a compression of size O(m log U), and (3) the construction time is O(m log U).

This reduction suffices for proving Theorem 1.1 due to the following calculations. Since (as discussed in Section 2) we can assume that U = Θ(m³ log m), the reduction produces two vectors of dimension N = Θ((m log m)⁴) and compressed size n = Θ(m log m) = Θ(N^{1/4}), such that the inner product reveals the answer to the 3SUM instance. Therefore, an O(n^{2−ε})-time algorithm would solve the 3SUM instance in time O(m^{2−ε} polylog m), refuting the 3SUM conjecture. Note that the O(m log U) time for the reduction itself is negligible. Moreover, if we assume the Strong 3SUM conjecture, we can start with 3SUM instances where U = O(m²) and get vectors of dimension N = O((m log m)³), ruling out inner product algorithms with time O(N^{2/3−ε}).

We now present the construction of the vectors. As a first step, we observe that for any set X ⊆ {1, ..., U}, we can compress its characteristic vector v_X ∈ {0,1}^U, i.e., v_X[i] = 1 if and only if i ∈ X, to size O(|X| log U) as follows. We write X = {x_1, ..., x_{|X|}} with x_1 < x_2 < ··· < x_{|X|} and observe that

v_X = 0^{x_1 − 1} 1 0^{x_2 − x_1 − 1} 1 ··· 0^{x_{|X|} − x_{|X|−1} − 1} 1 0^{U − x_{|X|}},

where each run of 0s has length at most U and can thus be encoded using O(log U) symbols by Proposition 2.1. In total, we obtain a compression of size O(|X| log U), which can be computed in time O(|X| log U) as well.

Let A = {a_1, ..., a_m}. The central idea is to let v_{A+B}, v_C consist of m blocks of size 2U, where the i-th block in v_{A+B} gives the characteristic vector of the set a_i + B = {a_i + b | b ∈ B} ⊆ {1, ..., 2U}, and the i-th block in v_C gives the characteristic vector of C ⊆ {1, ..., 2U}. Formally, we define

v_{A+B} := (0^{a_1} v_B 0^{U − a_1}) (0^{a_2} v_B 0^{U − a_2}) ··· (0^{a_m} v_B 0^{U − a_m}) 0^{N − 2mU},   where the i-th block 0^{a_i} v_B 0^{U − a_i} equals v_{a_i + B},
v_C := (v_C 0^U) (v_C 0^U) ··· (v_C 0^U) 0^{N − 2mU}.

(Here, the last block of 0s only serves to pad to the desired dimension N, for technical reasons.) We observe that v_{A+B} and v_C have an inner product of at least 1 if and only if the characteristic vectors of some block i have a common 1-entry. Thus, consider any block i: we have v_{a_i+B}[k] = (v_C 0^U)[k] = 1 if and only if k − a_i ∈ B and k ∈ C, i.e., a_i ∈ A, k − a_i ∈ B, k ∈ C is a solution of the given 3SUM instance. Thus, v_{A+B} · v_C ≥ 1 if and only if there are a ∈ A, b ∈ B, c ∈ C such that a + b = c, as desired.

It remains to show that an O(m log U)-sized compression of v_{A+B} and v_C can be computed in time O(m log U). Clearly, since v_C 0^U can be compressed to size O(m log U) efficiently, we can also compress its m-fold repetition using O(log m) additional symbols by Proposition 2.1, as well as the trailing 0^{N−2mU}, which takes O(log N) = O(log mU) additional symbols; thus, v_C can be compressed to size O(m log mU) in time O(m log U). Furthermore, recall that we can compress v_B to size O(m log U) efficiently, and let G be an SLP with starting symbol S_B generating v_B. Thus, to compress v_{a_i+B}, we only need to compress the surrounding runs 0^{a_i} and 0^{U−a_i}, and can reuse S_B to generate v_B.
Since we can encode these runs of 0s using O(log U) additional non-terminals, this yields a compression size of O(log U) per block i. Together with an O(log mU) encoding of the trailing block 0^{N−2mU}, this yields again a compression of size O(m log U). Note that reusing a single non-terminal generating v_B was instrumental in obtaining a compression of size O(m log m) rather than O(m² log m). This compression can indeed be computed in time O(m log U), which concludes the claim.

With more work, the above arguments can be generalized to reduce a kSUM instance with k sets of m integers in {1, ..., U} to vectors of dimension N = Θ(m^{k−2} U) and compressed size O(m log U), in time O(m log U). The main idea is to encode a shift of A_{k−1} for each tuple of A_1, ..., A_{k−2} in one vector, and to encode m^{k−2} repetitions of the remaining set A_k in the other vector. Under the Strong kSUM conjecture, this yields a conditional lower bound for inner product of Ω̃(N^{1/3}) where n = O((N/U)^{1/(k−2)} log N). Thus, for any fixed ε > 0, let k be a sufficiently large constant integer such that 1/(k−2) < ε; then the Strong kSUM conjecture implies that N-dimensional vectors with compressed size n = O(N^ε) admit no O(N^{1/3−δ}) inner product algorithm for any constant δ > 0. We formally prove the result in the appendix.
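To make the block structure of the proof concrete, the following sketch (ours; names are hypothetical) builds v_{A+B} and v_C uncompressed — the reduction itself of course emits them directly as SLPs of size O(m log U) — and checks the equivalence on a toy instance.

```python
# The block structure from the proof of Theorem 1.1, built uncompressed.
def char_vec(X, U):
    v = [0] * U
    for x in X:
        v[x - 1] = 1                   # v_X[i] = 1 iff i in X (positions 1..U)
    return v

def build_vectors(A, B, C, U):
    v_B, v_C = char_vec(B, U), char_vec(C, U)
    v_AB, v_CC = [], []
    for a in sorted(A):                # block i: 0^{a_i} v_B 0^{U-a_i} vs v_C 0^U
        v_AB += [0] * a + v_B + [0] * (U - a)
        v_CC += v_C + [0] * U
    return v_AB, v_CC

A, B, U = {3, 9}, {2, 6}, 20
u, v = build_vectors(A, B, {5, 8}, U)       # 3 + 2 = 5: a solution exists
assert sum(x * y for x, y in zip(u, v)) >= 1
u, v = build_vectors(A, B, {4, 7}, U)       # no a + b lands in {4, 7}
assert sum(x * y for x, y in zip(u, v)) == 0
```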
4 Matrix-Vector Product

In this section we sketch how to prove Theorem 1.2 by giving a reduction from 3SUM to matrix-vector multiplication on compressed data. We give a complete formal proof in the appendix.

A helpful tool for this task is the following self-reduction for 3SUM, which follows from combining a known self-reduction [62] with a standard universe-size reduction technique on each produced instance [15, 72, 5].
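Before stating the lemma, here is a sketch of the splitting step behind it (ours; the naive triple loop is for illustration only — [62] shows how to list the O((m/s)²) surviving subproblems directly):

```python
# Split sorted A, B, C into buckets of s consecutive elements and keep only
# the "non-trivial" bucket triples, whose min/max sums straddle the target.
def nontrivial_subproblems(A, B, C, s, t=0):
    def cut(S):
        S = sorted(S)
        return [S[i:i + s] for i in range(0, len(S), s)]
    As, Bs, Cs = cut(A), cut(B), cut(C)
    out = []
    for Ai in As:
        for Bj in Bs:
            for Ck in Cs:
                lo = Ai[0] + Bj[0] + Ck[0]      # smallest achievable sum
                hi = Ai[-1] + Bj[-1] + Ck[-1]   # largest achievable sum
                if lo <= t <= hi:               # otherwise trivially unsolvable
                    out.append((Ai, Bj, Ck))
    return out

subs = nontrivial_subproblems([-5, -1, 2, 6], [0, 1, 2, 3], [-2, -1, 4, 5], s=2)
assert len(subs) == 3                           # only 3 of the 8 triples survive
```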
Lemma 4.1 (Self-Reduction for 3SUM). Let 1 ≤ s = s(m) ≤ m and ε > 0 be arbitrary. If there is an algorithm that, given a target t and L = O((m/s)²) sets A_ℓ, B_ℓ, C_ℓ of s integers in {1, ..., O(s³ log s)}, determines for all 1 ≤ ℓ ≤ L whether there are a ∈ A_ℓ, b ∈ B_ℓ, c ∈ C_ℓ with a + b + c = t in total time O(m^{2−ε}), then the 3SUM conjecture is false.

Given the above self-reduction, the basic idea is as follows. We construct a matrix M whose rows are indexed by the instances 1 ≤ ℓ ≤ L, and the aim is to construct the row M_ℓ and the vector v such that M_ℓ · v ≥ 1 if and only if A_ℓ, B_ℓ, C_ℓ contains a solution, i.e., a ∈ A_ℓ, b ∈ B_ℓ, c ∈ C_ℓ with a + b + c = t. Unfortunately, we cannot apply our Vector Inner Product construction directly: this would encode the set A_ℓ + B_ℓ = {a + b | a ∈ A_ℓ, b ∈ B_ℓ} into the row M_ℓ and the set C_ℓ into the vector v – however, in the matrix product M v, each row M_ℓ is multiplied with a fixed vector v, while the C_ℓ's differ for each ℓ. We overcome this issue by adapting our construction to encode the set A_ℓ + B_ℓ + C_ℓ = {a + b + c | a ∈ A_ℓ, b ∈ B_ℓ, c ∈ C_ℓ} into the row M_ℓ, and only the common target t into v. As all instances use the same target t, this is indeed possible.

Specifically, where Theorem 1.1 produces 2sU-dimensional vectors encoding the sets A + B and C, both having compressed size O(s log U), we show how to produce 3s²U-dimensional vectors M_ℓ and v encoding the sets A_ℓ + B_ℓ + C_ℓ and {t}, both having compressed size O(s log U). This yields an (L × 3s²U)-dimensional matrix M and a 3s²U-dimensional vector v. There is a choice s = Θ(m^{2/7}) that leads to a roughly quadratic matrix M with dimension N = Θ̃(m^{10/7}) (as it has O((m/s)²) = O(m^{10/7}) rows and O(s²U) = Õ(s⁵) = Õ(m^{10/7}) columns), with row compressions of size n = Θ(s log s) = Θ(m^{2/7} log m) ≈ N^{1/5}. Thus, any O(Nn^{2−ε}) algorithm computing M · v would solve 3SUM instances in time Õ(m^{2−2ε/7}), refuting the 3SUM conjecture.

5 Matrix Multiplication

In this section, we consider the problem of computing the matrix product C of two N × N matrices A, B. We consider the following representations of the input matrices:

• Convenient compression: A is compressed row-wise, B is compressed column-wise. This representation allows us to compute any single entry C_{i,j} by running an inner product algorithm on the compressed row A_i and the compressed column B_j. The size of the input is O(N n̄_in), where n̄_in is the maximum compressed size of the rows A_i and columns B_j.

• Strong compression:
For any matrix M, we define its strong compression as a grammar compression of M or M^T viewed as an N²-dimensional vector, whichever is shorter. When both A, B are given as strong compressions, the resulting representation can have a much smaller size (it can be o(N)), but to compute a single entry C_{i,j}, we first might need to obtain a representation of the row A_i and the column B_j.

Similarly, we have several options for representing C:

• Row-wise compression of C. This compression is particularly useful if we aim to compute repeated matrix products A_1(A_2(··· (A_k B))). The output size is O(N n̄_out), where n̄_out is the maximum compressed size over all rows of C.

• Column-wise compression of C. This compression is particularly useful if we aim to compute repeated matrix products (((A B_1) B_2) ···) B_k. The output size is O(N n̄_out), where n̄_out is the maximum compressed size over all columns of C.

• Strong compression of C. This compression has the smallest output size, which can be even o(N).

We show the following result:

Theorem 5.1.
For infinitely many N, there are N × N matrices A, B with
1. convenient compression of size O(N log N) (already under RLE), and
2. strong compression of size O(log² N), such that
3. the matrix product C = AB has size Ω(N²/log³ N) in any grammar-compression (row-wise, column-wise, or strong).
00 diag p y q ˙ . Let C “ A B be the p ℓ ˆ ℓ p ℓ qq product matrix of A and B , with rows and columns indexed by t , u ℓ and t , u ℓ ˆ t , . . . , ℓ u , respectively. Observe that by definition, p C x, p y, q , . . . , C x, p y, ℓ q q “ p x | y q forany x, y P t , u ℓ . In particular, when we view C as a 2 ℓ p ℓ q -length string, it contains all strings in t , u ℓ as substrings, thus by Lemma 5.2, any row-wise compression is of size at least 2 ℓ {p ℓ q .It is straightforward to make these matrices quadratic with dimension N “ Θ p ℓ ℓ q (by introducingall-0 columns) and to ensure that also column-wise compression has size Ω p ℓ { ℓ q “ Ω p N { log N q (usingtransposed constructions to A and B ). Finally, we can compress each row of A and column of B triviallyto length O p ℓ q “ O p log N q (already using RLE). In the appendix, we also argue how to grammar-compressthe concatenation of the columns of A and the rows of B to size O p ℓ q “ O p log N q , which concludes thedesired bound on the strong compression. Broader Impact
Broader Impact

The broader impact of our work is to inform algorithm design for compressed linear algebra, which can lead to faster algorithms for a variety of tasks on large data sets. The ethical consequences depend on the specific application. We do not see any inherently new concerns raised by our results, beyond those that follow generally from faster algorithms and an increased ability to process data.
References

[1] Daniel Abadi, Samuel Madden, and Miguel Ferreira. Integrating compression and execution in column-oriented database systems. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pages 671–682, 2006.
[2] Amir Abboud, Arturs Backurs, Karl Bringmann, and Marvin Künnemann. Fine-grained complexity of analyzing compressed data: Quantifying improvements over decompress-and-solve. In Proc. 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 192–203, 2017.
[3] Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In Proc. 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 59–78. IEEE, 2015.
[4] Amir Abboud and Kevin Lewi. Exact weight subgraphs and the k-sum conjecture. In International Colloquium on Automata, Languages, and Programming, pages 1–12. Springer, 2013.
[5] Amir Abboud, Kevin Lewi, and Ryan Williams. Losing weight by gaining edges. In European Symposium on Algorithms, pages 1–12. Springer, 2014.
[6] Amir Abboud, Aviad Rubinstein, and Ryan Williams. Distributed PCP theorems for hardness of approximation in P. In Proc. 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 25–36. IEEE, 2017.
[7] Amir Abboud and Virginia Vassilevska Williams. Popular conjectures imply strong lower bounds for dynamic problems. In Proc. 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 434–443. IEEE, 2014.
[8] Amir Abboud, Virginia Vassilevska Williams, and Oren Weimann. Consequences of faster alignment of sequences. In International Colloquium on Automata, Languages, and Programming, pages 39–51. Springer, 2014.
[9] Josh Alman and Ryan Williams. Probabilistic polynomials and Hamming nearest neighbors. In Proc. 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 136–150. IEEE, 2015.
[10] A. Amir, T. M. Chan, M. Lewenstein, and N. Lewenstein. On hardness of jumbled indexing. In Proc. ICALP, volume 8572, pages 114–125, 2014.
[11] Amihood Amir, Gary Benson, and Martin Farach. Let sleeping files lie: Pattern matching in Z-compressed files. Journal of Computer and System Sciences, 52(2):299–307, 1996.
[12] Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2009.
[13] Arturs Backurs, Piotr Indyk, and Ludwig Schmidt. On the fine-grained complexity of empirical risk minimization: Kernel methods and neural networks. In Advances in Neural Information Processing Systems, pages 4308–4318, 2017.
[14] Arturs Backurs and Christos Tzamos. Improving Viterbi is hard: Better runtimes imply faster clique algorithms. In Proceedings of the 34th International Conference on Machine Learning, pages 311–321. JMLR.org, 2017.
[15] Ilya Baran, Erik D. Demaine, and Mihai Pătrașcu. Subquadratic algorithms for 3SUM. In Workshop on Algorithms and Data Structures, pages 409–421. Springer, 2005.
[16] Ella Bingham and Heikki Mannila. Random projection in dimensionality reduction: applications to image and text data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 245–250, 2001.
[17] Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In Proc. 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 79–97. IEEE, 2015.
[18] Patrick Cégielski, Irène Guessarian, Yury Lifshits, and Yuri Matiyasevich. Window subsequence problems for compressed texts. In Proc. 1st International Computer Science Symposium in Russia (CSR'06), pages 127–136. Springer, 2006.
[19] Timothy M. Chan. More logarithmic-factor speedups for 3SUM, (median,+)-convolution, and some geometric 3SUM-hard problems. ACM Transactions on Algorithms (TALG), 16(1):1–23, 2019.
[20] Moses Charikar, Eric Lehman, Ding Liu, Rina Panigrahy, Manoj Prabhakaran, Amit Sahai, and Abhi Shelat. The smallest grammar problem. STOC'02 and IEEE Transactions on Information Theory, 51(7):2554–2576, 2005.
[21] Kuan-Yu Chen, Ping-Hui Hsu, and Kun-Mao Chao. Approximate matching for run-length encoded strings is 3SUM-hard. In Annual Symposium on Combinatorial Pattern Matching, pages 168–179. Springer, 2009.
[22] Lijie Chen. On the hardness of approximate and exact (bichromatic) maximum inner product. arXiv preprint arXiv:1802.02325, 2018.
[23] Lijie Chen, Shafi Goldwasser, Kaifeng Lyu, Guy N. Rothblum, and Aviad Rubinstein. Fine-grained complexity meets IP = PSPACE. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1–20. SIAM, 2019.
[24] Lijie Chen and Ryan Williams. An equivalence class for orthogonal vectors. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 21–40. SIAM, 2019.
[25] Zhiyuan Chen, Johannes Gehrke, and Flip Korn. Query optimization in compressed database systems. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 271–282, 2001.
[26] Tejalal Choudhary, Vipul Mishra, Anurag Goswami, and Jagannathan Sarangapani. A comprehensive survey on model compression and acceleration. Artif. Intell. Rev., 53(7):5113–5155, 2020.
[27] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 2009.
[28] Karthik C. S. and Pasin Manurangsi. On closest pair in Euclidean metric: Monochromatic is as hard as bichromatic. In Innovations in Theoretical Computer Science (ITCS). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2018.
[29] Roee David and Bundit Laekhanukit. On the complexity of closest pair via polar-pair of point-sets. SIAM Journal on Discrete Mathematics, 33(1):509–527, 2019.
[30] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.
[31] Bartlomiej Dudek, Pawel Gawrychowski, and Tatiana Starikovskaya. All non-trivial variants of 3-LDT are equivalent. CoRR, abs/2001.01289, 2020.
[32] Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, and Berthold Reinwald. Compressed linear algebra for large-scale machine learning. Proc. VLDB Endow., 9(12):960–971, 2016.
[33] Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, and Berthold Reinwald. Scaling machine learning via compressed linear algebra. SIGMOD Rec., 46(1):42–49, 2017.
[34] Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, and Berthold Reinwald. Compressed linear algebra for large-scale machine learning. VLDB J., 27(5):719–744, 2018.
[35] Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, and Berthold Reinwald. Compressed linear algebra for declarative large-scale machine learning. Commun. ACM, 62(5):83–91, 2019.
[36] Martin Farach and Mikkel Thorup. String matching in Lempel-Ziv compressed strings. In Proc. 27th Annual ACM Symposium on Theory of Computing (STOC'95), pages 703–712. ACM, 1995.
[37] Ari Freund. Improved subquadratic 3SUM. Algorithmica, 77(2):440–458, 2017.
[38] Anka Gajentaan and Mark H. Overmars. On a class of O(n²) problems in computational geometry. Computational Geometry, 5(3):165–185, 1995.
[39] Leszek Gasieniec, Marek Karpinski, Wojciech Plandowski, and Wojciech Rytter. Efficient algorithms for Lempel-Ziv encoding. Proc. 5th Scandinavian Workshop on Algorithm Theory (SWAT'96), pages 392–403, 1996.
[40] Paweł Gawrychowski. Pattern matching in Lempel-Ziv compressed strings: fast, simple, and deterministic. In Proc. 19th Annual European Symposium on Algorithms (ESA'11), pages 421–432. Springer, 2011.
[41] Raffaele Giancarlo, Davide Scaturro, and Filippo Utro. Textual data compression in computational biology: a synopsis. Bioinformatics, 25(13):1575–1586, 2009.
[42] Omer Gold and Micha Sharir. Improved bounds for 3SUM, k-SUM, and linear degeneracy. CoRR, abs/1512.05279, 2015.
[43] Isaac Goldstein, Tsvi Kopelowitz, Moshe Lewenstein, and Ely Porat. How hard is it to find (honest) witnesses? arXiv preprint arXiv:1706.05815, 2017.
[44] Song Han, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In Yoshua Bengio and Yann LeCun, editors, Proc. 4th International Conference on Learning Representations (ICLR 2016), 2016.
[45] Danny Hermelin, Gad M. Landau, Shir Landau, and Oren Weimann. Unified compression-based acceleration of edit-distance computation. Algorithmica, 65(2):339–353, 2013.
[46] Balakrishna R. Iyer and David Wilhite. Data compression support in databases. In VLDB, volume 94, pages 695–704, 1994.
[47] Klaus Jansen, Felix Land, and Kati Land. Bounding the running time of algorithms for scheduling and packing problems. SIAM J. Discret. Math., 30(1):343–366, 2016.
[48] Artur Jeż. Approximation of grammar-based compression via recompression. Theoretical Computer Science, 592:115–134, 2015.
[49] Artur Jeż. Faster fully compressed pattern matching by recompression. ACM Transactions on Algorithms (TALG), 11(3):20, 2015.
[50] Artur Jeż. A really simple approximation of smallest grammar. Theoretical Computer Science, 616:141–150, 2016.
[51] Allan Grønlund Jørgensen and Seth Pettie. Threesomes, degenerates, and love triangles. In Proc. of the 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 621–630, 2014.
[52] Vasileios Karakasis, Theodoros Gkountouvas, Kornilios Kourtis, Georgios Goumas, and Nectarios Koziris. An extended compression format for the optimization of sparse matrix-vector multiplication. IEEE Transactions on Parallel and Distributed Systems, 24(10):1930–1940, 2012.
[53] Marek Karpinski, Wojciech Rytter, and Ayumi Shinohara. Pattern-matching for strings with short descriptions. In Proc. Annual Symposium on Combinatorial Pattern Matching (CPM'95), pages 205–214. Springer, 1995.
[54] John C. Kieffer and En-Hui Yang. Grammar-based codes: A new class of universal lossless source codes. IEEE Trans. Inf. Theory, 46(3):737–754, 2000.
[55] Tsvi Kopelowitz, Seth Pettie, and Ely Porat. Higher lower bounds from the 3SUM conjecture. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1272–1287. SIAM, 2016.
[56] Kornilios Kourtis, Georgios Goumas, and Nectarios Koziris. Optimizing sparse matrix-vector multiplication using index and value compression. In Proceedings of the 5th Conference on Computing Frontiers, pages 87–96, 2008.
[57] N. Jesper Larsson. Structures of String Matching and Data Compression. PhD thesis, Department of Computer Science, Lund University, 1999.
[58] Abraham Lempel and Jacob Ziv. On the complexity of finite sequences. IEEE Transactions on Information Theory, 22(1):75–81, 1976.
[59] Fengan Li, Lingjiao Chen, Arun Kumar, Jeffrey F. Naughton, Jignesh M. Patel, and Xi Wu. When Lempel-Ziv-Welch meets machine learning: A case study of accelerating machine learning using coding. arXiv preprint arXiv:1702.06943, 2017.
[60] Yury Lifshits. Processing compressed texts: A tractability border. In Bin Ma and Kaizhong Zhang, editors, Proc. 18th Annual Symposium on Combinatorial Pattern Matching (CPM 2007), volume 4580 of Lecture Notes in Computer Science, pages 228–240. Springer, 2007.
[61] Yury Lifshits, Shay Mozes, Oren Weimann, and Michal Ziv-Ukelson. Speeding up HMM decoding and training by exploiting sequence repetitions. Algorithmica, 54(3):379–399, 2009.
[62] Andrea Lincoln, Virginia Vassilevska Williams, Joshua R. Wang, and R. Ryan Williams. Deterministic time-space trade-offs for k-SUM. In International Colloquium on Automata, Languages, and Programming, pages 58:1–58:14, 2016.
[63] Qi Liu, Yu Yang, Chun Chen, Jiajun Bu, Yin Zhang, and Xiuzi Ye. RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure. BMC Bioinformatics, 9(1):176, 2008.
[64] Markus Lohrey. Algorithmics on SLP-compressed strings: A survey. Groups Complexity Cryptology, 4(2):241–299, 2012.
[65] Alaa Maalouf, Ibrahim Jubran, and Dan Feldman. Fast and accurate least-mean-squares solvers. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), pages 8305–8316, 2019.
[66] Sebastian Maneth and Fabian Peternek. A survey on methods and systems for graph compression. arXiv preprint arXiv:1504.00616, 2015.
[67] Sebastian Maneth and Fabian Peternek. Grammar-based graph compression. Information Systems, 76:19–45, 2018.
[68] Cameron Musco and David Woodruff. Is input sparsity time possible for kernel low-rank approximation? In Advances in Neural Information Processing Systems, pages 4435–4445, 2017.
[69] Craig G. Nevill-Manning and Ian H. Witten. Compression and explanation using hierarchical grammars. The Computer Journal, 40(2 and 3):103–116, 1997.
[70] Hristo S. Paskov, Robert West, John C. Mitchell, and Trevor J. Hastie. Compressive feature learning. In Advances in Neural Information Processing Systems, pages 2931–2939, 2013.
[71] Wojciech Plandowski. Testing equivalence of morphisms on context-free languages. Proc. 2nd Annual European Symposium on Algorithms (ESA'94), pages 460–470, 1994.
[72] Mihai Pătrașcu. Towards polynomial lower bounds for dynamic problems. In Proc. of the 42nd Annual ACM Symposium on Theory of Computing (STOC), pages 603–610, 2010.
[73] Roberto Radicioni and Alberto Bertoni. Grammatical compression: compressed equivalence and other problems. Discrete Mathematics and Theoretical Computer Science, 12(4):109, 2010.
[74] Dhruv Rohatgi. Conditional hardness of earth mover distance. arXiv preprint arXiv:1909.11068, 2019.
[75] Aviad Rubinstein. Hardness of approximate nearest neighbor search. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1260–1268, 2018.
[76] Wojciech Rytter. Application of Lempel–Ziv factorization to the approximation of grammar-based compression. Theoretical Computer Science, 302(1-3):211–222, 2003.
[77] Wojciech Rytter. Grammar compression, LZ-encodings, and string algorithms with implicit input. In Proc. 31st International Colloquium on Automata, Languages, and Programming (ICALP'04), pages 15–27. Springer, 2004.
[78] Yousef Saad. Iterative Methods for Sparse Linear Systems, volume 82. SIAM, 2003.
[79] Hiroshi Sakamoto. A fully linear-time approximation algorithm for grammar-based compression. Journal of Discrete Algorithms, 3(2):416–430, 2005.
[80] Hiroshi Sakamoto. Grammar compression: Grammatical inference by compression and its application to real data. In ICGI, pages 3–20, 2014.
[81] D. Sculley and Carla E. Brodley. Compression and machine learning: A new perspective on feature space vectors. In Proc. Data Compression Conference (DCC'06), pages 332–341, 2006.
[82] Yusuxke Shibata, Takuya Kida, Shuichi Fukamachi, Masayuki Takeda, Ayumi Shinohara, Takeshi Shinohara, and Setsuo Arikawa. Byte pair encoding: A text compression scheme that accelerates pattern matching. Technical Report DOI-TR-161, Department of Informatics, Kyushu University, 1999.
[83] Yasuo Tabei, Hiroto Saigo, Yoshihiro Yamanishi, and Simon J. Puglisi. Scalable partial least squares regression on grammar-compressed data matrices. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1875–1884, 2016.
[84] Kedar Tatwawadi, Mikel Hernaez, Idoia Ochoa, and Tsachy Weissman. GTRAC: fast retrieval from compressed collections of genomic variants. Bioinformatics, 32(17):i479–i486, 2016.
[85] Virginia Vassilevska and Ryan Williams. Finding, minimizing, and counting weighted subgraphs. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, pages 455–464, 2009.
[86] Terry A. Welch. A technique for high-performance data compression. Computer, 17(6):8–19, 1984.
[87] Till Westmann, Donald Kossmann, Sven Helmer, and Guido Moerkotte. The implementation and performance of compressed databases. ACM SIGMOD Record, 29(3):55–67, 2000.
[88] Ryan Williams. On the difference between closest, furthest, and orthogonal pairs: Nearly-linear vs barely-subquadratic complexity. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1207–1215. SIAM, 2018.
[89] Virginia Vassilevska Williams. On some fine-grained questions in algorithms and complexity. In Proceedings of the ICM, volume 3, pages 3431–3472. World Scientific, 2018.
[90] Ian H. Witten, Alistair Moffat, and Timothy C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, 1999.
[91] Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3):337–343, 1977.
A Further Preliminaries
For a sequence of vectors v_1, ..., v_ℓ, we let v_1 v_2 ··· v_ℓ = v_1 ∘ v_2 ∘ ··· ∘ v_ℓ = ◯_{i=1}^{ℓ} v_i denote their concatenation.

By the following observation, when proving a lower bound for a compression of size Θ(N^γ), the main task is to prove the upper bound n = O(N^γ); the lower bound n = Ω(N^γ) can be ensured mechanically.

Observation A.1. Let 0 ≤ γ ≤ 1. Given two N-dimensional vectors u, v of compressed size O(N^γ), we can compute two O(N)-dimensional vectors u′, v′ of compressed size Θ(N^γ) with the same inner product.

Proof. Append 0^{N^γ}, using Θ(N^γ) additional rules (e.g., a chain of rules each appending a single 0), to the encodings of u and v.

The Strong kSUM Assumption
To generalize the lower bound of Theorem 1.1 so that it works for an arbitrary relationship between compressed and uncompressed sizes, we will use an assumption about a generalized version of 3SUM.
Definition A.2 (The kSUM Problem). Given k sets A_1, ..., A_k of m integers in {1, ..., U}, decide if there are k numbers a_1 ∈ A_1, ..., a_k ∈ A_k such that a_1 + ··· + a_{k−1} = a_k.

For all constant k ≥ 3, kSUM can be solved in O(m^{⌈k/2⌉}) time, and no algorithm faster by m^ε factors, for any ε > 0, is known to date, unless the universe size U is smaller than O(m^{⌈k/2⌉−ε}); in that regime the Fast Fourier Transform gives an O(m + kU log U) time algorithm [27]. It is conjectured that substantially faster algorithms do not exist (e.g. in [4, 2]).

The Strong kSUM Conjecture. For all constant k ≥ 3, no algorithm can solve the kSUM problem with U = Θ(m^{⌈k/2⌉}) in O(m^{⌈k/2⌉−ε}) time, for any ε > 0.

Note that as k grows, the ratio between the time complexity m^{⌈k/2⌉} and the input size m grows.

B Vector Inner Product
In this section, we prove the generalization of the lower bound of Theorem 1.1 to arbitrary relationships between compressed and uncompressed sizes of the vectors.
Theorem B.1.
Let 0 < ε < 1/3. Assuming the Strong kSUM conjecture for all constant k, the inner product of two N-dimensional vectors that are grammar-compressed to size n = Θ(N^ε) cannot be computed in O(N^{1/3−δ}) time, for any δ > 0.

This result follows from the following stronger statement.
Theorem B.2.
Let k ≥ 3. Assuming the Strong kSUM conjecture, the inner product of two N-dimensional vectors that are grammar-compressed to size n = Θ(N^{1/⌊3(k−1)/2⌋}) cannot be computed in O(N^{(1/3)+γ_k−δ}) time for any δ > 0, where γ_k := 2/(3(k−1)) if k is odd, and γ_k := 4/(3(3k−4)) if k is even.

Observe that the above statement implies Theorem B.1: For any 0 < ε < 1/3, we choose k sufficiently large such that 1/⌊3(k−1)/2⌋ < ε. Then, using Observation A.1, any O(N^{1/3−δ})-time algorithm for Vector Inner Product with compressed size n = Θ(N^ε) would give an O(N^{(1/3)+γ_k−δ′})-time algorithm for Vector Inner Product with compressed size O(N^{1/⌊3(k−1)/2⌋}) = O(N^ε), where δ′ = γ_k + δ – this would refute the Strong kSUM conjecture by Theorem B.2. Furthermore, observe that if we set k = 3, we obtain an Ω̃(N^{2/3}) lower bound for compressed size n = Θ(N^{1/3}) under the Strong 3SUM conjecture.

In the remainder of this section, we give the proof of Theorem B.2. The central construction is captured by the following lemma.

Lemma B.3.
Given sets A , . . . , A k of integers in t , . . . , U u , we define v A `¨¨¨` A k ´ : “ (cid:13) p a ,...,a k ´ qP A ˆ¨¨¨ˆ A k ´ in lexicographic order a `¨¨¨` a k ´ v A k ´ p k ´ q U ´ a ´¨¨¨´ a k ´ ,v A k : “ p v A k p k ´ q U q m k ´ , where v A k ´ , v A k P t , u U denote the characteristic vectors of the sets A k ´ , A k . We have the followingproperties:1. The inner product of the m k ´ p k ´ q U -dimensional vectors v A `¨¨¨` A k ´ and v A k is nonzero if andonly if there is a tuple p a , . . . , a k q P A ˆ ¨ ¨ ¨ ˆ A k with a ` ¨ ¨ ¨ ` a k ´ “ a k .2. We can compute compressions of v A `¨¨¨` A k ´ , v A k of size O p km log U q “ O p m log U q in time O p m log U q .Proof. For 1., observe that by construction, v A `¨¨¨` A k and v A k consist of m k ´ blocks, indexed by p a , . . . , a k ´ q P A ˆ ¨ ¨ ¨ˆ A k ´ and consisting of the sequence 0 a `¨¨¨` a k ´ v A k ´ p k ´ q U ´ a ´¨¨¨´ a k ´ and v A k p k ´ q U of length p k ´ q U , respectively. In particular, in block p a , . . . , a k ´ q there is a common 1-entry t if and only if t “ p a ` a ` ¨ ¨ ¨ ` a k ´ q ` a for some a P A k ´ and t “ a for some a P A k . Thus, there exists a common1-entry in v A `¨¨¨` A k ´ and v A k if and only if there are p a , . . . , a k q P A ˆ ¨ ¨ ¨ ˆ A k with a ` ¨ ¨ ¨ ` a k ´ “ a k .For 2., we first recall that as shown in the proof of Theorem 1.1, we can compute a compression of thecharacteristic vectors v A k ´ and v A k of size O p m log U q in time O p m log U q . Thus, using Proposition 2.1, we15an compute a compression of v A k “ p v A k p k ´ q U q m k ´ of size O p m log U q` O p log pp k ´ q U qq` O p log m k ´ q “ O p m log U q in time O p m log U q . To show the claim for v A `¨¨¨` A k ´ , we proceed inductively and constructthe strings v A k ´ : “ v A k ´ and v A i `¨¨¨` A k ´ : “ (cid:13) p a i ,...,a k ´ qP A i ˆ¨¨¨ˆ A k ´ in lexicographic order a i `¨¨¨` a k ´ v A k ´ p k ´ ´ i q U ´ a i ´¨¨¨´ a k ´ , for i “ k ´ , . . . ,
1. The central observation is that we can write A i “ t a p i q , . . . , a p i q m u with a p i q ă a p i q ă ¨ ¨ ¨ ă a p i q m and obtain v A i `¨¨¨` A k ´ “ (cid:13) mj “ a p i q j v A i ` `¨¨¨` A k ´ U ´ a p i q j . Thus, given an SLP G i ` for v A i ` `¨¨¨` A k ´ with starting symbol S i ` , we can give an SLP G i for v A i `¨¨¨` A k ´ of size | G i ` | ` O p m log U q as follows: For each j “ , . . . , m , we encode 0 a p i q j using O p log a p i q j q “ O p log U q additional symbols, re-use S i ` to generate v A i ` `¨¨¨` A k ´ , and encode 0 U ´ a p i q j using O p log p U ´ a p i q j qq “ O p log U q additional symbols. Observe that we can obtain this compression in time O p m log U q .Thus, starting from an SLP for v A k ´ , after k ´ G for v A `¨¨¨` A k ´ of size O p km log U q “ O p m log U q . The running time of this construction is O p km log U q “ O p m log U q , concludingthe proof.Let A , . . . , A k Ď t , . . . , U u be a Strong k SUM instance, i.e., U “ O p m r k { s q . The reduction given inLemma B.3 gives two vectors v, v of dimension m k ´ ¨ p k ´ q U such that their inner product allows us todecide the k SUM instance. Furthermore, the vectors have a compressed size of O p m log U q .We slightly adapt v, v by appending 0’s to increase the dimension slightly to N “ m k ´ ¨p k ´ q U log r p k ´ q{ s U (this does not change their inner product). We verify the following facts: (1) an O p N { ` γ k ´ δ q -time VectorInner Product algorithm for some δ ą k SUM conjecture and (2) n “ O p N { r k ´ s q .Using Observation A.1, this concludes the proof of Theorem B.2.For (1), consider first the case that k is odd. Then U “ O p m p k ` q{ q and N “ O p m k ´ U polylog U q “ O p m p k ´ q{ polylog m q . Observe that N { ` γ k ´ δ “ O p m p k ´ q ¨p ` p k ´ q ´ δ q polylog m q“ O p m k ´ ` ´ p k ´ q δ q “ O p m r k s ´ δ q , for any 0 ă δ ă p k ´ q δ { k , we have U “ O p m k { q and N “ O p m k ´ U polylog U q “ O p m p k ´ q{ polylog m q .Using 1 { ` γ k “ { ` {p k ´ q “ k {p k ´ q , we obtain that N { ` γ k ´ δ “ O p m k ´ ¨p k k ´ ´ δ q polylog m q “ O p m k ´ δ q , for any 0 ă δ ă p k ´ q δ {
Thus, in both cases, an $O(N^{1/3+\gamma_k-\delta})$-time Vector Inner Product algorithm refutes the Strong $k$SUM conjecture by solving the given $k$SUM instance in time $O(m^{\lceil k/2 \rceil - \delta'})$ with $\delta' > 0$. For (2), recall that $N = O(m^{k-2} U \log^{\lfloor 3(k-1)/2 \rfloor} U) = O(m^{\lfloor 3(k-1)/2 \rfloor} \log^{\lfloor 3(k-1)/2 \rfloor} m)$. Thus $n = O(m\log m) = O(N^{1/\lfloor 3(k-1)/2 \rfloor})$, as desired.

C Matrix-Vector Product
In this section we provide the full proof of Theorem 1.2. We first prove a self-reduction for 3SUM as a central tool (using standard techniques), and then proceed to give the final reduction.
C.1 Proof of the Self-Reduction
Let us restate Lemma 4.1.

Lemma C.1 (Self-Reduction for 3SUM). Let $1 \le s = s(m) \le m$ and $\varepsilon > 0$ be arbitrary. If there is an algorithm that, given a target $t$ and $L = O((m/s)^2)$ sets $A_\ell, B_\ell, C_\ell$ of $s$ integers in $\{1, \ldots, O(s^3\log^2 s)\}$, determines for all $1 \le \ell \le L$ whether there are $a \in A_\ell$, $b \in B_\ell$, $c \in C_\ell$ with $a + b + c = t$ in total time $O(m^{2-\varepsilon})$, then the 3SUM conjecture is false.

In the remainder of this section, we give the proof.
Let $A, B, C$ be sets of $m$ integers in $\{1, \ldots, U\}$. We use a couple of results from earlier work that are stated for the following 3SUM formulation: given three sets $A', B', C'$ of $m$ integers in $\{-U', \ldots, U'\}$ with $U' = O(m^3\log m)$, we are asked to determine whether there are $a \in A'$, $b \in B'$, $c \in C'$ such that $a + b + c = 0$. We bring our instance into this form by setting $A' := A$, $B' := B$, and $C' := -C = \{-c \mid c \in C\}$.

We can now use the following known self-reduction for 3SUM.

Lemma C.2 (Reformulated from [62, Theorem 13]). Let $s := s(m)$ with $1 \le s \le m$. Given three sets $A, B, C$ of $m$ integers in $\{-U, \ldots, U\}$, we can compute, in time $O(m^2/s)$, a list of $L = O((m/s)^2)$ instances $A_\ell, B_\ell, C_\ell$ with $1 \le \ell \le L$, such that there are $a \in A$, $b \in B$, $c \in C$ with $a + b + c = 0$ if and only if there is an instance $1 \le \ell \le L$ and a triple $a \in A_\ell$, $b \in B_\ell$, $c \in C_\ell$ with $a + b + c = 0$. Furthermore, each $A_\ell, B_\ell, C_\ell$ is a subset of $s$ integers of $A, B, C$, respectively.

Proof sketch. We give the high-level arguments (for details, see the proof of Theorem 13 in [62]). For a set $S$, let $\min S$ and $\max S$ denote the smallest and largest element in $S$, respectively. We sort $A, B, C$ and split each array into $\lceil m/s \rceil$ consecutive parts $A_1, \ldots, A_{\lceil m/s \rceil}$, $B_1, \ldots, B_{\lceil m/s \rceil}$, $C_1, \ldots, C_{\lceil m/s \rceil}$, each of at most $s$ elements, such that $\max A_i < \min A_{i+1}$, $\max B_i < \min B_{i+1}$, and $\max C_i < \min C_{i+1}$ for all $i$. Instead of searching for a 3SUM triple $a \in A_i$, $b \in B_j$, $c \in C_k$ for each $1 \le i, j, k \le \lceil m/s \rceil$ (i.e., $\Theta((m/s)^3)$ subproblems with $s$ elements each), one observes that most subproblems can be trivially solved: we say that a subproblem $(i, j, k)$ is trivial if $\min A_i + \min B_j + \min C_k > 0$ or $\max A_i + \max B_j + \max C_k < 0$; these subproblems cannot contain a solution. The key insight is that there are at most $O((m/s)^2)$ non-trivial subproblems (which follows since the domination partial ordering on $\{1, \ldots, u\}^3$ has at most $O(u^2)$ incomparable elements); these can be determined in time $O((m/s)^2)$. Thus, it suffices to list all $O((m/s)^2)$ non-trivial subproblems with $s$ integers in each set in time $O(m^2/s)$.
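For illustration, the following Python sketch (ours; all names are hypothetical and the parameters tiny) performs the splitting step of Lemma C.2. For simplicity it scans all $\Theta((m/s)^3)$ subproblems to find the non-trivial ones, whereas [62] lists them directly in time $O((m/s)^2)$.

import random
from itertools import product

def buckets(S, s):
    # split sorted(S) into consecutive groups of at most s elements
    S = sorted(S)
    return [S[i:i + s] for i in range(0, len(S), s)]

def nontrivial_subproblems(A, B, C, s):
    As, Bs, Cs = buckets(A, s), buckets(B, s), buckets(C, s)
    subs = []
    for Ai, Bj, Ck in product(As, Bs, Cs):
        # (i,j,k) is trivial iff min-sum > 0 or max-sum < 0
        if Ai[0] + Bj[0] + Ck[0] <= 0 <= Ai[-1] + Bj[-1] + Ck[-1]:
            subs.append((Ai, Bj, Ck))
    return subs

random.seed(1)
m, s = 32, 4
A = random.sample(range(-300, 300), m)
B = random.sample(range(-300, 300), m)
C = random.sample(range(-300, 300), m)
subs = nontrivial_subproblems(A, B, C, s)
print(len(subs), "non-trivial out of", (m // s) ** 3, "subproblems")
# A 3SUM witness always survives the restriction to non-trivial subproblems.
full = any(a + b + c == 0 for a, b, c in product(A, B, C))
rest = any(a + b + c == 0 for Ai, Bj, Ck in subs for a, b, c in product(Ai, Bj, Ck))
assert full == rest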
The resulting instances $A_\ell, B_\ell, C_\ell$ consist of integers in $\{-U', \ldots, U'\}$ with large universe size $U' = O(m^3\log m)$. We reduce the universe size to $O(s^3\log^2 s)$ using a folklore technique (a slightly stronger result with universe size $O(s^3)$ can be achieved using the techniques of [15]). To prepare notation, for any set $S$, we let $S \bmod p := \{x \bmod p \mid x \in S\}$.

Lemma C.3 (Adaptation of [5, Lemma B.1]). There is some $\alpha$ such that $\tilde U := \alpha s^3 \log(s) \log(U)$ satisfies the following property: Let $A, B, C$ be sets of $s$ integers in $\{-U, \ldots, U\}$ such that no $a \in A$, $b \in B$, $c \in C$ satisfies $a + b + c = 0$. Let $p$ be a prime chosen uniformly at random from $\{2, \ldots, \tilde U\}$. Then the probability that there are $a_p \in A \bmod p$, $b_p \in B \bmod p$, $c_p \in C \bmod p$ with $a_p + b_p + c_p \equiv 0 \pmod{p}$ is at most $1/2$.

Proof. Let $a \in A$, $b \in B$, $c \in C$ be arbitrary.
Since $a + b + c \ne 0$, note that $(a \bmod p) + (b \bmod p) + (c \bmod p) \equiv 0 \pmod{p}$ if and only if $p$ divides $a + b + c$. Since $a + b + c \in \{-3U, \ldots, 3U\}$, $a + b + c$ has at most $\log(3U)$ prime factors. Let $P$ denote the number of prime numbers in $\{2, \ldots, \tilde U\}$; by the prime number theorem we can choose $\alpha$ large enough such that $P \ge 2s^3\log(3U)$. Thus, the probability that $p$ was chosen among these at most $\log(3U)$ prime factors is at most $\log(3U)/P \le 1/(2s^3)$. Thus, by a union bound over all $s^3$ triples $a \in A$, $b \in B$, $c \in C$, the probability that there are $a_p \in A \bmod p$, $b_p \in B \bmod p$, $c_p \in C \bmod p$ with $a_p + b_p + c_p \equiv 0 \pmod{p}$ is at most $1/2$.

Observe that, conversely, if $A, B, C$ contain a triple $a, b, c$ with $a + b + c = 0$, then also $A \bmod p$, $B \bmod p$, $C \bmod p$ contain a triple $a_p, b_p, c_p$ with $a_p + b_p + c_p \equiv 0 \pmod{p}$, for any $p$.
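The following quick experiment (ours; the prime range is illustrative and not tuned to the constant $\alpha$ of Lemma C.3) shows the two directions at work: a genuine solution survives modulo every prime, while a solution-free instance picks up spurious solutions modulo a random prime only with small probability.

import random
from itertools import product
from math import isqrt

def is_prime(p):
    return p >= 2 and all(p % d for d in range(2, isqrt(p) + 1))

def has_root_mod(A, B, C, p):
    return any((a + b + c) % p == 0 for a, b, c in product(A, B, C))

random.seed(2)
s, U = 4, 10**6
primes = [p for p in range(2, 3 * 10**4) if is_prime(p)]  # stand-in for {2,...,U~}

# A solution-free instance (all sums are positive, hence never zero).
A = random.sample(range(1, U), s)
B = random.sample(range(1, U), s)
C = random.sample(range(1, U), s)
assert not any(a + b + c == 0 for a, b, c in product(A, B, C))
false_pos = sum(has_root_mod(A, B, C, random.choice(primes)) for _ in range(200))
print("spurious solutions in 200 random primes:", false_pos)  # a small fraction

# A YES instance stays YES modulo every prime.
C2 = C[:-1] + [-(A[0] + B[0])]
assert all(has_root_mod(A, B, C2, p) for p in random.sample(primes, 20))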
We can finally prove Lemma C.1: Assume that there is an algorithm $\mathcal{A}$ that, given a target $t$ and $L = O((m/s)^2)$ instances $A_\ell, B_\ell, C_\ell$, $1 \le \ell \le L$, of $s$ integers in $\{1, \ldots, \tilde U\}$, determines for all $1 \le \ell \le L$ whether there are $a \in A_\ell$, $b \in B_\ell$, $c \in C_\ell$ with $a + b + c = t$ in total time $O(m^{2-\varepsilon})$ with $\varepsilon > 0$.
Observe that since $\mathcal{A}$ runs in time $O(m^{2-\varepsilon})$, we must have $s = \Omega(m^{\varepsilon})$, since otherwise already the size of the input to $\mathcal{A}$ of $\Theta(m^2/s)$ would be $\omega(m^{2-\varepsilon})$. Thus, we have $\tilde U = O(s^3\log^2 s)$.

For $r = 1, \ldots, \gamma\log m$ many repetitions, we do the following: We choose a random prime $p_r \in [2, \tilde U]$ and obtain $L$ instances in $\{0, \ldots, p_r - 1\} \subseteq \{0, \ldots, \tilde U\}$ by taking the sets modulo $p_r$, i.e., $A^{(r)}_\ell := A_\ell \bmod p_r$, $B^{(r)}_\ell := B_\ell \bmod p_r$, and $C^{(r)}_\ell := C_\ell \bmod p_r$. Observe that we may determine whether there is some $a \in A^{(r)}_\ell$, $b \in B^{(r)}_\ell$, $c \in C^{(r)}_\ell$ with $a + b + c \equiv 0 \pmod{p_r}$ by testing, for each $t \in \{0, p_r, 2p_r\}$, whether there are $a \in A^{(r)}_\ell$, $b \in B^{(r)}_\ell$, $c \in C^{(r)}_\ell$ with $a + b + c = t$. Thus, to do this, and to additionally ensure that each integer is in $\{1, \ldots, \tilde U\}$, we add 1 to each integer in $A^{(r)}_\ell, B^{(r)}_\ell, C^{(r)}_\ell$ and, for each $\lambda \in \{0, 1, 2\}$, call $\mathcal{A}$ on the sets $A^{(r)}_\ell, B^{(r)}_\ell, C^{(r)}_\ell$, $1 \le \ell \le L$, with common target $t_\lambda := 3 + \lambda p_r$.

Observe that after these $3\gamma\log m$ calls to $\mathcal{A}$, we know for each $1 \le \ell \le L$ and $1 \le r \le \gamma\log m$ whether there are $a \in A_\ell$, $b \in B_\ell$, $c \in C_\ell$ with $a + b + c \equiv 0 \pmod{p_r}$. We declare our original 3SUM instance $A, B, C$ to be a YES instance if and only if there is some $\ell$ such that for all $r$ we have found a witness $a \in A_\ell$, $b \in B_\ell$, $c \in C_\ell$ with $a + b + c \equiv 0 \pmod{p_r}$. Note that if $A, B, C$ is a YES instance, we always return YES by Lemma C.2.
Otherwise, if $A, B, C$ is a NO instance, consider a fixed $\ell$. By Lemmas C.2 and C.3, the probability that for all $r$ we find $a \in A_\ell$, $b \in B_\ell$, $c \in C_\ell$ with $a + b + c \equiv 0 \pmod{p_r}$ is bounded by $2^{-\gamma\log m} = m^{-\gamma}$. Thus, by a union bound over all $\ell$, the probability that we incorrectly return YES in this case is at most $Lm^{-\gamma} = O((m/s)^2 m^{-\gamma}) = O(m^{2-\gamma})$. We can make this error probability polynomially small by choosing $\gamma$ large enough. In total, we spend $O(\log m)$ times the running time of $\mathcal{A}$ (note that the running time used for Lemma C.2 is linear in its output size, which is the input size of $\mathcal{A}$ and thus dominated by the running time of $\mathcal{A}$). Thus, we can solve any 3SUM instance in time $O(m^{2-\varepsilon}\log m)$, which would refute the 3SUM conjecture. This concludes the proof of Lemma C.1.

C.2 Main Reduction for Matrix-Vector Multiplication
We now turn to the proof of Theorem 1.2.
Proof.
Let $s$ be a parameter to be chosen later. By Lemma 4.1, it suffices to solve $L = O((m/s)^2)$ instances $A_\ell, B_\ell, C_\ell$ consisting of $s$ integers in $\{1, \ldots, U\}$, $U = O(s^3\log^2 s)$, with common target $1 \le t \le 3U$ in time $O(m^{2-\varepsilon})$ for some $\varepsilon > 0$.

We construct an $(L \times 3s^2U)$ matrix $M$ and a vector $v \in \{0,1\}^{3s^2U}$ as follows. Intuitively, each row $M_\ell$ and the vector $v$ are partitioned into $s^2$ blocks of size $3U$. Each block is indexed by $(i, j)$ with $i, j \in \{1, \ldots, s\}$ in lexicographic order, and the block of $M_\ell$ corresponding to $(i, j)$ encodes the characteristic vector of the set $a_i + b_j + C_\ell = \{a_i + b_j + c \mid c \in C_\ell\} \subseteq \{1, \ldots, 3U\}$, where $a_i$ is the $i$-th integer in $A_\ell$ and $b_j$ is the $j$-th integer in $B_\ell$. Correspondingly, every block $(i, j)$ in $v$ encodes the characteristic vector of the singleton set $\{t\} \subseteq \{1, \ldots, 3U\}$. Thus, there is a position in block $(i, j)$ in which both $M_\ell$ and $v$ have a 1 if and only if there is a $c \in C_\ell$ such that $a_i + b_j + c = t$.

Formally, for any $1 \le \ell \le L$, we write $A_\ell = \{a^\ell_1, \ldots, a^\ell_s\}$, $B_\ell = \{b^\ell_1, \ldots, b^\ell_s\}$ and define

$$M_\ell := \underbrace{0^{a^\ell_1+b^\ell_1}\, v_{C_\ell}\, 0^{2U-a^\ell_1-b^\ell_1}}_{v_{a^\ell_1+b^\ell_1+C_\ell}}\ \cdots\ \underbrace{0^{a^\ell_i+b^\ell_j}\, v_{C_\ell}\, 0^{2U-a^\ell_i-b^\ell_j}}_{v_{a^\ell_i+b^\ell_j+C_\ell}}\ \cdots\ \underbrace{0^{a^\ell_s+b^\ell_s}\, v_{C_\ell}\, 0^{2U-a^\ell_s-b^\ell_s}}_{v_{a^\ell_s+b^\ell_s+C_\ell}},$$

$$v := 0^{t-1}\,1\,0^{3U-t}\ \cdots\ 0^{t-1}\,1\,0^{3U-t}\ \cdots\ 0^{t-1}\,1\,0^{3U-t},$$

where $v_{C_\ell} \in \{0,1\}^U$ denotes the characteristic vector of $C_\ell$. By this structure, it is clear that $M_\ell\, v \ge 1$ if and only if there are $a \in A_\ell$, $b \in B_\ell$, $c \in C_\ell$ with $a + b + c = t$.

We will show that each row $M_\ell$ can be compressed to size $\Theta(s\log s)$ (as opposed to its RLE of length $\Theta(s^3\log s)$). We thus will set $N = \lceil 3s^2U\log^3 s \rceil = \Theta(s^5\log^5 s)$, and append $0^{N - 3s^2U}$ to each row $M_\ell$ and to $v$, so that we obtain an $L \times N$ matrix $M$ and an $N$-dimensional vector $v$ whose product $Mv$ can be used to solve all instances $A_\ell, B_\ell, C_\ell$ in linear time. Observe that each row has a compression of size $\Theta(N^{1/5}) = \Theta(s\log s)$, as desired. Since $L = O((m/s)^2)$ and $N \ge s^5$, we can set $s = \Theta(m^{2/7})$ such that $L \le N$ (we can indeed make $L = N$ by introducing zero rows, if necessary). Thus, an $O(Nn^{1-\varepsilon})$-time algorithm for multiplying $M$ and $v$ would solve all $L$ instances in time

$$O(Nn^{1-\varepsilon}) = O\big((m/s)^2 (s\log s)^{1-\varepsilon}\big) = O\big((m^2/s^{1+\varepsilon}) \operatorname{polylog} s\big) = O\big(m^{2-\varepsilon'} \operatorname{polylog} m\big)$$

for some $\varepsilon' > 0$, which would refute the 3SUM conjecture.

Analogous to the proofs of Theorems 1.1 and B.2, we can compute a compression of each row of size $\Theta(s\log s)$ in time $O(s\log s)$. Indeed, for each $M_\ell$, this already follows from Lemma B.3 when setting $A_1 := A_\ell$, $A_2 := B_\ell$, $A_3 := C_\ell$, which shows how to compress the string $v_{A_1+A_2+A_3} = M_\ell$ to size $O(s\log U) = O(s\log s)$ in time $O(s\log U) = O(s\log s)$. For $v$, we simply apply Proposition 2.1 to the straightforward compression of $0^{t-1}\,1\,0^{3U-t}$ of size $O(\log U)$, which leads to a compression of $v$ of size $O(\log U + \log s^2) = O(\log s)$. Using Observation A.1, we can make all encodings have size $\Theta(s\log s)$, which concludes the proof.
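To make the gadget concrete, the following Python sketch (ours; tiny hypothetical parameters, vectors kept uncompressed) builds one row $M_\ell$ and the target vector $v$ and checks that $M_\ell\, v \ge 1$ holds exactly when the instance has a witness.

from itertools import product

def char_vec(S, U):
    return [1 if x in S else 0 for x in range(1, U + 1)]

def row(A, B, C, U):
    # per (i,j) in lexicographic order: 0^(a_i+b_j) v_C 0^(2U - a_i - b_j)
    r = []
    for a, b in product(sorted(A), sorted(B)):
        r += [0] * (a + b) + char_vec(C, U) + [0] * (2 * U - a - b)
    return r

def target_vec(t, U, blocks):
    # per block: the characteristic vector of {t} in {1,...,3U}
    return ([0] * (t - 1) + [1] + [0] * (3 * U - t)) * blocks

U, t = 8, 10                       # integers in {1,...,U}, target 1 <= t <= 3U
A, B, C = {1, 5}, {2, 7}, {3, 4}   # s = 2, hence s^2 = 4 blocks of size 3U
M_l = row(A, B, C, U)
v = target_vec(t, U, len(A) * len(B))
assert len(M_l) == len(v) == 3 * len(A) * len(B) * U
inner = sum(x * y for x, y in zip(M_l, v))
witness = any(a + b + c == t for a, b, c in product(A, B, C))
assert (inner >= 1) == witness
print("M_l . v =", inner, "| witness exists:", witness)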
D Matrix-Matrix Product

In this section, we give the full proof of Theorem 5.1.
Proof of Theorem 5.1.
Let $\ell \in \mathbb{N}$. We first define the matrices $A_1, B_1$, where $A_1$ is a $(2^\ell \times 2\ell)$ matrix with rows indexed by strings $x \in \{0,1\}^\ell$ in lexicographic order, and $B_1$ is a $(2\ell \times 2^\ell \cdot 2\ell)$ matrix with columns indexed by $(y, k) \in \{0,1\}^\ell \times \{1, \ldots, 2\ell\}$ in lexicographic order. For arbitrary $z \in \{0,1\}^\ell$, let $\operatorname{diag}(z)$ denote the $\ell \times \ell$ diagonal matrix with $z$ on the diagonal. We define the row of $A_1$ indexed by $x$ and the block of columns of $B_1$ indexed by $(y,1), \ldots, (y,2\ell)$ as

$$(A_1)_x := (x \mid 1^\ell), \qquad (B_1)_{(y,1),\ldots,(y,2\ell)} := \begin{pmatrix} \operatorname{diag}(1^\ell) & 0 \\ 0 & \operatorname{diag}(y) \end{pmatrix}.$$
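Before analyzing the product, the following small check (ours, using numpy with $\ell = 3$) verifies the structure that drives the construction: row $x$ of $C_1 = A_1 B_1$, restricted to the column block of $y$, spells out exactly the string $(x \mid y)$.

import numpy as np
from itertools import product

l = 3
xs = ["".join(bits) for bits in product("01", repeat=l)]  # lexicographic order

A1 = np.array([[int(c) for c in x] + [1] * l for x in xs])     # 2^l x 2l
blocks = []
for y in xs:  # for each y: diag(1^l) on top of diag(y), a (2l x 2l) block
    blk = np.zeros((2 * l, 2 * l), dtype=int)
    blk[:l, :l] = np.eye(l, dtype=int)
    blk[l:, l:] = np.diag([int(c) for c in y])
    blocks.append(blk)
B1 = np.hstack(blocks)                                         # 2l x 2^l*2l

C1 = A1 @ B1
for xi, x in enumerate(xs):
    for yi, y in enumerate(xs):
        row_block = C1[xi, yi * 2 * l:(yi + 1) * 2 * l]
        assert "".join(map(str, row_block)) == x + y
print("row x, block y of C1 spells (x | y); all", 2 ** (2 * l), "pairs checked")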
Let $C_1 = A_1 B_1$ be the $(2^\ell \times 2^\ell \cdot 2\ell)$ product matrix of $A_1$ and $B_1$, with rows and columns indexed by $\{0,1\}^\ell$ and $\{0,1\}^\ell \times \{1, \ldots, 2\ell\}$, respectively. Observe that by definition, $((C_1)_{x,(y,1)}, \ldots, (C_1)_{x,(y,2\ell)}) = (x \mid y)$ for any $x, y \in \{0,1\}^\ell$. In particular, when we view $C_1$ as a $2^{2\ell} \cdot 2\ell$-length string, it contains all strings in $\{0,1\}^{2\ell}$ as substrings; thus by Lemma 5.2, any row-wise compression is of size at least $2^{2\ell}/(2\ell)$.

To also ensure column-wise incompressibility, we slightly extend the construction by analogous transposed constructions: We let $N := 2^\ell(2\ell + 1)$ and define the final $(N \times N)$ matrices $A, B$ (padding with zeroes) as follows:

$$A := \begin{pmatrix} A_1 & \\ & B_1^T \end{pmatrix}, \qquad B := \begin{pmatrix} B_1 & \\ & A_1^T \end{pmatrix}.$$

Since

$$C := AB = \begin{pmatrix} A_1 B_1 & \\ & (A_1 B_1)^T \end{pmatrix}$$

contains all length-$2\ell$ strings as substrings of the rows (in the $A_1 B_1$ part) and as substrings of the columns (in the $(A_1 B_1)^T$ part), any strong compression of $C$ is of size at least $2^{2\ell}/(2\ell) = \Omega(N/\log N)$, proving the third part of the claim.

For the first two parts, it remains to show that $A$ and $B$ can be well compressed. For the convenient compression, we observe that any row in $A$ is either of the form $(x \mid 1^\ell \mid 0^{N-2\ell})$, which has an RLE of length at most $|x 1^\ell| + O(\log N) = O(\log N)$, or it is of the form $(0^{2\ell} \mid 0^{i-1}\alpha\, 0^{2\ell-i} \mid 0^{N-4\ell})$ for some $\alpha \in \{0,1\}$, $i \in \{1, \ldots, 2\ell\}$, which also has an RLE of length at most $O(\log N)$. Thus, each of the $N$ rows of $A$ can be compressed to size $O(\log N)$, as desired. By a symmetric statement, also each column of $B$ has an RLE of size $O(\log N)$.

Finally, for the strong compression, we show how to compress $A^T$ when viewed as a string, i.e., we compress the concatenation of the columns of $A$. The main insight is the following: Imagine a binary $\ell$-bit counter. Using grammar compression, we can compress the sequence of values of any fixed bit while the counter counts from $0$ to $2^\ell - 1$ using only $O(\ell)$ non-terminals (a small implementation sketch follows at the end of this proof). Formally, let $G_0, G_1$ be grammar compressions of strings $s_0, s_1$. For any $1 \le i \le \ell$, we can encode $(s_0^{2^{\ell-i}} s_1^{2^{\ell-i}})^{2^{i-1}}$ using only $O(\ell)$ additional non-terminals in the canonical way. Specifically, using $O(\ell - i)$ new symbols, we may encode $s_0^{2^{\ell-i}} s_1^{2^{\ell-i}}$; let $\tilde S$ denote the corresponding non-terminal. We then encode $\tilde S^{2^{i-1}}$ using $O(i)$ additional new symbols. In total, we only need $O((\ell - i) + i) = O(\ell)$ additional symbols, as desired.

We apply the above idea to encode the concatenation of all columns of $A$ as follows; consider column $i$.

• For $1 \le i \le \ell$, by the chosen lexicographic order of the row indices $x \in \{0,1\}^\ell$ of $A_1$, note that the $i$-th column of $A$ is of the form $(0^{2^{\ell-i}} 1^{2^{\ell-i}})^{2^{i-1}} \mid 0^{N - 2^\ell}$. Using the above analysis, we can compress it to size $O(\ell) + O(\log N) = O(\log N)$.

• If $\ell + 1 \le i \le 2\ell$, the $i$-th column is of the form $1^{2^\ell} \mid 0^{N - 2^\ell}$, which we can compress to size $O(\log 2^\ell + \log N) = O(\log N)$.

• If $2\ell + 1 \le i \le 3\ell$, write $i = 2\ell + i'$ and observe that the $i$-th column of $A$ is of the form $0^{2^\ell} \mid (0^{i'-1} 1\, 0^{2\ell - i'})^{2^\ell}$. Using $O(\ell)$ non-terminals to encode $0^{i'-1} 1\, 0^{2\ell - i'}$, it is immediate that we can compress the complete column using $O(\ell)$ additional non-terminals, i.e., yielding a total of $O(\ell) = O(\log N)$.

• If $3\ell + 1 \le i \le 4\ell$, write $i = 3\ell + i'$ and observe that by the chosen lexicographic order of the column indices $(y, k) \in \{0,1\}^\ell \times \{1, \ldots, 2\ell\}$ of $B_1$, the $i$-th column of $A$ is of the form $0^{2^\ell} \mid (s_0^{2^{\ell-i'}} s_1^{2^{\ell-i'}})^{2^{i'-1}}$, where $s_\alpha := 0^{\ell + i' - 1}\, \alpha\, 0^{\ell - i'}$. We can give trivial grammars of size $O(\ell)$ for $s_0, s_1$. Then, by the above analysis, we only need $O(\ell)$ additional non-terminals for the counter-like part.
In total, we only need $O(\ell) = O(\log N)$ non-terminals to encode the $i$-th column.

• Finally, observe that the remaining columns $i = 4\ell + 1, \ldots, N$ consist of $(N - 4\ell)N$ zeroes, which we can encode together using only $O(\log N)$ non-terminals.

In summary, we can encode the first $4\ell$ columns using $O(\log N)$ non-terminals each, and only $O(\log N)$ non-terminals for the remaining columns, so we can fully compress the concatenation of $A$'s columns to size $O(\log^2 N)$.
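The counter argument is easily made executable; the following Python sketch (ours; the rule representation is a hypothetical minimal SLP encoding) builds a grammar with $O(\ell)$ rules for the pattern $(s_0^{2^{\ell-i}} s_1^{2^{\ell-i}})^{2^{i-1}}$ and verifies it against an explicit $\ell$-bit counter.

def power_rule(rules, sym, e):
    # add rules generating sym^(2^e) by repeated doubling; returns final symbol
    for _ in range(e):
        new = len(rules)
        rules[new] = (sym, sym)  # new -> sym sym
        sym = new
    return sym

def column_slp(l, i, s0="0", s1="1"):
    # SLP for (s0^(2^(l-i)) s1^(2^(l-i)))^(2^(i-1)), using O(l) rules;
    # s0, s1 could themselves be replaced by start symbols of given SLPs
    rules = {0: s0, 1: s1}
    a = power_rule(rules, 0, l - i)   # generates s0^(2^(l-i))
    b = power_rule(rules, 1, l - i)   # generates s1^(2^(l-i))
    pair = len(rules)
    rules[pair] = (a, b)
    start = power_rule(rules, pair, i - 1)
    return rules, start

def expand(rules, sym):
    r = rules[sym]
    return r if isinstance(r, str) else expand(rules, r[0]) + expand(rules, r[1])

l = 5
for i in range(1, l + 1):
    rules, start = column_slp(l, i)
    # bit i (1 = most significant) of an l-bit counter running from 0 to 2^l - 1
    explicit = "".join(str((x >> (l - i)) & 1) for x in range(2 ** l))
    assert expand(rules, start) == explicit and len(rules) <= 2 * l + 2
print("all bit columns verified with O(l) grammar rules each")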