Algebraic Branching Programs, Border Complexity, and Tangent Spaces
Markus Bläser, Christian Ikenmeyer, Meena Mahajan, Anurag Pandey, Nitin Saurabh
Markus Bläser ∗, Christian Ikenmeyer †, Meena Mahajan ‡, Anurag Pandey §, Nitin Saurabh ¶
March 11, 2020
Abstract
Nisan showed in 1991 that the width of a smallest noncommutative single-(source,sink) algebraic branching program (ABP) to compute a noncommutative polynomial is given by the ranks of specific matrices. This means that the set of noncommutative polynomials with ABP width complexity at most k is Zariski-closed, an important property in geometric complexity theory. It follows that approximations cannot help to reduce the required ABP width.
It was mentioned by Forbes that this result would probably break when going from single-(source,sink) ABPs to trace ABPs. We prove that this is correct. Moreover, we study the commutative monotone setting and prove a result similar to Nisan, but concerning the analytic closure. We observe the same behavior here: The set of polynomials with ABP width complexity at most k is closed for single-(source,sink) ABPs and not closed for trace ABPs. The proofs reveal an intriguing connection between tangent spaces and the vector space of flows on the ABP. We close with additional observations on VQP and the closure of VNP which allow us to establish a separation between the two classes.
∗ Department of Computer Science, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
† University of Liverpool. Part of this research was done when CI was at the Max Planck Institute for Software Systems, Saarbrücken, Germany. CI was supported by DFG grant IK 116/2-1
‡ The Institute of Mathematical Sciences, HBNI, Chennai, India
§ Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
¶ Technion-IIT, Haifa, Israel. Part of this work was done when the author was at the Max Planck Institute for Informatics, Saarbrücken, Germany
Introduction and Results
Algebraic branching programs (ABPs) are an elegant model of computation that is widely studied in algebraic complexity theory (see e.g. [BOC88, Tod92, MV97, MP08, AW16, AFS+16, KNST18, Kum19, FMST19]) and is a focus of study in geometric complexity theory [Lan15, Ges16, GIP17]. An ABP is a layered directed graph with $d+1$ layers of vertices (edges only go from layer $i$ to layer $i+1$) such that the first and last layer have exactly the same number of vertices, so that each vertex in the first layer has exactly one so-called corresponding vertex in the last layer. One interesting classical case is when the first and last layer have exactly one vertex, which is usually studied in theoretical computer science. We call this the single-(source,sink) model. Among algebraic geometers working on ABPs it is common to not impose restrictions on the number of vertices in the first and last layer [Lan15, Ges16, Lan17]. We call this the trace model. Every edge in an ABP is labeled by a homogeneous linear form. If we denote by $\ell(e)$ the homogeneous linear form of edge $e$, then we say that the ABP computes $\sum_{p} \prod_{e \in p} \ell(e)$, where the sum is over all paths $p$ that start in the first layer and end in the last layer at the vertex corresponding to the start vertex.
The width of an ABP is the number of vertices in its largest layer. We denote by $w(f)$ the minimal width required to compute $f$ in the trace model and we call $w(f)$ the trace ABP width complexity of $f$. We denote by $w_1(f)$ the minimal width required to compute $f$ in the single-(source,sink) model and we call $w_1(f)$ the single-(source,sink) ABP width complexity of $f$.
The complexity class VBP is defined as the set of sequences of polynomials $(f_m)$ for which the sequence $w(f_m)$ is polynomially bounded. Let $\mathrm{per}_m := \sum_{\pi \in S_m} \prod_{i=1}^{m} x_{i,\pi(i)}$ be the permanent polynomial. Valiant's famous VBP ≠ VNP conjecture can concisely be stated as "The sequence of natural numbers $\big(w(\mathrm{per}_m)\big)_m$ is not polynomially bounded." Alternatively, this is phrased with $w_1$ or other polynomially related complexity measures in a completely analogous way.
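The definition of the polynomial computed by a trace ABP can be checked numerically on a toy instance. The following is a minimal sketch, not taken from the paper: the width-2 ABP, the evaluation point, and the helper names `eval_form`, `path_sum` and `trace_of_product` are all illustrative assumptions. It evaluates the sum over closed paths directly and compares it with the trace of the product of the layer matrices.

```python
from itertools import product

def eval_form(form, point):
    """Evaluate a homogeneous linear form (coefficient tuple) at a point."""
    return sum(c * x for c, x in zip(form, point))

def path_sum(layers, point):
    """Sum over all closed paths of the product of the evaluated edge labels.

    layers[k][a][b] is the label on the edge from vertex a of one layer to
    vertex b of the next; the last layer is identified with the first one.
    """
    d = len(layers)
    total = 0
    for verts in product(*[range(len(M)) for M in layers]):
        term = 1
        for k in range(d):
            term *= eval_form(layers[k][verts[k]][verts[(k + 1) % d]], point)
        total += term
    return total

def trace_of_product(layers, point):
    """Trace of the product of the evaluated layer matrices."""
    mats = [[[eval_form(f, point) for f in row] for row in M] for M in layers]
    acc = mats[0]
    for M in mats[1:]:
        acc = [[sum(acc[i][k] * M[k][j] for k in range(len(M)))
                for j in range(len(M[0]))] for i in range(len(acc))]
    return sum(acc[i][i] for i in range(len(acc)))

# Hypothetical width-2 trace ABP with d = 2 in the variables (x1, x2):
# M1 = [[x1, x2], [x2, x1]],  M2 = [[x1 + x2, 0], [0, x1 - x2]].
layers = [
    [[(1, 0), (0, 1)], [(0, 1), (1, 0)]],
    [[(1, 1), (0, 0)], [(0, 0), (1, -1)]],
]
point = (3, 5)
print(path_sum(layers, point), trace_of_product(layers, point))
```

For this toy ABP the computed polynomial is $2x_1^2$, so both functions return the same number at every point.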
In geometric complexity theory (GCT), one searches for lower bounds on algebraic complexity measures over $\mathbb{C}$ such as $w$ and $w_1$ for explicit polynomials such as the permanent. All lower bounds methods in GCT and most lower bounds methods in algebraic complexity theory are continuous, which means that if $f_\varepsilon$ is a curve of polynomials with $\lim_{\varepsilon \to 0} f_\varepsilon = f$ (coefficient-wise limit) and $w(f_\varepsilon) \le w$, then these methods cannot be used to prove $w(f) > w$. This is usually phrased in terms of so-called border complexity (see e.g. [BLMW11, Lan15]): The border trace ABP width complexity $\underline{w}(f)$ is the smallest $w$ such that $f$ can be approximated arbitrarily closely by polynomials $f_\varepsilon$ with $w(f_\varepsilon) \le w$. Analogously, we define the border single-(source,sink) ABP width complexity $\underline{w_1}(f)$ as the smallest $w$ such that $f$ can be approximated arbitrarily closely by polynomials $f_\varepsilon$ with $w_1(f_\varepsilon) \le w$. Analogously to VBP we define $\overline{\mathrm{VBP}}$ as the set of sequences of polynomials whose $(\underline{w}(f_m))$ is polynomially bounded. Clearly VBP ⊆ $\overline{\mathrm{VBP}}$. Mulmuley and Sohoni [MS01, MS08, BLMW11] (see also [Bür01] for a related conjecture) conjectured a strengthening of Valiant's conjecture, namely that VNP ⊄ $\overline{\mathrm{VBP}}$. In principle it could be that $\underline{w}(f) < w(f)$; the gap could even be superpolynomial, which would mean that VBP ⊊ $\overline{\mathrm{VBP}}$. If VBP = $\overline{\mathrm{VBP}}$, then Valiant's conjecture is the same as the Mulmuley-Sohoni conjecture, which would mean that if VBP ≠ VNP, then continuous lower bounds methods exist that show this separation.
Border complexity is an old area of study in algebraic geometry. In theoretical computer science it was introduced by Bini et al. [BCRL79], where [Bin80] proves that in the study of fast matrix multiplication, the gap between complexity and border complexity is not too large.
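To make the notion of an approximating curve $f_\varepsilon$ concrete, here is a small sketch based on the parity labeling that the paper uses later (Section 5): in a trace ABP whose layers are complete bipartite graphs, parity preserving edges carry an extra factor $\varepsilon$, and the computed polynomial is divided by $\varepsilon$. The function name `epsilon_exponents` and the format $(2,2,2)$ are assumptions chosen for illustration. The code lists, for every closed path, the exponent of $\varepsilon$ with which its monomial appears in $f_\varepsilon$.

```python
from itertools import product

def epsilon_exponents(widths):
    """Map each closed path to the exponent of eps of its monomial in f_eps.

    widths = (w_1, ..., w_d); layer d+1 is identified with layer 1, and a
    path is given by its vertex indices (1-based) in layers 1..d.
    """
    d = len(widths)
    result = {}
    for verts in product(*[range(1, w + 1) for w in widths]):
        closed = verts + (verts[0],)
        # An edge is parity preserving if both endpoints have the same parity.
        preserving = sum(
            1 for k in range(d) if (closed[k] - closed[k + 1]) % 2 == 0
        )
        # Each parity preserving edge contributes eps; f_eps divides by eps once.
        result[verts] = preserving - 1
    return result

exps = epsilon_exponents((2, 2, 2))
print(sorted(set(exps.values())))
print(sum(1 for e in exps.values() if e == 0))
```

For odd $d$ every closed path has an odd number of parity preserving edges, so no exponent is negative and the coefficient-wise limit for $\varepsilon \to 0$ exists; it keeps exactly the paths with exponent 0.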
The study of the gap between complexity and border complexity of algebraic complexity measures in general has started recently [GMQ16, BIZ18, Kum18] as an approach to understand if strong algebraic complexity lower bounds can be obtained from continuous methods.
In this paper we study two very different settings of ABPs: the noncommutative and the monotone setting. To capture commutative, noncommutative, and monotone computation, let $R$ be a graded semiring with homogeneous components $R_d$. In our case the settings for $R_d$ are
• $R_d = \mathbb{F}[x_1, \dots, x_m]_d$, the set of homogeneous degree $d$ polynomials in $m$ variables over a field $\mathbb{F}$,
• $R_d = \mathbb{F}\langle x_1, \dots, x_m \rangle_d$, the set of homogeneous degree $d$ polynomials in $m$ noncommuting variables over a field $\mathbb{F}$,
• $R_d = \mathbb{R}_+[x_1, \dots, x_m]_d$, the set of homogeneous degree $d$ polynomials in $m$ variables with nonnegative coefficients.
As is common in the theoretical computer science literature, we call elements of $R_d$ polynomials. Note that $\mathbb{F}\langle x_1, \dots, x_m \rangle_d$ is naturally isomorphic to the $d$-th tensor power of $\mathbb{F}^m$, so tensor would be the better name. We hope that no confusion arises when in the later sections (where we use concepts from multilinear algebra) we use the tensor language. In the homogeneous setting, all ABP edge labels are in $R_1$, and hence the polynomial that is computed is in $R_d$. In the affine setting, all ABP edge labels are in $R_0 + R_1$, and hence the polynomial that is computed is in $\bigoplus_{d' \le d} R_{d'}$.
Noncommutative ABPs
Let $R_d = \mathbb{F}\langle x_1, \dots, x_m \rangle_d$ and consider the homogeneous setting. We write $\mathrm{ncw}$ instead of $w$ and $\mathrm{ncw}_1$ instead of $w_1$ to highlight that we are in the noncommutative setting. Nisan [Nis91] proved:
Theorem 1.1. Let $M_i$ denote the $n^i \times n^{d-i}$ matrix whose entry at position $((k_1, \dots, k_i), (k_{i+1}, \dots, k_d))$ is the coefficient of the monomial $x_{k_1} x_{k_2} \cdots x_{k_d}$ in $f$. Then every single-(source,sink) ABP computing $f$ has at least $\mathrm{rk}(M_i)$ many vertices in layer $i$. Conversely, there exists a single-(source,sink) ABP computing $f$ with exactly $\mathrm{rk}(M_i)$ many vertices in layer $i$.
Nisan used this formulation to prove strong complexity lower bounds for the noncommutative determinant and permanent. Forbes [For16] remarked that Theorem 1.1 implies that for fixed $w$
$$\text{the set } \{ f \mid \mathrm{ncw}_1(f) \le w \} \text{ is Zariski-closed} \tag{1.2}$$
and hence that
$$\underline{\mathrm{ncw}_1}(f) = \mathrm{ncw}_1(f) \text{ for all } f. \tag{1.3}$$
Proving a similar result (even up to polynomial blowups) in the commutative setting would be spectacular: It would imply VBP = $\overline{\mathrm{VBP}}$ and hence that Valiant's conjecture is the same as the Mulmuley-Sohoni conjecture. By a general principle, for all standard algebraic complexity measures, over $\mathbb{C}$ we have that the Zariski-closure of a set of polynomials of complexity at most $w$ equals the Euclidean closure [Mum95, §…]. In the trace model, however, the analogue of (1.3) fails: there exists
$$f \text{ with } \underline{\mathrm{ncw}}(f) < \mathrm{ncw}(f). \tag{1.4}$$
The proof is given in Sections 5–8. It is a surprisingly subtle application of differential geometry (inspired by [HL16]) and interprets tangent spaces to certain varieties as vector spaces of flows on an ABP digraph.
The gap between $\underline{\mathrm{ncw}}(f)$ and $\mathrm{ncw}(f)$ can never be very large though:
$$\underline{\mathrm{ncw}}(f) \le \mathrm{ncw}(f) \le \mathrm{ncw}_1(f) \overset{(1.3)}{=} \underline{\mathrm{ncw}_1}(f) \le \big(\underline{\mathrm{ncw}}(f)\big)^2 \text{ for all } f. \tag{1.5}$$
It is worth noting that for our separating polynomial $f$, the gap is even less: $\underline{\mathrm{ncw}}(f) < \mathrm{ncw}(f) \le 2\,\underline{\mathrm{ncw}}(f)$. This is the first algebraic model of computation where complexity and border complexity differ, but their gap is known to be polynomially bounded!
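Nisan's characterization (Theorem 1.1) is easy to experiment with on small inputs. The sketch below is an illustration under stated assumptions rather than code from the paper: a noncommutative polynomial is stored as a dictionary from words (tuples of 0-based variable indices) to coefficients, and `rank` and `nisan_ranks` are hypothetical helper names. It builds each matrix $M_i$ and computes its rank exactly over the rationals.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Rank of a rational matrix via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            if factor != 0:
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def nisan_ranks(coeffs, n, d):
    """Return [rk(M_0), ..., rk(M_d)] for a degree-d polynomial in n variables.

    coeffs maps a word (k_1, ..., k_d) to the coefficient of x_{k_1}...x_{k_d};
    rows of M_i are indexed by prefixes of length i, columns by suffixes.
    """
    ranks = []
    for i in range(d + 1):
        prefixes = list(product(range(n), repeat=i))
        suffixes = list(product(range(n), repeat=d - i))
        M = [[coeffs.get(p + s, 0) for s in suffixes] for p in prefixes]
        ranks.append(rank(M))
    return ranks

# x0*x1 + x1*x0 needs two vertices in the middle layer ...
anticomm = {(0, 1): 1, (1, 0): 1}
# ... while (x0 + x1)*(x0 + x1) needs only one.
square = {w: 1 for w in product(range(2), repeat=2)}
print(nisan_ranks(anticomm, 2, 2), nisan_ranks(square, 2, 2))
```

Because each rank condition $\mathrm{rk}(M_i) \le w$ is given by the vanishing of minors, this also makes the Zariski-closedness in (1.2) plausible on examples.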
For most models of computation almost nothing is known about the gap between complexity and border complexity. For commutative width 2 affine ABPs the gap is even as large as between computable and non-computable [BIZ18]!
Monotone ABPs
Let $R_d = \mathbb{R}_+[x_1, \dots, x_m]_d$ and consider the affine or homogeneous setting. Since $\mathbb{R}$ is not algebraically closed, we switch to a more algebraic definition of approximation. Let $\mathbb{R}[\varepsilon, \varepsilon^{-1}]_+$ denote the ring of Laurent polynomials that are nonnegative for all sufficiently small $\varepsilon > 0$. Elements of $\mathbb{R}[\varepsilon, \varepsilon^{-1}]_+$ can have a pole at $\varepsilon = 0$ of arbitrarily high order. We define $\underline{\mathrm{mw}}(f)$ to be the smallest $w$ such that there exists a polynomial $f'$ over the ring $\mathbb{R}[\varepsilon, \varepsilon^{-1}]_+$ such that
• there exists a width $w$ ABP over $\mathbb{R}[\varepsilon, \varepsilon^{-1}]_+$ that computes $f'$,
• no coefficient in $f'$ contains an $\varepsilon$ with negative exponent, and setting $\varepsilon$ to 0 in $f'$ yields $f$, i.e., $f'|_{\varepsilon = 0} = f$.
The measure $\underline{\mathrm{mw}_1}(f)$ is defined in the same way with single-(source,sink) ABPs. We prove a result that is comparable to (1.3), but uses a very different proof technique:
$$\underline{\mathrm{mw}_1}(f) = \mathrm{mw}_1(f) \text{ for all } f. \tag{1.6}$$
In terms of complexity classes this can concisely be written as MVBP = $\overline{\mathrm{MVBP}}^{\mathbb{R}}$. Our proof also works if the ABP is not layered and the labels are affine.
Intuitively, in this monotone setting, one would think that approximations do not help, because there cannot be cancellations. But quite surprisingly the same construction as in (1.4) can be used to find $f$ such that
$$\underline{\mathrm{mw}}(f) < \mathrm{mw}(f). \tag{1.7}$$
By the same reasoning as in (1.5), we obtain
$$\underline{\mathrm{mw}}(f) \le \mathrm{mw}(f) \le \big(\underline{\mathrm{mw}}(f)\big)^2 \text{ for all } f. \tag{1.8}$$
This gives a natural monotone model of computation where approximations speed up the computation. Again, the gap is polynomially bounded!
Footnote to (1.5): Given a trace ABP Γ computing $f$ and a pair of corresponding start and end vertices, we can extract a single-(source,sink) ABP by deleting all other start and end vertices. If we do this for each pair of start and end vertices, and if we then identify all start vertices to a single start vertex, and also all end vertices to a single end vertex, then we obtain a single-(source,sink) ABP computing $f$. The width has grown by a factor of $w$, where $w$ is the number of start vertices in Γ.
Separating VQP from $\overline{\mathrm{VNP}}$
Bürgisser in his monograph [Bür00] defined the complexity class VQP as the class of polynomials with quasi-polynomially bounded straight-line programs, and established its relation to the classes VP and VNP (see Section 9 for definitions). He showed that the determinant polynomial is VQP-complete with respect to the so-called qp-projections (see [Bür00], Corollary 2.29). He strengthened Valiant's hypothesis of VNP ⊄ VP to VNP ⊄ VQP and called it Valiant's extended hypothesis (see [Bür00], Section 2.5). He further showed that VP is strictly contained in VQP, as one would intuitively expect (see [Bür00], Section 8.2). Finally, he also showed that VQP is not contained in VNP (see [Bür00], Proposition 8.5 and Corollary 8.9). In this article, we observe that his proof is stronger and actually shows that VQP is not contained in $\overline{\mathrm{VNP}}$ either, where $\overline{\mathrm{VNP}}$ is the closure of the complexity class VNP in the sense mentioned above.
Structure of the paper
In Section 4 we prove (1.6). Sections 5 to 8 are dedicated to proving (1.4) and (1.7) via a new connection between tangent spaces and flow vector spaces. In Section 9, we discuss the separation between VQP and $\overline{\mathrm{VNP}}$.
Grenet [Gre11] showed that $\mathrm{mw}_1(\mathrm{per}_m) \le \binom{m}{\lceil m/2 \rceil}$ by an explicit construction of a monotone single-(source,sink) ABP. Even though the construction is monotone, its size is optimal for $m = 3$ [ABV15] (for $m = 4$ this is already unknown). The noncommutative version of this setting has been studied in [FMST19]. [Yeh19] recently showed that the monotone circuit classes MVP and MVNP are different. We refer the reader to [Yeh19] and [Sri19] and the references therein to get more information about monotone algebraic models of computation and their long history.
[HL16] present a method that can be used to show that a complexity measure and its border variant are not the same. They used it to prove that an explicit polynomial has border determinantal complexity 3, but higher determinantal complexity. We use their ideas as a starting point in Section 5 and the later sections.
For a homogeneous degree $d$ ABP Γ, we denote by $V$ the set of vertices of Γ and by $V_i$ the set of vertices in layer $i$, $1 \le i \le d+1$. We choose an explicit bijection between the sets $V_1$ and $V_{d+1}$, so that each vertex $v$ in $V_1$ has exactly one corresponding vertex $\mathrm{corr}(v)$ in $V_{d+1}$. We denote by $E_i$ the set of edges from $V_i$ to $V_{i+1}$. Let $E$ denote the union of all $E_i$.
There is a classical interpretation in terms of iterated matrix multiplication: Fix some arbitrary ordering of the vertices within each layer, such that the $i$-th vertex in $V_1$ corresponds to the $i$-th vertex in $V_{d+1}$. For $1 \le k \le d$ let $M_k$ be the $|V_k| \times |V_{k+1}|$ matrix whose entry at position $(i, j)$ is the label of the edge from the $i$-th vertex in $V_k$ to the $j$-th vertex in $V_{k+1}$. Then Γ computes the trace
$$\sum_{\substack{1 \le k_1 \le |V_1| \\ 1 \le k_2 \le |V_2| \\ \cdots \\ 1 \le k_d \le |V_d|}} (M_1)_{k_1,k_2} (M_2)_{k_2,k_3} \cdots (M_{d-1})_{k_{d-1},k_d} (M_d)_{k_d,k_1} = \mathrm{tr}\big(M_1 M_2 \cdots M_d\big). \tag{3.1}$$
Hence the name trace model. In the single-(source,sink) model, the trace is taken of a $1 \times 1$ matrix.
For fixed $w \in \mathbb{N}$ we study the set
$$\{ f \in \mathbb{R}_+[x_1, \dots, x_n]_d \mid \mathrm{mw}_1(f) \le w \}. \tag{4.1}$$
We first start with the simple observation that it is not Zariski-closed.
Proposition. $\{ f \in \mathbb{R}_+[x_1, \dots, x_n]_d \mid \mathrm{mw}_1(f) \le w \}$ is not Zariski-closed.
Proof. An analogous statement is true for all natural algebraic complexity measures. Note that a homogeneous degree $d$ single-(source,sink) width $w$ ABP has $2w + w^2(d-2)$ many edges. The label on each edge is a linear form in $n$ variables, so such an ABP is determined by $N := n(2w + w^2(d-2))$ parameters. Let $F : \mathbb{C}^N \to \mathbb{C}[x_1, \dots, x_n]_d$ be the map that maps these parameters to the polynomial computed by the ABP. Every coordinate function of $F$ is given by polynomials in $N$ variables, so $F$ is Zariski-continuous. Therefore
$$\overline{F((\mathbb{R}_+)^N)} = \overline{F\big(\overline{(\mathbb{R}_+)^N}\big)} = \overline{F(\mathbb{C}^N)} \supseteq F(\mathbb{C}^N) \supsetneq F((\mathbb{R}_+)^N),$$
where the overline means the Zariski closure.
Recall that an ABP has $d+1$ layers of vertices. If an ABP has $w_i$ many vertices in layer $i$, $1 \le i \le d$, we say the ABP has format $w = (w_1, w_2, \dots, w_d)$. We further recall that $w_{d+1} = w_1$. The following theorem is our closure result, which proves (1.6) and hence MVBP = $\overline{\mathrm{MVBP}}^{\mathbb{R}}$.
Theorem. Let $f$ be a polynomial over $\mathbb{R}$, and suppose that a format $w$ single-(source,sink) ABP with affine linear labels over $\mathbb{R}[\varepsilon, \varepsilon^{-1}]_+$ computes $f_\varepsilon$ with $\lim_{\varepsilon \to 0} f_\varepsilon = f$. Then there exists a format $w$ monotone single-(source,sink) ABP that computes $f$.
Proof. The proof is constructive and done by a two-step process. In the first step (which is fairly standard and works in many computational models) we move all the $\varepsilon$ with negative exponents to edges adjacent to the source. The second step then uses the monotonicity.
Given Γ with affine linear labels over $\mathbb{R}[\varepsilon, \varepsilon^{-1}]_+$ we repeat the following process until all labels that contain an $\varepsilon$ with a negative exponent are incident to the source vertex.
• Let $e$ be an edge whose label contains $\varepsilon$ with a negative exponent $-i < 0$. Moreover, assume that $e$ is not incident to the source vertex. Let $v$ be the start vertex of $e$. We rescale all edges outgoing of $v$ with $\varepsilon^{i}$ and we rescale all edges incoming to $v$ with $\varepsilon^{-i}$.
If we always choose the edge with the highest layer, then it is easy to see that this process terminates. Since every path from the source to the sink that goes through a vertex $v$ must use exactly one edge that goes into $v$ and exactly one edge that comes out of $v$, throughout the process the value of Γ does not change. We finish this first phase by taking the highest negative power $i$ among all labels of edges that are incident to the source and then rescale all these edges with $\varepsilon^{i}$. The resulting ABP $\Gamma_i$ computes $\varepsilon^{i} f$ and no label contains an $\varepsilon$ with negative exponent. We now start phase 2, which transforms $\Gamma_i$ into $\Gamma_{i-1}$ that computes $\varepsilon^{i-1} f$ without introducing negative exponents of $\varepsilon$. We repeat phase 2 until we reach $\Gamma_0$, in which we safely set $\varepsilon$ to 0. Throughout the whole process we do not change the structure of the ABP and only rescale edge labels with powers of $\varepsilon$, which preserves monotonicity, so the proof is finished.
It remains to show how $\Gamma_i$ can be transformed into $\Gamma_{i-1}$. An edge whose label is divisible by $\varepsilon$ is called an $\varepsilon$-edge. Consider the set ∆ of vertices that are reachable from the source using only non-$\varepsilon$-edges in $\Gamma_i$. The crucial insight is that since $\Gamma_i$ is monotone and computes a polynomial that is divisible by $\varepsilon$, we know that every path in $\Gamma_i$ from the source to the sink uses an $\varepsilon$-edge. Therefore ∆ cannot contain the sink. We call a vertex in ∆ whose outdegree is zero a leaf vertex. We repeat the following procedure until the source is the only leaf vertex.
• Let $v$ be a non-source leaf vertex in ∆. We rescale all edges outgoing of $v$ with $\varepsilon^{-1}$ and we rescale all edges incoming to $v$ with $\varepsilon$.
It is easy to see that this process terminates with the source being the only leaf vertex. Since the source is a leaf vertex, all edges incident to the source are $\varepsilon$-edges.
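Both phases rely on the fact that rescaling the outgoing edges of an inner vertex by a factor $c$ and its incoming edges by $c^{-1}$ leaves the computed value unchanged. The following minimal sketch checks this on a toy single-(source,sink) ABP with rational edge labels standing in for the labels at a fixed value of $\varepsilon$; the ABP and the helper names `path_value` and `rescale_at_vertex` are assumptions made for illustration.

```python
from fractions import Fraction

def path_value(layers):
    """Sum over all source-sink paths of the product of the edge labels.

    layers[k][a][b] is the label on the edge from vertex a of vertex-layer k
    to vertex b of vertex-layer k+1 (0-based); first and last layer have size 1.
    """
    vec = [Fraction(1)]
    for M in layers:
        vec = [sum(vec[a] * M[a][b] for a in range(len(M)))
               for b in range(len(M[0]))]
    assert len(vec) == 1
    return vec[0]

def rescale_at_vertex(layers, k, v, c):
    """Multiply edges leaving vertex v of layer k by c, edges entering it by 1/c."""
    new = [[[Fraction(x) for x in row] for row in M] for M in layers]
    for a in range(len(new[k - 1])):
        new[k - 1][a][v] = new[k - 1][a][v] / c   # incoming edges
    new[k][v] = [x * c for x in new[k][v]]        # outgoing edges
    return new

# Hypothetical width-2 ABP with three edge layers and numeric labels.
abp = [[[2, 3]], [[1, 4], [5, 6]], [[7], [8]]]
rescaled = rescale_at_vertex(abp, 1, 0, Fraction(1, 10))
print(path_value(abp), path_value(rescaled))
```

The same invariance holds with $c = \varepsilon^{i}$ as a formal symbol, which is how the proof uses it.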
We divide all their labels by $\varepsilon$ to obtain $\Gamma_{i-1}$.
f with higher complexity than border complexity
Fix some $d \ge 3$. In this section for every $m \ge 2$ we construct $f$ such that
$$m = \underline{\mathrm{ncw}}(f) < \mathrm{ncw}(f). \tag{5.1}$$
A completely analogous construction can be used to find $f$ with $\underline{w}(f) < w(f)$ and with $\underline{\mathrm{mw}}(f) < \mathrm{mw}(f)$. For the sake of simplicity, we carry out only the proof for (5.1).
Recall that in a format $w$ ABP we have $w_{d+1} = w_1$. In each layer $i$ we enumerate the vertices $V_i = \{ v^i_1, \dots, v^i_{w_i} \}$ and we assume without loss of generality that the correspondence bijection between $V_{d+1}$ and $V_1$ is the identity on the indices $j$ of $v_j$, i.e., the $j$-th vertex in $V_1$ corresponds to the $j$-th vertex in $V_{d+1}$.
Fix an ABP format $w = (w_1, w_2, \dots, w_d)$ such that for all $i$, $w_i \ge 2$. Let $\Gamma_{\mathrm{com}}$ denote the directed acyclic graph underlying an ABP of format $w$. An edge can be described by the triple $(a, b, i)$, where $1 \le i \le d$, $1 \le a \le w_i$ and $1 \le b \le w_{i+1}$. Consider the following labeling of the edges with triple-indexed variables: $\ell_{\mathrm{com}}((a, b, i)) = x^{(i)}_{(a,b)}$. Define $f_{\mathrm{com}}$ to be the polynomial computed by $\Gamma_{\mathrm{com}}$ with edge labels $\ell_{\mathrm{com}}$.
We now construct $f$ as follows. Let $d$ be odd (the case when $d$ is even works analogously). Since in each layer we enumerated the vertices, we can now assign to each vertex its parity: even or odd. We call an edge between two even or two odd vertices parity preserving, while we call the other edges parity changing. Let us consider the following labeling of $\Gamma_{\mathrm{com}}$: We set $\ell((a, b, i)) := x^{(i)}_{(a,b)}$ if $(a, b, i)$ is parity changing (i.e., $a \not\equiv b \pmod 2$) and set the label $\ell((a, b, i)) := \varepsilon x^{(i)}_{(a,b)}$ otherwise, where $\varepsilon \in \mathbb{C}$. Let $f'_\varepsilon$ be the polynomial computed by $\Gamma_{\mathrm{com}}$ with edge labels $\ell$ and set $f_\varepsilon := \frac{1}{\varepsilon} f'_\varepsilon$ for $\varepsilon \ne 0$. We define $f := \lim_{\varepsilon \to 0} f_\varepsilon$ (convergence follows from the construction, because $d$ is odd). By definition, for all $\varepsilon \ne 0$, $f_\varepsilon$ can be computed by a format $w$ ABP. However, we will now prove that this property fails for the limit point $f$.
Theorem 5.2. Fix an ABP format $w = (w_1, w_2, \dots, w_d)$ such that for all $i$, $w_i \ge 2$. Let $f$ be defined as above. Then $f$ cannot be computed by an ABP of format $w$.
Note that for a format where $m = w_1 = \cdots = w_d$, this gives the $f$ which was desired in (5.1). (Note, however, that $f$ can be computed by an ABP of width $2m$ as follows. Construct an ABP $\Gamma'$ that has, for each vertex $v \in \Gamma_{\mathrm{com}}$, vertices $v'$ and $v''$. For each parity changing edge $(a, b) \in \Gamma_{\mathrm{com}}$ with label $\ell$, add edges $(a', b')$ and $(a'', b'')$ with the same label $\ell$. For each parity preserving edge $(a, b) \in \Gamma_{\mathrm{com}}$ with label $\ell$, add edge $(a', b'')$ with label $(1/\varepsilon)\ell$.
For corresponding vertices $u, v$ in $\Gamma_{\mathrm{com}}$, let $v''$ be the corresponding vertex for $u'$ and $v'$ be the corresponding vertex for $u''$ in $\Gamma'$. All paths between corresponding vertices in this ABP use exactly one parity preserving edge of $\Gamma_{\mathrm{com}}$, and so this ABP computes $f$.)
The proof of Theorem 5.2 works as follows. Let $G := \mathrm{GL}_{w_1 w_2} \times \mathrm{GL}_{w_2 w_3} \times \cdots \times \mathrm{GL}_{w_d w_{d+1}}$. Let $\mathrm{End} := \overline{G}$ denote its Euclidean closure, i.e., tuples of matrices in which one or several matrices can be singular.
We consider noncommutative homogeneous polynomials in the variables $x^{(i)}_{(a,b)}$ such that the $i$-th variable in each monomial is $x^{(i)}_{(a,b)}$ for some $a \in [w_i]$ and $b \in [w_{i+1}]$. The vector space of these polynomials is isomorphic to $W := \mathbb{C}^{w_1 w_2} \otimes \mathbb{C}^{w_2 w_3} \otimes \cdots \otimes \mathbb{C}^{w_d w_{d+1}}$ and the monoid $\mathrm{End}$ (and thus also the group $G$) acts on this space in the canonical way. The set
$$\{ f \in W \mid f \text{ can be computed by a format } w \text{ ABP} \}$$
is precisely the orbit $\mathrm{End}\, f_{\mathrm{com}}$. We follow the overall proof strategy in [HL16]. The monoid orbit $\mathrm{End}\, f_{\mathrm{com}}$ decomposes into two disjoint orbits:
$$\mathrm{End}\, f_{\mathrm{com}} = G f_{\mathrm{com}} \cup (\mathrm{End} \setminus G) f_{\mathrm{com}}.$$
Our goal is to show two things independently:
1. $f \notin (\mathrm{End} \setminus G) f_{\mathrm{com}}$, and
2. $f \notin G f_{\mathrm{com}}$,
which finishes the proof of Theorem 5.2.
All elements in $(\mathrm{End} \setminus G) f_{\mathrm{com}}$ are not concise, a term that we define in Section 6, where we also prove that $f$ is concise. Therefore $f \notin (\mathrm{End} \setminus G) f_{\mathrm{com}}$.
All elements in $G f_{\mathrm{com}}$ have full orbit dimension, a term that we define in Section 7, and we prove that $f$ does not have full orbit dimension in Section 8. This finishes the proof of Theorem 5.2.
In this section we show that $f \notin (\mathrm{End} \setminus G) f_{\mathrm{com}}$. To do so we use a notion called conciseness. Informally, it captures whether a polynomial depends on all variables independent of a change of basis, or whether a tensor cannot be embedded into a tensor product of smaller spaces.
Given a tensor $f$ in $\mathbb{C}^{m_1} \otimes \mathbb{C}^{m_2} \otimes \cdots \otimes \mathbb{C}^{m_d}$, we associate the following matrices with $f$.
For $j \in [d]$, define a matrix $M^j_f$ of dimension $m_j \times \big(\prod_{i \in [d] \setminus \{j\}} m_i\big)$ with rows labeled by the standard basis of $\mathbb{C}^{m_j}$, and columns by elements in the Cartesian product {standard basis of $\mathbb{C}^{m_1}$} × ··· × {standard basis of $\mathbb{C}^{m_{j-1}}$} × {standard basis of $\mathbb{C}^{m_{j+1}}$} × ··· × {standard basis of $\mathbb{C}^{m_d}$}. We write the tensor $f$ in the standard basis,
$$f = \sum_{\substack{1 \le i_1 \le m_1 \\ 1 \le i_2 \le m_2 \\ \cdots \\ 1 \le i_d \le m_d}} \alpha_{i_1, \dots, i_d}\, e_{i_1} \otimes \cdots \otimes e_{i_d},$$
and associate to it the matrix $M^j_f$ whose entry at position $((i_j), (i_1, i_2, \dots, i_{j-1}, i_{j+1}, \dots, i_d))$ is $\alpha_{i_1, \dots, i_d}$.
Definition. We say that a tensor $f$ in $\mathbb{C}^{m_1} \otimes \mathbb{C}^{m_2} \otimes \cdots \otimes \mathbb{C}^{m_d}$ is concise if and only if for all $j \in [d]$, $M^j_f$ has full rank.
As a warm-up exercise we now show that $f_{\mathrm{com}}$ is concise.
Proposition 6.2. $f_{\mathrm{com}}$ is concise.
Proof. We know that $f_{\mathrm{com}} \in W$. Let us consider the matrix $M^j_{f_{\mathrm{com}}}$ for some $j \in [d]$. To establish that $M^j_{f_{\mathrm{com}}}$ has full rank, it suffices to show that its rows are linearly independent. In order to show that, we argue that every row is non-zero and every column has at most one non-zero entry. In other words, rows are supported on disjoint sets of columns.
A row of $M^j_{f_{\mathrm{com}}}$ is labeled by an edge in the $j$-th layer of the ABP $\Gamma_{\mathrm{com}}$. Recall that only paths that start at a vertex in $V_1$ and end at the corresponding vertex in $V_{d+1}$ contribute to the computation in $\Gamma_{\mathrm{com}}$. We call such paths valid paths. An entry in $M^j_{f_{\mathrm{com}}}$ is non-zero iff the corresponding row and column labels form a valid path in $\Gamma_{\mathrm{com}}$. Thus, it is easily seen that a row is non-zero iff there is a valid path in $\Gamma_{\mathrm{com}}$ that passes through the edge given by the row label. By the structure of $\Gamma_{\mathrm{com}}$, in particular that every layer is a complete bipartite graph, we observe that through every edge there passes some valid path.
Hence, we obtain that every row is non-zero. The second claim now follows from the observation that after fixing $d-1$ edges, either there is a unique $d$-th edge so that these $d$ edges form a valid path, or for these $d-1$ edges there is no such $d$-th edge.
As mentioned in Section 5, to establish $f \notin (\mathrm{End} \setminus G) f_{\mathrm{com}}$ we will show that $f$ is concise while any element in $(\mathrm{End} \setminus G) f_{\mathrm{com}}$ is not.
Proposition. $f$ is concise.
Proof. Analogous to the proof of Proposition 6.2, we again show that every row of $M^j_f$ is non-zero and every column of it has at most one non-zero entry. That is, rows of $M^j_f$ are supported on disjoint sets of columns.
From the construction of $f$ it is seen that a path in $\Gamma_{\mathrm{com}}$ contributes to the computation of $f$ iff it is a valid path that contains exactly one parity preserving edge. The second claim of every column having at most one non-zero entry now follows for the same reason as in the proof of Proposition 6.2.
Before proving the first claim, we recall two assumptions in the construction of $f$. The first is that the format $w = (w_1, w_2, \dots, w_d)$ is such that $w_i \ge 2$ for all $i \in [d]$ and the second is that $d$ is odd. To argue that a row is non-zero it suffices to show that a valid path containing only one parity preserving edge passes through the edge given by the row label. Let us consider an arbitrary edge $e$ in $\Gamma_{\mathrm{com}}$. We have two cases to consider depending on whether it is parity preserving or changing.
Case 1. Suppose $e$ is parity preserving and it belongs to a layer $j \in [d]$. The number of layers on the left of $e$ is $j-1$ and on the right is $d-j$. Since $d$ is odd, these numbers are either both even or both odd. We now argue for the case when they are even (the odd case is analogous). Choose a vertex $v$ in $V_1$ that has the same parity (different in the odd case) as one of the end points of $e$. (Such a choice exists because $w_1 \ge 2$.) We claim that there is a valid path from $v$ to $\mathrm{corr}(v)$ that passes through $e$ and contains exactly one parity preserving edge. Since $e$ is parity preserving, all other edges in the claimed path must be parity changing. We observe that $e$ can be easily extended in both directions using parity changing edges such that the path ends at $\mathrm{corr}(v)$. The existence of parity changing edges at each layer uses the assumption that $w_i \ge 2$.
Case 2. Otherwise $e = (a, b)$ is parity changing. Again as before there are two cases based on whether both $j-1$ and $d-j$ are even or odd. Consider the case when they are even (the odd case being analogous). We first assume that $j \ne d$. Choose a vertex $v$ in $V_1$ that has the same parity as $a$. We now construct a valid path from $v$ to $\mathrm{corr}(v)$ that passes through $e$ and contains exactly one parity preserving edge. It is easily seen that there exists a path from $v$ to $a$ using only parity changing edges. We choose a parity preserving outgoing edge incident to $b$. We call its endpoint $v'$. Since $v'$ and $v$ have different parities, we can connect $v'$ to $\mathrm{corr}(v)$ in $V_{d+1}$ using only parity changing edges. Thus we obtain the following valid path $v \to \cdots \to a \to b \to v' \to \cdots \to \mathrm{corr}(v)$ passing through exactly one parity preserving edge $(b, v')$. In the case that $j = d$, choose an incoming parity preserving edge incident on $a$ instead of an outgoing edge on $b$.
Remark. We note that if the format $w = (w_1, \dots, w_d)$ defining $f$ is such that for some $j \in [d]$, $w_j = 1$, then $f$ is not concise. This can be seen as follows. Let $w_j = 1$, and let $v$ denote the unique vertex in $V_j$. Let $e$ be the edge $e = (1, 1, j)$. If $j < d$, let $e'$ be the edge $e' = (1, 1, j+1)$, otherwise let $e'$ be the edge $e' = (1, 1, j-1)$. Both $e$ and $e'$ are parity preserving edges. By construction, every valid path using $e'$ must also use $e$. Hence the corresponding row in the matrix $M^{j+1}_f$ if $j < d$, and in $M^{j-1}_f$ otherwise, is zero. Therefore $f$ is not concise.
This is an interesting observation, because this is the point where our proof fails for single-(source,sink) ABPs, and this is expected, because Nisan [Nis91] had shown that the set of polynomials computed by such ABPs of format $w$ is a closed set.
Proposition. Let $f \in (\mathrm{End} \setminus G) f_{\mathrm{com}}$. Then $f$ is not concise.
Proof. This statement is true in very high generality. In our specific case a proof goes as follows. If $f \in (\mathrm{End} \setminus G) f_{\mathrm{com}}$, then $f = g f_{\mathrm{com}}$ for some $g \in \mathrm{End} \setminus G$. Let $g = (g_1, \dots, g_d)$, where $g_i \in \mathbb{C}^{w_i w_{i+1} \times w_i w_{i+1}}$. Since $g \notin G$, at least one of the $g_i$ must be singular. The crucial property is $M^i_{g f_{\mathrm{com}}} = g_i M^i_{f_{\mathrm{com}}}$, which finishes the proof.
In this section we introduce tangent spaces and study their dimensions. We especially study them in the context of $G f_{\mathrm{com}}$ and $G f$.
The orbit dimension of a tensor $f \in \mathbb{C}^{w_1 w_2} \otimes \mathbb{C}^{w_2 w_3} \otimes \cdots \otimes \mathbb{C}^{w_d w_{d+1}}$ is the dimension of the orbit $G f$ as an affine variety. It can be determined as the dimension of the tangent space $T_f$ of the action of $G$ at $f$, which is a vector space defined as follows. Let $\mathfrak{g} := \mathbb{C}^{w_1 w_2 \times w_1 w_2} \times \cdots \times \mathbb{C}^{w_d w_{d+1} \times w_d w_{d+1}}$. For $A \in \mathfrak{g}$ we define the Lie algebra action $A f := \lim_{\varepsilon \to 0} \frac{1}{\varepsilon}\big((\mathrm{id} + \varepsilon A) f - f\big)$, where $\mathrm{id} \in G$ is the identity element. We define the vector space $T_f := \mathfrak{g} f = \{ A f \mid A \in \mathfrak{g} \}$.
Claim 7.1. The dimension $\dim T_h$ is the same for all $h \in G f$.
Proof. Since the action of $G$ is linear, for all $g \in G$ and $A \in \mathfrak{g}$ we have
$$A(gf) = \lim_{\varepsilon \to 0} \tfrac{1}{\varepsilon}\big((\mathrm{id} + \varepsilon A)(gf) - gf\big) = \lim_{\varepsilon \to 0} \tfrac{1}{\varepsilon}\big(g g^{-1}(\mathrm{id} + \varepsilon A) g f - g f\big) = g \lim_{\varepsilon \to 0} \tfrac{1}{\varepsilon}\big((\mathrm{id} + \varepsilon (g^{-1} A g)) f - f\big) = g\big((g^{-1} A g) f\big).$$
Since $A \mapsto g^{-1} A g$ is a bijection on $\mathfrak{g}$, it follows that $T_{gf} = g T_f$. Hence the claim follows.
In the following we will use Claim 7.1 to argue $f \notin G f_{\mathrm{com}}$ by showing that $\dim T_{f_{\mathrm{com}}}$ and $\dim T_f$ are different.
Let $e, e' \in E_i$ and let $A^{(i)}_{e,e'} \in \mathfrak{g}$ denote the matrix tuple where the $i$-th matrix has a 1 at position $(e, e')$ and all other entries (also in all other matrices) are 0. Since these matrices form a basis of $\mathfrak{g}$, it follows that
$$\mathfrak{g} f = \mathrm{linspan}\{ A^{(i)}_{e,e'} f \}.$$
For a tensor $f$ we define the support of $f$ as the set of monomials (i.e., standard basis tensors) for which $f$ has nonzero coefficient. For a linear subspace $V \subseteq \mathbb{C}^{w_1 w_2} \otimes \mathbb{C}^{w_2 w_3} \otimes \cdots \otimes \mathbb{C}^{w_d w_{d+1}}$ we define the support of $V$ as the union of the supports of all $f \in V$.
We write $e \cap e' = \emptyset$ to indicate that two edges $e$ and $e'$ do not share any vertex. We write $|e \cap e'| = 1$ if they share exactly one vertex.
We observe that for $f \in \{ f_{\mathrm{com}}, f \}$ the vector space $T_f$ decomposes into a direct sum of three vector spaces. Define
$$\mathfrak{g}_0 := \mathrm{linspan}\{ A^{(i)}_{e,e'} \mid 1 \le i \le d,\ 1 \le e, e' \le w_i w_{i+1},\ e \cap e' = \emptyset \},$$
$$\mathfrak{g}_1 := \mathrm{linspan}\{ A^{(i)}_{e,e'} \mid 1 \le i \le d,\ 1 \le e, e' \le w_i w_{i+1},\ |e \cap e'| = 1 \},$$
$$\mathfrak{g}_2 := \mathrm{linspan}\{ A^{(i)}_{e,e} \mid 1 \le i \le d,\ 1 \le e \le w_i w_{i+1} \}.$$
Then
$$\mathfrak{g} = \mathfrak{g}_0 \oplus \mathfrak{g}_1 \oplus \mathfrak{g}_2, \qquad T_f = \mathfrak{g}_0 f \oplus \mathfrak{g}_1 f \oplus \mathfrak{g}_2 f.$$
The last direct sum decomposition follows from the fact that $\mathfrak{g}_0 f$, $\mathfrak{g}_1 f$, and $\mathfrak{g}_2 f$ have pairwise disjoint supports.
We show in this section that $\dim \mathfrak{g}_0 f_{\mathrm{com}} = \dim \mathfrak{g}_0 f$, and that $\dim \mathfrak{g}_1 f_{\mathrm{com}} = \dim \mathfrak{g}_1 f$. In Section 8 we show that $\dim \mathfrak{g}_2 f_{\mathrm{com}} > \dim \mathfrak{g}_2 f$, which then implies $f \notin G f_{\mathrm{com}}$ by Claim 7.1. In fact, Theorem 8.1 gives the exact dimension of $\mathfrak{g}_2 f_{\mathrm{com}}$ by proving that $\mathfrak{g}_2 f_{\mathrm{com}}$ is isomorphic to the vector space of flows on the ABP digraph when identifying vertices in $V_1$ with their corresponding vertices in $V_{d+1}$. Theorem 8.2 establishes an additional equation based on the vertex parities that shows that $\mathfrak{g}_2 f$ is strictly lower dimensional than $\mathfrak{g}_2 f_{\mathrm{com}}$.
We start with Lemma 7.2, which shows that $\mathfrak{g}_0 f_{\mathrm{com}}$ and $\mathfrak{g}_0 f$ have full dimension.
Lemma 7.2. Let $f \in \{ f_{\mathrm{com}}, f \}$. The space $\mathfrak{g}_0 f$ has full dimension. That is, its dimension equals $\sum_{i=1}^{d} w_i w_{i+1} (w_i - 1)(w_{i+1} - 1)$.
Proof. Suppose $f = f_{\mathrm{com}}$. The other case being analogous, we only argue this case.
We analyze the monomials that appear in the different $A^{(i)}_{e,e'} f_{\mathrm{com}}$ and argue that a monomial that appears in some $A^{(i)}_{e,e'} f_{\mathrm{com}}$ can only appear in that specific $A^{(i)}_{e,e'} f_{\mathrm{com}}$. Indeed, each monomial corresponds to a valid path in which one edge $e$ in layer $i$ is changed to $e'$. Since $e$ and $e'$ share no vertex, from this edge sequence we can reconstruct $i$, $e$, and $e'$ uniquely: $e'$ is the edge that does not have any vertex in common with the rest of the edge sequence, $i$ is its layer, and $e$ is the unique edge that we can replace $e'$ by in order to form a valid path.
We conclude that the $A^{(i)}_{e,e'} f_{\mathrm{com}}$ have disjoint support and the lemma follows.

To establish that $\dim \mathfrak{g}_1 f_{\mathrm{com}} = \dim \mathfrak{g}_1 f$, where $\mathfrak{g}_1$ is the span of the $A^{(i)}_{e,e'}$ with $|e \cap e'| = 1$, we introduce some notation.

For a connected directed graph $G = (V, E)$ we define a flow to be a labeling of the edge set $E$ by complex numbers such that at every vertex the sum of the labels of the incoming edges equals the sum of the labels of the outgoing edges. It is easily seen that the set of flows forms a vector space $F$. We have
\[
\dim F = |E| - |V| + 1, \tag{7.3}
\]
see e.g. Theorem 20.7 in [BM08].

Recall that $E_i$ denotes the set of edges from $V_i$ to $V_{i+1}$. Let $X := E_1 \times \cdots \times E_d$ denote the direct product of the sets of edge lists. Each directed path of length $d$ from layer 1 to $d+1$ is an element of $X$, but $X$ contains other edge sets as well. Define $\mathcal{E}_i := \mathbb{C}^{E_i}$. Consider the following map $\varphi$ from $X$ to $\mathcal{E}_1 \otimes \cdots \otimes \mathcal{E}_d$,
\[
\varphi(e_1, \ldots, e_d) = x_{e_1} \otimes \cdots \otimes x_{e_d} \in \mathcal{E}_1 \otimes \cdots \otimes \mathcal{E}_d,
\]
where $(x_j)$ is the standard basis of $\mathcal{E}_i$. Note that $\varphi$ is a bijection between $X$ and the standard basis of $\mathcal{E}_1 \otimes \cdots \otimes \mathcal{E}_d$.

An edge set in $X$ is called a valid path if it forms a path that starts and ends at corresponding vertices (see Sec. 1). Let $P \subseteq X$ denote the set of valid paths.

7.4 Proposition. $\dim \mathfrak{g}_1 f_{\mathrm{com}} = \dim \mathfrak{g}_1 f = \sum_{i=1}^{d} (w_{i-1} + w_{i+1} - 1)(w_i - 1) w_i$, where $w_0 := w_d$.

Proof. The proof works almost analogously for $f_{\mathrm{com}}$ and $f$, so we treat only the more natural case $f_{\mathrm{com}}$. We show that $\mathfrak{g}_1 f_{\mathrm{com}}$ is isomorphic to a direct sum of vector spaces of flows on very simple digraphs. Fix $1 \le i \le d$. Fix distinct $1 \le a, b \le w_i$. For distinct edges $e, e' \in E_i$, let $P_{e,e'} \subseteq X$ be the set of edge sets containing $e'$ that are not valid paths, but that become valid paths by removing $e'$ and adding $e$. Let $P^{i}_{a,b} \subseteq X$ be the set of edge sets that are not valid paths, but that become valid paths by switching the end point of the $(i-1)$-th edge to $v^{i}_{b}$ and that also become valid paths by switching the start point of the $i$-th edge to $v^{i}_{a}$ (if $i - 1 = 0$, read $i - 1$ as $d$).
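Pictorially, this means that elements in $P^{i}_{a,b}$ are almost valid paths, but there is a discontinuity at layer $i$, where the path jumps from vertex $v^{i}_{a}$ to vertex $v^{i}_{b}$. We have
\[
A^{(i)}_{e,e'} f_{\mathrm{com}} = \sum_{p \in P_{e,e'}} \varphi(p).
\]
The vectors $\{A^{(i)}_{e,e'} f_{\mathrm{com}} \mid 1 \le i \le d,\ e, e' \in E_i,\ |e \cap e'| = 1\}$ are not linearly independent, because for $a \neq b$ we have
\[
\sum_{\substack{e \text{ and } e' \text{ have the same start point}\\ e' \text{ ends at the } a\text{-th vertex}\\ e \text{ ends at the } b\text{-th vertex}}} A^{(i-1)}_{e,e'} f_{\mathrm{com}}
\;=\; \sum_{p \in P^{i}_{a,b}} \varphi(p)
\;=\; \sum_{\substack{h \text{ and } h' \text{ have the same end point}\\ h \text{ starts at the } a\text{-th vertex}\\ h' \text{ starts at the } b\text{-th vertex}}} A^{(i)}_{h,h'} f_{\mathrm{com}}. \tag{7.5}
\]
Define
\[
T_{a,b,i} := \operatorname{linspan}\Big\{A^{(i-1)}_{e,e'} f_{\mathrm{com}} \;\Big|\; \substack{e \text{ and } e' \text{ have the same start point}\\ e' \text{ ends at the } a\text{-th vertex}\\ e \text{ ends at the } b\text{-th vertex}}\Big\}
+ \operatorname{linspan}\Big\{A^{(i)}_{h,h'} f_{\mathrm{com}} \;\Big|\; \substack{h \text{ and } h' \text{ have the same end point}\\ h \text{ starts at the } a\text{-th vertex}\\ h' \text{ starts at the } b\text{-th vertex}}\Big\}.
\]
The supports of $T_{a,b,i}$ and $T_{\tilde a,\tilde b,\tilde i}$ are disjoint, provided $(a,b,i) \neq (\tilde a,\tilde b,\tilde i)$. Hence
\[
\mathfrak{g}_1 f_{\mathrm{com}} = \bigoplus_{\substack{1 \le i \le d\\ 1 \le a,b \le w_i\\ a \neq b}} T_{a,b,i}.
\]
It remains to prove that the dimension of $T_{a,b,i}$ is $w_{i-1} + w_{i+1} - 1$, because then
\[
\dim \mathfrak{g}_1 f_{\mathrm{com}} = \sum_{\substack{1 \le i \le d\\ 1 \le a,b \le w_i\\ a \neq b}} (w_{i-1} + w_{i+1} - 1) = \sum_{i=1}^{d} (w_{i-1} + w_{i+1} - 1)(w_i - 1) w_i.
\]
$T_{a,b,i}$ is defined as the linear span of $w_{i-1} + w_{i+1}$ many vectors, but (7.5) shows that these are not linearly independent. We prove that (7.5) is the only equality by showing that $T_{a,b,i}$ is isomorphic to a flow vector space. We define a multigraph with two vertices: $\odot$ and $\circledast$. We have $w_{i+1}$ many edges from $\odot$ to $\circledast$, and we have $w_{i-1}$ many edges from $\circledast$ to $\odot$. We denote by $\circledast \xrightarrow{k} \odot$ the $k$-th edge from $\circledast$ to $\odot$, and by $\odot \xrightarrow{l} \circledast$ the $l$-th edge from $\odot$ to $\circledast$. Let $F_{a,b,i}$ denote the vector space of flows on this graph. Its dimension is $w_{i-1} + w_{i+1} - 1$, see (7.3). We define $\varrho : \mathbb{C}^{E_1} \otimes \cdots \otimes \mathbb{C}^{E_d} \to F_{a,b,i}$ on rank 1 tensors via
\[
\varrho(x_{e_1} \otimes \cdots \otimes x_{e_d})(\circledast \xrightarrow{k} \odot) =
\begin{cases}
1 & \text{if } e_{i-1} \text{ starts at } k \text{ in layer } i-1 \text{ and ends at } a \text{ in layer } i,\\
0 & \text{otherwise,}
\end{cases}
\]
\[
\varrho(x_{e_1} \otimes \cdots \otimes x_{e_d})(\odot \xrightarrow{l} \circledast) =
\begin{cases}
1 & \text{if } e_{i} \text{ starts at } b \text{ in layer } i \text{ and ends at } l \text{ in layer } i+1,\\
0 & \text{otherwise.}
\end{cases}
\]
Using (7.5) it is readily verified that $\varrho$ maps $T_{a,b,i}$ to $F_{a,b,i}$. It remains to show that $\varrho : T_{a,b,i} \to F_{a,b,i}$ is surjective. Let $\alpha := |P^{i}_{a,b}|$. We observe that
\[
\varrho(A^{(i-1)}_{e,e'} f_{\mathrm{com}})(\circledast \xrightarrow{k} \odot) =
\begin{cases}
\alpha / w_{i-1} & \text{if } e \text{ and } e' \text{ both start at the } k\text{-th vertex,}\\
0 & \text{if } e \text{ and } e' \text{ both start at the same vertex, but not at the } k\text{-th,}
\end{cases}
\qquad
\varrho(A^{(i-1)}_{e,e'} f_{\mathrm{com}})(\odot \xrightarrow{l} \circledast) = \frac{\alpha}{w_{i-1} w_{i+1}},
\]
\[
\varrho(A^{(i)}_{h,h'} f_{\mathrm{com}})(\odot \xrightarrow{l} \circledast) =
\begin{cases}
\alpha / w_{i+1} & \text{if } h \text{ and } h' \text{ both end at the } l\text{-th vertex,}\\
0 & \text{if } h \text{ and } h' \text{ both end at the same vertex, but not at the } l\text{-th,}
\end{cases}
\qquad
\varrho(A^{(i)}_{h,h'} f_{\mathrm{com}})(\circledast \xrightarrow{k} \odot) = \frac{\alpha}{w_{i-1} w_{i+1}}.
\]
Let $\Xi := \sum A^{(i-1)}_{e,e'} f_{\mathrm{com}}$, where the sum is the left-hand side of (7.5), so that $\Xi = \sum_{p \in P^{i}_{a,b}} \varphi(p)$. Then $\forall k : \varrho(\Xi)(\circledast \xrightarrow{k} \odot) = \alpha / w_{i-1}$ and $\forall l : \varrho(\Xi)(\odot \xrightarrow{l} \circledast) = \alpha / w_{i+1}$. Therefore, for $e, e'$ starting at the $k$-th vertex and $h, h'$ ending at the $l$-th vertex, the flow
\[
w_{i-1}\, \varrho(A^{(i-1)}_{e,e'} f_{\mathrm{com}}) + w_{i+1}\, \varrho(A^{(i)}_{h,h'} f_{\mathrm{com}}) - \varrho(\Xi)
\]
is nonzero only on exactly two edges, $\circledast \xrightarrow{k} \odot$ and $\odot \xrightarrow{l} \circledast$, where it takes the value $\alpha$: on every other edge $\circledast \xrightarrow{k'} \odot$ the three contributions are $0 + \alpha/w_{i-1} - \alpha/w_{i-1} = 0$, and on every other edge $\odot \xrightarrow{l'} \circledast$ they are $\alpha/w_{i+1} + 0 - \alpha/w_{i+1} = 0$. Cycles form a generating set of the vector space $F_{a,b,i}$, which finishes the proof of the surjectivity of $\varrho$.

We now proceed to the analysis of $\mathfrak{g}_2 f_{\mathrm{com}}$ and $\mathfrak{g}_2 f$, the parts of the tangent spaces spanned by the diagonal elements $A^{(i)}_{e,e}$. The connection to flow vector spaces will be even more prevalent than in Proposition 7.4. The main result of this section is $\dim \mathfrak{g}_2 f_{\mathrm{com}} > \dim \mathfrak{g}_2 f$ (Theorems 8.1 and 8.2), which implies that $f_{\mathrm{com}}$ and $f$ have different orbit dimensions.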
We thereby conclude that $f \notin G f_{\mathrm{com}}$.

To each edge $e$ we assign its path tensor $\psi(e)$ by summing tensors over all valid paths passing through $e$,
\[
\psi(e) := \sum_{p \in P \text{ with } e \in p} \varphi(p) \in \mathbb{C}^{E_1} \otimes \cdots \otimes \mathbb{C}^{E_d}.
\]
By linear continuation this gives a linear map $\psi : \mathbb{C}^{E} \to \mathbb{C}^{E_1} \otimes \cdots \otimes \mathbb{C}^{E_d}$. Observe that $\psi(e) = A^{(i)}_{e,e} f_{\mathrm{com}}$. Let $T$ denote the linear span of all $\psi(e)$, $e \in E$. In other words, $T = \mathfrak{g}_2 f_{\mathrm{com}}$.

Let $P' \subseteq P \subseteq X$ be the set of valid paths that contain exactly one parity preserving edge. To each edge $e$ we assign its parity path tensor $\psi'(e)$ by summing tensors over paths in $P'$,
\[
\psi'(e) := \sum_{p \in P' \text{ with } e \in p} \varphi(p) \in \mathbb{C}^{E_1} \otimes \cdots \otimes \mathbb{C}^{E_d}.
\]
By linear continuation this gives a linear map $\psi' : \mathbb{C}^{E} \to \mathbb{C}^{E_1} \otimes \cdots \otimes \mathbb{C}^{E_d}$. Observe that $\psi'(e) = A^{(i)}_{e,e} f$. Let $T'$ denote the linear span of all $\psi'(e)$, $e \in E$. In other words, $T' = \mathfrak{g}_2 f$.

We will establish the following bounds on the dimensions of $T$ and $T'$.

8.1 Theorem. $\dim T = |E| - \sum_{i=1}^{d} w_i + 1$.

8.2 Theorem. $\dim T' \le |E| - \sum_{i=1}^{d} w_i$.

The rest of this section is dedicated to the proofs of Theorem 8.1 and Theorem 8.2 by showing that $T$ is isomorphic to the vector space of flows "on the ABP", while the parity constraints lead to a smaller dimension of $T'$.

From an ABP $\Gamma$ we construct a digraph $\tilde\Gamma$ by identifying corresponding vertices from the first and the last layer in $V$ and calling the resulting vertex set $\tilde V$. Note $|\tilde V| = \sum_{i=1}^{d} w_i$. The directed graphs $\Gamma$ and $\tilde\Gamma$ have the same edge set. The resulting directed graph is called $\tilde\Gamma = (\tilde V, E)$. Let $F$ denote the vector space of flows on $\tilde\Gamma$. Note that by (7.3) we have $\dim F = |E| - |\tilde V| + 1$. All directed cycles in $\tilde\Gamma$ have a length that is a multiple of $d$. In particular, all cycles of length exactly $d$ are in one-to-one correspondence with valid paths in $\Gamma_{\mathrm{com}}$.
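The dimension formula of Theorem 8.1 can be sanity-checked numerically on small formats. The sketch below is an illustration under assumptions, not the authors' method: it models valid paths as closed walks through complete layers, writes each $\psi(e)$ in the basis of valid paths (entry 1 exactly when $e$ lies on the path), and compares the rank of the resulting matrix with $|E| - \sum_i w_i + 1$. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank of an integer matrix (list of rows) by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def dim_T_and_dim_F(widths):
    """Return (rank of the psi(e) vectors, |E| - |V~| + 1) for the given layer widths."""
    d = len(widths)
    # A valid path is a closed walk: the edge in layer i goes from v[i] to v[(i+1) % d].
    paths = [tuple((i, v[i], v[(i + 1) % d]) for i in range(d))
             for v in product(*(range(w) for w in widths))]
    edges = [(i, a, b) for i in range(d)
             for a in range(widths[i]) for b in range(widths[(i + 1) % d])]
    # psi(e) in the basis of valid paths: coefficient 1 iff the path uses e.
    psi = [[1 if e in p else 0 for p in paths] for e in edges]
    dim_T = rank(psi)
    dim_F = len(edges) - sum(widths) + 1  # formula (7.3) applied to the glued digraph
    return dim_T, dim_F
```

For example, for the format `(2, 3, 2)` there are 16 edges and 7 glued vertices, and both returned values agree. Exact rational arithmetic is used so that the rank is not affected by floating-point error.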
For an edge $e \in E$, let $\chi(e) \in \mathbb{C}^{E}$ denote the characteristic function of $e$, i.e., the function whose value is 1 on $e$ and 0 everywhere else.

We now prove Theorem 8.1 by establishing a matching upper (Lemma 8.3) and lower bound (Lemma 8.4) of $|E| - |\tilde V| + 1 = \dim F$ on $\dim T$.

The upper bound

8.3 Lemma. $\dim T \le |E| - |\tilde V| + 1$.

Proof. For $v \in \tilde V$, let $\mathrm{in}(v) \subseteq E$ denote the set of incoming edges incident to $v$ and $\mathrm{out}(v) \subseteq E$ denote the set of outgoing edges incident to $v$. For each $v \in \tilde V$, define the row vector
\[
r_v = \sum_{e \in \mathrm{in}(v)} \chi(e) - \sum_{e \in \mathrm{out}(v)} \chi(e).
\]
These vectors are the rows of the signed incidence matrix of $\tilde\Gamma$, and since $\tilde\Gamma$ is connected, they span a space of dimension $|\tilde V| - 1$. Every valid path through a vertex $v$ uses exactly one incoming and exactly one outgoing edge at $v$, so for all $v \in \tilde V$,
\[
\sum_{e \in \mathrm{in}(v)} \psi(e) = \sum_{e \in \mathrm{out}(v)} \psi(e).
\]
Since $\psi$ is linear, this is equivalent to
\[
\psi\Big(\sum_{e \in \mathrm{in}(v)} \chi(e) - \sum_{e \in \mathrm{out}(v)} \chi(e)\Big) = 0.
\]
Hence each $r_v$ is in the kernel of $\psi$, and hence $\dim \ker \psi \ge |\tilde V| - 1$. Using (7.3), we obtain $\dim T = \dim \operatorname{im} \psi = |E| - \dim \ker \psi \le |E| - |\tilde V| + 1 = \dim F$.

Figure 1: The spanning tree construction for width 4 and $d = 5$.

The lower bound
To obtain the lower bound, we define a linear map $\varrho : \mathbb{C}^{E_1} \otimes \cdots \otimes \mathbb{C}^{E_d} \to \mathbb{C}^{E}$ such that the image of $\varrho$ restricted to $T$ equals $F$. This will imply that $\dim T \ge \dim F$, thereby achieving the required lower bound.

We define the linear map $\varrho$ on standard basis elements $x_{e_1} \otimes \cdots \otimes x_{e_d}$ as follows,
\[
\varrho(x_{e_1} \otimes \cdots \otimes x_{e_d}) := \chi(e_1) + \cdots + \chi(e_d),
\]
and then extend it to the whole domain via linear continuation.

8.4 Lemma. Let $\varrho|_T$ denote the restriction of $\varrho$ to the linear subspace $T$. Then $\operatorname{im} \varrho|_T = F$. In particular, $\dim T \ge \dim F = |E| - |\tilde V| + 1$.

Proof. To prove equality it suffices to show $\operatorname{im} \varrho|_T \subseteq F$ and $F \subseteq \operatorname{im} \varrho|_T$.

The first containment is easy to see. For an edge $e$, consider the image of $\psi(e)$ under the map $\varrho$,
\[
\varrho(\psi(e)) = \sum_{p \in P \text{ with } e \in p}\ \sum_{e' \in p} \chi(e').
\]
Observe that for a path $p \in P$, $\sum_{e' \in p} \chi(e')$ is a flow on $\tilde\Gamma$ and hence it belongs to $F$. Thus, we have $\varrho(\psi(e)) \in F$. Since $T$ is spanned by $\psi(e)$, for $e \in E$, we obtain that $\operatorname{im} \varrho|_T \subseteq F$.

To establish the second containment it suffices to show that the image of $T$ under the map $\varrho$ contains a basis of $F$. We identify a specific generating set for $F$ in Claim 8.5 and prove that it is contained in $\operatorname{im} \varrho|_T$ in Claim 8.6 to complete the argument.

We identify directed cycles with their characteristic flows, i.e., flows that have value 1 on the cycle's edges and 0 everywhere else. We also identify cycles that may traverse edges in either direction with their characteristic flow: the characteristic flow is defined to take the value 1 on an edge $e$ if $e$ is traversed in the direction of $e$, and value $-1$ on $e$ if $e$ is traversed against its direction.

From the theory of flows we know that for every (undirected) spanning tree $\tau$ of $\tilde\Gamma$, the vector space $F \subseteq \mathbb{C}^{E}$ has a basis given by the characteristic flows of cycles that only use edges from $\tau$ and exactly one additional edge (for example, see Theorem 20.8 in [BM08]). Thus, the cycle flows corresponding to the edges not in the spanning tree form a basis of $F$.
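The spanning-tree basis of the flow space can be made concrete with a short sketch. The code below is an illustration under assumptions: it builds the glued layered digraph for given widths, picks an arbitrary breadth-first spanning tree of the underlying undirected graph (not the specific tree used later in the proof), and returns one signed characteristic flow per non-tree edge. The names `layered_digraph`, `fundamental_cycle_flows`, and `is_flow` are hypothetical.

```python
from collections import deque


def layered_digraph(widths):
    """Edges of the glued digraph: complete bipartite layers, last layer wraps to the first."""
    d = len(widths)
    return [((i, a), ((i + 1) % d, b))
            for i in range(d) for a in range(widths[i])
            for b in range(widths[(i + 1) % d])]


def fundamental_cycle_flows(edges):
    """One signed cycle flow (dict: edge index -> +1/-1) per edge outside a BFS tree."""
    adj = {}
    for k, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, k, +1))  # walking u -> v follows edge k
        adj.setdefault(v, []).append((u, k, -1))  # walking v -> u goes against edge k
    root = edges[0][0]
    parent = {root: None}  # vertex -> (parent, edge index, sign of the step parent -> vertex)
    tree = set()
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v, k, s in adj[u]:
            if v not in parent:
                parent[v] = (u, k, s)
                tree.add(k)
                queue.append(v)

    def path_from_root(v):
        """Signed edge usage of the tree path from the root to v."""
        usage = {}
        while parent[v] is not None:
            u, k, s = parent[v]
            usage[k] = s
            v = u
        return usage

    flows = []
    for k, (u, v) in enumerate(edges):
        if k in tree:
            continue
        # Closed walk: take edge k from u to v, walk the tree back from v to the
        # root, then walk the tree from the root to u. Shared tree edges cancel.
        flow = {k: 1}
        for j, s in path_from_root(v).items():
            flow[j] = flow.get(j, 0) - s
        for j, s in path_from_root(u).items():
            flow[j] = flow.get(j, 0) + s
        flows.append({j: s for j, s in flow.items() if s != 0})
    return flows


def is_flow(flow, edges):
    """Check conservation: inflow equals outflow at every vertex."""
    net = {}
    for k, val in flow.items():
        u, v = edges[k]
        net[u] = net.get(u, 0) - val
        net[v] = net.get(v, 0) + val
    return all(x == 0 for x in net.values())
```

Each returned flow contains exactly one non-tree edge, which no other returned flow uses, so the flows are linearly independent. Their number is $|E| - |\tilde V| + 1$, matching (7.3).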
8.5 Claim. $F$ is spanned by the set of directed cycles in $\tilde\Gamma$ of length exactly $d$.

Figure 2: Decomposing a cycle of length $d + 2$ as a linear combination of cycles of length $d$. The figure is an illustration when $d = 3$. The dotted layers in each cycle from the left are $V_1$, $V_2$, $V_3$, and $V_1$ again.

Proof.
We construct a spanning tree $\tau$ as follows, which will be a tree whose edges are all directed away from its root. Informally, the tree is given by the following subgraph: we make the first vertex in $V_1$ the root, and include all the outgoing edges incident to it. We then move to the first vertex in $V_2$ and include all the outgoing edges incident to it. We continue in this way until we reach $V_d$. Upon reaching the first vertex in $V_d$ we include all but one of the outgoing edges incident to it. The one that is an incoming edge to the root is not included. Figure 1 illustrates the construction. We now formally define this.

Let $v^{i}_{1} \in V_i$ denote the first vertex in layer $i$, $1 \le i \le d$. Further recall that $\mathrm{in}(v) \subseteq E$ and $\mathrm{out}(v) \subseteq E$ denote the set of incoming and outgoing edges, respectively, incident to $v$. Define the edge set
\[
\tau := \Big(\bigcup_{i=1}^{d} \mathrm{out}(v^{i}_{1})\Big) \setminus \{(v^{d}_{1}, v^{1}_{1})\},
\]
which is a spanning tree in $\tilde\Gamma$. We know that every edge not in the tree, when added to the tree, gives a unique undirected cycle. We now show that the characteristic flows of these undirected cycles can be expressed as a linear combination of the characteristic flows of directed cycles of length $d$. For $e \in E \setminus \tau$, let $c_e$ denote the characteristic flow of the unique undirected cycle that uses $e$ in its correct direction and only edges of $\tau$. We denote the characteristic flow of a cycle $C$ by $\chi(C)$. We argue depending on which layer the edge $e$ belongs to.

• Suppose $e \in E_1 \setminus \tau$.
  – If $e$ is incident to $v^{2}_{1}$, the first vertex in $V_2$, then the inclusion of $e$ creates a directed cycle of length $d$. Hence, $c_e$ equals the characteristic flow of this directed cycle.
  – Otherwise, the inclusion of $e$ creates an undirected cycle of length $d + 2$. If $e = (v^{1}_{j_1}, v^{2}_{j_2})$ for some $j_1 \in [2, w_1]$ and $j_2 \in [2, w_2]$, then the cycle $c_e$ is given as follows: $v^{d}_{1} - v^{1}_{j_1} - v^{2}_{j_2} - v^{1}_{1} - v^{2}_{1} - \cdots - v^{d-1}_{1} - v^{d}_{1}$. Consider the following two directed cycles: $C_1 : v^{1}_{1} - v^{2}_{j_2} - \cdots - v^{d}_{1} - v^{1}_{1}$ and $C_2 : v^{1}_{j_1} - v^{2}_{j_2} - \cdots - v^{d}_{1} - v^{1}_{j_1}$, such that the part $v^{2}_{j_2} - \cdots - v^{d}_{1}$ between $v^{2}_{j_2}$ and $v^{d}_{1}$ in the two cycles is the same. We now observe that $\chi(C_2) - \chi(C_1)$ equals the characteristic flow of the undirected cycle $v^{1}_{j_1} - v^{2}_{j_2} - v^{1}_{1} - v^{d}_{1} - v^{1}_{j_1}$. This is because the common part in $C_1$ and $C_2$ cancels out. To $\chi(C_2) - \chi(C_1)$ we add the characteristic flow of the directed cycle $C_3 : v^{1}_{1} - v^{2}_{1} - v^{3}_{1} - \cdots - v^{d-1}_{1} - v^{d}_{1} - v^{1}_{1}$. It is now easily seen that $\chi(C_2) - \chi(C_1) + \chi(C_3)$ equals the characteristic flow of the cycle $c_e$ (see Figure 2 for an illustration).

• Suppose $e \in E_d \setminus \tau$.
  – If $e$ is incident to $v^{1}_{1}$, the first vertex in $V_1$, then as before the inclusion of $e$ creates a directed cycle of length $d$. Hence, $c_e$ equals the characteristic flow of this directed cycle.
  – Otherwise, the inclusion of $e$ creates an undirected cycle of length 4. If $e = (v^{d}_{j_d}, v^{1}_{j_1})$ for some $j_d \in [2, w_d]$ and $j_1 \in [2, w_1]$, then the cycle $c_e$ is given as follows: $v^{d}_{j_d} - v^{1}_{j_1} - v^{d}_{1} - v^{d-1}_{1} - v^{d}_{j_d}$. Consider the following two directed cycles: $C_1 : v^{1}_{j_1} - \cdots - v^{d-1}_{1} - v^{d}_{1} - v^{1}_{j_1}$ and $C_2 : v^{1}_{j_1} - \cdots - v^{d-1}_{1} - v^{d}_{j_d} - v^{1}_{j_1}$, such that the part $v^{1}_{j_1} - \cdots - v^{d-1}_{1}$ between $v^{1}_{j_1}$ and $v^{d-1}_{1}$ in the two cycles is the same. We now claim that $\chi(C_2) - \chi(C_1)$ equals the characteristic flow of $c_e$. This is because the common part in $C_1$ and $C_2$ cancels out.

• Otherwise $e \in E_i \setminus \tau$ for some $i \in \{2, \ldots, d-1\}$. In such a case inclusion of $e$ creates an undirected cycle of length 4. We can again argue exactly like in the previous case, and so we omit the argument here.

We now prove that the generating set given by the directed cycles of length $d$ is contained in the image of $T$ under the map $\varrho$.

8.6 Claim. $\operatorname{im}(\varrho|_T)$ contains the characteristic flow of each directed cycle of length $d$.

Proof. Let $\{e_1, e_2, \ldots, e_d\} \subseteq E$ be a directed cycle of length $d$, where each $e_i$ points from a vertex in $V_i$ to a vertex in $V_{i+1}$. Let $\{e^{(j)}_{i}\}$ denote the set of edges that start at the same vertex as $e_i$, but for which $e^{(j)}_{i} \neq e_i$. Thus $|\{e^{(j)}_{i}\}| = |V_{i+1}| - 1$. Let
\[
\bar\psi(e) := \frac{1}{|\{p \in P \text{ with } e \in p\}|}\, \psi(e),
\]
so that $\varrho(\bar\psi(e))$ is a flow with value 1 on the edge $e$. It is instructive to have a look at the left side of Figure 3, where $\varrho(\bar\psi(e_1))$ is depicted. Subtracting $\frac{1}{w_3} \sum_{j=1}^{w_3 - 1} \varrho(\bar\psi(e^{(j)}_{2}))$ and adding $\frac{w_3 - 1}{w_3} \varrho(\bar\psi(e_2))$ reduces the support significantly and brings us one step closer to the cycle, see the right side of Figure 3. We iterate this process until only the cycle is left. Formally:
\[
\chi(e_1, \ldots, e_d) = \varrho(\bar\psi(e_1))
+ \frac{w_3 - 1}{w_3} \varrho(\bar\psi(e_2)) - \frac{1}{w_3} \sum_{j=1}^{w_3 - 1} \varrho(\bar\psi(e^{(j)}_{2}))
+ \cdots
+ \frac{w_d - 1}{w_d} \varrho(\bar\psi(e_{d-1})) - \frac{1}{w_d} \sum_{j=1}^{w_d - 1} \varrho(\bar\psi(e^{(j)}_{d-1})).
\]

Figure 3: On the left: $\varrho(\bar\psi(e_1))$. On the right: $\varrho(\bar\psi(e_1)) - \frac{1}{w_3} \sum_{j=1}^{w_3 - 1} \varrho(\bar\psi(e^{(j)}_{2})) + \frac{w_3 - 1}{w_3} \varrho(\bar\psi(e_2))$. This is the case $d = 5$ and format $(4, \ldots)$; $e$ is the top edge in the center. Here we assume that each $e_i$ points from the first vertex in $V_i$ to the first vertex in $V_{i+1}$.

The stronger upper bound via parities
We now proceed to upper bound $\dim T'$ (Theorem 8.2). The proof is analogous to the proof of Lemma 8.3.

(Restatement of Theorem 8.2). $\dim T' \le |E| - |\tilde V|$.

Proof. As in the proof of Lemma 8.3, for $v \in \tilde V$, we have
\[
\sum_{e \in \mathrm{in}(v)} \psi'(e) = \sum_{e \in \mathrm{out}(v)} \psi'(e).
\]
Furthermore, since every path in $P'$ contains exactly one parity preserving edge and $d - 1$ parity changing edges, we have the following additional constraint on $\psi'$,
\[
(d - 1) \sum_{e \text{ parity preserving}} \psi'(e) = \sum_{e \text{ parity changing}} \psi'(e).
\]
By the linearity of $\psi'$, we have
\[
\psi'\Big((d - 1) \sum_{e \text{ parity preserving}} \chi(e) - \sum_{e \text{ parity changing}} \chi(e)\Big) = 0.
\]
Therefore, the kernel of $\psi'$ contains the vectors $\sum_{e \in \mathrm{in}(v)} \chi(e) - \sum_{e \in \mathrm{out}(v)} \chi(e)$, for $v \in \tilde V$, and an additional vector $(d - 1) \sum_{e \text{ parity preserving}} \chi(e) - \sum_{e \text{ parity changing}} \chi(e)$.

We now claim that the new vector is linearly independent from the earlier set of vectors. We prove the claim by constructing a vector in $\mathbb{C}^{E}$ that is orthogonal to the earlier set of vectors but is non-orthogonal to the additional vector. One such vector is given by the characteristic flow of the directed cycle $v^{1}_{1} - v^{2}_{1} - v^{3}_{1} - \cdots - v^{d-1}_{1} - v^{d}_{1} - v^{1}_{1}$.

Thus, it follows that $\dim \ker \psi' \ge |\tilde V|$, and hence $\dim T' \le |E| - |\tilde V|$.

In the next section we continue our investigation of comparing exact complexity classes with the approximative complexity classes. This would be a comparison between two well known classes, namely VQP and $\overline{\mathrm{VNP}}$.

9 VQP versus $\overline{\mathrm{VNP}}$
In this section, we compare the complexity classes VQP and $\overline{\mathrm{VNP}}$. Valiant in his seminal paper [Val79] defined the complexity classes that are now called VP and VNP, and the central question of algebraic complexity is to understand whether the two complexity classes are indeed different as sets (Valiant's hypothesis). Bürgisser [Bür00] defined the complexity class VQP and related it to the complexity classes VP and VNP. We proceed to define the above three classes for establishing the context. For an exhaustive treatment of the classes, we refer the readers to Bürgisser's monograph [Bür00], from which we take the definitions. We first need to define so-called p-families.
9.1 Definition. A sequence $f = (f_n)$ of multivariate polynomials over a field $k$ is called a $p$-family (over $k$) iff the number of variables as well as the degree of $f_n$ are bounded by polynomial functions in $n$.

We now need to define the model of computation and the notion of complexity in order to define the complexity classes of interest.

9.2 Definition. A straight-line program $\Gamma$ (expecting $m$ inputs) represents a sequence $(\Gamma_1, \ldots, \Gamma_r)$ of instructions $\Gamma_\rho = (\omega_\rho; i_\rho, j_\rho)$ with operation symbols $\omega_\rho \in \{+, -, *\}$ and the addresses $i_\rho, j_\rho$, which are integers satisfying $-m < i_\rho, j_\rho < \rho$. We call $r$ the size of $\Gamma$.

So, essentially, in a straight-line program, we either perform addition or subtraction or multiplication on the inputs or the previously computed elements. The size of the straight-line program naturally induces a size complexity measure on polynomials as follows:

9.3 Definition. The complexity $L(f)$ of a polynomial $f \in F[x_1, \ldots, x_n]$ is the minimal size of a straight-line program computing $f$ from variables $x_i$ and constants in $F$.

We are now all set to define the above discussed complexity classes.

9.4 Definition. A $p$-family $f = (f_n)$ is said to be $p$-computable iff the complexity $L(f_n)$ is a polynomially bounded function of $n$. $\mathrm{VP}_F$ consists of all $p$-computable families over the field $F$.

9.5 Definition. A $p$-family $f = (f_n)$ is said to be $p$-definable iff there exists a $p$-computable family $g = (g_n)$, $g_n \in F[x_1, \ldots, x_{u(n)}]$, such that for all $n$
\[
f_n(x_1, \ldots, x_{v(n)}) = \sum_{e \in \{0,1\}^{u(n) - v(n)}} g_n(x_1, \ldots, x_{v(n)}, e_{v(n)+1}, \ldots, e_{u(n)}).
\]
The set of $p$-definable families over $F$ forms the complexity class $\mathrm{VNP}_F$.

9.6 Definition. A $p$-family $f = (f_n)$ is said to be $qp$-computable iff the complexity $L(f_n)$ is a quasi-polynomially bounded function of $n$.
The complexity class $\mathrm{VQP}_F$ consists of all $qp$-computable families over $F$.

In the above three definitions, if the underlying field is clear from the context, we can drop the subscript $F$ and simply represent the classes as VP, VNP and VQP respectively. In what follows, the underlying field is always assumed to be $\mathbb{Q}$, the field of rational numbers.

In [Bür00], Bürgisser showed the completeness of the determinant polynomial for VQP under $qp$-projections and strengthened Valiant's hypothesis of $\mathrm{VNP} \not\subseteq \mathrm{VP}$ to $\mathrm{VNP} \not\subseteq \mathrm{VQP}$ and called it Valiant's extended hypothesis (see [Bür00], Section 2.5). He also established that $\mathrm{VP} \subsetneq \mathrm{VQP}$ and went on to show that $\mathrm{VQP} \not\subseteq \mathrm{VNP}$ (see [Bür00], Proposition 8.5 and Corollary 8.9). The main observation of this section is that his proof is stronger and is sufficient to conclude that VQP is not contained in the closure of VNP either, where the closure is in the sense as mentioned in Section 1.

In fact, Bürgisser in his monograph [Bür00] also gives a set of conditions such that, if the coefficients of a polynomial sequence satisfy them, then that polynomial sequence cannot be in VNP ([Bür00], Theorem 8.1). His theorem and its proof are inspired by Heintz and Sieveking [HS80]. The second observation of this section is that this proof is even stronger and those conditions are actually sufficient to show that the given polynomial sequence is not contained in $\overline{\mathrm{VNP}}$ either.

We now discuss both the observations.
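The straight-line program model from the definitions above can be illustrated with a small evaluator. This is a minimal sketch under one labeled assumption: the $m$ inputs are taken to occupy the addresses $-m+1, \ldots, 0$, and the result of instruction $\rho$ is stored at address $\rho$, which is one natural reading of the constraint $-m < i_\rho, j_\rho < \rho$. The function name `run_slp` and the tuple encoding of instructions are hypothetical.

```python
from fractions import Fraction

# The three operation symbols allowed in a straight-line program.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def run_slp(program, inputs):
    """Evaluate a straight-line program given as a list of (op, i, j) instructions.

    Assumption: inputs live at addresses -m+1, ..., 0 and instruction rho
    (numbered from 1) writes its result to address rho.
    Returns the value computed by the last instruction.
    """
    m = len(inputs)
    values = {addr: Fraction(x) for addr, x in zip(range(-m + 1, 1), inputs)}
    for rho, (op, i, j) in enumerate(program, start=1):
        if not (-m < i < rho and -m < j < rho):
            raise ValueError(f"instruction {rho} uses an address outside (-m, rho)")
        values[rho] = OPS[op](values[i], values[j])
    return values[len(program)]


# (x + y) * (x - y) with x at address -1 and y at address 0; the size r is 3.
difference_of_squares = [("+", -1, 0), ("-", -1, 0), ("*", 1, 2)]
```

With this encoding, `run_slp(difference_of_squares, [5, 3])` evaluates $(5+3)(5-3)$. The size of the program is simply `len(program)`, which is the quantity that $L(f)$ minimizes.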
VQP $\not\subseteq$ $\overline{\mathrm{VNP}}$
We first show that there is a $\log n$-variate polynomial of degree $(n - 1) \log n$ which is in VQP but not in $\overline{\mathrm{VNP}}$. In this exposition, for the sake of better readability, we do not present Bürgisser's statements in full generality since it is not essential for the theorem that we want to show here. Moreover, the less general version that we present here contains all the ideas for the theorem statements and their proofs.

9.7 Theorem. Let $N_n := \{0, \ldots, n - 1\}^{\log n}$ and
\[
f_n := \sum_{\mu \in N_n} 2^{2^{j(\mu)}} X_1^{\mu_1} \cdots X_{\log n}^{\mu_{\log n}}, \qquad \text{where } j(\mu) := \sum_{j=1}^{\log n} \mu_j n^{j-1}.
\]
Then $f_n \in \mathrm{VQP}$, but $f_n \notin \overline{\mathrm{VNP}}$, and hence $\mathrm{VQP} \not\subseteq \overline{\mathrm{VNP}}$.

The theorem consists of two parts. The containment in VQP follows immediately from the fact that the total number of monomials in $f_n$ is $n^{\log n}$. For the other part, we closely follow Bürgisser's lower bound proof ([Bür00], Proposition 8.5) against VNP here, making transparent the fact that the proof works also against $\overline{\mathrm{VNP}}$. His proof techniques were borrowed from Strassen ([Str74]). The idea is to use the universal representation for polynomial sequences in VNP, so that we get a hold on what the coefficients of the polynomials look like. Using that, we establish polynomials $H_n$ that vanish on all the polynomial sequences in VNP (in other words, $H_n$ is in the vanishing ideal of sequences in VNP), but do not vanish on $f_n$ (because the growth rate of its coefficients is too high), hence giving the separation. Since the vanishing ideal of a set characterizes its closure, we get the stronger separation, i.e., $f_n$ does not belong to the closure of VNP, namely, $\overline{\mathrm{VNP}}$.

Proof of Theorem 9.7.
As stated above, the proof works in three stages: first, assuming the contrary and writing $f_n$ using the universal representation for the polynomial sequences in VNP, then giving polynomials $H_n$ of special forms in the vanishing ideal of polynomial sequences in VNP, and finally showing that $H_n$ cannot vanish on our sequence $f_n$, hence arriving at a contradiction.

Assuming $(f_n) \in \mathrm{VNP}$ implies the existence of a family $(g_n) \in \mathrm{VP}$, with $L(g_n)$ bounded by a polynomial $r(n)$, and a polynomial $u(n)$ such that
\[
f_n(X_1, \ldots, X_{\log n}) = \sum_{e \in \{0,1\}^{u(n) - \log n}} g_n(X_1, \ldots, X_{\log n}, e_{\log n + 1}, \ldots, e_{u(n)}).
\]
Next, we use the universal representation theorem (see [Str74], [Sch77]) as stated in Bürgisser's monograph ([Bür00], Proposition 8.3; for a proof see [BCS13], Proposition 9.11) for size $r(n)$ straight-line programs to get that there exist polynomials $G^{(n)}_{\nu} \in \mathbb{Z}[Y_1, \ldots, Y_{q(n)}]$, with $q(n)$ being a polynomial in $n$ (more precisely, it is a polynomial in $r(n)$ and $u(n)$), which for $|\nu| \le \deg g_n = n^{O(1)}$ guarantee that $\deg G^{(n)}_{\nu} = n^{O(1)}$, $\log \mathrm{wt}(G^{(n)}_{\nu}) = 2^{n^{O(1)}}$, and also guarantee the existence of some $\zeta \in \mathbb{Q}^{q(n)}$ such that
\[
g_n = \sum_{\nu} G^{(n)}_{\nu}(\zeta)\, X_1^{\nu_1} \cdots X_{u(n)}^{\nu_{u(n)}},
\]
where, for a polynomial $f$, $\mathrm{wt}(f)$ refers to the sum of the absolute values of its coefficients.

Now, taking the exponential sum yields that
\[
f_n = \sum_{\mu \in N_n} F^{(n)}_{\mu}(\zeta)\, X_1^{\mu_1} \cdots X_{\log n}^{\mu_{\log n}},
\]
where the polynomials $F^{(n)}_{\mu}$ are obtained as a sum of at most $2^{u(n)}$ polynomials $G^{(n)}_{\nu}$. Thus, we now have a good hold on $F^{(n)}_{\mu}$, i.e., $\deg F^{(n)}_{\mu} \le \alpha(n)$ and $\log \mathrm{wt}(F^{(n)}_{\mu}) \le \beta(n)$, where both $\alpha(n)$ and $\beta(n)$ are polynomially bounded functions of $n$.

Thus, for $f_n$ to be in VNP, the coefficients of $f_n$ should be in the image of the polynomial map $F^{(n)} = (F^{(n)}_{\mu})_{\mu \in N_n} : \mathbb{Q}^{q(n)} \to \mathbb{Q}^{n^{\log n}}$. In other words, we must have some $\zeta \in \mathbb{Q}^{q(n)}$ such that for all $\mu \in N_n$, we have $F^{(n)}_{\mu}(\zeta) = 2^{2^{j(\mu)}}$, where $j(\mu) := \sum_{j=1}^{\log n} \mu_j n^{j-1}$. Since $F^{(n)}_{\mu}(\zeta)$ takes all the values from $2^{2^{0}}$ to $2^{2^{n^{\log n} - 1}}$, we have a subset of indices $\tilde N_n \subseteq N_n$ of size $s(n) := \lfloor |N_n| / n \rfloor = \lfloor n^{\log n} / n \rfloor$ and a bijection $\delta : \{0, 1, \ldots, s(n) - 1\} \to \tilde N_n$, $\sigma \mapsto \delta(\sigma)$, such that for $\sigma \in \{0, 1, \ldots, s(n) - 1\}$ we have $F^{(n)}_{\delta(\sigma)}(\zeta) = 2^{2^{\sigma n + 1}}$.

Now we can apply Lemma 9.28 from [BCS13], which asserts that there will be polynomials of low height (ht; the maximum of the absolute values of the coefficients) on which these coefficients shall vanish. More precisely, there exist non-zero forms $H_n \in \mathbb{Z}[Y_\mu \mid \mu \in \tilde N_n]$ with $\mathrm{ht}(H_n) \le 3$, $\deg H_n \le D(n)$, and such that $H_n(F^{(n)}_{\mu} \mid \mu \in \tilde N_n) = 0$, given that $D(n)^{s(n) - q(n) - 2} > \alpha(n)^{q(n)} s(n)^{s(n)} \beta(n)$.

It can be seen that $D(n) = 2^{n} - 1$ satisfies this condition, since $\alpha(n)$, $\beta(n)$ and $q(n)$ are polynomially bounded and $2^{n}$ grows much faster than $s(n) = \lfloor n^{\log n} / n \rfloor$. This allows us to write $H_n = \sum_{e} \lambda_e \prod_{\mu \in \tilde N_n} Y_\mu^{e_\mu}$, where the absolute values of the $\lambda_e$ are bounded by 3. Since $H_n$ vanishes on the subset of coefficients of $f_n$, i.e., it vanishes on $F^{(n)}_{\delta(\sigma)}(\zeta) = 2^{2^{\sigma n + 1}}$ with $\sigma \in \{0, 1, \ldots, s(n) - 1\}$, we have
\[
0 = H_n(F^{(n)}_{\mu}(\zeta) \mid \mu \in \tilde N_n)
= \sum_{e} \lambda_e \prod_{\sigma = 0}^{s(n) - 1} 2^{e_{\delta(\sigma)} 2^{\sigma n + 1}}
= \sum_{e} \lambda_e \cdot 4^{\sum_{\sigma} e_{\delta(\sigma)} (2^{n})^{\sigma}}.
\]
The last sum is essentially a 4-adic integer, since firstly, $|\lambda_e| \le 3$, and secondly, the exponents of 4, that is, $\sum_{\sigma} e_{\delta(\sigma)} (2^{n})^{\sigma}$, are all distinct, as they can be seen as $2^{n}$-adic representations since $e_{\delta(\sigma)} < 2^{n}$. Thus $\lambda_e$ has to be zero for all $e$. Hence $H_n$ must be identically zero, which is a contradiction.

VNP
In this section, we discuss a criterion Bürgisser presented in his monograph [Bür00], based on a proof due to Heintz and Sieveking, which gives a set of conditions that puts a $p$-family out of VNP. We observe that those conditions, if satisfied, in fact put a given $p$-family out of $\overline{\mathrm{VNP}}$ as well.

9.8 Theorem. Let $(p_n)$ be a sequence of polynomials over $\overline{\mathbb{Q}}$ and let $N(n)$ denote the degree of the field extension generated by the coefficients of $p_n$ over $\mathbb{Q}$. Further suppose the following holds:
1. The map $n \mapsto \lceil \log N(n) \rceil$ is not $p$-bounded.
2. For all $n$, there is a system $G_n$ of rational polynomials of degree at most $D(n)$ with finite zeroset, containing the coefficient system of $p_n$, and such that $n \mapsto \lceil \log D(n) \rceil$ is $p$-bounded.
Then the family $(p_n) \notin \overline{\mathrm{VNP}}$.

Thus the above theorem shows that certain $p$-families with algebraic coefficients of high degree are not contained in $\overline{\mathrm{VNP}}$. We now give a simple example from [Bür00] to illustrate the theorem.

9.9 Example. Consider the following multivariate family defined as
\[
p_n = \sum_{e \in \{0,1\}^{n} \setminus \{0\}} \sqrt{p_{j(e)}}\; X^{e},
\]
where $j(e) = \sum_{s=1}^{n} e_s 2^{s-1}$ and $p_j$ refers to the $j$-th prime number. Then using the above Theorem 9.8, we can conclude that $p_n \notin \overline{\mathrm{VNP}}$. This is because the degree of the field extension is $N(n) = [\mathbb{Q}(\sqrt{p_j} \mid 1 \le j < 2^{n}) : \mathbb{Q}] = 2^{2^{n} - 1}$ (see for example [BCS13], Lemma 9.20), hence condition 1 above is satisfied. Condition 2 is also satisfied because the coefficients are the roots of the system $G_n = \{Z_j^{2} - p_j \mid 1 \le j < 2^{n}\}$, with $D(n) = 2$.

For a proof of the theorem, we refer the readers to [Bür00], Theorem 8.1. We point out that the proof in its original form already works. In his proof, Bürgisser wanted to conclude that the family is not in VNP. However, along the way, he arrives at a contradiction to the assertion that the family is contained in the Zariski closure of VNP, which is exactly what is now known as $\overline{\mathrm{VNP}}$. At the time of the original proof, the complexity class $\overline{\mathrm{VNP}}$ was not defined.

Acknowledgements
We thank Michael Forbes for illuminating discussions and for telling us about his (correct) intuition concerning Nisan's result. We thank the Simons Institute for the Theory of Computing (Berkeley), Schloss Dagstuhl - Leibniz-Zentrum für Informatik (Dagstuhl), and the International Centre for Theoretical Sciences (Bengaluru), for hosting us during several phases of this research.
References [ABV15] Jarod Alper, Tristram Bogart, and Mauricio Velasco. A lower bound for the deter-minantal complexity of a hypersurface.
Foundations of Computational Mathematics ,pages 1–8, 2015.[AFS +
16] Matthew Anderson, Michael A. Forbes, Ramprasad Saptharishi, Amir Shpilka, andBen Lee Volk. Identity testing and lower bounds for read-k oblivious algebraic branchingprograms. In
Proceedings of the 31st Conference on Computational Complexity , CCC16, Dagstuhl, DEU, 2016. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.[AW16] Eric Allender and Fengming Wang. On the power of algebraic branching programs ofwidth two.
Comput. Complex. , 25(1):217–253, March 2016.[BCRL79] Dario Bini, Milvio Capovani, Francesco Romani, and Grazia Lotti. O( n . ) com-plexity for n × n approximate matrix multiplication. Inf. Process. Lett. , 8(5):234–235,1979.[BCS13] Peter B¨urgisser, Michael Clausen, and Mohammad A Shokrollahi.
Algebraic complexitytheory , volume 315. Springer Science & Business Media, 2013.[Bin80] D. Bini. Relations between exact and approximate bilinear algorithms. applications.
CALCOLO , 17(1):87–97, Jan 1980.[BIZ18] Karl Bringmann, Christian Ikenmeyer, and Jeroen Zuiddam. On algebraic branchingprograms of small width.
J. ACM , 65(5), August 2018.21BLMW11] Peter B¨urgisser, J.M. Landsberg, Laurent Manivel, and Jerzy Weyman. An overviewof mathematical issues arising in the Geometric complexity theory approach to VP v.s.VNP.
SIAM J. Comput. , 40(4):1179–1209, 2011.[BM08] J.A. Bondy and U.S.R Murty.
Graph Theory . Springer Publishing Company, Incorpo-rated, 2008.[BOC88] Michael Ben-Or and Richard Cleve. Computing algebraic formulas using a constantnumber of registers.
Proceedings 20th Annual ACM Symposium on Theory of Computing1988 , pages 554–257, 1988.[B¨ur00] Peter B¨urgisser.
Completeness and Reduction in Algebraic Complexity Theory . SpringerBerlin Heidelberg, Berlin, Heidelberg, 2000.[B¨ur01] Peter B¨urgisser. The complexity of factors of multivariate polynomials. In , pages 378–385. IEEE Computer Soc., Los Alamitos, CA, 2001.[FMST19] Herv´e Fournier, Guillaume Malod, Maud Szusterman, and S´ebastien Tavenas. Nonneg-ative Rank Measures and Monotone Algebraic Branching Programs. In Arkadev Chat-topadhyay and Paul Gastin, editors, , volume 150 of
Leibniz International Proceedings in Informatics (LIPIcs) , pages 15:1–15:14, Dagstuhl,Germany, 2019. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.[For16] Michael Forbes. Some concrete questions on the Border Complexity of polynomials,2016. Talk at the Workshop on Algebraic Complexity Theory (WACT) 2016 in TelAviv.[Ges16] Fulvio Gesmundo. Geometric aspects of iterated matrix multiplication.
Journal ofAlgebra , 461:42 – 64, 2016.[GIP17] Fulvio Gesmundo, Christian Ikenmeyer, and Greta Panova. Geometric complexity the-ory and matrix powering.
Differential Geometry and its Applications , 55:106 –6 127,2017. Geometry and complexity theory.[GMQ16] Joshua A. Grochow, Ketan D. Mulmuley, and Youming Qiao. Boundaries of VP andVNP. In Ioannis Chatzigiannakis, Michael Mitzenmacher, Yuval Rabani, and DavideSangiorgi, editors, , volume 55 of
Leibniz International Proceedings in Informat-ics (LIPIcs) , pages 34:1–34:14, Dagstuhl, Germany, 2016. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.[Gre11] Bruno Grenet. An upper bound for the permanent versus determinant problem. Ac-cepted for
Theory of Computing, 2011.
[HL16] Jesko Hüttenhain and Pierre Lairez. The boundary of the orbit of the 3-by-3 determinant polynomial.
Comptes Rendus Mathematique, 354(9):931–935, 2016.
[HS80] Joos Heintz and Malte Sieveking. Lower bounds for polynomials with algebraic coefficients.
Theoretical Computer Science, 11(3):321–330, 1980.
[KNST18] Neeraj Kayal, Vineet Nair, Chandan Saha, and Sébastien Tavenas. Reconstruction of full rank algebraic branching programs.
ACM Trans. Comput. Theory, 11(1), November 2018.
[Kum18] Mrinal Kumar. On top fan-in vs formal degree for depth-3 arithmetic circuits. https://eccc.weizmann.ac.il/report/2018/068/revision/1/download, 2018.
[Kum19] Mrinal Kumar. A quadratic lower bound for homogeneous algebraic branching programs. computational complexity, 28(3):409–435, 2019.
[Lan15] J. M. Landsberg. Geometric complexity theory: an introduction for geometers. ANNALI DELL’UNIVERSITA’ DI FERRARA, 61(1):65–117, May 2015.
[Lan17] J. M. Landsberg.
Geometry and Complexity Theory. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2017.
[MP08] Guillaume Malod and Natacha Portier. Characterizing Valiant’s algebraic complexity classes.
Journal of Complexity, 24(1):16–38, 2008.
[MS01] K.D. Mulmuley and M. Sohoni. Geometric Complexity Theory. I. An approach to the P vs. NP and related problems.
SIAM J. Comput., 31(2):496–526 (electronic), 2001.
[MS08] K.D. Mulmuley and M. Sohoni. Geometric Complexity Theory. II. Towards explicit obstructions for embeddings among class varieties.
SIAM J. Comput., 38(3):1175–1206, 2008.
[Mum95] D. Mumford.
Algebraic geometry. I: Complex projective varieties. Classics in mathematics. Springer-Verlag, Berlin, 1995. Reprint of the 1976 edition in Grundlehren der mathematischen Wissenschaften, vol. 221.
[MV97] Meena Mahajan and V. Vinay. Determinant: combinatorics, algorithms, and complexity.
Chicago J. Theoret. Comput. Sci., pages Article 5, 26 pp. (electronic), 1997.
[Nis91] Noam Nisan. Lower bounds for non-commutative computation. In
Proceedings of the twenty-third annual ACM symposium on Theory of computing, pages 410–418. ACM, 1991.
[Sch77] Claus-Peter Schnorr. Improved lower bounds on the number of multiplications/divisions which are necessary to evaluate polynomials. In
International Symposium on Mathematical Foundations of Computer Science, pages 135–147. Springer, 1977.
[Sri19] Srikanth Srinivasan. Strongly exponential separation between monotone VP and monotone VNP. arXiv:1903.01630, 2019.
[Str74] Volker Strassen. Polynomials with rational coefficients which are hard to compute.
SIAM Journal on Computing, 3(2):128–149, 1974.
[Tod92] Seinosuke Toda. Classes of Arithmetic Circuits Capturing the Complexity of the Determinant.
IEICE TRANS. INF. & SYST., E75-D(1):116–124, 1992.
[Val79] L. G. Valiant. Completeness classes in algebra. In
Conference Record of the Eleventh Annual ACM Symposium on Theory of Computing (Atlanta, Ga., 1979), pages 249–261. ACM, New York, 1979.
[Yeh19] Amir Yehudayoff. Separating monotone VP and VNP. In