Simple Reductions from Formula-SAT to Pattern Matching on Labeled Graphs and Subtree Isomorphism
arXiv preprint, cs.CC, August
Daniel Gibney∗, Gary Hoppenworth†, Sharma V. Thankachan‡

Abstract
The CNF formula satisfiability problem (CNF-SAT) has been reduced to many fundamental problems in P to prove tight lower bounds under the Strong Exponential Time Hypothesis (SETH). Recently, the works of Abboud, Hansen, Vassilevska W. and Williams (STOC16), and later, Abboud and Bringmann (ICALP18) have proposed basing lower bounds on the hardness of general boolean formula satisfiability (Formula-SAT). Reductions from Formula-SAT have two advantages over the usual reductions from CNF-SAT: (1) conjectures on the hardness of Formula-SAT are arguably much more plausible than those of CNF-SAT, and (2) these reductions give consequences even for logarithmic improvements in a problem's upper bounds.

Here we give tight reductions from Formula-SAT to two more problems: pattern matching on labeled graphs (PMLG) and subtree isomorphism. Previous reductions from Formula-SAT were to sequence alignment problems such as Edit Distance, LCS, and Fréchet Distance and required some technical work. This paper uses ideas similar to those used previously, but in a decidedly simpler setting, helping to illustrate the most salient features of the underlying techniques.

∗ Dept. of CS, University of Central Florida, Orlando, USA.
† Dept. of CS, University of Central Florida, Orlando, USA.
‡ Dept. of CS, University of Central Florida, Orlando, USA.

1 Introduction and Related Work
The Strong Exponential Time Hypothesis (SETH) has proven to be a powerful tool in establishing conditional lower bounds for many problems with known polynomial-time solutions. However, recent work by Abboud, Hansen, Vassilevska W., and Williams [3], as well as Abboud and Bringmann [2], has sought to use the hardness of general Formula-SAT problems as the basis for fine-grained conditional lower bounds, rather than CNF-SAT and SETH. Since general Formula-SAT contains within it all CNF-SAT instances, Formula-SAT is at least as hard as CNF-SAT. Additionally, when basing conditional lower bounds on Formula-SAT rather than CNF-SAT, the same algorithmic breakthroughs that previously would have violated SETH now have far more remarkable consequences (see Section 1.2 for examples). This makes it plausible that conjectures based on the hardness of Formula-SAT are more likely to hold than those based on the hardness of CNF-SAT.

Aside from a plausible increase in the robustness of the conjectures, using Formula-SAT as a starting point has the advantage of allowing for tighter hardness results. Previous lower bounds based on SETH have been effective in establishing results of the form: an algorithm running in time $O(n^{c-\varepsilon})$ for some $\varepsilon > 0$, where the best-known solution has time complexity $\tilde{O}(n^{c})$, would violate SETH. Despite this success, SETH has proven less effective at establishing tighter fine-grained hardness results regarding how many logarithmic factors can be shaved. In fact, the impossibility of proving such a hardness result via fine-grained reductions from CNF-SAT was proven in [2]. Overcoming this by using Formula-SAT as a starting point, conditional lower bounds of this form were established in [3] for Edit Distance and Longest Common Subsequence (LCS). In [2], the results on LCS were further extended to show that an $O(n^{2}/\log^{7+\varepsilon} n)$ time solution for LCS would imply major breakthroughs in circuit complexity. As a final example, work in [28] uses reductions from Formula-SAT to analyze which regular expression matching problems can have super-polylog factors shaved from their time complexity, and which cannot.

In this work, we will use Formula-SAT to establish hardness results similar to those listed above, but for two additional fundamental problems, Pattern Matching on Labeled Graphs (PMLG) and Subtree Isomorphism. We describe these problems next.

Pattern Matching on Labeled Graphs (PMLG).
Given an alphabet $\Sigma$, a labeled graph $G$ is a triplet $(V, E, L)$, where $(V, E)$ corresponds to the vertices and edges of a graph, and $L : V \to \Sigma^{+}$ is a function that assigns a nonempty string (i.e., label) over $\Sigma$ to each vertex in $G$. For any string $S$, we use $S[..\ell]$ to denote its prefix ending at $\ell$ and $S[\ell..]$ to denote its suffix starting at $\ell$. We say that a pattern $P$ occurs in $G$ if there is a path $v_1, v_2, \ldots, v_m$ in $G$ such that $L(v_1)[\ell..] \circ L(v_2) \circ \cdots \circ L(v_m)[..\ell']$ equals $P$ for some $\ell, \ell'$. Given a labeled graph $G$ and a pattern $P$, the PMLG problem is to decide if there exists an occurrence of $P$ in $G$.

The PMLG problem began being intensely studied roughly thirty years ago in the context of alignment of strings (equivalent to approximate matching under edits, mismatches, etc.) in hypertext. This was initiated by Manber and Wu [19] and underwent several improvements [4, 5, 21, 22]. In the case where changes are allowed in the pattern, but not in the graph, the best-known algorithm runs in time $O(|V| + |E||P|)$, matching the time complexity of the dynamic programming solution of the exact problem, and is by Rautiainen and Marschall [24]. In the case where changes are allowed in the graph as well, the problem is NP-complete [5], even for a binary alphabet [15]. The work by Equi et al. in [11] established SETH-based lower bounds for exact matching.

Subtree Isomorphism.
Given two trees $T_1$ and $T_2$, is $T_1$ contained in $T_2$? This problem has been the subject of extensive study [9, 17, 18, 25, 31, 33], much of this research dating back several decades. For general trees, both with at most $n$ vertices, the currently best known solution has a time bound that is $O(n^{\omega})$, where $\omega$ is the exponent of fast matrix multiplication [31]; for rooted, constant maximum degree trees it is $O(n^{2}/\log n)$ [17]; and for ordered trees it is $O(n \log n)$ [10]. Here we will be considering rooted trees with constant maximum degree. In terms of lower bounds, SETH-based quadratic lower bounds for this version of the problem have been established in [1], even for binary rooted trees.

Road Map.
We will first describe the Formula-SAT problem and deMorgan formulas in more detail. Following this, we will state our results for PMLG and Subtree Isomorphism in terms of their implications for solving Formula-SAT, along with the resulting corollaries. Section 2 provides the reduction from Formula-SAT to PMLG. The reduction to Subtree Isomorphism is in Section 3. Finally, in Section 4 we discuss the similar themes and techniques that appear in both of these reductions.

deMorgan Formulas.
For our purposes, we define a deMorgan formula over $n$ Boolean input variables as a rooted binary tree where each leaf node represents an input variable or its negation, and every internal node represents a logical operator from the set $\{\wedge, \vee\}$. Leaf nodes will be called input gates, and internal nodes will be called AND/OR gates. For a given bit assignment $x$, we define $F(x)$ as the binary value output at the root of $F$ when the input bits are propagated from the leaves to the root of $F$. The size of the formula, which we will denote as $s$, is defined as the number of leaves in the tree.

Problem 1 (Formula-SAT). Given a deMorgan formula $F$ of size $s$ over $n$ inputs, does there exist an input $x \in \{0, 1\}^{n}$ such that $F(x) = 1$?

The set of all Formula-SAT instances contains within it all CNF-SAT instances. Unsurprisingly, due to its generality, it appears harder to derive efficient solutions for Formula-SAT. For CNF-SAT there exist ever-improving upper bounds [6, 12, 20, 23, 26, 29]. There also exist upper bounds for more general circuits such as ours; however, these work by restricting some parameter of the circuit, often some combination of the size, depth, and type of gates used within it (see for example [7, 13, 14, 27, 30, 32]).
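To make the definitions above concrete, the following is a minimal sketch of a deMorgan formula as a nested tuple, together with the naive $2^{n}$ satisfiability check. The tuple encoding and the function names (`size`, `evaluate`, `formula_sat`) are assumptions made for illustration and do not come from the paper.

```python
from itertools import product

# Hypothetical encoding: a leaf is ("var", i, negated); an internal gate is
# ("and", left, right) or ("or", left, right).

def size(f):
    """Size of the formula = number of leaves (input gates)."""
    return 1 if f[0] == "var" else size(f[1]) + size(f[2])

def evaluate(f, x):
    """Propagate the bit assignment x from the leaves to the root."""
    if f[0] == "var":
        return bool(x[f[1]]) != f[2]  # flip the bit if the leaf is negated
    left, right = evaluate(f[1], x), evaluate(f[2], x)
    return (left and right) if f[0] == "and" else (left or right)

def formula_sat(f, n):
    """Naive Formula-SAT: try all 2^n assignments."""
    return any(evaluate(f, x) for x in product((0, 1), repeat=n))

# (x0 OR x1) AND (NOT x0 OR NOT x1): satisfiable, size 4.
xor_like = ("and",
            ("or", ("var", 0, False), ("var", 1, False)),
            ("or", ("var", 0, True), ("var", 1, True)))
# x0 AND NOT x0: unsatisfiable, size 2.
contradiction = ("and", ("var", 0, False), ("var", 0, True))
```

The exhaustive loop in `formula_sat` is the baseline that the hardness hypotheses discussed in this paper are measured against.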
Our reduction will create an instance of PMLG (or Subtree Isomorphism) from a given instance of Formula-SAT. In doing so, we make explicit the roles that the size of the circuit $s$ and the number of inputs $n$ play in determining the size of the resulting instance.

Theorem 1. A Formula-SAT instance of size $s$ on $n$ inputs can be reduced to an instance of PMLG over a binary alphabet with a graph $G = (V, E)$ and pattern $P$ such that $|P|$ is of size $O(2^{n/2} \cdot s)$ and $|E|$ is of size $O(2^{n/2} \cdot s^{2})$ in $O(|E|)$ time, where $G$ is a DAG with maximum total degree three. (Total degree is in-degree plus out-degree.)
Theorem 2. A Formula-SAT instance of size $s$ on $n$ inputs can be reduced to an instance of Subtree Isomorphism on two binary trees $T_1$ and $T_2$, where the size of $T_1$ is $O(2^{n/2} \cdot s)$, and the size of $T_2$ is $O(2^{n/2} \cdot s^{2})$, in $O(|T_2|)$ time.

Combining Theorems 1 and 2 with observations made by Abboud et al. in [3] (and restated in Appendix A), we obtain the following 'breakthrough' implications of a strongly subquadratic time algorithm for PMLG or Subtree Isomorphism. Proofs are deferred to Appendix A.
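As an informal illustration of why the instance sizes in Theorems 1 and 2 matter, one can multiply them out. The calculation below is only a sketch: the constant $\delta > 0$ and the running time $O((|E||P|)^{1-\delta})$ are assumptions chosen for illustration, and are not taken from the paper's formal definition of "strongly subquadratic" in Appendix A.

```latex
\begin{align*}
|E| \cdot |P| &= O\!\left(2^{n/2} s^{2}\right) \cdot O\!\left(2^{n/2} s\right)
               = O\!\left(2^{n} s^{3}\right), \\
\left(|E| \cdot |P|\right)^{1-\delta}
              &= O\!\left(2^{(1-\delta) n} \, s^{3(1-\delta)}\right).
\end{align*}
```

Under this assumption, and for $s$ polynomial in $n$, the resulting bound is $2^{(1-\delta)n} \cdot \mathrm{poly}(n)$, which would beat exhaustive search over all $2^{n}$ assignments. The same product $O(2^{n} s^{3})$ arises from $|T_1| \cdot |T_2|$ in Theorem 2.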
Corollary 1. The existence of a strongly subquadratic time algorithm for PMLG (or Subtree Isomorphism) would imply the class $\mathsf{E}^{\mathsf{NP}}$ (1) does not have non-uniform $2^{o(n)}$-size Boolean formulas and (2) does not have non-uniform $o(n)$-depth circuits of bounded fan-in. It also implies that $\mathsf{NTIME}[2^{O(n)}]$ is not in non-uniform $\mathsf{NC}$.

The second corollary gives the consequences of being able to shave arbitrarily many logarithmic factors from the quadratic time complexity.
Corollary 2. If PMLG (or Subtree Isomorphism) can be solved in time $O\!\left(\frac{|E||P|}{\log^{c}|E|}\right)$ or $O\!\left(\frac{|E||P|}{\log^{c}|P|}\right)$ (respectively $O\!\left(\frac{|T_1||T_2|}{\log^{c}|T_1|}\right)$ or $O\!\left(\frac{|T_1||T_2|}{\log^{c}|T_2|}\right)$) for all $c = \Theta(1)$, then $\mathsf{NTIME}[2^{O(n)}]$ does not have non-uniform polynomial-size log-depth circuits.

In fact, we can give a particular constant $c$ for which shaving a $\log^{c} n$ factor would yield surprising new results in complexity theory. The following log-sensitive lower bounds leave a huge gap from the best known upper bounds; we present these corollaries purely for instructive purposes.

Hardness of Shaving Log Factors.
We work under the Word-RAM model and limit the set of constant-time primitive operations to those operations which are robust to change in word size. Specifically, suppose we are given a word size of $w = \Theta(\log n)$ and an operation that can be performed in $O(1)$ time. We stipulate that we must be able to simulate this operation on words of size $W = \Theta(2^{w})$ in time $n^{o(1)}$. This is a reasonable assumption that is satisfied by many constant-time operations such as addition, subtraction, multiplication, and division with remainder. See [2] for a detailed discussion.

The following hypothesis was suggested by Abboud and Bringmann in [2]. It reflects the fact that the best known algorithmic solutions to Formula-SAT fail to provide a time complexity better than the naïve solution on formulas of size $s = n^{3+\Omega(1)}$.

Hypothesis 1 ([2]). There is no algorithm that can solve SAT on deMorgan formulas of size $s = n^{3+\Omega(1)}$ in $O(2^{n}/n^{\varepsilon})$ time for some $\varepsilon > 0$ in the Word-RAM model.

Corollary 3.
Hypothesis 1 is false if PMLG (respectively Subtree Isomorphism) can be solved intime O (cid:16) | E || P | log ε | E | (cid:17) or O (cid:16) | E || P | log ε | P | (cid:17) , (respectively O (cid:16) | T || T | log ε | T | (cid:17) or O (cid:16) | T || T | log ε | T | (cid:17) ) for any ε > .Proof. We show the proof for PMLG; the proof for Subtree Isomorphism is identical. By Theorem1, an O ( | E || P | log ε | E | ) algorithm for PMLG can be converted to yield an algorithm running in n o (1) · As observed by Williams in [34], for deMorgan formulas of size n − o (1) there exists a randomized 2 n − n Ω(1) time,zero error algorithm which can be obtained by applying results from [8] and [16]. n/ · s )(2 n/ s )log ε (2 n/ · s ) = O (cid:16) n · s n ε (cid:17) time for Formula-SAT (note the n o (1) factor introduced when movingfrom a word size of Θ(log n ) to Θ( n )). If we choose s = n ε/ then this yields an algorithm forFormula-SAT of time O ( n n ε/ ), and Hypothesis 1 is false.Again thanks to results highlighted by Abboud et al. in [3], we can also say the following aboutshaving a constant number of logarithmic factors from the quadratic time complexity. The proofis deferred to Appendix A. Corollary 4. E NP cannot be computed by non-uniform formulas of cubic size if PMLG (respec-tively Subtree Isomorphism) can be solved in time O (cid:16) | E || P | log ε | E | (cid:17) or O (cid:16) | E || P | log ε | P | (cid:17) (respectively O (cid:16) | T || T | log ε | T | (cid:17) or O (cid:16) | T || T | log ε | T | (cid:17) ) for any ε > . The same hardness results for PMLG apply for several more specific types of graphs (details willbe presented in the full version of this paper). These include when the graph G is a deterministicDAG (at most one edge leaves a vertex with the same leading character on an edge label) of totaldegree at most 3, and the case when G is a directed or undirected planar graph of degree at most3. 
Our reduction from Formula-SAT to PMLG uses an intermediate problem called Formula-Pair.

Definition 1 (Formula-Pair). Given a deMorgan formula $F = F(x_1, \ldots, x_m, y_1, \ldots, y_m)$ of size $2m$ where each input is used exactly once, and two sets $A, B \subseteq \{0, 1\}^{m}$ each of size $N$, does there exist $a \in A$ and $b \in B$ such that $F(a, b) = F(a_1, \ldots, a_m, b_1, \ldots, b_m) = 1$?

The role Formula-Pair plays in our reduction is analogous to the role of the Orthogonal Vectors Problem in many SETH reductions. It was proven in [2] that an instance of Formula-SAT on a formula of size $s$ over $n$ inputs can be reduced to an instance of Formula-Pair on two sets of size $N = O(2^{n/2})$ and a formula of size $O(s)$ in linear time (in particular, they reduce from a harder problem they call $F$-Formula-SAT). Note that we may assume that $F$ contains no input gates with negated binary variables, since if variable $x_i$ is negated in $F$, we can flip bit $a_i$ for all $a \in A$.

We begin our reduction from Formula-Pair to PMLG by considering a formula $F$ and some input bit assignments $a \in A$ and $b \in B$. We then construct a pattern $P$ and labeled graph $G$ such that $P$ occurs in $G$ if and only if together $a$ and $b$ satisfy $F$. In this step, we must ensure that our construction of $P$ only relies on the input bit assignments of $a$, and our construction of $G$ only relies on the input bit assignments of $b$. This allows us to create patterns $P_1, P_2, \ldots, P_N$ corresponding to the $N$ bit assignments in $A$, and graphs $G_1, G_2, \ldots, G_N$ corresponding to the $N$ bit assignments in $B$. Then we will have that $P_i$ occurs in $G_j$ if and only if $F(a, b) = 1$, where $a \in A$ is the bit assignment corresponding to $P_i$, and $b \in B$ is the bit assignment corresponding to $G_j$. Finally, we combine these patterns and graphs into a product pattern $P$ and a product graph $G$ such that $P$ occurs in $G$ if and only if some $P_i$ occurs in some $G_j$.
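For reference, Formula-Pair itself can be decided by trying all $N^{2}$ pairs, in the same way Orthogonal Vectors can. The sketch below shows that baseline. The tuple encoding of $F$ (with leaves `("x", i)` and `("y", j)`) and the function names are hypothetical choices made for this illustration.

```python
def evaluate(f, a, b):
    """Evaluate F(a, b); leaves ("x", i) read a[i], leaves ("y", j) read b[j]."""
    if f[0] == "x":
        return bool(a[f[1]])
    if f[0] == "y":
        return bool(b[f[1]])
    left, right = evaluate(f[1], a, b), evaluate(f[2], a, b)
    return (left and right) if f[0] == "and" else (left or right)

def formula_pair(f, A, B):
    """Return a witness pair (a, b) with F(a, b) = 1, or None if none exists."""
    for a in A:
        for b in B:
            if evaluate(f, a, b):
                return a, b
    return None

# F = (x1 AND y1) OR (x2 AND y2) with m = 2, written with 0-based indices.
F = ("or",
     ("and", ("x", 0), ("y", 0)),
     ("and", ("x", 1), ("y", 1)))
A = [(1, 0), (0, 0)]
B = [(0, 1), (1, 1)]
```

With these inputs, the pair $a = (1, 0)$, $b = (1, 1)$ satisfies $F$ through the first AND gate, while restricting $B$ to the single vector $(0, 1)$ leaves no satisfying pair.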
Combining the patterns and graphs in this way will complete the reduction.

2.2 Reduction

Given a deMorgan formula $F$ and a complete assignment of input bits $(a, b)$ where $a \in A$ and $b \in B$, we will construct a corresponding pattern $P$ and labeled DAG $G$ over alphabet $\{0, 1, \$\}$ such that $P$ occurs in $G$ if and only if the output of $F$ is 1 on input $(a, b)$. This pattern and graph will be built recursively, starting with the input gates as a base case. For a gate $g = (g_1 * g_2)$ where $* \in \{\vee, \wedge\}$, we will construct a corresponding pattern and graph for gate $g$ by merging the patterns and graphs of subgates $g_1$ and $g_2$. At each step in this process, the pattern corresponding to gate $g$ occurs in the graph corresponding to gate $g$ if and only if $g$ evaluates to 1 on input $(a, b)$.

Invariants.
We will maintain the following invariants during this recursive procedure. Let $g$ be a gate of $F$ with height $h$, and let $P$ and $G$ be the pattern and graph corresponding to gate $g$ in our construction.

1. Graph $G$ will have a designated source vertex and sink vertex, both with label "1". Every maximal path in $G$ will be of length $|P|$ and start and end at the source and sink vertices of $G$ respectively.
2. The construction of pattern $P$ is independent of the choice of bit assignment $b \in B$, and the construction of graph $G$ is independent of the choice of bit assignment $a \in A$.
3. Pattern $P$ occurs in $G$ if and only if $g$ has output 1 on input $(a, b)$.

Observe that by the first invariant, every occurrence of pattern $P$ in graph $G$ will start at the source vertex of $G$ and end at the sink vertex of $G$. If this is the case, we will say that $G$ matches $P$. We will also refer to the designated source and sink vertices of $G$ as the start and end vertices of $G$.

[Figure 1: From left to right: (a) the graph constructed for gate $g = (g_1 \wedge g_2)$, (b) the graph constructed for gate $g = (g_1 \vee g_2)$, and (c) the Universal Subgraph $U(x)$.]

Input Gate. Each input gate $g$ in $F$ takes as input a binary variable $z$. We will design a graph $G$ and pattern $P$ such that $G$ matches $P$ if and only if $z$ has value 1 in bit assignment $(a, b)$, and hence $g$ evaluates to 1. Our construction depends on whether $z$ corresponds to an input bit in $a$ or $b$.

• Case 1. $z$ corresponds to some $a_i \in a$. We let $P := 1 a_i 1$ and $G$ be a path of length three with all vertices labeled 1.
• Case 2. $z$ corresponds to some $b_i \in b$. We let $P := 111$ and $G$ be a path of length three with the first and last vertex labeled 1 and the middle vertex labeled $b_i$.

The start vertex of $G$ will be the first vertex in the path, and the end vertex of $G$ will be the third (last) vertex in the path. Then our graph $G$ matches pattern $P$ if and only if $z = 1$ and thus the input gate evaluates to true. Additionally, the construction of $P$ does not depend on $b$ and the construction of $G$ does not depend on $a$. All invariants are satisfied.

AND Gate.
Given a gate $g = (g_1 \wedge g_2)$ and the graphs and patterns $(G_1, P_1)$ and $(G_2, P_2)$ corresponding to gates $g_1$ and $g_2$ respectively, we must construct a product graph $G$ and pattern $P$ such that $G$ matches $P$ if and only if $G_1$ matches $P_1$ and $G_2$ matches $P_2$. This is done rather easily. Let $P := 1 P_1 P_2 1$. Now let our product graph $G$ be defined as in Figure 1.a. Our start vertex is labeled 1 and has an outgoing edge to the start vertex of subgraph $G_1$. The end vertex of $G_1$ in turn has an outgoing edge to the start vertex of subgraph $G_2$, whose own end vertex has an outgoing edge to the final vertex of $G$. We now verify all invariants are satisfied.

• Invariant 1. We assume that every maximal path in $G_1$ (respectively $G_2$) is of length $|P_1|$ (respectively $|P_2|$). Then by the construction of $P$ and $G$, every maximal path in $G$ is of length $|P|$. The invariant is maintained.
• Invariant 2. Assuming that the construction of $P_1$ and $P_2$ is independent of $b$, and the construction of $G_1$ and $G_2$ is independent of $a$, it follows that the construction of pattern $P$ is independent of bit assignment $b$, and the construction of graph $G$ is independent of bit assignment $a$.
• Invariant 3. Since every occurrence of $P$ in $G$ starts at the start vertex of $G$ and ends at the end vertex, we must conclude that $P$ occurs in $G$ if and only if $P_1$ occurs in $G_1$ and $P_2$ occurs in $G_2$. Then by our invariant $P$ occurs in $G$ if and only if $g$ evaluates to 1 on input $(a, b)$. The invariant is preserved.

OR Gate.
Given a gate $g = (g_1 \vee g_2)$ and the graphs and patterns $(G_1, P_1)$ and $(G_2, P_2)$ corresponding to gates $g_1$ and $g_2$ respectively, we must construct a product graph $G$ and pattern $P$ such that $G$ matches $P$ if and only if $G_1$ matches $P_1$ or $G_2$ matches $P_2$. As with our AND gate, we let $P := 1 P_1 P_2 1$. Our product graph $G$ (see Figure 1.b) splits into two branches. One branch checks if $G_1$ matches $P_1$ and ignores $P_2$, while the other branch checks if $G_2$ matches $P_2$ and ignores $P_1$. We are able to ignore $P_2$ (respectively $P_1$) by constructing a 'universal' subgraph that matches all binary strings that start and end with 1 and are of length $|P_2|$ (respectively $|P_1|$). We let $U(x)$ denote the universal subgraph for length $x$, and we depict our construction of $U(x)$ in Figure 1.c. Observe that graphs $U(|P_1|)$ and $U(|P_2|)$ match $P_1$ and $P_2$ respectively. We now check that all invariants are satisfied.

• Invariant 1. A similar argument as in the AND gate shows that every maximal path in $G$ is of length $|P|$ and passes through the start and end vertices of $G$. The invariant is preserved.
• Invariant 2. Pattern $P$ is independent of bit assignment $b$ by a similar argument as with the AND gate construction. However, for our graph $G$, we must verify that subgraphs $U(|P_1|)$ and $U(|P_2|)$ of $G$ do not depend on bit assignment $a$. This will follow from proving that the lengths of patterns $P_1$ and $P_2$ do not depend on the bit assignment $a$. Note that in each of the input, AND, and OR gate constructions, the length of the constructed pattern is the same regardless of the bit assignment $a$. Thus we conclude that $U(|P_1|)$ and $U(|P_2|)$ are independent of the bit assignment $a$, and therefore the construction of graph $G$ is independent of the bit assignment $a$.
• Invariant 3.
Since every occurrence of pattern $P$ starts at the start vertex of $G$ and ends at the end vertex, it is immediate that $G$ matches $P$ if and only if $G_1$ matches $P_1$ or $G_2$ matches $P_2$. It then follows from our invariant that $G$ matches $P$ if and only if gate $g = (g_1 \vee g_2)$ evaluates to 1 on input $(a, b)$.

Now, corresponding to our formula $F$ of size $s$ and a complete assignment of input bits $(a, b)$, we can build a pattern $P$ and a graph $G$ such that $G$ matches $P$ if and only if assignment $(a, b)$ satisfies $F$. Note that we only add a constant number of symbols to our pattern $P$ for each gate in $F$, and there are fewer than $2s$ gates in $F$, so $|P| = O(s)$. On the other hand, each OR gate in $F$ can contribute $O(|P|)$ vertices and edges to our final graph $G$. It follows that $G$ is of size $O(s^{2})$.

Using our construction, for every $a \in A$ we may construct a corresponding pattern $P$, and for every $b \in B$ we may construct a corresponding graph $G$. We will denote these patterns and graphs by $P_1, P_2, \ldots, P_N$ and $G_1, G_2, \ldots, G_N$ respectively. Note that each pattern $P_j$ makes no assumptions on the bit assignment $b$, and graph $G_i$ makes no assumptions on the bit assignment $a$. It follows that $G_i$ matches $P_j$ if and only if together the corresponding bit assignments $a \in A$ and $b \in B$ satisfy $F$.

Next, we construct a final graph $G$ and pattern $P$ such that $P$ occurs in $G$ if and only if some $G_i$ matches some $P_j$. This will complete our reduction. We define our final pattern $P$ as follows: $P := \$\$ P_1 \$ P_2 \$ \cdots \$ P_N \$\$$. The structure of our final graph $G$ is similar to the final graph presented in [11]. We present this graph in Figure 2 and briefly explain the intuition behind it. Let $\mu = |P_i|$ for any $i$. Then subgraph $U(\mu)$ will match any subpattern $P_i$ in $P$. The graph $G$ uses $U(\mu)$ to match the subpatterns $P_i$ in $P$ that do not match with any $G_j$. Note that since pattern $P$ has a prefix of two \$ symbols and a suffix of two \$ symbols, $P$ is forced to pass through the second row of $G$. More specifically, the first row of $G$ alone cannot match the \$\$ suffix of $P$, and the third row of $G$ alone cannot match the \$\$ prefix of $P$. Then it can be seen that $P$ occurs in $G$ only if $P$ passes through the second row of $G$, and hence some subgraph $G_i$ matches some subpattern $P_j$. Then by construction, $P$ occurs in $G$ if and only if there exist $a \in A$ and $b \in B$ such that $F(a, b) = 1$. Furthermore, our final graph is a DAG of size $O(N \cdot s^{2})$ and our final pattern $P$ is of length $O(N \cdot s)$. This completes our reduction from Formula-SAT to PMLG on DAGs.

[Figure 2: Our final graph $G$. Here $\mu = |P_i|$.]

We begin our reduction from Formula-Pair to Subtree Isomorphism by considering a formula $F$ and some input bit assignments $a \in A$ and $b \in B$. We then construct trees $T_a$ and $T_b$ such that $T_a$ is contained in $T_b$ if and only if together $a$ and $b$ satisfy $F$. In this step it is important that we ensure that our construction of $T_a$ only relies on the input bit assignments of $a$, and our construction of $T_b$ only relies on the input bit assignments of $b$. This allows us to create $N$ trees $T_a$ corresponding to the $N$ bit assignments $a$ in $A$, and $N$ trees $T_b$ corresponding to the $N$ bit assignments $b$ in $B$. Then we will have that some tree $T_a$ is contained in some tree $T_b$ if and only if the corresponding bit assignments $a \in A$ and $b \in B$ satisfy $F(a, b) = 1$. Finally, we combine these trees into two final trees $T_A$ and $T_B$ such that $T_A$ is contained in $T_B$ if and only if some $T_a$ is contained in some $T_b$. This will complete the reduction.

Given a deMorgan formula $F$ and a complete assignment of input bits $(a, b)$ where $a \in A$ and $b \in B$, we will construct the corresponding rooted trees $T_a$ and $T_b$ such that $T_a$ is contained in $T_b$ if and only if $F(a, b) = 1$. These trees will be constructed recursively, starting with the input gates of $F$ as a base case. For a gate $g = (g_1 * g_2)$ where $* \in \{\vee, \wedge\}$, we will construct the corresponding trees $T_a^{g}$ and $T_b^{g}$ for gate $g$ by merging the trees of subgates $g_1$ and $g_2$. At each step in this process, $T_a^{g}$ will be contained in $T_b^{g}$ if and only if gate $g$ has output 1 on input $(a, b)$.

Invariants.
We will maintain the following invariants throughout our construction. Let $g$ be a gate of $F$ with height $h$.

1. The height of $T_a^{g}$ is equal to the height of $T_b^{g}$ and is at most $4h$.
2. The construction of $T_a^{g}$ is independent of the choice of bit assignment $b \in B$, and the construction of $T_b^{g}$ is independent of the choice of bit assignment $a \in A$.
3. Tree $T_a^{g}$ is contained in tree $T_b^{g}$ if and only if gate $g$ has output 1 on input $(a, b)$.

[Figure 3: The trees $T_a^{g}$ and $T_b^{g}$ corresponding to input gate $g = a_i$ or $g = b_j$, for each of the cases $a_i = 0$, $a_i = 1$, $b_j = 0$, $b_j = 1$.]

[Figure 4: The trees $T_a^{g}$ (top) and $T_b^{g}$ (bottom) corresponding to AND gate $g = (g_1 \wedge g_2)$.]

Input Gate. Given an input gate $g$ corresponding to a bit value $a_i \in a$ (respectively, a bit value $b_j \in b$), we will construct trees $T_a^{g}$ and $T_b^{g}$ so that $T_a^{g}$ is contained in $T_b^{g}$ if and only if $a_i = 1$ (respectively, $b_j = 1$). We construct $T_a^{g}$ and $T_b^{g}$ as in Figure 3. These trees are rooted at vertices $v_a$ and $v_b$ respectively. We define input gates of $F$ to have a height of one, so the trees in Figure 3 satisfy the first invariant. The remaining two invariants can be verified by examining every case of Figure 3.

AND Gate.
Given a gate $g = (g_1 \wedge g_2)$, and the trees $T_a^{1}, T_b^{1}$ and $T_a^{2}, T_b^{2}$ corresponding to gates $g_1$ and $g_2$ respectively, we wish to construct trees $T_a^{g}$ and $T_b^{g}$ so that $T_a^{g}$ is contained in $T_b^{g}$ if and only if gate $g$ has output 1 on input $(a, b)$. By our third invariant it suffices to ensure that $T_a^{g}$ is contained in $T_b^{g}$ if and only if $T_a^{1}$ is contained in $T_b^{1}$ AND $T_a^{2}$ is contained in $T_b^{2}$. We construct trees $T_a^{g}$ and $T_b^{g}$ as in Figure 4. The trees are rooted at their topmost $v_a$ and $v_b$ vertices respectively. We now verify that all invariants are satisfied.

[Figure 5: The trees $T_a^{g}$ (left) and $T_b^{g}$ (right) corresponding to OR gate $g = (g_1 \vee g_2)$.]

• Invariant 1. By our inductive hypothesis tree $T_a^{1}$ has the same height as $T_b^{1}$ and $T_a^{2}$ has the same height as $T_b^{2}$, so it follows from our construction that $T_a^{g}$ has the same height as $T_b^{g}$. Now to see why the height of these trees is at most $4h$, note that subtrees $T_a^{1}, T_b^{1}, T_a^{2}, T_b^{2}$ have height at most $4(h-1)$. Then $T_a^{g}$ and $T_b^{g}$ have height at most $4(h-1) + 4 = 4h$.
• Invariant 2. We assume that the construction of trees $T_a^{1}$ and $T_a^{2}$ is independent of $b$, and the trees $T_b^{1}$ and $T_b^{2}$ are independent of $a$. Then it can be easily verified that tree $T_a^{g}$ does not depend on $b$, and tree $T_b^{g}$ does not depend on $a$.
• Invariant 3. We must show that tree $T_a^{g}$ is contained in tree $T_b^{g}$ if and only if $g$ evaluates to 1 on bit assignment $(a, b)$. By our inductive hypothesis, it suffices to show that $T_a^{g}$ is contained in $T_b^{g}$ if and only if $T_a^{1}$ is contained in $T_b^{1}$ AND $T_a^{2}$ is contained in $T_b^{2}$. The 'if' direction is immediate from our construction: map each vertex $v_a^{i}$ in $T_a^{g}$ to the corresponding vertex $v_b^{i}$ in $T_b^{g}$, and map $T_a^{1}$ and $T_a^{2}$ to subtrees of $T_b^{1}$ and $T_b^{2}$ respectively.
For the 'only if' direction we must prove that subtree $T_a^{1}$ can only map to a subtree of $T_b^{1}$, and subtree $T_a^{2}$ can only map to a subtree of $T_b^{2}$. First note that since trees $T_a^{g}$ and $T_b^{g}$ have the same height, every isomorphism between $T_a^{g}$ and a subtree of $T_b^{g}$ must map the root vertex of $T_a^{g}$ to the root vertex of $T_b^{g}$. Now suppose $T_a^{1}$ is mapped to $T_b^{2}$ in some isomorphism between $T_a^{g}$ and a subtree of $T_b^{g}$. Then one of the vertices $v_a^{i}$ would be mapped to a vertex $v_b^{j}$ under which the path of length two hanging off $v_a^{i}$ would have nowhere to map to (see Figure 4). It immediately follows that in every valid subtree isomorphism, $T_a^{1}$ is mapped to $T_b^{1}$, and $T_a^{2}$ is mapped to $T_b^{2}$. Then $T_a^{g}$ is contained in $T_b^{g}$ if and only if $T_a^{1}$ is contained in $T_b^{1}$ and $T_a^{2}$ is contained in $T_b^{2}$.

OR Gate.
Given an input gate g = ( g ∨ g ), and the trees T a , T b and T a , T b corresponding togates g and g respectively, we will construct trees T ga and T gb so that T ga is contained in T gb if andonly if T a is contained in T b OR T a is contained in T b . We construct trees T ga and T gb as in Figure5. These trees are rooted at vertices v a and v b respectively. Tree T gb contains a subtree U g , whichwe call a universal subtree. We design U g so that it contains both tree T a and tree T a for every bitassignment a . This will allow either T a or T a to match with U g , thus achieving the OR gate logic.We now construct our universal subtree U g . First, observe that for any gate g and any two bitassignments a, a ′ ∈ A , the only difference between trees T ga and T ga ′ is in the input gate subtrees.11 ... T a . . .. . . N ... T a N ... T a N . . .. . . x ... T a N ... U . . .. . . x − ... U x T b . . . NT b N . . . x Figure 6: The final T A (left) and T B (right).There are two different input gate subtrees in T ga : the a i = 0 subtree composed of a root vertex andtwo leaves, and the a i = 1 subtree composed of a root vertex with a single leaf (see Figure 3). Notethat the a i = 0 input subtree contains the a i = 1 input subtree. Then if we define a bit assignment u = 0 m , it follows that for every a ∈ A , the tree T ga is contained within the tree T gu . Then for trees T a and T a we construct trees T u and T u so that T a is contained in T u and T a is contained in T u for all a ∈ A . We define our universal subtree U g as the tree created by merging the root vertex of T u with the root vertex of T u . By construction, this tree U g contains T a and T a for all a ∈ A asintended. We now verify that all invariants are satisfied. • Invariant 1.
This invariant holds by an argument identical to that of the AND gate con-struction. • Invariant 2.
A similar argument as with the AND gate will show that T ga does not dependon bit assignment b . Likewise, tree T gb does not depend on bit assignment a ; the constructionof universal subtree U g is independent of a as detailed in its construction. • Invariant 3.
By our inductive hypothesis, it suffices to show that $T_a^g$ is contained in $T_b^g$ if and only if $T_a^1$ is contained in $T_b^1$ OR $T_a^2$ is contained in $T_b^2$. The 'if' direction can be seen by observing that if $T_a^1$ is contained in $T_b^1$, then we can align $T_a^1$ with $T_b^1$ and align $T_a^2$ with $U^g$, which is guaranteed to contain $T_a^2$; the case where $T_a^2$ is contained in $T_b^2$ is identical.

The 'only if' direction follows from an argument similar to the one given for the AND construction. First note that since trees $T_a^g$ and $T_b^g$ have the same height, every subtree isomorphism must map the root vertex $v_a$ of $T_a^g$ to the root vertex $v_b$ of $T_b^g$. Additionally, it is immediate from the construction that exactly one of the subtrees $T_a^1$ or $T_a^2$ can be aligned with the universal subtree $U^g$. We then only need to verify that there is no valid subtree isomorphism between $T_a^g$ and $T_b^g$ that maps $T_a^1$ to $T_b^2$ or $T_a^2$ to $T_b^1$. Suppose that $T_a^1$ was mapped to a subtree of $T_b^2$ (the other case is symmetric). Then vertex $v_a$ would map to vertex $v_b$, and the path of length two hanging off $v_a$ would have nowhere to map to. We conclude that subtree $T_a^1$ must map to subtree $T_b^1$ or subtree $T_a^2$ must map to subtree $T_b^2$ in any subtree isomorphism from $T_a^g$ to $T_b^g$. The invariant is maintained.

Completing the Reduction

The final trees are constructed using the technique provided in [1]. The construction is shown in Figure 6 and described next.
• For the final tree $T_A$, start with a complete binary tree where the number of leaves is the smallest power of 2 that is greater than or equal to $N$, say $2^x$. From each of the $2^x$ leaves, attach a path of length $x$. Let the first $N$ leaves at the ends of these paths be numbered 1 to $N$. For $1 \le i \le N$, replace leaf $i$ with the root of $T_{a_i}$. For each of the remaining $2^x - N$ leaves at the ends of paths, replace the leaf with the root of a copy of $T_{a_N}$.
• For the final tree $T_B$, again start with a complete binary tree with $2^x$ leaves. From each of the first $2^x - 1$ leaves, attach a path of length $x$. Replace the end of each of these paths with the root of a universal tree $U$, which is $T_a$ with input bit assignment $u = 0^m$. Replace the remaining leaf of the complete binary tree with the root of another complete binary tree, again with $2^x$ leaves. Let the first $N$ leaves of this second complete binary tree be numbered 1 to $N$. For $1 \le i \le N$, replace leaf $i$ with the root of $T_{b_i}$.

To see why this works, consider that for $T_A$ to be isomorphic to a subtree of $T_B$, the root of $T_A$ must be mapped onto the root of $T_B$. Then, one of $T_A$'s $2^x$ paths hanging from the leaves of its complete binary tree must traverse down the lower complete binary tree in $T_B$. From here, the subtree rooted at the end of this path in $T_A$ must be isomorphic to a subtree of one of the trees hanging from the leaves of the second binary tree in $T_B$. This is possible if and only if for some $a \in A$ and $b \in B$ we have that $T_a$ is isomorphic to a subtree of $T_b$. By the invariants proven above, such a pair $a \in A$ and $b \in B$ exists iff the starting formula $F$ evaluates to true on the assignment $(a, b)$.

The final tree $T_A$ is of size $O(Ns)$. This is because there are $N$ trees $T_a$ in $T_A$, and each tree $T_a$ is of size $O(s)$. The upper bound on the size of $T_a$ follows from the fact that formula $F$ has $s$ gates, and each gate contributes constantly many vertices to $T_a$. The final tree $T_B$ is of size $O(Ns^2)$. To see this, fix a particular assignment $(a, b)$, and consider the tree $T_b$. Each AND gate contributes a constant number of vertices to $T_b$. Each OR gate appends a universal subtree $U$ of size at most the size of $T_a$ to $T_b$. Since the size of $T_a$ is $O(s)$ and there are $s$ gates in formula $F$, we have that $T_b$ is of size $O(s^2)$.
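To make the selection step concrete, the construction of $T_A$ and $T_B$ described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the construction of [1] verbatim. Trees are encoded as nested tuples of children, and the helper names (`star`, `path`, `complete`, `build_TA`, `build_TB`, `embeds`) are hypothetical. The containment test is a brute-force root-to-root embedding, not an efficient subtree isomorphism algorithm. The gate gadgets are replaced by toy stars (a star with $k$ leaves embeds in a star with at least $k$ leaves), so the stand-in predicate is $a \le b$ rather than satisfaction of a formula.

```python
from functools import lru_cache
from itertools import permutations

# A rooted tree is a tuple of its child subtrees; () is a single vertex.

def star(k):
    """Toy gadget (an assumption, not a gate gadget from the text): a root with k leaves."""
    return tuple(() for _ in range(k))

def path(length, end):
    """Path with `length` edges whose last vertex is the root of `end`."""
    tree = end
    for _ in range(length):
        tree = (tree,)
    return tree

def complete(depth, leaves):
    """Complete binary tree of the given depth; leaf i is replaced by leaves[i]."""
    level = list(leaves)
    for _ in range(depth):
        level = [(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def build_TA(gadgets_a):
    n = len(gadgets_a)
    x = (n - 1).bit_length()  # smallest x with 2**x >= n
    padded = list(gadgets_a) + [gadgets_a[-1]] * (2 ** x - n)  # copies of the last gadget
    return complete(x, [path(x, g) for g in padded])

def build_TB(gadgets_b, universal):
    n = len(gadgets_b)
    x = (n - 1).bit_length()
    # Second (lower) binary tree: the first n leaves carry the b-gadgets.
    lower = complete(x, list(gadgets_b) + [()] * (2 ** x - n))
    # First 2**x - 1 leaves get a path ending in the universal tree.
    upper_leaves = [path(x, universal)] * (2 ** x - 1) + [lower]
    return complete(x, upper_leaves)

@lru_cache(maxsize=None)
def embeds(p, h):
    """Brute force: root of p maps to root of h; children map injectively, recursively."""
    return any(all(embeds(pc, hc) for pc, hc in zip(p, chosen))
               for chosen in permutations(h, len(p)))

A = [3, 2, 3]                         # toy stand-ins for the partial assignments a_i
T_A = build_TA([star(a) for a in A])
U = star(max(A))                      # toy universal tree: contains every star(a)
print(embeds(T_A, build_TB([star(b) for b in [1, 2, 1]], U)))  # True  (a = 2, b = 2)
print(embeds(T_A, build_TB([star(b) for b in [1, 1, 1]], U)))  # False (no a <= b)
```

In this sketch exactly one path of `T_A` is forced into the lower binary tree of `T_B`, while all other paths are absorbed by copies of the universal tree, which mirrors the selection argument given above. Substituting the gate-by-gate trees for `star` would be needed to obtain the actual reduction.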
The key property highlighted by the two reductions is that both target problems allow for the construction of two independent objects $O_A$ and $O_B$, where $O_A$ is constructed independently of the partial input assignments in $B$, and $O_B$ is constructed independently of the partial input assignments in $A$.

In order to construct these objects, both reductions start by fixing an input assignment $(a, b)$. Then, two new objects for each gate $g$ are constructed using the objects for the circuits that are input into $g$. The aim of this construction is to maintain the invariant that the desired property of our objects (e.g., the pattern occurring in a graph, or having an isomorphic subtree) holds iff $(a, b)$ satisfies the circuit with output gate $g$. This is accomplished by supposing that (i) we are adding the gate $g = g_1 * g_2$ where $* \in \{\wedge, \vee\}$, (ii) the objects $O_a^{g_1}$ and $O_b^{g_1}$ have the desired property iff $(a, b)$ evaluates to true on the circuit with output gate $g_1$, and (iii) the objects $O_a^{g_2}$ and $O_b^{g_2}$ have the desired property iff $(a, b)$ evaluates to true on the circuit with output gate $g_2$. The task is then to construct $O_a^g$ from only $O_a^{g_1}$ and $O_a^{g_2}$, and $O_b^g$ from only $O_b^{g_1}$ and $O_b^{g_2}$, such that $O_a^g$ and $O_b^g$ have the desired property iff $g = g_1 * g_2$ evaluates to true. By the invariant, when $* = \wedge$ this is equivalent to $O_a^{g_1}$ and $O_b^{g_1}$ having the desired property and $O_a^{g_2}$ and $O_b^{g_2}$ having the desired property. In the case of $* = \vee$, only one of the pairs $O_a^{g_1}, O_b^{g_1}$ or $O_a^{g_2}, O_b^{g_2}$ needs to have the property.

In the last step, the final objects $O_A$ and $O_B$ are constructed by combining all $O_{a_i}$, $1 \le i \le N$, to form $O_A$, and all $O_{b_j}$, $1 \le j \le N$, to form $O_B$. These final objects must allow for selection between different partial assignments.
Additionally, the final objects satisfy the desired property iff at least one object pair $O_{a_i}$ and $O_{b_j}$ together satisfy the desired property.

The above outlines, at a high level, the approach used in the reductions from Formula-SAT to polynomial-time problems that appear here and in [2, 28]. The techniques presented in [3] instead start with the problem of the satisfiability of branching programs, but they work similarly in the sense that they must model the logical gates AND and OR (this time connecting logical statements about reachability). The authors also take similar steps in order to build two independent objects based on a fixed input assignment $(a, b)$.

References

[1] A. Abboud, A. Backurs, T. D. Hansen, V. V. Williams, and O. Zamir. Subtree isomorphism revisited. ACM Trans. Algorithms, 14(3):27:1–27:23, 2018.
[2] A. Abboud and K. Bringmann. Tighter connections between formula-sat and shaving logs. In , pages 8:1–8:18, 2018.
[3] A. Abboud, T. D. Hansen, V. V. Williams, and R. Williams. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 375–388, 2016.
[4] T. Akutsu. A linear time pattern matching algorithm between a string and a tree. In Combinatorial Pattern Matching, 4th Annual Symposium, CPM 93, Padova, Italy, June 2-4, 1993, Proceedings, pages 1–10, 1993.
[5] A. Amir, M. Lewenstein, and N. Lewenstein. Pattern matching in hypertext. J. Algorithms, 35(1):82–99, 2000.
[6] T. Brüggemann and W. Kern. An improved local search algorithm for 3-sat. Electron. Notes Discret. Math., 17:69–73, 2004.
[7] R. Chen. Satisfiability algorithms and lower bounds for boolean formulas over finite bases. In Mathematical Foundations of Computer Science 2015 - 40th International Symposium, MFCS 2015, Milan, Italy, August 24-28, 2015, Proceedings, Part II, pages 223–234, 2015.
[8] R. Chen, V. Kabanets, A. Kolokolova, R. Shaltiel, and D. Zuckerman. Mining circuit lower bound proofs for meta-algorithms. Comput. Complex., 24(2):333–392, 2015.
[9] M. Chung. O(n^(2.5)) time algorithms for the subgraph homeomorphism problem on trees. J. Algorithms, 8(1):106–112, 1987.
[10] R. Cole and R. Hariharan. Tree pattern matching to subset matching in linear time. SIAM J. Comput., 32(4):1056–1066, 2003.
[11] M. Equi, R. Grossi, V. Mäkinen, and A. I. Tomescu. On the complexity of string matching for graphs. In C. Baier, I. Chatzigiannakis, P. Flocchini, and S. Leonardi, editors, , volume 132 of LIPIcs, pages 55:1–55:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
[12] T. D. Hansen, H. Kaplan, O. Zamir, and U. Zwick. Faster k-sat algorithms using biased-ppsz. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 578–589, 2019.
[13] R. Impagliazzo, W. Matthews, and R. Paturi. A satisfiability algorithm for ac0. In Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms, pages 961–972. SIAM, 2012.
[14] R. Impagliazzo, R. Paturi, and S. Schneider. A satisfiability algorithm for sparse depth two threshold circuits. In , pages 479–488, 2013.
[15] C. Jain, H. Zhang, Y. Gao, and S. Aluru. On the complexity of sequence to graph alignment. In L. J. Cowen, editor, Research in Computational Molecular Biology - 23rd Annual International Conference, RECOMB 2019, Washington, DC, USA, May 5-8, 2019, Proceedings, volume 11467 of Lecture Notes in Computer Science, pages 85–100. Springer, 2019.
[16] I. Komargodski, R. Raz, and A. Tal. Improved average-case lower bounds for demorgan formula size. In , pages 588–597, 2013.
[17] A. Lingas. An application of maximum bipartite c-matching to subtree isomorphism. In CAAP'83, Trees in Algebra and Programming, 8th Colloquium, L'Aquila, Italy, March 9-11, 1983, Proceedings, pages 284–299, 1983.
[18] A. Lingas and M. Karpinski. Subtree isomorphism is NC reducible to bipartite perfect matching. Inf. Process. Lett., 30(1):27–32, 1989.
[19] U. Manber and S. Wu. Approximate string matching with arbitrary costs for text and hypertext. In Advances In Structural And Syntactic Pattern Recognition, pages 22–33. World Scientific, 1992.
[20] B. Monien and E. Speckenmeyer. Solving satisfiability in less than 2^n steps. Discret. Appl. Math., 10(3):287–295, 1985.
[21] G. Navarro. Improved approximate pattern matching on hypertext. Theor. Comput. Sci., 237(1-2):455–463, 2000.
[22] K. Park and D. K. Kim. String matching in hypertext. In Combinatorial Pattern Matching, 6th Annual Symposium, CPM 95, Espoo, Finland, July 5-7, 1995, Proceedings, pages 318–329, 1995.
[23] R. Paturi, P. Pudlák, M. E. Saks, and F. Zane. An improved exponential-time algorithm for k-sat. J. ACM, 52(3):337–364, 2005.
[24] M. Rautiainen and T. Marschall. Aligning sequences to general graphs in O(V + mE) time. bioRxiv, page 216127, 2017.
[25] S. W. Reyner. An analysis of a good algorithm for the subtree problem. SIAM J. Comput., 6(4):730–732, 1977.
[26] R. Rodosek. A new approach on solving 3-satisfiability. In Artificial Intelligence and Symbolic Mathematical Computation, International Conference AISMC-3, Steyr, Austria, September 23-25, 1996, Proceedings, pages 197–212, 1996.
[27] T. Sakai, K. Seto, S. Tamaki, and J. Teruyama. A satisfiability algorithm for depth-2 circuits with a symmetric gate at the top and AND gates at the bottom. Electronic Colloquium on Computational Complexity (ECCC), 22:136, 2015.
[28] P. Schepper. Fine-grained complexity of regular expression pattern matching and membership. CoRR, abs/2008.02769, 2020.
[29] U. Schöning. A probabilistic algorithm for k-sat based on limited local search and restart. Algorithmica, 32(4):615–623, 2002.
[30] K. Seto and S. Tamaki. A satisfiability algorithm and average-case hardness for formulas over the full binary basis. Comput. Complex., 22(2):245–274, 2013.
[31] R. Shamir and D. Tsur. Faster subtree isomorphism. J. Algorithms, 33(2):267–280, 1999.
[32] S. Tamaki. A satisfiability algorithm for depth two circuits with a sub-quadratic number of symmetric and threshold gates. Electronic Colloquium on Computational Complexity (ECCC), 23:100, 2016.
[33] R. M. Verma and S. W. Reyner. An analysis of a good algorithm for the subtree problem, corrected. SIAM J. Comput., 18(5):906–908, 1989.
[34] R. Williams. Algorithms for circuits and circuits for algorithms: Connecting the tractable and intractable. In Proceedings of the International Congress of Mathematicians, pages 659–682, 2014.
A Proving the implications of logarithmically faster algorithms for Subtree Isomorphism
Theorem 3 ([3]). Let $n \le S(n) \le 2^{o(n)}$ be time constructible and monotone non-decreasing. Let $C$ be a class of circuits. Suppose there is a SAT algorithm for $n$-input circuits which are ANDs of $O(S(n))$ arbitrary functions of three $O(S(n))$-size circuits from $C$, that runs in $O(2^n/n^{10})$ time. Then $\mathsf{E}^{\mathsf{NP}}$ does not have $S(n)$-size circuits.

Theorem 4 ([3]). Suppose there is a satisfiability algorithm for bounded fan-in formulas of size $n^k$ running in $O(2^n/n^k)$ time, for all constants $k > 0$. Then $\mathsf{NTIME}[2^{O(n)}]$ is not contained in non-uniform $\mathsf{NC}^1$.

Corollary 1.
The existence of a strongly subquadratic time algorithm for PMLG (or SubtreeIsomorphism) would imply the class E NP (1) does not have non-uniform o ( n ) -size Boolean formulasand (2) does not have non-uniform o ( n ) -depth circuits of bounded fan-in. It also implies that NTIME [2 O ( n ) ] is not in non-uniform NC .Proof. Note that the condition in Theorem 3 that the SAT-algorithm works on n -input circuitswhich are ANDs of O ( S ( n )) arbitrary functions of three O ( S ( n ))-size circuits is trivially satis-fied by a solver that works over Boolean formula. By Theorem 1 (Theorem 2 resp.), for circuits(or equivalently formulas) of size S ( n ) = 2 o ( n ) , a strongly subquadric time algorithm for PMLG(Subtree Isomorphism resp.) would imply a SAT algorithm running in time O ( n o (1) · | E || P | − ε ) = O ( n o (1) · n − εn/ S ( n ) )which is O (2 n /n ); the n o (1) factor is introduced when moving from a word size of Θ(log n )to Θ( n ). Thus, Theorem 3 implies (1). Part (2) is implied as well since a o ( n )-depth circuit ofbounded fan-in can be expressed as a formula of size S ( n ) = 2 o ( n ) . The last statement follows fromTheorem 4 and the fact that on circuits of size n k , our subquadratic algorithm would run in time O ( n o (1) · n − εn/ n k ) which is O (2 n /n k ). Corollary 2.
If PMLG (or Subtree Isomorphism) can be solved in time O ( | E || P | log c | E | ) or O ( | E || P | log c | P | ) ( O ( | T || T | log c | T | ) or O ( | T || T | log c | T | ) resp.) for all c = Θ(1) , then NTIME [2 O ( n ) ] does not have non-uniformpolynomial-size log-depth circuits.Proof. We prove this for PMLG, the proof for Subtree Isomorphism is similar. By Theorem 4,it suffices to show that for all k , there exists an algorithm to check satisfiability of all boundedfan-in formulas of size n k running in time O (2 n /n k ). Suppose that for all c = Θ(1), there existsan algorithm running in time O ( | E || P | log c | P | ) or O ( | E || P | log c | E | ). Then by Theorem 1, if we let c > k + 1 weobtain an algorithm running in time n o (1) · n s log c (2 n s ) = n o (1) · n n k log c (2 n n k ) ≤ n o (1) · n n k (cid:0) n (cid:1) c = 2 n + c n c − k − − o (1) = O (cid:18) n n k (cid:19) Corollary 4. E NP cannot be computed by non-uniform formulas of cubic size if PMLG (or SubtreeIsomorphism) can be solved in time O (cid:16) | E |·| P | log ε | E | (cid:17) or O (cid:16) | E |·| P | log ε | P | (cid:17) for ε > , where G is adeterministic DAG of maximum degree three (or O (cid:16) | T |·| T | log ε | T | (cid:17) or O (cid:16) | T |·| T | log ε | T | (cid:17) for ε > resp.).Proof. Theorem 3 as given in [3] says that solving Formula-SAT in time O (2 n /n ) on formulas ofsize s = O ( n ε ) implies that there is a function in class E NP that cannot be computed by formulasof size O ( n εε