Hard satisfiable formulas for DPLL algorithms using heuristics with small memory
Nikita Gaevoy, St. Petersburg State University, Universitetskaya nab. 7/9, St. Petersburg, Russia, 199034; Steklov Institute of Mathematics at St. Petersburg, nab. r. Fontanki 27, St. Petersburg, Russia, 191023. [email protected]
Abstract.
A DPLL algorithm for solving the Boolean satisfiability problem (SAT) can be represented as a procedure that, using heuristics A and B, selects a variable x of the input formula ϕ and a value b, and runs recursively on the formulas ϕ[x := b] and ϕ[x := 1 − b]. Exponential lower bounds on the running time of DPLL algorithms on unsatisfiable formulas follow from lower bounds for tree-like resolution proofs. Lower bounds on satisfiable formulas are also known for some classes of DPLL algorithms, such as "myopic" and "drunken" algorithms [1]. All these lower bounds are proved for classes of DPLL algorithms that limit the heuristics' access to the formula. In this paper we consider DPLL algorithms whose heuristics have unlimited access to the formula but use small memory. We show that for any pair of heuristics with small memory there exists a family of satisfiable formulas Φ_n such that a DPLL algorithm that uses these heuristics runs in exponential time on the formulas Φ_n.

Keywords: DPLL · SAT · online Turing machines · space-bounded computations · sublinear space
Introduction

DPLL algorithms (named after the authors Davis, Putnam, Logemann, and Loveland [6,5]) are one of the most popular approaches to the Boolean satisfiability problem (SAT). DPLL is an algorithm that takes a formula ϕ and uses heuristics A and B (which are the parameters of the algorithm) to choose a variable x and the value b that should be investigated first; it then makes recursive calls on the formulas ϕ[x := b] and ϕ[x := ¬b], the latter only if the former turns out to be unsatisfiable.

Every DPLL algorithm on any formula finds either a satisfying assignment or a tree-like resolution refutation. Therefore, exponential lower bounds for tree-like resolution (e.g., Tseitin formulas and their generalizations [13,14] and formulas based on the pigeonhole principle [7]) imply that any DPLL algorithm must take exponential time to prove that the corresponding formulas are unsatisfiable. However,
the running time on satisfiable formulas may differ from the running time on unsatisfiable ones, and may even be linear if the heuristic B is able to solve SAT. Moreover, for the DPLL-based SAT solvers used in practice, satisfiable formulas are simpler than unsatisfiable ones (and lower bounds for them are therefore more restrictive with respect to the choice of heuristics). Despite the fact that there is no hope of proving any nontrivial bounds on the running time of
DPLL algorithms on satisfiable formulas with arbitrary polynomial-time heuristics unless P = NP, it is still interesting to prove lower bounds for DPLL algorithms that use heuristics from classes narrower than P. Alekhnovich, Hirsch, and Itsykson [1] proved exponential lower bounds on satisfiable formulas for two wide classes of DPLL algorithms: myopic DPLL and drunken DPLL. Drunken DPLL has no restrictions on the heuristic A, but the heuristic B chooses its answer at random with equal probabilities. In myopic DPLL both heuristics have limited access to the input formula: they can read the whole formula with all negation signs erased, and they can additionally read n^{1−ε} clauses precisely. Many formula simplification heuristics, such as elimination of unit clauses, can be simulated by myopic DPLL, but others, such as the subsumption heuristic (i.e., deletion of a clause that is a superset of another clause), cannot.

There are also a number of works concerning lower bounds for generalizations of
DPLL algorithms. The paper [11] gives lower bounds for DPLL algorithms with a cut heuristic, i.e., an additional heuristic C that may decide not to make recursive calls on subformulas that it considers insufficiently "promising", and the paper [9] gives lower bounds for DPLL(⊕) algorithms, which can split not only on the values of variables but also on the values of linear combinations of variables. The papers [3,8,10,4] consider a generalization of DPLL designed to invert Goldreich's one-way function candidate and provide lower bounds on it.
Our contribution.
All the lower bounds discussed above are proved for classes of heuristics that limit access to the formula rather than the computational power of the heuristic. In this work we consider ordinary DPLL with classes of deterministic heuristics that are bounded only in their space usage and not in their access to the formula, in particular DSPACE(o(log)). In order to prove an exponential lower bound for DPLL with heuristics from this class, we use the notion of an online Turing machine, which can be considered a formalization of streaming algorithms. We then build an exponential reduction to a slight modification of this model and prove an exponential lower bound for DPLL using online heuristics with sublinear memory. Note that although the class of online heuristics using sublinear memory appears to be relatively small, it is easy to see that it can express some formula simplification heuristics that cannot be implemented by myopic or drunken algorithms, such as subsumption of clauses that are not far away from each other (namely, at distance O(n / polylog(n)), where n is the size of the input) in a k-SAT formula.

Further research.
Our reduction of offline algorithms to online ones is specific to deterministic algorithms and cannot be straightforwardly generalized to randomized algorithms. It would be interesting to find a proper generalization of the notion of an online algorithm, together with a similar reduction, for randomized algorithms. Also, despite the fact that almost all bounds presented in this work cannot be significantly improved without proving that L ≠ NP, there is a logarithmic gap between the lower and upper bounds in the reduction of offline algorithms to online ones, which it would be interesting to close.

Preliminaries

In order to work with classes of small memory we use the definition of a Turing machine with a separate read-only input tape and a read-write working tape. We also add a separate write-only output tape for Turing machines whose output is more than one bit. By a memory configuration of a Turing machine we mean the pair consisting of the configuration of the working tape and the current state of the Turing machine.
Definition 1.
An online Turing machine is a Turing machine with the additional restriction that the input tape head can be shifted only in one direction.
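Informally, an online machine sees each input symbol once, left to right, and may keep only its bounded working memory between symbols. The following toy Python sketch (an illustration of the streaming discipline, not the formal model) recognizes the strings with equally many zeros and ones in one pass, using a single counter, i.e., O(log n) bits of working memory:

```python
def online_balanced(stream):
    """One-pass check that a binary string has equally many 0s and 1s.

    The input is consumed strictly left to right (the head never moves
    back), and the only working memory is one counter of magnitude at
    most n, i.e. O(log n) bits.
    """
    diff = 0
    for bit in stream:
        diff += 1 if bit == "1" else -1
    return diff == 0

print(online_balanced("0110"))  # True
print(online_balanced("0111"))  # False
```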
Definition 2. OnSPACE(f) is the class of all languages recognized by an online Turing machine using at most f(n) memory.

Obviously, any online Turing machine is also an offline Turing machine, so OnSPACE(f) ⊆ DSPACE(f). However, OnSPACE(f) = DSPACE(f) for any f = Ω(n).

Definition 3. An online Turing machine with shifted input is an online Turing machine with the modification that the input string is shifted on the input tape by the size of the input. Thus, if the size of the input string is n, an online Turing machine with shifted input must read n empty symbols before it starts to read the input itself.

Definition 4. OnSPACE′(f) is the class of all languages recognized by an online Turing machine with shifted input using at most f(n) memory.

This modification gives the additional ability to read the size of the input before reading the input string itself. Obviously, OnSPACE(f) ⊆ OnSPACE′(f). Now we show that modified online Turing machines are strictly more powerful than ordinary online Turing machines, even when the latter are allowed substantially more space.

Lemma 1. OnSPACE′(log n) \ OnSPACE(o(n)) is not empty.

Proof. Consider the language L consisting of binary strings s such that the binary representation of |s| is a prefix of s. It is easy to see that L ∈ OnSPACE′(log). Consider an arbitrary online Turing machine M that recognizes L. We show that after reading k symbols, M must use at least Ck cells of the working tape on some input, where C is a constant depending only on M. Assume the opposite; then there are two distinct words s and t such that |s| = |t| = k, their first symbol is 1, and M moves to the same configuration after reading s and after reading t. Consider the words S = s · 0^{[s]−k} and T = t · 0^{[s]−k}, where [s] denotes the number whose binary representation is s. The configurations of M after reading s and t are the same, so M accepts S if and only if M accepts T; but S ∈ L while T ∉ L, which is a contradiction.

However, there is no difference when the space is very low.

Lemma 2 ([12]). OnSPACE(o(log)) = REG.
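The separating language of Lemma 1 can indeed be checked in a single left-to-right pass once the input length is known in advance, which is exactly the extra power the shifted input provides. A Python sketch under that assumption (the function name and interface are illustrative):

```python
def shifted_online_member(n, stream):
    """One-pass membership test for L = { s : bin(|s|) is a prefix of s },
    given the length n in advance (the 'shifted input' lets the machine
    learn n before the input starts).  Working memory: the O(log n)-bit
    string bin(n) plus a position counter.
    """
    prefix = bin(n)[2:]          # binary representation of |s|
    seen = 0
    ok = True
    for i, bit in enumerate(stream):
        seen += 1
        if i < len(prefix) and bit != prefix[i]:
            ok = False           # mismatch with the required prefix
    return ok and seen == n

s = "110" + "0" * 3              # |s| = 6 and bin(6) = "110" is a prefix of s
print(shifted_online_member(len(s), s))   # True
```

Without knowing n in advance, a one-way machine cannot tell which prefix it must remember, which is the intuition behind the Ω(n) lower bound in Lemma 1.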
Lemma 3. OnSPACE′(o(log)) = REG.

Proof. Consider an arbitrary online Turing machine with shifted input M. We show that M either uses Ω(log) memory or recognizes a regular language. We split the work of M into two phases: during the first phase it reads the empty symbols before the start of the input, and during the second phase it reads the input itself. Consider the graph of the memory configurations that M passes through when its input consists of an infinite number of empty cells (informally, during an infinite first phase). Every vertex of this graph (i.e., every configuration of M) has exactly one outgoing edge, since all symbols read from the input tape are identical. Therefore, this graph is either an infinite simple path or eventually enters a cycle. If the graph is a path, then M must use Ω(log) cells of memory by the end of the first phase, since no memory configuration can appear twice. In the other case, M passes through only a constant number of memory configurations during (and therefore after) the first phase. If M uses at least Ω(log) memory, we are already done; otherwise we can construct an online Turing machine M′ (without shifted input) that performs the computation of the second phase of M for all possible outcomes of the first phase simultaneously and chooses one of them at the end of the computation (which outcome is the real one is determined by the input length modulo the cycle length, which M′ can maintain with a constant-size counter). Then M′ uses the same amount of memory as M up to a multiplicative constant and recognizes the same language as M, so by Lemma 2, M either uses Ω(log) memory or recognizes a regular language.

Definition 5.
DSPACE(f, g) is the class of all functions computable by a Turing machine using O(f) cells of the working tape and O(g) cells of the output tape. Similarly, we define OnSPACE(f, g) and OnSPACE′(f, g) for online Turing machines and online Turing machines with shifted input, respectively.

Note that DSPACE(f) = DSPACE(f, 1) by definition. From now on we will consider only Turing machines with at least Ω(log n) memory.

Lemma 4.
For any function f(n) = Ω(log n) such that f(n) ≤ n for all sufficiently large n, if the function n ↦ f(n) belongs to DSPACE(log f, log f), then there exists a language L such that L ∈ DSPACE(log f) ∩ OnSPACE′(f) and L ∉ OnSPACE′(o(f)).

Proof. Let L be the set of all f-periodic strings; more formally, L = { s : |s| = n ⇒ (∀ i < n − f(n) : s[i] = s[i + f(n)]) }.

First we show that L ∈ DSPACE(log f). Consider an offline Turing machine that first computes the value f(n), then rewinds the input tape to the start and checks the equality of every pair of symbols at distance f(n) on the input tape. It is easy to see that both parts can be done using only O(log f) memory. Consider the computation of f(n) as the function n ↦ f(n) on an offline Turing machine: to simulate this computation it suffices to store the configuration of the working tape and the position of the reading head on the input tape, since all symbols of that input are the same. It takes O(log n) memory to store a position on the input tape and O(log f) memory to store the configuration of the working tape, which is O(log n) in total, since f(n) ≤ n for all sufficiently large n; recall that f = Ω(log n). Note that f(n) can also be computed on an online Turing machine with shifted input with O(f) memory, by counting the number of empty symbols and simulating the computation of the function n ↦ f(n) on an offline Turing machine.

Any online algorithm that recognizes L must be in different configurations after reading different prefixes of size f(n) of the input, so L ∉ OnSPACE′(o(f)). On the other hand, to recognize L on an online Turing machine it suffices to store only the last f(n) symbols of the input tape; therefore, L ∈ OnSPACE′(f).

Theorem 1.
For any function f(n) = Ω(log log n), if a function F can be computed on an offline Turing machine M using f cells of the working tape with a binary working alphabet, and f can be computed on an offline Turing machine as a function n ↦ f(n) using O(2^f · f · log F) memory, then F ∈ OnSPACE′(2^f · f · log F, log F).

Proof. Consider an offline Turing machine M that computes F. Without loss of generality we can assume that M moves the working tape head on every step and stops only when its input tape head is at the end of the input string. By the position of a Turing machine we mean the position of its head on the input tape. We construct an online Turing machine M′ that computes F. At the start, M′ reads the size of the input n and finds the value f(n) by simulating the computation of the function n ↦ f(n). Then M′ reads its input and simulates the work of M. We show how to do this explicitly.

Let M be in memory configuration x at position k. Consider the path ρ that M traverses over pairs of its memory configuration and the position of the input tape head, starting from the current position, until it reaches position k + 1 or some halting configuration. Note that ρ implicitly depends on the input and can be infinite. Let h_k(x) be the function that returns the memory configuration at the end of ρ and the string that M prints on the output tape along ρ. If ρ is infinite, or the string of all printed symbols is longer than log F (the longest possible output), then h_k(x) returns a special loop marker.

Note that in order to simulate the work of M it is enough for M′ to maintain only h_k(x) for all x, the current position k, and the memory configuration in which M first comes into position k. The function h_0 can be computed trivially. We show that for every k the function h_k can be computed using only h_{k−1} and the k-th input symbol.
Consider some fixed k and memory configuration x. Recall that M moves the input tape head on every step of its computation. Then M either moves the head to the right and reaches position k + 1, or moves the head to the left, in which case, using h_{k−1}, we can compute in which memory configuration M will next reach position k and the string that M will print to the output until it reaches this position. Let g_k(x) be the function that computes these two values. Note that h_k(x) either equals the iterate g_k^i(x) for some i or returns the loop marker. In order to recognize a cycle in the iteration of g_k, we compute g_k^i(x) and g_k^{2i}(x) in parallel; if at some step of this computation the string printed along g_k^i(x) becomes too long, or the configurations computed by these two iterations become equal, we mark h_k(x) with the loop marker; otherwise the computation halts.

In order to compute the value of h_k(x) we need O(f + log |Q| + log F) memory, where Q denotes the set of states of M. Let q be the number of memory configurations of M. Then M′ uses O(q · (log q + log F)) cells of memory, and hence F ∈ OnSPACE′(2^f · f · log F, log F).

Corollary 1.
For any function f(n) = Ω(log log n), if a language L can be decided on an offline Turing machine M using f cells of the working tape with a binary working alphabet, and f can be computed on an offline Turing machine as a function n ↦ f(n) using O(2^f · f) memory, then L ∈ OnSPACE′(2^f · f).

Proof. This immediately follows from the fact that any language is a function with a one-bit output.
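The loop-marker test in the proof of Theorem 1, running the iterate g_k^i(x) against g_k^{2i}(x), is the classic two-pointer (tortoise-and-hare) cycle detection, which needs to store only two configurations at a time. A generic Python sketch for an iterated function on a finite configuration space (the names and the `limit` cutoff, which stands in for "reaches position k + 1 or halts", are illustrative):

```python
def reaches_cycle_before(g, x, limit):
    """Detect whether iterating g from x enters a cycle within `limit`
    doubling steps, storing only two configurations: `slow` advances one
    application of g per step, `fast` advances two.  If they meet, the
    iteration is looping; the returned value is some configuration lying
    on the cycle.  Returns None if no meeting occurs within the budget.
    """
    slow, fast = x, g(x)
    for _ in range(limit):
        if slow == fast:
            return slow          # a configuration on the cycle
        slow = g(slow)
        fast = g(g(fast))
    return None

# v -> v+1 mod 5 cycles through five configurations
print(reaches_cycle_before(lambda v: (v + 1) % 5, 0, 20))  # prints 4
```

The point of the trick in the proof is exactly the memory bound: detecting a loop among q configurations this way costs O(log q) extra space rather than storing the whole trajectory.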
DPLL algorithms

Consider the DPLL_{A,B} algorithm for deciding the satisfiability of a CNF formula ϕ, parametrized by two heuristics A and B. The heuristic A takes the formula ϕ, chooses some variable in ϕ, and returns its number. The heuristic B takes ϕ and the number of a variable (chosen by A) and returns a value for this variable. For the purposes of our proof, from now on we consider the DPLL_H algorithm, which has a single heuristic H for both choosing a variable and choosing its value.

Algorithm 1 DPLL_H
1: procedure DPLL_H(ϕ)                ⊲ ϕ — formula in CNF
2:   if ϕ is empty then return satisfiable
3:   if ϕ contains an empty clause then return unsatisfiable
4:   (x, b) ← H(ϕ)                    ⊲ H chooses both a variable and its value
5:   if DPLL_H(ϕ[x := b]) = satisfiable then return satisfiable
6:   return DPLL_H(ϕ[x := ¬b])

The following lemma shows that DPLL_{A,B} and DPLL_H are almost equivalent in terms of the space complexity of their heuristics.

Lemma 5.
Let A and B use at most f(n) memory on all inputs of size n. Then there is a heuristic H such that H uses at most f(n) + O(log log n) memory and DPLL_H makes the same recursive calls as DPLL_{A,B}.

Proof. Let S be the string returned by A. Since |S| = O(log n), memory f(n) + O(log log n) suffices to compute the k-th symbol of S (the index k fits in O(log log n) bits). Consider the algorithm H that emulates the algorithm B and recomputes the symbols of S every time B accesses them, using f(n) + O(log log n) memory in total.

We need the construction of boundary expander matrices from [1].
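Before moving to the construction, Algorithm 1 can be written out directly. A minimal Python sketch (the clause representation and the stand-in heuristic are illustrative assumptions; the results of this paper constrain only the heuristic's memory, not its choices):

```python
def dpll(clauses, heuristic):
    """DPLL_H of Algorithm 1.  `clauses` is a list of frozensets of integer
    literals (a negative integer is a negated variable).  `heuristic` maps
    the current formula to a pair (variable, value): the single heuristic H.
    """
    if not clauses:
        return True                      # empty formula: satisfiable
    if frozenset() in clauses:
        return False                     # empty clause: unsatisfiable
    x, b = heuristic(clauses)
    for value in (b, not b):             # try H's value first, then flip it
        lit = x if value else -x
        # drop satisfied clauses, remove the falsified literal elsewhere
        reduced = [c - {-lit} for c in clauses if lit not in c]
        if dpll(reduced, heuristic):
            return True
    return False

def first_var_true(clauses):             # illustrative stand-in heuristic H
    x = abs(next(iter(next(iter(clauses)))))
    return x, True

cnf = [frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2})]
print(dpll(cnf, first_var_true))         # False: x2 is forced both ways
```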
Definition 6 ([1, Definition 2.1]).
Let A be a Boolean matrix. For a set of rows I of A, its boundary ∂I is the set of all columns such that exactly one row of I contains a one at the intersection with that column. A is an (r, s, c)-boundary expander if
1. every row of A has at most s ones;
2. for any set of rows I, if |I| ≤ r, then |∂I| ≥ c · |I|.

The formula Φ_{A,b} encoding the system of linear equations
Ax = b is the formula constructed as follows. For each row of the matrix A we construct a CNF formula Φ_{A,b,i}(x) encoding (⊕_{j ∈ S_i} x_j = b[i]), where i is the number of the row and S_i is the set of numbers of the columns in which row i contains ones. We take Φ_{A,b}(x) := ⋀_i Φ_{A,b,i}(x). Note that the resulting formula is in conjunctive normal form. We identify a system of linear equations with the formula encoding it.

Lemma 6 ([1, Theorem 3.1, Lemma 2.1, Remark 3.1]).
There exists a family of Boolean matrices (A_n) such that for every n:
1. A_n has size n × n;
2. A_n is a full-rank matrix;
3. every row of A_n has exactly three ones;
4. every column of A_n has O(log n) ones;
5. A_n is an (n / log n, 3, c)-boundary expander for some absolute constant c > 1.

Lemma 7 ([1, Lemma 3.7, Lemma 3.8], [2, Corollary 3.4]).
For any matrix A that is an (r, 3, c)-boundary expander and any vector b ∉ Im(A), the size of any tree-like resolution refutation of the system Ax = b is at least 2^{Ω(cr)}.

Definition 7.
A subformula is called elementary if it is obtained from the original formula by substituting a value for a single variable.
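The encoding Φ_{A,b} above turns each linear equation over three variables into a constant number of clauses: one clause forbidding each assignment of the wrong parity. A Python sketch of this standard encoding (the function name is illustrative):

```python
from itertools import product

def xor_clauses(variables, parity):
    """CNF clauses for x_{i1} ⊕ ... ⊕ x_{ik} = parity: one clause forbidding
    each assignment of the k variables whose XOR differs from `parity`.
    For k = 3 (each row of A has exactly three ones) this gives 4 clauses
    of width 3, so the whole system Φ_{A,b} has size linear in the number
    of rows.
    """
    clauses = []
    for bits in product((0, 1), repeat=len(variables)):
        if sum(bits) % 2 != parity:              # a violating assignment
            # forbid it: at least one variable must take the other value
            clauses.append(frozenset(-v if b else v
                                     for v, b in zip(variables, bits)))
    return clauses

print(xor_clauses([1, 2, 3], 0))   # 4 clauses ruling out odd parity
```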
Let A be a matrix of size n × n satisfying the conditions of Lemma 6. Consider the family of functions f_{i,j}(x) = x_i ⊕ x_j.

Definition 8.
Consider the CNF formula Φ_{i,j;b} encoding the system of linear equations Ax = b together with the additional equation f_{i,j}(x) = 0. We call a pair of indices (i, j) bad if i < j and, for some value of b, there exists an elementary unsatisfiable subformula of the formula Φ_{i,j;b} whose tree-like resolution refutations all have size less than 2^{εr}, where r is the parameter of A and ε > 0 is a suitable constant.

Lemma 8.
Let A be a matrix of size n × n satisfying the conditions of Lemma 6. Then there are at most O(n log n) bad pairs of indices.

Proof. Φ_{i,j;b} encodes some system of linear equations Bx = d, where the matrix B depends on the parameters i, j. We show that if |∂_B I| ≥ max(2, |I|) holds for every set I of rows of the matrix B, then the pair (i, j) is good. Consider an arbitrary elementary unsatisfiable subformula of Φ_{i,j;b}, and let B′ be the matrix obtained by removing the column that corresponds to the substituted variable. We show that B′ is a boundary expander.

We identify the rows of the matrix B′ with the corresponding rows of the matrix B. Removing a single column destroys at most one boundary column of any set of rows, so for the expansion parameter c′ of B′ we get c′ = min_I |∂_{B′} I| / |I| ≥ min_I (|∂_B I| − 1) / |I| ≥ min_I (max(2, |I|) − 1) / |I|. The function f(t) = (max(2, t) − 1) / t attains its minimum of 1/2 at t = 2, so c′ ≥ 1/2 and B′ is an (r, 3, 1/2)-boundary expander. By Lemma 7, the size of the minimal tree-like resolution refutation of the considered subformula is at least 2^{Ω(r)}, so (for a suitable choice of the constant in Definition 8) the pair (i, j) is good.

Now consider a bad pair of indices (i, j). This pair corresponds to some set of rows I of the matrix A such that the condition |∂_B I′| < max(2, |I′|) holds for the set of rows I′ obtained from I by adding the row encoding f_{i,j} = 0. Since the added row contains only two ones, it is easy to see that |∂_B I′| ≥ |∂_B I| − 2. The condition |∂_B I′| < 2 cannot hold once |I| is large enough, since |∂_B I| ≥ c|I| for the expansion parameter c of the matrix A. Moreover, |∂_B I′| / |I′| ≥ (|∂_B I| − 2) / (|I| + 1) ≥ (c|I| − 2) / (|I| + 1) = c − (c + 2) / (|I| + 1), and the right-hand side is at least 1 for all sufficiently large |I|; therefore a bad pair can correspond only to a set I of constant size.

Now for each admissible size k of I we bound the number of bad pairs corresponding to sets of rows of size k. Each set of size 1 corresponds to exactly three bad pairs: one for each pair of variables of the row. Consider a set of size 2. If its boundary is large enough, it cannot correspond to any bad pair by the computation above; otherwise the set consists of two rows that have two common variables. Each set of this type corresponds to exactly one bad pair, and there are at most O(n log n) such sets, because every column of the matrix A contains O(log n) ones and therefore every row can occur in at most O(log n) such pairs. (This bound is not tight, but we only need o(n²).)

The only remaining case is |I| = 3. Again, if |∂_B I| is large enough, then I cannot correspond to any bad pair, since |∂_B I′| ≥ |∂_B I| − 2; so I can correspond to a bad pair only when |∂_B I| is small, which is possible only if each row of I has at least one common variable with one of the remaining two. But then there is a row that has common variables with both other rows, and there are at most O(n log n) sets of this type, because every column of the matrix A contains O(log n) ones.

Definition 9.
A family of unsatisfiable formulae (Φ_k) is a family of hard unsatisfiable formulae if the size of a minimal tree-like resolution refutation of any formula from (Φ_k) is at least 2^{ℓ / log^c ℓ} for some constant c, where ℓ = Ω(|Φ_k|).

Definition 10.
A family of satisfiable formulae (Φ_k) is a family of hard satisfiable formulae if the family of all its unsatisfiable elementary subformulae is hard.

Definition 11.
Two Boolean vectors are called opposite if their sum is equal to the vector of all ones.
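Over GF(2), a matrix with an odd number of ones in each row maps opposite vectors to opposite vectors (this is the content of Lemma 9 below). A quick numerical check in Python (the helper names and the sample matrix are illustrative):

```python
def mat_vec_gf2(A, x):
    """Multiply a 0/1 matrix A by a 0/1 vector x over GF(2)."""
    return [sum(a * b for a, b in zip(row, x)) % 2 for row in A]

def opposite(v):
    """The opposite vector (Definition 11): add the all-ones vector mod 2."""
    return [b ^ 1 for b in v]

# every row has an odd number of ones, so A maps opposite vectors to
# opposite vectors: A(x ⊕ 1) = (Ax) ⊕ 1
A = [[1, 1, 1],
     [1, 0, 0],
     [0, 1, 0]]
x = [1, 0, 1]
print(mat_vec_gf2(A, opposite(x)) == opposite(mat_vec_gf2(A, x)))  # True
```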
Lemma 9.
Let A be a full-rank Boolean matrix. If A contains an odd number of ones in each row, then the operators A and A^{−1} commute with the addition of the all-ones vector 1.

Proof. It suffices to prove that A(x ⊕ 1) = (Ax) ⊕ 1 for all x. There is an odd number of ones in each row, so A·1 = 1. Then A(x ⊕ 1) = Ax ⊕ A·1 = (Ax) ⊕ 1.

Theorem 2.
For any online algorithm H using o(n / log n) memory there exists a family of pairs of hard satisfiable formulae (Φ_m^0, Φ_m^1) satisfying the following conditions:
1. Φ_m^j is a formula over O(m) variables;
2. each formula has exactly one satisfying assignment, and the formulae in one pair have opposite satisfying assignments;
3. H returns the same answers for the formulae in one pair.

Proof. Without loss of generality we can assume that H prints its answer only after reading the entire input. Fix m. Let A be the Boolean matrix of size m × m from Lemma 6. We will construct a formula with the number of literals linear in m. The bit size of the resulting formula will be O(m log m), so H will use at most o(m log m / log(m log m)) = o(m) memory. From now on we will identify the size of a formula with the number of literals in it.

We create a variable for each column and for each one of the matrix A. Let x_{i,j} denote the variable that corresponds to the one in the i-th row and the j-th column, and let x_j denote the variable that corresponds to the j-th column. Note that we have a linear number of x_{i,j} variables, since each row of A contains exactly three ones.

Consider an arbitrary Boolean vector q and the i-th row of the matrix A. Let a, b and c be the numbers of the columns containing the ones of the i-th row. Consider the formula ϕ_{q,i} := (x_{i,a} ⊕ x_{i,b} ⊕ x_{i,c} = q[i]). Note that the CNF representation of ϕ_{q,i} consists of O(1) literals. We define the formula ϕ_q := ⋀_i ϕ_{q,i}.

Now consider the formula Φ_{q,w,d;a,b} := (ϕ_q ∨ u) ∧ (ϕ_w ∨ ¬u) ∧ ⋀_{i,j} (x_{i,j} = x_j ⊕ d[i]) ∧ ψ_{a,b}, where ψ_{a,b} := x_a ⊕ x_b. Here ϕ_q and ϕ_w are formulae of size linear in m, so the sizes of the formulae (ϕ_q ∨ u) and (ϕ_w ∨ ¬u) after conversion to CNF are also linear, and therefore the formula Φ_{q,w,d;a,b} has linear size.

Consider the formula ξ_{q,d} := ϕ_q ∧ ⋀_{i,j} (x_{i,j} = x_j ⊕ d[i]).
The values of the variables x_{i,j} are uniquely determined by the x_j according to the equations x_{i,j} = x_j ⊕ d[i]; therefore ξ_{q,d} is true if and only if the condition ⊕_{j ∈ I_i} (x_j ⊕ d[i]) = q[i] is satisfied for every row of the matrix A, where I_i denotes the set of columns containing a one in the i-th row. Every row of A contains exactly three ones, so this condition is equivalent to ⊕_{j ∈ I_i} x_j = q[i] ⊕ d[i]. The conjunction of these conditions encodes the system Ax = q ⊕ d, which means that ξ_{q,d} has the unique satisfying assignment A^{−1}(q ⊕ d). Therefore, for a fixed value of u, the formula Φ_{q,w,d;a,b} has at most one satisfying assignment.

The part of the formula depending on the parameter w begins after the end of the part depending on q, and ends before the part depending on d; therefore H reads the formula parameters in the order q, w, d. Let Q be the largest set of vectors q that H cannot distinguish (i.e., after reading the part of the formula depending on q, H is in the same memory configuration for all q ∈ Q), and let W be the largest set of vectors w ∈ Q ⊕ 1 that H cannot distinguish under the assumption that q ∈ Q. It is easy to see that |Q| ≥ 2^{m − o(m)}, and therefore |W| ≥ 2^{m − o(m)}. Let Φ_{q,w;a,b} := Φ_{q,w,d;a,b}, where an arbitrary element of W ⊕ 1 is selected as d. Consider W̃ := W ⊕ d and Q̃ := W̃ ⊕ 1. Note that 0 ∈ Q̃ (and therefore 1 ∈ W̃).

By Lemma 9, A^{−1} and the addition of the all-ones vector commute, and Q̃ = W̃ ⊕ 1; therefore A^{−1}Q̃ = (A^{−1}W̃) ⊕ 1.

We choose a₀, b₀ such that ψ_{a₀,b₀} is not constant on the set A^{−1}Q̃. We show that this can be done. We construct an equivalence relation on the coordinates of the space {0,1}^m (i.e., on bits) as follows: i ∼ j if and only if q[i] ⊕ q[j] is constant over all q ∈ A^{−1}Q̃. There are at least m − o(m) equivalence classes, since |A^{−1}Q̃| ≥ 2^{m − o(m)}. Therefore there exist Ω(m²) functions ψ_{a,b} that are not constant on A^{−1}Q̃, and by Lemma 8 among them there is a ψ_{a₀,b₀} such that the size of a refutation of any unsatisfiable elementary subformula of a formula of the form (Ax = q) ∧ ψ_{a₀,b₀} is exponential.

Note that the chosen ψ_{a₀,b₀} is also non-constant on the set A^{−1}W̃, since these sets consist of opposite elements. Now we choose q̃ ∈ Q̃ such that ψ_{a₀,b₀}(A^{−1}q̃) ≠ ψ_{a₀,b₀}(0) = ψ_{a₀,b₀}(A^{−1}·0), and set w̃ := q̃ ⊕ 1 ∈ W̃; then also ψ_{a₀,b₀}(A^{−1}w̃) ≠ ψ_{a₀,b₀}(1) = ψ_{a₀,b₀}(A^{−1}·1), since A contains exactly three ones in each row, which means that A·1 = 1, and since ψ_{a₀,b₀}(x ⊕ 1) = ψ_{a₀,b₀}(x) for every x. Consider the formulae Φ_{0⊕d, w̃⊕d; a₀,b₀} and Φ_{q̃⊕d, 1⊕d; a₀,b₀}. Note that the corresponding formula parameters are indistinguishable for H by construction, which means that H answers the same on both formulae.

We show that both formulae have exactly one satisfying assignment and that their satisfying assignments are opposite. Without the conjunct ψ_{a₀,b₀}, the formula Φ_{q,w;a,b} has exactly one satisfying assignment for each value of u (namely A^{−1}(q ⊕ d) for u = 0 and A^{−1}(w ⊕ d) for u = 1). For the considered formulae, q takes the values 0 ⊕ d and q̃ ⊕ d (and w, respectively, w̃ ⊕ d and 1 ⊕ d), on which ψ_{a₀,b₀} takes different values by construction. Moreover, ψ_{a₀,b₀} takes the value 1 on the assignments corresponding to the parameters q = q̃ ⊕ d and w = w̃ ⊕ d, which correspond to opposite values of u and are themselves opposite, since A^{−1} commutes with the addition of the all-ones vector by Lemma 9.

It remains to show that the constructed family of formulae is a family of hard formulae. If the formula becomes unsatisfiable after the substitution of a value for some variable, then the resulting formula has an unsatisfiable subformula of the form ϕ_q ∧ ⋀_{i,j}(x_{i,j} = x_j ⊕ d[i]) ∧ ψ_{a,b}, possibly with one substituted variable; the size of any refutation of it is at least the size of a refutation of an elementary unsatisfiable subformula of a formula of the form (Ax = q) ∧ ψ_{a,b}, which is exponential.

Corollary 2.
For any online heuristic H that uses o(n / log n) memory, there exists a family of satisfiable formulae such that DPLL_H makes at least 2^{ℓ / log^c ℓ} recursive calls on the formulae from this family for some constant c, where ℓ = Ω(n / log n).

Proof. Consider the family of pairs of formulae from Theorem 2, and in each pair choose the formula on which H descends into an unsatisfiable subformula after the first step. The size of the minimal tree-like resolution refutation of this subformula is exponential, so DPLL_H runs in exponential time on it.

Corollary 3.
For any offline heuristic H that uses (1 − ε) log n cells of memory over the binary alphabet, for some positive ε, there exists a family of satisfiable formulae such that DPLL_H makes at least 2^{ℓ / log^c ℓ} recursive calls on the formulae from this family for some constant c, where ℓ = Ω(n / log n).

Proof. By Theorem 1, there exists an equivalent online heuristic H′ that uses O(2^{(1−ε) log n} · (1 − ε) log n · log n) memory. It is easy to see that 2^{(1−ε) log n} · (1 − ε) log n · log n = n^{1−ε} · (1 − ε) log² n = o(n / log n). Thus, H′ ∈ OnSPACE′(o(n / log n), log), and the statement follows from Corollary 2.

Acknowledgments
The author is grateful to Alexander Okhotin for helpful discussions and to Edward A. Hirsch, who supervised this work.
References
1. Alekhnovich, M., Hirsch, E., Itsykson, D.: Exponential lower bounds for the running time of DPLL algorithms on satisfiable formulas. Journal of Automated Reasoning, 131–143 (2004). https://doi.org/10.1007/978-3-540-27836-8_102
2. Ben-Sasson, E., Wigderson, A.: Short proofs are narrow — resolution made simple. J. ACM 48(2), 149–169 (2001). https://doi.org/10.1145/375827.375835
3. Cook, J., Etesami, O., Miller, R., Trevisan, L.: Goldreich's one-way function candidate and myopic backtracking algorithms. In: Proceedings of the 6th Theory of Cryptography Conference (TCC '09), pp. 521–538. Springer-Verlag, Berlin, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00457-5_31
4. Cook, J., Etesami, O., Miller, R., Trevisan, L.: On the one-way function candidate proposed by Goldreich. ACM Trans. Comput. Theory 6(3), 14:1–14:35 (2014). https://doi.org/10.1145/2633602
5. Davis, M., Logemann, G., Loveland, D.: A machine program for theorem-proving. Commun. ACM 5(7), 394–397 (1962). https://doi.org/10.1145/368273.368557
6. Davis, M., Putnam, H.: A computing procedure for quantification theory. J. ACM 7(3), 201–215 (1960). https://doi.org/10.1145/321033.321034
7. Haken, A.: The intractability of resolution. Theoretical Computer Science 39, 297–308 (1985). https://doi.org/10.1016/0304-3975(85)90144-6
8. Itsykson, D.: Lower bound on average-case complexity of inversion of Goldreich's function by drunken backtracking algorithms. In: Ablayev, F., Mayr, E.W. (eds.) Computer Science – Theory and Applications, pp. 204–215. Springer, Berlin, Heidelberg (2010)
9. Itsykson, D., Knop, A.: Hard satisfiable formulas for splittings by linear combinations. In: Gaspers, S., Walsh, T. (eds.) Theory and Applications of Satisfiability Testing – SAT 2017, pp. 53–61. Springer International Publishing, Cham (2017)
10. Itsykson, D., Sokolov, D.: The complexity of inversion of explicit Goldreich's function by DPLL algorithms. In: Kulikov, A., Vereshchagin, N. (eds.) Computer Science – Theory and Applications, pp. 134–147. Springer, Berlin, Heidelberg (2011)
11. Itsykson, D., Sokolov, D.: Lower bounds for myopic DPLL algorithms with a cut heuristic. In: Proceedings of the 22nd International Symposium on Algorithms and Computation (ISAAC '11), pp. 464–473. Springer-Verlag, Berlin, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25591-5_48
12. Stearns, R.E., Hartmanis, J., Lewis, P.M.: Hierarchies of memory limited computations. In: 6th Annual Symposium on Switching Circuit Theory and Logical Design (SWCT 1965), pp. 179–190 (1965). https://doi.org/10.1109/FOCS.1965.11
13. Tseitin, G.S.: On the complexity of derivation in propositional calculus. In: Automation of Reasoning, pp. 466–483. Springer, Berlin, Heidelberg (1983). https://doi.org/10.1007/978-3-642-81955-1_28
14. Urquhart, A.: Hard examples for resolution. J. ACM 34(1), 209–219 (1987)