Hard satisfiable formulas for DPLL algorithms using heuristics with small memory
Nikita Gaevoy, St. Petersburg State University, Universitetskaya nab. 7/9, St. Petersburg, Russia, 199034; Steklov Institute of Mathematics at St. Petersburg, nab. r. Fontanki 27, St. Petersburg, Russia, 191023. [email protected]
Abstract.
A DPLL algorithm for solving the Boolean satisfiability problem (SAT) can be represented as a procedure that, using heuristics A and B, selects a variable x of the input formula ϕ and a value b, and runs recursively on the formulas ϕ[x := b] and ϕ[x := 1 − b]. Exponential lower bounds on the running time of DPLL algorithms on unsatisfiable formulas follow from lower bounds for tree-like resolution proofs. Lower bounds on satisfiable formulas are also known for some classes of DPLL algorithms, such as "myopic" and "drunken" algorithms [1]. All these lower bounds are proved for classes of DPLL algorithms that limit the heuristics' access to the formula. In this paper we consider DPLL algorithms whose heuristics have unlimited access to the formula but use small memory. We show that for any pair of heuristics with small memory there exists a family of satisfiable formulas Φ_n such that a DPLL algorithm that uses these heuristics runs in exponential time on the formulas Φ_n.

Keywords: DPLL · SAT · online Turing machines · space-bounded computations · sublinear space
Introduction

DPLL algorithms (named after the authors Davis, Putnam, Logemann, and Loveland [6,5]) are one of the most popular approaches to the Boolean satisfiability problem (SAT). DPLL is an algorithm that takes a formula ϕ and uses heuristics A and B (which are the parameters of the algorithm) to choose a variable x and the value b that should be investigated first; it then makes recursive calls on the formulas ϕ[x := b] and ϕ[x := ¬b], the latter only if the former turns out to be unsatisfiable.

Every DPLL algorithm on any formula finds either a satisfying assignment or a tree-like resolution refutation. Therefore, exponential lower bounds for tree-like resolution (e.g., Tseitin formulas and their generalizations [13,14] and formulas based on the pigeonhole principle [7]) imply that any DPLL algorithm must take exponential time to prove that the corresponding formulas are unsatisfiable. However,
the running time on satisfiable formulas may differ from the running time on unsatisfiable ones, and may even be linear if the heuristic B is able to solve SAT. Moreover, for the DPLL-based SAT solvers used in practice, satisfiable formulas are simpler than unsatisfiable ones (and lower bounds for them are therefore more restrictive with respect to the choice of heuristics). Despite the fact that there is no hope of proving any nontrivial bounds on the running time of
DPLL algorithms on satisfiable formulas with arbitrary polynomial-time heuristics unless P = NP, it is still interesting to prove lower bounds for DPLL algorithms that use heuristics from classes narrower than P. Alekhnovich, Hirsch, and Itsykson [1] proved exponential lower bounds on satisfiable formulas for two wide classes of DPLL algorithms: myopic DPLL and drunken DPLL. Drunken DPLL has no restrictions on the heuristic A, but the heuristic B chooses its answer at random with equal probabilities. In myopic DPLL both heuristics have limited access to the input formula: they can read the whole formula with all negation signs erased, and they can additionally read n^{1−ε} clauses precisely. Many formula simplification heuristics, such as elimination of unit clauses, can be simulated by myopic DPLL, but others, such as the subsumption heuristic (i.e., deletion of a clause that is a superset of another clause), cannot.

There are also a number of works concerning lower bounds for generalizations of
DPLL algorithms. The paper [11] gives lower bounds for DPLL algorithms with a cut heuristic, i.e., an additional heuristic C that may decide not to make recursive calls on subformulas that it considers insufficiently "promising", and the paper [9] gives lower bounds for DPLL(⊕) algorithms, which can split not only on the values of variables but also on the values of linear combinations of variables. The papers [3,8,10,4] consider a generalization of DPLL designed to invert Goldreich's one-way function candidate and provide lower bounds on it.
Our contribution.
All the lower bounds discussed above are proved for classes of heuristics that limit access to the formula rather than the computational power of the heuristic. In this work we consider ordinary DPLL with classes of deterministic heuristics that are bounded only in their space usage and not in their access to the formula, in particular DSPACE(o(log)). In order to prove an exponential lower bound for DPLL with heuristics from this class, we use the notion of an online Turing machine, which can be considered a formalization of streaming algorithms. We then build an exponential reduction to a slight modification of this model and prove an exponential lower bound for DPLL using online heuristics with sublinear memory. Note that although the class of online heuristics using sublinear memory appears to be relatively small, it is easy to see that it can express some formula simplification heuristics that cannot be implemented by myopic or drunken algorithms, such as subsumption of clauses that are not far away from each other (namely, at distance O(n / polylog(n)), where n is the size of the input) in a k-SAT formula.

Further research.
Our reduction of offline algorithms to online ones is specific to deterministic algorithms and cannot be straightforwardly generalized to randomized algorithms. It would be interesting to find a proper generalization of the notion of an online algorithm, together with a similar reduction, for randomized algorithms. Also, despite the fact that almost all bounds presented in this work cannot be significantly improved without proving that L ≠ NP, there is a logarithmic gap between the lower and upper bounds in the reduction of offline algorithms to online ones, which it would be interesting to close.

Preliminaries

In order to work with classes of small memory we use the definition of a Turing machine with a separate read-only input tape and a read-write working tape. We also add a separate write-only output tape for Turing machines whose output is more than one bit. By a memory configuration of a Turing machine we mean the pair consisting of the configuration of the working tape and the current state of the Turing machine.
Definition 1.
An online Turing machine is a Turing machine with the additional restriction that the input tape head can be shifted only in one direction.
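Informally, an online machine sees each input symbol once, left to right, and may keep only its bounded working memory between symbols. The following toy Python sketch (an illustration of the streaming discipline, not the formal model) recognizes the strings with equally many zeros and ones in one pass, using a single counter, i.e., O(log n) bits of working memory:

```python
def online_balanced(stream):
    """One-pass check that a binary string has equally many 0s and 1s.

    The input is consumed strictly left to right (the head never moves
    back), and the only working memory is one counter of magnitude at
    most n, i.e. O(log n) bits.
    """
    diff = 0
    for bit in stream:
        diff += 1 if bit == "1" else -1
    return diff == 0

print(online_balanced("0110"))  # True
print(online_balanced("0111"))  # False
```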
Definition 2. OnSPACE(f) is the class of all languages recognized by an online Turing machine using at most f(n) memory.

Obviously, any online Turing machine is also an offline Turing machine, so OnSPACE(f) ⊆ DSPACE(f). However, OnSPACE(f) = DSPACE(f) for any f = Ω(n).

Definition 3. An online Turing machine with shifted input is an online Turing machine with the modification that the input string is shifted on the input tape by the size of the input. Thus, if the size of the input string is n, an online Turing machine with shifted input must read n empty symbols before it starts to read the input itself.

Definition 4. OnSPACE′(f) is the class of all languages recognized by an online Turing machine with shifted input using at most f(n) memory.

This modification gives the additional ability to read the size of the input before reading the input string itself. Obviously, OnSPACE(f) ⊆ OnSPACE′(f). Now we show that modified online Turing machines are strictly more powerful than ordinary online Turing machines, even when the latter are allowed substantially more space.

Lemma 1. OnSPACE′(log n) \ OnSPACE(o(n)) is not empty.

Proof. Consider the language L consisting of binary strings s such that the binary representation of |s| is a prefix of s. It is easy to see that L ∈ OnSPACE′(log). Consider an arbitrary online Turing machine M that recognizes L. We show that after reading k symbols, M must use at least Ck cells of the working tape on some input, where C is a constant depending only on M. Assume the opposite; then there are two distinct words s and t such that |s| = |t| = k, their first symbol is 1, and M moves to the same configuration after reading s and after reading t. Consider the words S = s · 0^{[s]−k} and T = t · 0^{[s]−k}, where [s] denotes the number whose binary representation is s. The configurations of M after reading s and t are the same, so M accepts S if and only if M accepts T; but S ∈ L while T ∉ L, which is a contradiction.

However, there is no difference when the space is very low.

Lemma 2 ([12]). OnSPACE(o(log)) = REG.
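The separating language of Lemma 1 can indeed be checked in a single left-to-right pass once the input length is known in advance, which is exactly the extra power the shifted input provides. A Python sketch under that assumption (the function name and interface are illustrative):

```python
def shifted_online_member(n, stream):
    """One-pass membership test for L = { s : bin(|s|) is a prefix of s },
    given the length n in advance (the 'shifted input' lets the machine
    learn n before the input starts).  Working memory: the O(log n)-bit
    string bin(n) plus a position counter.
    """
    prefix = bin(n)[2:]          # binary representation of |s|
    seen = 0
    ok = True
    for i, bit in enumerate(stream):
        seen += 1
        if i < len(prefix) and bit != prefix[i]:
            ok = False           # mismatch with the required prefix
    return ok and seen == n

s = "110" + "0" * 3              # |s| = 6 and bin(6) = "110" is a prefix of s
print(shifted_online_member(len(s), s))   # True
```

Without knowing n in advance, a one-way machine cannot tell which prefix it must remember, which is the intuition behind the Ω(n) lower bound in Lemma 1.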
Lemma 3. OnSPACE′(o(log)) = REG.

Proof. Consider an arbitrary online Turing machine with shifted input M. We show that M either uses Ω(log) memory or recognizes a regular language. We split the work of M into two phases: during the first phase it reads the empty symbols before the start of the input, and during the second phase it reads the input itself. Consider the graph of the memory configurations that M passes through when its input consists of an infinite number of empty cells (informally, during an infinite first phase). Every vertex of this graph (i.e., every configuration of M) has exactly one outgoing edge, since all symbols read from the input tape are identical. Therefore, this graph is either an infinite simple path or eventually enters a cycle. If the graph is a path, then M must use Ω(log) cells of memory by the end of the first phase, since no memory configuration can appear twice. In the other case, M passes through only a constant number of memory configurations during (and therefore after) the first phase. If M uses at least Ω(log) memory, we are already done; otherwise we can construct an online Turing machine M′ (without shifted input) that performs the computation of the second phase of M for all possible outcomes of the first phase simultaneously and chooses one of them at the end of the computation (which outcome is the real one is determined by the input length modulo the cycle length, which M′ can maintain with a constant-size counter). Then M′ uses the same amount of memory as M up to a multiplicative constant and recognizes the same language as M, so by Lemma 2, M either uses Ω(log) memory or recognizes a regular language.

Definition 5.
DSPACE(f, g) is the class of all functions computable by a Turing machine using O(f) cells of the working tape and O(g) cells of the output tape. Similarly, we define OnSPACE(f, g) and OnSPACE′(f, g) for online Turing machines and online Turing machines with shifted input, respectively.

Note that DSPACE(f) = DSPACE(f, 1) by definition. From now on we will consider only Turing machines with at least Ω(log n) memory.

Lemma 4.
For any function f(n) = Ω(log n) such that f(n) ≤ n for all sufficiently large n, if the function n ↦ f(n) belongs to DSPACE(log f, log f), then there exists a language L such that L ∈ DSPACE(log f) ∩ OnSPACE′(f) and L ∉ OnSPACE′(o(f)).

Proof. Let L be the set of all f-periodic strings; more formally, L = { s : |s| = n ⇒ (∀ i < n − f(n) : s[i] = s[i + f(n)]) }.

First we show that L ∈ DSPACE(log f). Consider an offline Turing machine that first computes the value f(n), then rewinds the input tape to the start and checks the equality of every pair of symbols at distance f(n) on the input tape. It is easy to see that both parts can be done using only O(log f) memory. Consider the computation of f(n) as the function n ↦ f(n) on an offline Turing machine: to simulate this computation it suffices to store the configuration of the working tape and the position of the reading head on the input tape, since all symbols of that input are the same. It takes O(log n) memory to store a position on the input tape and O(log f) memory to store the configuration of the working tape, which is O(log n) in total, since f(n) ≤ n for all sufficiently large n; recall that f = Ω(log n). Note that f(n) can also be computed on an online Turing machine with shifted input with O(f) memory, by counting the number of empty symbols and simulating the computation of the function n ↦ f(n) on an offline Turing machine.

Any online algorithm that recognizes L must be in different configurations after reading different prefixes of size f(n) of the input, so L ∉ OnSPACE′(o(f)). On the other hand, to recognize L on an online Turing machine it suffices to store only the last f(n) symbols of the input tape; therefore, L ∈ OnSPACE′(f).

Theorem 1.
For any function f(n) = Ω(log log n), if a function F can be computed on an offline Turing machine M using f cells of the working tape with a binary working alphabet, and f can be computed on an offline Turing machine as a function n ↦ f(n) using O(2^f · f · log F) memory, then F ∈ OnSPACE′(2^f · f · log F, log F).

Proof. Consider an offline Turing machine M that computes F. Without loss of generality we can assume that M moves the working tape head on every step and stops only when its input tape head is at the end of the input string. By the position of a Turing machine we mean the position of its head on the input tape. We construct an online Turing machine M′ that computes F. At the start, M′ reads the size of the input n and finds the value f(n) by simulating the computation of the function n ↦ f(n). Then M′ reads its input and simulates the work of M. We show how to do this explicitly.

Let M be in memory configuration x at position k. Consider the path ρ that M traverses over pairs of its memory configuration and the position of the input tape head, starting from the current position, until it reaches position k + 1 or some halting configuration. Note that ρ implicitly depends on the input and can be infinite. Let h_k(x) be the function that returns the memory configuration at the end of ρ and the string that M prints on the output tape along ρ. If ρ is infinite, or the string of all printed symbols is longer than log F (the longest possible output), then h_k(x) returns a special loop marker.

Note that in order to simulate the work of M it is enough for M′ to maintain only h_k(x) for all x, the current position k, and the memory configuration in which M first comes into position k. The function h_0 can be computed trivially. We show that for every k the function h_k can be computed using only h_{k−1} and the k-th input symbol.
Consider some fixed k and memory configuration x. Recall that M moves the input tape head on every step of its computation. Then M either moves the head to the right and reaches position k + 1, or moves the head to the left, in which case, using h_{k−1}, we can compute in which memory configuration M will next reach position k and the string that M will print to the output until it reaches this position. Let g_k(x) be the function that computes these two values. Note that h_k(x) either equals the iterate g_k^i(x) for some i or returns the loop marker. In order to recognize a cycle in the iteration of g_k, we compute g_k^i(x) and g_k^{2i}(x) in parallel; if at some step of this computation the string printed along g_k^i(x) becomes too long, or the configurations computed by these two iterations become equal, we mark h_k(x) with the loop marker; otherwise the computation halts.

In order to compute the value of h_k(x) we need O(f + log |Q| + log F) memory, where Q denotes the set of states of M. Let q be the number of memory configurations of M. Then M′ uses O(q · (log q + log F)) cells of memory, and hence F ∈ OnSPACE′(2^f · f · log F, log F).

Corollary 1.
For any function f(n) = Ω(log log n), if a language L can be decided on an offline Turing machine M using f cells of the working tape with a binary working alphabet, and f can be computed on an offline Turing machine as a function n ↦ f(n) using O(2^f · f) memory, then L ∈ OnSPACE′(2^f · f).

Proof. This immediately follows from the fact that any language is a function with a one-bit output.
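The loop-marker test in the proof of Theorem 1, running the iterate g_k^i(x) against g_k^{2i}(x), is the classic two-pointer (tortoise-and-hare) cycle detection, which needs to store only two configurations at a time. A generic Python sketch for an iterated function on a finite configuration space (the names and the `limit` cutoff, which stands in for "reaches position k + 1 or halts", are illustrative):

```python
def reaches_cycle_before(g, x, limit):
    """Detect whether iterating g from x enters a cycle within `limit`
    doubling steps, storing only two configurations: `slow` advances one
    application of g per step, `fast` advances two.  If they meet, the
    iteration is looping; the returned value is some configuration lying
    on the cycle.  Returns None if no meeting occurs within the budget.
    """
    slow, fast = x, g(x)
    for _ in range(limit):
        if slow == fast:
            return slow          # a configuration on the cycle
        slow = g(slow)
        fast = g(g(fast))
    return None

# v -> v+1 mod 5 cycles through five configurations
print(reaches_cycle_before(lambda v: (v + 1) % 5, 0, 20))  # prints 4
```

The point of the trick in the proof is exactly the memory bound: detecting a loop among q configurations this way costs O(log q) extra space rather than storing the whole trajectory.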
DPLL algorithms

Consider the DPLL_{A,B} algorithm for deciding the satisfiability of a CNF formula ϕ, parametrized by two heuristics A and B. The heuristic A takes the formula ϕ, chooses some variable in ϕ, and returns its number. The heuristic B takes ϕ and the number of a variable (chosen by A) and returns a value for this variable. For the purposes of our proof, from now on we consider the DPLL_H algorithm, which has a single heuristic H for both choosing a variable and choosing its value.

Algorithm 1 DPLL_H
1: procedure DPLL_H(ϕ)                ⊲ ϕ — formula in CNF
2:   if ϕ is empty then return satisfiable
3:   if ϕ contains an empty clause then return unsatisfiable
4:   (x, b) ← H(ϕ)                    ⊲ H chooses both a variable and its value
5:   if DPLL_H(ϕ[x := b]) = satisfiable then return satisfiable
6:   return DPLL_H(ϕ[x := ¬b])

The following lemma shows that DPLL_{A,B} and DPLL_H are almost equivalent in terms of the space complexity of their heuristics.

Lemma 5.
Let A and B use at most f(n) memory on all inputs of size n. Then there is a heuristic H such that H uses at most f(n) + O(log log n) memory and DPLL_H makes the same recursive calls as DPLL_{A,B}.

Proof. Let S be the string returned by A. Since |S| = O(log n), memory f(n) + O(log log n) suffices to compute the k-th symbol of S (the index k fits in O(log log n) bits). Consider the algorithm H that emulates the algorithm B and recomputes the symbols of S every time B accesses them, using f(n) + O(log log n) memory in total.

We need the construction of boundary expander matrices from [1].
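Before moving to the construction, Algorithm 1 can be written out directly. A minimal Python sketch (the clause representation and the stand-in heuristic are illustrative assumptions; the results of this paper constrain only the heuristic's memory, not its choices):

```python
def dpll(clauses, heuristic):
    """DPLL_H of Algorithm 1.  `clauses` is a list of frozensets of integer
    literals (a negative integer is a negated variable).  `heuristic` maps
    the current formula to a pair (variable, value): the single heuristic H.
    """
    if not clauses:
        return True                      # empty formula: satisfiable
    if frozenset() in clauses:
        return False                     # empty clause: unsatisfiable
    x, b = heuristic(clauses)
    for value in (b, not b):             # try H's value first, then flip it
        lit = x if value else -x
        # drop satisfied clauses, remove the falsified literal elsewhere
        reduced = [c - {-lit} for c in clauses if lit not in c]
        if dpll(reduced, heuristic):
            return True
    return False

def first_var_true(clauses):             # illustrative stand-in heuristic H
    x = abs(next(iter(next(iter(clauses)))))
    return x, True

cnf = [frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2})]
print(dpll(cnf, first_var_true))         # False: x2 is forced both ways
```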
Definition 6 ([1, Definition 2.1]).
Let A be a Boolean matrix. For a set of rows I of A, its boundary ∂I is the set of all columns such that exactly one row of I contains a one at the intersection with that column. A is an (r, s, c)-boundary expander if
1. every row of A has at most s ones;
2. for any set of rows I, if |I| ≤ r, then |∂I| ≥ c · |I|.

The formula Φ_{A,b} encoding the system of linear equations
Ax = b is the formula constructed as follows. For each row of the matrix A we construct a CNF formula Φ_{A,b,i}(x) encoding (⊕_{j ∈ S_i} x_j = b[i]), where i is the number of the row and S_i is the set of numbers of the columns in which row i contains ones. We take Φ_{A,b}(x) := ⋀_i Φ_{A,b,i}(x). Note that the resulting formula is in conjunctive normal form. We identify a system of linear equations with the formula encoding it.

Lemma 6 ([1, Theorem 3.1, Lemma 2.1, Remark 3.1]).
There exists a family of Boolean matrices (A_n) such that for every n:
1. A_n has size n × n;
2. A_n is a full-rank matrix;
3. every row of A_n has exactly three ones;
4. every column of A_n has O(log n) ones;
5. A_n is an (n / log n, 3, c)-boundary expander for some absolute constant c > 1.

Lemma 7 ([1, Lemma 3.7, Lemma 3.8], [2, Corollary 3.4]).
For any matrix A that is an (r, 3, c)-boundary expander and any vector b ∉ Im(A), the size of any tree-like resolution refutation of the system Ax = b is at least 2^{Ω(cr)}.

Definition 7.
A subformula is called elementary if it is obtained from the original formula by substituting a value for a single variable.
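The encoding Φ_{A,b} above turns each linear equation over three variables into a constant number of clauses: one clause forbidding each assignment of the wrong parity. A Python sketch of this standard encoding (the function name is illustrative):

```python
from itertools import product

def xor_clauses(variables, parity):
    """CNF clauses for x_{i1} ⊕ ... ⊕ x_{ik} = parity: one clause forbidding
    each assignment of the k variables whose XOR differs from `parity`.
    For k = 3 (each row of A has exactly three ones) this gives 4 clauses
    of width 3, so the whole system Φ_{A,b} has size linear in the number
    of rows.
    """
    clauses = []
    for bits in product((0, 1), repeat=len(variables)):
        if sum(bits) % 2 != parity:              # a violating assignment
            # forbid it: at least one variable must take the other value
            clauses.append(frozenset(-v if b else v
                                     for v, b in zip(variables, bits)))
    return clauses

print(xor_clauses([1, 2, 3], 0))   # 4 clauses ruling out odd parity
```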
Let A be a matrix of size n × n satisfying the conditions of Lemma 6. Consider the family of functions f_{i,j}(x) = x_i ⊕ x_j.

Definition 8.
Consider the CNF formula Φ_{i,j;b} encoding the system of linear equations Ax = b together with the additional equation f_{i,j}(x) = 0. We call a pair of indices (i, j) bad if i < j and, for some value of b, there exists an elementary unsatisfiable subformula of the formula Φ_{i,j;b} whose tree-like resolution refutations all have size less than 2^{εr}, where r is the parameter of A and ε > 0 is a suitable constant.

Lemma 8.
Let A be a matrix of size n × n satisfying the conditions of Lemma 6. Then there are at most O(n log n) bad pairs of indices.

Proof. Φ_{i,j;b} encodes some system of linear equations Bx = d, where the matrix B depends on the parameters i, j. We show that if |∂_B I| ≥ max(2, |I|) holds for every set I of rows of the matrix B, then the pair (i, j) is good. Consider an arbitrary elementary unsatisfiable subformula of Φ_{i,j;b}, and let B′ be the matrix obtained by removing the column that corresponds to the substituted variable. We show that B′ is a boundary expander.

We identify the rows of the matrix B′ with the corresponding rows of the matrix B. Removing a single column destroys at most one boundary column of any set of rows, so for the expansion parameter c′ of B′ we get c′ = min_I |∂_{B′} I| / |I| ≥ min_I (|∂_B I| − 1) / |I| ≥ min_I (max(2, |I|) − 1) / |I|. The function f(t) = (max(2, t) − 1) / t attains its minimum of 1/2 at t = 2, so c′ ≥ 1/2 and B′ is an (r, 3, 1/2)-boundary expander. By Lemma 7, the size of the minimal tree-like resolution refutation of the considered subformula is at least 2^{Ω(r)}, so (for a suitable choice of the constant in Definition 8) the pair (i, j) is good.

Now consider a bad pair of indices (i, j). This pair corresponds to some set of rows I of the matrix A such that the condition |∂_B I′| < max(2, |I′|) holds for the set of rows I′ obtained from I by adding the row encoding f_{i,j} = 0. Since the added row contains only two ones, it is easy to see that |∂_B I′| ≥ |∂_B I| − 2. The condition |∂_B I′| < 2 cannot hold once |I| is large enough, since |∂_B I| ≥ c|I| for the expansion parameter c of the matrix A. Moreover, |∂_B I′| / |I′| ≥ (|∂_B I| − 2) / (|I| + 1) ≥ (c|I| − 2) / (|I| + 1) = c − (c + 2) / (|I| + 1), and the right-hand side is at least 1 for all sufficiently large |I|; therefore a bad pair can correspond only to a set I of constant size.

Now for each admissible size k of I we bound the number of bad pairs corresponding to sets of rows of size k. Each set of size 1 corresponds to exactly three bad pairs: one for each pair of variables of the row. Consider a set of size 2. If its boundary is large enough, it cannot correspond to any bad pair by the computation above; otherwise the set consists of two rows that have two common variables. Each set of this type corresponds to exactly one bad pair, and there are at most O(n log n) such sets, because every column of the matrix A contains O(log n) ones and therefore every row can occur in at most O(log n) such pairs. (This bound is not tight, but we only need o(n²).)

The only remaining case is |I| = 3. Again, if |∂_B I| is large enough, then I cannot correspond to any bad pair, since |∂_B I′| ≥ |∂_B I| − 2; so I can correspond to a bad pair only when |∂_B I| is small, which is possible only if each row of I has at least one common variable with one of the remaining two. But then there is a row that has common variables with both other rows, and there are at most O(n log n) sets of this type, because every column of the matrix A contains O(log n) ones.

Definition 9.
A family of unsatisfiable formulae (Φ_k) is a family of hard unsatisfiable formulae if the size of a minimal tree-like resolution refutation of any formula from (Φ_k) is at least 2^{ℓ / log^c ℓ} for some constant c, where ℓ = Ω(|Φ_k|).

Definition 10.
A family of satisfiable formulae (Φ_k) is a family of hard satisfiable formulae if the family of all its unsatisfiable elementary subformulae is hard.

Definition 11.
Two Boolean vectors are called opposite if their sum is equal to the vector of all ones.
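Over GF(2), a matrix with an odd number of ones in each row maps opposite vectors to opposite vectors (this is the content of Lemma 9 below). A quick numerical check in Python (the helper names and the sample matrix are illustrative):

```python
def mat_vec_gf2(A, x):
    """Multiply a 0/1 matrix A by a 0/1 vector x over GF(2)."""
    return [sum(a * b for a, b in zip(row, x)) % 2 for row in A]

def opposite(v):
    """The opposite vector (Definition 11): add the all-ones vector mod 2."""
    return [b ^ 1 for b in v]

# every row has an odd number of ones, so A maps opposite vectors to
# opposite vectors: A(x ⊕ 1) = (Ax) ⊕ 1
A = [[1, 1, 1],
     [1, 0, 0],
     [0, 1, 0]]
x = [1, 0, 1]
print(mat_vec_gf2(A, opposite(x)) == opposite(mat_vec_gf2(A, x)))  # True
```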
Lemma 9.
Let A be a full-rank Boolean matrix. If A contains an odd number of ones in each row, then the operators A and A^{−1} commute with the addition of the all-ones vector 1.

Proof. It suffices to prove that A(x ⊕ 1) = (Ax) ⊕ 1 for all x. There is an odd number of ones in each row, so A·1 = 1. Then A(x ⊕ 1) = Ax ⊕ A·1 = (Ax) ⊕ 1.

Theorem 2.
For any online algorithm H using o(n / log n) memory there exists a family of pairs of hard satisfiable formulae (Φ_m^0, Φ_m^1) satisfying the following conditions:
1. Φ_m^j is a formula over O(m) variables;
2. each formula has exactly one satisfying assignment, and the formulae in one pair have opposite satisfying assignments;
3. H returns the same answers for the formulae in one pair.

Proof. Without loss of generality we can assume that H prints its answer only after reading the entire input. Fix m. Let A be the Boolean matrix of size m × m from Lemma 6. We will construct a formula with the number of literals linear in m. The bit size of the resulting formula will be O(m log m), so H will use at most o(m log m / log(m log m)) = o(m) memory. From now on we will identify the size of a formula with the number of literals in it.

We create a variable for each column and for each one of the matrix A. Let x_{i,j} denote the variable that corresponds to the one in the i-th row and the j-th column, and let x_j denote the variable that corresponds to the j-th column. Note that we have a linear number of x_{i,j} variables, since each row of A contains exactly three ones.

Consider an arbitrary Boolean vector q and the i-th row of the matrix A. Let a, b and c be the numbers of the columns containing the ones of the i-th row. Consider the formula ϕ_{q,i} := (x_{i,a} ⊕ x_{i,b} ⊕ x_{i,c} = q[i]). Note that the CNF representation of ϕ_{q,i} consists of O(1) literals. We define the formula ϕ_q := ⋀_i ϕ_{q,i}.

Now consider the formula Φ_{q,w,d;a,b} := (ϕ_q ∨ u) ∧ (ϕ_w ∨ ¬u) ∧ ⋀_{i,j} (x_{i,j} = x_j ⊕ d[i]) ∧ ψ_{a,b}, where ψ_{a,b} := x_a ⊕ x_b. Here ϕ_q and ϕ_w are formulae of size linear in m, so the sizes of the formulae (ϕ_q ∨ u) and (ϕ_w ∨ ¬u) after conversion to CNF are also linear, and therefore the formula Φ_{q,w,d;a,b} has linear size.

Consider the formula ξ_{q,d} := ϕ_q ∧ ⋀_{i,j} (x_{i,j} = x_j ⊕ d[i]).
The values of the variables x_{i,j} are uniquely determined by the x_j according to the equations x_{i,j} = x_j ⊕ d[i]; therefore ξ_{q,d} is true if and only if the condition ⊕_{j ∈ I_i} (x_j ⊕ d[i]) = q[i] is satisfied for every row of the matrix A, where I_i denotes the set of columns containing a one in the i-th row. Every row of A contains exactly three ones, so this condition is equivalent to ⊕_{j ∈ I_i} x_j = q[i] ⊕ d[i]. The conjunction of these conditions encodes the system Ax = q ⊕ d, which means that ξ_{q,d} has the unique satisfying assignment A^{−1}(q ⊕ d). Therefore, for a fixed value of u, the formula Φ_{q,w,d;a,b} has at most one satisfying assignment.

The part of the formula depending on the parameter w begins after the end of the part depending on q, and ends before the part depending on d; therefore H reads the formula parameters in the order q, w, d. Let Q be the largest set of vectors q that H cannot distinguish (i.e., after reading the part of the formula depending on q, H is in the same memory configuration for all q ∈ Q), and let W be the largest set of vectors w ∈ Q ⊕ 1 that H cannot distinguish under the assumption that q ∈ Q. It is easy to see that |Q| ≥ 2^{m − o(m)}, and therefore |W| ≥ 2^{m − o(m)}. Let Φ_{q,w;a,b} := Φ_{q,w,d;a,b}, where an arbitrary element of W ⊕ 1 is selected as d. Consider W̃ := W ⊕ d and Q̃ := W̃ ⊕ 1. Note that 0 ∈ Q̃ (and therefore 1 ∈ W̃).

By Lemma 9, A^{−1} and the addition of the all-ones vector commute, and Q̃ = W̃ ⊕ 1; therefore A^{−1}Q̃ = (A^{−1}W̃) ⊕ 1.

We choose a₀, b₀ such that ψ_{a₀,b₀} is not constant on the set A^{−1}Q̃. We show that this can be done. We construct an equivalence relation on the coordinates of the space {0,1}^m (i.e., on bits) as follows: i ∼ j if and only if q[i] ⊕ q[j] is constant over all q ∈ A^{−1}Q̃. There are at least m − o(m) equivalence classes, since |A^{−1}Q̃| ≥ 2^{m − o(m)}. Therefore there exist Ω(m²) functions ψ_{a,b} that are not constant on A^{−1}Q̃, and by Lemma 8 among them there is a ψ_{a₀,b₀} such that the size of a refutation of any unsatisfiable elementary subformula of a formula of the form (Ax = q) ∧ ψ_{a₀,b₀} is exponential.

Note that the chosen ψ_{a₀,b₀} is also non-constant on the set A^{−1}W̃, since these sets consist of opposite elements. Now we choose q̃ ∈ Q̃ such that ψ_{a₀,b₀}(A^{−1}q̃) ≠ ψ_{a₀,b₀}(0) = ψ_{a₀,b₀}(A^{−1}·0), and set w̃ := q̃ ⊕ 1 ∈ W̃; then also ψ_{a₀,b₀}(A^{−1}w̃) ≠ ψ_{a₀,b₀}(1) = ψ_{a₀,b₀}(A^{−1}·1), since A contains exactly three ones in each row, which means that A·1 = 1, and since ψ_{a₀,b₀}(x ⊕ 1) = ψ_{a₀,b₀}(x) for every x. Consider the formulae Φ_{0⊕d, w̃⊕d; a₀,b₀} and Φ_{q̃⊕d, 1⊕d; a₀,b₀}. Note that the corresponding formula parameters are indistinguishable for H by construction, which means that H answers the same on both formulae.

We show that both formulae have exactly one satisfying assignment and that their satisfying assignments are opposite. Without the conjunct ψ_{a₀,b₀}, the formula Φ_{q,w;a,b} has exactly one satisfying assignment for each value of u (namely A^{−1}(q ⊕ d) for u = 0 and A^{−1}(w ⊕ d) for u = 1). For the considered formulae, q takes the values 0 ⊕ d and q̃ ⊕ d (and w, respectively, w̃ ⊕ d and 1 ⊕ d), on which ψ_{a₀,b₀} takes different values by construction. Moreover, ψ_{a₀,b₀} takes the value 1 on the assignments corresponding to the parameters q = q̃ ⊕ d and w = w̃ ⊕ d, which correspond to opposite values of u and are themselves opposite, since A^{−1} commutes with the addition of the all-ones vector by Lemma 9.

It remains to show that the constructed family of formulae is a family of hard formulae. If the formula becomes unsatisfiable after the substitution of a value for some variable, then the resulting formula has an unsatisfiable subformula of the form ϕ_q ∧ ⋀_{i,j}(x_{i,j} = x_j ⊕ d[i]) ∧ ψ_{a,b}, possibly with one substituted variable; the size of any refutation of it is at least the size of a refutation of an elementary unsatisfiable subformula of a formula of the form (Ax = q) ∧ ψ_{a,b}, which is exponential.

Corollary 2.
For any online heuristic H that uses o(n / log n) memory, there exists a family of satisfiable formulae such that DPLL_H makes at least 2^{ℓ / log^c ℓ} recursive calls on the formulae from this family for some constant c, where ℓ = Ω(n / log n).

Proof. Consider the family of pairs of formulae from Theorem 2, and in each pair choose the formula on which H descends into an unsatisfiable subformula after the first step. The size of the minimal tree-like resolution refutation of this subformula is exponential, so DPLL_H runs in exponential time on it.

Corollary 3.
For any offline heuristic H that uses (1 − ε) log n cells of memory over the binary alphabet, for some positive ε, there exists a family of satisfiable formulae such that DPLL_H makes at least 2^{ℓ / log^c ℓ} recursive calls on the formulae from this family for some constant c, where ℓ = Ω(n / log n).

Proof. By Theorem 1, there exists an equivalent online heuristic H′ that uses O(2^{(1−ε) log n} · (1 − ε) log n · log n) memory. It is easy to see that 2^{(1−ε) log n} · (1 − ε) log n · log n = n^{1−ε} · (1 − ε) log² n = o(n / log n). Thus, H′ ∈ OnSPACE′(o(n / log n), log), and the statement follows from Corollary 2.

Acknowledgments
The author is grateful to Alexander Okhotin for helpful discussions and to Edward A. Hirsch, who supervised this work.
References
1. Alekhnovich, M., Hirsch, E., Itsykson, D.: Exponential lower bounds for the running time of DPLL algorithms on satisfiable formulas. Journal of Automated Reasoning, 131–143 (2004). https://doi.org/10.1007/978-3-540-27836-8_102
2. Ben-Sasson, E., Wigderson, A.: Short proofs are narrow — resolution made simple. J. ACM 48(2), 149–169 (2001). https://doi.org/10.1145/375827.375835
3. Cook, J., Etesami, O., Miller, R., Trevisan, L.: Goldreich's one-way function candidate and myopic backtracking algorithms. In: Proceedings of the 6th Theory of Cryptography Conference (TCC '09), pp. 521–538. Springer-Verlag, Berlin, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00457-5_31
4. Cook, J., Etesami, O., Miller, R., Trevisan, L.: On the one-way function candidate proposed by Goldreich. ACM Trans. Comput. Theory 6(3), 14:1–14:35 (2014). https://doi.org/10.1145/2633602
5. Davis, M., Logemann, G., Loveland, D.: A machine program for theorem-proving. Commun. ACM 5(7), 394–397 (1962). https://doi.org/10.1145/368273.368557
6. Davis, M., Putnam, H.: A computing procedure for quantification theory. J. ACM 7(3), 201–215 (1960). https://doi.org/10.1145/321033.321034
7. Haken, A.: The intractability of resolution. Theoretical Computer Science 39, 297–308 (1985). https://doi.org/10.1016/0304-3975(85)90144-6
8. Itsykson, D.: Lower bound on average-case complexity of inversion of Goldreich's function by drunken backtracking algorithms. In: Ablayev, F., Mayr, E.W. (eds.) Computer Science – Theory and Applications, pp. 204–215. Springer, Berlin, Heidelberg (2010)
9. Itsykson, D., Knop, A.: Hard satisfiable formulas for splittings by linear combinations. In: Gaspers, S., Walsh, T. (eds.) Theory and Applications of Satisfiability Testing – SAT 2017, pp. 53–61. Springer International Publishing, Cham (2017)
10. Itsykson, D., Sokolov, D.: The complexity of inversion of explicit Goldreich's function by DPLL algorithms. In: Kulikov, A., Vereshchagin, N. (eds.) Computer Science – Theory and Applications, pp. 134–147. Springer, Berlin, Heidelberg (2011)
11. Itsykson, D., Sokolov, D.: Lower bounds for myopic DPLL algorithms with a cut heuristic. In: Proceedings of the 22nd International Symposium on Algorithms and Computation (ISAAC '11), pp. 464–473. Springer-Verlag, Berlin, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25591-5_48
12. Stearns, R.E., Hartmanis, J., Lewis, P.M.: Hierarchies of memory limited computations. In: 6th Annual Symposium on Switching Circuit Theory and Logical Design (SWCT 1965), pp. 179–190 (1965). https://doi.org/10.1109/FOCS.1965.11
13. Tseitin, G.S.: On the complexity of derivation in propositional calculus. In: Automation of Reasoning, pp. 466–483. Springer, Berlin, Heidelberg (1983). https://doi.org/10.1007/978-3-642-81955-1_28
14. Urquhart, A.: Hard examples for resolution. J. ACM 34(1), 209–219 (1987)