Knowledge compilation languages as proof systems
Florent Capelli
Université de Lille, Inria, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, F-59000 Lille, France
Abstract.
In this paper, we study proof systems in the sense of Cook-Reckhow for problems that are higher in the polynomial hierarchy than coNP, in particular,
Keywords:
Knowledge compilation · Proof complexity · Propositional Model Counting · maxSAT.

Proof complexity studies the hardness of finding a certificate that a CNF formula is not satisfiable. A minimal requirement for such a certificate is that it should be checkable in time polynomial in its size, so that it is easier for an independent checker to assess the correctness of the proof than to redo the computation made by a solver. While proof systems have been implicitly used for a long time, starting with resolution [11,10], their systematic study was initiated by Cook and Reckhow [7], who showed that unless NP = coNP, one cannot design a proof system where all unsatisfiable CNF have short certificates. Nevertheless, many unsatisfiable CNF may have short certificates if the proof system is powerful enough, motivating the study of how such systems, such as resolution [10] or polynomial calculus [5], compare in terms of succinctness (see [18] for a survey). More recently, proof systems found practical applications, as SAT solvers have been expected since 2013 to output a proof of unsatisfiability in SAT competitions to guard against implementation bugs.

While the proof system implicitly defined by the execution trace of modern CDCL SAT solvers is fairly well understood [20], this is not the case for tools solving harder problems on CNF formulas such as #SAT and MaxSAT. For MaxSAT, a resolution-like system has been proposed by Bonet et al. [2], for which a compressed version has been used in a solver by Bacchus and Narodytska [17], but it is to the best of our knowledge the only such proof system. To the best of our knowledge, no proof system has been proposed for #SAT.
In this short paper, we introduce new proof systems for #SAT and MaxSAT. Contrary to the majority of proof systems for SAT, our proof systems are not based on the iterative application of inference rules on the original CNF formula. In our proof systems, certificates are restricted Boolean circuits representing the Boolean function computed by the input CNF formula. These restricted circuits originate from the field of knowledge compilation [9], whose primary focus is to study the succinctness and tractability of representations such as Read Once Branching Programs [23] or deterministic DNNF [8] and how CNF formulas can be transformed into such representations. To use them as certificates for #SAT, we first have to add some extra information to the circuit so that one can check in polynomial time that it is equivalent to the original CNF. The syntactic properties of the input circuits then allow one to efficiently count the number of satisfying assignments, resulting in the desired proof system. Moreover, we observe that most tools doing exact model counting are already implicitly generating such proofs. Our result generalizes known connections between regular resolution and Read Once Branching Programs (see [13, Section 18.2]).

The paper is organized as follows. Section 2 introduces all the notions that will be used in the paper. Section 3 contains the definition of certified dec-DNNF that allows us to define our proof systems for #SAT and MaxSAT.

Assignments and Boolean functions.
Let X be a finite set of variables and D a finite domain. We denote the set of functions from X to D as D^X. An assignment on variables X is an element of {0,1}^X. A Boolean function f on variables X is an element of {0,1}^({0,1}^X), that is, a function that maps an assignment to a value in {0,1}. An assignment τ ∈ {0,1}^X such that f(τ) = 1 is called a satisfying assignment of f, denoted by τ ⊨ f. We denote by ⊥_X the Boolean function on variables X whose value is always 0. Given two Boolean functions f and g on variables X, we write f ⇒ g if for every τ, f(τ) ≤ g(τ).

CNF.
Let X be a set of variables. A literal on variables X is either a variable x ∈ X or the negation ¬x of a variable x ∈ X. A clause is a disjunction of literals. A conjunctive normal form formula, CNF for short, is a conjunction of clauses. A CNF naturally defines a Boolean function on variables X: a satisfying assignment for a CNF F on variables X is an assignment τ ∈ {0,1}^X such that for every clause C of F, there exists a literal ℓ of C such that τ(ℓ) = 1 (where we define τ(¬x) := 1 − τ(x)). We often identify a CNF with the Boolean function it defines.

The problem SAT is the problem of deciding, given a CNF formula F, whether F has a satisfying assignment. It is the generic NP-complete problem [6]. The problem UNSAT is the problem of deciding, given a CNF formula F, whether F does not have a satisfying assignment. It is the generic coNP-complete problem.

Given a CNF F, we denote by #F = |{τ | τ ⊨ F}| the number of solutions of F and by M(F) = max_τ |{C ∈ F | τ ⊨ C}| the maximum number of clauses of F that can be simultaneously satisfied. The problem #SAT is the problem of computing #F given a CNF F as input and the problem MaxSAT is the problem of computing M(F) given a CNF F as input.

Cook-Reckhow proof systems.
Let Σ, Σ′ be finite alphabets. A (Cook-Reckhow) proof system [7] for a language L ⊆ Σ* is a surjective polynomial time computable function f : Σ′* → L. Given a ∈ L, there exists, by definition, b ∈ Σ′* such that f(b) = a. We will refer to b as a certificate of a.

In this paper, we will mainly be interested in proof systems for the problems #SAT and MaxSAT, that is, we would like to design polynomial time verifiable proofs that a CNF formula has k solutions or that at most k clauses in the formula can be simultaneously satisfied. In terms of the definition of Cook-Reckhow, this translates to finding a proof system for the languages {(F, #F) | F is a CNF} and {(F, M(F)) | F is a CNF}.

For example, a naive proof system for #SAT could be the following: a certificate that F has k solutions would be the list of the k solutions together with a resolution proof that F′ = F ∧ ⋀_{τ ⊨ F} C_τ is not satisfiable, where C_τ := ⋁_{x | τ(x)=0} x ∨ ⋁_{x | τ(x)=1} ¬x is the clause whose only non-satisfying assignment is τ. One could then check in polynomial time that each of the k assignments satisfies F and that F′ is indeed unsatisfiable, and then output (F, k). This proof system is however not very interesting, as one can construct very simple CNF with exponentially many solutions: for example, the empty CNF on n variables has 2^n solutions and will thus have a certificate of size at least 2^n.

dec-DNNF.
A decision Decomposable Negation Normal Form circuit D on variables X, dec-DNNF for short, is a directed acyclic graph (DAG) having exactly one node of indegree 0 called the source. Nodes of outdegree 0 are called the sinks and are labeled by 0 or 1. The other nodes have outdegree 2 and can be of two types:
– The decision nodes are labeled with a variable x ∈ X. One outgoing edge is labeled with 1 and the other with 0, represented respectively as a solid and a dashed edge in our figures.
– The ∧-nodes are labeled with ∧.

Moreover, we have two other syntactic properties. We introduce a few notations before explaining them. If there is a decision node in D labeled with variable x, we say that x is tested in D. We denote by var(D) the set of variables tested in D.
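As an illustration of the node types and of var(D), the following Python sketch encodes a dec-DNNF with small immutable node objects. The class names (Sink, Decision, And), the helper variables_tested and the example circuit are assumptions made for this sketch, not notation from the paper. In particular, the circuit is one possible dec-DNNF for x = y = z and is not claimed to match Figure 1 node for node.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Sink:
    value: int  # label of the sink: 0 or 1


@dataclass(frozen=True)
class Decision:
    var: str      # variable tested at this node
    low: "Node"   # successor along the edge labeled 0
    high: "Node"  # successor along the edge labeled 1


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"


Node = Union[Sink, Decision, And]


def variables_tested(node: Node) -> frozenset:
    """var(D(node)): variables labeling the decision nodes reachable from node."""
    if isinstance(node, Sink):
        return frozenset()
    if isinstance(node, Decision):
        below = variables_tested(node.low) | variables_tested(node.high)
        return frozenset({node.var}) | below
    return variables_tested(node.left) | variables_tested(node.right)


ZERO, ONE = Sink(0), Sink(1)


def equals(var: str, bit: int) -> Decision:
    """Decision node that leads to the 1-sink exactly when var = bit."""
    return Decision(var,
                    low=ONE if bit == 0 else ZERO,
                    high=ONE if bit == 1 else ZERO)


# x = y = z: branch on x, then require y and z to agree with the value of x.
D = Decision("x",
             low=And(equals("y", 0), equals("z", 0)),
             high=And(equals("y", 1), equals("z", 1)))

print(sorted(variables_tested(D)))       # ['x', 'y', 'z']
print(sorted(variables_tested(D.high)))  # ['y', 'z']
```

The recursion revisits shared nodes, which is acceptable for a toy example; a linear-time version would memoize the result per node.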
Given a node α of D, we denote by D(α) the dec-DNNF whose source is α and whose nodes are the nodes that can be reached in D starting from α. We also assume the following:
– Every x ∈ X is tested at most once on every source-sink path of D.
– Every ∧-gate of D is decomposable, that is, for every ∧-node α with successors β, γ in D, it holds that var(D(β)) ∩ var(D(γ)) = ∅.

Let τ ∈ {0,1}^X. A source-sink path P in D is compatible with τ if and only if, whenever a variable x is tested on P, the outgoing edge labeled with τ(x) is in P. We say that τ satisfies D if only 1-sinks are reached by paths compatible with τ. A dec-DNNF and the paths compatible with the assignment τ(x) = τ(y) = 0, τ(z) = 1 are depicted in bold red on Figure 1. Observe that a 0-sink is reached so τ does not satisfy D. We will often identify a dec-DNNF with the Boolean function it computes.

Fig. 1. A dec-DNNF computing x = y = z.

Observation 1.
Given a dec-DNNF D on variables X and a source-sink path P in D, there exists τ ∈ {0,1}^X such that P is compatible with τ.

Indeed, by definition, every variable x ∈ X is tested at most once on P. Thus, if x is tested on P in a decision node α and P contains the outgoing edge of α labeled with v_x, we can choose τ(x) := v_x. The value of τ for a variable x not tested on P can be chosen arbitrarily.

The size of a dec-DNNF D, denoted by size(D), is the number of edges of the underlying graph of D.

Tractable queries.
The main advantage of representing a Boolean function with a dec-DNNF is that it makes the analysis of the function easier. Given a dec-DNNF, one can easily find a satisfying assignment by only following paths backward from 1-sinks. Similarly, one can also count the number of satisfying assignments or find a satisfying assignment with the least number of variables set to 1. The relation between the queries that can be solved efficiently and the representation of the Boolean function has been one focus of Knowledge Compilation. See [9] for an exhaustive study of tractable queries depending on the representation.

Let f : 2^X → {0,1} be a Boolean function. In this paper, we will mainly be interested in solving the following problems:
– Model Counting Problem (MC): return the number of satisfying assignments of f.
– Clause entailment (CE): given a clause C on variables X, does f ⇒ C?
– Maximal Hamming Weight (HW): given Y ⊆ X, compute max_{τ ⊨ f} |{y ∈ Y | τ(y) = 1}|.

All these problems are tractable when the Boolean function is given as a dec-DNNF:

Theorem 1 ([8,14]).
Given a dec-DNNF D, one can solve problems MC, CE, HW on the Boolean function represented by D in time linear in size(D).

The tractability of CE on dec-DNNF has the following useful consequence:

Corollary 1.
Given a dec-DNNF D and a CNF formula F, one can check in time O(size(F) × size(D)) whether D ⇒ F.

Proof. One simply has to check that for every clause C of F, D ⇒ C, which can be done in polynomial time by Theorem 1.

Theorem 1 suggests that given a CNF F, one could use a dec-DNNF D computing F as a certificate for #SAT. The proof system could then check the certificate as follows:
1. Compute the number k of satisfying assignments of D.
2. Check whether F is equivalent to D.
3. If so, return (F, k).

While Step 1 can be done in polynomial time by Theorem 1, it turns out that Step 2 is not tractable (Theorem 2).
Theorem 2.
The problem of checking, given a CNF F and a dec-DNNF D as input, whether F ⇒ D is coNP-complete.

Proof. The problem is clearly in coNP. For hardness, there is a straightforward reduction from UNSAT. Indeed, observe that a CNF F on variables X is not satisfiable if and only if F ⇒ ⊥_X. Moreover, ⊥_X is easily represented as a dec-DNNF having only one node: a 0-labeled sink.

Certified dec-DNNF

The reduction used in the proof of Theorem 2 suggests that the coNP-completeness of checking whether F ⇒ D comes from the fact that dec-DNNF can succinctly represent ⊥. In this section, we introduce restrictions of dec-DNNF called certified dec-DNNF for which one can check whether a CNF formula entails the certified dec-DNNF. The idea is to add information on 0-sinks to explain which clause would be violated by an assignment leading to this sink.

Our inspiration comes from a known connection between regular resolution and read once branching programs (i.e. a dec-DNNF without ∧-gate [1]) that appears to be folklore; we refer the reader to the book by Jukna [13, Section 18.2] for a thorough and complete presentation. A regular resolution proof is a resolution proof where, on each path, a variable is resolved at most once. It turns out that a regular resolution proof of unsatisfiability of a CNF F can be represented by a read once branching program D whose sinks are labeled with clauses of F. Moreover, for every τ, if a sink labeled by a clause C is reached by a path compatible with τ, then C(τ) = 0. We generalize this idea so that the function represented by a dec-DNNF need not be an unsatisfiable CNF.
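The connection with regular resolution recalled above can be checked on a toy instance. In the Python sketch below, the unsatisfiable CNF, the branching program and the encoding (tuples for nodes, pairs (variable, polarity) for literals) are all illustrative assumptions rather than an example taken from the paper. Every assignment is routed to its sink, and the clause labeling that sink is verified to be a clause of F falsified by the assignment. The enumeration of all assignments is exponential and only meant for illustration.

```python
from itertools import product

# Unsatisfiable CNF: (x), (not x or y), (not y).
C1 = (("x", True),)
C2 = (("x", False), ("y", True))
C3 = (("y", False),)
F = [C1, C2, C3]

# Read once branching program: ("dec", var, low, high) or ("sink", clause).
PROGRAM = ("dec", "x",
           ("sink", C1),        # x = 0 falsifies (x)
           ("dec", "y",
            ("sink", C2),       # x = 1, y = 0 falsifies (not x or y)
            ("sink", C3)))      # y = 1 falsifies (not y)


def sink_reached(node, tau):
    """Follow the unique path compatible with tau and return the sink label."""
    while node[0] == "dec":
        node = node[3] if tau[node[1]] else node[2]
    return node[1]


def falsified(clause, tau):
    """True when tau sets every literal of the clause to false."""
    return all(tau[var] != positive for var, positive in clause)


def refutes(program, cnf, variables):
    """Each assignment must reach a sink labeled by a clause of cnf that it falsifies."""
    for bits in product([False, True], repeat=len(variables)):
        tau = dict(zip(variables, bits))
        clause = sink_reached(program, tau)
        if clause not in cnf or not falsified(clause, tau):
            return False
    return True


print(refutes(PROGRAM, F, ["x", "y"]))  # True
```

The program mirrors a regular resolution refutation: resolving (¬x ∨ y) with (¬y) on y yields (¬x), which is then resolved with (x) on x.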
Definition 1.
A certified dec-DNNF D on variables X is a dec-DNNF on variables X such that every 0-sink α of D is labeled with a clause C_α. D is said to be correct if for every τ ∈ {0,1}^X such that there is a path from the source of D to a 0-sink α compatible with τ, C_α(τ) = 0.

Given a certified dec-DNNF D, we denote by Z(D) the set of 0-sinks of D and define F(D) = ⋀_{α ∈ Z(D)} C_α.

Intuitively, the clause labeling a 0-sink is an explanation of why an assignment does not satisfy the circuit. The degenerate case where there are only 0-sinks and no ∧-gates corresponds to the characterization of regular resolution.

A crucial property of certified dec-DNNF is that their correctness can be tested in polynomial time:

Theorem 3.
Given a certified dec-DNNF D, one can check in polynomial time whether D is correct.

Proof. By definition, D is not correct if and only if there exist a 0-sink α, a literal ℓ in C_α, an assignment τ such that τ(ℓ) = 1 and a path in D from the source to α compatible with τ. By Observation 1, this is equivalent to the existence of a path from the source to α that either does not test the underlying variable of ℓ or contains the outgoing edge corresponding to τ(ℓ) = 1 when the underlying variable of ℓ is tested.

In other words, D is correct if and only if for every 0-sink α and for every literal ℓ of C_α with variable x, every path from the source to α tests variable x and contains the outgoing edge corresponding to an assignment τ such that τ(ℓ) = 0.

This can be checked in polynomial time. Indeed, fix a 0-sink α and a literal ℓ of C_α. For simplicity, we assume that ℓ = x (the case ℓ = ¬x is completely symmetric). We have to check that every path from the source to α contains a decision node β on variable x and contains the outgoing edge of β labeled with 0. To check this, it is sufficient to remove all the edges labeled with 0 going out of decision nodes on variable x and to test that α is no longer reachable from the source, which can be done in polynomial time by a graph traversal. Running this for every 0-sink α and every literal ℓ of C_α gives the expected algorithm.

The clauses labeling the 0-sinks of a correct certified dec-DNNF naturally connect to the function computed by D:

Theorem 4.
Let D be a correct certified dec-DNNF on variables X. We have F(D) ⇒ D.

Proof. Observe that F(D) ⇒ D if and only if for every τ ∈ {0,1}^X, if τ does not satisfy D then τ does not satisfy F(D). Now let τ be an assignment that does not satisfy D. By definition, there exists a path compatible with τ from the source of D to a 0-sink α of D. Since D is correct, C_α(τ) = 0. Thus, τ does not satisfy F(D) as C_α is by definition a clause of F(D).

Corollary 2.
Let F be a CNF formula and D be a correct certified dec-DNNF such that every clause of F(D) is also in F. Then F ⇒ D.

Proof system for #SAT.
One can use certified dec-DNNF to define a proof system for #SAT. The Knowledge Compilation based Proof System for #SAT, kcps(#SAT) for short, is defined as follows: given a CNF F, a certificate that F has k satisfying assignments is a correct certified dec-DNNF D such that:
– every clause of F(D) is a clause of F,
– D computes F and has k satisfying assignments.

To check a certificate D, one has to check that D is equivalent to F and has indeed k satisfying assignments, which can be done in polynomial time as follows:
– Check that D is correct, which is tractable by Theorem 3.
– Check that D ⇒ F, which is tractable by Corollary 1, and that every clause of F(D) is a clause of F. By Corollary 2, it means that D ⇔ F.
– Compute the number k of solutions of D, which is tractable by Theorem 1.
– Return (F, k).

This proof system for #SAT is particularly well-suited for the existing tools solving #SAT in practice. Many of them, such as sharpSAT [22] or cachet [21], are based on a generalization of DPLL for counting which is sometimes referred to as exhaustive DPLL in the literature. It has been observed by Huang and Darwiche [12] that these tools implicitly construct a dec-DNNF equivalent to the input formula. Tools such as c2d [19], D4 [15] or DMC [16] already exploit this connection and have the option to directly output an equivalent dec-DNNF. These solvers explore the set of satisfying assignments by branching on variables of the formula, which corresponds to a decision node, and, when two variable-independent components of the formula are detected, compute the number of satisfying assignments of both components and take the product, which corresponds to a decomposable ∧-gate. When a satisfying assignment is reached, it corresponds to a 1-sink. If a clause is violated by the current assignment, then it corresponds to a 0-sink. At this point, the solvers could also label the 0-sink with the violated clause, which would give a correct certified dec-DNNF.

Proof system for MaxSAT.
As for #SAT, one can exploit the tractability of many problems on dec-DNNF to define a proof system for MaxSAT. Given a CNF formula F, let F̃ = ⋀_{C ∈ F} (C ∨ ¬s_C) be the formula where each clause is augmented with a fresh selector variable. Let S = {s_C | C ∈ F}. Observe that M(F) is exactly max_{τ ⊨ F̃} |{s ∈ S | τ(s) = 1}| since if τ ⊨ F̃ and τ(s_C) = 1, then τ ⊨ C. By Theorem 1, if F̃ is represented by a dec-DNNF D, then one can solve this problem in polynomial time in size(D). The proof system kcps(MaxSAT) is defined as follows: given a CNF F, a certificate is a correct certified dec-DNNF D with clauses in F̃ that computes F̃. The proof may be checked as before by checking both the correctness of D and the fact that D ⇔ F̃. However, we are not aware of any tool solving MaxSAT based on this technique and thus the implementation of such a proof system in existing tools may not be realistic. It will still be worth comparing this proof system with the resolution for MaxSAT [2].

In general, we observe that we can use this idea to build a proof system kcps(Q) for any tractable problem Q on dec-DNNF. This could for example be applied to weighted versions of #SAT and MaxSAT.

Combining proof systems.
An interesting feature of kcps-like proof systems is that they can be combined with other proof systems for UNSAT to be made more powerful. Indeed, one could label the 0-sinks of the dec-DNNF with clauses C that are not originally in the initial CNF F but that are entailed by F, that is, F ⇒ C. In this case, Corollary 2 would still hold. The only thing needed to obtain a real proof system is that a proof that F ⇒ C has to be given along with the correct certified dec-DNNF, that is, a proof of unsatisfiability of F ∧ ¬C. Any proof system for UNSAT may be used here.
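A checker for such a combined certificate can be sketched in Python as follows. It performs the reachability test from the proof of Theorem 3 and accepts a 0-sink label when it is a clause of F or is entailed by F. Everything about the encoding is an assumption of this sketch: nodes are tuples, a literal (v, b) is satisfied exactly when variable v takes bit b, and the function names are hypothetical. The entailment test F ⇒ C is done by brute force purely as a stand-in for checking a proof of unsatisfiability of F ∧ ¬C in whichever UNSAT proof system is chosen.

```python
from itertools import product

ONE = ("one",)


def clause(*literals):
    return frozenset(literals)


def zero(*literals):
    """0-sink labeled with a clause."""
    return ("zero", clause(*literals))


def dec(var, low, high):
    return ("dec", var, low, high)


def zero_sinks(node):
    """Set of 0-sinks reachable from node."""
    if node[0] == "zero":
        return {node}
    if node[0] == "one":
        return set()
    return set().union(*(zero_sinks(child) for child in node[-2:]))


def reaches(node, target, var, blocked_bit):
    """Is target reachable without using a blocked_bit edge of a decision node on var?"""
    if node == target:
        return True
    if node[0] == "dec":
        return any(reaches(child, target, var, blocked_bit)
                   for bit, child in enumerate(node[2:])
                   if not (node[1] == var and bit == blocked_bit))
    if node[0] == "and":
        return any(reaches(child, target, var, blocked_bit) for child in node[1:])
    return False


def is_correct(root):
    """Theorem 3: every path to a 0-sink must falsify each literal of its label."""
    for sink in zero_sinks(root):
        for var, bit in sink[1]:
            # Remove the edges falsifying the literal: the sink must become unreachable.
            if reaches(root, sink, var, blocked_bit=1 - bit):
                return False
    return True


def entailed(cnf, c, variables):
    """F => C iff F and (not C) is unsatisfiable; brute force stands in for a proof checker."""
    for bits in product((0, 1), repeat=len(variables)):
        tau = dict(zip(variables, bits))
        falsifies_c = all(tau[v] != b for v, b in c)
        satisfies_f = all(any(tau[v] == b for v, b in d) for d in cnf)
        if falsifies_c and satisfies_f:
            return False
    return True


def check_combined(root, cnf, variables):
    """Accept when D is correct and each 0-sink label is in F or entailed by F."""
    labels_ok = all(s[1] in cnf or entailed(cnf, s[1], variables)
                    for s in zero_sinks(root))
    return is_correct(root) and labels_ok


# F encodes x -> y, y -> z, z -> x, that is, x = y = z.
F = [clause(("x", 0), ("y", 1)), clause(("y", 0), ("z", 1)), clause(("z", 0), ("x", 1))]

D = dec("x",
        ("and", dec("y", ONE, zero(("x", 1), ("y", 0))),   # (x or not y): entailed, not in F
                dec("z", ONE, zero(("x", 1), ("z", 0)))),  # (x or not z): clause of F
        ("and", dec("y", zero(("x", 0), ("y", 1)), ONE),   # (not x or y): clause of F
                dec("z", zero(("x", 0), ("z", 1)), ONE)))  # (not x or z): entailed, not in F

print(is_correct(D))                               # True
print(sum(s[1] not in F for s in zero_sinks(D)))   # 2
print(check_combined(D, F, ["x", "y", "z"]))       # True
```

In this toy instance the decomposed circuit cannot label all of its 0-sinks with clauses of F, because the sinks under the ∧-gates only see two variables each. Two of the four labels are therefore entailed clauses, which is exactly the situation the combined system is meant to cover.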
Lower bounds.
Lower bounds on the size of dec-DNNF representing CNF formulas may be directly lifted to lower bounds for kcps(#SAT) or kcps(MaxSAT). There exist families of monotone 2-CNF that cannot be represented as polynomial size dec-DNNF [1,3,4]. This directly gives the following corollary:
Corollary 3.
There exists a family (F_n)_{n ∈ ℕ} of monotone 2-CNF such that F_n is of size O(n) and any proof for F_n in kcps(#SAT) and kcps(MaxSAT) is of size at least 2^{Ω(n)}.

An interesting open question is to find CNF formulas having polynomial size dec-DNNF but no small proof in kcps(#SAT).

In this paper, we have developed techniques based on circuits used in knowledge compilation to extend existing proof systems for tautology to harder problems. It seems possible to implement these systems in existing tools for #SAT based on exhaustive DPLL, which would allow these tools to provide an independently checkable certificate that their output is correct, in the same way that SAT solvers return a proof on unsatisfiable instances. It would be interesting to see how adding the computation of this certificate to existing solvers impacts their performance. Another interesting direction would be to compare the power of kcps(MaxSAT) with the resolution for MaxSAT of Bonet et al. [2] and to see how such proof systems could be implemented in existing tools for MaxSAT. Finally, we think that a systematic study of other languages used in knowledge compilation, such as deterministic DNNF, should be done to see if they can be used as proof systems, by trying to add explanations of why an assignment does not satisfy the circuit.
References
1. Paul Beame, Jerry Li, Sudeepa Roy, and Dan Suciu. Lower bounds for exact model counting and applications in probabilistic databases. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, 2013.
2. María Luisa Bonet, Jordi Levy, and Felip Manyà. Resolution for Max-SAT. Artificial Intelligence, 171(8–9):606–618, June 2007.
3. Simone Bova, Florent Capelli, Stefan Mengel, and Friedrich Slivovsky. Knowledge Compilation Meets Communication Complexity. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, pages 1008–1014, 2016.
4. Florent Capelli. Structural restrictions of CNF formulas: application to model counting and knowledge compilation. PhD thesis, Université Paris Diderot, 2016.
5. Matthew Clegg, Jeffery Edmonds, and Russell Impagliazzo. Using the Groebner basis algorithm to find proofs of unsatisfiability. In Proceedings of the Twenty-eighth Annual ACM Symposium on Theory of Computing, STOC '96, 1996.
6. Stephen A Cook. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing, pages 151–158. ACM, 1971.
7. Stephen A Cook and Robert A Reckhow. The relative efficiency of propositional proof systems. The Journal of Symbolic Logic, 44(1):36–50, 1979.
8. Adnan Darwiche. On the tractable counting of theory models and its application to truth maintenance and belief revision. Journal of Applied Non-Classical Logics, 11(1-2):11–34, 2001.
9. Adnan Darwiche and Pierre Marquis. A Knowledge Compilation Map. Journal of Artificial Intelligence Research, 17:229–264, 2002.
10. Martin Davis, George Logemann, and Donald Loveland. A machine program for theorem-proving. Commun. ACM, 5(7):394–397, July 1962.
11. Martin Davis and Hilary Putnam. A Computing Procedure for Quantification Theory. J. ACM, 7(3):201–215, July 1960.
12. Jinbo Huang and Adnan Darwiche. DPLL with a trace: From SAT to knowledge compilation. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, pages 156–162, 2005.
13. Stasys Jukna. Boolean Function Complexity - Advances and Frontiers, volume 27 of Algorithms and combinatorics. Springer, 2012.
14. Frédéric Koriche, Daniel Le Berre, Emmanuel Lonca, and Pierre Marquis. Fixed-parameter tractable optimization under DNNF constraints. In ECAI 2016 - 22nd European Conference on Artificial Intelligence, 29 August-2 September 2016, The Hague, The Netherlands - Including Prestigious Applications of Artificial Intelligence (PAIS 2016), pages 1194–1202, 2016.
15. Jean-Marie Lagniez and Pierre Marquis. An improved decision-dnnf compiler. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, 2017.
16. Jean-Marie Lagniez, Pierre Marquis, and Nicolas Szczepanski. Dmc: A distributed model counter. In IJCAI, pages 1331–1338, 2018.
17. Nina Narodytska and Fahiem Bacchus. Maximum satisfiability using core-guided maxsat resolution. In Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
18. Jakob Nordström. Pebble games, proof complexity, and time-space trade-offs. Logical Methods in Computer Science (LMCS), 9(3), 2013.
19. Umut Oztok and Adnan Darwiche. A top-down compiler for sentential decision diagrams. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, pages 3141–3148, 2015.
20. Knot Pipatsrisawat and Adnan Darwiche. On the power of clause-learning sat solvers as resolution engines. Artificial Intelligence, 175(2):512–525, 2011.
21. Tian Sang, Fahiem Bacchus, Paul Beame, Henry A Kautz, and Toniann Pitassi. Combining component caching and clause learning for effective model counting. Theory and Applications of Satisfiability Testing, 4:7th, 2004.
22. Marc Thurley. sharpsat–counting models with advanced component caching and implicit bcp. In Theory and Applications of Satisfiability Testing, pages 424–429. Springer, 2006.
23. Ingo Wegener.