Graph Neural Networks and Boolean Satisfiability
Benedikt Bünz
Department of Computer Science, Stanford University, Stanford, CA 94305
[email protected]

Matthew Lamm∗
Department of Linguistics, Stanford University, Stanford, CA 94305
[email protected]

∗ The authors contributed equally to this article.
Abstract
In this paper we explore whether or not deep neural architectures can learn to classify Boolean satisfiability (SAT). We devote considerable time to discussing the theoretical properties of SAT. Then, we define a graph representation for Boolean formulas in conjunctive normal form, and train neural classifiers over general graph structures, called Graph Neural Networks (GNNs), to recognize features of satisfiability. To the best of our knowledge this has never been tried before. Our preliminary findings are potentially profound: in a weakly-supervised setting, that is, without problem-specific feature engineering, Graph Neural Networks can learn features of satisfiability.
1 Introduction

The Boolean satisfiability problem, or SAT, asks whether there exists a satisfying assignment to the variables of a propositional formula. If such an assignment exists, we say the problem is SAT. If it does not, we say it is UNSAT. It is assumed without loss of generality that formulas are given in conjunctive normal form, or CNF. A formula φ in CNF is written as a conjunction of M clauses ω_i, each of which is a disjunction of literals, x_j or ¬x_j. For example, (x_1 ∨ ¬x_2 ∨ x_3) ∧ (x_1 ∨ x_2) ∧ (¬x_3 ∨ x_2) is a CNF formula.

It has been shown that SAT is NP-complete (Cook, 1971); this implies that even the hardest problems in NP can be expressed as a SAT problem. On the other hand, many SAT problems turn out to be easy in practice. Modern SAT solvers exist that can solve extremely large instances of SAT in a matter of milliseconds (Selman et al., 1995). This disparity motivates the search for properties that make SAT instances difficult.

SAT is a self-reducible problem: given an oracle determining whether a problem instance is satisfiable or not, one can find a satisfying assignment in time linear in the number of variables. This has motivated recent work by Devlin and O'Sullivan (2008) examining the performance of a host of machine-learning classifiers for satisfiability. In that paper, the authors employ a variety of manually-designed features, many of which encode graph-like properties of CNF formulas, such as the average number of unique clauses a variable appears in.

Elsewhere, neural networks have shown great promise in reasoning about some subclasses of graphs, such as the tree-like structures of natural language syntax (Socher et al., 2012). Despite a degree of opacity as to what is in fact being learned by these models, they are remarkable in their ability to obtain state-of-the-art performance in a problem space that is traditionally thought to require a great deal of expert knowledge and feature engineering.
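Returning to self-reducibility: the oracle-based reduction described above can be sketched directly. The encoding and helper names below are our own; `sat_oracle` is a stand-in for any decision procedure (here a brute-force check, purely for illustration).

```python
from itertools import product

# A CNF formula is a list of clauses; a clause is a list of non-zero
# integers, where j means x_j and -j means "not x_j" (DIMACS-style).

def sat_oracle(clauses, n):
    """Stand-in SAT oracle: brute-force over all 2^n assignments."""
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def find_assignment(clauses, n):
    """Self-reduction: recover a satisfying assignment with one extra
    oracle call per variable (linear in n)."""
    if not sat_oracle(clauses, n):
        return None  # UNSAT
    assignment = []
    fixed = []
    for j in range(1, n + 1):
        # Try fixing x_j = True by appending the unit clause [j].
        if sat_oracle(clauses + fixed + [[j]], n):
            assignment.append(True)
            fixed.append([j])
        else:
            assignment.append(False)
            fixed.append([-j])
    return assignment

# (x1 v ~x2) & (x2 v x3) & (~x1 v ~x3)
phi = [[1, -2], [2, 3], [-1, -3]]
print(find_assignment(phi, 3))  # → [True, True, False]
```

Replacing the brute-force oracle with a modern solver gives the same reduction at scale.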
While the aptness of neural networks for studying Boolean semantics does not fall out of these findings, we maintain that there is an important connection between natural language meaning on the one hand, and formal-logical meaning on the other. At first pass, we have found that the imposition of tree-like structure onto CNF formulas, in a sense forcing upon them an ad-hoc natural-language syntax, is an unsound approach. Building on these results, and the aforementioned finding that manually-designed graph features can help other kinds of satisfiability classifiers, we define a representation of CNF formulas as graphs. We then train Graph Neural Networks, or GNNs, to classify problem instances as satisfiable or unsatisfiable (Scarselli et al., 2009b).

Our results are intriguing. GNNs are capable of classifying satisfiability. We are still in the process of exploring how formal problem difficulty affects graph representations of CNF formulas, and how exactly this relates to the accuracy of GNN classifiers on these graphs.

In section 2 we briefly review both related work and preliminary experiments. In section 3 we discuss some theoretical properties of Boolean satisfiability, emphasizing implications for training learners on weakly-supervised graphs to discover properties of satisfiability. In section 4 we review GNNs. In section 5 we describe our graph representation of CNF formulas. In section 6 we present experimental results, and conclude in section 7.
2 Related Work

The idea to use neural architectures to solve combinatorial optimization problems initially gained traction during what might be termed the "first wave" of neural network research. Hopfield and Tank (1985) famously developed a neural architecture that could solve some instances of the traveling salesman problem. In another instance, Johnson (1989) used a neural architecture to encode 3-SAT.

The general scheme of these earlier approaches was to define a circuit over which an objective function attains globally optimum values only at satisfying assignments in the Boolean polytope. While these architectures allow for the reinterpretation of analog problems in digital form, in sufficiently complex cases the objective is riddled with local optima that make gradient-based optimization difficult. Moreover, they do not learn from data in any way, and in this sense do not exploit the power of neural networks suggested by more recent research. See Appendix A for results on our own implementation of a Johnson-style network.
Neural network classifiers have recently achieved high performance on natural language semantics tasks such as sentiment analysis (Socher et al., 2012). Propositional logic is a different kind of language, but one with a semantics no less: the denotation of a Boolean expression is typically thought to be its truth value given an assignment to its variables. Also like natural language, this denotation is the result of a compositional operation on the truth values of the variables the formula contains.

Recursive Neural Networks have been designed to leverage the tree-like syntactic structures of natural language to capture complex aspects of meaning composition. This is an important point at which the latent structure of a CNF formula diverges from that of natural language: logical conjunctions and disjunctions are fully commutative, and there is no natural interpretation of CNF syntax as being tree-like.

There is a more philosophical distinction to be made about semantic classification here. In natural language there are multiple kinds of meaning, of which sentiment is but one example. The difficulty of classifying satisfiability arises in part from the fact that it is not a question of a particular kind of meaning, but rather of a Boolean formula's capacity to mean. The task for a learner is not akin to classifying sentiment, or even to finding a valid interpretation. The learner is charged with determining whether or not some valid interpretation, of indeterminate form, exists at all. Satisfiability classification is in this sense a higher-order problem.

In Appendix B we demonstrate our finding that Recursive Neural Networks over tree-like structures are ineffective at learning about satisfiability.
3 Theoretical Properties of Boolean Satisfiability

The difficulty of SAT problems exhibits a hard phase-shift phenomenon that has persistently puzzled theorists of computability. In k-SAT problems, where the number of atoms per CNF clause is fixed to be exactly k, one observes a drastic increase in the percentage of unsatisfiable problems after a very specific clause-to-literal ratio. In 3-SAT, for example, the phase shift occurs at a control ratio α_cr ≈ 4.3 (Saitta et al., 2011; Haim and Walsh, 2009).

It is further understood that for problems that fall far enough to the left of the phase shift, the relative Hamming distance among solutions is low. There is another threshold before α_cr, after which point the solution space is progressively broken up into exponentially many clusters. At the phase shift, then, solutions to formulas that are satisfiable are scattered sparingly in an exponentially large Boolean hypercube, and are difficult to find. After α_cr, an altogether different phenomenon emerges. For the rare satisfiable formulas that exist after the phase shift, problems are discovered to have a backbone. The backbone is a set of variables, each of which takes on the same value for every satisfying assignment to a given formula (Saitta et al., 2011).

The phase shift and related phenomena likely have important implications for a learner of features of satisfiability and unsatisfiability. In an unsatisfiable instance some unresolvable conflict arises out of the complex interplay of multiple Boolean variables distributed across multiple conjoined clauses, resulting in a logical contradiction. Given this fact, it already seems a daunting task to develop a neural architecture capable of discovering these features on its own given a set of problem-label pairs.

There are other unanswered questions. Do unsatisfiable formulas occurring after the phase shift, where satisfiable problems have backbones, look very different from the unsatisfiable problems to the left of the phase shift?
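The notion of a backbone introduced above can be made concrete for small instances. The following brute-force sketch (the clause encoding and helper names are our own) enumerates all satisfying assignments and intersects them.

```python
from itertools import product

def solutions(clauses, n):
    """All satisfying assignments of a DIMACS-style clause list
    (j means x_j, -j means "not x_j"), by brute force."""
    return [bits for bits in product([False, True], repeat=n)
            if all(any(bits[abs(l) - 1] == (l > 0) for l in c)
                   for c in clauses)]

def backbone(clauses, n):
    """Variables (1-indexed) that take the same value in every
    satisfying assignment, together with that forced value."""
    sols = solutions(clauses, n)
    if not sols:
        return {}  # UNSAT: the backbone is undefined
    return {j + 1: sols[0][j] for j in range(n)
            if all(s[j] == sols[0][j] for s in sols)}

# x1 is forced true, x2 is free: (x1) & (x1 v x2)
print(backbone([[1], [1, 2]], 2))  # → {1: True}
```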
Are there different kinds of unsatisfiability, each better highlighted by different representations of the relations that inhere among variables and clauses in CNF formulas? These are questions that loom large as we design experiments and assess results.

4 Graph Neural Networks

Graph Neural Networks, or GNNs, denote a class of neural networks that implement functions of the form τ(G, n) ∈ R^m, which map a graph G and one of its nodes n into an m-dimensional Euclidean space. Scarselli et al. (2009a) show that GNNs approximate any function on graphs that satisfies preservation of unfolding equivalence. Informally, this means that GNNs fail to produce distinct outputs only when input graphs exhibit certain highly particular symmetries. By implication, GNNs are capable of counting node degree, second-order node degree, and detecting cliques of a given size in a set of graphs (Scarselli et al., 2009b).

Given a graph G = (N, E), where N is a set of nodes and E is the set of edges between them, a GNN employs two functions: h_w is a parametric function describing the relationships among nodes in a graph, and g_w models the connection between output labels and the relationships described by h_w. More specifically, the state x_n of node n ∈ G is computed as

x_n = Σ_{u ∈ ne[n]} h_w(l_n, l_{n,u}, x_u, l_u)

where l_n is the label of node n, l_{n,u} is the label of the edge between node n and its neighbor u, and so forth for u. Note that the transformation defines a system of equations over all nodes, where the state x_n of a node is a function of the states x_u of its neighbors. It is precisely this functional form that allows GNNs to extend beyond the capacity of FNNs and RNNs, because it can be defined for undirected and cyclic graphs.
In line with Banach's fixed-point theorem, it is assumed that h_w is a contraction map with respect to node state, so that this system of equations has a stable state.

In Linear GNNs, transition functions are modeled as in graph autocorrelative models:

h_w(l_n, l_{n,u}, x_u, l_u) = A_{n,u} x_u + b_n

In Non-Linear GNNs, transition functions are instead modeled as multi-layered feed-forward neural networks (FNNs). In this instance, additional penalty terms are required to ensure that the resulting function is still a contraction map.

In graph-level classification, as opposed to node-level classification, the learning set is given by

L = { (G_i, n_{i,j}, t_{i,j}) | G_i = (N_i, E_i) ∈ G, n_{i,j} ∈ N_i, t_{i,j} ∈ R^m, 1 ≤ i ≤ p, q_i = j = 1 }

where n_{i,j} is a special output node containing classification or regression targets. For node-level applications, q_i ≥ 1. Learning proceeds with a gradient descent method, and gradients are computed using backpropagation for graph networks. For details, see Scarselli et al. (2009b).

Figure 1: Factor-graph representation of a CNF formula.
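The fixed-point computation of node states can be sketched for the linear case. The tiny graph, state dimension, and the small scaling of the transition matrices (which makes the map a contraction) are our own illustrative assumptions, not the trained model of Scarselli et al.

```python
import random

def linear_gnn_states(neighbors, A, b, dim, iters=100):
    """Iterate x_n = sum_{u in ne[n]} A[(n,u)] @ x_u + b[n] toward a
    fixed point, as in a linear GNN. A maps edges to dim x dim matrices
    (kept small so the update is a contraction); b maps nodes to vectors."""
    x = {n: [0.0] * dim for n in neighbors}
    for _ in range(iters):
        new_x = {}
        for n, ne in neighbors.items():
            s = list(b[n])
            for u in ne:
                M = A[(n, u)]
                for i in range(dim):
                    s[i] += sum(M[i][j] * x[u][j] for j in range(dim))
            new_x[n] = s
        x = new_x
    return x

random.seed(0)
# Undirected triangle graph 0-1-2.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
dim = 2
# Entries in [-0.2, 0.2]: with two neighbors the update shrinks state
# differences by a factor of at most 0.8 per sweep, so it converges.
A = {(n, u): [[random.uniform(-0.2, 0.2) for _ in range(dim)]
              for _ in range(dim)]
     for n in neighbors for u in neighbors[n]}
b = {n: [random.uniform(-1, 1) for _ in range(dim)] for n in neighbors}
x = linear_gnn_states(neighbors, A, b, dim)
```

Running one more sweep leaves the states essentially unchanged, which is the stable state the contraction assumption guarantees.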
5 Graph Representations of CNF Formulas

As previously discussed, the satisfiability of a Boolean formula is a matter of whether or not the formula encodes some (potentially very complex) conflict among the variables over which it is defined. If a weakly-supervised algorithm is to learn to recognize these patterns of conflict, it must be provided at the very least with relational information among the variables in a formula: for example, whether a variable is negated in a given clause, or whether or not two variables appear in a clause together.

That inter-variable and inter-clausal dependencies define a graph is borne out by the findings of Devlin and O'Sullivan (2008), for whom manually-designed graph-like features were found to help classification of satisfiability. This view is additionally motivated by negative results presented in Appendix B, to the effect that neural networks for natural language sentential semantics fail to classify satisfiability. We suspect this is in part due to the fact that those models are designed to operate over structures in which linear ordering is important. As mentioned above, CNFs are commutative, and so dependencies among clauses are not a function of distance in any way.

There are at least two obvious representations of Boolean formulas as graphs: the familiar clause-variable factor graph (Figure 1) and the variable-variable graph. The former is an undirected bipartite graph that reserves one node type for clauses and another for variables. In one instance of this setup, the edge label between a variable node and a clause denotes whether that variable is negated or not in the clause. In the latter, nodes in the graph correspond exactly with the variables in a formula, and two nodes are connected by an undirected edge if they appear in at least one clause together.
To our knowledge there has not been any work that has studied, either directly or indirectly, the effect of the phase-shift control parameters on CNFs as graphs. In one likely scenario, a weakly-supervised learner like a Graph Neural Network will be more effective at recognizing features of satisfiability in graphs describing CNF formulas that are distant from the phase shift than for CNF formulas that are close to it. This effect is described by Devlin and O'Sullivan (2008): optimized search-solvers must perform more backtracking steps the closer they get to α_cr. In another, more general scenario, the capacity to learn from graph representations will differ in some way at points before the phase shift, at the phase shift, and after it.

From an implementation perspective, the variable-variable type is the simpler of the two representations. In GNNs with multiple node types it is sensible to implement type-specific transition functions (Scarselli et al., 2009b). As we are just beginning to understand the behavior of GNNs, we limit ourselves to a consideration of the variable-variable graph type in our experiments.

A variable-variable GNN graph G contains twice as many nodes as there are variables in the Boolean formula φ with which it corresponds. Binary labels indicate whether a node is a negated literal or not (x_n vs. ¬x_n), and equivalently indexed literals are connected by a special edge. Nodes x_n and x_n', for n ≠ n', are connected by an edge if they appear together in a clause in φ. In our experiment, edges are given Euclidean labels l ∈ R^m, where m is the maximum number of clauses in the problems one chooses to consider. Entry e_i = 1 in an edge label e if the two corresponding variables appear together in clause i, and e_i = 0 otherwise.

Figure 2: Training and validation error vs. epochs, at clause-to-atom ratios of (a) 4.4, (b) 6.6, and (c) 10.
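A minimal construction of this variable-variable graph might look as follows. The DIMACS-style clause encoding and all helper names are our own, and the "polar" tag is simply a stand-in for the special edge between a literal and its negation.

```python
def variable_variable_graph(clauses, n_vars, m_max):
    """Build the variable-variable graph of a CNF formula.

    Nodes 0..2*n_vars-1: node 2*(j-1) is literal x_j, node 2*(j-1)+1 is
    its negation; node_labels[v] = 1 marks a negated literal. Edge labels
    are length-m_max 0/1 vectors whose i-th entry is 1 when the two
    literals co-occur in clause i. Equivalently indexed literal pairs are
    joined by a special edge (tagged "polar" here)."""
    def node(lit):
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    node_labels = [v % 2 for v in range(2 * n_vars)]
    edges = {}
    for i, clause in enumerate(clauses):
        for a in clause:
            for b in clause:
                if node(a) < node(b):
                    key = (node(a), node(b))
                    edges.setdefault(key, [0] * m_max)[i] = 1
    special = {(2 * j, 2 * j + 1): "polar" for j in range(n_vars)}
    return node_labels, edges, special

# (x1 v ~x2) & (x1 v x2)
labels, edges, special = variable_variable_graph([[1, -2], [1, 2]], 2, 2)
```

For the formula above this yields literal nodes x1, ¬x1, x2, ¬x2, an edge between x1 and ¬x2 labeled [1, 0] (they share clause 0), and an edge between x1 and x2 labeled [0, 1].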
6 Experiments

CNF formulas encode a very powerful and theoretically well-studied classifier: the clause-to-atom ratio. In order to determine whether GNNs can learn anything beyond this intrinsic classifier, we generate multiple training sets, each at a fixed clause-to-atom ratio. Additionally, in order to analyze the effect of the phase shift on the learning ability of GNNs, we chose to use uniformly randomly generated 3-SAT instances (RAND-SAT). These instances have a fixed number of clauses and atoms and exactly 3 literals per clause. Atoms appear in clauses with uniform probability and are negated with uniform probability.

We created 3 datasets with clause-to-atom ratios of 4.4, 6.6, and 10. According to the clause-to-atom ratio, the probability that a formula is satisfiable is roughly 90% in the first dataset, 50% in the second, and 10% in the third. To prevent statistical skew from clouding the learning of features of satisfiability, we created balanced datasets that were exactly 50% satisfiable.

GNNs significantly improve over a random baseline for all three datasets. Interestingly, for the ratio closest to the phase shift (where problems are thought to be the hardest) the network converged to a test accuracy of approximately 70%, whereas the best accuracy for the two other problem sets was roughly 65%. The exact numbers are shown in Table 1. Figure 2 shows the training and validation error over time.

Clause/Atoms   Train    Validation   Test
4.4            70.71%   69.50%       69.80%
6.6            65.65%   66.30%       64.40%
10             65.92%   65.70%       67.30%

Table 1: Training, validation, and testing accuracy by clause-to-atom ratio.
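A RAND-SAT generator of the kind described can be sketched as below. The parameter values are illustrative, and the brute-force labeling used for balancing is our own simplification, feasible only at small variable counts.

```python
import random
from itertools import product

def random_3sat(n_vars, n_clauses, rng):
    """Uniform 3-SAT: each clause has exactly 3 distinct atoms, each
    negated with probability 1/2 (DIMACS-style integer literals)."""
    return [[v if rng.random() < 0.5 else -v
             for v in rng.sample(range(1, n_vars + 1), 3)]
            for _ in range(n_clauses)]

def is_sat(clauses, n_vars):
    """Brute-force satisfiability label (only viable for small n_vars)."""
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in c)
                   for c in clauses)
               for bits in product([False, True], repeat=n_vars))

def balanced_dataset(n_vars, ratio, size, rng):
    """Draw RAND-SAT instances at a fixed clause-to-atom ratio and keep
    an exact 50/50 SAT/UNSAT split by rejection sampling."""
    n_clauses = round(ratio * n_vars)
    buckets = {True: [], False: []}
    while min(len(b) for b in buckets.values()) < size // 2:
        inst = random_3sat(n_vars, n_clauses, rng)
        label = is_sat(inst, n_vars)
        if len(buckets[label]) < size // 2:
            buckets[label].append((inst, label))
    data = buckets[True] + buckets[False]
    rng.shuffle(data)
    return data

rng = random.Random(0)
data = balanced_dataset(n_vars=8, ratio=4.4, size=20, rng=rng)
```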
7 Conclusion

In this paper we made nontrivial headway in applying deep learning to Boolean satisfiability. Initially, we sought to analogize Boolean satisfiability with natural language semantic classification, a domain in which neural networks have obtained state-of-the-art results for certain tasks. While our experiments to this effect were backgrounded in favor of later findings, they point to an interesting discovery: it does not seem that Boolean formulas in conjunctive normal form can, in any principled way, be made to "look like" expressions in natural language. This discovery is coextensive with the finding that more general graph-like representations are appropriate for representing the complex interdependencies necessary for classifying satisfiability.

We also apply Graph Neural Networks in a novel way. Our results suggest that graph classification may provide a new way of thinking about the theoretical properties of SAT, and that SAT can be used as a test domain for the expressive properties of neural learners. Fascinatingly, we find that without any explicit feature engineering, Graph Neural Networks trained on a variable-variable representation of CNF formulas can in fact learn features of satisfiability, even for theoretically difficult problem instances.

There are several obvious directions for future research. As mentioned in our results section, we did not have the time to optimize for hyperparameters. For example, what kinds of nonlinearities are most appropriate for nonlinear, FNN transition functions? The variable-variable graph as we define it is just one possible graph representation of CNF formulas. We intend to explore the effect of other representations on the ability of GNNs to recognize satisfiability.
A Circuit Solvers
As mentioned in Section 2, neural architectures can be used to represent combinatorial problems, such as SAT, as global optimization problems.

Consider the following SAT formulation. Let x ∈ {−1, 1}^N be a vector where the j'th element of x represents Boolean variable x_j, and is thus 1 if x_j is true and −1 otherwise. A SAT instance can be described by a matrix W where

W_{i,j} = 1 if x_j ∈ ω_i, −1 if ¬x_j ∈ ω_i, and 0 otherwise.

Further consider the step activation function:

θ(x) = 1 if x ≥ 0, and 0 otherwise.

Let SAT_W : {−1, 1}^N → {0, 1}, defined for a specific SAT instance W, be a function which maps an assignment vector x to 1 if x satisfies the instance and 0 otherwise. Specifically:

SAT_W(x) = θ( Σ_i θ( W_{i,·} · x + |W_{i,·}| · |x| − 0.5 ) − M + 0.5 )

where |·| is taken element-wise. Note that

max_x SAT_W(x) = 1 if W is satisfiable, and 0 otherwise.

That is, the global optima of this function correspond bijectively with satisfying assignments. More specifically, W_{i,·} · x + |W_{i,·}| · |x| − 0.5 is greater than zero if and only if W_{i,j} = x_j for some j, that is, if the disjunctive clause i is satisfied. The objective value of SAT_W is 1 if and only if all M clauses are satisfied.

The idea of the circuit solver is to approximate the step activation function using the sigmoid σ, and additionally to approximate the binary nature of x by replacing x with tanh(x̂), where x̂ ∈ R^N. Concretely, let:

APPROXSAT_W(x̂) := σ( Σ_i σ( W_{i,·} · tanh(x̂) + |W_{i,·}| · |tanh(x̂)| − 0.5 ) − M + 0.5 )

APPROXSAT_W : R^N → (0, 1) retains some of the nice properties of SAT_W while being defined over R^N and having well-defined gradients. Specifically, when interpreting x_j as true if x̂_j > 0 and false if x̂_j < 0, note that W_{i,·} · tanh(x̂) + |W_{i,·}| · |tanh(x̂)| − 0.5 > 0 implies that sgn(W_{i,j}) = sgn(x̂_j) for some j, and thus that the disjunctive clause i is satisfied, where sgn is the signum function returning the sign of a value. The objective value for a clause can only be positive if the clause is in fact satisfied. Additionally, this implies that APPROXSAT_W(x̂) > 0.5 ⇒ SAT_W(sgn(x̂)) = 1.
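The two objectives above translate directly into code. This is a minimal pure-Python sketch with our own helper names and a tiny illustrative instance; it exercises the key property that a positive clause term certifies satisfaction of that clause.

```python
import math

def theta(v):            # step activation
    return 1 if v >= 0 else 0

def sigma(v):            # logistic sigmoid
    return 1.0 / (1.0 + math.exp(-v))

def sat_w(W, x):
    """SAT_W: 1 iff the +-1 assignment x satisfies every clause (row) of W."""
    M = len(W)
    inner = [theta(sum(w * xi for w, xi in zip(row, x))
                   + sum(abs(w) * abs(xi) for w, xi in zip(row, x))
                   - 0.5)
             for row in W]
    return theta(sum(inner) - M + 0.5)

def approx_sat_w(W, xhat):
    """Smooth relaxation: steps become sigmoids, x becomes tanh(xhat)."""
    t = [math.tanh(v) for v in xhat]
    M = len(W)
    inner = [sigma(sum(w * ti for w, ti in zip(row, t))
                   + sum(abs(w) * abs(ti) for w, ti in zip(row, t))
                   - 0.5)
             for row in W]
    return sigma(sum(inner) - M + 0.5)

# (x1 v x2) & (~x1 v x2): N = 2 variables, M = 2 clauses.
W = [[1, 1], [-1, 1]]
print(sat_w(W, [1, 1]))    # → 1
print(sat_w(W, [1, -1]))   # → 0
```

For this instance, approx_sat_w(W, [3.0, 3.0]) exceeds 0.5, and rounding [3.0, 3.0] by sign indeed gives the satisfying assignment [1, 1].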
The real-valued x̂ can then be rounded to a satisfying assignment.

A.1 Experiments
To test the neural SAT encoding we ran a set of experiments. Random SAT problems are well known to be hard only for a very narrow ratio of clauses to variables; for 3-SAT that ratio is believed to be around 4.3 (Saitta et al., 2011; Haim and Walsh, 2009). To test the scalability of our optimization approach, we tested how well we were able to find satisfying assignments, or determine unsatisfiability, for different problem sizes. As stated before, the procedure cannot result in false positives, as an unsatisfying assignment could never be mistaken for a satisfying one. Figure 3 thus plots the number of SATs solved for different problem sizes. We can see that for small problem instances the annealing approach finds satisfying solutions for all instances that were in fact satisfiable. For larger instances the approach begins to fail: the optimization procedure gets stuck in local optima.
Figure 3: Number of SATs solved vs. number of variables. 100 instances per data point.
B Recursive Neural Networks
As described in Section 5, we maintain that CNF formulas are best represented as graphs. Feed-forward neural networks are incapable of operating on graphs, as they are general function approximators for functions defined over Euclidean space.

Recursive Neural Networks (RNNs) are capable of classifying tree structures. As a first experiment, we investigated whether the success of RNNs on natural language processing tasks had any implications for the classification of Boolean formulas. In this setting we interpret disjunctive CNF clauses as words within a "sentence," and represent them using the vector representations of clauses ω_j defined in Appendix A.

Unlike sentences in natural language, Boolean formulas do not have a tree-like syntactic structure. The clauses in a formula are inherently commutable, and their meaning composition is not recursive or scopal. In order to impose some tree structure on Boolean formulas, we hierarchically cluster clause vectors based on their cosine similarity. Each node in the resulting binary tree contains a subset of the clauses of the original formula. The root then "represents" the whole formula.

In keeping with the sentiment classification task of Socher et al. (2012), each node in the tree is labeled according to the satisfiability of its subproblem. Note that if a node is labeled SAT then implicitly so must be the whole subtree below it. An example is shown in Figure 4.

The RNN was not able to learn any meaningful distinction between satisfiable and unsatisfiable formulas. As described in Section 3, satisfiability is largely dependent on the clause-to-variable ratio. As each inner node of the RNN tree contains only a subset of the clauses and the same number of variables, it is highly unlikely for such a subtree to be unsatisfiable. For this reason, even though the full problems were chosen to be balanced between SAT and UNSAT, the cost function of the RNN was highly biased towards SAT.
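The tree construction just described can be sketched as a simple agglomerative procedure over clause vectors. The greedy merge rule, the summed-vector cluster representative, and the nested-tuple tree encoding are our own simplifications.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def cluster_clauses(vectors):
    """Greedy agglomerative clustering of clause vectors by cosine
    similarity. Returns a binary tree of nested tuples of clause
    indices; each node stands for the subset of clauses beneath it."""
    # Each active cluster: (tree of clause indices, summed vector).
    active = [((i,), v) for i, v in enumerate(vectors)]
    while len(active) > 1:
        # Merge the most cosine-similar pair of active clusters.
        i, j = max(((a, b) for a in range(len(active))
                    for b in range(a + 1, len(active))),
                   key=lambda p: cosine(active[p[0]][1], active[p[1]][1]))
        (ti, vi), (tj, vj) = active[i], active[j]
        merged = ((ti, tj), [a + b for a, b in zip(vi, vj)])
        active = [c for k, c in enumerate(active) if k not in (i, j)]
        active.append(merged)
    return active[0][0]  # the root "represents" the whole formula

# Clause vectors as +-1/0 rows (as in Appendix A): the two clauses
# containing positive x1 merge before joining the ~x1 clause.
tree = cluster_clauses([[1, 1, 0], [1, 0, 1], [-1, 0, 0]])
```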
The RNN would thus guess every instance to be SAT and could not improve upon that. We also tested the setting in which only root nodes were labeled. Moreover, we tested other neural architectures with a somewhat tree-like, directed acyclic structure, but none of them showed any success over a random baseline.

Figure 4: Example of a hierarchically clustered clause tree labeled for satisfiability.

References
Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, STOC '71, pages 151–158, New York, NY, USA, 1971. ACM.

David Devlin and Barry O'Sullivan. Satisfiability as a classification problem. In Proc. of the 19th Irish Conf. on Artificial Intelligence and Cognitive Science, 2008.

Shai Haim and Toby Walsh. Restart strategy selection using machine learning techniques. In Oliver Kullmann, editor, Theory and Applications of Satisfiability Testing – SAT 2009, volume 5584 of Lecture Notes in Computer Science, pages 312–325. Springer Berlin Heidelberg, 2009.

J.J. Hopfield and D.W. Tank. Neural computation of decisions in optimization problems. Biological Cybernetics, 52(3):141–152, 1985.

James L. Johnson. A neural network approach to the 3-satisfiability problem. Journal of Parallel and Distributed Computing, 6(2):435–449, 1989.

Lorenza Saitta, Attilio Giordana, and Antoine Cornuéjols. Phase Transitions in Machine Learning. Cambridge University Press, New York, NY, USA, 1st edition, 2011.

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. Computational capabilities of graph neural networks. IEEE Transactions on Neural Networks, 20(1):81–102, 2009.

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.

Bart Selman, Henry Kautz, and Bram Cohen. Local search strategies for satisfiability testing. In DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 521–532, 1995.

Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2012.