Graph Neural Networks and Boolean Satisfiability
Benedikt Bünz
Department of Computer Science, Stanford University, Stanford, CA 94305
[email protected]

Matthew Lamm∗
Department of Linguistics, Stanford University, Stanford, CA 94305
[email protected]

∗ The authors contributed equally to this article.
Abstract
In this paper we explore whether or not deep neural architectures can learn to classify Boolean satisfiability (SAT). We devote considerable time to discussing the theoretical properties of SAT. Then, we define a graph representation for Boolean formulas in conjunctive normal form, and train neural classifiers over general graph structures, called Graph Neural Networks (GNNs), to recognize features of satisfiability. To the best of our knowledge this has never been tried before. Our preliminary findings are potentially profound: in a weakly-supervised setting, that is, without problem-specific feature engineering, Graph Neural Networks can learn features of satisfiability.
1 Introduction

The Boolean satisfiability problem, or SAT, asks whether there exists a satisfying assignment to the variables of a propositional formula. If such an assignment exists, we say the problem is SAT. If it does not, we say it is UNSAT. It is assumed without loss of generality that formulas are given in conjunctive normal form, or CNF. A formula φ in CNF is written as a conjunction of M clauses ω_i, each of which is a disjunction of literals, x_j or ¬x_j. For example, (x_1 ∨ ¬x_2 ∨ x_3) ∧ (x_1 ∨ x_2) ∧ (¬x_3 ∨ x_2) is a CNF formula.

It has been shown that SAT is NP-complete (Cook, 1971); this implies that even the hardest problems in NP can be expressed as a SAT problem. On the other hand, many SAT problems turn out to be easy in practice. Modern SAT solvers exist that can solve extremely large instances of SAT in a matter of milliseconds (Selman et al., 1995). This disparity motivates the search for properties that make SAT instances difficult.

SAT is a self-reducible problem: given an oracle determining whether a problem instance is satisfiable or not, one can find a satisfying assignment in time linear in the number of variables. This has motivated recent work by Devlin and O'Sullivan (2008) examining the performance of a host of machine-learning classifiers for satisfiability. In that paper, the authors employ a variety of manually-designed features, many of which encode graph-like properties of CNF formulas, such as the average number of unique clauses a variable appears in.

Elsewhere, neural networks have shown great promise in reasoning about some subclasses of graphs, such as the tree-like structures of natural language syntax (Socher et al., 2012). Despite a degree of opacity as to what is in fact being learned by these models, they are remarkable in their ability to obtain state-of-the-art performance in a problem space that is traditionally thought to require a great deal of expert knowledge and feature engineering.
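Returning to self-reducibility: the oracle-based reduction described above can be sketched directly. The encoding and helper names below are our own; `sat_oracle` is a stand-in for any decision procedure (here a brute-force check, purely for illustration).

```python
from itertools import product

# A CNF formula is a list of clauses; a clause is a list of non-zero
# integers, where j means x_j and -j means "not x_j" (DIMACS-style).

def sat_oracle(clauses, n):
    """Stand-in SAT oracle: brute-force over all 2^n assignments."""
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def find_assignment(clauses, n):
    """Self-reduction: recover a satisfying assignment with one extra
    oracle call per variable (linear in n)."""
    if not sat_oracle(clauses, n):
        return None  # UNSAT
    assignment = []
    fixed = []
    for j in range(1, n + 1):
        # Try fixing x_j = True by appending the unit clause [j].
        if sat_oracle(clauses + fixed + [[j]], n):
            assignment.append(True)
            fixed.append([j])
        else:
            assignment.append(False)
            fixed.append([-j])
    return assignment

# (x1 v ~x2) & (x2 v x3) & (~x1 v ~x3)
phi = [[1, -2], [2, 3], [-1, -3]]
print(find_assignment(phi, 3))  # → [True, True, False]
```

Replacing the brute-force oracle with a modern solver gives the same reduction at scale.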
While the aptness of neural networks for studying Boolean semantics does not fall out of these findings, we maintain that there is an important connection between natural language meaning on the one hand, and formal-logical meaning on the other. At first pass, we have found that the imposition of tree-like structure onto CNF formulas, in a sense forcing upon them an ad-hoc natural-language syntax, is an unsound approach. Building on these results, and the aforementioned finding that manually-designed graph features can help other kinds of satisfiability classifiers, we define a representation of CNF formulas as graphs. We then train Graph Neural Networks, or GNNs, to classify problem instances as satisfiable or unsatisfiable (Scarselli et al., 2009b).

Our results are intriguing. GNNs are capable of classifying satisfiability. We are still in the process of exploring how formal problem difficulty affects graph representations of CNF formulas, and how exactly this relates to the accuracy of GNN classifiers on these graphs.

In section 2 we briefly review both related work and preliminary experiments. In section 3 we discuss some theoretical properties of Boolean satisfiability, emphasizing implications for training learners on weakly-supervised graphs to discover properties of satisfiability. In section 4 we review GNNs. In section 5 we describe our graph representation of CNF formulas. In section 6 we present experimental results, and conclude in section 7.
2 Related Work

The idea to use neural architectures to solve combinatorial optimization problems initially gained traction during what might be termed the "first wave" of neural network research. Hopfield and Tank (1985) famously developed a neural architecture that could solve some instances of the traveling salesman problem. In another instance, Johnson (1989) used a neural architecture to encode 3-SAT.

The general scheme of these earlier approaches was to define a circuit over which an objective function attains globally optimum values only at satisfying assignments in the Boolean polytope. While these architectures allow for the reinterpretation of analog problems in digital form, in sufficiently complex cases the objective is riddled with local optima that make gradient-based optimization difficult. Moreover, they do not learn from data in any way, and in this sense do not exploit the power of neural networks suggested by more recent research. See Appendix A for results on our own implementation of a Johnson-style network.
Neural network classifiers have recently achieved high performance on natural language semantics tasks such as sentiment analysis (Socher et al., 2012). Propositional logic is a different kind of language, but one with a semantics no less: the denotation of a Boolean expression is typically thought to be its truth value given an assignment to its variables. Also like natural language, this denotation is the result of a compositional operation on the truth values of the variables the formula contains.

Recursive Neural Networks have been designed to leverage the tree-like syntactic structures of natural language to capture complex aspects of meaning composition. This is an important point at which the latent structure of a CNF formula diverges from that of natural language: logical conjunctions and disjunctions are fully commutative, and there is no natural interpretation of CNF syntax as being tree-like.

There is a more philosophical distinction to be made about semantic classification here. In natural language there are multiple kinds of meaning, of which sentiment is but one example. The difficulty of classifying satisfiability arises in part from the fact that it is not a question of a particular kind of meaning, but rather of a Boolean formula's capacity to mean. The task for a learner is not akin to classifying sentiment, or even to finding a valid interpretation. The learner is charged with determining whether or not some valid interpretation, of indeterminate form, exists at all. Satisfiability classification is in this sense a higher-order problem.

In Appendix B we demonstrate our finding that Recursive Neural Networks over tree-like structures are ineffective at learning about satisfiability.
3 Theoretical Properties of Boolean Satisfiability

The difficulty of SAT problems exhibits a hard phase-shift phenomenon that has persistently puzzled theorists of computability. In k-SAT problems, where the number of atoms per CNF clause is fixed to be exactly k, one observes a drastic increase in the percentage of unsatisfiable problems after a very specific clause-to-literal ratio. In 3-SAT, for example, the phase shift occurs at a control ratio α_cr ≈ 4.3 (Saitta et al., 2011; Haim and Walsh, 2009).

It is further understood that for problems that fall far enough to the left of the phase shift, the relative Hamming distance among solutions is low. There is another threshold before α_cr, after which point the solution space is progressively broken up into exponentially many clusters. At the phase shift, then, solutions to formulas that are satisfiable are scattered sparingly in an exponentially large Boolean hypercube, and are difficult to find. After α_cr, an altogether different phenomenon emerges. For the rare satisfiable formulas that exist after the phase shift, problems are discovered to have a backbone. The backbone is a set of variables, each of which takes on the same value for every satisfying assignment to a given formula (Saitta et al., 2011).

The phase shift and related phenomena likely have important implications for a learner of features of satisfiability and unsatisfiability. In an unsatisfiable instance some unresolvable conflict arises out of the complex interplay of multiple Boolean variables distributed across multiple conjoined clauses, resulting in a logical contradiction. Given this fact, it already seems a daunting task to develop a neural architecture capable of discovering these features on its own given a set of problem-label pairs.

There are other unanswered questions. Do unsatisfiable formulas occurring after the phase shift, where satisfiable problems have backbones, look very different from the unsatisfiable problems to the left of the phase shift?
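The notion of a backbone introduced above can be made concrete for small instances. The following brute-force sketch (the clause encoding and helper names are our own) enumerates all satisfying assignments and intersects them.

```python
from itertools import product

def solutions(clauses, n):
    """All satisfying assignments of a DIMACS-style clause list
    (j means x_j, -j means "not x_j"), by brute force."""
    return [bits for bits in product([False, True], repeat=n)
            if all(any(bits[abs(l) - 1] == (l > 0) for l in c)
                   for c in clauses)]

def backbone(clauses, n):
    """Variables (1-indexed) that take the same value in every
    satisfying assignment, together with that forced value."""
    sols = solutions(clauses, n)
    if not sols:
        return {}  # UNSAT: the backbone is undefined
    return {j + 1: sols[0][j] for j in range(n)
            if all(s[j] == sols[0][j] for s in sols)}

# x1 is forced true, x2 is free: (x1) & (x1 v x2)
print(backbone([[1], [1, 2]], 2))  # → {1: True}
```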
Are there different kinds of unsatisfiability, each better highlighted by different representations of the relations that inhere among variables and clauses in CNF formulas? These are questions that loom large as we design experiments and assess results.

4 Graph Neural Networks

Graph Neural Networks, or GNNs, denote a class of neural networks that implement functions of the form τ(G, n) ∈ R^m, which map a graph G and one of its nodes n into an m-dimensional Euclidean space. Scarselli et al. (2009a) show that GNNs approximate any function on graphs that satisfies preservation of unfolding equivalence. Informally, this means that GNNs fail to produce distinct outputs only when input graphs exhibit certain highly particular symmetries. By implication, GNNs are capable of counting node degree, second-order node degree, and detecting cliques of a given size in a set of graphs (Scarselli et al., 2009b).

Given a graph G = (N, E), where N is a set of nodes and E is the set of edges between them, a GNN employs two functions: h_w is a parametric function describing the relationships among nodes in a graph, and g_w models the connection between output labels and the relationships described by h_w. More specifically, the state x_n of node n ∈ G is computed as

x_n = Σ_{u ∈ ne[n]} h_w(l_n, l_{n,u}, x_u, l_u)

where l_n is the label of node n, l_{n,u} is the label of the edge between node n and its neighbor u, and so forth for u. Note that the transformation defines a system of equations over all nodes, where the state x_n of a node is a function of the states x_u of its neighbors. It is precisely this functional form that allows GNNs to extend beyond the capacity of FNNs and RNNs, because it can be defined for undirected and cyclic graphs.
In line with Banach's fixed-point theorem, it is assumed that h_w is a contraction map with respect to node state, so that this system of equations has a stable state.

In Linear GNNs, transition functions are modeled as in graph autocorrelative models:

h_w(l_n, l_{n,u}, x_u, l_u) = A_{n,u} x_u + b_n

In Non-Linear GNNs, transition functions are instead modeled as multi-layered feed-forward neural networks (FNNs). In this instance, additional penalty terms are required to ensure that the resulting function is still a contraction map.

In graph-level classification, as opposed to node-level classification, the learning set is given by

L = { (G_i, n_{i,j}, t_{i,j}) | G_i = (N_i, E_i) ∈ G, n_{i,j} ∈ N_i, t_{i,j} ∈ R^m, 1 ≤ i ≤ p, q_i = j = 1 }

where n_{i,j} is a special output node containing classification or regression targets. For node-level applications, q_i ≥ 1. Learning proceeds with a gradient descent method, and gradients are computed using backpropagation for graph networks. For details, see Scarselli et al. (2009b).

Figure 1: Factor-graph representation of a CNF formula.
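The fixed-point computation of node states can be sketched for the linear case. The tiny graph, state dimension, and the small scaling of the transition matrices (which makes the map a contraction) are our own illustrative assumptions, not the trained model of Scarselli et al.

```python
import random

def linear_gnn_states(neighbors, A, b, dim, iters=100):
    """Iterate x_n = sum_{u in ne[n]} A[(n,u)] @ x_u + b[n] toward a
    fixed point, as in a linear GNN. A maps edges to dim x dim matrices
    (kept small so the update is a contraction); b maps nodes to vectors."""
    x = {n: [0.0] * dim for n in neighbors}
    for _ in range(iters):
        new_x = {}
        for n, ne in neighbors.items():
            s = list(b[n])
            for u in ne:
                M = A[(n, u)]
                for i in range(dim):
                    s[i] += sum(M[i][j] * x[u][j] for j in range(dim))
            new_x[n] = s
        x = new_x
    return x

random.seed(0)
# Undirected triangle graph 0-1-2.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
dim = 2
# Entries in [-0.2, 0.2]: with two neighbors the update shrinks state
# differences by a factor of at most 0.8 per sweep, so it converges.
A = {(n, u): [[random.uniform(-0.2, 0.2) for _ in range(dim)]
              for _ in range(dim)]
     for n in neighbors for u in neighbors[n]}
b = {n: [random.uniform(-1, 1) for _ in range(dim)] for n in neighbors}
x = linear_gnn_states(neighbors, A, b, dim)
```

Running one more sweep leaves the states essentially unchanged, which is the stable state the contraction assumption guarantees.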
5 Graph Representations of CNF Formulas

As previously discussed, the satisfiability of a Boolean formula is a matter of whether or not the formula encodes some (potentially very complex) conflict among the variables over which it is defined. If a weakly-supervised algorithm is to learn to recognize these patterns of conflict, it must be provided at the very least with relational information among the variables in a formula: for example, whether a variable is negated in a given clause, or whether or not two variables appear in a clause together.

That inter-variable and inter-clausal dependencies define a graph is borne out by the findings of Devlin and O'Sullivan (2008), for whom manually-designed graph-like features were found to help classification of satisfiability. This view is additionally motivated by negative results presented in Appendix B, to the effect that neural networks for natural language sentential semantics fail to classify satisfiability. We suspect this is in part due to the fact that those models are designed to operate over structures in which linear ordering is important. As mentioned above, CNFs are commutative, and so dependencies among clauses are not a function of distance in any way.

There are at least two obvious representations of Boolean formulas as graphs: the familiar clause-variable factor graph (Figure 1) and the variable-variable graph. The former is an undirected bipartite graph that reserves one node type for clauses and another for variables. In one instance of this setup, the edge label between a variable node and a clause denotes whether that variable is negated or not in the clause. In the latter, nodes in the graph correspond exactly with the variables in a formula, and two nodes are connected by an undirected edge if they appear in at least one clause together.
To our knowledge there has not been any work that has studied, either directly or indirectly, the effect of the phase-shift control parameters on CNFs as graphs. In one likely scenario, a weakly-supervised learner like a Graph Neural Network will be more effective at recognizing features of satisfiability in graphs describing CNF formulas that are distant from the phase shift than for CNF formulas that are close to it. This effect is described by Devlin and O'Sullivan (2008): optimized search-solvers must perform more backtracking steps the closer they get to α_cr. In another, more general scenario, the capacity to learn from graph representations will differ in some way at points before the phase shift, at the phase shift, and after it.

From an implementation perspective, the variable-variable type is the simpler of the two representations. In GNNs with multiple node types it is sensible to implement type-specific transition functions (Scarselli et al., 2009b). As we are just beginning to understand the behavior of GNNs, we limit ourselves to a consideration of the variable-variable graph type in our experiments.

A variable-variable GNN graph G contains twice as many nodes as there are variables in the Boolean formula φ with which it corresponds. Binary labels indicate whether a node is a negated literal or not (x_n vs. ¬x_n), and equivalently indexed literals are connected by a special edge. Nodes x_n and x_n', for n ≠ n', are connected by an edge if they appear together in a clause in φ. In our experiment, edges are given Euclidean labels l ∈ R^m, where m is the maximum number of clauses in the problems one chooses to consider. Entry e_i = 1 in an edge label e if the two corresponding variables appear together in clause i, and e_i = 0 otherwise.

Figure 2: Training and validation error vs. epochs, at clause-to-atom ratios of (a) 4.4, (b) 6.6, and (c) 10.
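A minimal construction of this variable-variable graph might look as follows. The DIMACS-style clause encoding and all helper names are our own, and the "polar" tag is simply a stand-in for the special edge between a literal and its negation.

```python
def variable_variable_graph(clauses, n_vars, m_max):
    """Build the variable-variable graph of a CNF formula.

    Nodes 0..2*n_vars-1: node 2*(j-1) is literal x_j, node 2*(j-1)+1 is
    its negation; node_labels[v] = 1 marks a negated literal. Edge labels
    are length-m_max 0/1 vectors whose i-th entry is 1 when the two
    literals co-occur in clause i. Equivalently indexed literal pairs are
    joined by a special edge (tagged "polar" here)."""
    def node(lit):
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    node_labels = [v % 2 for v in range(2 * n_vars)]
    edges = {}
    for i, clause in enumerate(clauses):
        for a in clause:
            for b in clause:
                if node(a) < node(b):
                    key = (node(a), node(b))
                    edges.setdefault(key, [0] * m_max)[i] = 1
    special = {(2 * j, 2 * j + 1): "polar" for j in range(n_vars)}
    return node_labels, edges, special

# (x1 v ~x2) & (x1 v x2)
labels, edges, special = variable_variable_graph([[1, -2], [1, 2]], 2, 2)
```

For the formula above this yields literal nodes x1, ¬x1, x2, ¬x2, an edge between x1 and ¬x2 labeled [1, 0] (they share clause 0), and an edge between x1 and x2 labeled [0, 1].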
6 Experiments

CNF formulas encode a very powerful and theoretically well-studied classifier: the clause-to-atom ratio. In order to determine whether GNNs can learn anything beyond this intrinsic classifier, we generate multiple training sets, each at a fixed clause-to-atom ratio. Additionally, in order to analyze the effect of the phase shift on the learning ability of GNNs, we chose to use uniformly randomly generated 3-SAT instances (RAND-SAT). These instances have a fixed number of clauses and atoms and exactly 3 literals per clause. Atoms appear in clauses with uniform probability and are negated with uniform probability.

We created 3 datasets with clause-to-atom ratios of 4.4, 6.6, and 10. According to the clause-to-atom ratio, the probability that a formula is satisfiable is roughly 90% in the first dataset, 50% in the second, and 10% in the third. To prevent statistical skew from clouding the learning of features of satisfiability, we created balanced datasets that were exactly 50% satisfiable.

GNNs significantly improve over a random baseline for all three datasets. Interestingly, for the ratio closest to the phase shift (where problems are thought to be the hardest) the network converged to a test accuracy of approximately 70%, whereas the best accuracy for the two other problem sets was roughly 65%. The exact numbers are shown in Table 1. Figure 2 shows the training and validation error over time.

Clause/Atoms   Train    Validation   Test
4.4            70.71%   69.50%       69.80%
6.6            65.65%   66.30%       64.40%
10             65.92%   65.70%       67.30%

Table 1: Training, validation, and testing accuracy by clause-to-atom ratio.
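A RAND-SAT generator of the kind described can be sketched as below. The parameter values are illustrative, and the brute-force labeling used for balancing is our own simplification, feasible only at small variable counts.

```python
import random
from itertools import product

def random_3sat(n_vars, n_clauses, rng):
    """Uniform 3-SAT: each clause has exactly 3 distinct atoms, each
    negated with probability 1/2 (DIMACS-style integer literals)."""
    return [[v if rng.random() < 0.5 else -v
             for v in rng.sample(range(1, n_vars + 1), 3)]
            for _ in range(n_clauses)]

def is_sat(clauses, n_vars):
    """Brute-force satisfiability label (only viable for small n_vars)."""
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in c)
                   for c in clauses)
               for bits in product([False, True], repeat=n_vars))

def balanced_dataset(n_vars, ratio, size, rng):
    """Draw RAND-SAT instances at a fixed clause-to-atom ratio and keep
    an exact 50/50 SAT/UNSAT split by rejection sampling."""
    n_clauses = round(ratio * n_vars)
    buckets = {True: [], False: []}
    while min(len(b) for b in buckets.values()) < size // 2:
        inst = random_3sat(n_vars, n_clauses, rng)
        label = is_sat(inst, n_vars)
        if len(buckets[label]) < size // 2:
            buckets[label].append((inst, label))
    data = buckets[True] + buckets[False]
    rng.shuffle(data)
    return data

rng = random.Random(0)
data = balanced_dataset(n_vars=8, ratio=4.4, size=20, rng=rng)
```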
7 Conclusion

In this paper we made nontrivial headway in applying deep learning to Boolean satisfiability. Initially, we sought to analogize Boolean satisfiability with natural language semantic classification, a domain in which neural networks have obtained state-of-the-art results for certain tasks. While our experiments to this effect were backgrounded in favor of later findings, they point to an interesting discovery: it does not seem that Boolean formulas in conjunctive normal form can, in any principled way, be made to "look like" expressions in natural language. This discovery is coextensive with the finding that more general graph-like representations are appropriate for representing the complex interdependencies necessary for classifying satisfiability.

We also apply Graph Neural Networks in a novel way. Our results suggest that graph classification may provide a new way of thinking about the theoretical properties of SAT, and that SAT can be used as a test domain for the expressive properties of neural learners. Fascinatingly, we find that without any explicit feature engineering, Graph Neural Networks trained on a variable-variable representation of CNF formulas can in fact learn features of satisfiability, even for theoretically difficult problem instances.

There are several obvious directions for future research. As mentioned in our results section, we did not have the time to optimize for hyperparameters. For example, what kinds of nonlinearities are most appropriate for nonlinear, FNN transition functions? The variable-variable graph as we define it is just one possible graph representation of CNF formulas. We intend to explore the effect of other representations on the ability of GNNs to recognize satisfiability.
A Circuit Solvers
As mentioned in Section 2, neural architectures can be used to represent combinatorial problems, such as SAT, as global optimization problems.

Consider the following SAT formulation. Let x ∈ {−1, 1}^N be a vector where the j'th element of x represents Boolean variable x_j, and is thus 1 if x_j is true and −1 otherwise. A SAT instance can be described by a matrix W where

W_{i,j} = 1 if x_j ∈ ω_i, −1 if ¬x_j ∈ ω_i, and 0 otherwise.

Further consider the step activation function:

θ(x) = 1 if x ≥ 0, and 0 otherwise.

Let SAT_W : {−1, 1}^N → {0, 1}, defined for a specific SAT instance W, be a function which maps an assignment vector x to 1 if x satisfies the instance and 0 otherwise. Specifically:

SAT_W(x) = θ( Σ_i θ( W_{i,·} · x + |W_{i,·}| · |x| − 0.5 ) − M + 0.5 )

where |·| is taken element-wise. Note that

max_x SAT_W(x) = 1 if W is satisfiable, and 0 otherwise.

That is, the global optima of this function correspond bijectively with satisfying assignments. More specifically, W_{i,·} · x + |W_{i,·}| · |x| − 0.5 is greater than zero if and only if W_{i,j} = x_j for some j, that is, if the disjunctive clause i is satisfied. The objective value of SAT_W is 1 if and only if all M clauses are satisfied.

The idea of the circuit solver is to approximate the step activation function using the sigmoid σ, and additionally to approximate the binary nature of x by replacing x with tanh(x̂), where x̂ ∈ R^N. Concretely, let:

APPROXSAT_W(x̂) := σ( Σ_i σ( W_{i,·} · tanh(x̂) + |W_{i,·}| · |tanh(x̂)| − 0.5 ) − M + 0.5 )

APPROXSAT_W : R^N → (0, 1) retains some of the nice properties of SAT_W while being defined over R^N and having well-defined gradients. Specifically, when interpreting x_j as true if x̂_j > 0 and false if x̂_j < 0, note that W_{i,·} · tanh(x̂) + |W_{i,·}| · |tanh(x̂)| − 0.5 > 0 implies that sgn(W_{i,j}) = sgn(x̂_j) for some j, and thus that the disjunctive clause i is satisfied, where sgn is the signum function returning the sign of a value. The objective value for a clause can only be positive if the clause is in fact satisfied. Additionally, this implies that APPROXSAT_W(x̂) > 0.5 ⇒ SAT_W(sgn(x̂)) = 1.
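The two objectives above translate directly into code. This is a minimal pure-Python sketch with our own helper names and a tiny illustrative instance; it exercises the key property that a positive clause term certifies satisfaction of that clause.

```python
import math

def theta(v):            # step activation
    return 1 if v >= 0 else 0

def sigma(v):            # logistic sigmoid
    return 1.0 / (1.0 + math.exp(-v))

def sat_w(W, x):
    """SAT_W: 1 iff the +-1 assignment x satisfies every clause (row) of W."""
    M = len(W)
    inner = [theta(sum(w * xi for w, xi in zip(row, x))
                   + sum(abs(w) * abs(xi) for w, xi in zip(row, x))
                   - 0.5)
             for row in W]
    return theta(sum(inner) - M + 0.5)

def approx_sat_w(W, xhat):
    """Smooth relaxation: steps become sigmoids, x becomes tanh(xhat)."""
    t = [math.tanh(v) for v in xhat]
    M = len(W)
    inner = [sigma(sum(w * ti for w, ti in zip(row, t))
                   + sum(abs(w) * abs(ti) for w, ti in zip(row, t))
                   - 0.5)
             for row in W]
    return sigma(sum(inner) - M + 0.5)

# (x1 v x2) & (~x1 v x2): N = 2 variables, M = 2 clauses.
W = [[1, 1], [-1, 1]]
print(sat_w(W, [1, 1]))    # → 1
print(sat_w(W, [1, -1]))   # → 0
```

For this instance, approx_sat_w(W, [3.0, 3.0]) exceeds 0.5, and rounding [3.0, 3.0] by sign indeed gives the satisfying assignment [1, 1].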
The real-valued x̂ can then be rounded to a satisfying assignment.

A.1 Experiments
To test the neural SAT encoding we ran a set of experiments. Random SAT problems are well known to be hard only for a very narrow ratio of clauses to variables; for 3-SAT that ratio is believed to be around 4.3 (Saitta et al., 2011; Haim and Walsh, 2009). To test the scalability of our optimization approach, we tested how well we were able to find satisfying assignments, or determine unsatisfiability, for different problem sizes. As stated before, the procedure cannot result in false positives, as an unsatisfying assignment could never be mistaken for a satisfying one. Figure 3 thus plots the number of SATs solved for different problem sizes. We can see that for small problem instances the annealing approach finds satisfying solutions for all instances that were in fact satisfiable. For larger instances the approach begins to fail: the optimization procedure gets stuck in local optima.
Figure 3: Number of SATs solved vs. number of variables. 100 instances per data point.
B Recursive Neural Networks
As described in Section 5, we maintain that CNF formulas are best represented as graphs. Feed-forward neural networks are incapable of operating on graphs, as they are general function approximators for functions defined over Euclidean space.

Recursive Neural Networks (RNNs) are capable of classifying tree structures. As a first experiment, we investigated whether the success of RNNs on natural language processing tasks had any implications for the classification of Boolean formulas. In this setting we interpret disjunctive CNF clauses as words within a "sentence," and represent them using the vector representations of clauses ω_j defined in Appendix A.

Unlike sentences in natural language, Boolean formulas do not have a tree-like syntactic structure. The clauses in a formula are inherently commutable, and their meaning composition is not recursive or scopal. In order to impose some tree structure on Boolean formulas, we hierarchically cluster clause vectors based on their cosine similarity. Each node in the resulting binary tree contains a subset of the clauses of the original formula. The root then "represents" the whole formula.

In keeping with the sentiment classification task of Socher et al. (2012), each node in the tree is labeled according to the satisfiability of its subproblem. Note that if a node is labeled SAT then implicitly so must be the whole subtree below it. An example is shown in Figure 4.

The RNN was not able to learn any meaningful distinction between satisfiable and unsatisfiable formulas. As described in Section 3, satisfiability is largely dependent on the clause-to-variable ratio. As each inner node of the RNN tree contains only a subset of the clauses and the same number of variables, it is highly unlikely for such a subtree to be unsatisfiable. For this reason, even though the full problems were chosen to be balanced between SAT and UNSAT, the cost function of the RNN was highly biased towards SAT.
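The tree construction just described can be sketched as a simple agglomerative procedure over clause vectors. The greedy merge rule, the summed-vector cluster representative, and the nested-tuple tree encoding are our own simplifications.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def cluster_clauses(vectors):
    """Greedy agglomerative clustering of clause vectors by cosine
    similarity. Returns a binary tree of nested tuples of clause
    indices; each node stands for the subset of clauses beneath it."""
    # Each active cluster: (tree of clause indices, summed vector).
    active = [((i,), v) for i, v in enumerate(vectors)]
    while len(active) > 1:
        # Merge the most cosine-similar pair of active clusters.
        i, j = max(((a, b) for a in range(len(active))
                    for b in range(a + 1, len(active))),
                   key=lambda p: cosine(active[p[0]][1], active[p[1]][1]))
        (ti, vi), (tj, vj) = active[i], active[j]
        merged = ((ti, tj), [a + b for a, b in zip(vi, vj)])
        active = [c for k, c in enumerate(active) if k not in (i, j)]
        active.append(merged)
    return active[0][0]  # the root "represents" the whole formula

# Clause vectors as +-1/0 rows (as in Appendix A): the two clauses
# containing positive x1 merge before joining the ~x1 clause.
tree = cluster_clauses([[1, 1, 0], [1, 0, 1], [-1, 0, 0]])
```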
The RNN would thus guess every instance to be SAT and could not improve upon that. We also tested the setting in which only root nodes were labeled. Moreover, we tested other neural architectures with a somewhat tree-like, directed acyclic structure, but none of them showed any success over a random baseline.

Figure 4: Example of a hierarchically clustered clause tree labeled for satisfiability.

References
Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, STOC '71, pages 151–158, New York, NY, USA, 1971. ACM.

David Devlin and Barry O'Sullivan. Satisfiability as a classification problem. In Proc. of the 19th Irish Conf. on Artificial Intelligence and Cognitive Science, 2008.

Shai Haim and Toby Walsh. Restart strategy selection using machine learning techniques. In Oliver Kullmann, editor, Theory and Applications of Satisfiability Testing – SAT 2009, volume 5584 of Lecture Notes in Computer Science, pages 312–325. Springer Berlin Heidelberg, 2009.

J.J. Hopfield and D.W. Tank. Neural computation of decisions in optimization problems. Biological Cybernetics, 52(3):141–152, 1985.

James L. Johnson. A neural network approach to the 3-satisfiability problem. Journal of Parallel and Distributed Computing, 6(2):435–449, 1989.

Lorenza Saitta, Attilio Giordana, and Antoine Cornuéjols. Phase Transitions in Machine Learning. Cambridge University Press, New York, NY, USA, 1st edition, 2011.

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. Computational capabilities of graph neural networks. IEEE Transactions on Neural Networks, 20(1):81–102, 2009.

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.

Bart Selman, Henry Kautz, and Bram Cohen. Local search strategies for satisfiability testing. In DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 521–532, 1995.

Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2012.