Circuit Satisfiability Problem for circuits of small complexity
MARSEL MATDINOV
Abstract.
The following problem is considered. We are given a Turing machine M that accepts a string of fixed length t as input, runs for a time not exceeding a fixed value n, and is guaranteed to produce a binary output. It is required to find a string X such that M(X) = 1, efficiently in terms of t, n, the size of the alphabet of M, and the number of states of M. The problem is close to the well-known Circuit Satisfiability Problem. The difference from Circuit Satisfiability is that, when reduced to it, we obtain circuits with a rich internal structure (in particular, circuits of small Kolmogorov complexity). We provide a proof system operating with potential proofs of the fact that, for a given machine M, no such string X exists; we prove its completeness and present an algorithm guaranteed to find a proof of the absence of the string X whenever X is actually absent (in the worst case the algorithm is exponential, but in a wide class of interesting cases it runs in polynomial time). We also present an algorithm searching for the string X itself; its efficiency has been neither tested nor proven, and it may require serious improvement in the future, so it can be regarded as an idea. Finally, we discuss first steps towards solving a harder problem similar to this one: given a Turing machine M that accepts two strings X and Y of fixed length and runs for a time not exceeding a fixed value, it is required to build an algorithm N that, for any string X, builds a string Y = N(X) such that M(X, Y) = 1 (details in the introduction).
Contents
1. Introduction.
2. Proof system
2.1. Ordinary
2.2. With suspension and additional construction
3. Links to topology
4. Cocyclic polynomials
4.1. Definition
4.2. Local check for cocyclicity.
4.3. Reduction of a proof system "generalized cage" to a proof system "cocyclic polynomial".
4.4. Completeness
4.5. Efficient search for protected cocyclic polynomials of bounded degree.
5. General algorithm
6. Search for cocyclic polynomials.
6.1. What we are looking for.
6.2. Half-space method.
6.3. Quadratic function optimization method.
7. On the estimated complexity of the algorithm.
8. Big string writing problem.
9. Further work.
9.1. Feature space.
1. Introduction.
We will talk about a problem that is close to the question of equality of the classes P and NP. I assume that many problems of olympiad programming, discrete mathematics, and theoretical computer science can be formulated as special cases of the following problem. A Turing machine M that accepts two strings is given (it will be convenient for us to solve the problem if the lengths of these two strings are fixed and the machine runs for a time that does not exceed a fixed value). It is required to build a Turing machine N that, given a string x, either builds a string N(x) such that M(x, N(x)) = 1, or outputs a special character if there is no string y such that M(x, y) = 1. We will refer to this problem as the "big string writing problem".

By the "small string writing problem" we will mean the following problem. A Turing machine is given that accepts a fixed-length string as input and runs for a time not exceeding a fixed value. It is required to write a string on which this Turing machine returns 1, or to make sure that no such string exists.

The small string writing problem can be reduced to the big one, and the big problem can almost be reduced to the small one. I do not want to discuss these simple reductions now or explain what "almost" means; we will not need it yet.

There is a large, rather poorly defined class of problems similar to these two, which are often found in the mentioned areas. For example, the following problem: a Turing machine (hereinafter, TM) M that accepts three strings is given.
It is required to build a Turing machine N that, given a string x, either builds a string N(x) such that M(x, N(x), y) = 1 for any string y, or outputs a special character if there is no string t such that M(x, t, y) = 1 for every string y (this time the length of the string y is not limited).

To solve one of these problems, it is reasonable to solve all of them at once, because while solving one we may meet another as a subtask. Of course, all these problems are algorithmically unsolvable in the general case. But, as I said, in practice people very often successfully solve problems of this type.

First, it should be said that we are looking for an efficient algorithm. For example, in the case of the small string writing problem, we will not be satisfied by an algorithm that, roughly speaking, iterates through all the strings and tries to substitute them into the given TM. Second, in the case of the big string writing problem, we want to see as an answer not just any algorithm but, again, an efficient one; otherwise it would be possible to construct an algorithm that iterates over all candidates for the role of N(x) and somehow compares them. In other words, in the big string writing problem, our goal is to efficiently search for an efficient algorithm.

Let us talk about the small string writing problem. We assume that the TM's head is initially located in the first cell of the input string. Let us also agree that at the end of the TM's work its head comes back to the same first cell where it was initially, that one of the TM's states at the end of the work encodes the output, and that after the end of the TM's work neither the symbols of the tape, nor the position of the head, nor the state of the TM changes anymore. If the TM given to us does not meet these conditions, it is easy to change it so that it does.
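For concreteness, the naive exhaustive search that we just dismissed as inefficient can be sketched as follows. The Turing machine M is modeled simply as a Python callable returning 0 or 1; the function name and the binary alphabet are illustrative choices, not from the paper.

```python
from itertools import product

def brute_force_small_string(machine, length, alphabet="01"):
    """Exhaustive baseline for the small string writing problem:
    try every string of the given length.  `machine` is any callable
    modeling the TM: it takes a string and returns 0 or 1.  Returns a
    satisfying string, or None if no string makes the machine output 1.
    This is exactly the exponential search the text rejects:
    |alphabet|**length candidates.
    """
    for chars in product(alphabet, repeat=length):
        candidate = "".join(chars)
        if machine(candidate) == 1:
            return candidate
    return None

# Toy machine: outputs 1 iff the input contains exactly two '1's.
m = lambda s: 1 if s.count("1") == 2 else 0
print(brute_force_small_string(m, 4))  # → "0011"
```

The point of the sketch is only to fix the baseline: anything we call "efficient" later has to beat this enumeration.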
The time of its work will not increase much. The computation of the output by such a TM, working for a time not exceeding a fixed bound, can be converted, in a standard and well-known way, into computing the same output from the same string using a boolean circuit. This computation is arranged as follows. Note that since the machine runs for a time not exceeding n, its head moves to a distance of no more than n from its initial position during operation. Consider the rectangle of cells (2n + 1) · n. Let us identify its lower side with the segment of tape on which the TM is working in its initial state; the left and right ends of this segment are at a distance of n from the initial position of the head. Consider a run of our TM on some input string. Let us mark the cell of the rectangle with coordinates (x, y) with the state of the tape cell number x at the moment of time y. The state of a cell is made up of the symbol written in this cell at that moment, a bit indicating the presence of the TM head in this cell at that moment, and the state of the TM at that moment in time. All three objects are encoded with zeros and ones. Note that the label of a cell with coordinates (x, y) is determined by the labels of the cells with coordinates (x − 1, y − 1), (x, y − 1), (x + 1, y − 1) (we work in a computation model with one head; at any moment we look at the symbol in the tape cell where the head is situated and at the state of the TM, and depending on them we choose the next state, change the symbol in the current cell of the tape, and choose whether the head will remain in place, move to the left, or move to the right).
Thus, we can construct a boolean circuit, each gate of which corresponds to a bit of the label of a certain rectangle cell, and the value of this gate is computed naturally from the gates of the labels of the three rectangle cells located in the row one lower: directly below this cell, slightly to the left, and slightly to the right. The string that we feed to the input of our TM determines the input of the circuit, and the value of one of the gates corresponding to the cell on the upper side of the rectangle, above the tape cell on which the first character of the input string was located at the beginning of the TM's operation, is the output that the circuit gives.

In most sections we will focus on the small string writing problem, which, as we have just seen, is equivalent to the problem of selecting input values of a special boolean circuit such that this circuit outputs 1. We will often refer to this formulation with the circuit as the small string writing problem as well.

If we do not impose any restrictions on the boolean circuit for which we want to find an input, we get the Circuit Satisfiability Problem, the well-known NP-complete problem. Since it is believed that P ≠ NP, we will hardly be able to come up with a polynomial algorithm for such a problem. Our case differs in that the circuit has a certain structure: in our case the circuit has small Kolmogorov complexity.

In the case of the big string writing problem, we can similarly "restrict the computation to a rectangle": two strings of fixed length (separated by a separator) will be fed to this rectangular circuit, and our task is to build an algorithm that, based on the first of these strings (to the left of the separator), constructs a suitable second string (to the right) such that the circuit outputs 1 on this pair of strings.

Unfortunately, at the moment I can say little about the class of problems at which the development of my algorithm is aimed.
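The rectangle ("tableau") construction described above can be sketched in code. Instead of expanding each label into individual boolean gates, this sketch computes whole labels (symbol, head bit, state), each determined by the three labels in the row directly below, which is exactly the locality that the circuit construction relies on. The transition-table encoding and all names are assumptions made for illustration.

```python
def local_rule(left, mid, right, delta):
    """Label of cell (x, y) from the three labels at (x-1, y-1),
    (x, y-1), (x+1, y-1).  A label is (symbol, head_here, state);
    state is None wherever the head is absent.  `delta` maps
    (state, symbol) to (new_state, new_symbol, move), move in
    {-1, 0, +1}.  If no transition is defined, the machine has
    halted and the label is frozen, matching the convention in
    the text.  Illustrative sketch only.
    """
    sym, head, st = mid
    if head:
        if (st, sym) not in delta:            # halted: freeze
            return (sym, True, st)
        new_st, new_sym, move = delta[(st, sym)]
        return (new_sym, move == 0, new_st if move == 0 else None)
    for nb, arriving_move in ((left, +1), (right, -1)):
        if nb is not None and nb[1] and (nb[2], nb[0]) in delta:
            new_st, _, move = delta[(nb[2], nb[0])]
            if move == arriving_move:         # head moves into this cell
                return (sym, True, new_st)
    return (sym, False, None)

def tableau(delta, tape, state0, n):
    """Fill the (2n+1) x n rectangle row by row; the head starts
    at offset n, on the first cell of the input string."""
    width = 2 * n + 1
    row = [(tape[x - n] if 0 <= x - n < len(tape) else "_",
            x == n, state0 if x == n else None) for x in range(width)]
    rows = [row]
    for _ in range(n - 1):
        prev = rows[-1]
        rows.append([local_rule(prev[x - 1] if x > 0 else None,
                                prev[x],
                                prev[x + 1] if x + 1 < width else None,
                                delta)
                     for x in range(width)])
    return rows
```

For example, a one-step machine `{("A", "0"): ("H", "1", 0)}` writes a 1, stays put, and halts; in the resulting tableau the cell above the head's starting position carries the label ("1", True, "H").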
The algorithm is stochastic, and it will hardly be possible to outline this class explicitly and strictly. However, if our algorithm does not solve some unnatural, fanciful problem like "given a graph, place distinct prime numbers on its edges so that the sum of the numbers on each cycle of length no more than 7 is represented as n + n + 2 for an integer n", then, I think, it can be forgiven for this. (The given example is an instance of the big string writing problem: the stated properties of the arrangement of numbers on the edges are checked by a polynomial Turing machine.) It would be nice to solve a problem in approximately those cases in which a human copes with it.

We can also consider the following generalization of the big string writing problem. It is required, given a TM M, to construct an algorithm A which, given strings X and Y, constructs a string Z such that M(X, Y, Z) = 1. The difference is that we cannot see the string Y; it is, so to speak, an invisible part of the world. But we are allowed to experiment with this invisible string. That is, we are given another TM N, which takes this invisible string Y together with any string that the algorithm A wants, and gives us an output. The work of the algorithm A therefore consists, among other things, of generating requests to this very machine N and processing the outputs it gives.

A problem similar to the one under consideration is given by Marcus Hutter in the book [1], in the section "The Cybernetic Agent Model". In short, in it the agent, represented by a TM p, interacts with the environment, represented by a TM q. The interaction consists of cycles; in the i-th cycle, based on the interaction history y_1 x_1 ... y_{i−1} x_{i−1}, the agent issues a string y_i, after which the environment, based on the same interaction history supplemented by the string y_i, issues a string x_i.
The string x_i always consists of two parts: the main part o_i and the reward r_i. Our goal is to build an agent p that maximizes the sum of the r_i. The TM q can be probabilistic. The book examines the problem in the aspect of reinforcement learning. However, the big string writing problem can almost be thought of as a special case of this one: namely, let the environment make the first (probabilistic) move, after which the agent makes its move, after which the environment makes its second and final deterministic move containing only a reward, equal to one if the agent's move in combination with the first move of the environment satisfies it, and equal to zero if not. The environment, in this case, must be known to us.

2. Proof system
2.1. Ordinary.
We will talk about the circuits that implement the computation of a Turing machine, described in the previous section. It would be nice to certify the fact that such a circuit outputs 0 on all possible inputs; in other words, to build a construction whose presence indicates that the circuit always outputs 0. This proof system can be naturally generalized to circuits of a somewhat more general form, namely circuits on which there is a naturally defined notion of proximity of gates, so that for each gate one can select a small neighborhood of this gate in which the number of gates is constant.

We assign each gate (including the inputs and the output) an arbitrary value, 0 or 1, except that the gate considered to be the output produced by the circuit is assigned 1. We want to prove that any such assignment of values is inconsistent, that is, there is a gate that is not fulfilled by this assignment. (The expression "a gate is not fulfilled by an assignment" means that the value of this gate does not correspond to the value prescribed to it by the values of the gates on which it depends.)

The proof is an object that we call a "cage". A cage is a collection of small sets (of constant size) of closely spaced gates (from some constant neighborhood), to each of which a function is tied, defined on a constant number of boolean variables, which assigns a rational number to the set of values in these gates. Two statements must be satisfied:

1) The sum of the values of all these functions does not depend on the values that we bind to the gates, and it is greater than zero (recall that the gate considered the output always holds 1; it cannot be changed). This property is quickly checked, and it is enough to check it locally: only those few functions depend on the value of a given gate which depend on several gates from the vicinity of this gate, including the gate itself (that is, functions localized in the vicinity of this gate).
One of the required properties of this collection of functions, which I did not, however, highlight as a separate item, is that only a constant number of functions depend on one or several gates from a constant neighborhood of any particular gate (the neighborhood includes the gate itself). (If there were many such functions for some gate, then those of them that depend on the same gates could simply be summed and replaced by their sum.) To check the invariance, we can simply iterate over the values of all the gates from some sufficiently large but constant neighborhood and check that the sum of the functions depending only on the gates of the
neighborhood does not change when we change the value of the considered gate from zero to one and back.

If we check this for each gate, the invariance is proved, since any assignment of gate values can be transformed into any other by changing the values one by one. To make sure, after that, that the sum of the values of the functions is greater than zero, it is enough to check this for one arbitrary assignment of gate values.

2) If, for a given assignment of gate values, some function of the cage is positive on this assignment, then in some constant neighborhood of the set of gates on which this function depends there exists a gate that is not fulfilled by this assignment. This property is checked by iterating, for each function, through all the sets of values of the gates of the corresponding neighborhood and finding, for each set of values on which the function is positive, a gate in this neighborhood that is not fulfilled. Perhaps this neighborhood should be indicated in the cage itself (in the presented structure itself).

Obviously, from the fulfillment of these two conditions it follows that the circuit never outputs one: if it output one, we could consider the assignment of values to the gates under which all gates are fulfilled; for it, the sum of the function values would not be positive, by property (2), and would be positive by property (1), a contradiction.
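On a toy scale, the local check of property (1), together with the final positivity check, can be carried out by brute force. The representation of the cage's functions as explicit value tables, and all names below, are illustrative assumptions.

```python
from itertools import product

def check_cage(num_gates, funcs, output_gate):
    """Check property (1) of a cage, locally, as in the text.

    `funcs` is a list of (gates, table) pairs: `gates` is a tuple of
    gate indices and `table` maps tuples of 0/1 values on those gates
    to rational weights.  The output gate is always pinned to 1.  For
    every free gate we verify that flipping it does not change the sum
    of the functions that depend on it, for every assignment to the
    other gates those functions touch; then we evaluate the (now
    provably constant) total once and require it to be positive.
    """
    for g in range(num_gates):
        if g == output_gate:
            continue
        touching = [(gs, t) for gs, t in funcs if g in gs]
        others = sorted({x for gs, _ in touching for x in gs} - {g})
        for vals in product((0, 1), repeat=len(others)):
            env = dict(zip(others, vals))
            env[output_gate] = 1          # output is pinned to 1
            sums = []
            for bit in (0, 1):            # flip the gate under test
                env[g] = bit
                sums.append(sum(t[tuple(env[x] for x in gs)]
                                for gs, t in touching))
            if sums[0] != sums[1]:
                return False              # the sum is not invariant
    # The sum is constant; evaluate it on one assignment.
    env = {g: 0 for g in range(num_gates)}
    env[output_gate] = 1
    return sum(t[tuple(env[x] for x in gs)] for gs, t in funcs) > 0
```

For instance, for a circuit whose only gate is the output (a circuit that always outputs 0), the single-bulb cage `[((0,), {(0,): 0, (1,): 1})]` passes: the sum is identically 1 on the pinned assignment and there are no free gates to flip.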
2.2. With suspension and additional construction.
Now let us talk about the "generalized cage". We will consider two improvements over the ordinary cage. The first is that we introduce additional gates and additional restrictions on gates. (There were restrictions that the value of each old gate equals a function of the values of the old gates located directly below it in the circuit; now there are new gates and new restrictions, each of which is a prohibition for a constant number of gates to take a certain set of values.) At the same time, it must hold that if the old circuit (without additional gates) has a fulfilling assignment of gate values, then it can be supplemented with values of the gates added to the circuit so that all new constraints are also satisfied. The simplest way to achieve this is to require that there is an order on the gates in which the old gates occupy positions 1 to n and the added gates occupy positions n + 1 to n + m, where n is the number of old gates and m is the number of gates added; and that m restrictions are added, the i-th of which says that the value of the (n + i)-th gate equals a function of several previous gates, chosen so that the (n + i)-th gate does not conflict with the previous gates (is not contained in a forbidden configuration with previous gates other than the m indicated restrictions). That is, we sort of "build up" the circuit outward: if we have proper values in the old gates, we can simply compute the values in the new gates one by one, according to the new constraints.

In the general case such a construction is not a circuit but rather a set of clauses, so we will call it a generalized circuit. A generalized circuit is also required to have a system of constant neighborhoods; it is needed so that the invariance of the sum of the values of all functions can be checked locally (we will consider systems of functions on the gates of a generalized circuit in exactly the same way).
Formally, this requirement can be stated through the definition of a d-constant generalized circuit. Let us fix some constant number d. A d-constant generalized circuit must satisfy the following conditions:

1. It is a metric space; that is, for each pair of gates a real non-negative distance between them is defined which satisfies the properties of a metric (identity of indiscernibles, symmetry, triangle inequality).
2. Each d-neighborhood of a gate contains only a constant number of gates.
3. If two gates are included in the same clause, then the distance between them is at most 1.
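On a finite toy example, the three conditions can be checked naively. The function below is an illustrative sketch, assuming the distance is given as a Python callable and clauses as tuples of gate indices; `c` is the constant bounding the size of a d-neighborhood.

```python
from itertools import combinations

def is_d_constant(gates, dist, clauses, d, c):
    """Naive check of the three conditions of a d-constant generalized
    circuit on a finite gate set.  Exponentially wasteful; intended
    only as a sanity check on toy examples."""
    # 1. `dist` is a metric: identity of indiscernibles, symmetry,
    #    triangle inequality (non-negativity follows).
    for a in gates:
        if dist(a, a) != 0:
            return False
    for a, b in combinations(gates, 2):
        if dist(a, b) <= 0 or dist(a, b) != dist(b, a):
            return False
    for a in gates:
        for b in gates:
            for e in gates:
                if dist(a, e) > dist(a, b) + dist(b, e):
                    return False
    # 2. Every d-neighborhood contains at most c gates.
    for a in gates:
        if sum(1 for b in gates if dist(a, b) <= d) > c:
            return False
    # 3. Gates sharing a clause are at distance at most 1.
    return all(dist(a, b) <= 1
               for cl in clauses for a, b in combinations(cl, 2))
```

For example, four gates on a line with `dist = abs(a - b)` and clauses joining only adjacent gates satisfy the conditions with d = 1 and c = 3, while a clause joining gates at distance 2 violates condition 3.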
We will work with d-constant circuits, again, in order to be able to check the invariance of the sum of the values of all functions locally. A good example of a d-constant circuit is a rectangular parallelepiped erected above the rectangle of the original circuit, as on a base, each gate of which depends on those gates of its neighborhood that are located slightly below it, or slightly to the left of it, but not above it (we assume that the original circuit is calculated from left to right). For such a class of generalized circuits the choice of d is not essential.

The second improvement is as follows. We will have not one cage but a tree of cages. Each of them is, likewise, a set of functions on the values of the gates, each of which depends on a constant number of gates, and these gates must be located next to each other. The first (though inessential) difference from the functions of the old version of the cage is that each of these functions is equal to one on exactly one set of values of the gates of its domain, and to zero on any other set of values of these gates. That is, a function is simply a light bulb that lights up if and only if the gates on which it depends take a strictly defined set of values (hereinafter we will sometimes call them exactly that, light bulbs). The second difference is that a rational weight is assigned to each such function. Moreover, each cage of the tree (and thus all of its functions) is defined not on the entire set of values of all gates, but only on those assignments where certain fixed values are assigned to some gates; in other words, for each cage some gates are constant.

The first condition on such a cage is that the sum of the weights of the lit light bulbs is invariant on the set of assignments in which the constant gates carry their prescribed values, and is equal to some number greater than zero.

Next, how will our tree be arranged in general?
It is suspended from the top vertex. The top cage has no constant gates. The cages suspended from this cage correspond to the light bulbs of this cage that have positive weight. In each of these cages several constant gates appear, exactly those on which the corresponding light bulb depends, and these gates are assigned exactly the values needed for the corresponding light bulb to light up. The third-level cages suspended from a second-level cage correspond to the positive-weight light bulbs of this second-level cage, and their constant gates are the constant gates of the corresponding second-level cage (with the same values), to which are added the gates on which the corresponding light bulb depends, set to exactly the values needed for this bulb to light up. And so on. The vertices of the tree correspond to the cages, and the edges correspond to the positive-weight light bulbs of these cages. At the same time, each positive-weight light bulb of each cage of the tree can correspond either to a cage one level lower (as just described), or to a specific restriction which is violated by the gate values that make this bulb light up (this time all the gates included in this restriction must lie inside the set of gates of the domain of this light bulb, and not just in its neighborhood). Let us call a light bulb of the second kind (breaking a constraint) terminal.

So how do we find a violated gate, given the gate values and the tree of cages? Look at the top cage. Some of its light bulbs light up, and the sum of their weights is greater than zero. Choose from the lit light bulbs one that has positive weight. It corresponds to a cage one level below (or is a terminal bulb). Choose a positive lit light bulb in that cage. And so on.
At some point we arrive at a lit terminal light bulb, and at a violated gate contained in its domain.

The invariance of the sum of weights of the lit light bulbs of each cage of the tree is checked, again, locally, for each non-constant gate. However, since we work with d-constant generalized circuits, it is worth clarifying that the diameter of the domain of a light bulb should not exceed d/2 (then we will definitely be able to check the invariance locally, for each cage of the tree).
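The descent through the tree of cages just described admits a direct sketch. The dictionary representation of bulbs and cages (the field names `gates`, `pattern`, `weight`, `child`, `violated`) is an assumption made for illustration, not notation from the paper.

```python
def find_violated(tree, values):
    """Walk the tree of cages: at each cage pick a lit bulb of
    positive weight (one must exist, because the lit weights sum to a
    positive invariant), descend into the cage hung on it, and stop at
    a terminal bulb, which names the violated constraint/gate.

    A cage is a list of bulbs; a bulb is a dict with keys 'gates',
    'pattern', 'weight' and either 'child' (a sub-cage) or 'violated'
    (what the terminal bulb certifies).  `values` maps gates to 0/1.
    """
    cage = tree
    while True:
        lit = [b for b in cage
               if b["weight"] > 0
               and all(values[g] == v
                       for g, v in zip(b["gates"], b["pattern"]))]
        assert lit, "cage property (1) guarantees a lit positive bulb"
        bulb = lit[0]
        if "violated" in bulb:
            return bulb["violated"]      # terminal bulb reached
        cage = bulb["child"]             # descend one level
```

Note that the descent is as fast as the tree is shallow: each step only filters the bulbs of one cage against the current gate values.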
It should be said that I have analyzed several (about eight) examples of circuits of varying (mostly not large) complexity that cannot output 1, and for all of them except the last two I managed to come up with a cage: just a cage, not a generalized cage. This gives me reason to assume that the proof system "cage" is quite strong. For the last two examples I also managed to come up with proofs, but they required the addition of new gates to the circuit and a tree of cages of nontrivial depth.

It was nice that among the examples I analyzed there was a circuit encoding the Dirichlet (pigeonhole) principle: the inputs of this circuit form an n · (n + 1) rectangle; this rectangle is adjoined from above by n + 1 subcircuits checking that each column contains at least one 1, as well as n subcircuits checking that each row contains at most one 1. Higher up, the conjunction of the results of all these subcircuits is taken. It is clear that no matter how we put zeros and ones in the inputs, the output of the circuit (the result of the conjunction) will be zero. I managed to build a cage for this circuit, that is, to arrange the bulbs on this circuit so that the sum of the weights of the lit ones is always one and each lit light bulb of positive weight conceals a violated restriction. This pleased me because, in particular, it allows us to prove various estimates and inequalities, in my opinion a rather difficult thing to automate.

I want to believe that for problems arising in practice it is often possible to come up with a generalized circuit (of course, only if the corresponding TM never outputs one) which, moreover, has small complexity and a small number of additional gates.

3. Links to topology
The described proof system arose from topological considerations and has a topological nature. I will not describe this connection with topology in detail, since we will not need it; I will only say briefly what it is about. It can be seen in the example of a simple circuit that comes from the natural decomposition of a large equilateral triangle into small equilateral triangles of the same size (this circuit does not come from the work of a Turing machine). We place a gate of the circuit at each node of this partition. We arrange the triangle so that one of its vertices is at the top and the side opposite to it is horizontal (the input gates of the circuit are located at the nodes of the partition on this side). The value at each node is somehow computed from the two values at the nodes directly below this node (the ones forming a regular triangle with it). We embed this plane, divided into triangles, into three-dimensional space and put a point at each node of the partition. From each such point we draw the normal to the plane of length 1, all in the same direction, and put a point at the end of each such normal. The point at the start of the normal corresponds to the value 0 in the corresponding gate; the point at the end of the normal corresponds to the value 1 in this gate.

Let us call the nodes on the border of the triangle, as well as the points obtained from these nodes by adding the indicated normal, boundary points. Let us embed our three-dimensional space into a seven-dimensional space. We slightly move each marked point so that in the seven-dimensional space they become points of general position. Consider two adjacent nodes of the boundary of the original triangle, as well as the two points that differ from them by the indicated normal, and consider the 4 points obtained from them after the shift. In seven-dimensional space they form a tetrahedron; let us call it a boundary tetrahedron.
Let us call the union of all boundary tetrahedra the boundary of the circuit. Consider an arbitrary cage that associates each small triangle, with binary values fixed at its vertices, with a rational number: the weight of the light bulb corresponding to the given small triangle with the values fixed at its vertices, these being exactly the values needed for the bulb to light up (in this section we will talk about just such cages, cages each of whose bulbs
depends exactly on the values at the vertices of a small triangle). Using the property of invariance of the sum of the weights of the lit bulbs of the cage, I managed to construct an oriented five-dimensional manifold with empty boundary whose intersection index with each triangle, whose vertices are a triple of shifted nodes projected approximately to the vertices of a small equilateral triangle (recall that two shifted nodes, differing from one another by the normal discussed above, are projected to the neighborhood of each vertex of a small equilateral triangle), is exactly equal to the weight of the light bulb corresponding to this triple. Moreover, this manifold does not intersect the boundary of the circuit. In addition, consider an arbitrary choice of zeros and ones at the circuit nodes, and consider the two-dimensional manifold equal to the union of the triangles corresponding to the lit light bulbs (this is a "two-dimensional film" whose boundary lies on the circuit's boundary). The intersection index of our five-dimensional manifold with such a film is greater than zero. (And since our five-dimensional manifold with empty boundary does not intersect the boundary of the circuit, and any film can obviously be continuously deformed into any other such film (corresponding to another choice of zeros and ones) so that the boundary of the deformed film always remains on the boundary of the circuit, it follows that the indicated intersection index is the same for all such films, that is, it does not depend on the choice of zeros and ones.)

The converse is also true.
Given a five-dimensional manifold without boundary that does not intersect the boundary of the circuit and lies in a suitable homology class (that is, intersecting the above-described film of triangles, corresponding to an arbitrary choice of zeros and ones, positively), one can construct a cage, provided that this five-dimensional manifold intersects each triangle corresponding to a light bulb "in the right way". The phrase "in the right way" means the following. We have a condition on the cage that some of its bulbs may have arbitrary weight, and the rest must have non-positive weight. Accordingly, our manifold is allowed to intersect the triangles corresponding to bulbs of the first kind in an arbitrary way, and the triangles corresponding to bulbs of the second kind only non-positively. Let us also agree that our manifold is forbidden to cross the sides of the triangles; this does not limit us too much. A cage is constructed from a manifold in the obvious way: the intersection index of the manifold with the corresponding triangle is taken as the weight of the bulb.

Thus, in the case of this circuit and the considered family of cages (cages operating with "triangular" bulbs), the search for a cage that is a correct proof is reduced to the search for a manifold in a certain space, of the desired homology class (positively intersecting a two-dimensional film "stretched" onto the boundary of the circuit), passing through this space "in the correct way" (some of the triangles may be crossed only "in one direction").

When constructing a manifold from a cage we used a certain structure whose construction required, in particular, that the dimension of the ambient space be at least 7. I almost succeeded in completing a similar construction (a manifold from a cage and vice versa) for the case of an arbitrary circuit coming from a Turing machine.
Now I see no reason to tell in detail what "almost" means; I will only say that the existence of a manifold in the general case, given by my construction, depends on whether a certain manifold always contracts homologically within a certain space. I have not verified this fact yet, but it looks true. If it is true, my construction works; if not, the construction will need to be complicated a little, and then the ambient space (in which we look for a manifold) will not be Euclidean. One way or another, if someone needs it, I am ready to refine the construction for the case of arbitrary circuits coming from a TM.
4. Cocyclic polynomials
4.1. Definition.
We will talk about a generalization of the proof system "cage". A cage, ordinary or "multistorey" (when there is a tree of cages), illustrates this proof system very well. This system can also be generalized, in an obvious way, to the case when we add additional "external" gates to the circuit, just as when we described the "generalized cage" system.

So, consider a circuit coming from the work of a TM. For the gates of this circuit we can naturally define a distance. To do this one can, for example, naturally embed the circuit in 3-dimensional space: each gate of the circuit, representing a certain bit of the state of the tape cell number x at time y, is associated with a point in three-dimensional space near the point (x, y, 0), and the distance between the corresponding points is declared to be the distance between the gates.

Fix a constant k. Our proof system will operate with the same light-bulb functions that were in the "cage" system: each light bulb lights up if the gates it depends on take a specific set of values. The diameter of a light bulb is the maximum distance between the gates on which it depends. We will be interested in light bulbs of diameter at most k; we will call such bulbs small.

Consider an integer d (usually d will be a constant). Let us define a cocyclic polynomial. A cocyclic polynomial of degree d is a polynomial of degree d whose variables are in one-to-one correspondence with the small light bulbs of our circuit and which satisfies the invariance condition.
The invariance condition is that if we choose the values of the gates of the circuit in an arbitrary way and substitute into each variable the value of the corresponding light bulb (0 if the bulb is not lit and 1 if it is lit), then the value produced by the polynomial is independent of the choice of the gate values.

A cocyclic polynomial can be considered a proof of the unsatisfiability of a circuit if it satisfies the security condition. A cocyclic polynomial satisfies the security condition if two conditions are met. First, the value it produces for any choice of gate values (the same for all choices of values) is greater than zero. Second, for each monomial with a positive coefficient, at least one of the bulbs corresponding to its variables "implies a contradiction". The phrase "implies a contradiction" means that among the gates on which the value of this bulb depends there is some gate a together with all the gates on which a depends in the circuit, and, moreover, if these gates take the values needed for the bulb to light up, then they form a contradiction: the value in the gate a does not correspond to the value prescribed to a by the circuit according to the values of the gates on which a depends in the circuit.
(In principle, nothing would prevent us from also allowing positive coefficients for monomials for which it is the union of the domains of the bulbs of the monomial, with the corresponding values ascribed to each of their gates (so that each bulb of the monomial lights up), that implies a contradiction; this is a slightly weaker condition.)

Indeed, if a circuit with a cocyclic polynomial satisfying the security condition (hereinafter referred to as a protected cocyclic polynomial) were satisfiable, we would take a set of gate values satisfying it, choose a monomial of the cocyclic polynomial with a positive coefficient whose value is not zero (it exists, since otherwise the value of the polynomial for this set of gate values would not be positive), choose a light bulb corresponding to one of its variables that implies a contradiction, and extract from it the gate violated by this set of values.

Note that for d = 1 we get exactly the proof system "cage".

By a lamp polynomial of degree d we mean a polynomial of degree d whose variables are in one-to-one correspondence with the small bulbs of our circuit (which, however, does not have to satisfy the invariance condition). A lamp polynomial can be viewed as a function of the choice of the gate values of the circuit, calculated in the same way as described just above (just like in the case of a cocyclic polynomial).
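To make the definitions concrete, here is a minimal brute-force sketch of evaluating a lamp polynomial and checking the invariance condition on a toy instance. The data layout (a bulb encoded as a hypothetical dict mapping gates to their prescribed values, a monomial as a coefficient with bulb indices) is my own illustration, not part of the paper's construction:

```python
from itertools import product

# A bulb lights (value 1) iff every gate it depends on takes its prescribed value.
def bulb_value(bulb, assignment):
    return 1 if all(assignment[g] == v for g, v in bulb.items()) else 0

# A lamp polynomial is a list of (coefficient, [bulb indices]) monomials.
def poly_value(poly, bulbs, assignment):
    total = 0
    for coeff, idxs in poly:
        term = coeff
        for i in idxs:
            term *= bulb_value(bulbs[i], assignment)
        total += term
    return total

# Brute-force invariance check: the value must not depend on the gate values
# (this is the cocyclicity condition, checked naively over all assignments).
def is_cocyclic_bruteforce(poly, bulbs, n_gates):
    values = {poly_value(poly, bulbs, dict(enumerate(bits)))
              for bits in product([0, 1], repeat=n_gates)}
    return len(values) == 1

# Toy example with 2 gates: bulbs "gate 0 is 0" and "gate 0 is 1".
bulbs = [{0: 0}, {0: 1}]
poly = [(1, [0]), (1, [1])]   # x_{g0=0} + x_{g0=1}, always equal to 1
assert is_cocyclic_bruteforce(poly, bulbs, n_gates=2)
```

Of course, this enumeration is exponential in the number of gates; the point of the next subsection is precisely that cocyclicity can be verified without it.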
4.2. Local check for cocyclicity.
The good news is that the cocyclicity of a polynomial can be checked locally. Consider a polynomial f of degree at most d; let us check whether it is cocyclic. Choose a gate u. We check that, regardless of the values of all other gates, the value of f does not change when the value of u changes from zero to one. Consider a neighborhood of u consisting of all gates at distance at most k from it. Since k is a constant, we can iterate over all possible values of all gates in this neighborhood except u. Let us do so. For each set of values of these gates, temporarily fixing it, we will check that for any possible set of values of all other gates, the value of f does not change when u is changed from zero to one.

The key idea is that, for fixed values of the neighborhood gates other than u, the change in the value of f when the value of u changes from zero to one is a lamp polynomial, with light bulbs depending only on all other gates, whose degree is at least one less. Why is this so? Consider an arbitrary monomial of f. It is a product of several bulbs (more precisely, of the variables corresponding to the bulbs), with a certain coefficient. There are two possibilities.

The first possibility is that none of these bulbs "cover" u (more precisely, none of them depend on u). This means that the value of this monomial does not depend on the value of u, so the contribution of this monomial to the considered change in the value of the polynomial is zero.

The second possibility is that u is covered by at least one bulb of the monomial.
Note that in this case the value of the monomial can be nonzero only for u = 0, or only for u = 1 (because the bulb covering u can take a nonzero value in at most one of these two options), and only if the values at which the bulbs of the monomial light up are consistent with the values we fixed in the "punctured" neighborhood of u: if at least one of the gate values needed for one of these bulbs to light up does not coincide with the fixed value of the corresponding gate of the punctured neighborhood of u, then the value of the monomial in both cases is necessarily zero.

Thus, we are interested precisely in the subcase of the second case in which the values at which the bulbs of the monomial light up are consistent with the values we fixed in the punctured neighborhood of u.

Let us call the truncated bulb of a monomial the bulb of the monomial from which all dependencies on the gates of the considered neighborhood of u, including u itself, have been removed. Let me explain. Suppose the bulb lights up when a certain set of gates takes prescribed values. Then the truncated version of this bulb lights up exactly when the prescribed values are taken by those of these gates that are not in the considered neighborhood of u: we simply throw out the binary constraints on the gates of this neighborhood.

It is easy to see that the change in the value of the monomial in our case (the subcase of the second case) equals the product of the values of the truncated bulbs, taken with the same coefficient as the monomial itself, multiplied by 1 or -1, depending on the case in which the original monomial can be nonzero: u = 1 or u = 0.

Note that the truncated version of a bulb covering u is trivial (it does not depend on any gates and is identically equal to one). Thus, the number of nontrivial truncated bulbs for a monomial of the type under consideration of degree l does not exceed l − 1.
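The construction of the difference polynomial can be sketched directly. The encoding here (a bulb as a hypothetical dict mapping gates to prescribed values, a monomial as a pair of a coefficient and a list of bulbs) is my own illustration:

```python
def difference_polynomial(poly, u, nbhd_fixed):
    """Change of the polynomial's value when gate u flips 0 -> 1, for fixed
    values nbhd_fixed on the other gates of u's neighborhood. Returns monomials
    over truncated bulbs (neighborhood dependencies, including u, removed)."""
    out = []
    for coeff, bulb_list in poly:
        covering = [b for b in bulb_list if u in b]
        if not covering:
            continue  # monomial independent of u: contributes nothing
        required_u = {b[u] for b in covering}
        if len(required_u) > 1:
            continue  # conflicting prescriptions on u: monomial is always 0
        # Bulb prescriptions must be consistent with the punctured neighborhood.
        if any(b.get(g) not in (None, v) for b in bulb_list
               for g, v in nbhd_fixed.items()):
            continue
        sign = 1 if required_u == {1} else -1  # nonzero only for u=1, resp. u=0
        truncated = [{g: v for g, v in b.items()
                      if g != u and g not in nbhd_fixed} for b in bulb_list]
        out.append((sign * coeff, [b for b in truncated if b]))  # drop trivial bulbs
    return out

# Flipping gate 0 (with gate 1 fixed to 0) changes 2 * x_{g0=1, g1=0} by +2.
assert difference_polynomial([(2, [{0: 1, 1: 0}])], 0, {1: 0}) == [(2, [])]
```

Since the bulb covering u always becomes trivial and is dropped, each surviving monomial has degree at least one lower, as claimed in the text.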
Summing up the changes in the values (when u changes from zero to one) of all monomials, we obtain the required lamp polynomial, whose bulbs depend only on gates outside the neighborhood, of degree at most d − 1.

The next idea is that, to prove the cocyclicity of a polynomial, it suffices to verify that for each gate u the polynomial of degree at most d − 1 constructed as described above, for each possible set of values of the gates of the neighborhood, is, first, cocyclic, and, second, has value always equal to zero. But we can check the cocyclicity of this polynomial recursively, and it is enough to check the second condition for any one set of values of the gates outside the neighborhood. Let us do so. The recursion depth is thus d. For constant values of k and d, the algorithm runs in polynomial time.

A small aside about the name. The name "cocyclic polynomial" was chosen because of the topological interpretation of the proof system "cage". I wrote that a cage (presumably) can be associated with a manifold in the corresponding space. During the construction of this manifold, a certain CW-complex was constructed, and the manifold simply originated from a cocycle of this CW-complex of the corresponding dimension. That is, a cocyclic polynomial of degree 1 is a cage, and a cage is a cocycle.

4.3. Reduction of the proof system "generalized cell" to the proof system "cocyclic polynomial".
Consider a circuit with a "multistorey cage" (this is a generalized cage in which, however, we do not add additional gates to the original circuit; that is, it is just a tree of cages). To the more general case of a "generalized cage" (with additional gates) our arguments generalize in an obvious way.

Let me remind you that our multistorey cage is a tree, each node of which is a cage, defined, however, on the set of choices of gate values in which the values of some gates are constant.

The first thing we do is "normalize each node". We know that the cage corresponding to each node, whenever the gates that should be constant take the desired values, has a constant sum of lit bulbs equal to some positive number. We divide all the weights of the bulbs in the given cage by this positive number. Now the sum of the lit bulbs is always equal to one. We perform this operation for each node of the tree.

Let us describe all the monomials of the cocyclic polynomial that we want to associate with a given tree of cages. Here is the process of building such a monomial. Choose an arbitrary bulb of the root cage. If its weight is not positive, we stop there. If its weight is positive but this bulb implies a contradiction, we also stop there. If its weight is positive and it does not imply a contradiction, the next-level cage is suspended from it. Choose an arbitrary light bulb of this cage. If its weight is not positive, we stop. If the weight is positive but the bulb is terminal (implies a contradiction), we stop. If the weight is positive and the bulb is not terminal, the next-level cage is suspended from it; choose its arbitrary light bulb, and so on, until we reach a light bulb of non-positive weight or a terminal light bulb. The result is a chain of light bulbs.

We associate each such chain with a monomial equal to the product of the variables corresponding to the light bulbs included in the chain.
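The chain-building process above can be sketched recursively. The node layout here (weight, bulb id, terminal flag, optional child cage) is my own hypothetical encoding, with the weights assumed already normalized:

```python
def monomials(cage, prefix_bulbs=(), prefix_coeff=1.0):
    """Enumerate the bulb chains described in the text; each chain yields a
    monomial whose coefficient is the product of the weights along the chain."""
    for weight, bulb, terminal, child in cage:
        bulbs = prefix_bulbs + (bulb,)
        coeff = prefix_coeff * weight
        if weight <= 0 or terminal or child is None:
            yield coeff, bulbs   # the chain stops at this bulb
        else:
            yield from monomials(child, bulbs, coeff)  # descend into the suspended cage

# Two-level example: root bulb 'a' is terminal, bulb 'b' suspends a child cage.
tree = [(0.5, "a", True, None),
        (0.5, "b", False, [(1.0, "c", True, None)])]
assert sorted(monomials(tree)) == [(0.5, ("a",)), (0.5, ("b", "c"))]
```

In this toy tree the coefficients sum to one, in line with the claim below that the resulting polynomial is identically equal to one.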
We choose the coefficient of this monomial in our polynomial equal to the product of the weights of the bulbs included in this chain (we mean the weight of the bulb in the cage of the corresponding node after normalization).

Let us check the security condition. Note that the sign of the coefficient of the monomial is determined by the sign of the weight of the last bulb in the corresponding chain (the weights of the remaining bulbs in the chain are positive). If the weight of the last light
bulb is positive, then it is terminal, which means that it implies a contradiction. If the weight of the last bulb is not positive, then the coefficient of the monomial is also not positive, which means that there is no need to look for a light bulb implying a contradiction.

Let us check the cocyclicity. We have described the chains of light bulbs of the cages of the tree, from which we obtain the monomials of the polynomial. Now we are interested in "prechains", the initial segments of such chains. By the restriction of the polynomial to a prechain we mean our polynomial in which we have kept only the monomials (with the same coefficients) whose chains begin precisely with this prechain; the remaining monomials are simply discarded.

The cocyclicity of the polynomial follows from the statement that if the gates on which at least one prechain bulb depends are set to the values at which all prechain light bulbs light up, then the restriction of our polynomial to this prechain always takes a value equal to the product of the weights of the light bulbs (as parts of the corresponding cages, the tree nodes) included in the prechain. If at least one of these gates is set to a different value (and, thus, at least one bulb of the prechain does not light up), then the restriction of our polynomial to this prechain always takes the value zero (the latter is obvious, since in this case all monomials of the restriction are equal to zero).

The first (less obvious) part of the statement is proved by induction on the prechain length, from top to bottom. Base. If a prechain ends with a terminal bulb or a bulb of non-positive weight and, thus, is a complete chain, then the statement is obvious for it, since the restriction of the polynomial to it consists of a single monomial with a coefficient exactly equal to the specified product. Transition.
If the prechain allows continuation, then all these continuations (that is, the chains starting with this prechain) are grouped according to the next bulb of the continuation; moreover, each of the obtained groups obviously corresponds to a prechain of length one more; it remains to use the induction assumption for each of these prechains and the invariance (and equality to one) of the sum of the weights of the cage of the tree corresponding to these prechains. The statement is proven.

The fact that our polynomial is identically equal to one follows from applying the statement to the empty prechain (we assume that the product of the empty set of factors is equal to one).
4.4. Completeness.

Here I show the existence of a protected cocyclic polynomial for every unsatisfiable circuit that comes from the work of a TM. However, its degree will not be constant, but only polynomial in the size of the circuit. When a polynomial has an exponential number of monomials, practical work with it becomes complicated, but it seems to me that it is still pleasant to know that the proof system we are working with is complete. I hope that in the majority of the unsatisfiable examples that come from practice there exists a polynomial of constant degree.

Let us call a light bulb basic if it depends on some gate not of the lowest level, as well as on all gates on which this gate depends in the circuit, and only on them (the values of the listed gates at which the bulb lights up are not important; it is only important that it depends on exactly a set of gates of this kind). Consider an arbitrary choice of gate values, as always with the restriction that the gate considered to be the output of the circuit must be equal to 1 (let us call this gate the upper gate). Consider the product of the variables corresponding to the basic light bulbs that light up under this choice of gate values. Our polynomial is the sum of such products over all such choices of gate values.

Note that such a polynomial is cocyclic (since for any choice of gate values in which the upper gate is 1, exactly one described product equals 1, namely the one that corresponds to this choice
of values, and the rest are equal to 0; thus the sum is always 1). The security of this polynomial follows from the fact that in each described product there is a light bulb that implies a contradiction (this follows from the impossibility for the circuit to output 1: for any choice of gate values at which the upper gate is 1, there is a contradiction somewhere).
4.5. Efficient search for protected cocyclic polynomials of bounded degree.
Obviously, in the linear space of all lamp polynomials of degree at most d, the cocyclic polynomials form a linear subspace. I will now describe how we can efficiently find this subspace (in polynomial time).

In the section on local checking of a lamp polynomial for cocyclicity we discussed lamp polynomials of degree one less than the original, equal to the change in the value of the original polynomial when we change the value of one of the gates from zero to one, for fixed values of the gates of the neighborhood of this gate of radius k (we will call such a polynomial a difference polynomial). Note that for a fixed gate (the one we change) and fixed values of its neighborhood, the operation of obtaining the described difference polynomial from the original polynomial is realized by a specific linear operator known to us (from the space of lamp polynomials of degree at most d to the space of lamp polynomials of degree at most d − 1). Thus, we have a set of linear operators of cardinality linear in the size of the circuit. It is easy to see that for the original polynomial to be cocyclic it is necessary and sufficient that each of these linear operators send our polynomial into the space of those cocyclic polynomials of degree at most d − 1 that always output 0.

Thus, if we have in our hands the space of cocyclic polynomials of degree at most d − 1 that always output 0, then we obtain the space of cocyclic polynomials of degree d by solving a system of linear equations of polynomial size (each linear operator from a list of polynomial size must send a vector claiming to be a cocyclic polynomial into the described linear space).

We can recursively find the space of cocyclic polynomials of degree at most d − 1. It remains to select in it the subspace of polynomials that always output 0.
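As a toy illustration of the subspace fact (not of the polynomial-time recursion just described, which avoids enumerating assignments), the cocyclic coefficient space can be computed by brute force on a small instance as the null space of the constraints "all assignments give equal value". The encoding of bulbs and monomials is hypothetical:

```python
import numpy as np
from itertools import product

# A bulb is a dict {gate: required value}; a monomial is a tuple of bulb indices.
def monomial_value(idxs, bulbs, assignment):
    out = 1
    for i in idxs:
        out *= 1 if all(assignment[g] == v for g, v in bulbs[i].items()) else 0
    return out

def cocyclic_subspace(bulbs, monomial_basis, n_gates):
    """Null space of the constraints 'value on each assignment equals the value
    on a base assignment'; its elements are cocyclic coefficient vectors."""
    assigns = list(product([0, 1], repeat=n_gates))
    base = dict(enumerate(assigns[0]))
    rows = [[monomial_value(m, bulbs, dict(enumerate(a))) -
             monomial_value(m, bulbs, base) for m in monomial_basis]
            for a in assigns[1:]]
    _, s, vt = np.linalg.svd(np.array(rows, dtype=float))
    rank = int((s > 1e-9).sum())
    return vt[rank:].T   # columns span the cocyclic coefficient space
```

On the one-gate example with bulbs "gate is 0" and "gate is 1" and linear monomials, the resulting space is one-dimensional and spanned by equal coefficients, matching the fact that the sum of these two bulbs is identically 1.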
This can be done by noticing that the value each of these cocyclic polynomials produces (constant, of course, for each polynomial) is a linear function on the space of cocyclic polynomials of degree at most d − 1, which is easy to find because we can easily compute the value given by each cocyclic polynomial. Thus, it suffices to find this function and then find its kernel, which is a subspace of codimension at most 1.

So, we can now find the space of cocyclic polynomials of degree at most d in polynomial time. Let us now show how to look for a protected cocyclic polynomial of degree at most d, or verify its absence, in polynomial time. To do this, it suffices to note that a protected cocyclic polynomial is simply a cocyclic polynomial that satisfies a polynomial-size set of linear inequalities: each inequality is the condition that the coefficient of some monomial must not be positive. We can simply go through all products of at most d variables corresponding to small bulbs and see whether the given set of bulbs contains a bulb that implies a contradiction. If such a bulb does not exist (we call such monomials unprotected; if such a bulb does exist, we call the monomial protected), the coefficient of this product should not be positive. Systems of a polynomial number of linear equations and inequalities over the field of rational numbers can be solved in polynomial time, as was proved, for example, in [2].

I have a conjecture that each cocyclic polynomial can be represented as a linear combination of products of several (at most the degree of the polynomial) cocyclic polynomials of degree 1. If so, this would allow finding the space of cocyclic polynomials of fixed degree in a slightly simpler way. However, I have not yet verified this conjecture.
5. General algorithm
Suppose that, based on the resources available to us, we chose a sufficiently large constant d and tried to find a protected cocyclic polynomial of degree at most d for the circuit, but failed and found out that there is no such polynomial. Then it is time to try to find an arrangement of values in the gates of the circuit that does not violate any gate (in which each gate takes exactly the proper value according to the gates on which it depends in the circuit), together with an input line to the circuit, which is the answer to the task. This section explains how to find such an arrangement of values.

The idea is that other cocyclic polynomials (not necessarily protected) can help us with this. Imagine, for example, a situation where we have found a cocyclic polynomial with only one unprotected monomial, whose coefficient is positive, and the polynomial itself always takes some positive value. This means that if we want to arrange values in the gates so as not to violate any gate, then all the bulbs corresponding to this monomial should light up. Indeed, since there are no violated gates, in each protected monomial there is at least one light bulb (implying a contradiction) that does not light up, and therefore each protected monomial is reset to zero; and where can the positive terms come from, if not from unprotected monomials with a positive coefficient? And this information (about the light bulbs that should light up) can be used to build the desired arrangement of values.

In the same way, not only positive (as I call polynomials that always take a positive value) cocyclic polynomials with a single positive unprotected monomial can be useful, but also, more generally, positive cocyclic polynomials with few positive unprotected monomials.
For example, if a polynomial of degree 1 always takes the value 1, its coefficients at unprotected monomials of degree 1 (that is, in fact, at small bulbs) are k ones, k minus ones and perhaps some number of zeros, and the free coefficient of this polynomial is zero, then this is, let us say, an argument in the direction of lighting up for the unprotected bulbs whose coefficient is 1, and an argument in the direction of not lighting up for the unprotected bulbs whose coefficient is -1 (among the latter, at most k − 1 can light up, and exactly one bulb fewer should light up among them than among the former). Of course, the same holds for cocyclic polynomials of larger degree.

We can collect many different cocyclic polynomials, each of which gives a certain number of such arguments of different strengths, for different bulbs, indicating that the corresponding bulb should or should not light up. And if for some light bulb the total strength of the arguments in the direction of lighting up turns out to be much greater than the total strength of the arguments in the direction of not lighting up, then one should think about choosing the gate values so that this light bulb lights up.

Now let us describe the algorithm in more detail.

We will pursue obtaining the desired arrangement of values in the gates from the final state of a certain structure modified over time. The structure is the following. Consider a certain system of small bulbs; as always, each lights up (takes the value 1) if and only if the gates on which it depends take a strictly defined set of values (otherwise it does not light up, taking the value 0). The structure is a set of numbers p_i, 0 ≤ p_i ≤ 1 (i ranges from 1 to the number of bulbs), such that if the index i corresponds to a bulb implying a contradiction, then p_i = 0. We will call this structure the light bulb probability distribution.

Let us call an arbitrary mapping of zeros and ones to the bulbs of our system of bulbs a situation.
That is, a situation is an act of instructing some bulbs to be lit and the rest not to be lit, and it is not at all necessary for the situation to come from some arrangement of zeros and ones in the gates (hereinafter we will call these simply arrangements). We will be interested in the situation as a random variable in which the i-th bulb is lit with probability p_i, and whether a bulb is lit or not is selected for each bulb independently.

The algorithm will change the values of the probabilities p_i so that the probabilities corresponding to the bulbs implying a contradiction are always zero. In the end, we want to see a probability distribution on the light bulbs in which each probability is either zero or one. Thus, it is assumed that in the final position our random variable produces exactly one fixed situation. Another thing we want in the end is that the only situation our random variable produces has the property that it comes from some arrangement of zeros and ones in the gates. That is, there is an arrangement in which a bulb lights up if and only if it lights up in our "final" situation (hereinafter I will call it the final situation, without quotes). We will call a situation with this property consistent.

A cocyclic polynomial can be interpreted in an obvious way both as a function of the arrangement and as a function of the situation. The following statement gives a criterion for the consistency of a situation, under some condition on the system of light bulbs.

Lemma.
Suppose a system of light bulbs satisfies the property that, together with a bulb depending on exactly some set of gates A, it contains all bulbs depending on exactly a set of gates that is some subset of A (there is a finite, and constant, number of such bulbs for each A: two bulbs that always light up or fail to light up simultaneously are identified); we will call this the inclusion closedness property. Then a situation in this system of light bulbs is consistent if and only if every cocyclic polynomial of degree 1 outputs in this situation the same value that it outputs on any arrangement.

Proof. The "only if" part is obvious. Let us prove that if each linear cocyclic polynomial outputs on the situation exactly the value that it outputs on all arrangements (or, which is the same, on all situations that come from arrangements), then the situation is consistent.

Let some light bulb depend on some set of gates A (we will call such a set A basic). By the inclusion closedness property, our system of bulbs then contains all bulbs for which the set of gates on which the bulb depends coincides with A. Consider the linear cocyclic polynomial in which all these bulbs appear with coefficient 1 and all other bulbs with coefficient 0 (the free coefficient is 0). Obviously, it is cocyclic and outputs 1 on all arrangements. This means that in our situation it should also output 1; hence exactly one of the considered bulbs should light up in our situation, and the rest should not.

For a basic set A, it is thus possible to uniquely determine the set of values for the gates of this set to which the value 1 is assigned in our situation. We will call this set of values the realization of the set A in this situation.
It is easy to see that the consistency of the situation now follows from the statement that, for any two basic sets A and B, their realizations agree on their intersection (if we prove this statement, then it is enough to choose an arrangement whose restriction to each basic set coincides with the realization of this set). Let us prove this statement. By the inclusion closedness property, our system of light bulbs contains, in addition to A and B, the set A ∩ B, so it suffices to prove the agreement on the intersection of the realizations of A and A ∩ B, as well as of the realizations of B and A ∩ B. By the symmetry of these statements, it is enough to prove the first.

Consider the cocyclic polynomial with coefficient 1 at the bulbs whose basic set coincides with A and which light up when the gates take a strictly defined set of values agreeing on A ∩ B with the realization of A ∩ B (a different set for each such bulb, running through all possible such sets); with coefficient -1 at the bulb with basic set A ∩ B that lights up when the set of gate values on A ∩ B coincides with the realization of A ∩ B; and with all other coefficients equal to 0.

It is easy to see that this polynomial is indeed cocyclic and outputs 0 on all arrangements. It is also easy to see that if the realizations of A and A ∩ B do not agree on A ∩ B, then the polynomial in our situation returns -1. This is a contradiction. So, the realizations of the basic sets agree on their intersections.
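For reference, the polynomial used in this last step can be written explicitly. Writing x_{A,v} for the bulb with basic set A that lights up on the value tuple v, and r for the realization of A ∩ B, the construction is:

```latex
f \;=\; \sum_{\substack{v \in \{0,1\}^{A} \\ v|_{A \cap B} \,=\, r}} x_{A,\,v} \;-\; x_{A \cap B,\; r}
```

On any arrangement, exactly one bulb with basic set A lights up, and it contributes to the sum exactly when x_{A ∩ B, r} also lights up, so f outputs 0 on all arrangements; in a situation where the realization of A disagrees with r, the sum vanishes while x_{A ∩ B, r} is lit, and f returns -1.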
The lemma is proved.

So, if all the values of the bulbs in the final situation are good in the sense of linear cocyclic polynomials, then they give an arrangement that does not violate any gate.

We will work under the assumption that our system of light bulbs is closed by inclusion; for example, it can be the system of light bulbs of diameter not exceeding k, for a constant k. Although many systems that are not closed by inclusion would also do.

Let us go back to the algorithm. So, we want to know how to change the probabilities of the light bulbs in such a way that in the end each of them equals zero or one, and in the resulting situation obtained in this way all the cocyclic polynomials output the correct value (we could limit ourselves to linear cocyclic polynomials only, but our algorithm will use polynomials of larger degree).

At each moment of time we maintain a set of cocyclic polynomials Ω, which we will supplement from time to time and from which we will possibly discard polynomials.

For each light bulb s from our system of bulbs and each cocyclic polynomial f from the current set, we will estimate two numerical values: one corresponds to the position of the light bulb s in the off state (let us call it T_0 = T_0(s, f)), the other to its position in the on state (let us call it T_1 = T_1(s, f)).

So, consider the on position of the light bulb s. Consider the following probability distribution on situations, K_j = K_j(s). For each i, if the i-th bulb does not coincide with the s-th bulb, we randomly and independently of the other bulbs choose the state of the i-th bulb: with probability p_i we make it on and with probability 1 − p_i we make it off (p_i is the current value of the probability at the i-th bulb). We make the bulb s lit if j = 1 and unlit if j = 0.
We will be interested in the random variable F_j, calculated as the value of the polynomial f in a random situation from the distribution K_j.

Recall that each cocyclic polynomial f has a corresponding value t = t(f), equal to the value that f takes in all consistent situations. The value T_j we need is calculated as the minimum of two probabilities: the probability that F_j ≤ t(f) and the probability that F_j ≥ t(f).

We also need the value T = T(f), calculated in a very similar way, differing only in that we do not select any light bulb s: we build a random situation in which each light bulb turns on or off with the currently assigned probability p_i (we will call this distribution on situations K), calculate the value of f on it, and take the minimum of two probabilities: the probability that the resulting random variable is not greater than t(f) and the probability that it is not less than t(f).

Let us talk about how we can estimate T_0 and T_1. We have a real-valued random variable (equal to a given function of a random situation), which can take a number of values exponential in the size of the circuit, and we want to estimate how often it takes a value, say, not exceeding a certain number. We estimate simply by generating a random situation from the given distribution K_j a sufficiently large number of times, polynomial in the size N of the circuit (say, p(N) times, where p is some polynomial; we can, independently of the other light bulbs, make the i-th light bulb lit with probability p_i using a pseudorandom number generator) and calculating the value of f in each such situation. Doing this, we get a polynomial-size set of numbers.

Maybe some of these numbers will be less than t(f), some will be greater, and some will be equal to t(f). Let us compute two numbers: v_{j,0}
, equal to the number of those of these numbers that do not exceed t(f), and v_{j,1}, equal to the number of those that are not less than t(f). If both v_{j,0} and v_{j,1} are greater than zero, for both values of j (first possibility), we estimate T_j by the minimum over i of v_{j,i}/p(N).

If exactly one of the v_{j,i} (over all values of i, j) is equal to zero (second possibility), we estimate T_j by the minimum over i of v_{j,i}/p(N), if this minimum is nonzero; if this minimum is equal to zero, we estimate T_j as 1/p(N).

If for each j there exists an i such that v_{j,i} = 0 (third possibility), our estimate will be a little more complicated and somewhat rougher. It would be possible to estimate both values of T_j as 1/p(N), but the point is that we will be interested in how T_0 and T_1 compare to each other, so we want the relation between the estimates of these quantities to at least remotely approximate the ratio of the values themselves. Therefore, we do the following. Choose a not very large, although not small, positive constant h. We have two polynomial-size sets of values taken by our random variables F_0 and F_1. Place them on separate number lines. Next, walk along each of these lines from the point t(f) in the direction in which all the points of the corresponding set of values lie, at the same constant speed, until we have collected at least h points on each of the lines. Suppose we have collected a_0 points on the line corresponding to index 0 and a_1 points on the line corresponding to index 1. Then our estimate of T_j will be a_j/((a_0 + a_1) p(N)).
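The core of the sampling estimate can be sketched as follows; this covers only the case where both counts are nonzero, and the function names and data layout are hypothetical:

```python
import random

def estimate_T(f_value, probs, s, j, t, n_samples, seed=0):
    """Monte Carlo estimate of T_j(s, f): sample situations from K_j (bulb i
    lit with probability probs[i], bulb s pinned to state j), count how many
    polynomial values are <= t and how many are >= t, take the smaller share."""
    rng = random.Random(seed)
    v_le = v_ge = 0
    for _ in range(n_samples):
        situation = [1 if rng.random() < p else 0 for p in probs]
        situation[s] = j                    # bulb s is pinned to state j
        val = f_value(situation)
        if val <= t:
            v_le += 1
        if val >= t:
            v_ge += 1
    return min(v_le, v_ge) / n_samples      # i.e. min(v_{j,0}, v_{j,1}) / p(N)
```

For instance, with four bulbs of probability 1/2, f equal to the number of lit bulbs, t = 2, and bulb 0 pinned on, the true value of T_1 is min(1/2, 7/8) = 1/2, and the estimate converges to it.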
Again, this estimate is rather rough, but it has a pretty good chance of at least "guessing" which of the T_j is greater and of "catching" the case when the T_j are very different (in this case the estimates also have a good chance of differing greatly in absolute value).

Before continuing, I will introduce a real-valued function U = U(s, f) = R(T_0(s, f), T_1(s, f)). The function U represents the very "power of argument" that I wrote about closer to the beginning of this section. I will define U by defining R: R(u, v) = l(u/(u + v)) m(1/(u + v)), where m is some increasing positive-valued continuous function, and l is a function on the interval (0, 1) satisfying the following properties: firstly, l(1/2) = 0; secondly, l(1/2 + t) = -l(1/2 - t) for any t with 0 < t < 1/2; thirdly, on the interval (1/2, 1), l is an increasing positive-valued function, and we require that l tend to infinity when approaching 1.

Let us pay attention to how the function U behaves. If it is greater than zero, then T_1(s, f) < T_0(s, f). If it is less than zero, then T_0(s, f) < T_1(s, f). If it is equal to zero, then T_0(s, f) = T_1(s, f). Note also that U tends to increase in absolute value when T_0(s, f) and T_1(s, f) decrease in absolute value (due to the multiplier with the function m).

Let us call a polynomial f discriminated if T(f) is small. It would be more correct to talk about the degree of discrimination of a polynomial: the smaller the value of T for it, the more the polynomial is discriminated. But for the sake of brevity, we will refer to heavily discriminated polynomials as simply discriminated. The polynomial f is discriminated if, in a random situation from the distribution K, the polynomial f calculated in this situation gives a value, IN MOST CASES, greater than t(f), or, in most cases, less than t(f). And the more significant this majority is, the more discriminated the polynomial.

We will change the values of the probabilities p_i mainly in such a way as to make the polynomials from the list Ω as little discriminated as possible. Moreover, we are much more interested in reducing the extent of discrimination of highly discriminated polynomials than that of their less discriminated counterparts.
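The text constrains only the shape of l and m, so any concrete choice is an assumption. The sketch below takes l(x) = tan(π(x - 1/2)) (zero at 1/2, odd around 1/2, increasing and tending to infinity at 1) and m(x) = log(1 + x) (increasing, positive, continuous), and assembles R and U from them.

```python
import math

# Hypothetical concrete choices for l and m; the paper only constrains
# their shape:
#   l : (0,1) -> R, l(1/2) = 0, l(1/2 + t) = -l(1/2 - t),
#       increasing and positive on (1/2, 1), l(x) -> +inf as x -> 1;
#   m : increasing, positive, continuous.
def l(x):
    return math.tan(math.pi * (x - 0.5))

def m(x):
    return math.log(1.0 + x)

def R(u, v):
    return l(u / (u + v)) * m(1.0 / (u + v))

def U(T0, T1):
    """Power of the argument for changing p_s, from the two estimates
    T0 = T_0(s, f) and T1 = T_1(s, f): positive when T1 < T0,
    larger in absolute value when both estimates are small."""
    return R(T0, T1)
```

Note the sign convention matches the text: U > 0 exactly when T_1(s, f) < T_0(s, f), and shrinking both arguments while keeping their ratio fixed inflates |U| through the factor m(1/(u + v)).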
Consider, for example, some polynomial f from Ω and some light bulb s. Suppose that T_1(s, f) < T_0(s, f) and that, for both j = 0 and j = 1, the probability that the value of f on a random situation from the distribution K_j is greater than t(f) is greater than 1/2 (or less than 1/2 for both values of j). Note that by decreasing p_s we decrease the extent of discrimination of f. Indeed, if we change p_s linearly (in one direction, with constant speed), then the probability that, in a random situation from the distribution K, f will output a value less (or greater) than t(f) will change linearly, since the probability of each situation changes linearly. Moreover, when the value of p_s is 1, this probability will be equal to T_1(s, f), and when the value of p_s is 0, it will be equal to T_0(s, f).

Under similar conditions, with the only difference that T_1(s, f) > T_0(s, f), in order to decrease the extent of discrimination of f, we need to increase p_s.

If, say, T_1(s, f) < T_0(s, f), and also the probability that the value of f on a random situation from the distribution K_0 is greater than t(f) is greater than 1/2, while the probability that the value of f on a random situation from the distribution K_1 is greater than t(f) is less than 1/2, then with a linear decrease in p_s the extent of discrimination may first decrease and then start to increase; nevertheless, at p_s = 0, f is less discriminated than at p_s = 1.

I wrote that the value of U(s, f) represents the power of the argument for changing p_s in one direction or the other. Let us clarify: if U(s, f) > 0, then, from what we have just written, it makes sense to decrease p_s, and if U(s, f) < 0, then it makes sense to increase p_s.
And the greater U(s, f) is in absolute value, the more sense it makes to change p_s in this way.

For example, if the polynomial f is discriminated to a large extent and in random situations, as a rule (with probability 1 - ε), takes a value less than t(f), and for s = 1 the probability that f will take a value greater than or equal to t(f) is slightly greater than the same probability at s = 0, then in order to reduce the extent of discrimination of f it makes sense to increase the probability that s = 1. Moreover, I wrote that the smaller ε is (that is, the more discriminated the polynomial), the more we want to "get it out of trouble", that is, to reduce its extent of discrimination; and it is exactly for small values of ε that U(s, f) tends to take the largest absolute values (this is why we introduced the multiplier with the function m when defining U).

At every regular moment of the algorithm's work (there will also be irregular moments of its work; the word "regular" in this context is close in meaning to the word "normal", and we endow it with the meaning it is usually given in mathematical texts), for each light bulb s we will find the total argument L(s) for the change of the probability p_s. It is simply the sum over all f ∈ Ω of U(s, f). If the total argument turns out to be greater than zero, we will decrease p_s by a small amount (naturally, depending on L(s)); if it is less than zero, we will increase it; if it is equal to zero, we will not change it. We keep the magnitude of the change in p_s small because we want to approximate a continuous change of the probabilities.

How does the magnitude of the change depend on L(s)?
We restrict ourselves to changing the probability p_i so that the value tan(π(2p_i - 1)/2) changes exactly by -λL(i), where λ is a small positive constant, which we choose as small as we want in order to bring the change in probabilities closer to continuous.

The arguments corresponding to the most discriminated polynomials are the largest in absolute value, and it is assumed that they often make a decisive contribution to the total argument.

Let us now discuss why we reduce the extent of discrimination of cocyclic polynomials. The point is that T(f) correlates quite well with the probability that, in a random situation from the current distribution K, the value of the polynomial f IS EQUAL TO t(f).

Let me explain. Suppose we are dealing with a linear cocyclic polynomial f. Then, by the central limit theorem, the distribution of the values of f on situations from the distribution K is close to normal (we take a sum of independent random variables corresponding to the light bulbs: if the i-th light bulb lights up, which happens with probability p_i, the corresponding random variable is equal to the coefficient of this light bulb in f; if it does not light up, the random variable is zero). That is, the density of such a distribution is close to a Gaussian on a straight line.

And if f is discriminated, it means that the mark t(f) is far from the median point of this Gaussian (as I call the point at which the Gaussian reaches its maximum), and that part of the subgraph of this Gaussian which lies on one side of t(f) (the smaller of the two parts) makes up a small fraction of the area of the subgraph (this fraction roughly coincides with T(f)). That is, t(f), so to speak, bites off a rather small piece of the Gaussian. If f is not discriminated, t(f) hits closer to the median point of the Gaussian and breaks it into parts that are more comparable in size.
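As a sketch, one step of this update can be written as follows. The exact reparametrization in the source is partly garbled, so the mapping tan(π(2p - 1)/2) of (0, 1) onto the whole real line is an assumption, chosen because it keeps every probability strictly between 0 and 1 however large the shift is.

```python
import math

def updated_probability(p, L_i, lam):
    """One probability-update step, as a sketch: change p_i so that
    tan(pi*(2*p_i - 1)/2) changes by exactly -lam * L(i).  The choice of
    this particular reparametrization is an assumption."""
    u = math.tan(math.pi * (2.0 * p - 1.0) / 2.0)      # position on R
    u -= lam * L_i                                     # shift by -lambda * L(i)
    return (2.0 * math.atan(u) / math.pi + 1.0) / 2.0  # back to (0, 1)
```

A positive total argument L(i) decreases p_i and a negative one increases it, as required, and the result never leaves (0, 1).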
It is clear that in the first case the density of our distribution at the point t(f) (that is, the very value of the Gaussian at this point) will be less than its density in the second case. And this means that in the first case, in a random situation, we will get a value exactly equal to t(f), as expected, with a lower probability than in the second.

In the case of polynomials of higher degree, I do not yet know an explicit class of functions approximating the distribution of the values of such a polynomial on random situations from the distribution K, as the Gaussians do in the case of linear polynomials. It is possible that the explicit form of such functions can be specified (and this, by the way, would allow us to estimate T(f), T_0 and T_1 more accurately in the case when one of these values turned out to be small and on one side of t(f) none of the values of f from our polynomial-size sample was found: we would just pick the function from this explicit class that best approximates our distribution and estimate our values as if it were the true distribution density; later I will tell how to slightly modify the algorithm so that this class of functions is, again, the class of Gaussians, that is, our random variable will be distributed normally, and this, as I said, will allow us to estimate T(f), T_0 and T_1 more accurately when the overwhelming part of the Gaussian subgraph is on one side of t(f)). But if not, we will still allow ourselves to talk about the distribution density at a given point, although I do not give an explicit way to unambiguously associate our discrete distribution of the values of f on random situations from K with a continuous density function approximating this discrete distribution.
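The effect described for linear polynomials, namely that the exact probability of hitting t(f) is smaller when t(f) sits in a tail of the roughly Gaussian value distribution, can be checked directly by computing the distribution of values by convolution; the helper below is only an illustration, not part of the paper.

```python
def value_distribution(coeffs, probs):
    """Exact distribution of F = sum(c_i * x_i), x_i ~ Bernoulli(p_i),
    built by convolving in one bulb at a time."""
    dist = {0: 1.0}
    for c, p in zip(coeffs, probs):
        new = {}
        for v, q in dist.items():
            new[v] = new.get(v, 0.0) + q * (1 - p)      # bulb off
            new[v + c] = new.get(v + c, 0.0) + q * p    # bulb on
        dist = new
    return dist

# 20 bulbs with coefficients +1 / -1, all probabilities 1/2: F is
# symmetric around 0 and roughly Gaussian.  The probability of hitting
# a point near the median (0) is far larger than that of hitting a
# point deep in a tail (8), which is why a heavily discriminated
# polynomial rarely takes the value t(f) exactly.
coeffs = [1] * 10 + [-1] * 10
dist = value_distribution(coeffs, [0.5] * 20)
```

Here `dist[0]` greatly exceeds `dist[8]`, matching the claim about the density at t(f) for discriminated versus non-discriminated polynomials.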
Most often, the density of our distribution, as we walk along the number line to the left (or to the right), will not drop abruptly: most likely it will decrease smoothly, gradually tapering off (it is unlikely that at a certain moment there will be a sharp break after which the density becomes zero). This means that in this case, as in the case of the Gaussian, if we take a polynomial f with very small T(f), the density of the corresponding distribution at the point t(f) turns out to be expectedly less than the similar value for a polynomial g with T(g) close to 1/2.

So, we look at T(f) as an estimate of the probability that, in a random situation from the current distribution K, the value of the polynomial f is equal to the "correct" value, that is, t(f). In exactly the same way, one can look at T_j(s, f) as an estimate of the probability that, in a random situation from the current distribution K_j(s), the value of the polynomial f is equal to t(f) (in the sense that from the value of T_j(s, f) one can extract a number that is expected to approximate the given probability, and this number is the smaller, the smaller T_j(s, f) is).

It is possible that in order to estimate these probabilities it would be better to work not with the functionals T(f), but to estimate the density of the distribution at the point t(f) by observing how often the values of f on random situations from K fall into a small neighborhood of t(f); but for some reason I decided that it is better to deal with the functionals T(f).

As I have already mentioned, the work of our algorithm at regular moments of time will consist of changing the probabilities p_i. A specific way to change these probabilities has been described above.
Changing these probabilities is aimed at making the polynomials from the list Ω, and in particular the most discriminated polynomials from this list, as little discriminated as possible.

But we have seen that the value T(f), which determines the extent of discrimination of the polynomial f, correlates well with the probability that the value of f in a random situation from the current distribution K takes the value t(f). Therefore, our process of changing probabilities can be viewed as a process that "continuously" changes the probabilities in such a way that, for each polynomial f from Ω, a randomly chosen situation from the distribution K (that is, one in which the i-th light bulb turns on with probability p_i, independently of the other light bulbs) has, with the highest possible probability, the property that f takes the value t(f) on it. This means that we jointly optimize this probability for all polynomials from Ω, finding some compromise.

Our algorithm tries to increase this probability for each polynomial f from Ω, and especially for those of these polynomials for which such a probability is small (of course, not the real probability, which we do not know, but our estimate of this real probability).

Recall, for example, the cocyclic polynomial f of degree 1 described closer to the beginning of the section, which always takes the value 1: its coefficients at unprotected monomials of degree 1 (that is, in fact, at small light bulbs) are k ones (we call such bulbs A-bulbs), k minus ones (B-bulbs), and possibly some number of zeros, and the free coefficient of this polynomial is zero.
If initially all probabilities tied to unprotected bulbs (those not implying a contradiction) are equal to 1/2 (remember, the probabilities of protected bulbs are always zero: protected bulbs are always off), then initially we have an argument toward moving the probabilities for A-bulbs from 1/2 toward one: when a particular A-bulb is on, the probability that f will take the value t(f) is initially slightly higher than when it is off. However, over time the situation may change: after the algorithm has worked for a certain time, for example, the probabilities for B-bulbs can become, for the most part, very close to zero, while those for A-bulbs become close to 1. And then, for a particular A-bulb, f will generate an argument in the direction of moving the corresponding probability toward zero.

Now I will give the final algorithm for finding a situation, and then I will motivate its details.

The algorithm will use a procedure for finding new cocyclic polynomials and adding them to the list of processed cocyclic polynomials Ω. This procedure is the subject of the next section, in which it will be described in detail. For now, I will use it as a black box, and I will only say that the procedure will try to find the most discriminated and semi-discriminated (I will give the definition of the latter later) cocyclic polynomials for the current values of the probabilities.

Also, the algorithm will use the operations of fixing and unfixing unprotected bulbs. Fixing the i-th light bulb means freezing the value of p_i in state 0 or 1 (just before the moment of freezing, the probability value can differ from 0 and 1). As long as the light bulb is fixed, we do not change the probability value for this light bulb. Unfixing
Thus, at any moment of time, light bulbs are divided into 2 types: fixed,the probability for which is temporarily fixed and takes the value 0 or 1, and non-fixed,the value of which varies within the segment [0 , .Also, at any given time, there will be some subset of bulbs in the "unfixable" statusamong the unfixed bulbs. If the light bulb is in this state, it signals to the algorithm thatit cannot be fixed. The rest of the unfixed light bulbs are in the "fixable" status.At the moment, the algorithm is a little unfinished, in the sense that it uses a numberof parameters (a set of numbers and two real-valued functions of a real domain), the mostoptimal value of which is still unknown to me. We only know the magnitude of theseparameters (if we are talking about numbers) and how they should roughly relate to eachother and to the size of the circuit. It might be worth experimenting to find the mostappropriate values for these parameters. The efficiency of the algorithm is unlikely to besensitive to slight changes in these parameters.For brevity, I will call these parameters selectable.When describing the algorithm, for the sake of convenience, I call the selecatbleparameters-numbers constants, but in reality they are not really constants, but numbers,possibly depending on the size of the circuit.We have already introduced some of these parameters above.The list of parameters is as follows: d (the degree of the cocyclic polynomials we areworking with), a (the length of the time interval after which we unfix the light bulbfrom the beginning of the list), b (the length of the time interval after which we updatethe list of cocyclic polynomials), m (the length of the first step inside the cycle), m (the maximum length of the second step inside the cycle), C (the time during which thebulbs remain unfixed), L (the threshold for unfixing a fixed bulb), δ (the size of theneighborhoods of zero and one, that we fix the light bulb, the probability hits within thisneighborhood), p ( 
N ) (the size of the sample with which we estimate T j ), h (the parameterused to estimate T j ), l ( t ) and m ( t ) (real-valued functions used to define U (s, f)), λ (smallreal-valued multiplicative constant, via which we reduce the change in probabilities byeach step, bringing the change in probabilities closer to continuous).So, the algorithm. Comments to it are shown in curly braces.* Check if there are no protected cocyclic polynomials of degree at most d , if such apolynomial is found, stop the algorithm and report that the situation does not exist* r : = 0; { r is the counter we need in order to understand how long the algorithm is running,in order to update the list of cocyclic polynomials from time to time, to unfix the lightbulb from the beginning of the list, which will be discussed later, as well as, in order tokeep track of how long each light bulb in the "unfixable" status stays in this status, if youcount from the moment when it acquired this status the last time. } * p i : = 0.5, for all unprotected i { we initialized the initial probabilities for unprotected bulbs } * p i : = 0, for all protected i { you can forget about protected bulbs altogether, they are always off and the proba-bilities are always equal to zero } * initialize P list as empty IRCUIT SATISFIABILITY PROBLEM FOR CIRCUITS OF SMALL COMPLEXITY 23 { At any given time, the P list will contain all currently fixed bulbs, in the order inwhich they were fixed last time; thus, at the beginning of the list, there will be a light bulbthat is in a fixed state, since the moment of the last fastening, the longest time among allfixed bulbs. } * Firstly all unprotected bulbs are unfixed, we initialize their status as "fixable" ;* As long as there is at least one unfixed value of an unprotected light bulb, or all ofthem are fixed, but the corresponding situation is not consistent, we do: { The main part of the algorithm’s work consists of one cycle, the body of which consistsof three steps. 
The first of these steps is optional: I added it because my intuition (which I cannot yet explain) tells me that the algorithm will work better if there is a substantial period of time between two adjacent fixings of light bulbs (this first step is aimed exactly at "waiting out" this period, simply changing the probabilities and not fixing anything, though possibly unfixing). However, perhaps everything will be fine if the first step is simply removed.

After completing each iteration of the loop, the algorithm checks whether we have reached the goal. This check consists in, firstly, checking whether all unprotected bulbs are fixed (if not all, then we have not yet arrived at the desired result). Secondly, if all the bulbs are fixed, it is necessary to check whether the corresponding situation is consistent. The corresponding situation is understood as the situation in which the unprotected bulbs whose probability is 1 are turned on, and the unprotected bulbs whose probability is 0 are not turned on (protected bulbs, as always, are off). Whether a particular situation is consistent is checked locally at each point of the circuit. }

* (optional) do m_0 times:
{ This is the first step. It is a loop that repeats m_0 times, where m_0 is a selectable parameter. }
* For each unprotected light bulb i, we calculate the total argument L(i) as described above;
* Change all probabilities p_i corresponding to unprotected unfixed bulbs as described above;
* For each fixed unprotected light bulb s whose current probability is 1: if the calculated value of the total argument L(s) for it is greater than the constant L_0 we have chosen, then we unfix the light bulb s, remove it from the list P and declare it unfixable.
Symmetrically, for each fixed unprotected light bulb s whose current probability is 0: if the calculated value of the total argument L(s) for it is less than -L_0, then we unfix the light bulb s, remove it from the list P and declare it unfixable;
* r := r + 1;
* if r is divisible by a, unfix the light bulb from the beginning of the list P, remove it from the list P and declare it unfixable;
* if r is divisible by b, look for and add several (in the amount of a chosen parameter) new cocyclic polynomials (see the next section);
* if an unprotected light bulb has been in the "unfixable" status for the time C (since the moment it last changed its status to "unfixable"), change its status to "fixable" (C is a selectable constant; times are measured in values of the counter r: if the counter value has changed by 10, then 10 units of time have passed);
* do m_1 times (m_1 is the constant we have chosen), or until we have had to fix something:
{ This is the second step. It is a cycle whose body differs from the body of the cycle of the first step only in that some light bulb may be fixed in it.
}
* For each unprotected light bulb i, we calculate the total argument L(i) as described above;
* Change all probabilities p_i corresponding to unprotected unfixed bulbs as described above;
* If any of the p_i, for unprotected unfixed i, differs from 0 or from 1 by no more than δ, where δ is the constant we have chosen (less than 1/2), we fix the i-th bulb, making p_i equal to 0 or 1, respectively (if there are several such bulbs, we do this, for definiteness, with the bulb of the smallest number); in this case, we also add the light bulb we have just fixed to the end of the list P, interrupt the execution of the cycle of the second step and go to the third step;
* For each fixed unprotected light bulb s whose current probability is 1: if the calculated value of the total argument L(s) for it is greater than the constant L_0 we have chosen, then we unfix the light bulb s, remove it from the list P and declare it unfixable.
Symmetrically, for each fixed unprotected light bulb s whose current probability is 0: if the calculated value of the total argument L(s) for it is less than -L_0, then we unfix the light bulb s, remove it from the list P and declare it unfixable;
* r := r + 1;
* if r is divisible by a, unfix the light bulb from the beginning of the list P, remove it from the list P and declare it unfixable;
* if r is divisible by b, look for and add several (in the amount of the selectable parameter) new cocyclic polynomials (see the next section);
* if an unprotected light bulb has been in the "unfixable" status for the time C (since the moment it last changed its status to "unfixable"), change its status to "fixable";
* if at the previous (second) step we did not have to fix anything (the cycle ran through all m_1 iterations and none of the probabilities of unprotected bulbs ever approached zero or one), we fix the unprotected fixable light bulb whose probability value is currently farthest from 1/2; if this value is closer to 0, we fix it at 0, and if it is closer to 1, we fix it at 1 (if there are no unprotected fixable bulbs at all, we do nothing at this step);
{ This was the third step. }

First, we will motivate the element of the algorithm that I call fixing a light bulb. We need fixing in order to obtain a set of probabilities each of which is equal to zero or one. If we did not fix the light bulbs but simply changed the probabilities, we would greatly reduce our chances of obtaining a set of probabilities each of which is very close to zero or one, and hence the situation.
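As a sketch, the skeleton of the main cycle (first, second and third steps) might look as follows. It is a simplification under many assumptions: the search for new cocyclic polynomials, the protected bulbs and the "unfixable" status are omitted, and `total_argument` and `is_consistent` are hypothetical caller-supplied helpers standing in for the computations described above.

```python
import math

def run(num_bulbs, total_argument, is_consistent, *, m0=5, m1=20,
        a=1000, delta=0.05, L0=10.0, lam=0.01, max_rounds=100000):
    """Skeleton of the main cycle.  total_argument(i, probs) -> L(i)
    and is_consistent(situation) -> bool are supplied by the caller."""
    probs = [0.5] * num_bulbs
    fixed = set()
    P = []              # fixed bulbs, oldest fixing first
    r = 0

    def step(allow_fixing):
        nonlocal r
        r += 1
        for i in range(num_bulbs):          # update unfixed probabilities
            if i in fixed:
                continue
            L_i = total_argument(i, probs)
            u = math.tan(math.pi * (2 * probs[i] - 1) / 2) - lam * L_i
            probs[i] = (2 * math.atan(u) / math.pi + 1) / 2
        for s in list(fixed):               # unfix on strong counter-arguments
            L_s = total_argument(s, probs)
            if (probs[s] == 1 and L_s > L0) or (probs[s] == 0 and L_s < -L0):
                fixed.discard(s); P.remove(s)
        if r % a == 0 and P:                # periodically unfix the oldest bulb
            fixed.discard(P.pop(0))
        if not allow_fixing:
            return False
        for i in range(num_bulbs):          # "premature" fixing near 0 or 1
            if i not in fixed and min(probs[i], 1 - probs[i]) <= delta:
                probs[i] = round(probs[i]); fixed.add(i); P.append(i)
                return True
        return False

    for _ in range(max_rounds):
        for _ in range(m0):                 # first (optional) step
            step(allow_fixing=False)
        fixed_something = False
        for _ in range(m1):                 # second step
            if step(allow_fixing=True):
                fixed_something = True
                break
        if not fixed_something:             # third step: fix the bulb whose
            free = [i for i in range(num_bulbs) if i not in fixed]
            if free:                        # probability is farthest from 1/2
                i = max(free, key=lambda j: abs(probs[j] - 0.5))
                probs[i] = round(probs[i]); fixed.add(i); P.append(i)
        if len(fixed) == num_bulbs:         # goal check after each iteration
            situation = [int(p) for p in probs]
            if is_consistent(situation):
                return situation
    return None
```

For instance, with a total argument that always pushes every probability upward and a consistency check accepting the all-ones situation, the skeleton fixes the bulbs one by one at 1 and terminates.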
The change in the probabilities of unprotected unfixed bulbs, in turn, is necessary for us in order to understand which of the bulbs should be fixed next and at which value (0 or 1).

I already wrote above that we change the probabilities in such a way that a random situation from the distribution K satisfies, with the highest possible probability, the property that the value of f on it is equal to t(f), finding a compromise of this probability over the polynomials f from Ω. Therefore, it is fair to say that we increase the expected probability p that a random situation from the distribution K will be consistent. The same remains true when we start, from time to time, to fix and unfix the light bulbs in the manner described in the algorithm. This probability may perhaps be expected to decrease slightly at the moment when we fix a light bulb.
Imagine a situation where we do not change the probabilities: all probabilities of unfixed bulbs are always equal to 1/2, and from time to time we randomly choose a bulb and fix it, making its probability 0 or 1 with equal chances. The probability p, in this case, will in the end most likely become equal to 0, since in the end we will obtain a distribution K that always outputs the same completely arbitrary situation. That is, along the way, at a certain moment, we will most likely simply "lose all consistent situations": K will no longer output them at all, and p will become equal to 0.

What can prevent us from "losing all the consistent situations along the way" in our case? First, between two adjacent fixings of light bulbs, we intentionally change the probabilities so that the probability p is expected to increase. Secondly, for fixing we always choose the light bulb which currently corresponds to the probability farthest from 1/2 (or one of the most distant, in the case of "premature" fixing, when we find a probability that is no farther than δ from zero or one and there are several such probabilities at once); let it be s and let it correspond to the probability q. Let also, for definiteness, q > 1/2. This means that, choosing a random situation from the version of the distribution K before fixing, we, in fact, with probability q choose a situation from the distribution K_1(s) and with probability 1 - q choose a situation from the distribution K_0(s). And choosing a random situation from the version of the distribution K after fixing, we choose a situation from the distribution K_1(s).
It is clear that the closer the probability q, in this case, is to 1, the less the probability p is expected to change as a result of fixing. For similar reasons, when we choose for fixing a light bulb with the probability farthest from 1/2, the values of T_j(s, f) are also expected to change the least.

From time to time, while we change the probabilities and fix the bulbs, we may acquire serious arguments to unfix a particular bulb. For example, this can happen when we have found (and added to Ω) a new cocyclic polynomial f which for sure takes a value less (or greater) than t(f) in all situations in which the values of the bulbs we fixed correspond to the probability values frozen with these bulbs: in this case, at least one of the fixed bulbs must definitely be unfixed. Such arguments can also arise from polynomials added to Ω long ago, as a result of changing probabilities. We unfix the light bulbs "reluctantly": we do this only if the arguments to do so are strong enough; to be more precise, these arguments in total must be greater than the selectable constant L_0.

We also sometimes unfix the light bulb from the beginning of the list P, even if there are no serious arguments of the kind discussed above to unfix it. This is done in order to give a chance of change to the probabilities of light bulbs that have been in a fixed state for a very long time. When a light bulb has stayed in a fixed state for a very long time, we have no serious arguments to unfix it, and yet we have not found the desired situation, it may be time to change something: to unfix this light bulb and see what happens if the probabilities are given the opportunity to change.
But it is worth doing this quite rarely: the constant a should be made much bigger than m_0, m_1, b and C.

After we have unfixed a light bulb, we need to give its probability the opportunity to change freely for some time, without giving the algorithm a chance to immediately fix it again on the grounds that its probability is at a distance less than δ from zero or one. For this, immediately after unfixing, we assign the "unfixable" status to the light bulb for the time C.

6. Search for cocyclic polynomials.
6.1. What we are looking for.
In the last section I wrote that we will look for the most discriminated and most semi-discriminated cocyclic polynomials (and give preference to them when adding to Ω). I think I should first explain why this is so, at the same time defining the semi-discriminated polynomials.

First, let us discuss why we are adding new polynomials to Ω at all. After all, if we select a fixed set of cocyclic polynomials satisfying the property that if in a situation they all take the correct values then the situation is consistent, and we are able to obtain a set of probabilities, each of which is equal to 0 or 1, such that on the situation corresponding to this set all polynomials of our set take the correct values, then we have achieved the goal.

Let me remind you that the algorithm for changing the probabilities works in such a way that it maximizes the probability that a random situation from the distribution K will be consistent. We implicitly estimate this probability based on the values the polynomials take on random situations from K. It is fair to say that we change the values of the probabilities based on an (implicit) estimate of the probability of consistency of a random situation from K. The reason for adding new polynomials to Ω is that the more polynomials there are, the more accurate this estimate is: new polynomials give us additional information.

Let me remind you that at each step the change in each probability consists of several summands: each polynomial brings its own summand, the argument of the given polynomial. I argue that when adding a polynomial, we should add the polynomial with the expectedly strongest arguments.
To explain this, one could refer to the well-known saying that we can learn little from someone whose opinion never contradicts ours (a polynomial with weak arguments says that the current probabilities are good enough; a polynomial with strong arguments says that no, something needs to be changed), but the reader probably wants more.

The reasoning is as follows. Imagine a situation when we are faced with a choice of which of two polynomials, f_0 or f_1, to add to the current set of polynomials Ω, and the arguments of the polynomial f_0 are somewhat stronger than the arguments of the polynomial f_1. The optimal move would be to change the probabilities using the arguments of all the listed polynomials: the polynomials from Ω, f_0 and f_1 (that is, to change the probabilities based on more information). But since we are allowed to add only one of the f_i, it is better to add f_0, because, if we talk about the next few changes, the result of changes based on Ω and f_0 differs less from the result of the optimal changes based on Ω, f_0 and f_1 than the result of changes based on Ω and f_1 does, since the first difference is exactly the arguments of f_1, and the second difference is exactly the arguments of f_0.

So, each time we add the polynomials with the expectedly strongest arguments. Now I will explain why discriminated polynomials have this property. I will explain this with the example of two linear cocyclic polynomials, one of which, f_0, is discriminated, and the other, f_1, is not. Let t(f_0) = t(f_1) = 0; among the coefficients of f_0 there are k_0 ones and k_0 minus ones, the other coefficients being zeros; among the coefficients of f_1 there are k_1 ones and k_1 minus ones, the other coefficients being zeros; all probabilities p_i are 1/2.

Suppose that we want to calculate the strength of the argument of f_0 corresponding to the light bulb s. If the coefficient of s in f_0 is zero, then this argument will be zero.
If the coefficient is nonzero, then we divide all situations into classes according to the values of the light bulbs with coefficient 1 other than s. Consider an arbitrary one of these classes; let l of the light bulbs with coefficient 1 other than s take the value 1 in this class. The contribution to f_1 given by s differs by 1 depending on whether we turn s on or not, be the coefficient of s one or minus one. This means that the number of bulbs with coefficient minus one, other than s, which we have the right to turn on so that f_1 is greater than or equal to t(f_1), that is, greater than or equal to zero (we want to evaluate the probability that f_1 will take a value on that side of t(f_1) on which it takes a value with the lower probability), also differs by 1: say, l and l − 1. But the number of subsets of size at most l in a set of size k_1 is significantly larger than the number of subsets of size at most l − 1 in a set of size k_1, approximately k_1/l times larger. That is, for each of the considered classes, the probability of f_1 being greater than zero in one of the cases is significantly greater than in the other, and therefore the argument aimed at increasing the probability corresponding to the second case will be quite large.

Carrying out similar reasoning in the case of the polynomial f_2, we can also divide all situations into classes according to the values taken by the bulbs with coefficient 1 and see that in most of these equally probable classes the probabilities that f_2 will be greater than zero, for different values of the light bulb s, differ much less significantly, and therefore, overall, the probabilities that f_2 will be greater than zero for different values of the light bulb s differ much less significantly than in the case of f_1...
Symmetric reasoning suggests that the probabilities that f_2 will be less than zero for different values of the bulb s also differ much less significantly than in the case of f_1 and "less than zero".

By the way, the idea of using the most discriminated cocyclic polynomials correlates with the book [3], judging by its title and some of the phrases I read from it. The book consists of various examples of analysis of the solving of olympiad mathematical problems and, as far as I understand, the main idea promoted is that if we understand where in the problem it is most difficult for us, then this difficulty should be overcome in the first place.

In addition to discriminated cocyclic polynomials, we will be interested in semi-discriminated ones. A semi-discriminated cocyclic polynomial f is one for which there is a light bulb i and j ∈ {0, 1} such that if you replace p_i with j, then f becomes discriminated. Then we say that the polynomial f is semi-discriminated with respect to the light bulb i and the value j. In other words, it is a polynomial f for which T_j(i, f) is close to zero. It is easy to see that the argument (given the initial probabilities) for the i-th light bulb generated by such a polynomial has a good chance of being strong.
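The counting step used in the discrimination argument above, comparing subsets of size at most l with subsets of size at most l − 1 in a k-element set, can be checked numerically; a minimal sketch (the values k = 100, l = 5 are arbitrary choices of mine):

```python
from math import comb

# Claim: in a k-element set, the number of subsets of size at most l exceeds
# the number of subsets of size at most l - 1 by roughly a factor of k / l
# when l is small compared to k.

def subsets_up_to(k: int, l: int) -> int:
    """Number of subsets of size at most l in a set of size k."""
    return sum(comb(k, i) for i in range(l + 1))

k, l = 100, 5
ratio = subsets_up_to(k, l) / subsets_up_to(k, l - 1)
print(ratio, k / l)  # the two values should be close for l << k
```

The approximation holds because for l ≪ k the top binomial coefficient dominates the sum, and comb(k, l)/comb(k, l − 1) = (k − l + 1)/l ≈ k/l.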
We will search for semi-discriminated polynomials for each light bulb i in a way that is no different from the way we search for discriminated polynomials, with the only difference that we will replace p_i with 0 or 1.

Perhaps, for reasons of saving resources, it would be worthwhile to restrict ourselves as follows: for each found polynomial that is semi-discriminated with respect to a light bulb i and a value j, when changing probabilities, use only that of its arguments which corresponds to the light bulb i, and do not evaluate or use the arguments that correspond to the other bulbs.

Let us now turn to the question of how one can find the most discriminated (and semi-discriminated) cocyclic polynomials.

I will propose two approaches to finding the most discriminated polynomials. The first approach we will call "the half-space method" and the second "the quadratic function optimization method". When implementing the algorithm, one can combine both of these methods in some way. Of these two methods, I prefer the quadratic function optimization method. Both methods will use selectable parameters, just as in the previous section.
6.2. Half-space method.
Note that if we change the value of f by a constant, then its value in each of the situations, as well as t(f), will change by this constant. Therefore, the search for discriminated cocyclic polynomials reduces to the search for discriminated cocyclic polynomials f such that t(f) = 0. Note that the cocyclic polynomials f with t(f) = 0 form a vector space; let us call it L. Note that any situation z defines a linear function H_z on L, given by the formula H_z(f) = f(z). A polynomial f ∈ L will be discriminated if, for a random situation z from the distribution K, either H_z(f) > 0 with probability very close to 1, or H_z(f) < 0 with probability very close to 1.

Let us choose situations z_1, z_2, ..., z_{q(n)} randomly from the distribution K, where n is the size of the circuit and q is some polynomial that is a selectable parameter. If we find a polynomial f ∈ L such that H_{z_i}(f) > 0 for the vast majority of the z_i (or H_{z_i}(f) < 0 for the vast majority of the z_i), then f is discriminated with high probability.

Note that, for each i, the set of those f for which H_{z_i}(f) is greater than zero (less than zero) forms a half-space in L. Thus, our task is the following: given a polynomial-size set of open half-spaces, select a point belonging to as many of these half-spaces as possible, or a point belonging to as many interiors of their complements as possible (I said "interiors", since the boundary hyperplane on which H_{z_i} is zero should be discarded from the complement).

So, how do we "pierce" as many half-spaces from a given set as possible? Here is the idea. We start from a random point from some natural distribution on L, which is a selectable parameter (there are few requirements on this distribution; we just want the random point not to fall too far from zero too often). We begin to move around L in jumps. The displacement vector corresponding to each jump is calculated as follows, depending on the current point x at which we stand.
For each half-space H from our collection, construct a normal d_H of unit length to the hyperplane separating H from the complement of H, directed towards H. The displacement vector corresponding to the jump from the point x is then equal to the sum of d_H over all half-spaces H from our set that DO NOT CONTAIN the point x, multiplied by some small selectable parameter λ (we multiply by λ, again, out of a desire to bring the movement closer to a continuous one).

I believe that if there are points in L that pierce a fraction 1 − ε of the half-spaces of the set, we have a good chance, while moving, to walk through a point piercing, say, a fraction 1 − 2ε of the half-spaces of the set.

The reason why I think so is the following. Let x_0 be a point piercing a fraction 1 − ε of the half-spaces of the collection. Let us color the half-spaces pierced by the point x_0 blue, and those not pierced red. Let δ(x) be the fraction, among all half-spaces of the set, of blue half-spaces that do not contain the point x. Since each such half-space H contains x_0 and does not contain x, the normal d_H corresponding to this half-space has a positive scalar product with the vector connecting x and x_0. So, if the point x at which we are currently standing satisfies δ(x) ≤ ε, then we are already at a point piercing a fraction 1 − 2ε of the half-spaces of the set... If δ(x) > ε, then the displacement vector at that moment will consist of more than εq(n) normals "directed towards the point x_0" and only at most εq(n) normals corresponding to red half-spaces (everything also needs to be multiplied by λ). This means that the next move is more likely to bring us closer to x_0 than to move us away from x_0. It is unlikely that it will take too long to approach the point x_0. However, one needs to understand that it is not always the case that such a movement will lead us to the desired points.
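As an illustration, a minimal sketch of this jump procedure (all names, the Gaussian starting distribution, and the test geometry are my assumptions, not from the text):

```python
import random

# Sketch of the half-space "piercing" walk described above. Half-space i is
# {f : <a_i, f> > 0} for a normal vector a_i (here H_z(f) = f(z) is linear in
# f, so each sampled situation z yields such a normal).

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def pierce_count(x, normals):
    """How many open half-spaces {y : <a, y> > 0} contain the point x."""
    return sum(1 for a in normals if dot(a, x) > 0)

def walk(normals, dim, steps, lam):
    """One walk: each jump is lam times the sum of the unit normals of the
    half-spaces that do not contain the current point."""
    x = [random.gauss(0.0, 1.0) for _ in range(dim)]
    visited = [list(x)]
    for _ in range(steps):
        step = [0.0] * dim
        for a in normals:
            if dot(a, x) <= 0:              # half-space does not contain x
                norm = dot(a, a) ** 0.5
                for j in range(dim):
                    step[j] += a[j] / norm  # unit normal, directed into H
        x = [xj + lam * sj for xj, sj in zip(x, step)]
        visited.append(list(x))
    return visited

def best_point(normals, dim, restarts, steps, lam=0.1):
    """k restarts of m steps each; among all (m + 1) * k visited points,
    return the one contained in the most half-spaces."""
    pts = [p for _ in range(restarts) for p in walk(normals, dim, steps, lam)]
    return max(pts, key=lambda p: pierce_count(p, normals))
```

Here the stopping behavior is implicit: once a point pierces every half-space, the displacement vector is zero and the walk stays put.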
Let m and k be two sufficiently large numbers that are selectable parameters (I intuitively estimate the optimal choice of m as much larger than the optimal choice of k). The algorithm is the following: k times we select a random point from the distribution on L that I wrote about above, start from it, and make m steps in the way described above. We will get (m + 1)k points. We choose the one that is contained in the largest number of half-spaces of the collection. After that, we add the corresponding cocyclic polynomial to Ω.

6.3. Quadratic function optimization method.
We start by looking for linear discriminated cocyclic polynomials f with zero free term, such that t(f) = 0. In other words, these are simply invariant linear combinations of a certain set of light bulb values which output 0 in consistent situations. The value that such a linear combination outputs in a random situation is the sum of random variables τ_i, where τ_i = 0 in the case when the i-th bulb is not lit, and τ_i is equal to the coefficient at the i-th bulb when the i-th bulb is lit.

Recall that the τ_i are independent. This means that one can use the Central Limit Theorem, with caution. We are interested in the central limit theorem in the Lindeberg form:

Theorem 6.1.
Let the independent random variables $X_1, \dots, X_n, \dots$ be defined on the same probability space and have finite expectations and variances: $\mathrm{E}[X_i] = \mu_i$, $\mathrm{D}[X_i] = \sigma_i^2$. Let $S_n = \sum_{i=1}^{n} X_i$. Then $\mathrm{E}[S_n] = m_n = \sum_{i=1}^{n} \mu_i$ and $\mathrm{D}[S_n] = s_n^2 = \sum_{i=1}^{n} \sigma_i^2$. And let the "Lindeberg condition" be satisfied:

$$\forall \varepsilon > 0:\quad \lim_{n \to \infty} \sum_{i=1}^{n} \mathrm{E}\!\left[ \frac{(X_i - \mu_i)^2}{s_n^2} \, \mathbf{1}_{\{|X_i - \mu_i| > \varepsilon s_n\}} \right] = 0,$$

where $\mathbf{1}_{\{|X_i - \mu_i| > \varepsilon s_n\}}$ is an indicator function. Then $(S_n - m_n)/s_n \to N(0, 1)$ in distribution as $n \to \infty$.

Thus, if there are quite a lot of nonzero coefficients and among them there are none that are much larger than the others, then we can expect that the distribution of the deviation of our random variable from its expectation, divided by s_n (see the theorem), will be close to the normal distribution N(0, 1) indicated in the theorem. Thus, if we take the plot of the N(0, 1) distribution, stretch it s_n times horizontally and shift it by the expectation of our random variable Σ τ_i (let us call this expectation C), then we get a distribution approximating the distribution of Σ τ_i.

So, Σ τ_i has a distribution specified by two parameters: C (expectation) and s_n (stretch). If a linear cocyclic polynomial f with t(f) = 0 and zero free coefficient indeed gives a distribution close to normal with parameters C and s_n, then, looking at the plot of the normal distribution, we understand that the extent of discrimination of the polynomial is determined by how much smaller s_n is than C in absolute value.
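A quick numerical check of this normal approximation (the coefficients, probabilities, and sample sizes below are arbitrary choices of mine):

```python
import math
import random

# Monte Carlo check: the sum of tau_i (tau_i = c_i with probability p_i, else
# 0) has mean C = sum c_i p_i and variance s_n^2 = sum c_i^2 p_i (1 - p_i);
# for many comparable coefficients its distribution is close to N(C, s_n^2).

random.seed(0)
n = 200
coeffs = [random.choice([-1.0, 1.0]) for _ in range(n)]
probs = [0.5] * n

C = sum(c * p for c, p in zip(coeffs, probs))
s_n = math.sqrt(sum(c * c * p * (1 - p) for c, p in zip(coeffs, probs)))

samples = []
for _ in range(20000):
    samples.append(sum(c for c, p in zip(coeffs, probs) if random.random() < p))

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)

print(C, s_n)            # theoretical parameters
print(mean, var ** 0.5)  # empirical parameters, should be close
```

With all p_i = 1/2 and coefficients ±1 as here, s_n = √(n/4), so discrimination (|C| large relative to s_n) would require a very unbalanced choice of coefficients.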
T(f) is determined by the ratio s_n/|C|, and T(f) is the smaller, the smaller this ratio is.

Note that s_n^2 is a positive definite quadratic form on the space of linear cocyclic polynomials (let us call this space L), with a minimum at zero (by definition, s_n^2 is equal to the sum of the variances of all the τ_i; the i-th coefficient is a linear function on L, and the variance of τ_i, in turn, is equal to the square of this coefficient multiplied by p_i(1 − p_i)^2 + (1 − p_i)p_i^2 = p_i(1 − p_i); in this context we consider the p_i as constants). In turn, C is a linear function on L (taking the value zero at zero).

The set of points of L at which C is equal to −1 forms a hyperplane L′ (unless, of course, C is identically zero).

The elements of the space L that we add to Ω are the following.
First of all, this is the point h of L′ at which the quadratic function s_n^2 reaches its minimum (on L′). This is the very point that is the estimate for the most discriminated cocyclic polynomial in L.

Intuition tells me that along with this polynomial it is worth adding to Ω a few more slightly less discriminated polynomials. I propose to add the polynomials corresponding to the following points. Let us write the quadratic function corresponding to s_n^2 as a function on L′, in some orthonormal coordinate system on L′ centered at the point h. The result is a quadratic function with a zero linear part and a nonzero constant part. Such a quadratic function can be reduced to its principal axes.

Consider the set of lines on L′ passing through h that correspond to these principal axes. The points we need are the union of the pairs of points along these lines at which our quadratic function is equal to θ, where θ > 0 is a selectable numerical parameter, small in absolute value. One should not look for internal logic in this particular choice of points other than h; I just want to add a set of points surrounding, on all sides, the point which is the estimated most discriminated cocyclic polynomial. I decided that the indicated points on the principal axes would be the most natural choice...

The search for the minimum point of a positive quadratic function and the reduction of a quadratic form to principal axes are carried out by standard algorithms in polynomial time.

What can be said about cocyclic polynomials of higher degree, with a possibly nonzero free coefficient? Let τ_i be the random variable corresponding to the i-th monomial (equal to zero if the light bulbs corresponding to the given monomial are not all lit, and equal to the coefficient of this monomial if all of them are lit).
The problem is that this time the τ_i are not independent, so the Central Limit Theorem cannot be applied.

But the Internet is full of generalizations of the central limit theorem to cases where the independence condition is somehow weakened, and our random variables {τ_i} are close to independent, since two monomials of a cocyclic polynomial of small degree taken at random with high probability do not intersect in light bulbs, which means that the random variables corresponding to these monomials are independent with high probability. Therefore, it is possible that the distribution of the random variable Σ τ_i of interest to us is still very often close to normal and we can apply the same method.

But I like the other approach more. We considered light bulbs, each of which either lights up or does not, and each monomial is calculated as the product of the zeros and ones corresponding to the respective light bulbs, multiplied by some factor. One can do things a little differently, by assigning one large light bulb to each monomial. This light bulb will no longer depend on the adjacent gates; it will depend on the union of the gates of all the bulbs corresponding to the given monomial, and if we want to understand which bulbs light up at a selected set of gate values, then this specific bulb will light up exactly when each of the bulbs corresponding to this monomial lights up.

And the concept of a situation can be given meaning as an assignment of zeros and ones to each of the bulbs of this considered set of bulbs, each of which is determined by a set of no more than d sets of adjacent gates and the values chosen for these gates. The case when the number of such sets is equal to zero corresponds to the free term of the polynomial.

That is, each lamp polynomial in the old sense corresponds to a linear lamp polynomial on the considered set of bulbs, whose value, calculated for any set of gate values of the circuit, will be the same.
Since the values for the same sets of gate values coincide, the invariance of the lamp polynomial on the considered set of bulbs can be checked by checking the cocyclicity of the corresponding lamp polynomial in the old sense.

The algorithm in the case of linear polynomials on the extended set of light bulbs looks exactly the same, but the random variables τ_i corresponding to the bulbs of the extended set will be independent, which opens the way to applying the Central Limit Theorem and searching for the expectedly most discriminated cocyclic polynomials in the way described above.

Sometimes it is very useful to add to Ω cocyclic polynomials that have few nonzero coefficients at monomials (or bulbs of the extended set of bulbs) whose probability of becoming one is separated from 0 and 1 (in particular, polynomials that have few nonzero coefficients at monomials / bulbs of the extended set with a probability of becoming one other than 0 and 1). But such polynomials are no longer covered by the Central Limit Theorem discussion and it cannot be argued that the corresponding distribution is close to normal (there are terms that make a significant contribution); therefore, we should not expect our algorithm to find such a polynomial. To partially compensate for this, one can choose a constant Q, iterate over all subsets of size at most Q of the set of monomials/bulbs of the extended set whose probability of becoming one differs from 0 and 1, and for each such subset find cocyclic polynomials (and, if the find suits us, add it to Ω) in the constant-dimensional space of cocyclic polynomials whose coefficients are zero outside this subset. One can search for them, for example, using the half-space method.

7. On the estimated complexity of the algorithm.
Since the algorithm is stochastic, one should not expect explicit upper bounds on its running time. We can only talk about estimates of the average running time, for some variation of the definition of average complexity. It is expected that the ideal algorithm I am striving for will work on average in time equal to a polynomial of the length of the input string and the running time of the Turing machine (the circuit height), multiplied by an exponent of the description length m of the Turing machine. The exponential dependence on the TM size appears at least because in each local neighborhood of a gate we use a number of bulbs exponential in m. The multiplicative constant in the exponent of m almost certainly admits a rather serious decrease.

Moreover, in cases that come from practice, the dependence can often be made polynomial in m. This happens because for TMs that come from practice, the computation on the TM can often be represented as a circuit each gate of which is calculated from no more than 8 other gates (the constant 8 was chosen very conditionally). Having presented the circuit in this form, one can afford to consider and use only light bulbs that depend, say, on no more than 12 gates of the circuit (for each light bulb, these gates can either be located next to each other or be scattered over different parts of the circuit). But we must understand that, while winning in resources by limiting the size of the bulbs used, we lose something. In particular, the strength of our proof system diminishes somewhat.

8. Big string writing problem.
Now let us talk about the big string writing problem: it is required to construct an algorithm S which, from the circuit M, constructs an algorithm L which, from a string A of fixed (depending on M) length, constructs a string B = L(A) of fixed (depending on M) length such that M(A, B) = 1.
Consider a Turing machine representation of L. We will restrict ourselves a little, deciding to look only for Turing machines L that run in time bounded from above by a polynomial p_1 in the sum of the total lengths of A and B and the size of the circuit, whose head does not move away from its initial position during operation by a distance exceeding a polynomial p_2 in the sum of the total lengths of A and B and the size of the circuit, and such that the alphabet this machine operates with and the number of its states are bounded by a constant C. We do not limit ourselves too much, since we can apply our algorithm many times for increasingly large polynomials p_1 and p_2 and constants C. This means that L can be represented as a polynomial-size circuit.

Let me remind you how we transform a Turing machine into a circuit. Let us mark each cell of a rectangle of the corresponding size with coordinates (x, y) with the state of the tape in the tape cell with number x at time y, if we run our TM on some string. The state of the tape cell consists of the symbol written in this cell at the moment, a bit that indicates the presence of the TM head in this cell at the moment, and the TM state at the given moment, encoded with zeros and ones. Note that the label of the cell with coordinates (x, y) is determined by the labels in the cells with coordinates (x − 1, y − 1), (x, y − 1), (x + 1, y − 1). Thus, we can compose a Boolean circuit each gate of which corresponds to a bit of the label of a certain rectangle cell, and the value of this gate is calculated in a natural way from all the gates of the labels of the three rectangle cells located in the row directly below: under this cell, slightly to the left, and slightly to the right. Thus, to find L it is enough to find the circuit; L will be determined unambiguously after that.
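The cell-labeling rule above can be illustrated with a small sketch (the toy machine below, which zeroes out a block of 1s, is my own example, not from the text):

```python
# Sketch of the rectangle labeling described above. Each cell (x, y) holds
# (symbol, head_here, state) and is computed locally from the three cells
# (x - 1, y - 1), (x, y - 1), (x + 1, y - 1).

# transition: (state, symbol) -> (written symbol, head move, new state)
DELTA = {
    ("scan", "1"): ("0", +1, "scan"),
    ("scan", "_"): ("_", -1, "halt"),
    ("halt", "0"): ("0", -1, "halt"),
    ("halt", "_"): ("_", +1, "halt"),
}

def step_cell(left, mid, right):
    """Compute the label of (x, y) from the labels of the three cells below."""
    sym, head, state = mid
    if head:                                   # head leaves, writing a symbol
        written, _, _ = DELTA[(state, sym)]
        sym, head, state = written, False, None
    for src, move in ((left, +1), (right, -1)):
        s_sym, s_head, s_state = src
        if s_head:
            _, s_move, new_state = DELTA[(s_state, s_sym)]
            if s_move == move:                 # head arrives at this cell
                head, state = True, new_state
    return (sym, head, state)

def run_grid(tape, steps):
    """Rows of the rectangle; tape padded with blanks, head starts at cell 1."""
    row = [(c, False, None) for c in "_" + tape + "_" * (steps + 1)]
    row[1] = (row[1][0], True, "scan")
    rows = [row]
    for _ in range(steps):
        pad = ("_", False, None)
        prev = [pad] + rows[-1] + [pad]
        rows.append([step_cell(prev[x], prev[x + 1], prev[x + 2])
                     for x in range(len(rows[-1]))])
    return rows

final = run_grid("111", 4)[-1]
print("".join(c[0] for c in final))  # the three 1s have been zeroed out
```

Note that each bit of a cell's label is indeed a fixed Boolean function of the bits of the three cells below, which is exactly what the circuit construction uses.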
Since the state of the tape at a given position and time is encoded by a constant number of bits (gates of the circuit), all the gates of the circuit are divided into a constant number of types: each type consists of a set of gates corresponding to each other, with one gate in each cell. And gates of the same type are calculated from the gates of the cells located under them in the same way. Thus, it is enough for us to construct, for each of the types, the function via which the value of each gate of this type is calculated from the values of the gates of the cells below it.

And each of these functions is naturally determined by a set of zeros and ones in a quantity equal to the number of possible values that the bits of three consecutive cells can take.

Thus, L is defined by the string N consisting of such sets of zeros and ones, for all types of gates in the circuit. If such a string does not correspond to any Turing machine, its consideration is still pertinent: the Turing machine cannot be determined from the corresponding circuit (it can, of course, but in another, slightly less trivial, way), but this circuit can just as well be used as an algorithm for finding the string B from the string A.

Moreover, one can give oneself more freedom and allow the functions corresponding to different gates of the same type to be different (and then the length of the string defining the algorithm L (allowing ourselves some freedom of speech, we will also call it N) will be multiplied by the size of the circuit).
In this case, perhaps, we will get a circuit of large complexity. One way or another, the algorithm L can be represented as a string N (this was already clear; I just wanted to emphasize that this can be done in different ways, and since these ways are quite different in nature, the efficiency of the algorithm S may depend on the chosen way), and we will look for the algorithm L in the form of the string N.

Let us denote by N(A) the string B which is built from the string A by the algorithm encoded by the string N. Let M′ be a naturally constructed Turing machine that takes two strings A and N and satisfies the condition M′(A, N) = M(A, N(A)).
Thus, we have brought the task to the following form. Given a machine M, one needs to construct a string N of fixed length such that for any string A of a fixed length, M(A, N) = 1.

My approach is the following. We will maintain two lists: the first consists of algorithms {A_i} that construct the first argument of M from the second argument, the second consists of candidates {N_i} for the second argument of M. We will find and add items to each of these lists.

We initialize the lists however we like. Each time we add an element to the second list, we select some sufficiently large, polynomial in size, subset A_{i_1}, A_{i_2}, ..., A_{i_k} of the first list and look for a string N such that M(A_{i_j}(N), N) = 1 for every j. It is easy to see that such a problem can be formulated as a small string writing problem. Let us try to solve this problem. If it turns out that such a string N does not exist, then we can conclude that we will not be able to find a solution to our problem: for any string N there is a string A for which M(A, N) = 0. If the required string N is found, then we add it to the second list. If we cannot solve the given instance of the small string writing problem, then we do nothing and move on.

Each time we add an element to the first list, we select some sufficiently large, polynomial in size, subset N_{i_1}, N_{i_2}, ..., N_{i_k} of the second list and look for an algorithm A′ such that M(A′(N_{i_j}), N_{i_j}) = 0 for every j. It is easy to see that such a problem can be formulated as a small string writing problem (if, again, we restrict ourselves and represent A′ as a string of fixed polynomial length l). Let us try to solve it. If such an algorithm A′ can be found, we add it to the first list. If not, we do nothing and move on.
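The alternating two-list procedure can be sketched as follows (the names, the uniform sampling in place of the strength-based sampling described below, and the `find_candidate` / `find_adversary` oracles standing in for the small string writing problem solver are all my assumptions):

```python
import random

# Sketch of the alternating two-list scheme above. M(a_out, n) -> 0/1 is the
# given machine. find_candidate / find_adversary stand in for the small string
# writing problem solver: each returns a satisfying element, None if provably
# none exists, or "unknown" on failure.

def grow_lists(M, algs, cands, find_candidate, find_adversary, rounds, k):
    """Alternately extend the adversary-algorithm and candidate-string lists."""
    for _ in range(rounds):
        # Look for a string N with M(A(N), N) = 1 for a sample of algorithms A.
        sample = random.sample(algs, min(k, len(algs)))
        n = find_candidate(sample)
        if n is None:
            # No such N even for this sample: for every N some A beats it.
            return ("no_solution", None)
        if n != "unknown":
            cands.append(n)
        # Look for an algorithm A' with M(A'(N), N) = 0 for a sample of N's.
        sample = random.sample(cands, min(k, len(cands)))
        a = find_adversary(sample)
        if a not in (None, "unknown"):
            algs.append(a)
        # From time to time, check whether some candidate beats all algorithms.
        for n in cands:
            if all(M(a(n), n) == 1 for a in algs):
                return ("candidate", n)
    return ("undecided", None)
```

The final check inside the loop is only against the current first list; as the text notes, whether a candidate beats all possible algorithms must still be verified via the small string writing problem.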
It should be noted that here one must not overdo it with the value of the parameter l: if we make l large compared to the sample size k, then we can get an algorithm that simply iterates over all the N_{i_j}, chooses the one that coincides with the input N, and outputs a string A stored in advance for the given instance such that M(A, N) = 0. Therefore, some kind of regularization is needed, for example, in the form of prohibiting l from being large.

We will add items in the specified way to the first and second lists in turn. Our goal is either someday to find in the second list the required string N, for which M(A, N) = 1 for any string A, or to find in the first list an algorithm A′ for which M(A′(N), N) = 0 for any string N.

We will say that the string N beats the algorithm A′ if M(A′(N), N) = 1. In the case M(A′(N), N) = 0, we say that A′ beats N.

The logic of the algorithm is to get, in the first list, algorithms that beat as many strings of the second list as possible, and, in the second list, strings that beat as many algorithms of the first list as possible. Thus we hope to find an algorithm that beats all possible strings, or a string that beats all possible algorithms (and is therefore the desired one).

Certainly, since we want to find such a string or algorithm, we should from time to time go through all the elements of both lists and check for each of them whether it is the desired one (obviously, using the algorithm for the small string writing problem).

I have omitted until now how exactly we should collect the samples of k elements of one list or the other. We will choose these k elements randomly, but different elements will have different probabilities p, and the probability of a given element will depend monotonically on the strength r of this element. (I do not specify exactly how it depends; this function is a selectable parameter.)
The strength of an element is defined as the probability that the element will beat an element of the other list, where the element of the other list is chosen with probabilities q that take its strength into account. Now I will explain how to give a logical meaning to this definition.
The set of values of the strength r and the probability q on all the elements of both lists is a system that is self-balancing in time.

In addition to the probability p and the strength r of the elements of both lists, we will maintain such parameters as the ideal of strength r′, the distribution Q, consisting of the probabilities q of choosing each element of each list in the distribution Q, and the ideal q′ of the probabilities of choosing an element in the distribution Q.

The strength r is a quantity varying in time, in parallel with the main part of the algorithm, which is the adding and removing of items from the lists. The ideal of strength r′ of an element is equal to the probability that an element of the opposite list loses to this element when that element of the opposite list is selected from the distribution Q, where the probability q of choosing each element is a time-varying quantity, like r. The ideal q′ of the probability of choosing an element in the distribution Q is a monotonic function of the strength r of this element, this function being a selectable parameter.

In one step of the algorithm that changes the values of r and q, running in parallel with the main part, we change each value q to (1 − ε)q + εq′, and each value r to (1 − ε)r + εr′, where ε is a small selectable parameter.

As for the main part of the algorithm, I have described how we add elements to the lists; it remains to add that from time to time, in order to maintain the optimal size of the lists, we throw out the elements of least strength r.

This algorithm, minor details of which have been omitted, echoes the already mentioned idea from [3].

Perhaps it would have been worthwhile to limit ourselves to always choosing, from the first list, an algorithm that always produces the same fixed string; in other words, to arrange everything so that the first list contains the strings themselves, and not the algorithms looking for these strings (after all, the second argument of M in our case always encodes some algorithm, and looking for an algorithm that builds a string from an algorithm is not very natural). But perhaps, if we leave everything as it is, the algorithm will often be more efficient (we can say with a fairly high degree of certainty that it would be more efficient if we did not have the assumption that the second argument of M, the string N, comes from some algorithm, and that the machine M is a machine of this rather special kind).

9. Further work.
In this section I will often speak informally. I will give a number of ideas that I plan to work on further, and a rather crude scheme for their implementation (namely the scheme, not the implementation itself). Perhaps this section is worth reading if the reader plans to join me, or, on the contrary, wants to know what it is about and perhaps not return to it in the future. Otherwise, it may be worth waiting for further works on this topic, in which, I hope, the topic will be covered in more detail.

9.1. Feature space.
In the previous sections we considered all possible arrangements of zeros and ones in the gates of the circuit and calculated for each such arrangement the values of the light bulbs: each such value was calculated as a function of the values of a constant number of sets of adjacent gates (we had an indicator function of a specific set of values, but one need not limit oneself in this way and can use any propositional formula). But what if we go further and start asking about the values of functions of the gate assignments that are computed non-locally?

If we are talking about circuits that came from a Turing machine, then the set of gate values of such a circuit can be viewed as a kind of two-dimensional string. And on this two-dimensional string a Turing machine can work which is specially designed to work with two-dimensional strings: each time the head of such a machine is shifted left, right,
forward or backward, its new state and displacement are determined by the old state and the characters written in the cells each coordinate of which differs from the corresponding coordinate of the head by no more than one.

Thus, each such "two-dimensional" Turing machine corresponds to a function of the assignment of values to gates: the answer given by such a machine. Moreover, we can represent such a Turing machine in the form of a three-dimensional circuit; it will have its own gates and its own light bulbs. And we can consider the cocyclic polynomials for this three-dimensional circuit in the same way, changing the probabilities associated with its light bulbs (although this time we do not have a fixed value that such a circuit should give as an answer), and this can help to restore the values of the bulbs of the original circuit (since they are also light bulbs of the new circuit, which means they can participate in the cocyclic polynomials of the new circuit). I wrote about this when I was talking about a cell with an additional construction.

On this three-dimensional circuit we can compute something with a "three-dimensional" Turing machine, and so on.

Each light bulb and gate of any circuit constructed in this way corresponds to a function of the input string: from the input string one can calculate the history of the computation of the output of the original circuit (the values of all gates); from this history one can calculate the values of all the bulbs of the original circuit. We can also calculate the history of the computation of the output of any three-dimensional circuit receiving it as an input.
This history, in turn, can be used to compute the values of all the bulbs corresponding to it, and so on.

Note that each such function of the input string can be written as some Turing machine applied to the input string and a certain number of natural numbers written in the unary number system.

For example, the light bulb of the original circuit that depends on the neighborhoods of two cells of the original circuit is encoded by four numbers (the coordinates of these two cells in the circuit rectangle) and by a Turing machine which first computes, from the input string, the entire history of the work of the original Turing machine (implemented by the original circuit), then, from the given four numbers, finds all the values of the gates in the neighborhoods of the two required cells, and from them finds the required value by applying the required propositional formula.

Let us call a particular feature a pair consisting of a Turing machine and a set of natural numbers. We have just seen that any function of interest to us is given by a particular feature.

Let us call a general feature a pair consisting of a Turing machine and one (usually small) natural number, the feature's arity. A general feature specifies not one function but a whole class of functions of the original string, parameterized by sets of natural numbers of size equal to the arity of the general feature: indeed, to each such set of natural numbers there corresponds a particular feature, consisting of the same TM and exactly this set of natural numbers. We will call such a particular feature a specification of this general feature.

Thus, the small string writing problem can be formulated as follows: find a string for which a given feature is equal to one. (This feature can be considered either particular or general of arity zero.)

9.2.
Navigation in the feature space.
At any moment in time, the algorithm will maintain a set of particular features, which we will call consciousness. (I call it consciousness because there is an analogy with human consciousness: these are the particular features "of which the algorithm is thinking at the moment.")
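The bookkeeping of particular and general features introduced above can be sketched as simple data structures. This is a hypothetical sketch in which a Turing machine is stood in for by a Python callable taking the input string and the numeric parameters:

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class ParticularFeature:
    machine: Callable[..., int]   # stands in for a Turing machine
    params: Tuple[int, ...]       # natural-number parameters (unary in the paper)

    def value(self, x: str) -> int:
        return self.machine(x, *self.params)

@dataclass(frozen=True)
class GeneralFeature:
    machine: Callable[..., int]
    arity: int                    # number of natural-number parameters

    def specify(self, *params: int) -> ParticularFeature:
        # a specification fixes the parameters, yielding a particular feature
        assert len(params) == self.arity
        return ParticularFeature(self.machine, params)

# Example: the general feature "the i-th bit of the input string", of arity 1.
bit_at = GeneralFeature(lambda x, i: int(x[i]), 1)
f = bit_at.specify(2)        # a particular feature: the bit at index 2
print(f.value("0110"))       # -> 1
```

Consciousness is then simply a finite set of `ParticularFeature` objects with an attached probability each.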
When our algorithm works, it will add new particular features to consciousness and throw some particular features out of it. At the same time, for each of the particular features of consciousness (and for some other particular features), a probability (the probability that the feature holds) will be maintained, which, as the reader has probably already guessed, we will change over time.

At any moment, the attention of the algorithm will be focused on a set of particular features, which we will call the locus of attention.

Each time, we select the set of particular features that we want to make the locus of attention (on which we want to focus attention) to a large extent randomly, but at the same time we are interested in there being quite a few particular features from consciousness in the locus of attention, and in there being many connections between the particular features of the locus of attention. By connections we simply mean clauses: propositional constraints on sets of particular features of constant size. We want all this in order to be able to determine some values of particular features of the locus of attention from others. We will determine them in a probabilistic sense: we will bring the probabilities of some features closer to zero or one.

Now let us talk about how exactly we will change the probabilities. We will change the probabilities of those features that are in the locus of attention, and we will change them in the same way as before: with the help of cocyclic polynomials. It is only necessary to generalize the concept of a cocyclic polynomial to the case of a feature space. For each such polynomial there is a set of so-called basic features, as well as a set of so-called lamp features. Each lamp feature is calculated from no more than d groups of adjacent basic features (d is a constant) using a propositional formula.
A protected lamp feature is one that cannot light up (to light up means to take the value 1) if the basic features take the values that they must take on some input string. We consider all possible arrangements of zeros and ones on the basic features, just as we considered all possible arrangements of zeros and ones on the gates in the previous sections. A cocyclic polynomial is an arrangement of rational numbers on the lamp features such that the sum of the numbers corresponding to the lamp features that are lit does not depend on what values the basic features take. We would also like to check the cocyclicity of a polynomial locally. Of course, we still need to define what "adjacent features" means and how exactly to check cocyclicity locally (most likely in the same way as before, but questions may arise if a small neighborhood of a feature contains more other features than we would like); this remains to be done.

We will change the probabilities of unprotected lamp features in the same way as in the previous sections. The probabilities of protected lamp features are always zero. For the current locus of attention, we will try to identify as many sets of basic features as possible whose corresponding sets of lamp features intersect the locus of attention as strongly as possible. One option is to declare a subset of the locus of attention to be basic features and to include these basic features in the set of lamp features. But perhaps, in some cases, one can try to select the necessary basic features outside the locus of attention. In general, it is probably right to choose the locus of attention each time so that it includes many constant-size sets of features such that many of the propositional formulas computed from them also belong to the locus of attention.
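In a finite toy setting, the cocyclicity condition just stated can be checked by brute force over all assignments of the basic features. This is a hypothetical, small-scale sketch (the paper's local check is not implemented here; `is_cocyclic` and the toy lamps are invented for illustration):

```python
from itertools import product

def is_cocyclic(lamps, coeffs, n):
    """lamps: Boolean functions of an assignment of n basic bits;
    coeffs: matching rational coefficients.  The polynomial is cocyclic
    iff the weighted sum over lit lamps is one and the same constant
    for every assignment of the basic features."""
    sums = set()
    for bits in product((0, 1), repeat=n):
        sums.add(sum(c for lamp, c in zip(lamps, coeffs) if lamp(bits)))
    return len(sums) == 1

# Two basics x0, x1; four lamps: x0, NOT x0, x1, NOT x1.
lamps = [lambda b: b[0] == 1, lambda b: b[0] == 0,
         lambda b: b[1] == 1, lambda b: b[1] == 0]
print(is_cocyclic(lamps, [1, 1, 1, 1], 2))   # sum is always 2 -> True
print(is_cocyclic(lamps, [1, 0, 0, 0], 2))   # sum equals x0 -> False
```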
Then we combine the sets of unprotected lamp features over all the found sets of basic features, add them to the locus of attention if they are not there yet, and change their probabilities in the same way as we did in the previous sections: looking for the most discriminated polynomials whose set of basic features is one of the found sets, adding them to the set of processed polynomials Ω, counting the total arguments for each of the lamp features of the set under consideration, changing the probabilities according to these total arguments, and fixing and unfixing values when needed.
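The accumulation of total arguments into a probability can be sketched, for instance, as additive log-odds increments. The log-odds form is my assumption; the paper does not fix a concrete update rule at this point:

```python
import math

def update(p, argument):
    """Shift probability p by a signed log-odds increment `argument`."""
    logit = math.log(p / (1 - p)) + argument
    return 1 / (1 + math.exp(-logit))

p = 0.5
for _ in range(10):        # ten weak but consistent arguments toward one
    p = update(p, 0.4)
print(round(p, 3))         # -> 0.982
```

With this form, individually weak arguments that all point the same way drive the probability toward a definite value, while conflicting arguments cancel.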
I wrote that we strive to ensure that there are many connections between the particular features of the locus of attention. Why do we need many connections? If there are few connections (few clauses) on the set of particular features of the locus of attention, then the cocyclic polynomials are not informative: many lamp features will not be protected (their holding, that is, taking the value one, will not imply a contradiction), which means that if, say, we want to get a cocyclic polynomial that always gives one, it will be very difficult for us to "hide" most of its positive coefficients in the set of protected monomials. And in general, it is clear from general considerations that if there are few connections, it will be difficult to recover some values of particular features from others.

Thus, at any moment of time we keep in memory the probabilities of all the particular features of consciousness, as well as of all the features that have visited the locus of attention at least once and have not been thrown out of consciousness.

What particular features are we going to add to consciousness? We will be interested in features that possess two properties to a large extent. Adding a feature that possesses only one of these properties to a large extent (and the second to a mediocre degree) is also possible, but possession of the second property is encouraged; we are interested in some summary indicator.

We add to consciousness, first, the most definite features. A feature is considered definite if its probability is close to zero or one.
A feature can become definite if, for example, we focused on a certain set of particular features containing this feature, and its value was determined unambiguously.

Another case, just as interesting for us, is when a feature has visited the locus of attention several times and each time we found weak arguments to make it, say, a one. But the total argument over all these loci of attention to make it a one turned out to be quite large, so we brought the probability of its holding closer to one.

When the arguments to make a feature zero or one are consistent, this is a reason to pay attention to the feature. In general, it seems to me that the meaning carried by the word "consistency" (consensus, conformity, alignment, concurrence; I do not know which of these words best fits the Russian word "soglasovannost'") is very important for artificial intelligence. We can also concentrate attention on a set of particular features that lies entirely in consciousness. And if, while focusing on such a set, the arguments of the corresponding cocyclic polynomials say that it is necessary to increase the already large probabilities and decrease the already small ones, then this is always nice (and if not, then something needs to be changed). We are pleased when different loci of attention "speak about the same thing," that is, are consistent with each other.

The second property that is welcome when adding a particular feature to consciousness is the influence of the feature. So far I understand less about the nature of this property than about definiteness, and so far I am not able to define it strictly. A feature is considered influential if it has many connections with features of consciousness; in other words, if from its value it is possible to determine the values of many features of consciousness. Features that can be calculated from the features of consciousness in many natural ways often become influential.
Features that participate in many clauses (propositional restrictions) together with features of consciousness also often become influential.

There is a symmetry between definiteness and influence: definite features are those that can be reconstructed from the features of consciousness, and influential features are those from which features of consciousness can be reconstructed.
Particular features in whose values we lose interest are thrown out of consciousness. (Again, it remains to clarify what these features are.) In general, I believe that it is necessary to learn to select, from a set of particular features, the smallest possible subsets of the most influential features from which the rest of the particular features of the set can be reconstructed. If we select such a subset, it will be possible to remove the rest of the features of the set from consciousness as unnecessary, freeing up memory for new features.

I would like to point out the not yet fully thought-out possibility of adding features to consciousness for which we do not indicate any way of calculating them from the input string and the parameters, as though leaving the finding of this way for later. Initially, we possibly only indicate the connections that such a feature has with other features of consciousness.

It would be nice to somehow delineate the set of particular features from which we collect features into the locus of attention. If we want to describe briefly and roughly the set of particular features that we consider candidates for inclusion in consciousness, these are the particular features that are shortly computed from the particular features of consciousness.

Let us try to say this less roughly. We will be interested in "refined general features". A refined general feature (hereinafter RGF) is a general feature in which the set of numbers supplied to the input of its Turing machine is restricted by another, subordinate Turing machine. In other words, we are interested in the value of the first TM not on all possible sets of numbers, but only on those that satisfy a certain property given by the subordinate TM.

In addition to consciousness itself, we will maintain a "set of central RGF" and a "set of peripheral RGF". These are sets of RGF that will change over time.
The particular features covered by the union of the central RGF and the peripheral RGF are precisely the set of particular features from which we collect the features of the locus of attention.

The set of central RGF is such that the set of particular features covered by the central RGF intersects consciousness substantially. In some sense, it can be considered a prototype of consciousness. When we see that one of the peripheral RGF has begun to overlap strongly with consciousness, we add it to the set of central RGF.

The set of peripheral RGF can be characterized as a set of features that are shortly computed from the central RGF. The property of being shortly computable from the central RGF can be defined formally as follows. Since a peripheral RGF is a feature, it calculates something from the input string and a set of numbers in the unary number system. A peripheral feature operates as follows. It takes several central RGFs U_1, U_2, ..., U_k and calculates all values of each of them on the input string and on all sets of numbers satisfying the property specified by the corresponding subordinate TM. I should make the reservation that it would be problematic to calculate something on absolutely all sets of numbers satisfying this property, since there can be infinitely many of them, so we restrict each of these numbers from above by a polynomial in the numbers fed to the input of this peripheral RGF.
From all the computed values of the central RGFs, the TM corresponding to our peripheral RGF makes a string in a natural way: the set of considered inputs of each of the considered central RGFs forms a rectangular parallelepiped of dimension equal to the arity of the corresponding central RGF; we number its nodes in a natural way and write into a string, in turn, the values of the corresponding central RGF found at all these nodes, or special symbols if the corresponding cells of the parallelepiped do not satisfy the property specified by the corresponding subordinate TM. There will be several strings, one for each central RGF taken, and we concatenate them, inserting separators between them. Then, from the string obtained in this way and from natural numbers (the inputs of the peripheral feature), the output of the peripheral feature is computed by a small Turing machine R (that is, one written using a small number of bits). The feature obtained in this way will be denoted by (U_1, U_2, ..., U_k) · R.

Thus, to specify a peripheral RGF, one needs to specify a set of central RGFs and a TM of small size. It is likely that among small TMs some can be preferred over others, and among the peripheral features only the most preferable, most meaningful, and most relevant ones can be kept, with the rest simply not considered. If not, all small TMs will have to be iterated through.

We see that the set of peripheral RGF is determined by the set of central RGF. Therefore, when we change the set of central RGF, the set of peripheral ones also changes. As I said, an RGF passes from the set of peripheral ones to the set of central ones if many of the particular features it covers pass into consciousness.
If there are few particular features of consciousness among the particular features covered by some central RGF, we throw this RGF out of the set of central ones.

Another thing that should be stipulated is that all particular features corresponding to the gates of the original circuit always remain in consciousness, and we never throw them out. I hope I have succeeded in conveying the intuition behind all this. Consciousness is a cloud that moves through the space of features, changes their probabilities, and always contains the gates of the original circuit (the gate considered the output of the circuit also always remains in the cloud, but its probability is always equal to 1; it is prohibited to touch it). When the probabilities at the gates of the original circuit are all equal to zero or one, and no clause of the original circuit is violated, the work of the algorithm ends.

9.3. Gluing.
Here I will talk about an idea related to the fact that there are often very many ways to calculate the same particular feature.

I will reason in the context of the big string writing problem; more precisely, of the problem to which we reduced the big string writing problem in the corresponding section: given a TM M that takes two strings A and B of fixed length as input, it is required to find a string B such that M(A, B) is equal to one for every A. (In the reduction, the string B encoded some algorithm, and the machine M was of a rather special kind, but the described problem is of independent interest in the general case.)

In the context of this problem, it is necessary to slightly modify the concept of a feature. This time, in addition to the input string B and several natural-number parameters, the Turing machine calculating a particular feature takes the string A (the first argument of the machine M) as a parameter. A general feature is just a Turing machine that calculates the value of a feature, without specifying the numeric parameters and the string A. Accordingly, a refined general feature is a general feature for which the set of parameters on which it is defined is restricted by some subordinate TM.

The algorithm works like this. We collect a large enough polynomial sample of strings {A_i} that M takes as its first input. For each of these strings we solve, in the way described above, the problem of finding a string B (a small string writing problem), but with one nuance: for different A_i we identify the features corresponding to the bits of the input string B. That is, we obtain, so to speak, many floors; each A_i corresponds to its own floor, and on each floor its own consciousness travels; the features corresponding to the bits of the input string B play the role of heating pipes, each of which runs through every floor.
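The floors-and-pipes bookkeeping can be sketched like this (a hypothetical sketch; the class and field names are invented): each sampled string A_i gets its own floor of per-feature probabilities, while the features for the bits of B are shared objects visible from every floor.

```python
class SharedProb:
    def __init__(self, p=0.5):
        self.p = p                  # one probability, seen by every floor

def make_floors(samples, n_bits):
    """samples: the strings A_i; n_bits: length of the sought string B."""
    pipes = [SharedProb() for _ in range(n_bits)]    # bits of B, shared
    floors = []
    for A in samples:
        # each floor has its own local features but the same pipe objects
        floors.append({'A': A, 'local': {}, 'B_bits': pipes})
    return floors, pipes

floors, pipes = make_floors(["00", "01", "10"], 4)
floors[0]['B_bits'][2].p = 0.9      # an argument arriving on floor 0...
print(floors[2]['B_bits'][2].p)     # ...is visible on floor 2: 0.9
```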
When the probabilities change, everything happens in exactly the same way, except that the probabilities at the features corresponding to the bits of the input string B take arguments from all floors at once, and the probability at each such feature is always the same on all floors. Otherwise, the change of probabilities at particular features on each floor happens autonomously.

The idea is that if at some point we find that the probabilities of the corresponding particular features covered by two refined general features U and V are correlated, this means that pairs of the corresponding particular features covered by these RGF can possibly be identified. By corresponding we mean particular features representing the application of the TMs of U and V, respectively, to the same set of parameters (that is, to the same string A and the same numeric parameters).

In more detail, instead of storing the union of consciousnesses across all floors (hereinafter I will call this union simply consciousness) as a set of triples of the form (TM, string that is the first input of M, set of numeric parameters), we will store an abstract set N, for each element of which one or more particular features are specified, whose values coincide. To each element of N a probability is assigned (the same one) that the corresponding features turn to one. In turn, we will not store the sets of central and peripheral RGFs either; instead we will store an abstract set N′, for each element of which one or more refined general features are indicated, whose values coincide on all possible values of the parameters.

There is an expanded version of consciousness (which we do not store) and there is the set N.
Each particular feature of consciousness is projected onto one of the elements of N; however, for each element of N, in the attached list of elements projected onto it we store not all such elements but, perhaps, only a small part of them.

Thus, on the glued version of consciousness, the set N, we have new cocyclic polynomials that did not exist before and from which a lot of useful information can be extracted; we now consider not basic and lamp features but basic and lamp elements of N.

Similarly, there is an expanded version of the union of the central and peripheral RGF (which we do not store) and there is the set N′. Each RGF of the expanded version of the union of the central and peripheral RGF is projected onto one of the elements of N′; however, for each element of N′, in the attached list we store not all elements projected onto it but, possibly, only a small part of them.

If a particular feature is a specification of an RGF, then we say that the element of N onto which the particular feature is projected is a specification of the element of N′ onto which this RGF is projected.

For each pair (or for some set of pairs) of elements of N′, we keep track of a certain number of corresponding pairs of elements of N that are specifications of these N′ elements. And if the probabilities of the corresponding specifications are sufficiently strongly correlated (in the sense that, roughly speaking, most often either both are close to one or both are close to zero), we decide to perform a large-scale gluing operation.

Let us discuss this operation in more detail. It consists, firstly, in gluing these N′ elements and gluing the corresponding pairs of their specifications in N. But one could stop there, so why do I speak of this operation as something larger-scale?
Because from the equality of two features, the equality of very many other pairs of features usually follows semantically. For example, suppose we decided to glue the features U and V together with all their corresponding specifications, and there are two other features T_1 and T_2. Then it would be correct to glue, in particular, in the same way, all pairs of features (and, of course, pairs of corresponding specifications) that are, in one case, a propositional formula in terms of U, T_1 and T_2, and, in the other, the same propositional formula in terms of V, T_1 and T_2. Next, one can glue pairs of features whose equality follows from these identifications, and so on. One gluing can be followed, like an avalanche, by many other gluings, and a kind of collapse of the space N (as well as the space N′) into a smaller space will take place.

We could make the identifications (gluings) that follow, like an avalanche, after one "trigger" gluing in the same general way: after the trigger gluing, the considered pairs of features that are destined to be glued are expected to correlate, which means they can be glued in the general order. But it seems to me that these subsequent gluings can be done more efficiently. For example, a propositional formula in terms of U, T_1 and T_2 and the same propositional formula in terms of V, T_1 and T_2, which was discussed above, can be glued together immediately, without waiting for the corresponding probabilities to start correlating.

This is the direction of further work that I currently consider the main one: how to correctly make large-scale gluings of elements of N (and elements of N′).
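The avalanche of forced gluings can be sketched with a union-find structure. This is a hypothetical sketch: the encoding of a composite feature as a (formula name, argument tuple) pair, and the flat `composites` list of candidate pairs, are invented here:

```python
class Gluing:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Union-find root lookup with path halving."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def glue(self, a, b, composites):
        """Glue a and b, then glue every pair of composite features that
        become equal once a and b are identified (the avalanche)."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        self.parent[ra] = rb
        for (f, args1), (g, args2) in composites:
            # same formula applied to argument tuples now equal under gluing
            if f == g and all(self.find(u) == self.find(v)
                              for u, v in zip(args1, args2)):
                self.glue((f, args1), (g, args2), composites)

g = Gluing()
# the same formula 'AND' applied once to (U, T) and once to (V, T)
comps = [(('AND', ('U', 'T')), ('AND', ('V', 'T')))]
g.glue('U', 'V', comps)   # trigger gluing; the composite pair follows
print(g.find(('AND', ('U', 'T'))) == g.find(('AND', ('V', 'T'))))  # -> True
```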
It is unlikely, of course, that the gluings will be grouped into clearly distinguished groups, each consisting of a trigger gluing and many other gluings arising from it; most likely, such groups will mix in a single stochastic process.

It is worth clarifying that there are three different situations in which we should glue features (between which, in our context, it might be pointless to draw a border). First, two particular features can simply always coincide, regardless of the state of the probabilities that we maintain. Second, the coincidence of particular features may follow from the fact that the probabilities of some features have turned to one or zero, or from the fact that some features have already been glued together. And third, the coincidence of features may follow neither from the structure of mathematics, nor from the properties of the system of probabilities, nor from anything else, but we still decide to identify these features, since, for the given probability distribution on features, they coincide with large probability or are even only rather weakly correlated; the reasons to force the equality of such features by gluing them together are as follows.

We glue features together to reduce the number of elements in the set N, so that many other features can be added to N: our memory is limited.

When we glue together features that are only fairly weakly correlated, we certainly lose some freedom, but this is paid for by the opportunity to add many new features to N and start processing them.

Perhaps the gluing needs to be organized more gently. Let me explain. In the described approach, we glue features together abruptly as soon as their correlation exceeds some lower bound.
But it is possible that this lower bound can be raised and "gluing pressure" can be used: when the probabilities change, the arguments described in the previous sections for changing the probabilities in one direction or another will be proposed not only by cocyclic polynomials but also by pairs of refined general features. Each pair of refined general features will impose its arguments on each of the specifications of these two features, aimed at ensuring that the specifications of these two features with the same parameters take the same value as often as possible, that is, that the probabilities of the specifications of these two features correlate as strongly as possible (each argument moves the probability of a feature toward the probability of the corresponding feature). The peculiarity is that when features are weakly correlated, the force making them correlate as strongly as possible is weak (that is, the corresponding arguments are weak); when features are strongly correlated, this force is strong. The force driving features to correlate is the stronger, the more the features are already correlated. When features begin to correlate strongly enough, we glue them together (the lower correlation bound at which we glue features should be raised compared to the bound used in the approach without gluing pressure). With gluing pressure, the gluing of features is smoother: the jumps generated by a gluing change the situation less strongly.
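The gluing pressure can be sketched as a correlation-scaled attraction between paired probabilities. The linear form of the force below is my assumption; the paper only requires that the force grow with the current correlation:

```python
def pressure_step(ps, qs, corr, rate=0.5):
    """Move paired specification probabilities toward each other; the step
    is scaled by the current correlation estimate `corr` in [0, 1], so
    weakly correlated pairs feel a weak force and strongly correlated
    pairs a strong one."""
    force = rate * corr
    new_ps = [p + force * (q - p) for p, q in zip(ps, qs)]
    new_qs = [q + force * (p - q) for p, q in zip(ps, qs)]
    return new_ps, new_qs

ps, qs = [0.9, 0.2], [0.7, 0.4]
ps, qs = pressure_step(ps, qs, corr=0.8)    # a strongly correlated pair
print([round(v, 2) for v in ps])            # -> [0.82, 0.28]
```

Once the correlation exceeds the (raised) threshold, the pair is glued outright.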
There is a reason why, perhaps, we should not introduce the abstract space N′ but should work with refined general features as they are. The reason is that two RGF often correlate not on their entire domain (determined by the subordinate TM) but on a certain subset of their domains, given by some Turing machine; moreover, the domains themselves may be different. This means that we cannot simply glue these two RGF; we would like to glue their specifications on the indicated subset of their domains (if this subset is simple enough). So it may be worth keeping the RGF in their original form, without introducing the set N′. But here a problem arises: with a fixed size of N, the set of involved RGF can grow a lot. For example, suppose we glued something in N and the size of N decreased tenfold; this allowed us to add many new features to N, which means that the corresponding RGF must be stored for them, so the set of stored RGF has grown. In the model with the set N′ (which, as we have just seen, fails here), the problem was not so pronounced, because at the moment when N shrinks, we expected N′ to shrink in the same way due to gluings in N′, freeing some memory for new RGF.

In my opinion, the way out of this situation is to intentionally throw the least relevant RGF out of the set of considered (central and peripheral) RGF. However, it is quite possible that one can still create a structure, similar to N′, that stores the involved RGFs and within which it is appropriate to glue something so that the amount of memory occupied by the structure decreases; it is also possible that such a structure will allow combining this analogue of gluing with discarding the least relevant RGF from consideration.

An example can be given in which we have a class of features, each of which calculates some function of an element of a finite group obtained from the group identity by
sequentially adding the generators of the group and their inverses, in some order (this order is hard-wired into the TM corresponding to the feature), and possibly of some other parameter. If we have added to N many more such features than the group contains elements, then maybe it is time to perform a large-scale gluing: divide such features into classes according to which element of our finite group is the sum of the generators and inverses corresponding to the feature, and glue all the features within each class. And surely it will not be necessary to store, for each of these elements, information about all the features glued into it; it is already clear which element it is (most likely it will be possible to keep information about only a few), so the amount of memory used is successfully reduced.

Most likely, it is worth maintaining some kind of structure on the set N. For example, it is worth storing information about some propositional constraints on elements of N that must be satisfied (for example, propositional constraints on the sets of features corresponding to neighboring gates of the original circuit, which say how a gate should be computed from the gates just below it). It may also be worth storing, for some constant-size sets of elements of N, the result of applying some propositional formulas to the set. The result of applying a propositional formula to a constant-size set of features is obviously also a feature (ideally, such an operation should commute with the projection from the version of the set N before gluing to the version after gluing; if it does not commute, this means that something else can be glued).

Gluing evokes different associations for me. In the beginning, our consciousness is in a not yet glued form (the set N coincides with an area of the feature space).
Walking through the space of features (I am not now describing the work of the algorithm), passing through an area of consciousness B, we may notice that we have already seen something like this somewhere: when we walked through an area of consciousness A such that between its features and the features of the area B there exists a natural correspondence, the probability distribution on the features of that area and the set of propositional relations between them were approximately the same as for the area B. Then we notice the same distribution (and the same set of propositional relations) on the features of an area C. And we start to wonder: what if A, B and C are actually the same area? It is quite probable that in such a situation a gluing can be performed so that the areas A, B and C are projected onto the same area of the resulting set N.

Moreover, if the sets of propositional connections on sets A, B, C, etc. of equal size (possibly from different floors) are the same, then one can put the same "label" on all the corresponding elements of the sets A, B, C, etc., doing this for each type of element (we obtain a set of labels of the same size as the set A). The presence of the same label on two particular features signals to us that these particular features may be specifications of the same general feature, perhaps not considered until now, though possibly with different parameters.
The information that certain labels hang on certain features can be used to force the emerging propositional connections (we can force not only equality, but also any other propositional connection - and, by the way, not only propositional): firstly, between the corresponding features on which these labels hang; secondly, between some of the corresponding features on which these labels hang and some other features that are in the same connections with those features - for example, obtained from the former in the same way; and thirdly, between the features that are in the same connections with some of the corresponding labeled features (in the third case we do not force the connections between the labeled features themselves, or between them and other features).

I also see an association between gluing and topological covering: if the set N is glued into the set N′, then N′ plays the role of the base of the covering, and N plays the role of the covering space: several regions from N can be projected into the same region of N′ at once. In the case of a (glued) feature space, as in the case of a covering, we can walk in different places and not notice that, in some sense, we are walking in the same place.

Surely, during the work of the algorithm, there will be situations when, due to some inconsistencies, we understand that we have glued something in vain. One needs to learn how to unglue the glued elements in such situations. This is also on the list of things worth working on.

Important examples of gluing are gluing with zero and gluing with one.
When the probabilities at the observed specifications of one general feature correlate with each other, in the sense that a significant majority of them are close to zero, or a significant majority of them are close to one (not to something in between), we glue this feature with the identical zero or the identical one, respectively, so that afterwards this feature is a constant. (If we use the gluing pressure, this means that if all the specifications of one RGF are correlated, we start to force their coincidence by imposing the gluing pressure arguments and then gluing the feature with zero or one; and we force the more, the stronger these specifications are correlated - if they hardly correlate, these arguments should be made very weak.)

It is also possible (and quite appropriate) to upgrade our system so that particular features are divided into two types: those that output binary values - binary features - and those that output numerical values (a natural number in the unary number system) - numerical features. Perhaps it would be more convenient to make features that yield several values at once - several binary and several numeric. Numerical values are needed, for example, in order to feed them to the input of another feature that computes something based on the results given by this feature; for example, a pair of numbers can indicate to another feature a specific place in the original circuit.
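The decision rule for gluing with zero or one can be sketched as follows. The threshold values `eps` and `majority` are illustrative assumptions of mine; the paper does not fix concrete thresholds.

```python
# A hedged sketch of "gluing with zero or one": if the observed
# specifications of one general feature almost all carry probabilities
# near 0 (or near 1), the feature is replaced by that constant.
# The thresholds eps and majority are illustrative assumptions.

def constant_gluing(spec_probs, eps=0.05, majority=0.9):
    """Return 0 or 1 if the feature should be glued with that constant, else None."""
    n = len(spec_probs)
    near_zero = sum(p < eps for p in spec_probs)
    near_one = sum(p > 1 - eps for p in spec_probs)
    if near_zero >= majority * n:
        return 0
    if near_one >= majority * n:
        return 1
    return None  # specifications do not correlate strongly enough

print(constant_gluing([0.01, 0.02, 0.0, 0.03]))  # -> 0
print(constant_gluing([0.4, 0.6, 0.5]))          # -> None
```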
It is convenient to regard the position of the tape cell in which the TM head stopped as a natural number produced by the Turing machine.

The coincidence of two numbers produced by different numerical features gives serious reason to think about gluing these features in some cases (there is more information in the coincidence of two natural numbers than in the coincidence of two bits).

Further work, it seems to me, lies first of all in the work with gluing: how to perform gluing efficiently, and whether it is necessary to maintain some structure on the set N beyond the sets of one or more elements of the feature space projected into each element and a set of propositional clauses - connections between elements.

Another important direction is how to build a set of peripheral RGF based on the set of central RGF. I wrote that it might be worth doing this simply by going through all the small Turing machines and "applying them to the central feature sets" as described above. But it may be worth doing it differently. If a set of central RGF (or a set N′) is given, then what other features would be RELEVANT to consider for possible further addition of their specifications to the set N? This is the question. There are features that are, so to speak, more or less always relevant. For a string of zeros and ones, it is often relevant to consider a feature that computes the number of ones in the string. If we consider the entire original circuit - a rectangle, in each cell of which there is a set of zeros and ones, one bit of each type - then it would be relevant to consider a feature that takes two natural numbers, the coordinates of a cell of this circuit, and returns the coordinates of the first cell met when moving straight upward from the given cell in which the bit of the first type equals one. The question is which features will be relevant in each specific situation - when we consider a specific set of features, with specific gluings performed.
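The "always relevant" feature just described can be sketched directly. The grid encoding (a list of rows, each cell a pair of bits, one of each type) is an illustrative assumption; the paper only specifies the feature's input/output behavior.

```python
# Sketch of the "always relevant" feature above: given cell coordinates
# (row, col) in the circuit rectangle, return the coordinates of the first
# cell straight upward whose first-type bit equals one. The grid encoding
# is an illustrative assumption.

def first_one_upward(grid, row, col):
    """Scan upward from (row, col); return the first cell with first bit 1, or None."""
    for r in range(row - 1, -1, -1):
        first_bit, _second_bit = grid[r][col]
        if first_bit == 1:
            return (r, col)
    return None

grid = [
    [(0, 0), (1, 0)],
    [(1, 1), (0, 0)],
    [(0, 0), (0, 1)],
]
print(first_one_upward(grid, 2, 0))  # -> (1, 0)
```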
The first thought, as I said, is that features are relevant if they are easily (briefly) computable from the set of features already under consideration.

In addition to small complexity (short computability), there are two further points worth paying attention to simultaneously when deciding whether to consider some feature relevant. First, it is worth paying attention to the objects that I call "areas of consistency" (perhaps it would be more correct to call such an object a "structure"). An area of consistency is a set of features for which there are many sufficiently determined propositional relations connecting them with each other, as well as with the features already considered (and added to N). By "sufficiently determined" is meant that the probability of fulfilling a given propositional constraint is close to one. I call them areas of consistency because they are characterized by the fact that in them it is often possible to predict a specific feature from different groups of other features, and these predictions are consistent with each other. A feature entering an area of consistency is often worth considering and adding to consciousness. When we look at a feature and decide whether to add it or not, it is important for us that something else can be said about it, besides the fact that it is computed from the features already considered as prescribed by its definition. It is important for us that there is some structure at all.

Secondly, it is worth paying attention to those features for which, when combined with the features already considered, at the current moment of time there is a most discriminated cocyclic polynomial (the more discriminated, the better). This often happens, in particular, when for the same feature there are two strongly contradicting groups of arguments: one prescribes this feature to be one, the other - to be zero.
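The "area of consistency" criterion for a candidate feature can be turned into a crude score: count the near-determined propositional relations linking the candidate to the features already considered. This is a speculative sketch of mine, not the paper's method; the data layout and the threshold are assumptions.

```python
# A hedged sketch of scoring a candidate feature by the "area of
# consistency" criterion: count the propositional constraints involving
# the candidate whose empirical probability of being satisfied is close
# to one. Data layout and threshold are illustrative assumptions.

def consistency_score(candidate, constraints, threshold=0.95):
    """constraints: list of (feature_set, satisfaction_probability) pairs."""
    return sum(
        1
        for feature_set, prob in constraints
        if candidate in feature_set and prob >= threshold
    )

constraints = [
    ({"f_new", "g1", "g2"}, 0.99),  # near-determined relation involving f_new
    ({"f_new", "g3"}, 0.97),
    ({"f_new", "g4"}, 0.50),        # too uncertain to count
    ({"g1", "g2"}, 0.99),           # does not involve the candidate
]
print(consistency_score("f_new", constraints))  # -> 2
```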
We consider such features/groups of features and "eliminate this tension" - we change the probabilities according to the arguments so as to reduce the discrimination of the most discriminated polynomials (that is, we add these features to consciousness and work with them in the general manner). We are interested in the groups of features with the most discriminated cocyclic polynomials, since, again, working with them will change the other probabilities for the features under consideration the most. In this respect, one can recall the work of an investigator who walks around the neighbors and asks if they have seen anything strange. We are curious about strangenesses: when some predictions about an object contradict predictions about the same object made on another basis. The only difference is that the investigator is looking for the cause of the strangeness, while we are trying to eliminate it.

Perhaps new features can be "shaped". That is, one can represent the scheme that implements the Turing machine in the form of zeros and ones, as described in one of the previous sections, associate each such bit with a probability of being one, and modify these probabilities together with the rest, among other things, with the help of cocyclic polynomials. Thus, when shaping/searching for features, we have two goals: firstly, we pursue structure (areas of consistency), and secondly, we look for the greatest contradiction (tension), in order to eventually eliminate it.

I suppose that the rich possibility for gluing in the feature space is precisely what distinguishes circuits of small Kolmogorov complexity within the set of general circuits, and perhaps this is something that can help to solve the problem for circuits of small Kolmogorov complexity efficiently in a wide variety of cases.

9.4.
One more idea.
I will tell you about one more possible upgrade of the algorithm, about which, however, I am not sure that it will help. First, perhaps, we want to give ourselves more freedom and allow the probability distribution on the lamp features to be dependent. For this I propose to consider many - a list of - equally weighted versions of the independent probability distributions on the light bulbs, each of which can be changed autonomously. (If we speak of one single distribution, it can be defined as follows: we choose one of the given equally weighted distributions at random, equiprobably, and then choose the arrangement of zeros and ones into features (the situation) at random from this distribution; we will call this distribution joint.) Moreover, in addition to the cocyclic polynomials discriminated by each specific distribution in the list (and intended to modify that distribution), we will look for cocyclic polynomials discriminated by as many of our distributions as possible, and use such polynomials to change all distributions of the list at once. (But it may no longer work, if we want to find the optimum, as in the case of one independent distribution, simply by finding the minimum of some quadratic function; we need to carry out some kind of joint optimization.)

The logic of such a construction is that, for example, there may be two lamp features that cannot both be one or both be zero in a correct, consistent situation (one coming from some arrangement of zeros and ones in the basic gates). If we consider one distribution, then at a randomly selected moment in time it is more likely that either the probabilities for both features will be far from zero and one, or one of the probabilities will be close to one, and the other - to zero.
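The "joint" distribution defined parenthetically above is a uniform mixture of independent product distributions, and sampling from it can be sketched in a few lines. The concrete list of member distributions is an illustrative assumption.

```python
import random

# Sketch of the "joint" distribution: a uniform mixture of several
# independent Bernoulli product distributions on the lamp features.
# The specific list of member distributions is an illustrative assumption.

def sample_joint(distributions, rng):
    """Pick one member distribution uniformly, then sample each feature independently."""
    probs = rng.choice(distributions)
    return [1 if rng.random() < p else 0 for p in probs]

distributions = [
    [0.9, 0.1, 0.5],  # one independent distribution on three features
    [0.1, 0.9, 0.5],  # another, modified autonomously
]
print(sample_joint(distributions, random.Random(0)))
```

Note that the mixture is genuinely dependent: seeing that the first feature sampled as one raises the posterior weight of the first member distribution, and hence changes the conditional law of the second feature.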
This means that, say, in the case when the first probability is close to zero and the second to one, we are looking for cocyclic polynomials "tuned precisely to this special class of situations", that is, discriminated precisely by distributions with such prevailing situations, for which the first feature is zero and the second is one. In the case when we consider many distributions, we will have cocyclic polynomials that "cover" both classes of situations: those in which the first feature is zero and the second is one, and those in which the first feature is one and the second is zero.

Since for smaller sets of independent distributions it is expectedly easier to search for cocyclic polynomials discriminated by them jointly, it may make sense to consider a bipartite graph (perhaps some expander), the left part of which is the set of independent
distributions, and for each vertex v_i of the right part we will maintain its own family of cocyclic polynomials Ω_i, discriminated by the distributions of the left part corresponding to the vertices adjacent to v_i. In this case, each distribution will be affected not by all cocyclic polynomials, but only by those corresponding to the vertices of the right part adjacent to the vertex of the left part corresponding to this distribution.

Another possible improvement is that, in addition to the current list of feature distributions (let's call it A), we store some of the states of these distributions at past times. At every time that is a multiple of a fixed interval, we record all the current distributions of the list and add them to another, larger list. We will not change the distributions of this larger list (let's call it B). But we will look for, and use when changing the current probabilities, cocyclic polynomials discriminated by the largest possible number of distributions in the list B.

Why this is needed can be seen in the example when the list A consists of only one distribution. If we do not "take screenshots" of this distribution from time to time and send them to the list B, something like a loop can happen: the distribution can fall into some zone of the space of distributions in which, however, there is no distribution consisting of probabilities zero and one corresponding to the desired (consistent) situation; at the same time, the polynomials we find each time will only shift this distribution within this zone. The list B is needed precisely in order to find polynomials discriminated by all distributions from this zone together, so as to push the current distribution out of this zone.

9.5. A digression in mathematical logic.
Here I want to talk about one hypothesis for which I have no grounds indicating that it should be true; however, if it turns out to be true, it would be very interesting, so I decided to add it here.

I am not an expert in mathematical logic, so I ask you to excuse me if this question has already been investigated in some form, or is easily resolved with the help of facts already established in this science.

There are a variety of philosophical positions, not all of which imply that every clearly formulated mathematical statement has a specific truth value - yes or no. Recall all the debates about the law of the excluded middle and the suggestion that things like the Continuum Hypothesis may not have a definite yes or no answer at all.

If we try to formulate this hypothesis briefly: our world may turn out to be only locally consistent, but globally inconsistent. By local consistency, I mean that any finite set of statements of a given formal system can be associated with a set of truth values of these statements so that they do not contradict each other (I will explain what "do not contradict each other" means using the example of the formal system described below). Global consistency means that it is possible to consistently associate truth values to all statements of the given system simultaneously.

I will work with a formal system close to the feature space described above. It is a set of statements of the form ∀x_1 ∃x_2 ∀x_3 … ∃x_k M(x_1, …, x_k, y_1, …, y_l) = 1, where the x_i are strings to which the quantifiers apply, the y_i are specific strings, particular to each statement, M is a Turing machine that produces a binary value 0 or 1, or does not stop at all, and k, l are specific numbers, particular to each statement.

It is possible to protect a certain set of propositional relationships for some finite sets of statements of the form ∀x_1 ∃x_2 ∀x_3 … ∃x_k M(x_1, …, x_k, y_1, …, y_l) = 1.
To protect, or to make a propositional connection protected, is to prove that a given set of statements cannot take a given set of truth values. On this system of statements one can, in exactly the same way as in the feature space, consider light bulbs, some of which are protected (those that, when lit, imply a protected propositional connection) and some of which are not. With any choice of binary truth values for the statements, some bulbs light up and some do not, each according to its own internal propositional law; everything is exactly the same as in the case of the feature space, with the only difference that now there are infinitely many statements (and light bulbs).

In order to prove global inconsistency, it is sufficient to provide a protected cocyclic absolutely convergent series, whose existence is exactly what my hypothesis predicts. This series is a function from the set of bulbs to the set of real numbers whose set of values, firstly, forms an absolutely convergent series; secondly, the sum of the values over the bulbs that are lit does not depend on the choice of binary values for the statements of the system and is equal to one; and thirdly, positive values may be assigned only to protected bulbs. One can ask whether it is possible to allow the number of statements forming a light bulb from the considered set of light bulbs to be unbounded from above. This time, the invariance of such a series will perhaps be proven in a way different from the case of cocyclic polynomials - perhaps by computing the sums of certain series.

It is clear that if a protected cocyclic absolutely convergent series exists, then this proves that it is impossible to simultaneously assign truth values 0 and 1 to all statements of the indicated type so that they do not contradict each other (do not form protected propositional connections).
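The three defining conditions on such a series, as stated above, can be collected compactly (the notation c_b for the value assigned to bulb b is mine):

```latex
% c assigns a real value c_b to each light bulb b
\sum_{b} |c_b| < \infty
  \qquad \text{(absolute convergence)}
\qquad
\sum_{b \,:\, b \text{ is lit}} c_b = 1
  \ \text{for every assignment of truth values}
  \qquad \text{(cocyclicity / invariance)}
\qquad
c_b > 0 \;\Longrightarrow\; b \text{ is protected}
  \qquad \text{(protectedness)}
```

Since, for any truth assignment, the lit bulbs then carry total value one while positive values sit only on protected bulbs, at least one protected bulb must be lit, which is exactly the contradiction establishing global inconsistency.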