Minimization of Quadratic Binary Functional with Additive Connection Matrix
aa r X i v : . [ c ond - m a t . d i s - nn ] J u l Minimization of Quadratic Binary Functional withAdditive Connection Matrix
Leonid Litinskii
Center of Optical Neural Technologies Scientific Research Institute for System Analysis,Russian Academy of Sciences,44/2 Vavilov Str, Moscow 119333, Russia. [email protected]
Abstract. ( N × N ) -matrix is called additive when its elements are pair-wisesums of N real numbers a i . For a quadratic binary functional with an additiveconnection matrix we succeeded in finding the global minimum expressing itthrough external parameters of the problem. Computer simulations show that en-ergy surface of a quadratic binary functional with an additive matrix is complicateenough. In the present paper we analyze the classic problem of discrete mathematics that isminimization of a quadratic functional depending on the great number N of binaryvariables s i : E ( s ) = − ( Js , s )2 N = − N N X i,j =1 J ij s i s j −→ min , s i = ± . (1) This problem arises in a lot of scientific fields of knowledge beginning from physics ofmagnetic materials and neural networks up to analysis of results of physical experimentsand logistics. Usually the connection matrix J = ( J ij ) N is supposed to be symmetricone with zero diagonal elements: J ij = J ji , J ii = 0 . The state of the system as a wholeis given by N -dimensional vector s = ( s , s , ..., s N ) . Such vectors will be called configuration vectors or simply configurations . The characteristic E ( s ) that has to beminimized will be called the energy of the state, and the configuration providing theglobal minimum of the functional (1) will be called the ground state .In general the number of local minima of the functional (1) is exponentially large.Practically all minimization algorithms guarantee finding of a local minimum only. Theexceptions are very rare and, as a rule, they are relied on specific properties of the con-nection matrix [1], [2]. The most widespread is the random minimization [1]. Accordingthis algorithm the spin dynamics is started from a random configuration. In randomizedorder the states of dissatisfied spins are changed. As a result the dynamic system stepby step falls into the nearest local minimum. We used just the random minimization inour computer simulations (Section 3).Very little is known about properties of the energy surface of the functional (1),namely, about the number and the structure of the set of local minima, about the groundtate and the probability to find it and so on. In fact there is only one nontrivial con-nection matrix for which the ground state of the functional (1) can be indicated exactly.This is the Hebb matrix in the case when the value of the loading parameter α = M/N is small: α < . [3]. Then the global minimum of the functional (1) is achieved atany of M random patterns.Due to discrete character of the problem its theoretical analysis is very rare. Fromrecent results let us point out the papers [4], [5], where the authors succeeded in con-necting the depth of the local minimum with the probability of its random finding, andalso described some characteristics of the energy surface.In our work we introduce a class of additive matrices whose elements are pair-wisesums of a set of predetermined numbers a i : J ij = (1 − δ ij )( a i + a j ) , i, j = 1 , ..., N, where { a i } N ∈ R (2) and δ ij is the Kronecker delta symbol.The additive matrices generalize a special class of Hebb’s matrices analyzed in [6].For the functional (1) with the connection matrix (2) the ground state can be obtainedexactly. We succeeded in presentation additive matrices in the form when the depen-dence of the ground state on external parameters of the problem can be described ana-lytically. When the ground state is known, interesting results can be obtained with theaid of computer simulation. In the next Section we present the theory relating to theproblem. In Section 3 we give the results of computer simulations. It can be verified directly that for an additive matrix (2) the value of the functional(1) is equal to E ( s ) = ( e , a ) − ( s , e )( s , a ) N . (3)
Here a = ( a , a , .., a N ) is N -dimensional vector whose coordinates are a i , and e =(1 , , ..., is the ”bisector” of the principal orthant of the space R N . From minimiza-tion of the functional (3) one can pass to maximization of the functional F ( s ) = ( s , e )( s , a ) −→ max . (4) Let us denote by Σ k the class of all configurations s for which exactly k coordinatesare equal ”-1”: Σ k = { s : ( s , e ) = N − k } , k = 0 , , ..., N. The class Σ k consists of C Nk configurations. For all these configurations the first mul-tiplier in the expression (4) takes the same value N − k . Consequently, to maximize(4) among configurations from the class Σ k , it is sufficient to find a vector s ∈ Σ k maximizing the scalar product ( s , a ) . This problem is not so difficult (see item 3). . Suppose, we can find the vector s maximizing the scalar product ( s , a ) in theclass Σ k . Let us denote this vector as s ( k ) , and let the value of the functional (4) forthis vector be F k = F ( s ( k )) : F k = ( N − k )( s ( k ) , a ) = max s ∈ Σ k F ( s ) , k = 0 , , .., N. When finding all these vectors s ( k ) ( k = 0 , , .., N ), it is easy to find the global maxi-mum of the functional (4), since the functional reaches its maximal value on one of thevectors s ( k ) .Note we do not need to compare between themselves all N + 1 numbers F k , butthe first half of them only. The reason is that for any k the classes Σ k and Σ N − k areinversion of each other: Σ N − k = − Σ k . Since for any configuration s the equality F ( s ) = F ( − s ) is fulfilled, we obtain that F k = F N − k for all values of k . Combiningthe cases of even and odd N in one formula we obtain that to find the global maximumof the functional (4) it is necessary to find the largest of the values F k , when k ≤ n =[ N /2] : F , F , .., F n , n = (cid:20) N (cid:21) . Without loss of generality the numbers { a i } N can be put in order according theirincrease: a < a < ... < a N . (5) Let us take any k ≤ n . It is easy to see that the scalar product ( s , a ) reaches its maxi-mum inside the class Σ k when the configuration vector is s ( k ) = ( − , − , ..., − , | {z } k , , .. . (6) Indeed, ( s ( k ) , a ) = − k X i =1 a i + N X i = k +1 a i = ˆ a − a k , where ˆ a = N X i =1 a i , ˆ a k = k X i =1 a i , and ˆ a = 0 . (7) Let s be another configuration vector from the class Σ k for which numbers of neg-ative coordinates j < j < ... < j k dose not take the first places. The scalar product ( s , a ) is equal to ( s , a ) = ˆ a − k P i =1 a j i , and inequality ( s ( k ) , a ) > ( s , a ) is fulfilledsince k P i =1 a i < k P i =1 a j i for any set of indices { j , j , ..., j k } that differs from { , , ..., k } .Thus, under the condition of ordering (5), to find the global minimum of the func-tional (3) it is necessary to find the largest among the numbers F k = ( N − k ) (ˆ a − a k ) , k = 0 , , .., n = (cid:20) N (cid:21) , (8) here ˆ a and ˆ a k are given in Eq.(7). The initial problem (1)-(2) can be considered as solved: the expressions (6) re-strict the set of configurations among which the ground state of the functional (2) shouldbe found. To define which configuration is the ground state it is necessary to calculate n numbers (8) and find the largest among them. It reminds unclear under which condi-tions this or that configuration (6) would be the ground state. If any of them will be theground state or not? It turned out that these questions can be answered.Without loss of generality let us suppose that the numbers a i have a special form: a i = α i − t, α i ∈ [0 , , t ≥ . (9) In this presentation the values α i are positive numbers from the unit interval, and thepositive parameter t can take an arbitrary value. It is not difficult to see that from thepoint of view of minimization of our functional an arbitrary set of numbers a i can bereduced to the form (9). For example, let us suppose that a < ... < a N < and a N − a ≤ . Then we set α i = a i − a and t = | a | . This means that the numbers a i have the form (9). On the contrary, let the initial numbers ˜ a i have different signs andtake on arbitrary values: ˜ a < ... < ... < ˜ a N , and ˜ a = max ( | ˜ a | , ˜ a N ) >> . Letus normalize these numbers dividing them by a : a i = ˜ a i / a ∈ [ − / , +1 / . It isclear that the solution of the problem (1) is the same when we use initial numbers ˜ a i or normalized numbers a i . The last numbers can be presented in the form (9), if we set α i = a i + and t = . From our argumentation it follows that the numbers a i canalways be presented in the form (9). Then the following statement is right (the proof seein the Appendix). Theorem.
When t increasing from the initial value t = 0 , the ground state sequen-tially coincides with the vectors s ( k ) (6) in the following order: s (0) → s (1) → ... → s ( k − → s ( k ) → ... → s ( n − → s ( n ) . (10) The jump of the ground state s ( k − → s ( k ) occurs when t transfers through thecritical value: t k = ˆ α − α k + ( N − k + 2) α k N − k + 1) , k = 1 , .., n, (11) where analogously of Eq.(7) ˆ α = N P i =1 α i and ˆ α k = k P i =1 α i . When t belongs to theinterval [ t k , t k +1 ] , the ground state of the functional is the configuration s ( k ) .This theorem generalizes the previous results obtained in [6]. The theorem describesexhaustively the behavior of the ground state for the problem (1)-(2). Depending on thevalues of external parameters { a i } each of the configurations s ( k ) , k = 0 , , ..., n canturn out to be the ground state of the functional. For t from the interval [ t k , t k +1 ] theenergy of the ground state is the linear function of the parameter t . It can be easily seenfrom the expressions (8) substituting the values a i in the form (9): E k ( t ) = A k + t · B k , k = 0 , , ..., n, (12) where up to the factor / N we have: A k = ˆ α − ( N − k )(ˆ α − α k ) , B k = ( N − k ) − N. (13) E G S E G S Fig. 1.
The dependence of the ground state energy E GS on the parameter t for additivematrices of the dimensionality N = 100 : the upper one is the arithmetical additivematrix, and the lower one is the random matrix (see the body of the text). The values E GS for the points t k are marked.n Fig.1 for N = 100 it is shown how the energy of the ground state depends on theparameter t . The upper panel corresponds to the case when the values α i constitute thearithmetical progression: α i = i/N, i = 1 , , .., N . On the lower panel the analogousplot is shown for a random additive matrix, when α i are random numbers from theinterval [0, 1]. Along the abscissa axis the values of the parameter t are shown, alongthe axis of ordinates we show the energy of the ground state calculated in the points t k (11). The first value of the parameter t for which the energy of the ground state iscalculated is equal to zero: t = 0 . Since both plots are very similar, we analyze onlyone of them; for example the upper one.We see that the energy of the ground state is nontrivially depended on the parameter t . For small values, t ∼ , very deep minima correspond to the ground state. Then, when t increases, the depth of the global minimum decreases very quickly and it reaches aminimal value when t ≈ . For these values of t all matrix elements become negative.During further increase of t the depth of the global minimum slowly but steadily in-creases. It becomes deeper and deeper. Which properties of the energy surface reflectnon-monotone change of the depth of the global minimum? What properties are respon-sible for its minimal depth? For the time being we cannot answer these questions. Usingformulae (10)-(13) everyone can be certain of universal character of the curves shownin Fig.1.Up till now we can neither extend these results onto local minima of the functional,nor obtain analytical description of other interesting characteristics such as the num-ber of different minima, distribution of local minima with respect to their depths anddistances to the ground state and so on. However, if the ground state is known, thesecharacteristics can be studied with the aid of computer simulations. Now we turn topresentation of these results. For given N and { α i } for each value of t we generated an additive matrix. We did random starts (see Introduction) and obtained the same number of local minima. Foreach minimum we fixed its depth (the energy E l ), the relative Hamming distance D l between the minimum and the ground state and other characteristics. Thus as a result ofa great number of random trials for each value of t we could estimate: a) the probabilityof random finding of the ground state p GS ; b) the deepest of the obtained minimumand the distance from it to the ground state; c) the number of different minima K , theirdistribution over energies and distances from the ground state and so on. The parameter t was varied from zero up to the maximal value t n . For two dimensionalities N = 100 and N = 1000 such experiments were done for both arithmetical and random additivematrices.In Fig.2 for the arithmetical additive matrix of dimensionality N = 100 the depen-dence of some of the listed characteristics on the parameter t is shown. Let us explainwhat the graphs shown on different panels of the figure mean. On the upper panel the probability to find the ground state p GS is shown. We seethat in the region of small values of t ( t < . ), where the depth of the global minimumis large, the probability to find the ground state is notably different from zero. On the p G S E m i n / E G S D K Fig. 2.
For arithmetical additive matrix of dimensionality N = 100 the following graphsare shown: on the upper panel is the probability to find the ground state; on the nextpanel is the ratio of depth of the deepest found minimum to the depth of the globalminimum; on the next panel is the relative Hamming distance between the deepestminimum and the ground state; on the bottom panel is the number of different energiesof local minima.ontrary, in the region of large values of t , where the global minimum becomes rathershallow, the probability to find it is equal to zero (it is less than − ). At the same timeit is not important which configuration s ( k ) is the ground state.Apparently, such behavior of the probability p GS is one more confirmation of thelaw, which was theoretically predicted in [3], [4]: the deeper minimum, the greaterprobability to find it under the random search.For the matrix of dimensionality N = 1000 the behavior of the given characteristicis an analogous one. The value of the parameter t for which the probability to find theground state becomes zero, increases up to the value t ≈ . On the second panel from the top the ratio of the found deepest minimum E min tothe global minimum E GS , E min /E GS , is shown. This ratio takes on a value from theinterval [0, 1]. At first, while the ground state still can be found, this ratio is equal to 1.Then in the region of the values t ≈ . − . this characteristic has a sharp downwardexcursion, which soon changes to a steady increasing and tends to 1 asymptotically.The minimal value of this characteristic is E min /E GS ≈ . . It shows that in theworst case the objective function is 15% less than the optimal value.For matrices of dimensionality N = 1000 the behavior of the ratio E min /E GS isabsolutely analogous. The deepest downward excursion of the graph takes place when t ≈ , and its depth increases noticeably: the minimal value of the ratio is equal E min /E GS ≈ . . In other words, when the dimensionality of the problem increasesthe found suboptimal solution will be worse comparing with the global minimum.Note, that for the large values of t ∼ − , when the ratio E min /E GS is close to1, the probability to find the ground state as before is equal to 0. The same also takesplace in the case N = 1000 . On the second panel from the bottom it is shown how the distance D betweenthe deepest local minimum and the ground state depends on the parameter t . (By thedistance we understand the relative Hamming distance D = ( N − abs ( s , s ′ ))/2 N ∈ [0 , . .)At first, while the ground state still can be found this distance is equal to 0 (seethe beginning of the graph). Then in the interval of the ”worst” values of t the distance D increases sharply up to the value D max ≈ . . After that the distance between thedeepest local minimum and the ground state is stabilized near the value D = 0 . . Letus add that for additive matrices of dimensionality N = 1000 suboptimal solution is faraway from the ground state. This distance is D ≈ . .The general conclusion is as follows: for rather large values of t , when as a resultof the random search it is possible to find suboptimal solution only, this solution is suf-ficiently far from the ground state. However, the ratio of the minima depths E min /E GS can be of order of 1 (in Fig.2 this situation corresponds to the values of t > ). Thiscombination of properties is possible only if the energy surface consists of a large num-ber of local minima, which depths not strongly differ one from each other and fromthe global minimum. We may conclude, that for large values of t , when elements ofconnection matrix are large negative numbers, the construction of the energy surface isas aforesaid. On the bottom panel we show the dependence of the number of different energiesof local minima K on the value of the parameter t . As a rule each energy is many timesegenerated. To estimate the number of different local minima it is necessary to analyzehow many different configurations correspond to the same energy. Nevertheless, suchcharacteristic as the number of energy levels is also of interest.For an arithmetical additive matrix of the dimensionality N = 100 the maximalvalue of the characteristic K is reached in the region t ∼ . This maximum is compar-atively small, ∼ . However, it turns out that each energy is many times degenerated,and the number of different local minima is an orders of magnitude greater. For a ran-dom additive matrix of the same dimensionality the maximal value of the characteristic K is equal to tens of thousands (the graph is not presented).For the additive matrices of the dimensionality N = 1000 the general form of thegraph of the characteristic K is analogous. In this case the maximal value, K max ∼ · , is reached in the region t ≈ . Since for each t only random starts havebeen done, this means that each second start leads the system into new local minimum.In other words, for middle values of t the number of local minima is very big. For additive matrices the method of finding of the global minimum of the quadraticbinary functional is pointed out. We propose the t -parametrization of additive matricesthat allows one to get an exhaustive classification for all variants possible for the groundstate.For not great values of t (let us say for t ∈ [0 , ) among matrix elements thereare positive as well as negative ones; or all elements are negative, but they are small inmodulus. In this case the depth of the global minimum is very big. Here the probabilityto find the ground state in random search is rather high: p GS ∼ . − . . It can besupposed that in this case the energy surface has a small number of local minima whosedepths noticeably less then the depth of the global minimum.On the contrary, for the great values of t all matrix elements are negative and theyare big in modulus. In this case it is practically impossible to find the ground statewith the aid of the random minimization, since the probability to get into the globalminimum is negligible small. Apparently in this case the energy surface contains verylarge number of local minima that only slightly differ from each other in depths. Herethe global minimum is only insignificantly deeper than local minima. Varying the valueof the parameter t it is possible to get over from one type of the energy surface to theother one. So, additive matrices are good models for examining the energy surfaces inthe general case.By this time additive matrices for large values of t can be used for testing of newalgorithms of the quadratic binary minimization. Indeed, on the one hand, with the aidof the formulae (10)-(13) the ground state always can be found. On the other hand, forlarge values of the parameter t it is practically impossible to find the ground state withthe aid of the random minimization. cknowledgment The work has been done with participation of Anton Pichugin in the framework of theprogram supported by the grant
References
Appendix
In the beginning of Section 2 it was shown that only one of configuration vectors s ( k ) (6), k = 0 , , ..., n = [ N/ can be the ground state. Using the representation (9) of a i it is easy to obtain Eq. (12) for the energies of s ( k ) -configurations: E k ( t ) = A k + tB k ,where A k and B k are given by Eq.(13). As functions of the parameter t energies E k ( t ) are straight lines. We have to analyze the behavior of the set { E k ( t ) } n . When a straightline E l ( t ) is lower all other straight lines, the configuration s ( l ) is the ground state .For simplicity we restrict ourselves to the case of even N = 2 n . Let us write downthe expression (13) in more details: A = − ( N − α < A < ... < A n = ˆ α,B = ( N − N > B > ... > B n = − N. ( A When k increasing, the free term A k of the straight line E k ( t ) increases monotoni-cally. In other words, the intersection of the straight line with ordinate axis rises higherand higher. On the other hand, when k increasing the coefficient B k decreases mono-tonically, so that in the end it even becomes negative. For the case N = 6 the typicalbehavior of the set of straight lines { E k ( t ) } n is shown in Fig.3. We use this figure toexplain how the ground state depends on the parameter t .When t = 0 all the matrix elements are positive and the configuration s (0) =(1 , ... is the ground state. Let us increase t little by little. At first the straight line ( t ) is lower than all other straight lines. Consequently, s (0) remains the ground state.Than for some value of the parameter t the straight line E ( t ) is intersected by anotherstraight line. After that this straight line turns out to be lower than all other straightlines. Taking into account the relations (A1) it is easy to see that the first straight linethat intersects E ( t ) is E ( t ) (see also Fig.3). After this intersection the configuration s (1) becomes the ground state. It is the ground state until another straight line intersectsthe straight line E ( t ) . After that this straight line turns out to be lower than all otherstraight lines. From the aforesaid argumentation it is evident that it will be the straightline E ( t ) (see Fig.3). Then the configuration s (2) will be the ground state, and so on.It can be shown that if the straight line E k − ( t ) is lower than all other straight lines,the first straight line that intersects E k − ( t ) is E k ( t ) . The intersection takes place in thepoint t k (11) that is the solution of equation A k − + t · B k − = A k + t · B k . A A A tE (t)E (t)E (t)E (t) Fig. 3.
For random additive matrix of dimensionality N = 6 the straight lines E k ( t ) = A k + t · B k are shown for k = 0 , , ,3