Bandgap optimization in combinatorial graphs with tailored ground states: Application in Quantum annealing
Siddhartha Srivastava*, Veera Sundararaghavan†
Department of Aerospace Engineering, University of Michigan, Ann Arbor, MI 48109
*[email protected]   †[email protected]

February 3, 2021

Abstract
A mixed-integer linear programming (MILP) formulation is presented for parameter estimation of the Potts model. Two algorithms are developed: the first estimates the parameters such that the set of ground states replicates the user-prescribed data set; the second allows the user to prescribe the ground-state multiplicity. In both instances, the optimization process ensures that the band gap is maximized. Consequently, the model parameters efficiently describe the user data for a broad range of temperatures. This is useful in the development of energy-based graph models to be simulated on quantum annealing hardware, where the exact simulation temperature is unknown. Computationally, the memory requirement in this method grows exponentially with the graph size. Therefore, this method can only be practically applied to small graphs. Such applications include the learning of small generative classifiers and of spin-lattice models with energies described by an Ising Hamiltonian. Learning large data sets poses no extra cost to this method; however, applications involving the learning of high-dimensional data are out of scope.

Keywords: Potts model · Ising model · Parameter estimation · Mixed Integer Linear Programming
1 Introduction

The Potts energy model was initially developed to describe interacting spins on a crystalline lattice. Since then, it has become an archetypal model in other fields, including operations research, network theory, and the physics of phase transitions. The motion of biological cells was described by Graner and Glazier [1] using a large-Q Potts model. A similar approach was used in [2] to study grain boundary motion in polycrystalline microstructures during thermally induced grain growth and recrystallization. In such studies, the system's dynamics is represented as a transition probability governed by the model's energy description. These problems can be simulated using Monte Carlo based simulations. On the other hand, there are problems where equilibrium solutions are required; for instance, in computer vision, the Potts model is often used to describe the cut energy of a segmentation problem (cf. [3]). These problems are usually solved using the graph-cut method. This computation becomes exceedingly challenging as more generality is introduced. Bagon's thesis [4] provides an excellent review of these generalities and suggests practical algorithms. Traditionally, these models are trained by treating them as Markov Random Fields (MRFs) and using gradient-based approaches to maximize the likelihood [5]. However, analytical estimates of the gradients are hard to compute. Among approximate techniques, Hinton's contrastive divergence method [6] provides an efficient way to successively approximate the gradients in the parameter optimization problem. An excellent review of this subject is presented in [7].

Recently, the advent of quantum annealing technology has made it easier to sample states from the model's probability distribution [8]. This development has significantly eased the approximation of the required gradients. However, these methods have a critical drawback: they only work for finite-temperature probability distributions. Consequently, a model trained using these techniques is often temperature-dependent and shows disagreement with the data as the temperature is lowered [9]. As an example, the negative log-likelihood of a model trained using this technique is presented in Fig. 1. It can be seen that the minimum is close to the training β (inverse temperature), which was chosen as β = 1. A possible reason for this problem is that the training results in a locally optimal solution. Using quantum annealers adds another layer of complication because the simulation temperature is not known and depends on the graph size [9].

Figure 1: Comparative analysis of likelihoods of models trained using likelihood maximization and band gap maximization. The predicted models are presented in Appendix B.1. A lower value of the negative log-likelihood signifies a better trained model.

In contrast, this work is based on maximization of the band gap while the ground states are chosen as the data states. This approach guarantees that the states' probability distribution gets closer to that of the data set as the temperature is reduced. Moreover, it ensures that the model adequately represents the data set for a broad range of temperatures. However, the downside of this approach is that there is no guarantee of the existence of parameters for every data set. This fact can be motivated by noticing that the number of ground states can exceed the number of model parameters, which may result in an over-constrained optimization problem. Such problems do not exist at a non-zero temperature, where all states appear with non-zero probability.

In this paper, a Mixed Integer Linear Programming (MILP) formulation is presented to estimate Potts model parameters. Two variations of the algorithm are presented. The first algorithm assigns a prescribed data set as the model's ground states while maximizing the band gap. The second algorithm identifies a set of ground states with a prescribed multiplicity while maximizing the band gap. It should be noted that the computational complexity of both algorithms grows exponentially with the size of the problem. Therefore, these methods are only suited for small graph structures. Such problems arise in designing the energies of smaller motifs in a lattice structure.

The paper is organized as follows: the formulation of the Potts energy is reviewed in Section 2, along with concepts such as the ground state, the band gap, and the probability of a state; a theorem is presented to quantifiably estimate the efficiency of the developed algorithms. The problem statement is summarized in Section 3. The developed algorithms are presented in Section 4. A case study for the Ising model, with some details on the computational complexity, is presented in Section 5. Section 6 provides a summary of the paper.

2 Potts model

The Potts model is a type of discrete pairwise energy model on an undirected simple graph. Before introducing some useful terms, the following definition of a graph is used:
Graph: A graph, G, is a pair of sets (V, C), where V is the set of vertices and C is the set of edges/connections. For each element e ∈ C there is a corresponding ordered pair (x, y); x, y ∈ V, i.e., C ⊆ V × V. A graph G = (V, C) is undirected if an edge does not have any directionality, i.e., (x, y) ≡ (y, x). A graph is simple if (x, x) ∉ C for all x ∈ V. Also, this work requires the graph to be finite, i.e., the number of vertices is finite. Next, the definition of the Potts energy is introduced.

Consider a finite undirected simple graph G(V, C). The number of vertices is denoted by N_V = |V| and the number of edges by N_C = |C|. The indices of connections and vertices are related using the maps π_1 and π_2 such that for a connection with index k ∈ {1, .., N_C}, the indices of the corresponding vertices are π_1(k) and π_2(k) with 1 ≤ π_1(k) < π_2(k) ≤ N_V. This essentially means e_k ≡ (v_{π_1(k)}, v_{π_2(k)}). Each vertex v_i ∈ V is assigned a state s_i ∈ {1, 2, . . . , N_L} for all i ∈ 1, . . . , N_V. This determines the complete state of the graph as an ordered tuple S = (s_1, . . . , s_i, . . . , s_{N_V}) ∈ {1, . . . , N_L}^{N_V}. The set of all possible states is referred to as S = {1, . . . , N_L}^{N_V}, with the total number of states denoted by N_TS = |S| = N_L^{N_V}. The Potts energy of a particular state is evaluated as follows:

E(S) = Σ_{i=1}^{N_V} H_i U(s_i) + Σ_{k=1}^{N_C} J_k V(s_{π_1(k)}, s_{π_2(k)})    (1)

where U(s) is the energy of labeling a vertex with label s, and V(s_i, s_j) is the energy of labeling two connected vertices as s_i and s_j. The parameters H_i and J_k are referred to as the field strength and interaction strength, respectively. Since the graph is undirected, the following symmetry is imposed: V(s_i, s_j) = V(s_j, s_i).

The parameter set is represented as a vector, θ = [θ_1, . . . , θ_{N_V + N_C}]^T. In this work, it is specialized to the following form:

θ = [H_1, . . . , H_{N_V}, J_1, . . . , J_{N_C}]^T

This notation allows the energy to be written as a matrix product, E(S | θ) = ε(S) θ, where

ε(S) = [U(s_1), . . . , U(s_{N_V}), V(s_{π_1(1)}, s_{π_2(1)}), . . . , V(s_{π_1(N_C)}, s_{π_2(N_C)})]

For a given set of parameters θ, the set of ground states (S_G(θ) ⊆ S) is the set of states with minimum energy, E_0(θ), i.e.,

S_G(θ) = argmin_{S ∈ S} E(S | θ),    E_0(θ) = min_{S ∈ S} E(S | θ)

In contrast, all non-minimal states are referred to as excited states. The set of all excited states, denoted by S_E(θ), can be evaluated as S_E(θ) = S − S_G(θ). The cardinalities of the set of ground states (S_G) and excited states (S_E) are denoted by N_GS and N_ES, respectively. All excited states may or may not have the same energy. However, the minimum excited energy, referred to as the 'first excited energy', is used in defining the band gap and is evaluated as:

E_1(θ) = min_{S ∈ S_E(θ)} E(S | θ)

It should be noted that no assumption is made on the multiplicity of states with energy E_1(θ). The band gap (a positive quantity) defines the energy gap between S_G and S_E.
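To make these definitions concrete, the following is a minimal brute-force sketch in Python. The triangle graph, the label set {0, 1}, the choices of U and V, and the θ values are all illustrative assumptions, not quantities from the paper; the feature vector ε(S) and the energy E(S | θ) = ε(S) θ follow the construction above.

```python
import itertools
import numpy as np

# Illustrative toy problem: a 3-vertex triangle graph with N_L = 2 labels.
# Labels are 0-indexed here for convenience; edges obey pi_1(k) < pi_2(k).
N_V, N_L = 3, 2
edges = [(0, 1), (0, 2), (1, 2)]                 # the connection set C

def U(s):
    """Vertex labeling energy (an assumed, Ising-like choice)."""
    return 1.0 if s == 1 else -1.0

def V(si, sj):
    """Pair labeling energy; symmetric, V(si, sj) = V(sj, si)."""
    return 1.0 if si == sj else -1.0

def eps(S):
    """Feature vector eps(S), so that E(S | theta) = eps(S) @ theta."""
    vertex_part = [U(S[i]) for i in range(N_V)]
    edge_part = [V(S[i], S[j]) for (i, j) in edges]
    return np.array(vertex_part + edge_part)

theta = np.array([0.5, -0.2, 0.1,                # field strengths H_i (assumed)
                  1.0, 1.0, 1.0])                # interaction strengths J_k (assumed)

# Enumerate all N_TS = N_L ** N_V states; locate ground states and the band gap.
states = list(itertools.product(range(N_L), repeat=N_V))
E = np.array([eps(S) @ theta for S in states])
E0 = E.min()                                     # ground state energy E_0(theta)
E1 = E[E > E0].min()                             # first excited energy E_1(theta)
print("ground states:", [S for S, e in zip(states, E) if e == E0])
print("band gap:", E1 - E0)
```

The exhaustive enumeration over N_L^{N_V} states is the same exponential cost that limits the MILP formulations developed later to small graphs.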
It is estimated as:

∆E(θ) = E_1(θ) − E_0(θ)

At any given temperature T, the probability of occurrence of a state S is described by the Boltzmann distribution:

p(S | θ, β) = (1/Z) e^{−βE(S)}    (2)

where β = 1/(k_B T) is the inverse thermodynamic temperature, k_B is the Boltzmann constant, and Z denotes the partition function, estimated as

Z = Σ_{S ∈ S} e^{−βE(S)}
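As a quick illustration of Eq. (2), the distribution can be computed by brute force once the full energy spectrum is known. The spectrum below is made up for illustration; shifting by the minimum energy before exponentiating is a standard numerical-stability trick and does not change the distribution.

```python
import numpy as np

# Brute-force Boltzmann distribution of Eq. (2) over a full (made-up) spectrum,
# in units where k_B = 1 so that beta = 1 / T.
energies = np.array([-3.0, -3.0, -1.0, -1.0, 0.0, 0.0, 2.0, 2.0])

def boltzmann(energies, beta):
    """p(S | theta, beta) for every state; w.sum() plays the role of the
    partition function Z (the common energy shift cancels in the ratio)."""
    w = np.exp(-beta * (energies - energies.min()))
    return w / w.sum()

for beta in (0.1, 1.0, 10.0):
    print(beta, boltzmann(energies, beta).round(4))
# As beta grows, the probability mass concentrates on the minimum-energy states.
```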
Given a data set S_D ⊆ S, the parameter set θ is optimized such that the states in S_D have a higher probability of occurrence at a prescribed β value. Mathematically, this procedure entails minimization of the negative log-likelihood defined below:

η(θ, β) = − Σ_{S ∈ S_D} log p(S | θ, β)    (3)

It can be observed that at high temperatures, i.e., β → 0, all states occur with equal likelihood and therefore

η_0 = lim_{β → 0} η(θ, β) = N_DS log(N_TS)

where N_DS = |S_D|. On the other hand, at low temperatures, i.e., β → ∞, only ground states occur, with equal probability, and the occurrence of any other state has probability 0. Consequently, the value of η in this limit is finite only when S_D ⊆ S_G. It is evaluated as:

η_∞(θ) = lim_{β → ∞} η(θ, β) = N_DS log(N_GS)

It is desirable to estimate parameters such that the ground states replicate the data set and the band gap is maximized. The reason will be apparent after the next theorem (proof in Appendix A).
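A short numerical check of the two limits, under the assumption that the data states are exactly the ground states of the made-up spectrum below:

```python
import numpy as np

# eta(theta, beta) of Eq. (3) by brute force; `energies` and `data_idx` are assumptions.
energies = np.array([-3.0, -3.0, -1.0, -1.0, 0.0, 0.0, 2.0, 2.0])
data_idx = [0, 1]                               # S_D chosen equal to the ground states

def eta(energies, data_idx, beta):
    m = energies.min()
    logZ = -beta * m + np.log(np.exp(-beta * (energies - m)).sum())   # stable log Z
    return sum(beta * energies[i] + logZ for i in data_idx)          # -sum log p(S)

N_DS, N_TS, N_GS = len(data_idx), len(energies), 2
print(eta(energies, data_idx, 1e-9), N_DS * np.log(N_TS))   # high-temperature limit
print(eta(energies, data_idx, 50.0), N_DS * np.log(N_GS))   # low-temperature limit
```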
Theorem: For a given set of parameters θ_D such that (i) S_G(θ_D) = S_D and (ii) ∆E > 0, the following statements hold true:

(a) η(θ_D, β) monotonically decreases with β, and in the low temperature limit

η_∞(θ_D) = lim_{β → ∞} η(θ_D, β) = N_GS log(N_GS)    (4)

(b) η(θ_D, β) is bounded as:

N_GS log(N_GS) < η(θ_D, β) ≤ N_GS log(N_GS + N_ES e^{−β∆E})    (5)

(c) For any ε > 0, there exists a β* such that for all β > β*, η(θ_D, β) − η_∞(θ_D) < ε, where β* is estimated as:

β* = (1/∆E) [ log(N_ES/N_GS) − log(e^{ε/N_GS} − 1) ]    (6)

The consequence of this theorem is that it guarantees that if the parameters are chosen appropriately, η will approach its global minimum in the low temperature (high β) limit. Moreover, at finite β, η is bounded from above by a decreasing function. It can be seen in Fig. 2(a) that the bound gets tighter for higher values of ∆E. It is also shown that the trained model is efficient in the range of β given by [β*, ∞). Fig. 2(b) shows that a higher band gap allows a broader range of temperatures.
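Eq. (6) is directly computable. The following transcription, with toy numbers loosely matching the setting of Fig. 2 (N_GS = 10, N_V = 10, binary labels assumed), is a sketch rather than the paper's code:

```python
import numpy as np

def beta_star(dE, N_GS, N_ES, eps):
    """beta* of Eq. (6); assumes dE > 0 and eps > 0. A non-positive result
    means the eps-tolerance is already met for all beta > 0."""
    return (np.log(N_ES / N_GS) - np.log(np.exp(eps / N_GS) - 1.0)) / dE

N_GS, N_TS = 10, 2 ** 10                  # binary labels on 10 vertices (assumed)
for dE in (1.0, 2.0, 4.0, 8.0):
    print(dE, beta_star(dE, N_GS, N_TS - N_GS, eps=0.1))
# Larger band gaps push beta* down, i.e., the model is usable over a wider
# temperature range, which is the point of maximizing the band gap.
```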
3 Problem statement

Given a finite undirected simple graph G(V, C), find the parameters θ that maximize the band gap in the following two situations:

Case 1: S_D is prescribed and S_G(θ_D) = S_D.
Case 2: The ground state multiplicity, N_GS, is prescribed.

To make this optimization problem well posed, it is additionally imposed that H_i^min ≤ H_i ≤ H_i^max and J_k^min ≤ J_k ≤ J_k^max. Moreover, the functions U(s) and V(s_i, s_j) are predetermined and not calibrated in the optimization process.

4 MILP formulation

A Mixed Integer Linear Programming (MILP) problem is formalized for parameter estimation of the Potts model. A brief overview of the MILP formulation is presented below.

Mixed Integer Linear Programming (MILP): An optimization problem is considered to be of MILP type when the objective function is linear in the decision variables and some of the decision variables are integers. A typical setup
Figure 2: An illustration of bounds for a trained Potts model with N_GS = 10 and N_V = 10. (a) The upper bound on η with respect to β for various values of the energy gap. (b) β* as a function of the band gap for various bounds on η.

of an MILP problem is given in Eq. (7), where x is the decision variable of size N, I is the set of indices of x which are integers, and the matrices A_eq, b_eq, A and b are used to define linear constraints.

Optimize:               min_x  c x
Inequality constraints: A x ≤ b
Equality constraints:   A_eq x = b_eq
Bounds:                 lb ≤ x ≤ ub
Integer variables:      x_I ∈ Z    (7)

The MILP formulation for the two cases is presented next. In both cases, the decision variables include the parameters θ and some auxiliary variables. These variables are introduced along with the algorithm description. Moreover, the algorithms do not enforce that ∆E > 0. Therefore, the results are accepted only if this condition is met.

4.1 Algorithm 1: PEPDAS

The energies of individual states can be evaluated as a matrix product operation (shown in Section 2), which works well with a linear programming framework. However, the calculation of the band gap requires the minimization of energy over S_E. This operation introduces a non-linearity. Thus, the following auxiliary variables are introduced to pose this optimization as a linear programming problem:

• E_1 (real valued scalar): It represents the energy of the 1st excited state.
• m = [m_1, ..., m_{N_ES}] (binary valued vector of size N_ES): It is defined such that its value is 1 on exactly one index and 0 everywhere else. The index with value 1 must correspond to one of the 1st excited states.
• M (real valued scalar): It represents a large positive number. For computational purposes it can be evaluated as:

M = (max_s |U(s)|) Σ_{i=1}^{N_V} (|H_i^max| + |H_i^min|) + (max_{s_1,s_2} |V(s_1, s_2)|) Σ_{k=1}^{N_C} (|J_k^max| + |J_k^min|)    (8)

The decision variables in this formulation are given as: x = [θ, E_1, m]^T.

Consider a data set S_D = {S^1, ..., S^{N_DS}}. The optimization cost (−∆E) is estimated by substituting E(S^1) for the ground state energy and E_1 for the 1st excited state energy. Thus the cost is evaluated as:

Cost = E(S^1) − E_1
The energies of all data states are explicitly equated as follows:

E(S^1) − E(S^i) = 0,  ∀ i ∈ {2, ..., N_DS}

The 1st excited energy E_1 is estimated by bounding it from above by the energies of all the excited states. It is bounded from below by the energy of the state corresponding to the index at which m_i = 1. The upper bound on E_1 ensures that if m_i = 1, then E_1(θ) = E(S^i). These conditions can be imposed using the following set of equations and inequalities:

E(S^i) − E_1 + M m_i ≤ M,  ∀ i ∈ {1, ..., N_ES}
−E(S^i) + E_1 ≤ 0,  ∀ i ∈ {1, ..., N_ES}
Σ_{i=1}^{N_ES} m_i = 1

Most computing software only allows integer valued variables. In such a case, the binary value of the variable m can be explicitly enforced by setting the following bounds on the integer valued m:

0 ≤ m_i ≤ 1,  ∀ i ∈ {1, ..., N_ES}

This formulation is presented in Box 1 in matrix format.

Optimization cost:
c = [ε(S^1), −1, 0_{1×N_ES}]

Inequality constraints (one pair of rows per i ∈ {1, ..., N_ES}):
A = [  ε(S^i), −1, [0, ..., 0, M (i-th index), 0, ..., 0]
      −ε(S^i),  1, 0_{1×N_ES}  ],
b = [M, 0, ...]^T

Equality constraints:
A_eq = [ 0_{1×(N_V+N_C)}, 0, 1_{1×N_ES}
         ε(S^1) − ε(S^2), 0, 0_{1×N_ES}
         ...
         ε(S^1) − ε(S^{N_DS}), 0, 0_{1×N_ES} ],
b_eq = [1, 0, ..., 0]^T

Bounds:
lb = [H_1^min, ..., H_{N_V}^min, J_1^min, ..., J_{N_C}^min, −M, 0_{1×N_ES}]
ub = [H_1^max, ..., H_{N_V}^max, J_1^max, ..., J_{N_C}^max, M, 1_{1×N_ES}]

Box 1: MILP formulation for the PEPDAS method
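The Box 1 assembly maps directly onto off-the-shelf MILP solvers. Below is a minimal sketch using scipy.optimize.milp for the Ising special case of Eq. (9) on a triangle graph; the graph, the data set, and the parameter bounds |H_i|, |J_k| ≤ 1 are illustrative assumptions, and this is not the paper's released code.

```python
import itertools
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# PEPDAS (Box 1) sketch for an Ising triangle graph; x = [theta, E1, m].
N_V = 3
edges = [(0, 1), (0, 2), (1, 2)]
n_p = N_V + len(edges)                                  # parameters theta = [H, J]

def eps(S):
    """eps(S) for the Ising case: U(s) = s and V(si, sj) = si * sj, s in {+1, -1}."""
    s = np.array([1.0 if b else -1.0 for b in S])
    return np.concatenate([s, [s[i] * s[j] for (i, j) in edges]])

feats = [eps(S) for S in itertools.product((0, 1), repeat=N_V)]
data = [feats[0], feats[-1]]                            # S_D: all-(-1) and all-(+1)
excited = [f for f in feats if not any(np.array_equal(f, d) for d in data)]
N_ES = len(excited)
M = 2.0 * n_p                                           # Eq. (8) with |H|, |J| <= 1

c = np.concatenate([data[0], [-1.0], np.zeros(N_ES)])   # minimize E(S^1) - E1
rows, lo, hi = [], [], []
for i, f in enumerate(excited):
    rows.append(np.concatenate([f, [-1.0], M * np.eye(N_ES)[i]]))  # E(S^i)-E1+M*mi <= M
    lo.append(-np.inf); hi.append(M)
    rows.append(np.concatenate([-f, [1.0], np.zeros(N_ES)]))       # E1 <= E(S^i)
    lo.append(-np.inf); hi.append(0.0)
for d in data[1:]:                                                 # E(S^1) = E(S^i)
    rows.append(np.concatenate([data[0] - d, [0.0], np.zeros(N_ES)]))
    lo.append(0.0); hi.append(0.0)
rows.append(np.concatenate([np.zeros(n_p + 1), np.ones(N_ES)]))    # sum_i mi = 1
lo.append(1.0); hi.append(1.0)

res = milp(c,
           constraints=LinearConstraint(np.array(rows), lo, hi),
           integrality=np.r_[np.zeros(n_p + 1), np.ones(N_ES)],    # m is binary
           bounds=Bounds(np.r_[-np.ones(n_p), -M, np.zeros(N_ES)],
                         np.r_[np.ones(n_p), M, np.ones(N_ES)]))
print("theta:", res.x[:n_p].round(3), " band gap:", -res.fun)
# Per the text, the result is accepted only if the recovered band gap is positive.
```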
4.2 Algorithm 2: PEPGSM

In this formulation, only the variable N_GS is provided by the user instead of S_D. This condition adds the complexity of locating the ground states and evaluating the ground state energy E_0(θ). This problem is resolved by including the following auxiliary variables:

• E_0 (real valued scalar): It represents the ground state energy.
• l = [l_1, ..., l_{N_TS}] (binary valued vector of size N_TS): It is defined such that its value is 1 on exactly N_GS indices and 0 everywhere else. An index has value 1 if and only if it corresponds to a ground state.
• E_1 and M as defined in Algorithm 1.
• m = [m_1, ..., m_{N_TS}] (binary valued vector of size N_TS): It is the same as in Algorithm 1, except that the indices are now enumerated over the full set S.

The decision variables in this formulation are given as: x = [θ, E_0, E_1, l, m]^T.

The optimization cost is given as:

Cost = E_0 − E_1

The estimation of E_0 uses the same idea of bounding E_0 from above and below. The bound is tight only for indices where l_i = 1:

−E(S^i) + E_0 ≤ 0,  ∀ i ∈ {1, ..., N_TS}
E(S^i) − E_0 + M l_i ≤ M,  ∀ i ∈ {1, ..., N_TS}
Σ_{i=1}^{N_TS} l_i = N_GS

For the estimation of E_1, the upper bound is lifted on the indices corresponding to ground states. This allows the minimum to be estimated over the non-optimal states. Moreover, the index of the 1st excited state cannot coincide with a ground state, i.e., l_i = 1 and m_i = 1 cannot occur simultaneously. These conditions are imposed using the following inequalities and equations:

−E(S^i) + E_1 − M l_i ≤ 0,  ∀ i ∈ {1, ..., N_TS}
E(S^i) − E_1 + M m_i ≤ M,  ∀ i ∈ {1, ..., N_TS}
l_i + m_i ≤ 1,  ∀ i ∈ {1, ..., N_TS}
Σ_{i=1}^{N_TS} m_i = 1

The condition of binary valued variables is imposed on the integer variables as follows:

0 ≤ l_i, m_i ≤ 1,  ∀ i ∈ {1, ..., N_TS}

This formulation is presented in Box 2 in matrix format.

Optimization cost:
c = [0_{1×(N_V+N_C)}, 1, −1, 0_{1×N_TS}, 0_{1×N_TS}]

Inequality constraints (five rows per i ∈ {1, ..., N_TS}):
A = [ −ε(S^i),  1,  0, 0_{1×N_TS}, 0_{1×N_TS}
       ε(S^i), −1,  0, [0, ..., M (i-th index), ..., 0], 0_{1×N_TS}
      −ε(S^i),  0,  1, [0, ..., −M (i-th index), ..., 0], 0_{1×N_TS}
       ε(S^i),  0, −1, 0_{1×N_TS}, [0, ..., M (i-th index), ..., 0]
      0_{1×(N_V+N_C)}, 0, 0, [0, ..., 1 (i-th index), ..., 0], [0, ..., 1 (i-th index), ..., 0] ],
b = [0, M, 0, M, 1, ...]^T

Equality constraints:
A_eq = [ 0_{1×(N_V+N_C)}, 0, 0, 1_{1×N_TS}, 0_{1×N_TS}
         0_{1×(N_V+N_C)}, 0, 0, 0_{1×N_TS}, 1_{1×N_TS} ],
b_eq = [N_GS, 1]^T

Bounds:
lb = [H_1^min, ..., H_{N_V}^min, J_1^min, ..., J_{N_C}^min, −M, −M, 0_{1×N_TS}, 0_{1×N_TS}]
ub = [H_1^max, ..., H_{N_V}^max, J_1^max, ..., J_{N_C}^max, M, M, 1_{1×N_TS}, 1_{1×N_TS}]

Box 2: MILP formulation for the PEPGSM method
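The PEPGSM assembly differs from Box 1 only in the extra E_0 and l variables and the constraint families above. Below is a sketch of the row construction, reusing the conventions of the PEPDAS example; the `feats` list of ε(S) vectors over all of S is an assumption, and the cost vector is c = [0_{1×(N_V+N_C)}, 1, −1, 0, 0], i.e., minimize E_0 − E_1.

```python
import numpy as np

def pepgsm_rows(feats, n_p, M, N_GS):
    """Constraint rows of Box 2 for x = [theta, E0, E1, l, m]; returns (A, lo, hi)."""
    N_TS = len(feats)
    Z, I = np.zeros(N_TS), np.eye(N_TS)
    rows, lo, hi = [], [], []
    for i, f in enumerate(feats):
        rows.append(np.r_[-f, 1.0, 0.0, Z, Z])                # -E(S^i) + E0 <= 0
        lo.append(-np.inf); hi.append(0.0)
        rows.append(np.r_[f, -1.0, 0.0, M * I[i], Z])         #  E(S^i) - E0 + M li <= M
        lo.append(-np.inf); hi.append(M)
        rows.append(np.r_[-f, 0.0, 1.0, -M * I[i], Z])        # -E(S^i) + E1 - M li <= 0
        lo.append(-np.inf); hi.append(0.0)
        rows.append(np.r_[f, 0.0, -1.0, Z, M * I[i]])         #  E(S^i) - E1 + M mi <= M
        lo.append(-np.inf); hi.append(M)
        rows.append(np.r_[np.zeros(n_p), 0.0, 0.0, I[i], I[i]])  # li + mi <= 1
        lo.append(-np.inf); hi.append(1.0)
    rows.append(np.r_[np.zeros(n_p), 0.0, 0.0, np.ones(N_TS), Z])  # sum li = N_GS
    lo.append(float(N_GS)); hi.append(float(N_GS))
    rows.append(np.r_[np.zeros(n_p), 0.0, 0.0, Z, np.ones(N_TS)])  # sum mi = 1
    lo.append(1.0); hi.append(1.0)
    return np.array(rows), lo, hi
```

Everything else (parameter bounds, integrality of l and m, the solver call) carries over unchanged from the PEPDAS sketch.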
5 Case study: Ising model

In this section, an example is presented to show the efficiency of both algorithms. It is shown by example that the predicted η decays and is bounded. Moreover, the PEPGSM method can predict ground states that provide a higher band gap compared to randomly picked ground states. Next, the computational cost of this method is discussed.

The parametric estimation of the Ising model is presented as an application of this method. In this model, the states take a binary form, i.e., N_L = 2. Traditionally, the labels are denoted as {+1, −1} and the corresponding energy functions are defined as:

U(+1) = +1,  U(−1) = −1
V(+1, +1) = V(−1, −1) = 1,  V(+1, −1) = −1

Therefore, the energy can be effectively written as:

E(S) = Σ_{i=1}^{N_V} H_i s_i + Σ_{k=1}^{N_C} J_k s_{π_1(k)} s_{π_2(k)}    (9)

This model is applied to the 10-vertex Petersen graph with |H_i| ≤ 1 and |J_k| ≤ 1. First, the graph is trained by prescribing up to 4 data states using the PEPDAS method. Next, the graph is trained by prescribing the number of ground states from 1 to 4 using the PEPGSM method. The predicted band gaps are shown in Table 1. It can be observed that the PEPGSM method predicts the same band gap as the PEPDAS method for data sets with a size up to 3. However, for 4 data points, the PEPGSM method can identify ground states that provide a higher band gap. The predicted parameters for a graph with four ground states are shown in Fig. 3. Likelihood estimates are not well defined in the case of the PEPGSM method, as it is not trained using data. However, for comparison, η is estimated using the set of ground states in place of the data set. The results for the negative log-likelihood of the PEPDAS predicted model and the PEPGSM predicted model are shown in Fig. 3(c). As expected, the PEPGSM predicted model performs better than the PEPDAS predicted model in terms of the range of β for which they can be used. The details of the other three models are presented in Appendix B.2.

Algorithm | N_GS = 1 | N_GS = 2 | N_GS = 3 | N_GS = 4
PEPDAS    | 8.0      | 6.0      | 4.0      | 4.0
PEPGSM    | 8.0      | 6.0      | 4.0      | 6.0

Table 1: Predicted maximum band gap for the Petersen graph
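For reference, the following sketch evaluates Eq. (9) on the Petersen graph by brute force. The ferromagnetic parameter choice below (all J_k = −1, H_i = 0) is an assumption, not the paper's optimized parameters; it happens to yield two ground states and a band gap of 6, consistent with the N_GS = 2 column of Table 1.

```python
import itertools
import numpy as np

# Petersen graph: outer 5-cycle, five spokes, inner pentagram (15 edges total).
outer = [(i, (i + 1) % 5) for i in range(5)]
spokes = [(i, i + 5) for i in range(5)]
inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
edges = outer + spokes + inner

H = np.zeros(10)                        # assumed field strengths
J = -np.ones(15)                        # assumed ferromagnetic couplings, |J_k| <= 1

def ising_energy(s, H, J):
    """Eq. (9): E(S) = sum_i H_i s_i + sum_k J_k s_{pi_1(k)} s_{pi_2(k)}."""
    return H @ s + sum(Jk * s[i] * s[j] for Jk, (i, j) in zip(J, edges))

E = np.array([ising_energy(np.array(s, dtype=float), H, J)
              for s in itertools.product((1, -1), repeat=10)])
E0 = E.min()
print("N_GS:", int((E == E0).sum()), " band gap:", E[E > E0].min() - E0)
```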
5.1 Computational complexity

One of the limiting features of these algorithms is that their size grows exponentially with the graph size. An exact count of variables and equations is provided in Table 2. It should be noted that the number of states, N_TS = N_L^{N_V}, is the reason for the large size of the decision variable. The systems of equations and inequalities in both algorithms have large sparse blocks, which provide some computational easing. It should also be noted that the sparsity of the graph G does not give a considerable advantage in the algorithm, as the size of the problem is mainly dictated by the number of labels, N_L, and the number of vertices, N_V.
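The exponential growth is easy to make concrete. The following tabulates the PEPGSM decision-vector size from Table 2 for binary labels on 3-regular graphs, an assumed family that includes the Petersen graph:

```python
# Total PEPGSM variables, N_V + N_C + 2 + 2 * N_TS with N_TS = N_L ** N_V (Table 2).
N_L = 2
for N_V in (5, 10, 15, 20):
    N_C = 3 * N_V // 2                  # a 3-regular graph, e.g. Petersen for N_V = 10
    print(N_V, N_V + N_C + 2 + 2 * N_L ** N_V)
```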
Figure 3: Optimal Ising parameters of the Petersen graph with 4 ground states found using (a) the PEPDAS method, and (b) the PEPGSM method. The ground states are presented as the colored graph in the top right corner of each image. A green label denotes the '+1' state, and a red label denotes the '−1' state. (c) The normalized negative log-likelihood of the optimized graphs.
Quantity                   | PEPDAS              | PEPGSM
Total variables            | N_V + N_C + 1 + N_ES | N_V + N_C + 2 + 2 N_TS
Integer (binary) variables | N_ES                | 2 N_TS
Inequality conditions      | 2 N_ES              | 5 N_TS
Equality conditions        | N_DS                | 2

Table 2: Variable and constraint counts for the PEPDAS and PEPGSM algorithms
6 Conclusion

Two algorithms were developed and analyzed for estimating the parameters of the Potts model. The functionality of each method is as follows:

1. The PEPDAS method estimates the parameters to exactly replicate the ground states as the prescribed data set.
2. The PEPGSM method estimates the parameters to identify ground states based on their prescribed quantity.

Both algorithms maximize the band gap between the ground and excited states of the model. It was shown that models optimized in this manner have a higher probability of being in the ground state for a broader range of temperatures. Upper bounds on the optimized model's performance were also estimated. This efficiency is measured in terms of the range of temperatures for which the ground states' likelihood remains in the desired range. The examples included in the paper show promising practical results on small graphs. As discussed in the main body of the paper, these methods do not scale well with the graph size, and their usage should be restricted to small problems.
The codes are available at https://github.com/sidsriva/PEP
A Proof of theorem
(a) Since S_G(θ_D) = S_D, the negative log-likelihood η(θ_D, β) is estimated as:

η(θ_D, β) = N_GS β E_0 + N_GS log Z

The derivative is estimated as:

dη/dβ = N_GS (E_0 − E(E))    (10)

where

E(E) = Σ_{S ∈ S} E(S) p(S | θ_D, β)

Since ∆E > 0, the expected energy is strictly bounded below as E(E) > E_0. Consequently, dη/dβ < 0. In the low temperature limit, Eq. (2) shows that the probability of each excited state approaches 0 while all ground states are equally likely with probability (N_GS)^{−1}. Therefore, the value of η in this limit is given by Eq. (4).

(b) Let S_G ∈ S_G and P = p(S_G | θ_D, β), so that η(θ_D, β) = −N_GS log P. The probability of occurrence of a ground state is given by N_GS P and that of an excited state by (1 − N_GS P). Moreover, for any finite value of β, both of these probabilities are finite. Since every excited state has energy at least E_1, the expectation of energy can be bounded as

E(E) = N_GS P E_0 + Σ_{S ∈ S_E} E(S) p(S | θ_D, β) ≥ N_GS P E_0 + (1 − N_GS P) E_1

Substituting into Eq. (10),

dη/dβ = N_GS (E_0 − E(E)) ≤ (N_GS P − 1) N_GS ∆E
Substituting P = e^{−η/N_GS} gives the following differential inequality:

dη/dβ ≤ (N_GS e^{−η/N_GS} − 1) N_GS ∆E    (11)

Consider the differential equation for β ∈ [0, ∞),

dξ/dβ = (N_GS e^{−ξ/N_GS} − 1) N_GS ∆E    (12)

with initial condition ξ(θ_D, 0) = η(θ_D, 0) = N_GS log N_TS. This ODE can be integrated in closed form to give the following solution:

ξ(θ_D, β) = N_GS log(N_GS + N_ES e^{−β∆E})    (13)

Using the Comparison Lemma [10], for all 0 < β < ∞,

η(θ_D, β) ≤ ξ(θ_D, β)    (14)

This proves the upper bound. The lower bound is a direct consequence of the monotonicity proved in part (a).

(c) For any β < ∞,

η(θ_D, β) − N_GS log N_GS ≤ N_GS log(1 + (N_ES/N_GS) e^{−β∆E})

For any ε > 0, choose β > β*(ε) using Eq. (6) and observe that

N_GS log(1 + (N_ES/N_GS) e^{−β∆E}) < ε

This proves the third statement.
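The monotonicity of part (a) and the bound of Eq. (13) are easy to verify numerically. The spectrum below is made up, with the data states taken as the two ground states so that S_G(θ_D) = S_D and ∆E > 0:

```python
import numpy as np

# Numerical sanity check of Theorem parts (a) and (b) on an assumed spectrum.
E = np.array([0.0, 0.0, 4.0, 4.0, 4.0, 6.0])          # N_GS = 2, dE = 4
N_GS, N_ES, dE = 2, 4, 4.0

def eta(beta):
    """eta = N_GS * (beta * E0 + log Z), since the data states are the ground states."""
    return N_GS * (beta * E[0] + np.log(np.exp(-beta * E).sum()))

prev = np.inf
for beta in (0.0, 0.5, 1.0, 2.0):
    bound = N_GS * np.log(N_GS + N_ES * np.exp(-beta * dE))   # xi of Eq. (13)
    assert eta(beta) <= bound + 1e-12 and eta(beta) <= prev   # Eq. (14) and part (a)
    prev = eta(beta)
    print(beta, eta(beta), bound)
```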
B Optimized Graphs
B.1 K-3 graph
A fully connected 3-vertex graph is optimized for 4 data states. The energy of the graph is modeled using the Ising model of Eq. (9) with |H_i| ≤ 1 and |J_k| ≤ 1. The optimized parameters using (1) minimization of the negative log-likelihood and (2) the PEPDAS method are presented in Fig. 4.

Figure 4: (a) Training data set of states, with green representing a '+1' state and red representing a '−1' state. (b) Optimized graph using minimization of the negative log-likelihood at β = 1. (c) Optimized graph using the PEPDAS method. The field terms are shown in blue and the interaction terms in red.
A Petersen graph is first optimized for up to 3 user-prescribed data states using the PEPDAS method. Then it is optimized for 3 ground states using the PEPGSM method. The energy of the graph is modeled using the Ising model of Eq. (9) with |H_i| ≤ 1 and |J_k| ≤ 1. The optimized graphs are presented in Fig. 5 and their respective negative log-likelihoods are presented in Fig. 6.

Figure 5: Optimal Ising parameters of the Petersen graph found using the PEPDAS method (left) and the PEPGSM method (right). The ground states are presented as the colored graph in the top right corner of each image. A green label denotes the '+1' state and a red label denotes the '−1' state.
Figure 6: Normalized negative log-likelihood and the respective bounds for Petersen graphs trained using the PEPDAS method
References

[1] François Graner and James A. Glazier. Simulation of biological cell sorting using a two-dimensional extended Potts model. Physical Review Letters, 69(13):2013, 1992.
[2] Mark Miodownik. Monte Carlo Potts model. In Koenraad G.F. Janssens, Dierk Raabe, Ernst Kozeschnik, Mark A. Miodownik, and Britta Nestler, editors, Computational Materials Engineering, pages 47–108. Academic Press, Burlington, 2007.
[3] Yuri Boykov, Olga Veksler, and Ramin Zabih. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222–1239, 2001.
[4] Shai Bagon. Discrete energy minimization, beyond submodularity: Applications and approximations. arXiv preprint arXiv:1210.7362, 2012.
[5] Xavier Descombes, Robin D. Morris, Josiane Zerubia, and Marc Berthod. Estimation of Markov random field prior parameters using Markov chain Monte Carlo maximum likelihood. IEEE Transactions on Image Processing, 8(7):954–963, 1999.
[6] Geoffrey E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.
[7] Asja Fischer and Christian Igel. An introduction to restricted Boltzmann machines. In Iberoamerican Congress on Pattern Recognition, pages 14–36. Springer, 2012.
[8] Steven H. Adachi and Maxwell P. Henderson. Application of quantum annealing to training of deep neural networks. arXiv preprint arXiv:1510.06356, 2015.
[9] Siddhartha Srivastava and Veera Sundararaghavan. Machine learning in quantum computers via general Boltzmann machines: Generative and discriminative training through annealing. arXiv preprint arXiv:2002.00792, 2020.
[10] Hassan K. Khalil. Fundamental properties. In Nonlinear Systems. Prentice Hall, 3rd edition, 2002.