A QUBO Model for Gaussian Process Variance Reduction
Lorenzo Bottarelli, Alessandro Farinelli
Department of Computer Science, University of Verona, Italy
Abstract
Gaussian processes are used in many applications to model spatial phenomena. Within this context, a key issue is to decide the set of locations where to take measurements so as to obtain a better approximation of the underlying function. Current state-of-the-art techniques select such a set so as to minimize the posterior variance of the Gaussian process. We explore the feasibility of solving this problem by proposing a novel Quadratic Unconstrained Binary Optimization (QUBO) model. In recent years the QUBO formulation has gained increasing attention, since it represents the input for the specialized D-Wave quantum annealers. Hence, our contribution takes an important first step towards the sampling optimization of Gaussian processes in the context of quantum computation. The results of our empirical evaluation show that the optimum of the QUBO objective function we derive represents a good solution for the above-mentioned problem. In fact, we are able to obtain results comparable with, and in some cases better than, those of the widely used submodular technique.
1 Introduction

Gaussian processes are a widely used tool in machine learning [1, 2] and provide a statistical distribution together with a way to model an unknown function f. A Gaussian process (GP) defines a prior distribution over functions, which can be converted into a posterior distribution over functions once we have observed some data.

In many spatial analyses, such as environmental monitoring applications, the unknown scalar field of a phenomenon of interest (e.g. the temperature of the environment, or the pH value of water in rivers or lakes [3, 4]) is modeled using a GP. In this context it is necessary to choose a set of locations in space at which to measure the specific phenomenon of interest or, similarly, to choose the placement positions of fixed sensors [5, 6, 7]. However, in both cases the process is usually costly, and one wants to select observations that are especially informative with respect to some objective function. A good choice of sampling locations allows one to obtain a better approximation of the underlying phenomenon.

Research in this context aims at selecting the set of measurements so as to optimize an important sensing quality function for spatial prediction, namely the reduction of the predictive variance of the Gaussian process [8]. Das and Kempe [9] showed that, in many cases, the variance reduction at any particular location is submodular. Submodularity is a property of a specific class of set functions. It encodes an intuitive diminishing-returns property that allows for a greedy forward-selection algorithm, which is widely exploited in the sensing optimization literature [7, 10, 11, 12, 13].

A more recent work aims at simultaneously optimizing sensing locations by minimizing the posterior variance of a GP through the use of a gradient descent algorithm [14]. However, this technique assumes that the space where observations can be made is continuous, and it requires an initialization of the sampling points to be optimized.

In general, selecting the optimal set of measurement locations so as to minimize the posterior variance of a GP is NP-hard, given the combinatorial nature of the problem and the exponential number of candidate solutions. This and many other problems in artificial intelligence and pattern recognition are computationally difficult due to their inherent complexity and the exponential size of the solution space. Quantum information processing could provide a viable alternative to combat such complexity. Notable progress in this direction is represented by the recent development of the D-Wave quantum annealer, whose processor has been designed for the purpose of solving Quadratic Unconstrained Binary Optimization (QUBO) problems. As a consequence, many works in the literature investigate the possibility of using quantum annealing to address hard artificial intelligence and pattern recognition problems by proposing QUBO formulations of them.

Examples include image recognition [15], Bayesian network structure learning [16], fault detection and diagnosis [17], training a binary classifier [18] and portfolio optimization [19, 20, 21]. Moreover, in the context of mathematical logic, Bian et al.
[22] propose a QUBO formulation to tackle the maxSAT problem, an optimization extension of the well-known SAT (Boolean satisfiability) problem [23].

NASA's Quantum Artificial Intelligence Laboratory (QuAIL) team (https://ti.arc.nasa.gov/tech/dash/groups/physics/quail/) hosts one of the D-Wave machines and aims to investigate whether quantum computing can improve the ability to address difficult optimization and machine learning problems in several fields, including NASA's aeronautics, Earth and space sciences, and space exploration missions. The focus of the QuAIL team is both theoretical and empirical investigation of quantum annealing. Biswas et al. [24] review NASA's perspective on quantum computing.

The main contributions of this work are the following:

• We propose a QUBO model to minimize the posterior variance of a Gaussian process.

• We provide a mathematical demonstration that the optimum of our QUBO model satisfies the constraint of the problem.

• We study the performance of the proposed QUBO model with respect to the submodular greedy algorithm and a random selection.
2 Background

2.1 Gaussian Processes

A Gaussian process is a flexible and non-parametric tool that defines a prior distribution over functions, which can be converted into a posterior distribution over functions once we have observed some data. A GP is completely defined by its mean and its kernel function (also called covariance function), which encodes the smoothness properties of the modeled function $f$. Suppose we observe a training set $\mathcal{K} = \{(\mu_i, y_i) \mid i = 1, \dots, K\}$, that is, a set of $K$ measurements $\{y_1, y_2, \dots, y_K\}$ taken at locations $\{\mu_1, \mu_2, \dots, \mu_K\}$. We consider Gaussian processes that are estimated based on a set of noisy measurements. Hence, we assume that $y_i = f(x_i) + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma_n^2)$, that is, observations with additive independent identically distributed Gaussian noise $\epsilon$ with variance $\sigma_n^2$. The posterior mean and variance over $f$ for a test point $x_*$ can be computed as follows [1, 2]:

$$f(x_*) = \mathbf{k}_*^T (K + \sigma_n^2 I)^{-1} \mathbf{y} \qquad (1)$$

$$\sigma^2(x_*) = k(x_*, x_*) - \mathbf{k}_*^T (K + \sigma_n^2 I)^{-1} \mathbf{k}_* \qquad (2)$$

where $\mathbf{k}_* = [k(\mu_1, x_*), \dots, k(\mu_K, x_*)]^T$ and $K = [k(\mu_i, \mu_j)]_{\mu_i, \mu_j \in \mathcal{K}}$. Using the above equations we can update the GP, i.e. our knowledge about the unknown function $f$, based on the information acquired through observations.

Note that the variance computed using Equation 2 does not depend on the observed values $y_i$ but only on the locations $\mu_i$ of the training set. This is an important property of GPs and plays a significant role in our contribution.

The predictive performance of GPs depends exclusively on the suitability of the chosen kernel and its parameters. There are many possible kernel functions to choose from [1, 2]. A common property is that the covariance decreases as the distance between the points grows, so that the prediction is mostly based on nearby locations. A famous and widely used kernel is the squared exponential, also known as the Gaussian kernel:

$$k(\mathbf{a}, \mathbf{b}) = \sigma_f^2 \exp\left(-\frac{(\mathbf{a} - \mathbf{b})^T(\mathbf{a} - \mathbf{b})}{2 l^2}\right) \qquad (3)$$

A kernel function will often have some parameters, for example a length parameter that determines how quickly the covariance decreases with distance. In the squared exponential kernel, $l$ controls the horizontal length scale over which the function varies, and $\sigma_f^2$ controls the vertical variation.
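To make Equations 1–3 concrete, the following is a minimal sketch of how they could be computed with NumPy (not the authors' code; all names are illustrative and the hyperparameter values are arbitrary examples):

```python
import numpy as np

def sq_exp_kernel(a, b, sigma_f=1.0, length=1.0):
    """Squared exponential kernel of Equation 3."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return sigma_f**2 * np.exp(-np.dot(d, d) / (2.0 * length**2))

def gp_posterior(locations, y, x_star, sigma_n=0.1):
    """Posterior mean and variance (Equations 1 and 2) at a test point x_star."""
    K = np.array([[sq_exp_kernel(a, b) for b in locations] for a in locations])
    k_star = np.array([sq_exp_kernel(mu, x_star) for mu in locations])
    A = K + sigma_n**2 * np.eye(len(locations))      # K + sigma_n^2 I
    mean = k_star @ np.linalg.solve(A, np.asarray(y, dtype=float))
    var = sq_exp_kernel(x_star, x_star) - k_star @ np.linalg.solve(A, k_star)
    return mean, var
```

Note that, consistently with the observation above, the variance computation never touches the measured values y: only the sampling locations matter, which is what allows sampling points to be optimized before any measurement is taken.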
2.2 Submodularity

A set function is a function that takes as input a set of elements. Particular classes of set functions turn out to be submodular. A fairly intuitive characterization of a submodular function has been given in [35]:

Definition 1. A function $F$ is submodular if and only if for all $A \subseteq B \subseteq X$ and $x \in X \setminus B$ it holds that:

$$F(A \cup \{x\}) - F(A) \geq F(B \cup \{x\}) - F(B) \qquad (4)$$

This definition captures a concept known as the diminishing-returns property. Informally, we can say that if $F$ is submodular, adding a new element $x$ to a set increases the value of $F$ more if we have fewer elements than if we have more. This property allows for a simple greedy forward-selection algorithm with an optimality bound guarantee [35], which is widely exploited in the sensing optimization literature [7, 10, 11, 12, 13]; a sketch is given below.
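As an illustration of how the diminishing-returns property is exploited, here is a minimal sketch of the greedy forward-selection algorithm (a generic sketch, assuming the set function F is supplied as a Python callable; not tied to any specific library):

```python
def greedy_select(X, F, K):
    """Greedily pick K elements of X, maximizing the marginal gain of F.

    For monotone submodular F, greedy forward selection achieves a
    (1 - 1/e) approximation of the optimal value [35].
    """
    selected = set()
    for _ in range(K):
        # Pick the element with the largest marginal gain F(S + {x}) - F(S)
        best = max((x for x in X if x not in selected),
                   key=lambda x: F(selected | {x}) - F(selected))
        selected.add(best)
    return selected
```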
This concept is of interest to us as it applies directly to Gaussian processes. Specifically, the posterior variance of a Gaussian process belongs to this class of submodular functions. Das and Kempe [9] show that the variance reduction

$$F_x(A) = \sigma^2(x) - \sigma^2(x \mid A) \qquad (5)$$

at any particular point $x$ satisfies the diminishing-returns property: adding a new observation reduces the variance at $x$ more if we have made few observations so far, and less if we have already made many observations.

2.3 Quadratic Unconstrained Binary Optimization (QUBO)

The goal of a Quadratic Unconstrained Binary Optimization problem is to find the assignment of a set of binary variables $z_1, \dots, z_n$ that minimizes a given objective function:

$$O(z_1, \dots, z_n) = \sum_{i=1}^{n} a_i z_i + \sum_{1 \le i < j \le n} b_{i,j} z_i z_j \qquad (6)$$

3 The QUBO model

3.1 Node and edge values

We consider a finite domain $X = \{x_1, \dots, x_{|X|}\}$ of candidate sampling locations and associate a binary variable $z_i$ with each location $x_i$, indicating whether that location is selected. Let $J(A)$ denote the total posterior variance of the GP over the domain after sampling at the set of locations $A \subseteq X$; as noted in Section 2.1, $J$ depends only on the locations in $A$ and not on the measured values. Each node $x_i$ is assigned the value

$$\alpha_i \triangleq J(\{x_i\}) - J(\emptyset) \qquad (9)$$

which is always negative and whose magnitude represents the amount of variance reduction obtainable by sampling at that point. Moreover, the variance reduction obtained by sampling at a pair of locations is strictly smaller than the sum of the reductions obtained at each location individually:

$$J(\emptyset) - J(\{x_i, x_j\}) < \big(J(\emptyset) - J(\{x_i\})\big) + \big(J(\emptyset) - J(\{x_j\})\big) \qquad (10)$$

Accordingly, each edge $(x_i, x_j)$ is assigned the value

$$\beta_{i,j} \triangleq J(\emptyset) + J(\{x_i, x_j\}) - J(\{x_i\}) - J(\{x_j\}) \qquad (11)$$

which penalizes the selection of pairs of locations with overlapping contributions.

Theorem 1. The values $\beta_{i,j}$ are guaranteed to be greater than 0.

Proof. Given Equation 10:

$$J(\emptyset) - J(\{x_i, x_j\}) < \big(J(\emptyset) - J(\{x_i\})\big) + \big(J(\emptyset) - J(\{x_j\})\big)$$
$$J(\emptyset) - J(\{x_i, x_j\}) < 2 J(\emptyset) - J(\{x_i\}) - J(\{x_j\})$$
$$-J(\emptyset) - J(\{x_i, x_j\}) + J(\{x_i\}) + J(\{x_j\}) < 0$$
$$J(\emptyset) + J(\{x_i, x_j\}) - J(\{x_i\}) - J(\{x_j\}) > 0$$

The model described so far does not control the number of selected locations. Given a desired number $K$ of sampling points, it is now important to implement a constraint that guarantees that the correct number is selected. This is described in the next section.

3.2 The constraint

In order to implement a constraint in an unconstrained problem, we have to encode it as penalties in our objective function. For a QUBO model that has to be minimized, any combination that does not satisfy the constraint must have a higher value, which prevents such a configuration from being selected as an optimal solution.

In what follows we describe how we can implement the constraint of our problem in a complete graph. To do so, we divide the description of our implementation into two separate parts, specifically:

1. How to guarantee that exactly K nodes are selected in a zero-value graph (i.e. a weighted graph with zero values on every node and every edge).

2. How to guarantee that the constraint is strong enough when we implement it in a non-zero-value graph.

3.2.1 Selecting K nodes

Here we describe how we can set the values in a complete graph such that exactly K nodes are selected. In order to do so, we have to guarantee that the objective function is minimized if and only if exactly K nodes are selected; in any other case the objective function needs to have a higher value.

Starting from a zero-value graph, to implement the constraint we need to add some values on the nodes and edges. Specifically, we assign the same value $A \in \mathbb{R}$ to every node and the same value $B \in \mathbb{R}$ to every edge. If we select $n$ nodes in a complete graph, this leads to the following sum of values:

$$\frac{Bn(n-1)}{2} + An \qquad (12)$$

If we want to guarantee that exactly K nodes are selected, Equation 12 needs to have its minimum value when $n = K$.
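As a quick numerical illustration (a minimal sketch with arbitrary example values B = 1 and K = 5), one can check that the choice of A derived in Theorem 2 below places the minimum of Equation 12 exactly at n = K:

```python
B, K = 1.0, 5                  # arbitrary example values
A = -B * K + B / 2             # the choice of A derived in Theorem 2

def constraint_energy(n):
    """Value of Equation 12 when n nodes are selected."""
    return B * n * (n - 1) / 2 + A * n

energies = {n: constraint_energy(n) for n in range(11)}
assert min(energies, key=energies.get) == K    # minimum is at n = K = 5
```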
Theorem 2. Given the function $\frac{Bn(n-1)}{2} + An$, for any $B > 0$, if $A = -BK + \frac{B}{2}$ the minimum is at $n = K$.

Proof.

$$\frac{Bn(n-1)}{2} + An = \frac{B}{2}n^2 - \frac{B}{2}n + An$$

With $B > 0$ the function is a parabola with a unique minimum where its derivative vanishes:

$$\frac{\partial}{\partial n}\left(\frac{B}{2}n^2 - \frac{B}{2}n + An\right) = Bn - \frac{B}{2} + A = 0$$

Now we want the derivative to be 0 when $n = K$:

$$BK - \frac{B}{2} + A = 0 \implies A = -BK + \frac{B}{2}$$

This holds for any real $K$, hence also in the restricted case of our problem where $K \in \mathbb{N}$ and $2 \le K \le |X| - 1$.

3.2.2 Guaranteeing the strength of the constraint

In the discussion above we have shown how to implement the constraint in a zero-value graph. However, in our case we have to guarantee that it is satisfied in a complete graph that is already populated with values. In this section we show that the values A and B can be set such that the energy penalties are strong enough to guarantee that the constraint is always satisfied.

Given $A = -BK + \frac{B}{2}$, Equation 12 becomes:

$$\frac{Bn(n-1)}{2} - BKn + \frac{Bn}{2} = \frac{Bn^2}{2} - BKn \qquad (13)$$

We want to analyze how this function increases as we move away from the minimum $n = K$. Specifically, given $l \in \mathbb{N}$, we analyze how this function increases when $n = K + l$ and when $n = K - l$:

$$\left[\frac{Bn^2}{2} - BKn\right]_{n=K+l} - \left[\frac{Bn^2}{2} - BKn\right]_{n=K} = \frac{B(K+l)^2}{2} - BK(K+l) - \left(\frac{BK^2}{2} - BK^2\right) = \frac{Bl^2}{2} \qquad (14)$$

$$\left[\frac{Bn^2}{2} - BKn\right]_{n=K-l} - \left[\frac{Bn^2}{2} - BKn\right]_{n=K} = \frac{B(K-l)^2}{2} - BK(K-l) - \left(\frac{BK^2}{2} - BK^2\right) = \frac{Bl^2}{2} \qquad (15)$$

Hence the energy penalty grows quadratically with the distance $l$ from $K$. Note that the constraint we have built is generic and can be applied to any other problem where a specific number of nodes has to be selected. From the mathematical point of view of a QUBO function, the constraint built so far guarantees that exactly K binary variables need to have value 1 in order to minimize the function. Moreover, every feasible combination (a combination where exactly K binary variables have value 1) starts from the same energy value, as expressed in Equation 16, leaving these solutions to 'compete' for which is the optimal one for a specific instance:

$$\frac{BK(K-1)}{2} - BK^2 + \frac{BK}{2} = -\frac{BK^2}{2} \qquad (16)$$

This holds in a zero-value graph for any $B > 0$. In order to guarantee that the constraint is also satisfied in a non-zero-value graph (i.e. a weighted graph with non-zero weights), we have to select a value of B big enough to overcome the additional 'forces'.

From a practical point of view, since $B \in \mathbb{R}^+$, we could simply set it to a very big number to guarantee that the constraint is satisfied. For our specific problem, however, we show in the next section that it is possible to compute a bound for the value of B.

3.3 A bound for B

Here we want to compute a lower bound for the value of B such that the constraint is satisfied in a generic weighted graph with values computed as presented in Section 3.1. Let us start by making the following considerations:

• The constraint is satisfied if and only if the minimum of the QUBO objective function corresponds to a solution with exactly K nodes selected.

• The strength of the constraint (i.e. the energy penalty) as computed in Equations 14 and 15 is $\frac{Bl^2}{2}$, where $l \in \mathbb{N}$ represents the difference from K in the number of selected points.

• The values $\alpha_i$ computed with Equation 9 are always negative.

• The values $\beta_{i,j}$ computed with Equation 11 are always positive.

Given these considerations, we can compute the strongest "force" that a non-feasible solution applies to deviate from a feasible solution; in other terms, the highest contribution to the QUBO function applied by a configuration that does not satisfy the constraint. Once we have computed this value, we can set B high enough that the energy penalty overcomes this worst case. In what follows we analyze the two possible cases.
Configuration with more nodes selected

Let $\mathcal{A}$ be the set of the $\alpha_i$ values computed with Equation 9, that is, the set of values assigned to the nodes, each representing the amount of variance reduction obtainable by sampling at that point. We define $\mathcal{A}_n$ as the set of the $n$ lowest values of $\mathcal{A}$:

$$\mathcal{A}_n \triangleq \begin{cases} \mathcal{A}_0 = \emptyset \\ \mathcal{A}_n = \mathcal{A}_{n-1} \cup \min(\mathcal{A} \setminus \mathcal{A}_{n-1}) \end{cases} \qquad (17)$$

Since we want to minimize the QUBO objective function, the contribution of the values $\alpha_i$ corresponds to a "force" that tries to add more nodes to the solution, whereas the values $\beta_{i,j}$ oppose the selection of more nodes. For a configuration with $K + l$ nodes selected, the strongest contribution of the forces trying to deviate from a feasible solution with K nodes is the following:

$$\underbrace{\sum_{\alpha_i \in \mathcal{A}_l} |\alpha_i|}_{\text{contribution of the additional nodes}} - \underbrace{\left(\frac{(K+l)(K+l-1)}{2} - \frac{K(K-1)}{2}\right) \beta_{i,j}}_{\text{contribution of the additional edges}} \qquad (18)$$

We can now compute an upper bound for this quantity as follows:

$$\sum_{\alpha_i \in \mathcal{A}_l} |\alpha_i| - \left(\frac{(K+l)(K+l-1)}{2} - \frac{K(K-1)}{2}\right) \beta_{i,j} < \sum_{\alpha_i \in \mathcal{A}_l} |\alpha_i| - 0 \le l\,|\min(\mathcal{A})| \qquad (19)$$

Now, if we want the constraint to be strong enough to overcome any configuration with more than K nodes selected, we need the strength of the constraint computed in Equations 14 and 15 to be higher than the upper bound of Equation 19 on the force applied by non-feasible configurations, that is:

$$\frac{Bl^2}{2} > l\,|\min(\mathcal{A})|$$

which, since the binding case is $l = 1$, is satisfied whenever

$$B > 2\,|\min(\mathcal{A})| \qquad (20)$$

Configuration with fewer nodes selected

Similarly to what we have done above, here we compute a bound on the value B such that the energy penalty of the constraint is strong enough to overcome configurations with fewer than K nodes selected. Let $\mathcal{B}$ be the set of the $\beta_{i,j}$ values computed with Equation 11, that is, the set of the values assigned so far to the edges. We define $\mathcal{B}_n$ as the set of the $n$ highest values of $\mathcal{B}$:

$$\mathcal{B}_n \triangleq \begin{cases} \mathcal{B}_0 = \emptyset \\ \mathcal{B}_n = \mathcal{B}_{n-1} \cup \max(\mathcal{B} \setminus \mathcal{B}_{n-1}) \end{cases} \qquad (21)$$

Since we want to minimize the QUBO objective function, the contribution of the values $\beta_{i,j}$ corresponds to a "force" that tries to remove nodes from the solution, whereas the values $\alpha_i$ oppose the removal of nodes. For a configuration with $K - l$ nodes selected, the strongest contribution of the forces trying to deviate from the feasible solution with K nodes is the following:

$$\underbrace{\sum_{\beta_{i,j} \in \mathcal{B}_m} \beta_{i,j}}_{\text{contribution of the removed edges}} - \underbrace{l\,|\alpha_i|}_{\text{contribution of the removed nodes}} \qquad (22)$$

where $m = \frac{K(K-1)}{2} - \frac{(K-l)(K-l-1)}{2}$ represents the number of extra edges present when K sensors are selected as opposed to $K - l$.
We can now compute an upper bound for this quantity as follows:

$$\sum_{\beta_{i,j} \in \mathcal{B}_m} \beta_{i,j} - l\,|\alpha_i| < \sum_{\beta_{i,j} \in \mathcal{B}_m} \beta_{i,j} \le \left(\frac{K(K-1)}{2} - \frac{(K-l)(K-l-1)}{2}\right) \max(\mathcal{B}) = l\left(K - \frac{l}{2} - \frac{1}{2}\right) \max(\mathcal{B}) < K l^2 \max(\mathcal{B}) \qquad (23)$$

Now, if we want the constraint to be strong enough to overcome any configuration with fewer than K nodes selected, we need the strength of the constraint computed in Equations 14 and 15 to be higher than the upper bound of Equation 23 on the force applied by non-feasible configurations, that is:

$$\frac{Bl^2}{2} > K l^2 \max(\mathcal{B}) \implies B > 2K \max(\mathcal{B}) \qquad (24)$$

Now, given the nature of the problem, and considering that we want to present a general model that works with any possible combination of hyperparameters of the Gaussian process, we cannot infer which of Equations 20 and 24 imposes the bigger value of B. However, for any instance of the QUBO objective function we can easily compute a feasible value of B as follows:

$$B > 2 \max\big(|\min(\mathcal{A})|,\; K \max(\mathcal{B})\big) \qquad (25)$$

3.4 The complete model

The complete model can easily be expressed as a complete weighted graph whose values are the sum of the values described in Section 3.1 and of the constraint explained above. Specifically, the nodes and the edges of the graph have values:

$$a_i \triangleq \alpha_i - BK + \frac{B}{2} \qquad (26)$$

$$b_{i,j} \triangleq \beta_{i,j} + B \qquad (27)$$

To conclude, the final QUBO instance is:

$$O(z_1, \dots, z_{|X|}) = \sum_{i=1}^{|X|} \left(J(\{x_i\}) - J(\emptyset) - BK + \frac{B}{2}\right) z_i + \sum_{1 \le i < j \le |X|} \left(J(\emptyset) + J(\{x_i, x_j\}) - J(\{x_i\}) - J(\{x_j\}) + B\right) z_i z_j \qquad (28)$$
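Putting the pieces together, the following is a minimal sketch of how the coefficients of Equations 26–28 could be assembled (not the authors' code; it assumes a user-supplied function J that returns the total posterior variance of the GP for a given set of sampled location indices, and all names are illustrative):

```python
import numpy as np
from itertools import combinations

def build_qubo(n, J, K):
    """Assemble the QUBO coefficients a_i (Eq. 26) and b_ij (Eq. 27).

    n : number of candidate sampling locations x_1 ... x_n
    J : callable mapping a frozenset of location indices to the total
        posterior variance of the GP (depends only on locations, Eq. 2)
    K : desired number of sampling points
    """
    J0 = J(frozenset())
    # Node values alpha_i = J({x_i}) - J(empty set), always negative (Eq. 9)
    alpha = np.array([J(frozenset({i})) - J0 for i in range(n)])
    # Edge values beta_ij, always positive by Theorem 1 (Eq. 11)
    beta = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        beta[i, j] = J0 + J(frozenset({i, j})) - J(frozenset({i})) - J(frozenset({j}))
    # Lower bound on B from Equation 25 (small epsilon keeps the inequality strict)
    B = 2 * max(abs(alpha.min()), K * beta.max()) + 1e-9
    a = alpha - B * K + B / 2                        # Eq. 26
    b = beta + B * np.triu(np.ones((n, n)), k=1)     # Eq. 27, upper triangle only
    return a, b
```

Minimizing $O(z) = \sum_i a_i z_i + \sum_{i<j} b_{i,j} z_i z_j$ over $z \in \{0,1\}^n$, e.g. on a quantum annealer, then yields a selection of exactly K sampling locations.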
4 Empirical evaluation

The results for the random selection represent the average over 800 experiments (8 combinations of hyperparameters and 100 randomly selected combinations of sampling points).

Figure 1: Remaining variance of the Gaussian process when varying the number K of sampling locations on the dataset with a domain composed of 25 points. The results represent the average over the 8 combinations of hyperparameters used during the experiments.

Figure 2: Posterior variance of the Gaussian process when varying the number K of sampling locations on the dataset with a domain composed of 36 points. The results represent the average over the 8 combinations of hyperparameters used during the experiments.

First of all, we notice that in general the optimized variant of the QUBO model (described in Section 3.5), where we modulate the strength of the quadratic terms by a parameter w, yields much better results than the 'standard' QUBO model (which corresponds to using w = 1), showing that a good tuning of that parameter indeed provides an advantage for our QUBO model. Regarding the w parameter, we refer the reader to the considerations presented in the next section.

Moreover, we can observe that the optimal solutions of our QUBO model (with a tuned w parameter) on the first dataset (Figure 1) are comparable with those of the submodular selection technique when using 2 sampling points, worse with 3, and better in all the other cases. A similar situation holds for the second dataset (Figure 2), where the results are in some cases comparable, in some cases worse, and in the remaining cases better than those of the submodular technique. On average, the solutions obtained with a random selection of sampling points perform worse than both the submodular technique and the QUBO model.

As we can observe for both datasets (Figures 1 and 2), the trend of all the tested techniques is the same: as expected, adding more measurement locations decreases the variance of the Gaussian process. However, the interleaving of the curves representing our QUBO model and the submodular selection shows that the optimum of the QUBO model indeed represents a good solution for the objective function we are approximating.

Conclusions

In this paper we proposed a novel QUBO model to tackle the problem of optimizing sampling locations in order to minimize the posterior variance of a Gaussian process. The strength of this contribution is the proposal of a completely alternative method that can be run on non-classical computing architectures (quantum annealers) and can therefore benefit from research in this field.

Although the w parameter of our model has to be determined empirically, the results show that the optimum of the QUBO objective function represents a good solution for the above-mentioned problem, obtaining results comparable with, and in some cases better than, those of the widely used submodular technique. We believe that our QUBO model takes an important first step towards the sampling optimization of Gaussian processes in the context of quantum computation.

References

[1] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, USA, 2006.

[2] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.

[3] G. Hitz, A. Gotovos, F. Pomerleau, M.-E. Garneau, C. Pradalier, A. Krause, and R. Y. Siegwart. Fully autonomous focused exploration for robotic environmental monitoring. In Robotics and Automation (ICRA), 2014 IEEE International Conference on, pages 2658–2664, May 2014.

[4] Aarti Singh, Robert Nowak, and Parmesh Ramanathan. Active learning for adaptive mobile sensing networks. In Proceedings of the 5th International Conference on Information Processing in Sensor Networks, IPSN '06, pages 60–68, New York, NY, USA, 2006. ACM.

[5] Carlos Guestrin, Andreas Krause, and Ajit Paul Singh. Near-optimal sensor placements in gaussian processes. In Proceedings of the 22nd International Conference on Machine Learning, pages 265–272. ACM, 2005.

[6] Andreas Krause, Ajit Singh, and Carlos Guestrin. Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9(Feb):235–284, 2008.

[7] Andreas Krause, Carlos Guestrin, Anupam Gupta, and Jon Kleinberg. Robust sensor placements at informative and communication-efficient locations. ACM Trans. Sen. Netw., 7(4):31:1–31:33, February 2011.

[8] Andreas Krause, H. Brendan McMahan, Carlos Guestrin, and Anupam Gupta. Robust submodular observation selection. Journal of Machine Learning Research, 9(Dec):2761–2801, 2008.

[9] Abhimanyu Das and David Kempe. Algorithms for subset selection in linear regression. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, pages 45–54. ACM, 2008.

[10] Andreas Krause and Carlos Guestrin. Near-optimal observation selection using submodular functions. In National Conference on Artificial Intelligence (AAAI), Nectar track, July 2007.

[11] T. Powers, J. Bilmes, D. W. Krout, and L. Atlas. Constrained robust submodular sensor selection with applications to multistatic sonar arrays. In , pages 2179–2185, July 2016.

[12] Andreas Krause and Daniel Golovin. Submodular function maximization, 2014.

[13] Vasileios Tzoumas. Resilient Submodular Maximization for Control and Sensing. PhD thesis, University of Pennsylvania, 2018.

[14] Lorenzo Bottarelli and Marco Loog. Gradient descent for gaussian processes variance reduction. In Xiao Bai, Edwin R.
Hancock, Tin Kam Ho, Richard C. Wilson, Battista Biggio, and Antonio Robles-Kelly, editors, Structural, Syntactic, and Statistical Pattern Recognition, pages 160–169, Cham, 2018. Springer International Publishing.

[15] Hartmut Neven, Geordie Rose, and William G. Macready. Image recognition with an adiabatic quantum computer I. Mapping to quadratic unconstrained binary optimization. arXiv preprint arXiv:0804.4457, 2008.

[16] B. O'Gorman, R. Babbush, A. Perdomo-Ortiz, A. Aspuru-Guzik, and V. Smelyanskiy. Bayesian network structure learning using quantum annealing. The European Physical Journal Special Topics, 224(1):163–188, 2015.

[17] Alejandro Perdomo-Ortiz, Joseph Fluegemann, Sriram Narasimhan, Rupak Biswas, and Vadim N. Smelyanskiy. A quantum annealing approach for fault detection and diagnosis of graph-based systems. The European Physical Journal Special Topics, 224(1):131–148, 2015.

[18] Hartmut Neven, Vasil S. Denchev, Geordie Rose, and William G. Macready. Training a binary classifier with the quantum adiabatic algorithm. arXiv preprint arXiv:0811.0416, 2008.

[19] Gili Rosenberg, Poya Haghnegahdar, Phil Goddard, Peter Carr, Kesheng Wu, and Marcos López de Prado. Solving the optimal trading trajectory problem using a quantum annealer. IEEE Journal of Selected Topics in Signal Processing, 10(6):1053–1060, 2016.

[20] Michael Marzec. Portfolio optimization: applications in quantum computing. In Handbook of High-Frequency Trading and Modeling in Finance, pages 73–106. John Wiley & Sons, Inc., 2016.

[21] Davide Venturelli and Alexei Kondratyev. Reverse quantum annealing approach to portfolio optimization problems. arXiv preprint arXiv:1810.08584, 2018.

[22] Zhengbing Bian, Fabian Chudak, William Macready, Aidan Roy, Roberto Sebastiani, and Stefano Varotti. Solving SAT and MaxSAT with a quantum annealer: Foundations, encodings, and preliminary results. arXiv preprint arXiv:1811.02524, 2018.

[23] Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, pages 151–158. ACM, 1971.

[24] Rupak Biswas, Zhang Jiang, Kostya Kechezhi, Sergey Knysh, Salvatore Mandrà, Bryan O'Gorman, Alejandro Perdomo-Ortiz, Andre Petukhov, John Realpe-Gómez, Eleanor Rieffel, Davide Venturelli, Fedir Vasko, and Zhihui Wang. A NASA perspective on quantum computing: Opportunities and challenges. Parallel Computing, 64:81–98, 2017. High-End Computing for Next-Generation Scientific Discovery.

[25] Vadim N. Smelyanskiy, Eleanor G. Rieffel, Sergey I. Knysh, Colin P. Williams, Mark W. Johnson, Murray C. Thom, William G. Macready, and Kristen L. Pudenz. A near-term quantum computing approach for hard computational problems in space exploration. arXiv preprint arXiv:1204.2821, 2012.

[26] E. G. Rieffel, D. Venturelli, B. O'Gorman, M. B. Do, E. M. Prystay, and V. N. Smelyanskiy. A case study in programming a quantum annealer for hard operational planning problems. Quantum Information Processing, 14:1–36, January 2015.

[27] Davide Venturelli, Dominic J. J. Marchand, and Galo Rojo. Quantum annealing implementation of job-shop scheduling. arXiv preprint arXiv:1506.08479, 2015.

[28] Tony T. Tran, Zhihui Wang, Minh Do, Eleanor G. Rieffel, Jeremy Frank, Bryan O'Gorman, Davide Venturelli, and J. Christopher Beck. Explorations of quantum-classical approaches to scheduling a mars lander activity problem.
In AAAI Workshop: Planning for Hybrid Systems, 2016.

[29] Tony T. Tran, Minh Do, Eleanor G. Rieffel, Jeremy Frank, Zhihui Wang, Bryan O'Gorman, Davide Venturelli, and J. Christopher Beck. A hybrid quantum-classical approach to solving scheduling problems. In Ninth Annual Symposium on Combinatorial Search, 2016.

[30] Marcello Benedetti, John Realpe-Gómez, Rupak Biswas, and Alejandro Perdomo-Ortiz. Estimation of effective temperatures in quantum annealers for sampling applications: A case study with possible applications in deep learning. Physical Review A, 94(2):022308, 2016.

[31] Mohammad H. Amin. Searching for quantum speedup in quasistatic quantum annealers. Physical Review A, 92(5):052323, 2015.

[32] Steven H. Adachi and Maxwell P. Henderson. Application of quantum annealing to training of deep neural networks. arXiv preprint arXiv:1510.06356, 2015.

[33] Marcello Benedetti, John Realpe-Gómez, Rupak Biswas, and Alejandro Perdomo-Ortiz. Quantum-assisted learning of graphical models with arbitrary pairwise connectivity. arXiv preprint arXiv:1609.02542, 2016.

[34] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. An introduction to quantum machine learning. Contemporary Physics, 56(2):172–185, 2015.

[35] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming, 14(1):265–294, 1978.