Euclidean correlations in combinatorial optimization problems: a statistical physics approach
Università degli Studi di Milano
Department of Physics
PhD School in Physics, Astrophysics, and Applied Physics
Cycle XXXII

Disciplinary scientific sector: FIS/02

Director of the School: Prof. Matteo Paris
Supervisor of the Thesis: Prof. Sergio Caracciolo

PhD Thesis of: Andrea Di Gioacchino
A.Y. 2019-2020

Acknowledgements
My PhD has been a long journey, full of both beautiful and difficult moments. I would never have been able to arrive here without the help of many people, and I am writing these lines to express my deepest thanks to each of them. First of all, I wish to thank my supervisor, Professor Sergio Caracciolo. I have had the luck of enjoying his precious advice and a fruitful scientific collaboration. I am also grateful to him for the freedom he always granted me, for all the time he spent helping me in many situations, and for his lectures on Conformal Field Theory.

In addition to my supervisor, other people guided me in the stormy waters of the PhD: Professor Luca Guido Molinari, who has been an invaluable mentor in several scientific projects, as well as a wonderful teacher of Random Matrix Theory, and Salvatore Mandrà, who supervised my work during my internship at NASA Ames in California and taught me many things about optimization and quantum computing.

In a sense, the people involved with my PhD can be compared to family members, and two of them had the role of elder brothers: Pietro Rotondo and Marco Gherardi. They have always been present for technical discussions about (the most diverse topics of) physics, but also for advice about schools, conferences, applications for post-docs, and many other important choices I made during my PhD.

At the end of a doctoral course, it is common to have several collaborators. I had the privilege of having many of them also as friends: Riccardo Capelli, Vittorio Erba, Riccardo Fabbricatore, Enrico Malatesta, Alessandro Montoli, Mauro Pastore, German Sborlini, Federica Simonetto. Thank you for all the scientific interactions, but also for all the time spent having fun together.

I also want to thank the QuAIL group at NASA Ames for their kindness and scientific training during my internship, especially Eleanor Rieffel, Davide Venturelli, Gianni Mossi, Norm Tubman, Jeff Marshall, and Eugeniu Plamadeala.

In addition to the scientific staff of the University of Milan and the "Istituto Nazionale di Fisica Nucleare", I also want to thank the administrative and management staff, and in particular Andrea Zanzani.

A few other very special people helped me during my PhD even though they are not experts in statistical physics: my family. They have provided me with constant support and love, and gave me the strength to get through the hardest moments.

Finally, thank you Greta. This (and many other things) would have been impossible without you.

Contents

p-spin model in a magnetic field
A Volume and surface of a sphere
B Calculations for the p-spin spherical model
  B.1 The replicated partition function
  B.2 1RSB free energy
  B.3 Rammal construction
  B.4 1RSB with magnetic field
C Supplemental material to Chapter 3
  C.1 Order statistics
  C.2 Proofs for the traveling salesman problems
    C.2.1 Optimal cycle on the complete bipartite graph
    C.2.2 Optimal cycle on the complete graph: proofs
    C.2.3 Optimal TSP and 2-factor for p < N even
    C.2.4 Second moment of the optimal cost distribution on the complete graph
  C.3 2-factor problem and the plastic constant
    C.3.1 The Padovan numbers
    C.3.2 The recursion on the complete graph
    C.3.3 The plastic constant
D Supplemental material to Chapter 4
  D.1 Time-to-solution
  D.2 Hamming weight example
    D.2.1 Classical annealing
    D.2.2 Quantum annealing
  D.3 Failure of our method for an instance of the minor embedding problem
  D.4 Other methods to find the minimum penalty term weight
  D.5 Proof of the inequality (4.72) for the matching problem

Chapter 1
Why bother with combinatorial optimization problems?
Combinatorial optimization problems (COPs) arise whenever we have a (finite) set of choices and a well-defined way to assign a "cost" to each of them. Given this very general (as well as rough) definition, it should not be surprising that we encounter many COPs in everyday life: for example, when we use Google Maps to find the fastest route to our workplace, or to a restaurant. But we also deal with COPs in much more specialized situations, ranging from the construction of safer investment portfolios to the training of neural networks. Despite their ubiquity, COPs are far from being completely understood. The most impressive example of our lack of knowledge is the so-called "P vs NP" problem, which has puzzled theoretical computer scientists and mathematicians since 1971, when Levin and Cook discovered that the Boolean satisfiability problem is NP-complete [Coo71].

The study of COPs soon attracted the statistical physics community, which in those years was beginning to study spin glasses and the thermodynamics of disordered systems. The connection between COPs and thermodynamics has been clear since the work of Kirkpatrick, Gelatt and Vecchi [KGV83], and it became even stronger when physicists realized that "random" COPs (RCOPs) display phase-transition-like behaviors (the so-called SAT-UNSAT transitions) [KS94]. The application of statistical mechanics techniques to COPs flourished after the seminal paper by Mézard and Parisi [MP85], where they applied the so-called replica method to study typical properties of the random matching problem. Their results, together with those obtained after them, are astonishing and elegant, but they rely heavily on a sort of "mean-field" assumption: the cost of each possible solution of the COP under study is a sum of independent random variables.

Let us be more precise with an example: consider the problem of going from the bottom-left corner of a square city to the opposite one. The possible solutions (or configurations) are the sequences of streets that connect these two corners of the city, and the cost of a possible solution is the total length of the path. In the random version of the problem, one considers an ensemble of cities, each with its own pattern of streets, and a probability distribution over them. In this case one is interested in statistical properties of the ensemble, such as the average cost of the solution, rather than in the cost of the minimum-length path for a specific city (instance). In this example the mean-field approximation would consist in choosing the ensemble so that the length of each street is an independent random variable. By contrast, in the original problem
the
Euclidean structure of the problem introduces correlations between the street lengths, which are completely neglected in the mean-field version of the problem. Euclidean correlations are not the only correlations neglected in mean-field problems: using again the examples given before, the assets that can be used in a portfolio are correlated (for example, shares of two companies in the same business area), and the images used to train a neural network are typically "structured", in contrast with the hypothesis of mean-field problems.

Most of this manuscript will deal with the introduction of Euclidean correlations in RCOPs. We will see that several RCOPs can be analyzed with a well-understood formalism in one spatial dimension, and that this can sometimes be extended, in very non-trivial ways, to two-dimensional problems.

We will also discuss another route toward the solution of COPs that physicists (together with mathematicians and computer scientists) are exploring in these years with intense interest: using quantum computers to solve hard combinatorial optimization problems. Even though the original idea was discussed by Feynman in 1982 [Fey82], many questions are still without an answer. Here we will use Euclidean COPs as a workhorse to analyze some of the open questions of the field.

This manuscript is organized as follows:

• In Chap. 2 we introduce all the necessary formalism to deal with COPs and RCOPs from the statistical mechanics point of view. In particular, we start by defining formally what a COP is, and explaining why the statistical physics framework is a useful point of view to study COPs (Secs. 2.1, 2.2, 2.3). We also briefly review the points of spin glass theory most relevant for our discussion, using the spherical p-spin problem as an example (Secs. 2.4, 2.4.1). Finally, we discuss large deviation theory (again using the spherical p-spin model) as a possible path to go beyond the study of typical-case complexity for RCOPs.

• In Chap. 3 we address the problem of Euclidean correlations in RCOPs. We first discuss why we tackle problems starting from the one-dimensional case (Sec. 3.1.2), then we analyze in detail several problems where our techniques can be used (Secs. 3.2, 3.3, 3.4).

• In Chap. 4 we briefly discuss why quantum computation can be useful to solve COPs (Sec. 4.1), using the famous Grover problem as an example (Sec. 4.1.3). Then we introduce two general quantum-computing algorithms which are used to solve COPs, the quantum adiabatic algorithm (QAA, Sec. 4.2) and the quantum approximate optimization algorithm (Sec. 4.3). In the QAA case, we also analyze the performance of the D-Wave 2000Q quantum annealer on a specific COP, and we use the results obtained to address one of the current problems for the QAA, the so-called parameter setting problem (Sec. 4.2.3).

• Finally, in Chap. 5 we summarize the main results of this work and explore possible future directions to further extend our understanding of COPs with correlations.

Throughout the manuscript we make the effort to relegate the technical details of computations to the appendices whenever possible, to lighten the text and ease the reading. To this end, each main chapter has a corresponding appendix collecting the technical computations.

Chapter 2
Statistical physics for combinatorial optimization problems
Combinatorial optimization problems (COPs) have been addressed with methods coming from statistical physics almost since their introduction. In this chapter we give a concise introduction to both COPs and the statistical physics of disordered systems (spin glass theory), with a focus on the links between these two fields. An important remark is due: both COPs and spin glass theory are deep and well-developed topics, and we do not want (nor would we be able) to give a comprehensive review of them. In fact, we will limit ourselves to introducing the basic notions that we will need here and in the following chapters.
Consider a finite set Ω, that is, |Ω| < ∞, and a cost function C such that

$$ C : \Omega \to \mathbb{R}. \qquad (2.1) $$

The combinatorial optimization problem defined by Ω and C consists in finding the element σ⋆ ∈ Ω such that

$$ \sigma^\star = \operatorname*{arg\,min}_{\sigma \in \Omega} C(\sigma). \qquad (2.2) $$

We will call the set Ω the configuration space, and each element of the configuration space a configuration (of the system). We will also call C the Hamiltonian of the system (sometimes we will use the label H for it, instead of C), and C(σ) the cost or energy of the configuration σ. Notice that we are willingly using a terminology borrowed from the physics (and statistical mechanics) context, but up to this point this is pure appearance. However, as we will see in the following, this choice has deep roots and can lead to extremely useful insights.

Let us now consider an example of a COP. Suppose you and a friend of yours are invited to a bountiful feast. The two of you sit at the table, and then start discussing who should eat what, since each dish comes in a single portion. You therefore assign a "value" to each dish, and try to divide all of them into two equally-valued meals. This is the so-called "integer partitioning" problem: given a set {a_1, …, a_N} of N positive integers, find whether there is a subset A such that the sum of the elements in A is equal to the sum of those not in A (or their difference is 1, if the total sum is odd). More precisely, this is the "decision" version of the problem, that is, the version admitting a yes/no answer. We will see later the importance of decision problems, while here we focus on restating the problem as an optimization one: given our set {a_1, …, a_N}, find the subset A which minimizes the cost function

$$ C(A) = \Big| \sum_{j \in A} a_j - \sum_{j \notin A} a_j \Big|. \qquad (2.3) $$

Therefore, in this case a configuration of the system is the subset A, and its cost is the "unbalance" between the elements of A and those not belonging to A. Also notice that if one can solve the optimization problem, then the solution of the decision problem is readily obtained.

Each COP has some parameters which fully specify it; most of the time they appear inside the cost function. These parameters are, basically, the input of our problem. When the full set of these parameters is given, we say that we have an instance of our COP. For example, an instance of the integer partitioning problem is specified by the set {a_1, …, a_N}.

If we decide to deal with a COP in general, that is, without specifying an instance, we have two choices: we can search for an algorithm that solves our problem for each possible value of the input, or we can try to say something more general about the solutions. The first is the direction (mostly!) taken by computer scientists (we will say something about it later), while physicists (mostly!) prefer to analyze the problem from the second point of view. We will follow this second road, but to do so we have to deal with the fact that the solution depends drastically on the specific instance of the problem.

The way out of this thorny situation consists in defining an ensemble of instances and in giving each of them a certain probability of being selected. Then many interesting quantities can be computed by averaging over this ensemble, so that they no longer depend on any specific instance. For example, let Ω and C be respectively the configuration space and the cost function of a given COP. An instance is specified by the continuous parameters x, so we will have C = C_x and a joint probability p(x) over the parameters (and therefore over the instances).
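To make these definitions concrete, here is a minimal brute-force solver for the optimization version of the IPP with cost (2.3). This is a sketch of ours (the function names are not from the thesis), feasible only for small N, since the configuration space has 2^N elements:

```python
from itertools import product

def partition_cost(a, sigma):
    """Cost (2.3): sigma[i] = +1 if a[i] is put in A, -1 otherwise,
    so the unbalance is |sum_i a_i * sigma_i|."""
    return abs(sum(ai * si for ai, si in zip(a, sigma)))

def solve_ipp_brute_force(a):
    """Exhaustively scan all 2^N configurations and keep the cheapest."""
    best_sigma, best_cost = None, float("inf")
    for sigma in product((+1, -1), repeat=len(a)):
        cost = partition_cost(a, sigma)
        if cost < best_cost:
            best_sigma, best_cost = sigma, cost
    return best_sigma, best_cost

sigma_star, c_star = solve_ipp_brute_force([3, 1, 1, 2, 2, 1])
print(c_star)  # prints 0: e.g. {3, 2} vs {1, 1, 2, 1} is a perfect partition
```

Solving the decision version is then immediate: the answer is yes exactly when the minimum cost is 0 (or 1 for an odd total sum).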
A quantity we will be interested in is the average cost of the solution of our problem, which is given by

$$ \overline{C^\star} = \overline{\min_{\sigma \in \Omega} C_x(\sigma)} = \int dx \, p(x) \min_{\sigma \in \Omega} C_x(\sigma). \qquad (2.4) $$

How do we choose p(x)? In general, we would like to have an ensemble and a p(x) such that the averages over the ensemble are representative of the typical case of our COP. That is, we hope that if we define an ensemble of integer partitioning problems, then our findings will be useful for our banquet problem.

This observation brings us to another important point: on the one hand, we would like to have simple ensembles, where we can carry out as many analytic computations as possible; on the other hand, such ensembles typically describe an oversimplified situation. For example, the standard ensemble defined for integer partitioning is composed of all the possible instances made of N integers from the set {0, 1, …, 2^b − 1} (for a certain parameter b), each of them with the same probability. In practice, this amounts to choosing N integers at random from this set each time we need an instance.

We will say something later about what can be learned from this ensemble of integer partitioning problems, but one can immediately see that our banquet problem is considerably different: when choosing the value of each dish, you will probably introduce a lot of correlations. For example, you could decide to give an ingredient, say sea bass, a high value, and therefore all dishes containing sea bass will have a high, correlated value. And you could (and probably would) do the same with many other ingredients. In other words, there is structure in our instance, which is often difficult to capture with simpler ensembles where each parameter of the problem is uncorrelated with the others. In the following, we will deal extensively with a specific kind of structure, namely the one induced by Euclidean correlations.

2.2 Why statistical physics?

The paradigms of statistical physics, and in particular those of spin glass theory, are particularly suited to deal with RCOPs. There are three main reasons for this, which we now discuss.
A COP is defined by its configuration space Ω and its cost function C, and we are interested in finding its minimum. We introduce a fictitious temperature T and its inverse β = 1/T, and define the partition function of our problem as

$$ Z = \sum_{\sigma \in \Omega} e^{-\beta C(\sigma)}, \qquad (2.5) $$

where the name "partition function" arises from the fact that we are interpreting the cost of a configuration as the energy of a (statistical) physics system. When the temperature is sent to zero, only the solutions of the COP, which minimize C(σ), are relevant in Eq. (2.5). Therefore, in this sense, a COP can be seen as the zero-temperature limit of a statistical physics problem. We can compute many quantities starting from this point of view, but we will mainly be interested in the free energy

$$ F(\beta) = -\frac{1}{\beta} \log Z, \qquad (2.6) $$

since when we send β → ∞ this quantity tends to the cost of the solution of our COP. A useful consequence of this parallelism between low-temperature thermodynamics and COPs is that we can use the well-developed techniques of the first field to address problems in the second. The first successful example of this program is the celebrated simulated annealing algorithm, introduced by Kirkpatrick, Gelatt and Vecchi in [KGV83].

Of course, there is no way we can compute Z and F for a given instance of a realistic (not oversimplified) COP, since both of these quantities depend on the parameters which define our instance. Here the idea of RCOPs comes to our help, and we can connect our formalism with that of disordered systems. We define an average, denoted by an overline, exactly as in Eq. (2.4), and

$$ F_{\mathrm{av}}(\beta) = \overline{F(\beta)} = -\frac{1}{\beta} \overline{\log Z}. \qquad (2.7) $$

The computation of this quantity is at the heart of so-called spin glass theory (see, for example, the books [MPV87, Dot05, Nis01]), and several methods have been devised to deal with this kind of problem.
Later we will review in detail one of these methods, the celebrated replica method.

Before moving to the next section, we want to add an important remark: the average done in Eq. (2.7) is called an average "with quenched disorder", and it is very different from computing the average of the partition function first, $\overline{Z}$, and then taking its logarithm (which is called an average "with annealed disorder"). In general, the difference is that in the annealed case the disorder degrees of freedom are treated on the same footing as the "configurational" degrees of freedom of our system, while in the quenched case the thermodynamic degrees of freedom are only the configurational ones, and the average over the disorder is done after the computation of the partition function. This distinction is very sharp from the COP/RCOP point of view: computing Eq. (2.7) (the quenched case) corresponds to taking many instances of our COP, computing each time the cost of the solution, and then averaging. On the other hand, when we compute the annealed version of Eq. (2.7) (that is, the one with $\log \overline{Z}$ instead of $\overline{\log Z}$) we are solving one single instance of a COP, which in general will be different from the one we started with because of the average operation.

The connection between statistical physics and RCOPs goes beyond the simple fact that we can use methods developed for the former to deal with the latter. This became clear after a first sequence of works [Goe90, CR92, KS94], where it was discovered that a certain COP, the k-SAT problem, when promoted to its random version, exhibits a behavior strongly reminiscent of a statistical-mechanics phase transition. In the k-SAT problem, the input is a sequence of M clauses, in each of which k variables are connected by the logical operation OR (∨). There are N different variables, which can appear inside the M clauses also in negated form. For example, x ∨ y ∨ z is a possible clause of an instance of 3-SAT.
The problem consists in finding an assignment of each variable such that all the clauses return TRUE, or in saying that such an assignment does not exist.

At the beginning of the 90s it was discovered that, defining the ratio α = M/N and giving the same probability to each instance of k-SAT with parameter α, when α < α_c the probability of finding an instance that can be solved goes to one as N → ∞, while for α > α_c this probability goes to zero as N → ∞. This is the so-called SAT-UNSAT transition for the k-SAT problem, and α_c is a quantity which depends on k. Actually, random k-SAT problems with k ≥ 3 display an even richer phenomenology, with further transitions in the structure of the solution space as α grows from zero to α_c.

Other SAT-UNSAT transitions have been found in many problems quite different from k-SAT, for example the Traveling Salesman Problem (TSP) [GW96b], which we will discuss later, and the familiar integer partitioning problem (IPP) [Mer98]. Most of the time the transition is found by extensive numerical experiments, while for the IPP the critical point can be computed analytically. To get a feeling for why this transition happens, we present here an intuitive argument which allows one to obtain the correct transition point. Following Gent and Walsh [GW96a], we consider the IPP where n values are taken from the set {1, …, B}. Given a choice of a subset A, we compute C as in Eq. (2.3) and we notice that C ≤ nB, so we can write it as a sequence of about log₂ n + log₂ B bits. Remember that we want to take the limit n → ∞, so there is no need to be very precise with the number of bits, since we are in any case neglecting sub-dominant terms. Now, the problem has a solution if C = 0 or C = 1; therefore all the bits of C but the last have to be 0 for the problem to admit a solution. This corresponds to ≃ log₂ n + log₂ B constraints on the choice of A.
Let us now suppose that, for a random instance of the problem, each constraint is satisfied independently with probability 1/2, so that a given choice of A has probability 1/(nB) of satisfying all of them. Given that there are 2^n different partitions of n objects, the expected number of partitions which respect all constraints is

$$ \mathbb{E}[N] = 2^{\,n - (\log_2 n + \log_2 B)}. \qquad (2.8) $$

Figure 2.1: Numerical results for the probability that an instance of the integer partition problem with n integers in the set {1, …, 2^b} has a solution, as a function of the ratio n/b. Each point is obtained by randomly extracting 100 instances of the problem, solving them, and computing the fraction of instances with a solution. As we can see, the transition between the two phases takes place at n/b ≃ 1; the estimate b_c ≃ n − log₂(n)/2 accounts for the finite size of the system.

The critical point is given by E[N] = 1, so

$$ \log_2 B = n - \log_2 n, \qquad (2.9) $$

that is, B ≃ 2^n/n to first order. Actually, the approximation of independent constraints used here is correct up to the second order, but the number of bits in C, that is the number of constraints, is overestimated by this simple argument; indeed we have used the maximum value of C, which is a crude approximation of the typical one.

A more formal treatment, giving the same result at first order and the correct one at second order, can be found in the beautiful book of Moore and Mertens [MM11] (chapter 14) or, in a language more familiar to the statistical physics community, in [Mer01]. In Fig. 2.1 we report the result of a numerical experiment showing the SAT-UNSAT phase transition of the IPP.

Another point that COPs and RCOPs have in common with statistical physics is that we can often write COP cost functions as Hamiltonians in which the thermodynamic degrees of freedom are spin variables. For many COPs studied with statistical physics techniques this has actually been the first step. In this case a configuration of the system is given by specifying the state of all the
spins.

Although the rewriting of a COP as a spin problem can be very useful, there is no general procedure, and in many problems with constraints (as we will discuss later) there is often a certain freedom in choosing the spin system. Indeed, the minimum requirement the spin system has to satisfy is that, given its ground state, we can recover the solution of the original COP.

To be more concrete, let us discuss a spin system associated to the familiar IPP. Given a set {a_1, …, a_N}, we can specify a partition A by assigning a spin variable σ_i to each a_i, such that σ_i = 1 (or −1) if a_i belongs (or does not belong) to A. As a function of these new variables, the cost function given in Eq. (2.3) can be written as

$$ C = \Big| \sum_{i=1}^{N} a_i \sigma_i \Big|. \qquad (2.10) $$

We can get rid of the absolute value by defining the Hamiltonian

$$ H = C^2 = \sum_{i,j=1}^{N} J_{ij} \sigma_i \sigma_j, \qquad (2.11) $$

where J_ij = a_i a_j, whose ground states correspond to the solutions of the original instance of the IPP. Starting from this Hamiltonian, the problem has been analyzed in [FF98], where the antiferromagnetic and random nature of the couplings J_ij makes the thermodynamics non-trivial. As a final remark, notice that there are many other Hamiltonians which are good spin models for our IPP, for example

$$ H = C^4 = \sum_{i,j,k,\ell=1}^{N} J_{ijk\ell} \, \sigma_i \sigma_j \sigma_k \sigma_\ell, \qquad (2.12) $$

with J_ijkℓ = a_i a_j a_k a_ℓ. The choice of one model rather than another is driven by the search for the simplest possible one which is well-suited to the techniques we want to use.

2.3 Complexity theory and typical-case complexity

Consider a generic problem, not necessarily a combinatorial optimization one: we have an input and, according to certain rules which specify the problem, we want to get the output, that is, the solution of the problem. Is a certain problem difficult or easy? Can we quantify this difficulty and say that some problems are harder than others? These are deep questions which are not completely understood, and they are the holy grail (in their formalized version) of a branch of science, involving computer science, mathematics and physics, called complexity theory. In this section we want to briefly introduce some concepts from complexity theory that will be relevant in the following, and to elaborate on the differences between the worst-case analysis of a COP and a typical-case analysis, which is the one usually carried out by means of disordered-systems techniques.
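Before turning to decision problems, the spin rewriting of Eqs. (2.10) and (2.11) can be verified numerically. In this small sketch (with function names of our own choosing) we check that H = C² on every configuration of a toy instance:

```python
from itertools import product

def ipp_cost(a, sigma):
    """C of Eq. (2.10): the unbalance |sum_i a_i sigma_i|."""
    return abs(sum(ai * si for ai, si in zip(a, sigma)))

def spin_hamiltonian(a, sigma):
    """H of Eq. (2.11): sum_{ij} J_ij sigma_i sigma_j with J_ij = a_i a_j."""
    n = len(a)
    return sum(a[i] * a[j] * sigma[i] * sigma[j]
               for i in range(n) for j in range(n))

# H = C^2 holds on every spin configuration of this toy instance
a = [3, 1, 4, 2]
assert all(spin_hamiltonian(a, s) == ipp_cost(a, s) ** 2
           for s in product((+1, -1), repeat=len(a)))
```

In particular, the ground states of H (the configurations of minimum energy) are exactly the minimizers of C.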
Let us focus on a specific kind of problem, so-called decision problems. In this case we have a problem and an input, and the output has to be a yes/no answer. The paradigmatic example is the k-SAT problem introduced in Sec. 2.2.2; another example is the definition of the integer partition problem that we gave in Sec. 2.1. A first attempt to measure the "hardness" or complexity of a problem could be made by counting the number of operations needed to solve it. This approach, however, has two drawbacks:

• the complexity of a problem would depend on the algorithm used, while we would like to characterize the problem itself;

• the complexity would depend on the specific instance.

Both problems are solved by introducing the concept of complexity classes. Before defining them, let us address a tricky point: in general, the number of operations needed to solve a problem will increase with the input size. For example, if we have an algorithm to solve the IPP, whatever algorithm it is, we expect to wait longer for the solution if we have a set of N = 100 integers than in the case with N = 10. However, the exact determination of the size of an instance is a subtle point, since there is a certain freedom in defining it. For our purposes, we will always deal with problems that admit a rewriting in terms of spin systems, so we can safely define the number of spins as the size of the instance.

Now we are ready to introduce complexity classes. A problem is said to be "nondeterministic polynomial" (NP), or in the NP complexity class, if, given a configuration of an instance of size N of the problem, the time needed to check whether this configuration is a solution of the problem scales as O(N^α) for N → ∞, where α ∈ ℝ does not depend on the configuration or on the instance.
Here O is the Landau big-O notation: f(x) = O(g(x)) if there are c > 0 and x₀ > 0 such that f(x) ≤ c g(x) for all x > x₀.

For example, the IPP is in NP: given a partition A of N objects, to check whether it is a solution it is sufficient to compute the sum of the |A| objects in A, the sum of the N − |A| remaining objects, and a single operation to compare these two quantities, for a total of O(N) operations.

Many COPs are not so easy to place in the class NP: consider the optimization variant of the IPP, that is, the problem of finding the minimum of the cost function of Eq. (2.3), even if this minimum is not 0 or 1. Given a partition A, we can again compute its cost in O(N) time, but this is not a certificate that this partition is or is not the one with minimum cost. To check that, we would need to solve the whole problem, so the complexity of checking whether a given partition is the solution is the same as that of solving the original problem.

Another very important class is the "polynomial" (P) complexity class. A problem is said to be in P if there exists an algorithm which is guaranteed to solve each instance of size N in a time which scales as O(N^α) for N → ∞, with α ∈ ℝ. This class is defined so that the two issues in our definition of complexity are solved: for a problem to be in P it is sufficient that one polynomial-time algorithm exists, and, for a given algorithm, the complexity is computed on the worst-case instance, that is, the one for which the algorithm needs the most operations to reach the solution.

In 1971, Cook [Coo71] discovered a special property of the SAT problem, a relaxation of k-SAT where the clauses are allowed to be of any size: every other problem in NP can be mapped to SAT in polynomial time, in such a way that if we are able to solve SAT, we can obtain the solution of every other problem in NP with only a polynomial overhead. In particular, if SAT turned out to be in P, every other NP problem would be in P as well.
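The O(N) certificate check for the IPP just described amounts to two sums and one comparison; a sketch of ours:

```python
def is_perfect_partition(a, in_subset):
    """Check in O(N) whether a proposed partition solves the decision IPP:
    the two subset sums must differ by at most 1."""
    sum_in = sum(x for x, keep in zip(a, in_subset) if keep)
    sum_out = sum(x for x, keep in zip(a, in_subset) if not keep)
    return abs(sum_in - sum_out) <= 1

a = [3, 1, 1, 2, 2, 1]
print(is_perfect_partition(a, [True, False, False, True, False, False]))  # prints True: 3+2 = 1+1+2+1
```

The contrast with the optimization version is clear: this function certifies a yes-instance in linear time, but says nothing about whether a given partition has minimum cost.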
After the work of Cook, Karp [Kar72] discovered several other problems with this property (among them, our friend the decision version of the IPP), and many others have been found since then. This class of problems is called NP-complete, and these problems are sometimes referred to as "the hardest problems in NP". The question whether an algorithm solving one of them in polynomial time exists is still open; it is called the "P vs NP" problem, and it is one of the biggest theoretical challenges for today's computer scientists and mathematicians.
Another question remains open: if the decision version of IPP is NP-complete, to which class does the optimization version of IPP belong? The answer is the so-called "NP-hard" complexity class, which contains all those problems which are at least as difficult as NP-complete problems. This means that if we were able to solve the optimization version of IPP in polynomial time, we would also be able to solve the decision IPP in polynomial time, and then all the NP problems in polynomial time.
Fig. 2.2 is a cartoon representing the relations among the various complexity classes discussed here.
There are many other complexity classes, and various refinements of the ideas presented here. For example, we can incorporate in our class definitions the scaling with the instance size of the usage of memory (in addition to the time of computation). These topics are treated in detail in several very good textbooks, for example [Pap03, JG79].
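To make the contrast between solving and verifying concrete, here is a minimal exhaustive solver for the decision IPP (a sketch of my own, not from the thesis): it scans all 2^N subsets, so its worst-case running time is exponential in N, while checking any single candidate takes only O(N):

```python
from itertools import combinations

def ipp_decision_bruteforce(weights):
    """Decide whether `weights` can be split into two equal-sum halves
    by exhaustive search over all subsets: O(2^N) in the worst case."""
    total = sum(weights)
    if total % 2:          # an odd total can never be split evenly
        return False
    n = len(weights)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if 2 * sum(weights[i] for i in subset) == total:
                return True
    return False

print(ipp_decision_bruteforce([4, 5, 6, 7, 8]))  # True: {4,5,6} vs {7,8}
print(ipp_decision_bruteforce([1, 1, 3]))        # False: total is odd
```

No known algorithm avoids this exponential worst case for NP-complete problems, which is the content of the open "P vs NP" question.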
As we have already discussed, a possible point of view consists in working out some typical properties of a COP, and this can be done through the definition of an ensemble of instances and a probability weight for each instance. This is the standard program carried out by physicists, since spin glass techniques are particularly suited for it.
But does this have anything to do with the complexity-theoretical perspective described before? As we have seen, for a given algorithm the running time is computed in the worst-case scenario, that is, on the hardest instance for that particular algorithm. However, it can happen that this difficult instance has very low probability in the ensemble, so that it gives a very small contribution to whatever typical quantity we are computing.
This reasoning could lead us to abandon the idea of describing COPs through their random formulations, but there are also some positive facts about adopting this perspective. For example, the problems that we usually observe in practical applications are far from being those worst-case instances for our algorithm (and, even if they are, we can in principle use a different algorithm for which the hardest instances are different from the typical instances we encounter in practical applications).
This is actually related to a well-known phenomenon, called the self-averaging property, which takes place in many physical systems. In practice, consider a random variable, for example the average cost of the solution of a given RCOP, E. This quantity will depend on the instance size N, and it is said to be self-averaging if its limit for N → ∞ is not a random variable anymore. In other words, we have
$E = \lim_{N\to\infty} \overline{E_N}$  (2.13)
and
$\lim_{N\to\infty} \overline{\left(E_N - E\right)^2} = 0.$
(2.14) As we will see in the following, this happens for some problems and does not happen for others, and it is an indicator telling us whether a large random instance of the problem is well characterized by the typical one.
Notice that even if every quantity of interest of the RCOP (for example, the cost of the solution, the number of solutions and others) is self-averaging, we can still build rare instances, which in our ensemble will have a probability going to zero with their size, in which these quantities are completely different from the typical case. Actually, there is a branch of physics which deals with these rare instances: it is called large deviation theory, and we will briefly discuss it at the end of this chapter, while in most of this work the focus will be on typical properties of RCOPs.
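The self-averaging property can be illustrated with a toy numerical experiment (an illustration of my own, not from the thesis): take a stand-in for an instance-dependent quantity E_N, sample it over many "instances", and watch its sample-to-sample fluctuations shrink as N grows:

```python
import random
import statistics

def sample_cost(N, rng):
    """Toy 'instance cost': the mean of N iid uniform couplings.
    Stands in for an instance-dependent quantity E_N."""
    return sum(rng.random() for _ in range(N)) / N

def fluctuation(N, samples=200, seed=0):
    """Empirical standard deviation of E_N over many instances.
    For a self-averaging quantity this shrinks as N grows."""
    rng = random.Random(seed)
    costs = [sample_cost(N, rng) for _ in range(samples)]
    return statistics.pstdev(costs)

for N in (10, 100, 1000):
    print(N, fluctuation(N))  # decreasing roughly as 1/sqrt(N)
```

Here the decay of the fluctuations follows from the law of large numbers; for genuine RCOP observables self-averaging has to be established case by case, as discussed in the text.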
Figure 2.2: A cartoon highlighting the relations between the complexity classes P, NP, NP-complete and NP-hard, under the hypothesis that P ≠ NP. Inside each class, we wrote the names of some COPs belonging to that class. Problems in the NP-complete class are written in red and are positioned at the border of the NP class, to indicate that if one of them turned out to be in P, then the whole NP class would be in P. Notice that problems such as SAT or IPP are understood in their decision version, and their optimization version is present in the class NP-hard (we use the word "opt" to indicate the optimization version). Finally, a list of the acronyms used: MST → Minimum Spanning Tree, SAT → Satisfiability, IPP → Integer Partition Problem, TSP → Traveling Salesman Problem.
As a final remark, there is another point that we want to mention regarding the worst-case versus typical-case problem. Using the typical-case approach we can locate phase transitions in RCOPs, such as the SAT-UNSAT transition that we discussed in Sec. 2.2.2. It turns out that most of the time the presence of "intrinsically hard instances" is related to the presence of these phase transitions: for example, in the 3-SAT problem we know an algorithm (described in [CO09]) which is guaranteed to find the solution in polynomial time as long as the parameter α defined in Sec. 2.2.2 is such that α < α_r, where α_r is the critical point where the so-called "rigidity" phase transition takes place (see [ACO08]).

Spin glass theory comes into play

As we said several times, the paradigm of statistical mechanics most suited to application to RCOPs is spin glass theory. The two main ingredients which distinguish spin glasses from the "standard" statistical mechanics spin systems are quenched disorder and frustration, two things that we have already met in our general discussion about RCOPs.
• Quenched disorder: the Hamiltonian of our system contains some random variables, and the probability density of these variables is explicitly given. Moreover, the word "quenched" means that the thermodynamics of the system has to be considered after a specific realization of these random variables is chosen. We have already met this in the definition of RCOPs.
• Frustration: consider a spin Hamiltonian (this discussion can be trivially extended to non-binary spins as well) and a random configuration of the degrees of freedom. Now randomly select a spin (or a set of k spins with fixed k) and flip it (or all of them) only if this lowers the total energy of the system, and keep doing that, choosing each time a random spin, until a minimum of the energy is reached.
Repeat the whole experiment many times: start from a random configuration and flip spins at random until a local energy minimum is reached. If it happens that the final state is not always the same, or that the final states are not related by a symmetry of the Hamiltonian, the system is said to be frustrated. For many systems it happens that most of these local minima have an energy different from that of the ground state. In these cases, a frustrated system is such that we cannot find its ground state by local minimization of the energy. For example, it is easy to see that (most of the instances of) the optimization IPP is frustrated, and this situation applies.
The exploration of spin glass theory began with the so-called Edwards-Anderson model [EA75], whose Hamiltonian describes spins arranged on a 2-dimensional square lattice, and it is
$H_{EA} = \sum_{\langle i,j \rangle} J_{ij}\, \sigma_i \sigma_j,$  (2.15)
where σ_i is the i-th spin variable and the brackets mean that the sum has to be performed over nearest neighbors. The J_ij are independent and identically distributed (IID) random variables, and two choices that are often used as probability density are
$p(J_{ij}) = \frac{1}{\sqrt{2\pi J^2}}\, e^{-\frac{(J_{ij}-J_0)^2}{2J^2}}$  (2.16)
(Gaussian disorder) and
$p(J_{ij}) = \frac{1}{2}\,\delta(J_{ij}-1) + \frac{1}{2}\,\delta(J_{ij}+1)$  (2.17)
(bimodal disorder).
There is a simpler but less generic definition of frustration, see for example [Dot05] (chapter 1), based on frustrated plaquettes, that is, closed chains of interactions among spins whose product is negative; however, for many RCOPs this definition is not immediately applicable, so we prefer to stick with the one given here.
As for the ferromagnetic 2-dimensional Ising model, the analytical solution, which basically coincides with the calculation of the partition function, can be difficult (or even impossible) to find, so a good starting point is to consider a mean-field approximation of the problem, which in this case takes the name of "Sherrington-Kirkpatrick" model [SK75]:
$H_{SK} = \sum_{i<j} J_{ij}\, \sigma_i \sigma_j,$  (2.18)
where the J_ij are IID Gaussian random variables whose variance scales with N so as to keep the energy extensive. A closely related model, which we will use as our guiding example, is the p-spin spherical model,
$H_{p} = -\sum_{i_1<\cdots<i_p} J_{i_1\ldots i_p}\, \sigma_{i_1}\cdots\sigma_{i_p}$  (2.19)
(for p > 2, but the p = 2 case is qualitatively different from the others, see for example [KTJ76]); the spin variables are promoted to continuous variables defined on the real axis and subject to the "spherical constraint"
$\sum_{i=1}^{N} \sigma_i^2 = N.$  (2.20)
The interaction strengths are IID random variables and their probability density is
$p(J) = \sqrt{\frac{N^{p-1}}{p!\,\pi}}\; e^{-\frac{J^2 N^{p-1}}{p!}}.$  (2.21)
Notice that the choice of the power of N in p(J) is fixed by the request of extensivity of the annealed free energy; indeed
$\overline{Z} = \int_{-\infty}^{+\infty} \prod_{i_1<\cdots<i_p} \mathrm{d}J_{i_1\ldots i_p}\; p(J_{i_1\ldots i_p})\, Z \cdots$
To start our computation, we will introduce the replica trick in its standard form, that is,
$\overline{\log Z} = \lim_{n\to 0} \frac{\overline{Z^n} - 1}{n}.$  (2.27)
This exact identity is not a trick, so why the name? The real trick is in the way we use it: we will consider integer values of n, so that we can compute $\overline{Z^n}$, which is simply the average over the disorder of the partition function of a system made of n non-interacting replicas of the original one. Once we have computed $\overline{Z^n}$ for integer n, we will try to extend our function analytically to n ∈ R in order to take the n → 0 limit. This is the replica trick, and we will call n the "replica index", or "number of replicas".
We will see that many of the manipulations done to recover a meaningful analytic extension to real values of n are impossible to justify formally, but the whole strategy, sometimes called the "replica method", has been validated by many numerical simulations and, for some problems, also by analytical and rigorous arguments (see [GT02, Gue03]).
The computation of $\overline{Z^n}$ is carried out in Appendix B; the result is
$\overline{Z^n} = e^{nN\log(2\pi)/2} \int DQ\, D\lambda\; e^{-N S(Q,\lambda)},$  (2.28)
where
$S(Q,\lambda) = -\frac{\beta^2}{4}\sum_{a,b} Q_{ab}^p - \frac{1}{2}\sum_{a,b}\lambda_{ab} Q_{ab} + \frac{1}{2}\log\det(\lambda),$  (2.29)
the Q and λ are n × n matrices, Q_aa = 1 for all a, the integral with measure DQ is done over the symmetric real matrices with 1 on the diagonal, and the integral with measure Dλ is done over all symmetric real matrices.
We integrate over the λ matrices by exploiting the saddle-point method, so at the saddle point λ is such that
$\frac{\partial}{\partial \lambda_{ab}} S(Q,\lambda) = 0.$  (2.30)
Therefore, exploiting the formula valid for a generic matrix M,
$\frac{\partial}{\partial M_{ab}} \log\det M = (M^{-1})_{ba},$  (2.31)
we obtain the equation for λ
$Q_{ab} = (\lambda^{-1})_{ab}.$  (2.32)
Putting that back into Eq. (2.29), we obtain
$S(Q) = -\frac{\beta^2}{4}\sum_{a,b} Q_{ab}^p - \frac{1}{2}\log\det Q + \frac{n}{2}.$  (2.33)
The term n/2, combined with the prefactor in Eq. (2.28), gives rise to the contribution −N T S(∞) to the free energy, the same that we already met in the annealed calculation, Eq. (2.25).
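The matrix identity of Eq. (2.31), which is used repeatedly in these saddle-point manipulations, is easy to verify numerically by finite differences (a sanity check of my own, not from the thesis):

```python
import math
import random

def det(M):
    """Determinant by Laplace expansion (fine for small matrices)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def inverse_entry(M, i, j):
    """(M^{-1})_{ij} via the cofactor formula."""
    n = len(M)
    minor = [[M[r][c] for c in range(n) if c != i] for r in range(n) if r != j]
    return (-1) ** (i + j) * det(minor) / det(M)

rng = random.Random(0)
n, eps = 3, 1e-6
# diagonally dominant random matrix, so det(M) > 0 and log is safe
M = [[rng.uniform(1, 2) if i == j else rng.uniform(-0.3, 0.3)
      for j in range(n)] for i in range(n)]

a, b = 0, 2
M_plus = [row[:] for row in M]
M_plus[a][b] += eps
lhs = (math.log(det(M_plus)) - math.log(det(M))) / eps  # d log det M / d M_ab
rhs = inverse_entry(M, b, a)                            # (M^{-1})_{ba}
print(abs(lhs - rhs) < 1e-4)  # True: Eq. (2.31) holds
```

Note that the index transposition (b, a) on the right-hand side matters for non-symmetric M, exactly as Eq. (2.31) states.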
The last step is again a saddle-point integration over the Q variables, and we obtain the free energy density f = F/N,
$f = \lim_{n\to 0}\left[ -\frac{\beta}{4n}\sum_{a,b} Q_{ab}^p - \frac{1}{2n\beta}\log\det Q \right] - T S(\infty),$  (2.34)
where the matrix Q has Q_aa = 1, is symmetric, and the off-diagonal entries are given by the saddle-point equations (we use again Eq. (2.31))
$\frac{\beta^2 p}{2}\, Q_{ab}^{p-1} + (Q^{-1})_{ab} = 0.$  (2.35)
Actually the integral over λ_ab should be done along the imaginary line (with the methods discussed, for example, in Appendix B.1 of [Pel11]); however, at the end of the day this is perfectly equivalent to the usual saddle point, as pointed out in [CS92].

Replica-symmetric ansatz and its failure (for this problem!)

At this point of the discussion, the introduction of n replicas of our system is simply a technical trick, a formal manipulation. Therefore it seems reasonable to impose symmetry among the replicas to deal with Eq. (2.35). In other words, we consider the following ansatz for the matrix Q:
$Q_{ab} = \delta_{ab} + q\,(1-\delta_{ab}),$  (2.36)
that is, Q has 1 on the diagonal entries and q on the off-diagonal entries. The inverse of a matrix of this form is
$(Q^{-1})_{ab} = \frac{1}{1-q}\,\delta_{ab} - \frac{q}{(1-q)\left(1+(n-1)q\right)},$  (2.37)
and from Eq. (2.35) we obtain the equation for q when n → 0:
$\frac{\beta^2 p}{2}\, q^{p-1} - \frac{q}{(1-q)^2} = 0.$  (2.38)
The first observation is that q = 0 is a solution. In this case one obtains for the free energy density
$f_{RS} = -\frac{\beta}{4} - T S(\infty),$  (2.39)
where the subscript RS stands, here and in the following, for "replica symmetric". This is exactly the same result we obtained with the annealed computation, and indeed it is the correct result in the high-temperature regime. This does not happen by chance: if the overlap matrix Q has null off-diagonal entries, the whole replica-trick computation coincides with the annealed one, as can be checked by comparing Eq. (B.4) and Eq. (2.22). However, the annealed calculation and the RS ansatz differ when the temperature is decreased: for T < T_c, Eq.
(2.38) develops another solution, with q ≠ 0. Unfortunately, this solution is not stable: one can compute the Hessian (with the second derivatives of Eq. (2.33), or directly of Eq. (2.34)) at the saddle point and check that its eigenvalues have different signs. This stability problem was first noticed for the Sherrington-Kirkpatrick model in [dAT78], and today it is well known that to go beyond this impasse we need to give up our RS ansatz.

Replicas and pure states

The conceptual error that we made in the previous part of our calculation is to think about replicas as purely "abstract" mathematical objects that we exploited to ease our computation. This idea brought us to the RS ansatz, which turned out to be wrong, since it gives an unstable saddle point below a certain temperature T_c.
Before trying to modify our ansatz, let us introduce some useful concepts for the description of the physics of spin glasses. The first one is the idea of a pure state. A pure state can be defined as a part of the configuration space such that the connected correlation functions decay to zero at large distances. A standard example of pure states are the two ferromagnetic phases of an Ising model in more than one dimension, below the critical temperature: one with positive magnetization, ⟨σ⟩_+ = m > 0, the other with negative magnetization, ⟨σ⟩_− = −m < 0. In this case, the Gibbs measure splits into two components with the same statistical weight (due to the symmetry of the model), so that we have for the thermodynamical average ⟨·⟩
$\langle\cdot\rangle = \frac{1}{2}\langle\cdot\rangle_- + \frac{1}{2}\langle\cdot\rangle_+.$
(2.40) For a saddle point to be stable, in general the eigenvalues have to be all positive, so that the matrix is positive-definite and we are actually at a minimum; however, in this case, for (actually quite nebulous) reasons connected to the limit of vanishing number of replicas, the saddle point would be stable if all the eigenvalues were negative.
In this situation the total magnetization vanishes, ⟨σ⟩ = 0, while the two-point correlation function at large distance is
$\langle \sigma_i \sigma_j \rangle \sim \frac{1}{2}\langle\sigma\rangle_-^2 + \frac{1}{2}\langle\sigma\rangle_+^2 = m^2 \neq 0,$  (2.41)
where we used the fact that ⟨·⟩_− and ⟨·⟩_+ are averages done inside the two pure states.
It can happen that there are more than two pure states, and in this case we have for the thermodynamical average
$\langle A \rangle = \frac{1}{Z}\sum_\sigma e^{-\beta H(\sigma)} A = \frac{1}{Z}\sum_\alpha \sum_{\sigma\in\alpha} e^{-\beta H(\sigma)} A = \sum_\alpha \frac{w_\alpha}{Z_\alpha} \sum_{\sigma\in\alpha} e^{-\beta H(\sigma)} A = \sum_\alpha w_\alpha \langle A\rangle_\alpha,$  (2.42)
where the sum over α runs over all the pure states,
$Z_\alpha = \sum_{\sigma\in\alpha} e^{-\beta H(\sigma)}$  (2.43)
and
$w_\alpha = \frac{Z_\alpha}{Z}.$  (2.44)
Now, we go back to our p-spin spherical model and consider the quantity
$q_{\alpha\beta} = \frac{1}{N}\sum_i \langle\sigma_i\rangle_\alpha \langle\sigma_i\rangle_\beta,$  (2.45)
where α and β are indices labeling pure states, and the angle brackets mean thermodynamical average (possibly, as in this case, done inside a pure state). This quantity is the overlap between pure states, and depends on the specific instance. Now, given the statistical weights of the pure states defined as in Eq. (2.44), we introduce the probability P_J(q) that two pure states, chosen according to their statistical weight, have overlap q:
$P_J(q) = \sum_{\alpha,\beta} w_\alpha w_\beta\, \delta(q - q_{\alpha\beta}).$
(2.46) The index J simply means that this quantity is instance dependent, so we average over the disorder to obtain $P(q) = \overline{P_J(q)}$.
Now, one can prove that [MPV87]
$q^{(k)} = \frac{1}{N^k} \sum_{i_1,\ldots,i_k} \overline{\langle \sigma_{i_1}\cdots\sigma_{i_k}\rangle^2} = \int dq\, P(q)\, q^k.$  (2.47)
These quantities can also be computed by exploiting the replica method. Consider q^{(1)},
$q^{(1)} = \frac{1}{N}\sum_i \overline{\langle\sigma_i\rangle^2}.$  (2.48)
We can insert the identity $1 = \lim_{n\to 0} Z^n$ and write
$q^{(1)} = \frac{1}{N}\sum_i \lim_{n\to 0} \overline{\langle\sigma_i\rangle^2\, Z_J^n},$  (2.49)
where Z_J is the partition function at fixed disorder. Now, in the spirit of the replica trick, we consider n integer. We have
$q^{(1)} = \lim_{n\to 0} \frac{1}{N}\sum_i \overline{\, Z_J^n\, \frac{1}{Z_J}\int D\sigma^a\, \sigma_i^a\, e^{-\beta H[\sigma^a]}\; \frac{1}{Z_J}\int D\sigma^b\, \sigma_i^b\, e^{-\beta H[\sigma^b]}\, } = \lim_{n\to 0} \frac{1}{N}\sum_i \overline{\,\int \left(\prod_{c=1}^n D\sigma^c\right) \sigma_i^a \sigma_i^b\; e^{-\beta\sum_{c=1}^n H[\sigma^c]}\,},$  (2.50)
where we considered $D\sigma^a = \prod_i d\sigma_i^a\, \delta\!\left(N - \sum_i (\sigma_i^a)^2\right)$. Following the same steps used to evaluate the free energy, we obtain
$q^{(1)} = Q^{(SP)}_{ab},$  (2.51)
where necessarily a ≠ b because of the steps done in Eq. (2.50), and SP labels the value of the quantity Q computed at the (correct) saddle point. Notice that Eq. (2.51) makes sense only if Q_ab does not depend on the choice of the replicas a and b. This would have been true if the RS ansatz had been correct. Unfortunately, this is not the case. Therefore, as discussed in [Par83, DY83], we need to average over the contributions of all the (distinct) pairs of replicas, and we finally obtain
$q^{(1)} = \lim_{n\to 0} \frac{1}{n(n-1)} \sum_{a\neq b} Q^{(SP)}_{ab}.$  (2.52)
Consider again our RS ansatz: from the discussion on Q, we have seen that there is a correspondence between the values inside the matrix Q and the (average) properties of the free energy landscape. In particular, the presence of a single variational parameter can be interpreted as the presence of a single pure state: since Q_aa = 1, if we choose two different configurations in this pure state, their average overlap (over the disorder and over the Gibbs measure inside the pure state) is q. Since this picture turned out to be wrong (the corresponding saddle point is unstable), we need to assume the presence of more than one pure state.
However, the simplest possible ansatz in this direction is far from obvious, and it required the deep intuition first provided by Parisi [Par79a]: we assume that there are many pure states of "size" m, and two possible values of the overlap between configurations taken from them, namely q_1 if the two configurations belong to the same pure state and q_0 if they belong to two different pure states. Notice that this interpretation implies q_1 ≥ q_0. This is called the one-step replica-symmetry breaking (1RSB) ansatz, and the corresponding matrix is
$Q = (1-q_1)\, I + (q_1-q_0)\, E + q_0\, C,$  (2.55)
where I is the identity matrix, E is a block-diagonal matrix whose blocks are m × m with all entries equal to 1, and C is a matrix with all entries equal to 1.
By using this form of Q in Eq. (2.34), we find
$-\beta f = \frac{\beta^2}{4}\left(1 - (1-m)q_1^p - m q_0^p\right) + \frac{m-1}{2m}\log(1-q_1) + \frac{1}{2m}\log\!\left(m(q_1-q_0)+1-q_1\right) + \frac{q_0}{2\left(m(q_1-q_0)+1-q_1\right)} + S(\infty).$  (2.56)
The details of this computation are given in Appendix B. The parameters are such that f is minimum, so they can be found by extremizing Eq. (2.56). Notice that the 1RSB ansatz includes the RS one, since taking m = 1 or q_1 = q_0 gives back the RS free energy density.
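The block structure of the 1RSB matrix in Eq. (2.55) can be checked directly at integer n (a small sanity check of my own, not from the thesis): summing the off-diagonal entries reproduces the counting that, after the continuation n → 0, yields the overlap formula used in the text.

```python
def one_rsb_matrix(n, m, q0, q1):
    """Parisi 1RSB matrix: 1 on the diagonal, q1 inside the m x m
    diagonal blocks, q0 elsewhere (n must be divisible by m)."""
    assert n % m == 0
    Q = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                Q[a][b] = 1.0
            elif a // m == b // m:   # same block: same pure state
                Q[a][b] = q1
            else:                    # different blocks: different pure states
                Q[a][b] = q0
    return Q

n, m, q0, q1 = 6, 3, 0.2, 0.7
Q = one_rsb_matrix(n, m, q0, q1)
off_sum = sum(Q[a][b] for a in range(n) for b in range(n) if a != b)
# at integer n: sum_{a != b} Q_ab = n(m-1) q1 + n(n-m) q0
print(abs(off_sum - (n * (m - 1) * q1 + n * (n - m) * q0)) < 1e-12)  # True
```

Dividing this sum by n(n − 1) and continuing to n → 0 gives (1 − m)q_1 + m q_0, which for q_0 = 0 is exactly the order parameter q^{(1)} = (1 − m)q_1 quoted below.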
What we have done, in other words, is to enlarge our ansatz in order to search for new, stable saddle points, in a way suggested by the underlying physical interpretation.
The equation ∂f/∂q_0 = 0 implies that q_0 = 0 is the solution which differs from the unstable RS one below the critical temperature. The other two equations are
$(m-1)\left( \frac{\beta^2 p}{2}\, q_1^{p-1} - \frac{q_1}{(1-q_1)\left(1+(m-1)q_1\right)} \right) = 0$  (2.57)
and
$\frac{\beta^2}{4}\, q_1^p + \frac{1}{2m^2}\log\!\left(\frac{1-q_1}{1+(m-1)q_1}\right) + \frac{q_1}{2m\left(1+(m-1)q_1\right)} = 0.$  (2.58)
The m = 1 solution of the first equation makes the 1RSB ansatz coincide with the RS one, and it is the only solution for T > T_c. For T < T_c (notice that this critical temperature is different from the one where the unstable replica-symmetric solution appears, see Fig. 2.3), another solution with m ≠ 1 appears, and it is actually the one giving the most relevant and stable saddle point. A plot of the situation is given in Fig. 2.3.
Therefore at the critical temperature T_c the system has a phase transition between the paramagnetic phase and the so-called "spin glass" phase, where the order parameter q^{(1)} = (1 − m)q_1 (q^{(1)} is defined in Eq. (2.48) and, because of Eq. (2.52), m and q_1 are the values of the variational parameters at the saddle point) starts to be different from 0.

Figure 2.3: Numerical evaluation of the free energy density of the p-spin model with p = 3, with the various ansätze. Notice that the function plotted is −βf(β) (without the constant term S(∞)/β) at the saddle points obtained by using the various ansätze described in Sec. 2.4.1: the blue curve is given by the annealed computation, the orange curve by the RS ansatz and the red curve by the 1RSB ansatz. Since this function appears in a saddle-point integration (Eq. (2.28)), the correct one is always the smallest. As we can see, the three computations coincide in the high-temperature phase, while for
β > β_c the 1RSB solution becomes the most relevant stable saddle point, while at even larger β the replica-symmetric solution appears, but this is an unstable and irrelevant saddle point.

Spin glass and optimization problems

What we learned with the p-spin spherical model is that when we deal with disorder and frustration, it can happen that the free energy landscape breaks into a plethora of pure states, which are taken into account via an RSB ansatz (we write RSB and not 1RSB because for some other models, for example the Sherrington-Kirkpatrick one, a more sophisticated ansatz, called full replica symmetry breaking, is needed).
The presence of pure states is related to metastable states, that is, groups of configurations separated by free energy barriers which become infinitely high in the thermodynamical limit. In turn, the presence of such metastable states results in the so-called "ergodicity breaking". Intuitively, this means that if the system is in a configuration inside a metastable state, even in the presence of thermal fluctuations (up to a certain temperature) it will stay in the same metastable state "forever", even if there are other regions of the configuration space with the same (or lower) free energy (for a more precise definition, see [VCC+14]). It is then tempting to speculate that such a rough free-energy landscape, signaled by RSB, is connected to the hardness of the corresponding COP. Conversely, it can happen that an NP-hard problem can be solved via the RS ansatz. This could be related to the fact that all the discussions about the energy landscape carried out here concern in fact the typical situation, while a problem is NP-hard even if only one instance is hard (for each known algorithm). In other words, consider an NP-hard COP. To study the thermodynamics of the corresponding disordered system, as already discussed, we need to introduce an ensemble of instances and a probability measure on it, obtaining a RCOP.
Now, it can be that the "hard" instances belong to the ensemble but have zero weight in the thermodynamical limit for a certain choice of the probability measure. Therefore, even if the speculated connection between RSB and NP-hardness is correct, we might not see RSB in the thermodynamics of a problem unless we change our probability measure in a suitable way.

Large deviations

The standard approach of spin glass theory deals only with the average over the disorder (or sometimes also the variance) of some quantities, as we have seen in Sec. 2.4.1 with the p-spin spherical model free energy. On the contrary, the standard perspective of complexity theory is based on the idea of the worst-case scenario.
A possible way to reduce the gap between these two fields is large deviation theory. Basically, as we will see in a minute, large deviation theory (LDT) deals with the non-typical properties of random variables which depend on many other random variables. We will now briefly introduce the basic concepts of LDT, while for a more formal and comprehensive discussion we suggest one of the many good books [Ell07, VCC+14] or the beautiful review [Tou09].
At first sight, the XORSAT problem (a SAT where the clauses use the logic operation XOR instead of OR) could seem a counterexample: it is in the P complexity class, but shows 1RSB when its thermodynamics is studied. However, the tractable problem actually consists in answering the question "does this system admit solutions?", while the optimization problem, "what is the configuration that minimizes the number of FALSE clauses?", is NP-hard. Clearly, the thermodynamics can only say something about the optimization problem, or about the generalized decision problem where we ask (for any given n): "does this system admit a configuration with up to n FALSE clauses?".

Large deviation principle

We introduce now the large deviation principle (LDP). Consider a random variable A_N, which depends on an integer N.
Let p_{A_N}(a) be the probability density of A_N, such that $\int_B p_{A_N}(a)\, da = P(A_N \in B)$ is the probability that A_N takes a value in the set B. We say that a LDP holds for A_N if the limit
$\lim_{N\to\infty} -\frac{1}{N}\log p_{A_N}(a)$  (2.59)
exists, and in that case we introduce the rate function I of A_N as
$\lim_{N\to\infty} -\frac{1}{N}\log p_{A_N}(a) = I(a).$  (2.60)
In other words, in a less precise but more transparent way, we can write
$p_{A_N}(a) \simeq e^{-N I(a)},$  (2.61)
where the meaning of "≃" is given by Eq. (2.60). Sometimes, as we will see, the situations where I = ∞ or I = 0 on an interval are of particular interest. In these cases we say, respectively, that p_{A_N}(a) decays faster than exponentially in N (these are the so-called very large deviations) or that it decays slower than exponentially. LDT essentially consists in taking a random variable of interest and trying to understand whether a LDP holds for it, and what its rate function is.

Recovering the law of large numbers and the central limit theorem

A first comment on the LDP is that it encompasses both the law of large numbers and the central limit theorem. Indeed, consider a set of N IID random variables x_1, ..., x_N with finite mean $\langle x_i \rangle = \bar{x}$ and variance $\langle x_i^2 \rangle - \bar{x}^2 = \sigma^2$. Their empirical average is
$A_N = \frac{1}{N}\sum_{i=1}^N x_i$  (2.62)
and the law of large numbers guarantees that
$\lim_{N\to\infty} P\left(|A_N - \bar{x}| > \epsilon\right) = 0$  (2.63)
for each ε > 0. The central limit theorem, in turn, gives the probability of finding $\sqrt{N}(A_N - \bar{x})$ inside an interval [a, b]:
$\lim_{N\to\infty} P\left(\sqrt{N}(A_N - \bar{x}) \in [a,b]\right) = \frac{1}{\sqrt{2\pi\sigma^2}}\int_a^b dz\; e^{-\frac{z^2}{2\sigma^2}}.$  (2.64)
The analogous statement for the probability density is
$\lim_{N\to\infty} -\frac{1}{N}\log p_{A_N}(a) = \frac{(a-\bar{x})^2}{2\sigma^2}.$  (2.65)
All this can be extended to non-IID random variables, provided that they are not too correlated, but we will use IID random variables to keep things as simple as possible.
Suppose now that a LDP holds for A_N. We then have $p_{A_N}(a) \simeq e^{-N I(a)}.$
(2.66) The only values of a such that $\lim_{N\to\infty} p_{A_N}(a) \neq 0$ are the values a* such that I(a*) = 0. Therefore we recover the law of large numbers by noticing that a* = $\bar{x}$, where $\bar{x}$ is the mean appearing in Eq. (2.63). Moreover, we can expand I around its zero, $\bar{x}$, and obtain
$I(a) = \frac{1}{2} I''(\bar{x})\,(a-\bar{x})^2 + o\!\left((a-\bar{x})^2\right),$  (2.67)
where the "small o" notation means that we are neglecting terms of order $(a-\bar{x})^3$ or less relevant in the limit a → $\bar{x}$. Therefore, we have
$p_{A_N}(a) \simeq e^{-N\left( \frac{1}{2} I''(\bar{x})(a-\bar{x})^2 + o((a-\bar{x})^2) \right)},$  (2.68)
which is the central limit theorem, after identifying $I''(\bar{x}) = 1/\sigma^2$. Notice that this approximation is valid up to $|a - \bar{x}| \sim N^{-1/2}$, while for larger distances from the average one needs to take into account higher terms in the expansion Eq. (2.67).
If one has the full form of I, then the probability of each value of A_N can be computed, also for values very far from the average $\bar{x}$. This is the reason why this field is called large deviation theory, and in this sense we can consider LDT a generalization of the central limit theorem and of the law of large numbers.

The Gärtner-Ellis theorem

But how does one compute rate functions? Unfortunately, there is no general way. However, often the rate function can be computed by means of the Gärtner-Ellis theorem, which in its simplest formulation states the following.
Consider the random variable A_N, where N is an integer parameter. The scaled cumulant generating function (SCGF) is defined as
$\psi(k) = \lim_{N\to\infty} \frac{1}{N}\log\left\langle e^{N k A_N}\right\rangle,$  (2.69)
where k ∈ R and
$\left\langle e^{N k A_N}\right\rangle = \int da\; p_{A_N}(a)\, e^{N k a}.$  (2.70)
If ψ(k) exists and is differentiable for all k ∈ R, then A_N satisfies a large deviation principle, with rate function I given by the Legendre transform of the SCGF, that is,
$I(a) = \sup_{k\in\mathbb{R}} \left( k a - \psi(k) \right).$
(2.71) We will not prove this theorem here, but the interested reader can find the proof, for example, in Ellis' book [Ell07] or in Touchette's review [Tou09].
Here we limit ourselves to some considerations about the SCGF. First of all, its name comes from the fact that
$\left.\frac{\partial^n}{\partial k^n}\psi(k)\right|_{k=0} = \lim_{N\to\infty} N^{n-1} C_n,$  (2.72)
where $\partial^n_k$ denotes n derivatives with respect to k and C_n is the n-th order cumulant of A_N. In particular,
$\left.\frac{\partial}{\partial k}\psi(k)\right|_{k=0} = \lim_{N\to\infty} \langle A_N \rangle$  (2.73)
and
$\left.\frac{\partial^2}{\partial k^2}\psi(k)\right|_{k=0} = \lim_{N\to\infty} N\left( \langle A_N^2\rangle - \langle A_N\rangle^2 \right),$  (2.74)
that is, the first and second derivatives of the SCGF ψ(k) evaluated at k = 0 are, respectively, the mean and the variance (times N) of A_N, in the limit of large N.
The SCGF ψ(k) has some remarkable properties that will be useful in the following:
1. ψ(0) = 0, because of the normalization of the probability measure.
2. The function ψ(k) is convex, as can be proven by using the Hölder inequality:
$\langle XY \rangle \le \left\langle X^{1/p} \right\rangle^{p} \left\langle Y^{1/q} \right\rangle^{q}, \quad 0 \le p, q \le 1, \quad p + q = 1.$  (2.75)
Indeed, we choose $X = e^{p k_1 N A_N}$ and $Y = e^{(1-p) k_2 N A_N}$, so that
$\left\langle e^{[p k_1 + (1-p) k_2] N A_N}\right\rangle \le \left\langle e^{k_1 N A_N}\right\rangle^{p} \left\langle e^{k_2 N A_N}\right\rangle^{1-p};$  (2.76)
we now take the logarithm, divide by N and, since this inequality is valid for all N, we can take the limit N → ∞ to obtain
$\psi\!\left(p k_1 + (1-p) k_2\right) \le p\, \psi(k_1) + (1-p)\, \psi(k_2).$  (2.77)
3. The function ψ(k)/k is a monotonic non-decreasing function, as can be proven by another use of the Hölder inequality: this time we choose $X = e^{k p N A_N}$ and Y = 1.
We now have
$\left\langle e^{k p N A_N}\right\rangle \le \left\langle e^{k N A_N}\right\rangle^{p}$  (2.78)
and, taking the logarithm and dividing by N,
$\frac{1}{N}\log\left\langle e^{k p N A_N}\right\rangle \le \frac{p}{N}\log\left\langle e^{k N A_N}\right\rangle,$  (2.79)
which implies
$\psi(p k) \le p\, \psi(k).$  (2.80)
Since p is an arbitrary number between 0 and 1, we have
$\frac{\psi(p k)}{p} \le \psi(k)$  (2.81)
and, dividing by k, we obtain that the function ψ(k)/k must be non-decreasing.

In this section, we follow Pastore, Di Gioacchino and Rotondo [PDGR19] in their discussion of the large deviations of the p-spin spherical model introduced in Sec. 2.4.1. An interesting relation between large deviations and the replica method (and replica symmetry breaking) is first elucidated, then exploited. We notice that similar techniques can be applied to RCOPs, once they are written as spin glass problems.

Replica trick and large deviation theory

As we have seen, the theory of disordered systems has been mainly developed to describe the average behavior of physical observables, which one hopes to coincide with the typical one (this is true if the physical observable under discussion is self-averaging).
However, as has been argued since the early days of the subject, one can employ spin glass techniques in a more general setting, to estimate probability distributions [TD81] and fluctuations around the typical values [TFI89, CNPV90] of quantities of interest. More recently, Rivoire [Riv05], Parisi and Rizzo [PR08, PR09, PR10b, PR10a] and others [ABM04, NH08, NH09] followed this line of thought, providing a bridge between spin glasses (and disordered systems more in general, as in [MPS19]) and the theory of large deviations. The key quantity providing the bridge is
$G(k) = \lim_{N\to\infty} -\frac{1}{\beta N} \log \overline{Z_N^k},$  (2.82)
where Z_N is the partition function for a system of size N and the bar above quantities denotes the average over disorder.
The argument of the logarithm is the averaged replicated partition function, and $k$ is the so-called replica index. We have changed our notation for the number of replicas to emphasize that here we will not deal only with a vanishing number of replicas. From the viewpoint of large deviation theory, $G(k)$ is simply related to the scaled cumulant generating function (SCGF) of the free energy $f = \lim_{N\to\infty} f_N$ by
$$\psi(k) = \lim_{N\to\infty} \frac{\log \overline{e^{kNf_N}}}{N} = -\beta\, G(-k/\beta). \qquad (2.83)$$
We are interested in obtaining, by using the Gärtner-Ellis theorem, as much information as possible on the full form of the rate function $I(x)$. To do that, one needs to work out the SCGF for finite replica index $k$. This problem is clearly equivalent to determining the full analytical continuation of the averaged replicated partition function from integer to real numbers of replicas $k$, and it was extensively investigated in the early stages of research on disordered systems, in order to understand the manifestation of the (at that time surprising) mechanism of replica symmetry breaking we encountered in Sec. 2.4.1. Since these results are particularly interesting from the more modern large deviation viewpoint, we briefly mention the main ones in the following. Van Hemmen and Palmer [vHP79] were the first to observe that the expression in Eq. (2.82) must be a convex function of the replica index $k$, as we discussed in Sec. 2.5. Shortly after, Rammal [Ram81] added that $\psi(k)/k$ must be monotonic. However, in some situations the replica symmetric (RS) ansatz gives a trial SCGF which is not convex, or such that $\psi(k)/k$ is not monotonic. This problem was analyzed for the first time in the context of the Sherrington-Kirkpatrick model. After Parisi introduced his remarkable hierarchical scheme for replica symmetry breaking, Kondor [Kon83] argued that the full RSB solution was very likely to provide a good analytical continuation of Eq.
(2.82), not only around $k = 0$. These results may be considered nowadays as the initial stage of a program that attempted to give mathematical soundness to the replica method. Although this vast program is mostly unfinished, Parisi and Rizzo realized that the original analysis presented by Kondor is fundamental to investigate the large deviations of the free energy in the SK model. Large deviations have been examined only for a few other spin glass models: Gardner and Derrida discussed the form of the SCGF in the random energy model (REM) in a seminal paper [GD89], and many rigorous results have been established later on [FFM07]; on the other side of the story, Ogure and Kabashima [OK04, OK09a, OK09b] considered analyticity with respect to the replica number in more general REM-like models; Nakajima and Hukushima investigated the p-body SK model [NH08] and dilute finite-connectivity spin glasses [NH09] to specifically address the form of the SCGF for models where one-step replica symmetry breaking (1RSB) is exact. In this section we add one more concrete example to this list, considering the p-spin spherical model. In zero external magnetic field, we will show that the 1RSB calculation at finite $k$ produces a SCGF with a linear behavior below a certain value $k_c$; a nice geometrical interpretation of this fact, dating back to Kondor's work on the SK model [Kon83], is discussed. Accordingly, the rate function is infinite for fluctuations of the free energy above its typical value, which are then more than exponentially suppressed in $N$, giving rise to a regime of very-large deviations.
This happens for several other spin glass problems, as discussed for example in [PR10b], and in many other systems showing extreme value statistics [DM08]. The situation changes dramatically when a small external magnetic field is turned on: the rate function becomes finite everywhere, although highly asymmetric around the typical value, and the very-large deviation feature disappears accordingly. We explain intuitively the reason for this change of regime in light of the geometrical interpretation discussed for the case without magnetic field, and argue that the introduction of a magnetic field could act as a regularization procedure for resolving the anomalous scaling of the large deviation principle for this kind of system.

Large deviations of the p-spin spherical model free energy

We start our analysis from Eq. (2.28). After the integration over the $\lambda$ degrees of freedom, the partition function is (up to finite-size corrections in $N$)
$$\overline{Z_N^k} = \int \mathcal{D}Q\; e^{-N S(Q)}, \qquad (2.84)$$
where
$$S(Q) = -\frac{(\beta J)^2}{4} \sum_{\alpha,\beta=1}^{k} Q_{\alpha\beta}^p - \frac{1}{2} \log \det Q - k\, s(\infty). \qquad (2.85)$$
To evaluate the integral over $Q$ we use again the saddle point method, together with the 1RSB ansatz, which is formulated in terms of the three parameters $(q_0, q_1, m)$ in Eq. (2.55). We compute $S(Q)$ in terms of the 1RSB parameters as discussed in Appendix B, but now we do not take the limit $k \to 0$:
$$S(k; q_0, q_1, m) = -\frac{(\beta J)^2}{4} \left[ k + k(m-1)\, q_1^p + k(k-m)\, q_0^p \right] - \frac{k(m-1)}{2m} \log(\eta_1) - \frac{k}{2m} \log(\eta_2) - \frac{1}{2} \log\!\left( \frac{k q_0}{\eta_2} + 1 \right) - k\, s(\infty), \qquad (2.86)$$
where $\eta_1 = 1 - q_1$ and $\eta_2 = 1 - (1 - m) q_1$ are the two different eigenvalues of the 1RSB matrix $Q$ once we use that $q_0 = 0$ at the saddle point. This functional is evaluated numerically at the saddle point $(\bar q_0, \bar q_1, \bar m)$ of the 1RSB parameters for each value of $k$.
The three parameters take values in the domains $q_0 \in [0, q_1]$, $q_1 \in [0, 1]$, $m \in [1, k]$ (if $k > 1$) or $m \in [k, 1]$ (otherwise), and for $\beta > \beta_c$ one obtains a SCGF $\psi(k)$ which becomes linear below a certain value $k = k_c$. To ease the visualization of this feature, in Fig. 2.4 we plot the function $G(k)/k = S(k; \bar q_0, \bar q_1, \bar m)/(k\beta)$, which, when $\psi(k)$ is linear, becomes a horizontal line intercepting the vertical axis at $f_{typ}$. The figure does not change qualitatively for $p \ge 3$.

Figure 2.4: The function $G(k)/k$ for the $(p = 3)$-spin in zero external magnetic field, for different values of $\beta$. Top-left: at high temperature ($\beta = 1.5$) the 1RSB ansatz coincides with the RS one (blue curve); the solution joins the paramagnetic line (in black) at a point $k_c > 1$, where the function is not differentiable. Top-right: at $\beta = \beta_c$ the non-differentiable point reaches $k_c = 1$ and the curve becomes smooth. Bottom row: for $\beta = 2$ (left) and $\beta = 3$ (right), the 1RSB solution (red curve) departs from the RS one and becomes a straight line for all $k < k_c$, which is the point where the RS function loses its monotonicity. The critical value $k_c$ approaches zero for $\beta \to \infty$.

For the $p = 2$ case, at low temperature the 1RSB ansatz reduces to the RS one (that is, $\bar q_0 = \bar q_1$) for all $k \ge 0$, therefore the typical values of all the thermodynamic quantities are obtained under the RS ansatz; in contrast with the $p \ge 3$ case, $k_c = 0$ for the 2-spin spherical model for $\beta > \beta_c$. Before turning to the evaluation of the rate function, we discuss an interesting geometrical interpretation of the shape of the SCGF. To this aim, let us consider the RS ansatz (that is, Eq. (2.86) with $q_0 = q_1 = q$ and $m = 1$). As we can see in Fig. 2.4, the RS solution (blue curve) is not monotonic for $\beta < \beta_c$.
But as we have seen, $G(k)/k$ has to be a monotonic quantity, and therefore the RS solution can be ruled out. We can check that the 1RSB solution gives a perfectly fine monotonic $G(k)/k$ (red curve in Fig. 2.4), as one could expect, given that this ansatz gives the correct typical free energy for this model. Interestingly, however, exactly the same monotonic curve can be obtained by using a much simpler geometric construction: just consider the RS solution, which is the right one for large $k$, and when $G(k)/k$ starts to be non-monotonic continue with a straight horizontal line (in the $G(k)/k$ vs $k$ plot). This construction actually dates back to Rammal [Ram81] and is discussed in more detail in Appendix B.3. Here we limit ourselves to noticing that the $G(k)/k$ obtained by using the 1RSB ansatz and by the Rammal construction are the same because of the following facts: (i) for $k > k_c$ the 1RSB and RS ansätze coincide ($\bar q_0 = \bar q_1 = q \ne 0$), and $k_c$ is exactly the point where $G(k)/k$ stops being monotonic if one uses the RS ansatz; (ii) from the saddle point equations obtained by extremizing Eq. (2.86) when $k < k_c$, one obtains $\bar q_0 = 0$; (iii) the remaining saddle point equations fix $q_1$ and $m$, and one can see that these equations are identical to those needed to perform the Rammal construction, which fix the point $k_c$ and the parameter $q$ of the RS ansatz.

Rate function and very large deviations

Starting from the SCGF, we perform a numerical Legendre transformation to obtain the rate function according to Eq. (2.71). The result is shown in Fig. 2.5 for different values of $\beta$. The rate function displays the following behavior:

• for $x = f_{typ}$, it vanishes, as expected;

• for $x < f_{typ}$, $I(x)$ is finite, indicating that a regular large deviation principle holds for fluctuations below the typical value. When $\beta > \beta_c$ the SCGF is smooth, so we obtain the rate function via the Gärtner-Ellis theorem.
On the other hand, when $\beta < \beta_c$ the SCGF is not differentiable at a point (see Fig. 2.4), so we are only able to obtain the convex hull of the rate function (see Fig. 2.5);

• for $x > f_{typ}$, $I(x) = +\infty$. This is due to the linear behavior of the SCGF below $k_c$ discussed in the previous section, and it is a signature of an anomalous scaling with $N$ of the rare fluctuations above the typical value.

Figure 2.5: Rate function of the free energy for the $(p = 3)$-spin in zero external magnetic field, for different values of $\beta$. The fluctuations above the typical value correspond to the linear part of the SCGF, so that the Legendre transformation gives an infinite rate function. The fluctuations below the typical value are described by the branch in red. For $\beta < \beta_c$ (left), as the SCGF is not differentiable, we obtain only the convex hull of the true rate function; in the interval $[x^*, f_{typ}]$, where our result gives a straight segment (the part of the curve overlapping the dotted line), the true, unknown rate function is represented by the curve in blue. For $\beta = 2 > \beta_c$ (right) the SCGF is smooth and the Gärtner-Ellis theorem applies.

An ambitious goal would be the identification of the correct behavior with $N$ of these very large deviations. Indeed, a more general way of stating a large deviation principle is
$$P(f_N \in [x, x + dx]) \sim \begin{cases} e^{-a_N I_-(x)}\, dx & \text{if } x \le f_{typ}, \\ e^{-b_N I_+(x)}\, dx & \text{if } x > f_{typ}, \end{cases} \qquad (2.87)$$
where $a_N, b_N \to \infty$ when $N \to \infty$. In other words, the fluctuations resulting in values of $x$ lower than $f_{typ}$ are described by the rate function $I_-(x)$, while those resulting in values larger than $f_{typ}$ have rate function $I_+(x)$, but with a different scaling. In our case we have $a_N \sim N$, so the rate function defined in Eq.
(2.60) can be written as
$$I(x) \sim \begin{cases} I_-(x) & \text{if } x \le f_{typ}, \\ \dfrac{b_N}{N}\, I_+(x) & \text{if } x > f_{typ}, \end{cases} \qquad (2.88)$$
with $b_N/N \to \infty$. For this reason, fluctuations above the typical value are referred to as "very large deviations". The physical explanation of the substantial difference in scaling between the deviations of thermodynamic quantities below and above their typical values resides in the different number of elementary degrees of freedom involved in producing the corresponding fluctuation: while in the first case it is sufficient that only one of the elementary variables assumes an anomalous value below its typical one, the others being fixed, in the second case all the variables have to fluctuate, a joint event whose probability is heavily suppressed with respect to the first one. This argument shows the importance of resolving the anomalous scaling behavior leading to the very large deviations explained above. In general, however, although the Gärtner-Ellis theorem can be extended to find rate functions for large deviation principles with arbitrary speeds $a_N$, $b_N$, we lack techniques to compute the asymptotic scaling of $a_N$ and $b_N$ for large $N$, because additional inputs are needed to calculate the corresponding SCGF with a saddle-point approximation (for some other systems this problem has been solved with ad-hoc methods [ABM04, DM08], while in [PR10b] a method is proposed in the context of the SK model). In the next section we present the main result of our work, which could be useful to study this anomalous kind of fluctuation in other problems as well: through an extension of the replica calculation to the case with an external magnetic field, we are able to numerically check that the very large deviation effect disappears. More in detail, we obtain that with a magnetic field, no matter how small, not only $a_N \sim N$ as before, but also $b_N \sim N$.

p-spin model in a magnetic field

In this section we generalize the previous discussion to the case of non-zero magnetic field.
The Hamiltonian of the model is
$$H = H_p - h \sum_{i=1}^{N} \sigma_i, \qquad (2.89)$$
where $H_p$ is the p-spin Hamiltonian and $h$ represents an external magnetic field coupled to the spins. The computation of the SCGF at $h \ne 0$ goes beyond the approach of the work by Crisanti and Sommers, who only considered the typical case. In contrast to the problem with $h = 0$, where the finite-$k$ calculation consists of a quite straightforward generalization of the standard one, here a more substantial effort is needed to extend the $k = 0$ result. The derivation is quite technical; therefore, to emphasize the discussion about the large deviations of the free energy, we report here only the final expression we obtained for the SCGF, postponing the details to Appendix B.4. The functional in the 1RSB ansatz, for finite $k$, is
$$S(k; q_0, q_1, m) = -\frac{(\beta J)^2}{4}\left[ k + k(m-1)\, q_1^p + k(k-m)\, q_0^p \right] - \frac{k \hat q_-}{2\,(\eta_3 - k \hat q_-)} - \frac{k(m-1)}{2m} \log(\eta_1) - \frac{k}{2m} \log(\eta_2) - \frac{1}{2} \log\!\left( \frac{k (q_0 - \hat q_-)}{\eta_2} + 1 \right) - \frac{(\beta h)^2}{2}\, \frac{k}{\eta_3 - k \hat q_-} - k\, s(\infty), \qquad (2.90)$$
where $\hat q_-$ depends on the combination $\beta h$ and on the parameters of the 1RSB ansatz (its full form is given in Eq. (B.31) of Appendix B.4), and now $\eta_1 = 1 - q_1$, $\eta_2 = 1 - (1 - m) q_1 - m q_0$ and $\eta_3 = 1 - (1 - m) q_1 - (m - k) q_0$ are the three eigenvalues of $Q$ (we no longer have $q_0 = 0$). Again, we numerically compute and plot $G(k)/k = S(k; \bar q_0, \bar q_1, \bar m)/(k\beta)$ in Fig. 2.6, where again $\bar q_0, \bar q_1, \bar m$ are the solutions of the saddle point equations, obtained by extremization of Eq. (2.90). The most striking feature of these plots is the difference from those shown in Fig. 2.4: all the horizontal lines disappear, and their place is taken by curves (again given by the 1RSB ansatz) with non-null derivative. Let us analyze more closely what is happening and why the external magnetic field changes the behavior of the system. As discussed in the last part of Sec.
2.5.1, one can apply the Rammal construction to correct the non-monotonic behavior of the RS version of $G(k)/k$ (plotted as a blue curve in Fig. 2.6). Exactly as in the $h = 0$ case, the resulting function will be monotonic and will have a horizontal line, which is the smooth continuation of $G(k)/k$ from $k_m$, the point where it loses monotonicity. However, as one can see from Fig. 2.6, the result will not be the 1RSB solution. This difference from the $h = 0$ case can be seen as a consequence of the saddle point equations: now the equation for $q_0$ is non-trivial, and so $\bar q_0$, $\bar q_1$ and $\bar m$ all depend on $k$ also in the 1RSB phase, giving rise to the non-trivial behavior of $G(k)/k$ also for $k < k_c$. Notice that another interesting feature appears: when $h = 0$, $k_c$, the point where the 1RSB solution becomes different from the RS one, coincides with $k_m$, the point where the $G(k)/k$ obtained with the RS ansatz loses monotonicity. With $h \ne 0$ we have $k_c > k_m$ for $\beta > \beta_c$, that is, the 1RSB branch departs from the RS one before (coming from large $k$) the point where $G(k)/k$ starts to be non-monotonic. Finally, we numerically checked that the shape of $G(k)/k$ below $k_c$ depends on $p$. This change in the SCGF has, in turn, an important effect on the rate function: taking the numerical Legendre transformation of the SCGF we now obtain a continuous curve, meaning that the very rare fluctuations have disappeared, see Fig. 2.7. In other words, the two quantities $a_N$ and $b_N$ introduced in Eq. (2.87) are now such that $a_N \sim N$ and $b_N \sim N$. This effect is present also for very small magnetic fields, even though $I(x)$ becomes more and more asymmetric around $x = f_{typ}$ as we decrease $h$. This observation leads to a natural question, which for now remains open: can this effect be exploited to obtain insights on the very large fluctuations, that is, how are they suppressed with the system size? And what is the corresponding (finite) rate function?
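The numerical Legendre transformation used throughout this section can be sketched in a few lines: the rate function is obtained as $I(x) = \sup_k [kx - \psi(k)]$ on a finite grid of $k$ values. The SCGF used below is an illustrative toy choice (a Gaussian observable, for which $\psi(k) = \mu k + \sigma^2 k^2/2$ and the exact rate function $I(x) = (x-\mu)^2/(2\sigma^2)$ is known), not the p-spin SCGF discussed above.

```python
# Numerical Legendre-Fenchel transform I(x) = sup_k [k*x - psi(k)],
# evaluated on a finite grid of k values, as used to obtain the rate
# function from the SCGF via the Gartner-Ellis theorem.
# The SCGF below is an illustrative toy example (Gaussian observable),
# NOT the p-spin SCGF of the text.

def legendre_transform(psi, xs, k_grid):
    """Return I(x) = max_k [k*x - psi(k)] for each x, on a discrete k grid."""
    return [max(k * x - psi(k) for k in k_grid) for x in xs]

mu, sigma2 = -1.0, 0.5
psi = lambda k: mu * k + 0.5 * sigma2 * k ** 2

k_grid = [i * 0.01 for i in range(-1000, 1001)]   # k in [-10, 10]
xs = [-1.5, -1.0, -0.5]
rate = legendre_transform(psi, xs, k_grid)

# For this Gaussian case the exact rate function is (x - mu)^2 / (2 sigma2):
# it vanishes at x = mu, as the rate function must at the typical value.
```

For a non-smooth SCGF (the $\beta < \beta_c$ case above) the same supremum returns the convex hull of the rate function, which is why only the convex hull is accessible there.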
Figure 2.6: The function $G(k)/k$ for the $(p = 3)$-spin in a magnetic field $h = 0.2$, for different values of $\beta$: $\beta < \beta_c(h)$ (left) and $\beta = \beta_c(h = 0) > \beta_c(h)$ (right). The application of a magnetic field washes out the linear behavior at small $k$ observed in zero magnetic field.

Figure 2.7: Rate function of the free energy for the $(p = 3)$-spin at $\beta = 3$, for different values of the external magnetic field. The infinite branch of the rate functions in Fig. 2.5 is replaced by a curve gradually less steep as the magnetic field is increased.

Chapter 3

In practice: from mean-field to Euclidean problems

From now on we will deal with a very specific class of COPs, the so-called Euclidean problems. The main characteristics of these problems are:

• an instance is specified by giving the positions of a certain number of points in a subspace (often compact) of $\mathbb{R}^d$;

• the cost function depends on the distances between pairs of these points;

• each of these problems allows for a natural definition in terms of a problem on a graph.

We will deal with certain specific problems, that is, the matching and assignment problems, the traveling salesman problem and the 2-factor problem. In all these cases, the RCOP version of these problems will be defined by considering a hypercube of side 1 and a certain factorized probability density for the point positions, which will then be IID random variables. The quantity of interest, which in our case is the cost of the solution, will therefore be averaged over the point positions. All these problems can also be studied in the so-called mean-field approximation, where instead of drawing the points according to a probability density and computing the distances, one directly chooses a probability density for the distances.
If this probability density is factorized, each distance is an IID random variable, and in this way we neglect the correlations among the distances. Notice that, on the contrary, these correlations emerge from the Euclidean structure of the space when we compute the distances after having drawn the points, even if the points are chosen in an IID way. We will often refer to these mean-field results to make comparisons with our finite-dimensional results, and also because under the mean-field approximation the replica method can (most of the time) be carried out to obtain the quantity of interest. On the other hand, in genuine Euclidean problems the emergence of the aforementioned correlations prevents us from successfully applying replica methods. To overcome this technical issue, we will deal with problems in a low number of dimensions $d$ ($d = 1$ and, when possible, $d = 2$), since they are simpler, and we will focus on the search for a way to obtain the average cost of the solution without using the replica method.

Here we introduce the key concepts about graphs that we will use profusely in the following. Let us start with the definition of a graph: given a set $A$ of labels (typically $A = \mathbb{N}$), a graph $G$ is specified by two sets, the vertex set $V \subset A$ and the edge set $E \subset A \times A \times \cdots \times A = A^\ell$, and we say that $G = (V, E)$; the elements of $V$ are the vertices or nodes of $G$ and the elements of $E$ are the edges or links of $G$. Multilinks (or multiple edges) are two or more edges connecting the same points. A self-loop is an edge in which the same point appears more than once. A graph without multilinks and self-loops is called a simple graph. From now on, we will always consider simple graphs with $E \subset A \times A$ (that is, $\ell = 2$). A graph is said to be undirected if the following holds: given an edge $(i, j) \in E$, then $(j, i) \in E$ (or, alternatively, the edges are unordered pairs of vertices).
As a further restriction, we will deal only with undirected graphs. It is customary to represent $G$ as a collection of points, which correspond to the vertices, and lines, which correspond to the edges, such that between vertices $v_i$ and $v_j$ there is a line if and only if $(v_i, v_j) \in E$. We will say that a graph is weighted if there is a weight $w_{ij} \in \mathbb{R}$ associated to each link $\epsilon_{ij} = (i, j)$. Two vertices are said to be adjacent if there is a link connecting them. Given a vertex $i$, the neighborhood of $i$, which we will denote by $\partial i$, is the set of all vertices adjacent to $i$. The number of vertices adjacent to $i$, $|\partial i|$, is called its degree. We introduce the adjacency matrix $A$ of a graph:
$$A_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E; \\ 0 & \text{otherwise}. \end{cases} \qquad (3.1)$$
Notice that for undirected graphs $A$ is symmetric. We define the Laplacian matrix $L$ of a graph:
$$L_{ij} = \begin{cases} -1 & \text{if } i \ne j,\ (i, j) \in E; \\ \sum_{l \ne i} A_{il} & \text{if } i = j; \\ 0 & \text{if } i \ne j,\ (i, j) \notin E. \end{cases} \qquad (3.2)$$
When the graphs are weighted, we define the weighted adjacency matrix as
$$A_{ij} = \begin{cases} w_{ij} & \text{if } (i, j) \in E; \\ 0 & \text{otherwise} \end{cases} \qquad (3.3)$$
and the weighted Laplacian matrix as
$$L_{ij} = \begin{cases} -w_{ij} & \text{if } i \ne j,\ (i, j) \in E; \\ \sum_{l \ne i} w_{il} & \text{if } i = j; \\ 0 & \text{if } i \ne j,\ (i, j) \notin E, \end{cases} \qquad (3.4)$$
where $w_{ij}$ is the weight associated to the edge $(i, j)$. A basic notion is that of a walk, that is, an alternating series of vertices and edges such that two consecutive vertices are linked by the interleaving edge. Pictorially, this is indeed a "walk" on the graphical representation of the graph. When the vertices and edges are all different, the walk is called a path. A graph is connected if there is a path connecting each pair of vertices, and it is said to be disconnected otherwise. The length of a path is its number of vertices, and the distance between two vertices is the length of the shortest path connecting them. If such a path does not exist, we say that the distance is infinite.
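The matrix definitions above translate directly into code. The following minimal sketch (the small example graph is an illustrative choice) builds the adjacency and Laplacian matrices of Eqs. (3.1) and (3.2) and computes graph distances by breadth-first search; note that the BFS distance counts edges, which differs by one from the vertex-counting convention for path lengths used in the text.

```python
# A minimal sketch of the graph-matrix definitions above: adjacency
# matrix A (Eq. 3.1), Laplacian L (Eq. 3.2), and graph distances
# computed by breadth-first search. The example graph is illustrative.
from collections import deque

def adjacency(n, edges):
    A = [[0] * n for _ in range(n)]
    for i, j in edges:            # undirected: store both orientations
        A[i][j] = A[j][i] = 1
    return A

def laplacian(A):
    n = len(A)
    # L_ii = degree of vertex i, L_ij = -1 if (i,j) is an edge, 0 otherwise
    return [[sum(A[i]) if i == j else -A[i][j] for j in range(n)]
            for i in range(n)]

def bfs_distances(A, source):
    """Shortest-path distance (in number of edges) from `source`."""
    n, dist = len(A), {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in range(n):
            if A[v][u] and u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist  # unreachable vertices are absent (distance "infinite")

A = adjacency(4, [(0, 1), (1, 2), (2, 3)])  # a path on 4 vertices
L = laplacian(A)
```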
A walk or path is closed if the starting vertex is the same as the ending one. A closed path is called a cycle or loop. There are two special kinds of cycles: an Eulerian cycle passes through each edge of the graph, while a Hamiltonian cycle passes through each vertex of the graph. If a graph does not contain any cycle, it is called a forest. If it is also connected, it is called a tree. There are some classes of graphs which one encounters particularly often because of their regularity properties (see Fig. 3.1):

• a $k$-regular graph is such that, for each $v \in V$, we have $|\partial v| = k$, that is, each vertex has degree $k$;

• a complete graph is such that, for each pair of vertices $i, j \in V$, $(i, j) \in E$, that is, each pair of vertices is connected by an edge; in particular, a complete graph with $N$ vertices is $(N-1)$-regular, and we will denote it by $K_N$;

• a graph $G = (V, E)$ is $p$-partite if we can partition $V$ into $p$ non-empty subsets such that there are no edges of $G$ connecting vertices which belong to the same subset; in the following we will only consider the case $p = 2$, in which we say that the graph is bipartite;

• a bipartite graph $G = (V, E)$ in which the two subsets of vertices have the same number of vertices ($|V_1| = |V_2| = |V|/2$), and in which each vertex of one subset is connected with all the vertices of the other subset, is called complete bipartite; in particular, a complete bipartite graph with $2N$ vertices is $N$-regular, and we will denote it by $K_{N,N}$.

A subgraph $G' = (V', E')$ of the graph $G = (V, E)$ is such that $V' \subset V$ and $E' \subset E$. A spanning subgraph or factor is a subgraph such that $V' = V$. A $k$-factor is a factor that is $k$-regular. In particular, 1-factors are also called (perfect) matchings, or assignments when the graph is bipartite. 2-factors are called loop coverings.
Finally, whenever a spanning subgraph is a tree, it is called a spanning tree. For completeness, we add that sometimes the disorder in COPs defined over graphs is introduced directly at the graph level through the concept of random graphs, that is, a probability distribution over a set of graphs with certain properties. The most used random graphs are:

• $k$-regular random graphs: a graph is randomly chosen among all those with $N$ vertices which are $k$-regular, so that each such graph has the same probability of being generated;

• Erdős–Rényi graphs: given a set of $N$ vertices, each possible link is realized with fixed probability $p$;

• Barabási–Albert graphs (also known as preferential attachment graphs): one vertex at a time is added to the graph; if there are other vertices in the graph, the probability that the new one is linked to the already present vertex $v$ is $k_v / \sum_u k_u$, where $k_v$ is the degree of $v$ and $\sum_u k_u$ is the normalization constant.

Figure 3.1: Examples of several classes of graphs. (a) A 3-regular graph. (b) The complete graph $K_N$ with $N = 12$. (c) The complete bipartite graph $K_{N,N}$ with $N = 6$.

A Euclidean problem can be seen as a problem on a weighted graph, which typically is complete or complete bipartite. Indeed, an instance of such a problem is specified when the positions of all the involved points are given, and the cost function depends on the distances between pairs of these points. Therefore we can restate the problem on a graph as follows: each point chosen in the Euclidean space corresponds to a vertex of the graph, and the weight of the link connecting two points is their distance computed in the Euclidean $d$-dimensional space. For this reason we say that the graph is embedded in the Euclidean $d$-dimensional space.
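The embedding just described is straightforward to realize in code. The sketch below (function names, the uniform density on $[0,1]^d$ and the instance size are illustrative assumptions) draws an instance of a Euclidean problem and builds the weight matrix of the associated complete graph.

```python
# Sketch of the embedding described above: an instance of a Euclidean
# problem is a set of points in [0,1]^d, and the associated weighted
# complete graph has as weights the pairwise Euclidean distances.
# Function names and the uniform-density choice are illustrative.
import math
import random

def euclidean_instance(n, d, seed=0):
    """Draw n IID points uniformly in the unit hypercube [0,1]^d."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(d)] for _ in range(n)]

def weight_matrix(points):
    """Weighted adjacency matrix of the complete graph on the points."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.sqrt(sum((a - b) ** 2
                                 for a, b in zip(points[i], points[j])))
            w[i][j] = w[j][i] = dist  # symmetric: the graph is undirected
    return w

pts = euclidean_instance(6, 2)
w = weight_matrix(pts)
```

The correlations stressed in the text are visible here: the $N(N-1)/2$ weights are functions of only $N d$ coordinates, so they cannot be independent.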
In the next sections we will see how the cost function of Euclidean problems often has a very simple interpretation when the problem is cast in graph language. From the next section, we will start our analysis by considering problems on a graph embedded in one dimension. The problems we will encounter are known to be in the P complexity class as long as they are in one dimension, so why are we so interested in them? The first reason is that interesting phenomena, such as non-self-averaging solution costs, can appear also in one dimension (as we will see later). The second is that the one-dimensional case is much simpler than the higher-dimensional one, and can lead to insights useful for the latter. Let us now go through our general strategy to address one-dimensional RCOPs. Our aim is to compute the cost of the solution, averaged over the disorder (that is, over the ensemble of instances defined by the probability density of the point positions). The first step consists in finding the structure of the solution of our problem. Indeed, in one dimension, given the sorted positions $x_1 < \cdots < x_N$ of the $N$ points, we say that the first point is the one in $x_1$, the second is the one in $x_2$, and so on; the points are thus ranked according to their position. We will see that the solution is often given in terms of this point rank, rather than the points' specific positions. This will allow us to find the configuration which minimizes the cost, and then, to reach our goal, it will be enough to average the cost of this configuration over the point positions. However, as we will see, for some problems the optimal configuration does depend on the specific point positions (and not only on their rank) even in one dimension.
In these cases we will still be able to work out bounds for the cost, by carefully analyzing the full set of possible solutions. We will also see how this one-dimensional approach to Euclidean RCOPs will help us to make exact predictions for the limiting (in the large problem size) behavior of the average cost, even for some problems which are NP-hard (in two or more dimensions).

Matching problem

We start from the most general definition of the problem, which is the following: consider a weighted graph $G = (V, E)$, with $w_e$ the weight of the edge $e \in E$. Let us denote by $\mathcal{M}$ the set of matchings of this graph. To each matching $M = (V, E_M) \in \mathcal{M}$ we associate a cost
$$C_M = \sum_{e \in E_M} w_e. \qquad (3.5)$$
The matching problem consists in deciding whether $\mathcal{M} = \emptyset$ or not, and if $\mathcal{M} \ne \emptyset$ then the weighted matching problem consists in finding the matching with the minimum cost. Notice that we can easily recast the matching problem as a weighted matching problem as follows: given $G = (V, E)$ with $N$ vertices, build a weighted complete graph $K_N$ where the weight of a link $e$ is 0 if $e \in E$ and 1 if $e \notin E$. Now solve the weighted matching problem on this weighted complete graph: if the solution cost is 0 then $G$ has at least one matching, while if the cost is greater than 0 then $G$ has no matching. Therefore, from now on we will only consider the weighted matching problem, which we will simply call the matching problem. This problem, even in this very general graph setting, is in the P complexity class thanks to the work of Kuhn [Kuh55], who discovered a polynomial algorithm called the Hungarian algorithm, and to several other works [Edm65, MV80, EK03, LP09] in which that algorithm was extended and made faster. From this point on, we will only consider matchings on complete graphs $K_N$ and on complete bipartite graphs $K_{N,N}$.
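To make the definition concrete, the following sketch solves the weighted matching problem on a small complete graph by brute-force enumeration (the weights are illustrative). This approach scales factorially with the number of vertices, which is precisely why the polynomial Hungarian-type algorithms cited above matter in practice.

```python
# The weighted matching problem on a small complete graph, solved by
# brute-force enumeration of perfect matchings. A sketch with
# illustrative weights; only feasible for very small graphs.
from itertools import permutations

def min_perfect_matching(w):
    """Minimum-cost perfect matching of K_n (n = len(w), n even)."""
    n = len(w)
    best_cost, best = float("inf"), None
    for perm in permutations(range(n)):
        # read the permutation as consecutive pairs (each matching is
        # visited several times, which is wasteful but harmless here)
        pairs = [tuple(sorted(perm[i:i + 2])) for i in range(0, n, 2)]
        cost = sum(w[i][j] for i, j in pairs)
        if cost < best_cost:
            best_cost, best = cost, sorted(pairs)
    return best_cost, best

w_example = [[0, 1, 10, 10],
             [1, 0, 10, 10],
             [10, 10, 0, 2],
             [10, 10, 2, 0]]
cost, pairs = min_perfect_matching(w_example)  # cost 3, pairs (0,1), (2,3)
```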
Notice that usually, when the graph is bipartite, the matching problem is called the assignment problem. Let us come back to the question of the problem complexity, and consider more closely the matching problem on complete graphs. The graph $K_{2N}$ always has
$$(2N - 1)!! = \prod_{k=0}^{N-1} (2N - 1 - 2k) = \frac{(2N)!}{2^N N!} \sim \sqrt{2}\, e^{N(\log(2N) - 1)} \qquad (3.6)$$
perfect matchings, where in the last step we used the Stirling approximation of the factorial for large $N$,
$$N! \sim \sqrt{2\pi N} \left(\frac{N}{e}\right)^N. \qquad (3.7)$$
This is of course an enormous number, which makes brute-force approaches immediately unusable. Similarly, the bipartite graph $K_{N,N}$ has
$$N! \sim \sqrt{2\pi N} \left(\frac{N}{e}\right)^N = \sqrt{2\pi N}\, e^{N(\log N - 1)} \qquad (3.8)$$
assignments. According to our discussion in Sec. 2.2, we can write down a spin Hamiltonian for this problem as follows: given a graph $G = (V, E)$ (which we restrict to be complete or complete bipartite), we associate a binary variable to each edge of the graph, with $x_{ij} = 1$ (or 0) if the edge $(i, j)$ is present (or not) in the configuration (set of edges) $x$. The cost function is
$$C(x) = \frac{1}{2} \sum_{(i,j) \in E} w_{ij}\, x_{ij}, \qquad (3.9)$$
where the factor $1/2$ is present because if $(i, j) \in E$ then also $(j, i) \in E$. We also need to require that $x$ is a matching, that is,
$$\sum_{j \in \partial i} x_{ij} = 1 \qquad (3.10)$$
for each $i$. We can proceed in two ways:

• we can add these constraints in a hard manner, that is, by restricting the configuration space to those states for which Eq. (3.10) is satisfied (constraints of this kind are often referred to as hard constraints);

• we can modify the cost function so that at least the minimum-energy configuration satisfies Eq.
(3.10), for example by using
$$C_{soft}(x) = \frac{1}{2} \sum_{(i,j) \in E} w_{ij}\, x_{ij} + \lambda \sum_i \left( \sum_{j \in \partial i} x_{ij} - 1 \right)^2, \qquad (3.11)$$
where $\lambda$ is a free parameter to be chosen sufficiently large (constraints of this kind are often referred to as soft constraints).

In this chapter we will always impose hard constraints, but in the next chapter we will see that, to overcome some technical problems, it is sometimes necessary to use the soft variant. At this point, to obtain a genuine spin Hamiltonian, we should make the change of variables
$$x_{ij} = \frac{\sigma_{ij} + 1}{2}, \qquad (3.12)$$
so that to the binary variable $x_{ij} \in \{0, 1\}$ we associate a spin variable $\sigma_{ij} \in \{+1, -1\}$. The resulting Hamiltonian is
$$H(\sigma) = \frac{1}{4} \sum_{(i,j) \in E} w_{ij}\, \sigma_{ij} + C, \qquad (3.13)$$
where $C = \frac{1}{4} \sum_{(i,j) \in E} w_{ij}$. As we can see, the Hamiltonian of this problem is trivial, and all the non-trivial part comes from the constraint term, which in terms of the new spin variables reads
$$\sum_{j \in \partial i} \sigma_{ij} = 2 - C', \qquad (3.14)$$
where $C' = N - 1$ for $G = K_N$, $C' = N$ for $G = K_{N,N}$, and $C'$ is the number of vertices adjacent to each vertex. By using either hard or soft constraints, one can check that the problem Hamiltonian is frustrated (in the sense discussed in Sec. 2.4). Therefore, even if we know that an algorithm solving the problem in polynomial time does exist, the energy landscape is far from trivial for a generic choice of the weights $w_{ij}$. We now introduce disorder in the matching/assignment problem to treat it as a RCOP.
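As an aside, the distinction between hard and soft constraints above is easy to illustrate numerically: the penalty term of Eq. (3.11) vanishes exactly on matchings and grows quadratically with the violation of Eq. (3.10). A minimal sketch, with illustrative weights and value of $\lambda$:

```python
# Sketch of the soft-constraint cost of Eq. (3.11) on K_4: the penalty
# term lambda * sum_i (sum_j x_ij - 1)^2 vanishes on matchings and is
# positive otherwise. Weights and lambda are illustrative choices.

def soft_cost(x, wts, lam):
    n = len(wts)
    # the 1/2 compensates the double counting over ordered pairs (i,j)
    cost = sum(wts[i][j] * x[i][j] for i in range(n) for j in range(n)) / 2.0
    penalty = sum((sum(x[i][j] for j in range(n) if j != i) - 1) ** 2
                  for i in range(n))
    return cost + lam * penalty

def edge_set_to_x(n, edges):
    x = [[0] * n for _ in range(n)]
    for i, j in edges:
        x[i][j] = x[j][i] = 1
    return x

wts = [[0, 1, 10, 10], [1, 0, 10, 10], [10, 10, 0, 2], [10, 10, 2, 0]]
matching = edge_set_to_x(4, [(0, 1), (2, 3)])   # a perfect matching
violating = edge_set_to_x(4, [(0, 1)])          # vertices 2, 3 unmatched
```

With `lam` much larger than the weight scale, any constraint violation is penalized more than any achievable gain in the weight term, so the minimum of the soft cost is attained on a matching.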
In the mean field case, we do it by choosing a probability density function for the weights $w_{ij}$ so that they are IID random variables. The focus of this work is the Euclidean version of several problems, where the weights are correlated; but before considering that more complicated case, we will very quickly review the mean field case following the original paper by Mézard and Parisi [MP85] (the interested reader can find the details of the computations in that paper, but also in the PhD theses [Sic16, Mal19]). For the weights, we consider the probability density

$p(w) = \theta(w)\, e^{-w}.$   (3.15)

One can write the partition function for the complete graph $K_{2N}$ using Eqs. (3.9) and (3.10), and evaluate the average optimal cost with the replica method; the result is

$\overline{E_N} \sim \frac{\pi^2}{12}$   (3.19)

for $N \gg 1$. This same approach can be used for the assignment on the complete bipartite graph $K_{N,N}$, and the result has a factor 2 of difference:

$\overline{E^{(\text{bip})}_N} \sim \frac{\pi^2}{6}.$   (3.20)

CHAPTER 3. FROM MEAN FIELD TO EUCLIDEAN PROBLEMS

At this point, one can wonder if there is a simple way to guess the fact that the cost of the solution is (on average) of order 1 for $N \to \infty$, and if there is a simple way to explain this factor 2 of difference. To answer that, notice that, even though $\overline{w_i} = 1$, the minimum among $n$ IID random variables can be computed by obtaining the cumulative distribution:

$P\big(\min_i w_i \ge x\big) = \prod_{i=1}^n P(w_i \ge x) = e^{-nx}.$   (3.21)

From this, we can obtain

$p\big(\min_i w_i = x\big) = -\frac{\partial}{\partial x} P\big(\min_i w_i \ge x\big) = n\, e^{-nx}$   (3.22)

and so, on average,

$\overline{\min_i w_i} = n \int_0^\infty dx\, x\, e^{-nx} = \frac{1}{n}.$   (3.23)

Therefore, one can reasonably expect that, if we need to find the matching of minimum cost of $K_{2N}$, each of the $N$ edges chosen in that matching will have a cost close to the minimum of a set of $2N - 1 \sim 2N$ IID random variables drawn from the probability given in Eq. (3.15), so $1/(2N)$. Since we have $N$ edges in the cost function, if each edge of the matching were independent from the others, the total cost would be $\sim 1/2$.
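The prediction of Eq. (3.23), which drives the whole heuristic argument, is easy to check by a quick Monte Carlo (a sketch, not part of the thesis):

```python
import random

def mean_min_exp(n, samples=200_000, seed=0):
    """Monte Carlo average of the minimum of n IID Exp(1) variables;
    Eq. (3.23) predicts exactly 1/n."""
    rng = random.Random(seed)
    return sum(min(rng.expovariate(1.0) for _ in range(n))
               for _ in range(samples)) / samples

for n in (2, 10):
    print(n, mean_min_exp(n), 1 / n)  # the two columns agree
```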
However, there are the constraints, which prevent a matching from being composed only of minimum-cost links, and the extra cost due to this fact raises the total cost to $\pi^2/12 \simeq 0.82$. A similar argument can be used for the bipartite matching: in this case the matching is again composed of $N$ edges, but each edge now has to be chosen among $N$ IID random variables with the probability of Eq. (3.15), and this gives the factor 2 of difference.

Assignment problem on complete bipartite graphs

In this section we will focus on the Euclidean matching problem, beyond the mean field approximation. Therefore, let us state the problem in the Euclidean setting, starting from the assignment: consider two sets of points in $\mathbb{R}^d$ labeled by their coordinates, $R = \{r_1, \dots, r_N\}$ (red points) and $B = \{b_1, \dots, b_N\}$ (blue points). We want to match each red point to one and only one blue point such that a certain function of the distances between matched points is minimized. Since we can connect the blue points only to the red points and vice versa, the problem can be seen as a matching on a complete bipartite graph $K_{N,N}$. Now, each choice of a matching corresponds to a permutation of $N$ objects, $\pi \in S_N$, and vice versa. The cost function assigns a cost to each permutation as follows:

$E^{(p)}_N[\pi] = \sum_i \big| r_i - b_{\pi(i)} \big|^p,$   (3.24)

where $p \in \mathbb{R}$ is a parameter. We will focus here on the $p > 1$ case; $p = 1$ and $p = 0$ are special points where there can be many solutions [BCS14], but apart from that they share the properties of the typical cost with, respectively, the $p > 1$ and the $p < 1$ cases. The solution is also known in the region $p < 0$, see [CDS17], while the region $0 < p < 1$ is only partially understood.

In the $p > 1$ case and in one dimension, one first checks the statement for $N = 2$ (so two blue and two red points): the ordered matching is optimal. The solution is then found by simply repeating this argument for each possible choice of two blue and two red points. It follows that, in these cases, once the two sets of points are sorted ($r_1 \le \dots \le r_N$ and $b_1 \le \dots \le b_N$), the optimal cost is

$E^{(p)}_N = \sum_{i=1}^N |r_i - b_i|^p.$
(3.25)

Now we need to average over the disorder. To do that, we will use the Selberg integrals [Sel44]

$S_n(\alpha,\beta,\gamma) := \left(\prod_{i=1}^n \int_0^1 dx_i\, x_i^{\alpha-1}(1-x_i)^{\beta-1}\right) |\Delta(x)|^{2\gamma} = \prod_{j=1}^n \frac{\Gamma(\alpha+(j-1)\gamma)\,\Gamma(\beta+(j-1)\gamma)\,\Gamma(1+j\gamma)}{\Gamma(\alpha+\beta+(n+j-2)\gamma)\,\Gamma(1+\gamma)},$   (3.26)

where

$\Delta(x) := \prod_{1\le i<j\le n} (x_j - x_i)$   (3.27)

is the Vandermonde determinant. The integral converges for $\mathrm{Re}(\alpha) > 0$, $\mathrm{Re}(\beta) > 0$, $\mathrm{Re}(\gamma) > -\min\big(1/n,\ \mathrm{Re}(\alpha)/(n-1),\ \mathrm{Re}(\beta)/(n-1)\big)$, and reduces to the Euler Beta integral for $n = 1$.

In Appendix C.1 we compute the probability $P_k(x)\,dx$ that, once we have ordered our points, the $k$-th point is in the interval $[x, x+dx]$. By using that result, given in Eq. (C.5), and the Selberg integral from Eq. (3.26),

$S_2\!\left(k, N-k+1, \tfrac p2\right) = \left(\prod_{i=1}^2 \int_0^1 dx_i\, x_i^{k-1}(1-x_i)^{N-k}\right) |x_1-x_2|^p = \frac{\Gamma(k)\,\Gamma(N-k+1)\,\Gamma\!\left(k+\tfrac p2\right)\Gamma\!\left(N-k+1+\tfrac p2\right)\Gamma(1+p)}{\Gamma\!\left(N+1+\tfrac p2\right)\Gamma(N+1+p)\,\Gamma\!\left(1+\tfrac p2\right)},$   (3.28)

we get that the average of the $k$-th contribution is given by

$\overline{|r_k-b_k|^p} = \int_0^1 dx \int_0^1 dy\, P_k(x)\, P_k(y)\, |y-x|^p = \left(\frac{\Gamma(N+1)}{\Gamma(k)\,\Gamma(N-k+1)}\right)^{\!2} S_2\!\left(k, N-k+1, \tfrac p2\right) = \frac{\Gamma^2(N+1)\,\Gamma\!\left(k+\tfrac p2\right)\Gamma\!\left(N-k+1+\tfrac p2\right)\Gamma(1+p)}{\Gamma(k)\,\Gamma(N-k+1)\,\Gamma\!\left(N+1+\tfrac p2\right)\Gamma(N+1+p)\,\Gamma\!\left(1+\tfrac p2\right)}$   (3.29)

CHAPTER 3. FROM MEAN FIELD TO EUCLIDEAN PROBLEMS

and therefore we get the exact result

$\overline{E^{(p)}_N} = \frac{\Gamma^2(N+1)\,\Gamma(1+p)}{\Gamma\!\left(N+1+\tfrac p2\right)\Gamma(N+1+p)\,\Gamma\!\left(1+\tfrac p2\right)} \sum_{k=1}^N \frac{\Gamma\!\left(k+\tfrac p2\right)\Gamma\!\left(N-k+1+\tfrac p2\right)}{\Gamma(k)\,\Gamma(N-k+1)} = \frac{\Gamma\!\left(\tfrac p2+1\right)}{p+1}\, \frac{N\,\Gamma(N+1)}{\Gamma\!\left(N+1+\tfrac p2\right)},$   (3.30)

where we made repeated use of the duplication and of Euler's inversion formula for $\Gamma$-functions,

$\Gamma(z)\,\Gamma\!\left(z+\tfrac12\right) = 2^{1-2z}\sqrt{\pi}\,\Gamma(2z),$   (3.31a)

$\Gamma(1-z)\,\Gamma(z) = \frac{\pi}{\sin(\pi z)}.$
(3.31b)

For large $N$ we obtain, at the first order,

$\overline{E^{(p)}_N} \sim \frac{\Gamma\!\left(\tfrac p2+1\right)}{p+1}\, N^{1-\frac p2}.$   (3.32)

Matching problem on the complete graph

A similar technique can be carried out to compute the cost of the matching problem on the complete graph $K_{2N}$. Indeed, again by first studying the case of 4 points, it can be shown that for $p > 1$ the optimal solution is obtained by sorting the points, $x_1 \le \dots \le x_{2N}$, and matching $x_1$ with $x_2$, $x_3$ with $x_4$, and so on. Therefore the optimal cost is

$E^{(p)}_N = \sum_{i=1}^N (x_{2i} - x_{2i-1})^p.$   (3.33)

To compute the average cost, it is convenient to define the spacing variables $\varphi_i = x_{i+1} - x_i$. The cost of the solution in these new variables reads

$E^{(p)}_N = \sum_{i=1}^N \varphi_{2i-1}^p.$   (3.34)

Since the $2N$ points are uniformly chosen in the unit interval and then ordered, their joint distribution is

$p(x_1, \dots, x_{2N}) = (2N)! \prod_{i=0}^{2N} \theta(x_{i+1} - x_i),$   (3.35)

with $x_0 = 0$ and $x_{2N+1} = 1$. Therefore, the probability distribution function of the $\varphi_i$ variables is

$p(\varphi_0, \dots, \varphi_{2N}) = (2N)!\, \delta\!\left(\sum_{i=0}^{2N}\varphi_i - 1\right) \prod_{i=0}^{2N}\theta(\varphi_i).$   (3.36)

Integrating out all the other variables (the integral is the volume of a simplex), we obtain the marginal distribution of the $k$-th spacing $\varphi_k$, which is

$p^{(1)}(\varphi_k) = (2N)! \prod_{\substack{a=0 \\ a\ne k}}^{2N} \int_0^\infty d\varphi_a\, \delta\!\left(\sum_{i=0}^{2N}\varphi_i - 1\right) = 2N\,(1-\varphi_k)^{2N-1} \quad \text{if } 0 < \varphi_k < 1,$   (3.37)

independently of $k$; by exploiting the Euler Beta integral (Eq. (3.26) with $n = 1$), we get

$\overline{\varphi_k^p} = \int_0^1 d\varphi_k\, p^{(1)}(\varphi_k)\, \varphi_k^p = \frac{\Gamma(2N+1)\,\Gamma(1+p)}{\Gamma(2N+1+p)}.$   (3.38)

Therefore we finally obtain

$\overline{E^{(p)}_N} = N\, \frac{\Gamma(2N+1)\,\Gamma(1+p)}{\Gamma(2N+1+p)}.$   (3.39)

For large $N$ we obtain, at the first order,

$\overline{E^{(p)}_N} \sim \frac{\Gamma(p+1)}{2^p}\, N^{1-p}.$   (3.40)

We have seen how to exploit properties of the solution structure to compute the average cost of the matching problem solution in one spatial dimension, for both the complete and complete bipartite versions of the problem.
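Both exact averages, Eq. (3.30) for the assignment and Eq. (3.39) for the matching, can be checked against a direct simulation of the corresponding optimal structures (sorted-order matching and adjacent pairing). This is an illustrative sketch, not part of the thesis:

```python
import math
import random

def avg_assignment_cost(N, p):
    """Eq. (3.30): Gamma(p/2+1)/(p+1) * N * Gamma(N+1)/Gamma(N+1+p/2)."""
    return (math.gamma(p / 2 + 1) / (p + 1)
            * N * math.gamma(N + 1) / math.gamma(N + 1 + p / 2))

def avg_matching_cost(N, p):
    """Eq. (3.39): N * Gamma(2N+1) * Gamma(1+p) / Gamma(2N+1+p)."""
    return N * math.gamma(2 * N + 1) * math.gamma(1 + p) / math.gamma(2 * N + 1 + p)

def monte_carlo(N, p, samples=100_000, seed=1):
    """Sample the optimal-cost structures: ordered bipartite matching of sorted
    red/blue points, and adjacent pairing of 2N sorted points."""
    rng = random.Random(seed)
    tot_a = tot_m = 0.0
    for _ in range(samples):
        r = sorted(rng.random() for _ in range(N))
        b = sorted(rng.random() for _ in range(N))
        tot_a += sum(abs(x - y) ** p for x, y in zip(r, b))
        x = sorted(rng.random() for _ in range(2 * N))
        tot_m += sum((x[2 * i + 1] - x[2 * i]) ** p for i in range(N))
    return tot_a / samples, tot_m / samples

mc_a, mc_m = monte_carlo(3, 2.0)
print(mc_a, avg_assignment_cost(3, 2.0))  # both close to 1/4
print(mc_m, avg_matching_cost(3, 2.0))    # both close to 3/28
```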
However, we have just scratched the surface: there are many known results, and many open questions, about this fascinating COP. We will mention some of them here.

First of all, the scaling in $N$ for $N \to \infty$ of the average solution cost is known in any number of dimensions $d$. In particular, for the Euclidean matching problem on the complete graph embedded in $d$ dimensions, where the cost of a link is the distance between its endpoints raised to the power $p \ge 1$, one has

$\overline{E^{(p)}_N} \sim A^{(p)}_d\, N^{1-p/d}.$   (3.41)

These scalings can be obtained by a qualitative reasoning: if there are $N$ points in a volume $V = 1$, then the distance between two first neighbors is $\sim N^{-1/d}$, therefore the cost of such a link is $\sim N^{-p/d}$, and we have $N$ of these links. A formal proof of Eq. (3.41) is given in [Ste97, Yuk06]. As we have seen, when $d = 1$ we have

$A^{(p)}_1 = \frac{\Gamma(p+1)}{2^p},$   (3.42)

and we are actually able to compute the average cost for each finite $N$. It is known that the first correction, in every $d$, scales as $O(N^{-p/d})$ [HBM98], as we can again check in one dimension by starting from Eq. (3.39). The exact value of the constant $A^{(p)}_d$ is not known for $d > 1$.

CHAPTER 3. FROM MEAN FIELD TO EUCLIDEAN PROBLEMS

For the assignment problem on the complete bipartite graph $K_{N,N}$, in $d$ dimensions with the parameter $p \ge 1$, one has instead

$\overline{E^{(p)}_N} \sim B^{(p)}_1\, N^{1-p/2}$ for $d = 1$; $\quad B^{(p)}_2\, N \left(\frac{\log N}{N}\right)^{p/2}$ for $d = 2$; $\quad B^{(p)}_d\, N^{1-p/d}$ for $d > 2$.   (3.43)

The scaling differences between the bipartite problem and the one with a single kind of points are due, intuitively, to the fact that in the former case, if we consider the problem restricted to a small region of space, we can have fluctuations of the relative density of points of one kind with respect to those of the other kind. Clearly, this is not possible when there is a single kind of points. This fact, in turn, implies the presence of longer links even at a "microscopical" level, giving rise to the different behavior between these two versions of the matching problem.
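The different exponents of Eqs. (3.40) and (3.43) are easy to see in $d = 1$ for $p = 2$: the bipartite cost stays of order one, while the single-species cost decays like $1/N$. A quick sketch (the sample sizes are arbitrary choices, not from the thesis):

```python
import random

def costs(N, samples, seed=7):
    """Average p = 2 cost of the optimal 1D bipartite assignment (sorted-order
    matching) and of the optimal single-species matching (adjacent pairs of 2N points)."""
    rng = random.Random(seed)
    bip = mono = 0.0
    for _ in range(samples):
        r = sorted(rng.random() for _ in range(N))
        b = sorted(rng.random() for _ in range(N))
        bip += sum((x - y) ** 2 for x, y in zip(r, b))
        x = sorted(rng.random() for _ in range(2 * N))
        mono += sum((x[2 * i + 1] - x[2 * i]) ** 2 for i in range(N))
    return bip / samples, mono / samples

for N in (50, 200):
    bip, mono = costs(N, 2000)
    print(N, bip, mono)  # bip stays near 1/3, mono shrinks roughly like 1/(2N)
```

The persistence of the bipartite cost is precisely the density-fluctuation effect described above: unbalanced excesses of red or blue points in a region must be transported over finite distances.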
However, this difference becomes less and less important as the number of dimensions grows. When $p = 2$, additional results are known [CLPS14] for the case with periodic boundary conditions (so that the points are chosen on a $d$-dimensional torus):

$\overline{E^{(2)}_N} \sim \frac13 + O(1/N)$ for $d = 1$; $\quad \frac{\log N}{2\pi} + O(1)$ for $d = 2$; $\quad N^{1-2/d}\left( B^{(2)}_d + \frac{\zeta_d(1)}{2\pi^2}\, N^{2/d-1} \right)$ for $d > 2$,   (3.44)

where $\zeta_d(x)$ is the Epstein $\zeta$ function. Notice that the result for $d = 2$, $p = 2$ has been extended to the case of open boundary conditions [CS15, AST19], and the asymptotic result given in Eq. (3.44) is proven to be correct also in this case. As for the problem on the complete graph, in $d = 1$ Eq. (3.30) gives the cost and all the corrections for each value of $p$. For the case with $p \ne 2$ and $d \ge 2$, neither the constant $B^{(p)}_2$ nor the scaling of the corrections in $N$ is known.

Several other results are known about the self-averaging property of the solution cost:

• for the matching problem (on the complete graph), it has been proven [Ste97, Yuk06] that the cost is self-averaging in any number of dimensions $d$;

• for the assignment problem (on the complete bipartite graph), in $d = 1$ one can check that the cost is not self-averaging with methods similar to those used in Sec. 3.2.3 (see [CDS17]), while for $d > 2$ it is; in $d = 2$ this question is still open.

3.3 Traveling salesman problem

In this section we will analyze an archetypal combinatorial optimization problem, which has been fueling a considerable amount of research from its formalization to the present day. Given $N$ cities and the $N(N-1)/2$ distances between them, the traveling salesman problem asks for the shortest closed tour visiting each city exactly once.

In a generic graph, the determination of the existence of a Hamiltonian cycle is an NP-complete problem (see Johnson and Papadimitriou in [LSKL85]). However, here we will deal with complete graphs $K_N$, where at least one Hamiltonian cycle exists for $N > 2$, and complete bipartite graphs $K_{N,N}$, where at least one Hamiltonian cycle exists for $N > 1$. We will denote by $\mathcal{H}$ the set of Hamiltonian cycles of the graph $G$. Let us suppose now that a weight $w_e > 0$ is assigned to each edge $e \in E$ of the graph $G$.
We can associate to each Hamiltonian cycle $h \in \mathcal{H}$ a total cost

$E(h) := \sum_{e\in h} w_e.$   (3.45)

In the (weighted) Hamiltonian cycle problem we search for the Hamiltonian cycle $h \in \mathcal{H}$ such that the total cost in Eq. (3.45) is minimized, i.e., the optimal Hamiltonian cycle $h^* \in \mathcal{H}$ is such that

$E(h^*) = \min_{h\in\mathcal{H}} E(h).$   (3.46)

When the $N$ vertices of $K_N$ are seen as cities and the weight of each edge is the cost paid to cover the route between two cities, the search for $h^*$ is called the traveling salesman problem (TSP). For example, consider the graph $K_N$ embedded in $\mathbb{R}^d$, that is, to each $i \in [N] = \{1, 2, \dots, N\}$ we associate a point $x_i \in \mathbb{R}^d$, and for $e = (i,j)$ with $i,j \in [N]$ we introduce a weight which is a function of the Euclidean distance, $w_e = |x_i - x_j|^p$ with $p \in \mathbb{R}$, as we did previously for the matching problem. When $p = 1$, we obtain the usual Euclidean TSP. Analogously, for the bipartite graph $K_{N,N}$ we will have two sets of points in $\mathbb{R}^d$, the red points $\{r_i\}_{i\in[N]}$ and the blue points $\{b_i\}_{i\in[N]}$, and the edges connect red with blue points with a cost

$w_e = |r_i - b_j|^p.$   (3.47)

When $p = 1$, we obtain the usual bipartite Euclidean TSP.

Also this COP can be promoted to a RCOP in many ways, and the simplest corresponds to the mean-field case: the randomness is introduced by considering the weights $w_e$ independent and identically distributed random variables, thus neglecting any correlation due to the Euclidean structure of the space. In this case the problem is called random TSP, and it has been extensively studied by disordered-system techniques such as replica and cavity methods [VM84, Orl85, Sou86, MP86a, MP86b, KM89, RRG14] and by a rigorous approach [Was10]. In the random Euclidean TSP [BHH59, Ste81, KS85, PM96, CBB+…], instead, the points are thrown at random and the weights are computed from their positions. In both cases the main quantity of interest is the average optimal cost

$\overline{E} = \overline{E(h^*)},$   (3.48)

and its statistical properties.

Hamiltonian cycles and permutations

We shall now restrict to the complete bipartite graph $K_{N,N}$.
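As will be shown next, Hamiltonian cycles of $K_{N,N}$ can be encoded by pairs of permutations, and the distinct cycles number $N!(N-1)!/2$. A small enumeration sketch (0-based indices, not part of the thesis) confirms this count:

```python
import math
from itertools import permutations

def cycle_edge_set(sigma, pi):
    """Unordered edge set of the cycle r_{sigma(1)} b_{pi(1)} r_{sigma(2)} b_{pi(2)} ...
    built as in Eq. (3.49); vertices are tagged 'r' or 'b'."""
    N = len(sigma)
    edges = set()
    for i in range(N):
        edges.add(frozenset({('r', sigma[i]), ('b', pi[i])}))
        edges.add(frozenset({('b', pi[i]), ('r', sigma[(i + 1) % N])}))
    return frozenset(edges)

N = 3
distinct = {cycle_edge_set(s, q)
            for s in permutations(range(N)) for q in permutations(range(N))}
print(len(distinct), math.factorial(N) * math.factorial(N - 1) // 2)  # 6 6
```

Each of the $(N!)^2$ pairs produces one of only $N!(N-1)!/2$ distinct edge sets, in agreement with the $2N$-fold degeneracy (starting point and direction) discussed below.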
Before turning to the computation of the average cost of the TSP solution in one dimension, let us discuss some general properties, valid in any number of dimensions, and the relationship between the TSP on bipartite graphs and the assignment problem discussed before.

Let $S_N$ be the group of permutations of $N$ elements. For each $\sigma, \pi \in S_N$, the sequence of edges, for $i \in [N]$,

$e_{2i-1} = (r_{\sigma(i)}, b_{\pi(i)}), \qquad e_{2i} = (b_{\pi(i)}, r_{\sigma(i+1)}),$   (3.49)

where $\sigma(N+1)$ must be identified with $\sigma(1)$, defines a Hamiltonian cycle. More properly, it defines a Hamiltonian cycle with starting vertex $r_{\sigma(1)}$ and with a particular orientation, that is

$h[(\sigma,\pi)] := (r_{\sigma(1)}\, b_{\pi(1)}\, r_{\sigma(2)}\, b_{\pi(2)} \cdots r_{\sigma(N)}\, b_{\pi(N)}) = (r_{\sigma(1)}\, C),$   (3.50)

where $C$ is an open walk which visits once all the blue points and all the red points with the exception of $r_{\sigma(1)}$. Let $C^{-1}$ be the open walk in the opposite direction. This defines a new, dual, couple of permutations which generates the same Hamiltonian cycle,

$h[(\sigma,\pi)^\star] := (C^{-1}\, r_{\sigma(1)}) = (r_{\sigma(1)}\, C^{-1}) = h[(\sigma,\pi)],$   (3.51)

since the cycle $(r_{\sigma(1)}\, C^{-1})$ is the same as $(r_{\sigma(1)}\, C)$ (traveled in the opposite direction). By definition

$h[(\sigma,\pi)^\star] = (r_{\sigma(1)}\, b_{\pi(N)}\, r_{\sigma(N)}\, b_{\pi(N-1)}\, r_{\sigma(N-1)} \cdots b_{\pi(2)}\, r_{\sigma(2)}\, b_{\pi(1)}).$   (3.52)

Let us introduce the cyclic permutation $\tau \in S_N$, which performs a left rotation, and the inversion $I \in S_N$; that is, $\tau(i) = i+1$ for $i \in [N-1]$ with $\tau(N) = 1$, and $I(i) = N+1-i$. In the following we will denote a permutation by using the second row of the usual two-row notation, that is, for example, $\tau = (2, 3, \dots, N, 1)$ and $I = (N, N-1, \dots, 1)$. Then

$h[(\sigma,\pi)^\star] = h[(\sigma\circ\tau\circ I,\ \pi\circ I)].$   (3.53)

There are $N!\,(N-1)!/2$ distinct Hamiltonian cycles in $K_{N,N}$. Indeed, the couples of permutations are $(N!)^2$, but we have to divide them by $2N$ because of the $N$ different starting points and the two directions in which the cycle can be traveled.

From Eq. (3.49) and weights of the form given in Eq.
(3.47), we get an expression for the total cost

$E[h[(\sigma,\pi)]] = \sum_{i\in[N]} \left[ \big|r_{\sigma(i)} - b_{\pi(i)}\big|^p + \big|r_{\sigma\circ\tau(i)} - b_{\pi(i)}\big|^p \right].$   (3.54)

This can be rewritten as

$E[h[(\sigma,\pi)]] = \sum_{i\in[N]} \big|r_i - b_{\pi\circ\sigma^{-1}(i)}\big|^p + \sum_{i\in[N]} \big|r_i - b_{\pi\circ\tau^{-1}\circ\sigma^{-1}(i)}\big|^p = E[m(\pi\circ\sigma^{-1})] + E[m(\pi\circ\tau^{-1}\circ\sigma^{-1})],$   (3.55)

where $E[m(\lambda)]$ is the total cost of the assignment $m$ in $K_{N,N}$ associated to the permutation $\lambda \in S_N$. The duality transformation given in Eq. (3.53) interchanges the two matchings, because

$\mu_1 := \pi\circ\sigma^{-1} \to \pi\circ I\circ I\circ\tau^{-1}\circ\sigma^{-1} = \pi\circ\tau^{-1}\circ\sigma^{-1},$   (3.56a)

$\mu_2 := \pi\circ\tau^{-1}\circ\sigma^{-1} \to \pi\circ I\circ\tau^{-1}\circ I\circ\tau^{-1}\circ\sigma^{-1} = \pi\circ\sigma^{-1},$   (3.56b)

where we used

$I\circ\tau^{-1}\circ I = \tau.$   (3.57)

The two matchings corresponding to the two permutations $\mu_1$ and $\mu_2$ have no edges in common, and therefore each vertex will appear twice in the union of their edges. Remark also that

$\mu_2 = \mu_1\circ\sigma\circ\tau^{-1}\circ\sigma^{-1},$   (3.58)

which means that $\mu_1$ and $\mu_2$ are related by a permutation which has to be, as $\tau^{-1}$ is, a unique cycle of length $N$. It follows that, if $h^*$ is the optimal Hamiltonian cycle and $m^*$ is the optimal assignment,

$E[h^*] \ge 2\,E[m^*].$   (3.59)

Traveling on a line... and tying shoelaces!

Here we shall focus on the one-dimensional case, where both red and blue points are chosen uniformly at random in the unit interval $[0,1]$; we will denote by $\mathbb{I} = (1, 2, \dots, N)$ the identity permutation. We will compute the average cost of the solution of the TSP on complete bipartite graphs, similarly to what we have done with the matching problem: as a first step we will obtain the general structure of the solution, and as a second step we will use this information to perform the average over the disorder. From now on, we will assume $p > 1$ and the points sorted, $r_1 \le \dots \le r_N$ and $b_1 \le \dots \le b_N$. Let

$\tilde\sigma(i) = 2i-1$ for $i \le (N+1)/2$, $\qquad \tilde\sigma(i) = 2(N-i+1)$ for $i > (N+1)/2$,   (3.60)

and

$\tilde\pi(i) = \tilde\sigma\circ I(i) = \tilde\sigma(N+1-i) = 2i$ for $i < (N+1)/2$, $\qquad \tilde\pi(i) = 2(N-i)+1$ for $i \ge (N+1)/2$.   (3.61)

CHAPTER 3.
Figure 3.2: The optimal Hamiltonian cycle $\tilde h$ for $N = 4$ blue and red points chosen in the unit interval and sorted in increasing order.

The couple $(\tilde\sigma, \tilde\pi)$ defines a Hamiltonian cycle $\tilde h \in \mathcal{H}$. More precisely, according to the correspondence given in Eq. (3.49), it contains, for even $N$, the edges

$\tilde e_{2i-1} = (r_{2i-1}, b_{2i})$ for $i \le N/2$; $\quad (r_{2(N-i+1)}, b_{2(N-i)+1})$ for $i > N/2$,   (3.62a)

$\tilde e_{2i} = (b_{2i}, r_{2i+1})$ for $i < N/2$; $\quad (b_N, r_N)$ for $i = N/2$; $\quad (b_{2(N-i)+1}, r_{2(N-i)})$ for $N/2 < i < N$; $\quad (b_1, r_1)$ for $i = N$,   (3.62b)

while for $N$ odd

$\tilde e_{2i-1} = (r_{2i-1}, b_{2i})$ for $i < (N+1)/2$; $\quad (r_N, b_N)$ for $i = (N+1)/2$; $\quad (r_{2(N-i+1)}, b_{2(N-i)+1})$ for $i > (N+1)/2$,   (3.63a)

$\tilde e_{2i} = (b_{2i}, r_{2i+1})$ for $i < (N+1)/2$; $\quad (b_{2(N-i)+1}, r_{2(N-i)})$ for $(N+1)/2 \le i < N$; $\quad (b_1, r_1)$ for $i = N$.   (3.63b)

The cycle $\tilde h$ is the analogue of the criss-cross solution introduced by Halton [Hal95] (see Fig. 3.2). In his work, Halton studied the optimal way to tie a shoe. This problem can be seen as a peculiar instance of a 2-dimensional bipartite Euclidean TSP with the parameter which tunes the cost set to $p = 1$. One year later, Misiurewicz [Mis96] generalized Halton's result, giving the least restrictive requirements on the 2-dimensional TSP instance under which the criss-cross cycle is the solution. Other generalizations of these works have been investigated in more recent papers [Pol02, GT17]. In Appendix C.2.1 we prove that for a convex and increasing cost function the optimal Hamiltonian cycle is provided by $\tilde h$.

Statistical properties of the solution cost

Similarly to the assignment problem, we can exploit a generalization of the Selberg integral given in Eq. (3.26) (see [AAR99, Sec.
8.3]),

$B_n(j,k;\alpha,\beta,\gamma) := \left(\prod_{i=1}^n \int_0^1 dx_i\, x_i^{\alpha-1}(1-x_i)^{\beta-1}\right) \left(\prod_{s=1}^j x_s\right) \left(\prod_{s=j+1}^{j+k}(1-x_s)\right) |\Delta(x)|^{2\gamma} = S_n(\alpha,\beta,\gamma)\, \frac{\prod_{i=1}^j \left[\alpha+(n-i)\gamma\right]\ \prod_{i=1}^k \left[\beta+(n-i)\gamma\right]}{\prod_{i=1}^{j+k} \left[\alpha+\beta+(2n-1-i)\gamma\right]},$   (3.64)

to compute the average solution cost for each $N$. By using Eq. (3.64) and the probability that, given $N$ ordered points on a line, the $k$-th one is in $[x, x+dx]$, Eq. (C.5), we obtain

$B_2\!\left(1,1; k, N-k, \tfrac p2\right) = \int_0^1 dx_1 \int_0^1 dx_2\, x_1^{k-1} x_2^{k} (1-x_1)^{N-k} (1-x_2)^{N-k-1} |x_1-x_2|^p = \frac{\left(k+\tfrac p2\right)\left(N-k+\tfrac p2\right)}{(N+p)\left(N+\tfrac p2\right)}\, S_2\!\left(k, N-k, \tfrac p2\right) = \frac{\Gamma(k)\,\Gamma(N-k)\,\Gamma(p+1)\,\Gamma\!\left(k+\tfrac p2+1\right)\Gamma\!\left(N-k+\tfrac p2+1\right)}{\Gamma(N+p+1)\,\Gamma\!\left(N+\tfrac p2+1\right)\Gamma\!\left(\tfrac p2+1\right)}.$   (3.65)

Therefore we get

$\overline{|b_{k+1}-r_k|^p} = \overline{|r_{k+1}-b_k|^p} = \int_0^1 dx \int_0^1 dy\, P_k(x)\, P_{k+1}(y)\, |x-y|^p = \frac{\Gamma^2(N+1)}{\Gamma(k)\,\Gamma(N-k+1)\,\Gamma(k+1)\,\Gamma(N-k)}\, B_2\!\left(1,1; k, N-k, \tfrac p2\right) = \frac{\Gamma^2(N+1)\,\Gamma(p+1)\,\Gamma\!\left(k+\tfrac p2+1\right)\Gamma\!\left(N-k+\tfrac p2+1\right)}{\Gamma(k+1)\,\Gamma(N-k+1)\,\Gamma(N+p+1)\,\Gamma\!\left(N+\tfrac p2+1\right)\Gamma\!\left(\tfrac p2+1\right)},$   (3.66)

from which we obtain

$\sum_{k=1}^{N-1} \overline{|b_{k+1}-r_k|^p} = 2\,\Gamma(N+1)\,\Gamma(1+p) \left[ \frac{(N+p+1)\,\Gamma\!\left(\tfrac p2\right)}{4(p+1)\,\Gamma(p)\,\Gamma\!\left(N+1+\tfrac p2\right)} - \frac{1}{\Gamma(N+1+p)} \right].$   (3.67)

In addition,

$\overline{|r_1-b_1|^p} = \overline{|r_N-b_N|^p} = N^2 \int_0^1 dx\, dy\, (xy)^{N-1} |x-y|^p = N^2\, S_2\!\left(N, 1, \tfrac p2\right) = \frac{N\,\Gamma(N+1)\,\Gamma(p+1)}{\left(N+\tfrac p2\right)\Gamma(N+p+1)}.$   (3.68)

CHAPTER 3. FROM MEAN FIELD TO EUCLIDEAN PROBLEMS

Figure 3.3: Numerical results for $\overline{E^{(2)}_N}$ for several values of $N$. The continuous line represents the exact prediction given in Eq.
(3.69), and the dashed line gives the value for infinitely large $N$. For every $N$ we have used 10 instances. In the inset we show the numerical results for the variance of the cost $E^{(2)}_N$ obtained using the exact solution provided by Eqs. (3.60) and (3.61). The dashed line represents the theoretical large-$N$ asymptotic value. Error bars are also plotted, but they are smaller than the mark size.

Finally, the average optimal cost for every $N$ and every $p > 1$ is

$\overline{E^{(p)}_N} = 2\left( \overline{|r_1-b_1|^p} + \sum_{k=1}^{N-1} \overline{|b_{k+1}-r_k|^p} \right) = 2\,\Gamma(N+1) \left[ \frac{(N+p+1)\,\Gamma\!\left(\tfrac p2+1\right)}{(p+1)\,\Gamma\!\left(N+1+\tfrac p2\right)} - \frac{2\,\Gamma(p+1)}{(2N+p)\,\Gamma(N+p)} \right].$   (3.69)

Notice that, for large $N$,

$\lim_{N\to\infty} N^{p/2-1}\, \overline{E^{(p)}_N} = \frac{2\,\Gamma\!\left(\tfrac p2+1\right)}{p+1},$   (3.70)

which is twice the cost of the assignment problem in the limit of large $N$, Eq. (3.32). The case $p = 2$ of Eq. (3.69) is compared with numerical simulations in Fig. 3.3.

Finally, we can compute, in the thermodynamical limit, the variance of the solution cost, to check whether this quantity is self-averaging or not. Given two sequences of $N$ points randomly chosen on the segment $[0,1]$, consider the difference $\varphi_k$ between the position of the $(k+1)$-th blue point and that of the $k$-th red point, whose distribution is

$\Pr[\varphi_k \in d\varphi] = k(k+1)\binom{N}{k}\binom{N}{k+1}\, d\varphi \int_0^1 dx\, dy\, \delta(\varphi - y + x)\, x^{k-1} y^{k} (1-x)^{N-k} (1-y)^{N-k-1}.$   (3.71)

Proceeding as in the case of the assignment discussed in [BCS14, CS14], one can show that these random variables $\varphi_k$ converge (in a weak sense specified by Donsker's theorem) to a process $\varphi(s)$, which is a difference of two Brownian bridge processes [CDS17]. One can write the re-scaled average optimal cost as

$E_p \equiv \lim_{N\to\infty} N^{p/2-1}\, \overline{E^{(p)}_N}.$   (3.72)

By starting at finite $N$ with the representation given in Eq.
(3.71), the large-$N$ limit can be obtained by setting $k = Ns$ and introducing the variables $\xi$, $\eta$ and $\varphi$ such that

$x = s + \frac{\xi}{\sqrt N}, \qquad y = s + \frac{\eta}{\sqrt N}, \qquad \varphi_k = \frac{\varphi(s)}{\sqrt N},$   (3.73)

in such a way that $s$ is kept fixed when $N \to +\infty$. We obtain, at the leading order,

$\Pr[\varphi(s) \in d\varphi] = d\varphi \int\!\!\int \delta(\varphi - (\eta-\xi))\, \frac{\exp\!\left(-\frac{\xi^2+\eta^2}{2s(1-s)}\right)}{2\pi s(1-s)}\, d\xi\, d\eta = \frac{1}{\sqrt{4\pi s(1-s)}} \exp\!\left\{-\frac{\varphi^2}{4s(1-s)}\right\} d\varphi.$   (3.74)

Similarly, see for example [CS14, Appendix A], it can be derived that the joint probability distribution $p_{t,s}(x,y)$ for $\varphi(s)$ is (for $t < s$) a bivariate Gaussian distribution,

$p_{t,s}(x,y) = \overline{\delta(\varphi(t)-x)\,\delta(\varphi(s)-y)} = \frac{e^{-\frac{x^2}{4t} - \frac{(x-y)^2}{4(s-t)} - \frac{y^2}{4(1-s)}}}{4\pi\sqrt{t(s-t)(1-s)}}.$   (3.75)

This allows us to compute, for a generic $p > 1$, the average of the square of the re-scaled optimal cost,

$\overline{E_p^2} = 4 \int_0^1 dt \int_0^1 ds\, \overline{|\varphi(s)|^p\, |\varphi(t)|^p},$   (3.76)

which is 4 times the corresponding quantity of the bipartite matching problem. In the case $p = 2$, the average in Eq. (3.76) can be evaluated by using the Wick theorem for expectation values in a Gaussian distribution,

$\overline{E_2^2} = 4 \cdot 2\int_0^1 ds \int_0^s dt \int_{-\infty}^{\infty} dx\, dy\, p_{t,s}(x,y)\, x^2 y^2 = \frac45,$   (3.77)

and therefore

$\overline{E_2^2} - \overline{E_2}^{\,2} = \frac{16}{45} \simeq 0.36.$   (3.78)

This result is in agreement with the numerical simulations (see inset of Fig. 3.3) and proves that the re-scaled optimal cost is not a self-averaging quantity.

CHAPTER 3. FROM MEAN FIELD TO EUCLIDEAN PROBLEMS

Figure 3.4: Optimal solutions for $N = 6$, for the $p > 1$ case (left) and the $0 < p < 1$ case (right).

Analogously to our previous analysis of the TSP on complete bipartite graphs, we can address the complete-graph case. We will be able to study the problem not only in the $p > 1$ case, but also for $0 < p < 1$ and $p < 0$, characterizing the optimal cycle and computing the average cost in the large-$N$ limit.

Optimal cycles on the complete graph

We shall consider the complete graph $K_N$ with $N$ vertices, that is, with vertex set $V = [N] := \{1, \dots, N\}$.
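Before moving to the complete graph, the bipartite results above can be cross-checked numerically: for small $N$ one can brute-force all tours of $K_{N,N}$, confirm that the criss-cross cycle of Eqs. (3.60)–(3.61) is optimal for $p > 1$, and compare the sampled average with Eq. (3.69). An illustrative sketch (0-based permutations):

```python
import math
import random
from itertools import permutations

def avg_cost_eq_369(N, p):
    """Average optimal cost of the 1D bipartite TSP, Eq. (3.69)."""
    return 2 * math.gamma(N + 1) * (
        (N + p + 1) * math.gamma(p / 2 + 1) / ((p + 1) * math.gamma(N + 1 + p / 2))
        - 2 * math.gamma(p + 1) / ((2 * N + p) * math.gamma(N + p)))

def tour_cost(r, b, sigma, pi, p):
    """Cost of h[(sigma, pi)], cf. Eqs. (3.49) and (3.54)."""
    N = len(r)
    return sum(abs(r[sigma[i]] - b[pi[i]]) ** p
               + abs(r[sigma[(i + 1) % N]] - b[pi[i]]) ** p for i in range(N))

N, p = 3, 2.0
sigma_cc, pi_cc = (0, 2, 1), (1, 2, 0)  # sigma~, pi~ of Eqs. (3.60)-(3.61), 0-based
rng = random.Random(3)
samples, total = 20_000, 0.0
for _ in range(samples):
    r = sorted(rng.random() for _ in range(N))
    b = sorted(rng.random() for _ in range(N))
    best = min(tour_cost(r, b, s, q, p)
               for s in permutations(range(N)) for q in permutations(range(N)))
    assert abs(best - tour_cost(r, b, sigma_cc, pi_cc, p)) < 1e-9  # criss-cross optimal
    total += best
print(total / samples, avg_cost_eq_369(N, p))  # both close to 3/4
```

For $N = 2$ the formula gives $2/3$, which can also be obtained by hand, since $K_{2,2}$ has a single Hamiltonian cycle using all four edges.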
This graph has $(N-1)!/2$ distinct Hamiltonian cycles. Indeed, each permutation $\pi$ in the symmetric group of $N$ elements, $\pi \in S_N$, defines a Hamiltonian cycle on $K_N$: the sequence of points $(\pi(1), \pi(2), \dots, \pi(N), \pi(1))$ defines a closed walk with starting point $\pi(1)$, but the same walk is obtained by choosing any other vertex as starting point, and also by following the walk in the opposite order, that is, $(\pi(1), \pi(N), \dots, \pi(2), \pi(1))$. As the cardinality of $S_N$ is $N!$, we get that the number of Hamiltonian cycles in $K_N$ is $N!/(2N) = (N-1)!/2$.

In this section, we characterize the optimal Hamiltonian cycles for different values of the parameter $p$ used in the cost function. Notice that $p = 0$ and $p = 1$ are degenerate cases, in which the optimal tour can be found easily by looking, for example, at the $0 < p < 1$ solution.

The $p > 1$ case. We start by characterizing the shape of the optimal cycle when $p > 1$, for every realization of the disorder. Let us suppose to have $N$ points $R = \{r_i\}_{i=1,\dots,N}$ in the interval $[0,1]$, sorted so that $r_1 \le \dots \le r_N$. Let us define the following Hamiltonian cycle

$h^* = h[\tilde\sigma] = (r_{\tilde\sigma(1)}, r_{\tilde\sigma(2)}, \dots, r_{\tilde\sigma(N)}, r_{\tilde\sigma(1)})$   (3.79)

with $\tilde\sigma$ defined as in Eq. (3.60). In Appendix C.2.2 we prove that the Hamiltonian cycle which provides the optimal cost is $h^*$. The main idea behind the proof is that we can introduce a complete bipartite graph in such a way that a solution of the bipartite matching problem on it is a solution of our original problem, with the same cost. Therefore, using the results known for the bipartite problem, we can prove the optimality of $h^*$. A graphical representation of the optimal cycle for $p > 1$ and $N = 6$ is given in Fig. 3.4, left panel.

The $0 < p < 1$ case. Given an ordered sequence $R = \{r_i\}_{i=1,\dots,N}$ of $N$ points in the interval $[0,1]$, with $r_1 \le \dots \le r_N$, if $0 < p < 1$, consider the Hamiltonian cycle

$h^* = h[\mathbb{I}] = (r_{\mathbb{I}(1)}, r_{\mathbb{I}(2)}, \dots$
$, r_{\mathbb{I}(N)}, r_{\mathbb{I}(1)})$   (3.80)

where $\mathbb{I}$ is the identity permutation, i.e.,

$\mathbb{I}(j) = j.$   (3.81)

Then the Hamiltonian cycle which provides the optimal cost is $h^*$. The idea behind this result is that we can define a crossing in the cycle as follows: let $\{r_i\}_{i=1,\dots,N}$ be the set of points, labeled in an ordered fashion; consider two links $(r_i, r_j)$ and $(r_k, r_\ell)$ with $i < j$ and $k < \ell$; a crossing between them occurs if $i < k < j < \ell$ or $k < i < \ell < j$. This corresponds graphically to a crossing of lines if we draw all the links as, for example, semicircles in the upper half-plane. In the following, however, we will not use semicircles in our figures, to improve clarity (we still draw them in such a way that we do not introduce extra crossings between links other than those defined above). An example of crossing is in the following figure

r_1  r_2  r_3  r_4

where we have not drawn the arcs which close the cycle, to emphasize the crossing. Now, as shown in [BCS14], if we are able to swap two crossing arcs with two non-crossing ones, the difference between the cost of the original cycle and that of the new one simply consists in the difference between a crossing matching and a non-crossing one, which is positive when $0 < p < 1$. Therefore the proof of the optimality of the cycle in Eq. (3.80), which is given in Appendix C.2.2, consists in showing how to remove a crossing (without breaking the cycle into multiple ones) and in proving that $h^*$ is the only Hamiltonian cycle without crossings (see Fig. 3.4, right panel).

The $p < 0$ case. Here we study the properties of the solution for $p < 0$. Our analysis is based, again, on the properties of the matching problem for $p < 0$: in that case the optimal matching maximizes the number of crossings. This means that the optimal matching solution of $2N$ points on an interval is given by connecting the $i$-th point to the $(i+N)$-th one, with $i = 1, \dots$
$, N$; in this way, every edge crosses the remaining $N - 1$ edges. Differently from the $0 < p < 1$ case, here we need a move that allows us to replace non-crossing matchings by crossing ones, in such a way that the cycle that contains the matching remains a Hamiltonian cycle; this move is such that, when $p < 0$, the cost of the cycle decreases.

CHAPTER 3. FROM MEAN FIELD TO EUCLIDEAN PROBLEMS

Figure 3.5: The optimal TSP and 2-factor problem solution for $N = 5$ points, in the $p < 0$ case.

The search for the optimal cycle then proceeds as in the $0 < p < 1$ case, but instead of finding the cycle with no crossings, we now look for the one or ones that maximize them. However, as we will see in the following, one must distinguish between the $N$ odd and $N$ even cases. In fact, in the $N$ odd case, only one cycle maximizes the total number of crossings, i.e., we have only one possible solution. In the $N$ even case, on the contrary, the number of Hamiltonian cycles that maximize the total number of crossings is $N$.

The $p < 0$ case: $N$ odd. Given an ordered sequence $R = \{r_i\}_{i=1,\dots,N}$ of $N$ points, with $N$ odd, in the interval $[0,1]$, with $r_1 \le \dots \le r_N$, consider the permutation $\sigma$ defined as

$\sigma(i) = 1$ for $i = 1$; $\quad \sigma(i) = \frac{N-i+3}{2}$ for even $i > 1$; $\quad \sigma(i) = \frac{2N-i+3}{2}$ for odd $i > 1$,   (3.82)

and the corresponding Hamiltonian cycle

$h^* := h[\sigma] = (r_{\sigma(1)}, r_{\sigma(2)}, \dots, r_{\sigma(N)}).$   (3.83)

The Hamiltonian cycle which provides the optimal cost is $h^*$. The proof consists in showing that the only Hamiltonian cycle with the maximum number of crossings is $h^*$. As we discuss in Appendix C.2.2, the maximum possible number of crossings an edge can have is $N - 3$. The Hamiltonian cycle under exam has $N(N-3)/2$ crossings in total, i.e., each edge of $h^*$ has the maximum possible number of crossings. Indeed, the vertex $a$ is connected with the vertices $a + \frac{N-1}{2} \pmod N$ and $a + \frac{N+1}{2} \pmod N$. The edge $\big(a,\, a + \frac{N-1}{2} \pmod N\big)$ has $2\cdot\frac{N-3}{2} = N-3$ crossings, since the $\frac{N-3}{2}$ vertices $a+1 \pmod N,\ a+2 \pmod N,\ \dots,\ a+\frac{N-3}{2} \pmod N$ contribute with 2 edges each. This holds also for the edge $\big(a,\, a + \frac{N+1}{2} \pmod N\big)$, and for each $a \in [N]$.
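All three regimes described in this section can be verified exhaustively for $N = 5$, where $K_5$ has only $(N-1)!/2 = 12$ distinct Hamiltonian cycles. A sketch (not from the thesis; permutations are 0-based):

```python
import random
from itertools import permutations

def tour_cost(x, order, p):
    return sum(abs(x[order[i]] - x[order[(i + 1) % len(order)]]) ** p
               for i in range(len(order)))

def brute_force_cost(x, p):
    # fixing vertex 0 as starting point enumerates every cycle (twice, once per direction)
    return min(tour_cost(x, (0,) + t, p) for t in permutations(range(1, len(x))))

candidates = {
    2.0:  (0, 2, 4, 3, 1),  # criss-cross cycle h[sigma~] of Eq. (3.79), p > 1
    0.5:  (0, 1, 2, 3, 4),  # identity cycle of Eq. (3.80), 0 < p < 1
    -1.0: (0, 2, 4, 1, 3),  # maximal-crossing cycle of Eq. (3.82) (0-based), p < 0
}
rng = random.Random(5)
for _ in range(200):
    x = sorted(rng.random() for _ in range(5))
    for p, order in candidates.items():
        assert abs(brute_force_cost(x, p) - tour_cost(x, order, p)) < 1e-9
print("predicted optimal cycles confirmed for p = 2, 0.5, -1 (N = 5)")
```

Each assertion compares the brute-force optimum with the predicted cycle on a fresh random instance, instance by instance, since the claims above hold for every realization of the disorder.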
As shown in Appendix C.2.2, there is only one cycle with this number of crossings. An example of the Hamiltonian cycle discussed here is given in Fig. 3.5.

The $p < 0$ case: $N$ even. In this situation, differently from the above case, the solution is not the same irrespective of the disorder instance. More specifically, there is a set of possible solutions, and for a given instance the optimal one is the member of that set with the lowest cost.

Figure 3.6: The two possible optimal Hamiltonian cycles for $p < 0$ and $N = 4$. For each specific instance one of them has a lower cost than the other; differently from all the other cases ($p > 0$, or $p < 0$ with $N$ odd), the optimal cycle is not the same for each disorder instance.

We will show how these solutions can be found and how they are related. Given the usual ordered sequence of points $R = \{r_i\}_{i=1,\dots,N}$, with $N$ even, in the interval $[0,1]$, with $r_1 \le \dots \le r_N$, if $p < 0$, consider the permutation $\sigma$ which, in analogy with Eq. (3.82), alternates points of the first and of the second half of the ordered sequence, in such a way that the resulting cycle has two links spanning $N/2$ points, while the remaining links span, alternately, $N/2 + 1$ and $N/2 - 1$ points (for $N = 4$ one has $\sigma = (1, 3, 2, 4)$).   (3.84)

Given $\tau \in S_N$ defined by $\tau(i) = i+1$ for $i \in [N-1]$ and $\tau(N) = 1$, we call $\Sigma$ the set of the permutations $\sigma_k$, $k = 1, \dots, N$, defined as

$\sigma_k(i) = \tau^k(\sigma(i)),$   (3.85)

where $\tau^k = \tau\circ\tau^{k-1}$. The optimal Hamiltonian cycle is one of the cycles

$h^*_k := h[\sigma_k] = (r_{\sigma_k(1)}, r_{\sigma_k(2)}, \dots, r_{\sigma_k(N)}).$   (3.86)

An example with $N = 4$ points is shown in Fig. 3.6. In Appendix C.2.3 the 2-factor (or loop covering) of minimum cost is obtained; the idea for the proof in the TSP case is to show how to join the loops in the optimal way in order to obtain the optimal TSP. The complete proof of the optimality of one among the cycles in Eq. (3.86) is given in Appendix C.2.3.
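The crossing count $N(N-3)/2$ claimed for the $N$-odd cycle of Eq. (3.82) can be verified directly (sketch; vertices are 1-based to match the text):

```python
def sigma_odd(N):
    """Permutation of Eq. (3.82), for N odd."""
    out = [1]
    for i in range(2, N + 1):
        out.append((N - i + 3) // 2 if i % 2 == 0 else (2 * N - i + 3) // 2)
    return out

def count_crossings(order):
    """Count crossing pairs of links of the cycle, per the definition in the text."""
    N = len(order)
    links = [tuple(sorted((order[i], order[(i + 1) % N]))) for i in range(N)]
    crossings = 0
    for a in range(N):
        for b in range(a + 1, N):
            (i, j), (k, l) = links[a], links[b]
            if i < k < j < l or k < i < l < j:
                crossings += 1
    return crossings

for N in (5, 7, 9):
    print(N, count_crossings(sigma_odd(N)), N * (N - 3) // 2)
```

For $N = 5$ the cycle is $(1, 3, 5, 2, 4)$ and indeed realizes $5$ crossings, the maximum possible.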
Statistical properties of the solution cost

Now we can use the insight on the solutions just obtained to compute typical properties of the optimal cost for various values of p. In Appendix C.1 we computed the probability of finding the l-th point in [x, x + dx], Eq. (C.5), and the probability p_{l,l+k}(x, y) dx dy of finding the l-th point in [x, x + dx] and the (l+k)-th point in [y, y + dy], Eq. (C.6). From these equations, it follows that

∫ dx dy (y − x)^α p_{l,l+k}(x, y) = Γ(N + 1) Γ(k + α) / [Γ(N + α + 1) Γ(k)],    (3.87)

independently of l, and, therefore, in the case p > 1,

E_N[h*] = [(N − 2)(p + 1) + 2] Γ(N + 1) Γ(p + 1) / Γ(N + p + 1),    (3.88)

Figure 3.7: Rescaled average optimal cost N^{p−1} E_N^{(p)} for p = 2, 1, 0.5 (from top to bottom).

and in particular for p = 2

E_N[h*] = 2 (3N − 4) / [(N + 1)(N + 2)],    (3.89)

while for p = 1 we get

E_N[h*] = 2 (N − 1) / (N + 1).    (3.90)

In the same way one can evaluate the average optimal cost when 0 < p < 1, obtaining

E_N[h*] = Γ(N + 1)/Γ(N + p + 1) [ (N − 1) Γ(p + 1) + Γ(N + p − 1)/Γ(N − 1) ],    (3.91)

which coincides at p = 1 with Eq. (3.90) and, at p = 0, provides E_N[h*] = N. For large N, we get

lim_{N→∞} N^{p−1} E_N[h*] = Γ(p + 2) for p ≥ 1, and Γ(p + 1) for 0 < p < 1.    (3.92)

The asymptotic cost for large N and p > 1 is 2(p + 1) times the average optimal cost of the matching problem on the complete graph K_N given in Eq. (3.40) (notice that in Eq.
(3.40) thecost is normalized with N and the number of points is 2 N , differently from what we do here).This factor 2( p + 1) is another difference with respect to the bipartite case, where we have seenthat the cost of the TSP is twice the cost of the assignment problem for large N , independentlyof p .For p < N odd we have only one possible solution, so that the average optimal cost is E N [ h ∗ ] = Γ( N + 1)2Γ( N + p + 1) (cid:34) ( N − 1) Γ (cid:0) N +12 + p (cid:1) Γ (cid:0) N +12 (cid:1) + ( N + 1) Γ (cid:0) N − + p (cid:1) Γ (cid:0) N − (cid:1) (cid:35) . (3.93)For large N it behaves as lim N →∞ E N [ h ∗ ] N = 12 p , (3.94) .3. TRAVELING SALESMAN PROBLEM 10 15 20 25 30 35 40 45 5022 . . . . . N N − E ( p ) N p = − Figure 3.8: Rescaled average optimal cost in the p = − N case. The blue line is the 2 times the theoretic value of the optimal matching. The orange lines(from top to bottom) are the average costs E N [ h ] and E N [ h ] defined in Eqs. (3.95) and (3.96)respectively. The dashed black line is the large N limit of all the curves.which coincides with the scaling derived before for p = 0. Note that for large N the averageoptimal cost of the TSP problem is two times the one of the corresponding matching problemfor p < N even, instead, there are N/ N/ − k of Eq. (3.87).These solutions have 2 links with k = N/ N/ k = N/ N/ − k = N/ h (although they are many differentconfigurations, we use only the label h to stress that all of them share the same average optimalcost) and its average cost is E N [ h ] = Γ( N + 1)Γ( N + p + 1) (cid:34) N (cid:0) N + p − (cid:1) Γ (cid:0) N − (cid:1) + (cid:18) N − (cid:19) Γ (cid:0) N + p + 1 (cid:1) Γ (cid:0) N + 1 (cid:1) + 2 Γ (cid:0) N + p (cid:1) Γ (cid:0) N (cid:1) (cid:35) . 
(3.95)The other possible solution, that we denote with h has 2 links with k = N/ − N/ k = N/ N/ − k = N/ E N [ h ] = Γ( N + 1)Γ( N + p + 1) (cid:34)(cid:18) N − (cid:19) Γ (cid:0) N + p − (cid:1) Γ (cid:0) N − (cid:1) + (cid:18) N − (cid:19) Γ (cid:0) N + p + 1 (cid:1) Γ (cid:0) N + 1 (cid:1) + 2 Γ (cid:0) N + p (cid:1) Γ (cid:0) N (cid:1) (cid:35) . (3.96)2 CHAPTER 3. FROM MEAN FIELD TO EUCLIDEAN PROBLEMS In Fig. 3.7 we plot the analytical results for p = 0 . 5, 1, 2 and in Fig. 3.8 we compare analyticaland numerical results for p = − 1. In particular, since E N [ h ] > E N [ h ], E N [ h ] provides ourbest upper bound for the average optimal cost of the p = − N even case. The numerical resultshave been obtained by solving 10 TSP instances using its linear programming representation.Now we investigate whether the optimal cost is a self-averaging quantity. We collect inAppendix C.2.4 all the technical details concerning the evaluation of the second moment of theoptimal cost distribution E N , which has been computed for all number of points N and, forsimplicity, in the case p > N limit it goes likelim N →∞ N p − E N [ h ∗ ] = Γ ( p + 2) (3.97)i.e. tends to the square of the rescaled average optimal cost. This proves that the cost is aself-averaging quantity. Using Eq. (C.42) together with Eq. (3.88) one gets the variance of theoptimal cost. In particular for p = 2 we get σ E N = 4( N (5 N ( N + 13) + 66) − N + 1) ( N + 2) ( N + 3)( N + 4) , (3.98)which goes to zero as σ E N (cid:39) /N . We have seen that in one dimension the cost of the solution of the bipartite TSP is twice that thecost of the assignment problem. This actually holds also in two dimensions, where the bipartiteTSP is a genuine NP-hard problem. I.e. for any given choice of the positions of the points,in the asymptotic limit of large N , the cost of the bipartite TSP converges to twice the costof the assignment. 
However, this claim is non-trivial, and it requires several results introduced previously, together with a scaling argument which we present in this section. This is another noticeable example in which information about average properties of the solution of a hard COP can be obtained even in more than one dimension and in the presence of Euclidean correlations.

Scaling argument

Given an instance of N blue and N red point positions, let us consider the optimal assignment µ* on them. Let us now consider N points, one taken between the red and the blue point of each edge in µ*, and call T* the optimal "monopartite" TSP solution on these points. For simplicity, as these N points we take the blue points.

We shall use T* to provide an ordering among the red and blue points. Given two consecutive points in T*, for example b₁ and b₂, let us denote by (r₁, b₁) and (r₂, b₂) the two edges in µ* involving the blue points b₁ and b₂, and let us consider also the new edge (r₁, b₂). We have seen that, in the asymptotic limit of large N, the typical distance between two matched points in µ* scales as (log N/N)^{1/2} (see Sec. 3.2), while the typical distance between two consecutive points in the monopartite case scales only as 1/N^{1/2} [BHH59]; that is (for all points but a fraction which goes to zero with N),

w_{(b₁,r₁)} = (α log N / N)^{p/2},
w_{(b₂,r₁)} = [ β/N + α log N/N − γ (log N)^{1/2}/N ]^{p/2}.    (3.99)

Figure 3.9: The optimal assignment µ* is given by the orange edges {(r₁, b₁), (r₂, b₂), (r₃, b₃), (r₄, b₄)}. The monopartite TSP (gray dashed edges) among blue points provides the necessary ordering.
In order to obtain the TSP tour b₁, r₁, b₂, r₂, b₃, r₃, b₄, r₄, b₁ in the bipartite graph, we have to add the green edges {(r₁, b₂), (r₂, b₃), (r₃, b₄), (r₄, b₁)}.

Here (α log N/N)^{1/2} is the length of the edge (r₁, b₁) of µ*, (β/N)^{1/2} is the length of the edge (b₁, b₂) of T*, and γ = 2√(αβ) cos θ, where θ is the angle between the edge (r₁, b₁) of µ* and the edge (b₁, b₂) of T*.

This means that, typically, the difference in cost

ΔE = w_{(b₂,r₁)} − w_{(b₁,r₁)} ∼ (log N)^{(p−1)/2} / N^{p/2}    (3.100)

is small compared to the typical cost (log N/N)^{p/2} of one edge in the bipartite case. To obtain a valid TSP solution, which we call h_A, we add to the edges µ* = {(r₁, b₁), ..., (r_N, b_N)} the edges {(r₁, b₂), ..., (r_{N−1}, b_N), (r_N, b₁)}, see Fig. 3.9.

Of course h_A is not, in general, the optimal solution of the TSP. However, because of Eq. (3.59), we have that

E[h_A] ≥ E[h*] ≥ 2 E[µ*],    (3.101)

and we have shown that, for large N, E[h_A] goes to 2E[µ*]; therefore E[h*] must behave in the same way. Notice also that our argument is purely local and therefore does not depend in any way on the type of boundary conditions adopted: it holds for both open and periodic boundary conditions.

An analogous construction can be used in any number of dimensions. However, the success of the procedure relies on the fact that the typical distance between two points matched in µ* goes to zero more slowly than the typical distance between two consecutive points in the monopartite TSP. This is true only in one and two dimensions and, as we have already said, it is related to the importance of fluctuations in the number of points of the two kinds in a small volume.

This approach also provides an approximate solution of the TSP which improves as N → ∞.
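The construction of h_A and the squeeze of Eq. (3.101) can be illustrated on a tiny instance, where all three quantities are computed by brute force. A self-contained sketch (N = 4 points per color in the unit square, p = 2; the helper names are ours):

```python
import math, random
from itertools import permutations

random.seed(1)
N, p = 4, 2
red  = [(random.random(), random.random()) for _ in range(N)]
blue = [(random.random(), random.random()) for _ in range(N)]
w = lambda a, b: math.dist(a, b) ** p

# Optimal assignment mu* (brute force over permutations).
mu = min(permutations(range(N)),
         key=lambda s: sum(w(red[i], blue[s[i]]) for i in range(N)))
E_mu = sum(w(red[i], blue[mu[i]]) for i in range(N))

# Optimal bipartite TSP h*: every tour alternates r and b points.
def tour_cost(rs, bs):
    return sum(w(red[rs[i]], blue[bs[i]]) + w(blue[bs[i]], red[rs[(i + 1) % N]])
               for i in range(N))
E_star = min(tour_cost((0,) + rs, bs)
             for rs in permutations(range(1, N))
             for bs in permutations(range(N)))

# h_A: order the blue points with the monopartite TSP T*, keep the mu* edges
# and add the edge from each red point to the next blue point along T*.
order = min(((0,) + rest for rest in permutations(range(1, N))
             if rest[0] < rest[-1]),
            key=lambda c: sum(w(blue[c[i]], blue[c[(i + 1) % N]])
                              for i in range(N)))
inv = {mu[i]: i for i in range(N)}        # blue index -> matched red index
E_hA = sum(w(red[inv[order[i]]], blue[order[i]])
           + w(red[inv[order[i]]], blue[order[(i + 1) % N]]) for i in range(N))

print(E_hA, E_star, 2 * E_mu)             # E[h_A] >= E[h*] >= 2 E[mu*]
```

Both inequalities hold instance by instance: h_A is one particular alternating tour, and any alternating tour splits into two perfect matchings, each costing at least E[µ*].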
However, this approximation requires the solution of a monopartite TSP on N/ .

Numerical results

We confirm our theoretical predictions by performing numerical simulations of both the assignment and the bipartite TSP. We have considered the case of open boundary conditions.

Figure 3.10: Numerical results for p = 1 (left panel) and p = 2 (right panel) for the TSP (red points, top), the 2-factor, which is defined in Sec. 3.4 (green points, middle), and 2 times the assignment problem (blue points, bottom), in the open boundary condition case. Continuous lines are numerical fits to the data.

Table: fitted parameters for the TSP, a₀ = 0.717(2) and a₁ = 1.32(1) for p = 1, a₀ = 0.321(5) and a₁ = 1.603(2) for p = 2. We have doubled the factors for the assignment to verify our hypothesis. For p = 2, we have reported the theoretical value of a₀, which is 1/π.

For what concerns the assignment problem, we have implemented an in-house solver based on the LEMON optimization library [DJK11], which relies on Edmonds' blossom algorithm [Edm65]. In the case of the TSP, the most efficient way to tackle the problem numerically is to exploit its linear or integer programming formulation.

To validate our argument, we solved the assignment problem (with p = 1, 2) on 10 independent instances for 2 ≤ N ≤ , independent instances for 150 ≤ N ≤ , and independent instances for 600 ≤ N ≤ ; for the TSP we stopped at N = 300, also reducing the total number of instances.

An estimate of the asymptotic average optimal cost and of the finite-size corrections has been obtained using, for p = 1, the fitting function

f^(p=1)(N) = √(N log N) (a₀ + a₁/log N + a₂/log² N)    (3.102)

while, for p = 2,

f^(p=2)(N) = log N (a₀ + a₁/log N + a₂/log² N) .
(3.103)These are the first 3 terms of the asymptotic behavior of the cost of the assignment prob-lem [AKT84, CLPS14]. Parameters a and a for p = 2 were obtained fixing a to 1 /π . In .3. TRAVELING SALESMAN PROBLEM N →∞ E [ h ∗ ] E [ µ ∗ ] = 2 . (3.104)This implies, for the special case p = 2, by using the second line of Eq. (3.44), an exact, analyticalresult: lim N →∞ ( E [ h ∗ ] / log N ) = 1 /π . In general, the evaluation of the large N value of the costof solutions of the bipartite TSP is reduced to the solution of the matching problem with thesame number of points, which requires only polynomial time. This seems to be a peculiar featureof the bipartite problem: the “monopartite” TSP cannot be approached in a similar way. There are many other very interesting research papers about the TSP. Here we limit ourselvesto report some results regarding the average of the solution cost in higher dimension.As for the matching case, the mean field version of the problem can be studied with replicamethods [MP86a] and the so-called cavity method [KM89]. When the graph is complete and thelinks weights are IID random variables distributed according to the law ρ ( (cid:96) ) ∼ (cid:96) → (cid:96) r r ! (3.105)where r is a parameter (notice that the behavior of the distribution far from (cid:96) = 0 is irrelevant),we have that, for large N , E ( r ) N ∼ N − / ( r +1) L r , (3.106)where L r can be computed numerically up to the desired precision. Notice that there resultare, as in the matching case, obtained by using a RS ansatz (or the analogous hypothesis forthe cavity method) and are confirmed by extensive numerical simulations (thus certifying theexactness of the RS ansatz for this problem).Another famous result (which is actually one of the first about the RCOP version of the TSP)due to Beardwood, Halton and Hammersley [BHH59] is about euclidean TSP in d > V , and the cost function is the total lengthof the tour (that is p = 1). 
In that case, we have, for large N,

E_N ∼ C_d N^{1−1/d} V^{1/d},    (3.107)

where the constant C_d is not known analytically and has been estimated, up to a certain precision, for several numbers of dimensions by solving the TSP numerically and averaging the cost of the solutions.

Between matching and TSP: the 2-factor problem

In this Section we will deal with the 2-factor problem, which consists, given an undirected graph, in finding a spanning subgraph that contains only disjoint cycles (that is, a 2-factor). For this reason this problem is also called the loop covering of a graph.

The TSP can be seen as a 2-factor problem with the additional constraint that there must be a unique cycle; conversely, the 2-factor problem is a relaxation of the TSP. We mention that this problem too can be studied using replica (and cavity) methods in the mean-field case: one finds that, for a large number of points, its average optimal cost is the same as that of the TSP.

In the following we will study the 2-factor problem in one and two dimensions, both on the complete bipartite graph and, only in one dimension, on the complete graph. The disorder will be introduced by drawing the points independently from the uniform distribution over the compact interval [0, 1] or over the square [0, 1] × [0, 1], the costs again depending on the Euclidean distances through the exponent p.

This problem can be seen as intermediate between the assignment (or matching) and the TSP: in the former case we search for the minimum-cost 1-factor of a graph, while in the latter we are interested in the minimum-cost 2-factor made of a single cycle through all the vertices. Nonetheless, when tackled in one dimension for p > 1, we will see that there is an important difference between the 2-factor and the other problems studied: while for almost every instance of the problem there is only one solution, looking at the whole ensemble of instances one finds an exponential number of possible optimal configurations, scaling as ρ^N, where ρ is the plastic constant (see Appendix C.3.3). This is in contrast with the matching and TSP cases, where we have seen that, for p > 1, the configuration that solves the problem is the same for every realization of the disorder. Moreover, also for p < 0, when for the TSP on the complete graph there is more than one possible optimal tour, there are N different possibilities (when the graph has 2N vertices), while for the 2-factor we have an exponential number (in N) of them.

Let us start by defining the problem formally. Consider a graph G and the set M of 2-factors of this graph. Suppose now that a weight w_e > 0 is assigned to each edge e ∈ E of the graph G. We can associate to each 2-factor ν ∈ M a total cost

E(ν) := Σ_{e∈ν} w_e.    (3.108)

In the (weighted) 2-factor problem we search for the 2-factor ν* ∈ M such that the total cost in Eq. (3.108) is minimized, that is,

E(ν*) = min_{ν∈M} E(ν).    (3.109)

If H is the set of Hamiltonian cycles of the graph G, of course H ⊂ M, and therefore, if h* is the optimal Hamiltonian cycle, we have

E[h*] ≥ E[ν*],    (3.110)

which is a relation between the cost of the solution of the 2-factor problem and that of the TSP on the same graph.

From now on we specialize to the Euclidean version of the problem, in which the graph is complete or complete bipartite and is embedded in [0, 1]^d ⊂ R^d. For the complete case G = K_N
This is in contrast with the matching and TSP cases, where we have seenthat, for p > 1, for every realization of the disorder the configuration that solves the problem isalways the same. Moreover, also for p < 0, when for the TSP in the complete graph there aremore than one possible optimal tour, they are N different possibilities (when the graph has 2 N vertices), while for the 2-factor we have an exponential number (in N ) of them.Let us start by defining formally the problem. Consider a graph G and the set of 2-factors ofthis graph, M . Suppose now that a weight w e > e ∈ E of the graph G . We can associate to each 2-factor ν ∈ M a total cost E ( ν ) := (cid:88) e ∈ ν w e . (3.108)In the (weighted) 2-factor problem we search for the 2-factor ν ∗ ∈ M such that the total costin Eq. (3.108) is minimized, that is E ( ν ∗ ) = min ν ∈M E ( ν ) . (3.109)If H is the set of Hamiltonian cycles for the graph G , of course H ⊂ M and therefore if h ∗ isthe optimal Hamiltonian cycle, we have E [ h ∗ ] ≥ E [ ν ∗ ] , (3.110)which is a relation between the cost of the solution of the 2-factor problem and the TSP on thesame graph.From now on we specialize to the Euclidean version of the problem, and so when the graph iscomplete or complete bipartite and is embedded in [0 , d ⊂ R d . For the complete case G = K N .4. BETWEEN MATCHING AND TSP: 2-FACTOR PROBLEM i ∈ [ N ] = { , , . . . , N } we associate a point x i ∈ [0 , d , and for each e = ( i, j )with i, j ∈ [ N ] we introduce a cost which is a function of their Euclidean distance w e = | x i − x j | p (3.111)with p ∈ R . Analogously for the complete bipartite graph K N,N , we have two sets of points in[0 , d , that is, say, the red { r i } i ∈ [ N ] and the blue { b i } i ∈ [ N ] points, and the edges connect redpoints with blue points with a cost w e = | r i − b j | p . 
(3.112)

For a discussion of this problem on an arbitrary graph G, see [BBCZ11] and references therein.

Let us now focus on the case of the complete bipartite graph K_{N,N}, where each cycle in a 2-factor must have even length. Let S_N be the symmetric group of order N and consider two permutations σ, π ∈ S_N. If for every i ∈ [N] we have that σ(i) ≠ π(i), then the two permutations define the 2-factor ν(σ, π) with edges

e_{2i−1} := (r_i, b_{σ(i)}),    (3.113)
e_{2i} := (r_i, b_{π(i)}),    (3.114)

for i ∈ [N]. Vice versa, for any 2-factor ν there is a couple of permutations σ, π ∈ S_N, with σ(i) ≠ π(i) for every i ∈ [N], that generates it. The 2-factor has total cost

E[ν(σ, π)] = Σ_{i∈[N]} [ |r_i − b_{σ(i)}|^p + |r_i − b_{π(i)}|^p ].    (3.115)

By construction, if we denote by µ[σ] the matching associated with the permutation σ and by

E[µ(σ)] := Σ_{i∈[N]} |r_i − b_{σ(i)}|^p    (3.116)

its cost, we immediately have that

E[ν(σ, π)] = E[µ(σ)] + E[µ(π)],    (3.117)

and we recover that

E[ν*] ≥ 2 E[µ*],    (3.118)

i.e. the cost of the optimal 2-factor is necessarily greater than or equal to twice that of the optimal 1-factor. Together with inequality (3.110), which is valid for any graph, we obtain

E[h*] ≥ E[ν*] ≥ 2 E[µ*].    (3.119)

Previously in this Chapter we have seen that, in the limit of infinitely large N, in one dimension and with p > 1, the average cost of the optimal Hamiltonian cycle is equal to twice the average cost of the optimal matching (1-factor). We conclude that the average cost of the 2-factor must be the same. Moreover, since inequality (3.119) holds also in two dimensions, in that case too the cost of the 2-factor problem has, for large N, the same limit obtained for the assignment and the bipartite TSP (see Fig. 3.10).

In the following we will denote by E^(p)_{N,N}[ν*] the average optimal cost of the 2-factor problem on the complete bipartite graph.
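The chain of inequalities culminating in Eq. (3.119) can be verified exhaustively on a small instance: enumerate the permutations (for µ*), the pairs (σ, π) with σ(i) ≠ π(i) for every i (for ν*), and the alternating tours (for h*). A sketch with N = 4 points per color on [0, 1] and p = 2 (helper names ours):

```python
import random
from itertools import permutations

random.seed(3)
N, p = 4, 2
red  = sorted(random.random() for _ in range(N))
blue = sorted(random.random() for _ in range(N))
w = lambda r, b: abs(r - b) ** p
match_cost = lambda s: sum(w(red[i], blue[s[i]]) for i in range(N))

perms = list(permutations(range(N)))
E_mu = min(match_cost(s) for s in perms)              # optimal 1-factor

# Optimal 2-factor, Eq. (3.117): E[nu(sigma, pi)] = E[mu(sigma)] + E[mu(pi)],
# over pairs with sigma(i) != pi(i) for every i.
E_nu = min(match_cost(s) + match_cost(t)
           for s in perms for t in perms
           if all(s[i] != t[i] for i in range(N)))

# Optimal bipartite TSP (every tour is an alternating cycle).
def tour_cost(rs, bs):
    return sum(w(red[rs[i]], blue[bs[i]]) + w(red[rs[(i + 1) % N]], blue[bs[i]])
               for i in range(N))
E_h = min(tour_cost((0,) + rs, bs)
          for rs in permutations(range(1, N)) for bs in perms)

print(E_h, E_nu, 2 * E_mu)    # Eq. (3.119): E[h*] >= E[nu*] >= 2 E[mu*]
```

The tour is itself a 2-factor (so E[h*] ≥ E[ν*]), and every admissible pair splits into two matchings (so E[ν*] ≥ 2E[µ*]); both inequalities therefore hold instance by instance, not only on average.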
Its scaling for large N will be the same of the TSP and thematching problem, that is the limit lim N →∞ E ( p ) N,N [ ν ∗ ] N − p/ = E ( p ) B , (3.120)8 CHAPTER 3. FROM MEAN FIELD TO EUCLIDEAN PROBLEMS is finite.On the complete graph K N inequality (3.118) does not hold, since a general 2-factor con-figuration cannot always be written as a sum of two disjoint matchings, due to the presence ofodd-length loops. Every 2-factor configuration on the complete graph can be determined by onlyone permutation π , satisfying π ( i ) (cid:54) = i and π ( π ( i )) (cid:54) = i for every i ∈ [ N ]. The cost can be writtenas E [ ν ( π )] = (cid:88) i ∈ [ N ] | x i − x π ( i ) | p . (3.121)The two constraints on π assure that the permutation does not contain fixed points and cyclesof length 2. In the following we will denote with E ( p ) N [ ν ∗ ] the average optimal cost of the 2-factorproblem on the complete graph. Even though inequality (3.118) does not hold, we expect that forlarge N , the average optimal cost scales in the same way as the TSP and the matching problem,i.e. as lim N →∞ E ( p ) N [ ν ∗ ] N − p = E ( p ) M . (3.122)Later we will give numerical evidence for this scaling. Here we will consider the case p > 1, that is the weight associated to an edge is a convex andincreasing function of the Euclidean distance between its two vertices. This section is taken from[CGM18]. Let us now look for the optimal solutions for the 2-factor.The possible solutions for N = 6 and are represented schematically in Fig. 3.11a. 
For N = 7there are three solutions and so on.The first observation that we can do is that in any optimal 2-factor ν ∗ all the loops must bein the shoelace configuration, that is the one that we found for the TSP.Indeed in each loop there is the same number of red and blue points and the result we proved forthe one dimensional bipartite TSP shows indeed that the shoelace loop is always optimal (whenthe number of loops used has to be one).Moreover, in any optimal 2-factor ν ∗ there are no loops with more than 3 red points. Indeed,as soon as the number of red points (and therefore blue points) in a loop is larger than 3, amore convenient 2-factor is obtained by considering a 2-factor with two loops. In fact, as canbe seen in Fig. 3.12a, the cost gain is exactly equal to the difference between an ordered and anunordered matching which we know is always negative for p > ν ∗ there areonly shoelaces loops with 2 or 3 red points.The reason why there is not a solution which is always the optimal independently on the pointpositions is that two different 2-factors in this class are not comparable, that is all of themcan be optimal in particular instances. For example, the possible solutions for N = 6 and arerepresented schematically in Fig. 3.11a. For N = 7 there are three solutions and so on.But how many of these possible solutions there are? At given number N of both red and bluepoints there are at most Pad( N − 2) optimal 2-factor ν ∗ . Pad( N ) is the N -th Padovan number,see Appendix C.3, where it is also shown that for large N Pad( N ) ∼ p N (3.123)with p the plastic number (see Appendix C.3.3 for a discussion on this constant).Actually, for values of N which we could explore numerically, we saw that all Pad( N − .4. BETWEEN MATCHING AND TSP: 2-FACTOR PROBLEM r r r r r r b b b b b b r r r r r r b b b b b b (a) Two instances whose optimal solutions are the twopossible ν ∗ for N = 6 on the complete bipartite graph K N,N . 
For each instance the blue and red points arechosen in the unit interval and sorted in increasing or-der, then plotted on parallel lines to improve visualiza-tion. x x x x x x x x x x x x x x (b) Two instances whose optimal solutions are the twopossible ν ∗ for N = 7 on the complete graph K N . Foreach instance the points are chosen in the unit intervaland sorted in increasing order. Figure 3.11: Optimal solutions for small N cases. Cost for finite N We have already seen that Eq. (3.119) guarantees that in the large N limit the average solutioncost of the 2-factor problem is the same of the bipartite TSP (with the same N ).We have proved that, for every value of N , the optimal 2-factor solution is always composed byan union of shoelaces loops with only two or three points of each color. As a consequence of thisfact, differently from the assignment and the TSP cases, different instances of the disorder canhave different spanning subgraphs that minimize the cost function. In particular these spanningsubgraphs can always be obtained by “cutting” the optimal TSP cycle (see Fig. 3.13) in a waywhich depends on the specific instance. This “instance dependence” makes the computation ofthe average optimal cost particularly difficult. However, Eq. (3.119) guarantees that the averageoptimal cost of the 2-factor problem is bounded from above by the TSP average optimal costand from below by twice the assignment average optimal cost. Since in the large N limit thesetwo quantities coincide, one obtains immediately the large N limit of the average optimal costof the 2-factor problem. Unfortunately, this approach is not useful for a finite-size system. Butwe can use Selberg integrals to obtain an upper bound: indeed we can compute the average costobtained by “cutting” the TSP optimal cycle in specific ways. 
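The Padovan count Pad(N − 2) quoted above can be made concrete. A minimal sketch, assuming the convention Pad(1) = Pad(2) = Pad(3) = 1 with the recurrence Pad(n) = Pad(n − 2) + Pad(n − 3); its growth rate is the plastic number, the real root of x³ = x + 1 ≈ 1.3247:

```python
def padovan(n):
    """Padovan numbers: Pad(1) = Pad(2) = Pad(3) = 1 and
    Pad(n) = Pad(n-2) + Pad(n-3); this indexing convention is an assumption."""
    a, b, c = 1, 1, 1                    # Pad(n-2), Pad(n-1), Pad(n)
    for _ in range(n - 3):
        a, b, c = b, c, a + b
    return c

ratio = padovan(120) / padovan(119)      # converges to the plastic number
print(ratio, ratio**3 - ratio - 1)       # a root of x^3 = x + 1, ~1.3247
```

The ratio of consecutive Padovan numbers converges geometrically, since the two complex roots of x³ − x − 1 have modulus smaller than 1.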
When we cut at the k -positionthe optimal TSP into two different cycles we gain an average cost E ( p ) k = | b k +1 − r k | p + | r k +1 − b k | p − | b k +1 − r k +1 | p − | b k − r k | p . (3.124)0 CHAPTER 3. FROM MEAN FIELD TO EUCLIDEAN PROBLEMS r r r k r k + r N − r N b b b k b k + b N − b N y r r r k − r k r k + r k + r N − r N b b b k − b k b k + b k + b N − b N (a) K N,N case x x x x k x k + x k + x k + x N − x N − x N y x x x x k x k + x k + x k + x N − x N − x N (b) K N case Figure 3.12: Result of one cut of the shoelace in two smaller ones for both the complete bipartiteand complete graph cases. The cost gained is exactly the difference between an unorderedmatching and an ordered one.By using Eq. (C.5) and the generalized Selberg integral given in Eq. (3.64), we obtain | b k − r k | p − | b k +1 − r k | p == Γ ( N + 1) Γ( p + 1) Γ (cid:0) k + p (cid:1) Γ (cid:0) N − k + p + 1 (cid:1) Γ( k ) Γ( N − k + 1) Γ( N + p + 1) Γ (cid:0) N + p + 1 (cid:1) Γ (cid:0) p + 1 (cid:1) (cid:20) − k + p k (cid:21) = − p ( N + 1) Γ( p + 1) Γ (cid:0) k + p (cid:1) Γ (cid:0) N − k + p + 1 (cid:1) Γ( k + 1) Γ( N − k + 1) Γ( N + p + 1) Γ (cid:0) N + p + 1 (cid:1) Γ (cid:0) p + 1 (cid:1) , (3.125)and similarly | b k +1 − r k +1 | p − | r k +1 − b k | p == Γ ( N + 1) Γ( p + 1) Γ (cid:0) k + p + 1 (cid:1) Γ (cid:0) N − k + p (cid:1) Γ( k + 1) Γ( N − k ) Γ( N + p + 1) Γ (cid:0) N + p + 1 (cid:1) Γ (cid:0) p + 1 (cid:1) (cid:20) − N − k + p N − k (cid:21) = − p ( N + 1) Γ( p + 1) Γ (cid:0) k + p + 1 (cid:1) Γ (cid:0) N − k + p (cid:1) Γ( k + 1) Γ( N − k + 1) Γ( N + p + 1) Γ (cid:0) N + p + 1 (cid:1) Γ (cid:0) p + 1 (cid:1) . (3.126)Their sum is E ( p ) k = p ( N + 1) Γ( p + 1) Γ (cid:0) k + p (cid:1) Γ (cid:0) N − k + p (cid:1) Γ( k + 1) Γ( N − k + 1) Γ( N + p ) Γ (cid:0) N + p + 1 (cid:1) Γ (cid:0) p + 1 (cid:1) , (3.127)For p = 2 this quantity is E (2) k = 2( N + 1) . 
(3.128)Since this quantity does not depends on k , for p = 2 the best upper bound for the averageoptimal cost is given by summing the maximum number of cuts that can be done on the optimalTSP cycle. Therefore for N even the 2-factor with lowest average energy is ν (2 , ,..., and then E (2) N,N [ ν (2 , ,..., ] = 23 N + 4 N − N + 1) − N − N + 1) = 13 N (2 N + 5)( N + 1) , (3.129) .4. BETWEEN MATCHING AND TSP: 2-FACTOR PROBLEM N = 4 case, where the cutting operation is unique. Notice that blue and redpoints are chosen on a interval, but here they are represented equispaced on two parallel lines toimprove visualization.is an upper bound for the optimal average cost since, even though this configuration has theminimum average cost, for every fixed instance of disorder there can be another one which isoptimal. For N odd, one of the 2-factors with lowest average energy is ν (2 , ,..., , and E (2) N,N [ ν (2 , ,..., , ] = 23 N + 4 N − N + 1) − N − N + 1) = 13 2 N + 5 N + 3( N + 1) . (3.130)Therefore that essentially the upper bound for the optimal average cost for even and odd large N is the same. For p = 2, these bounds are compared with the results of numerical simulationsin Fig. 3.15a.For p (cid:54) = 2, E k depends on k . In particular, for 1 < p < p > p (cid:54) = 2, however, this sum does not give a simple formula. Finally, we analyze here the 2-factor problem in one dimension on complete graphs, in the p > ν ∗ there are only loops with 3, 4 or 5 points.In Fig. 3.12b we represent the two solutions when N = 7. In Appendix C.3 we prove that,similarly to the bipartite case, the number of 2-factor solutions is at most g N on the completegraph, which for large N grows according to g N ∼ p N . (3.131)2 CHAPTER 3. FROM MEAN FIELD TO EUCLIDEAN PROBLEMS − . − . − . − . − . − . − . k E ( p ) k Figure 3.14: Plot of E ( p ) k given in Eq. (3.127) for various values of p : the green line is calculatedwith p = 2 . 1, the orange with p = 2 and the blue one with p = 1 . 
9; in all cases we take N = 100.Also in this case we verified numerically, for accessible N , that the set of possible solutions thatwe have identified is actually realized by some instance of the problem.Using these informations on the shape of the solution, we turn to the evaluation of bounds onits cost. Let us first evaluate the cost gain when we cut a TSP solution cycle in two “shoelaces”(we keep using here the word shoelace to indicate the cycle which is the solution to the TSP onthe complete graph) sub-cycles. For p > x k +1 − x k ) p + ( x k +3 − x k +2 ) p − ( x k +3 − x k +1 ) p − ( x k +2 − x k ) p = − p Γ( N + 1) Γ( p + 1)Γ( N + p + 1) . (3.132)For example for N = 6 (in which the solution is unique since 6 can be written as a sum of 3, 4and 5 in an unique way as 3+3) and p = 2 we have E (2)6 = 12 − 17 = 514 . (3.133)If N is multiple of 3, the lowest 2-factor is, on average, the one with the largest number of cutsi.e. ν (3 , ,..., . The number of cuts is ( N − / E ( p ) N [ ν (3 , ,..., ] = N (cid:16) p (cid:17) Γ( N + 1) Γ( p + 1)Γ( N + p + 1) . (3.134)Instead if N can be written as a multiple of 3 plus 1, the minimum average energy configurationis ν (3 , ,..., , , which has ( N − / E ( p ) N [ ν (3 , ,..., ] = (cid:20) N (cid:16) p (cid:17) + 23 p (cid:21) Γ( N + 1) Γ( p + 1)Γ( N + p + 1) . (3.135) .4. BETWEEN MATCHING AND TSP: 2-FACTOR PROBLEM . . . . . . . . / −→ N E ( ) N , N (a) K N,N case with p = 2. The orange line is the costof the TSP given in Eq. (3.69) for p = 2; the green linesare, from above, the cost of the optimal fixed 2-factor ν (2 , ,..., , given in Eq. (3.130) and ν (2 , ,..., given inEq. (3.129). The dashed black line is the asymptoticvalue and the blue continuous one is twice the costof the optimal 1-matching NN +1 . Red points are theresults of a 2-factor numerical simulation, in which wehave averaged over 10 instances. N N E ( ) N (b) K N case with p = 2. Here the average cost isrescaled with N . 
The orange line is the cost of theTSP given in Eq. (3.88) for p = 2. The green lines arefrom above the cost of the fixed 2-factor ν (3 , ,..., , given in Eq. (3.136), ν (3 , ,..., given in Eq. (3.135)and ν (3 , ,..., given in Eq. (3.134). Red points are theresults of a numerical simulation for the 2-factor, inwhich we have averaged over 10 instances for N ≤ for 30 < N ≤ 50 and 10 for N > Figure 3.15: Average optimal costs for various N and for p = 2.The last possibility is when N is a multiple of 3 plus 2, so the minimum average energy config-uration is ν (3 , ,..., , , with ( N − / E ( p ) N [ ν (3 , ,..., ] = (cid:20) N (cid:16) p (cid:17) + 43 p (cid:21) Γ( N + 1) Γ( p + 1)Γ( N + p + 1) . (3.136)In the limit of large N all those three upper bounds behave in the same way. For examplelim N →∞ E ( p ) N [ ν (3 , ,..., ] = N − p (cid:16) p (cid:17) Γ( p + 1) . (3.137)Note that the scaling of those upper bounds for large N is the same of those of matching and TSP.For p = 2, these bounds are compared with the results of numerical simulations in Fig. 3.15b.4 CHAPTER 3. FROM MEAN FIELD TO EUCLIDEAN PROBLEMS hapter 4 Quantum point of view In this Chapter we will deal with another field which lies between physics and computer science:quantum computing. Quantum computers have been considered for the first time by Feynmannto simulate quantum systems (or, better, physical systems in which quantum effects are relevant).We, on the other hand, will focus on the possibility of using quantum computers to solve hardcombinatorial optimization problems. After the important works by Shor and Grover, manyconcepts about quantum algorithms to solve COPs have been understood, and we will discusssome of them. We will then specialize in the so called quantum adiabatic algorithm , in the formof the simulated annealing , which is usable today in the largest chip that performs computationsusing quantum effects, i.e. the D-Wave machine. 
Finally, we will briefly comment on a recent and promising approach to approximate (and sometimes also solve) COPs in gate models, the famous quantum approximate optimization algorithm.

Quantum computation for COPs

The study of quantum computation has been flourishing in recent years for two main reasons: the discovery of powerful quantum algorithms (Shor [Sho99] and Grover [Gro97]) in the late 90s, and the advent of real computers able to exploit quantum effects during the computation. As a consequence, there are many good books ([NC00, RP11, KLM+07, Mer07]) and reviews (for example, [Aha99]) where a complete introduction to the subject can be found. Here we will focus on quantum algorithms for COPs, disregarding completely other fundamental topics such as, for example, quantum error correction and fault-tolerant quantum computation.

The basic building block of classical computation is the bit, which can be in state 0 or 1. The quantum version of that is the qubit, which is a two-level system. Therefore its general state is

|q⟩ = a|0⟩ + b|1⟩,    (4.1)

where a and b are complex numbers for which we require |a|² + |b|² = 1, so that the state is normalized. We represent

|0⟩ = (1, 0)^T and |1⟩ = (0, 1)^T,    (4.2)

and we will refer to this as the computational basis. When we have N qubits, the computational basis is the set of states

|q₁⟩ ⊗ |q₂⟩ ⊗ · · · ⊗ |q_N⟩,    (4.3)

for each choice of q_i ∈ {0, 1}. Therefore the Hilbert space describing the state of an N-qubit system is 2^N-dimensional.

Let us now come back for a moment to the classical world: if we have a system of N bits, we have 2^N possible states of our system. Let us view the state of our system (computer) as a basis vector of the 2^N-dimensional space C^{2^N},

|s⟩ = (b₁, ..., b_{2^N})^T,    (4.4)

where exactly one entry b_i is 1 and all the others are 0 (we use the bra-ket formalism also for this representation of classical states).
For example, for a two-bit system we have

|00⟩ = (1, 0, 0, 0)ᵀ, |01⟩ = (0, 1, 0, 0)ᵀ, |10⟩ = (0, 0, 1, 0)ᵀ, |11⟩ = (0, 0, 0, 1)ᵀ. (4.5)

Therefore, it seems that quantum computers could be more powerful than classical computers simply because we can store much more information in N qubits than in N bits: in the former case the system can be in any of the linear combinations (with unit ℓ₂ norm) of the 2^N basis vectors, while in the latter it lives inside the basis.

However, this is not the end of the story: a deterministic program for a classical computer, in this formalism, can be seen as a matrix which is applied to |s⟩ and modifies the state of the system. For example, if we have a two-bit system and we want to assign 1 to the second bit, we apply the matrix

M = ( 0 0 0 0
      1 1 0 0
      0 0 0 0
      0 0 1 1 ), (4.6)

so that

M|00⟩ = |01⟩, M|01⟩ = |01⟩, M|10⟩ = |11⟩, M|11⟩ = |11⟩. (4.7)

In general, a computation will be a matrix with elements M_ij ∈ {0, 1} such that Σ_i M_ij = 1 for each j, since this condition corresponds to the requirement that the matrix maps one basis state into another basis state.

Nonetheless, we can do something closer to quantum computing. For example, we could have in our code instructions like “with probability 1/2, assign 1 to the second bit”. Instructions of this kind, which are not deterministic, are captured by stochastic matrices, that is, matrices satisfying Σ_i M_ij = 1 for each j but now with the only restriction that M_ij ≥ 0. For our case:

M_s = ( 1/2 0  0  0
        1/2 1  0  0
        0   0 1/2 0
        0   0 1/2 1 ), (4.8)

so that

M_s|00⟩ = (1/2)|00⟩ + (1/2)|01⟩. (4.9)
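The action of M and M_s on the basis states is easy to check numerically; below is a minimal illustrative sketch (not part of the original derivation), with the basis ordered as |00⟩, |01⟩, |10⟩, |11⟩ as in Eq. (4.5):

```python
import numpy as np

# Computational basis vectors |00>, |01>, |10>, |11> as columns of the identity.
e = np.eye(4)

# Deterministic program "assign 1 to the second bit": a 0/1 matrix with
# exactly one 1 per column, so basis states map to basis states.
M = np.array([[0., 0., 0., 0.],
              [1., 1., 0., 0.],
              [0., 0., 0., 0.],
              [0., 0., 1., 1.]])

# Stochastic program "with probability 1/2, assign 1 to the second bit":
# columns are probability distributions (non-negative entries summing to 1).
Ms = np.array([[0.5, 0., 0.,  0.],
               [0.5, 1., 0.,  0.],
               [0.,  0., 0.5, 0.],
               [0.,  0., 0.5, 1.]])

assert np.allclose(M @ e[0], e[1])                    # M|00> = |01>
assert np.allclose(Ms @ e[0], [0.5, 0.5, 0.0, 0.0])   # mixture of |00> and |01>
assert np.allclose(Ms.sum(axis=0), np.ones(4))        # column-stochastic
```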
This result has to be interpreted as follows: “if we run the computer program M_s on the input state |00⟩, with probability 1/2 the output state will be |00⟩ and with probability 1/2 it will be |01⟩”. And this is very close to the meaning of a quantum state for a qubit: if the state is

|q⟩ = (1/√2)|0⟩ + (1/√2)|1⟩, (4.10)

and we measure the qubit in the computational basis, we have probability 1/2 of obtaining 0 and 1/2 of obtaining 1.

Therefore, if we allow for “stochastic” instructions in our code, we can really have “superpositions” of basis states of the form given in Eq. (4.4), provided that their coefficients are positive and sum to 1:

|s⟩ = Σ_{i=1}^{2^N} a_i |s_i⟩, (4.11)

where the |s_i⟩ are the basis states given in Eq. (4.4), Σ_i a_i = 1 and a_i ≥ 0. Compare this with the general state of N qubits,

|q⟩ = Σ_{i=1}^{2^N} a_i |q_i⟩, (4.12)

where the basis states are as in the classical case, but now the a_i are complex numbers such that Σ_i |a_i|² = 1. Given a state, the computation is done by multiplying the state by a unitary matrix U and then measuring the state in the computational basis. Notice that, physically, this means that the initial state |q⟩ of the system is evolved with a Hamiltonian H such that

T exp( −(i/ℏ) ∫_{t₀}^{t₁} dt H(t) ) |q⟩ = U |q⟩, (4.13)

where T denotes time ordering.

As we have seen, there are two main differences between stochastic classical and quantum computation: in the first case the “amplitudes” of the basis states are positive quantities which sum to 1 (so they are probabilities); in the second case, the amplitudes are complex numbers whose squared moduli sum to 1. In fact, it turns out that the power of quantum computing is not due to the fact that amplitudes are complex numbers, but rather to the (less stringent) fact that they can assume negative values [BV97].
The reason is that with negative amplitudes we can create interference phenomena that decrease the probability of unwanted output states and increase that of the solution to our problem.

Let us deepen this intuition with a practical example: consider a system of 2 qubits. We need to define two (actually very important) gates: the Hadamard gate H, defined by

H|0⟩ = (|0⟩ + |1⟩)/√2 ≡ |+⟩, H|1⟩ = (|0⟩ − |1⟩)/√2 ≡ |−⟩, (4.14)

so that, in the representation used in Eq. (4.2), we have

H = (1/√2) ( 1  1
             1 −1 ). (4.15)

The other gate we need is a two-qubit one, the CNOT gate, defined by

C_not = |0⟩⟨0| ⊗ I + |1⟩⟨1| ⊗ X, (4.16)

where I is the 2 × 2 identity and X is the Pauli matrix

X = ( 0 1
      1 0 ). (4.17)

The Hadamard gate H is such that a qubit in the state |0⟩ or |1⟩ has equal probability of being measured in |0⟩ or |1⟩ after H is applied. In this sense, the application of H resembles (to some extent!) the classical operation of randomly flipping a bit. Also the C_not gate has a simple action on the computational basis states: if the first qubit is |0⟩, it does nothing; if the first qubit is |1⟩, the second qubit is flipped.

Now consider the state |01⟩ and apply first H to both qubits,

|01⟩ → H ⊗ H |01⟩ = |+−⟩ = (|00⟩ − |01⟩ + |10⟩ − |11⟩)/2, (4.18)

then the C_not gate,

|+−⟩ → C_not |+−⟩ = (|00⟩ − |01⟩ − |10⟩ + |11⟩)/2 = |−−⟩, (4.19)

and, finally, again the H gate on the second qubit,

|−−⟩ → I ⊗ H |−−⟩ = |−1⟩ = (|01⟩ − |11⟩)/√2. (4.20)

Therefore, after the process, we have equal probability of finding the system in the states |01⟩ and |11⟩.
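The three steps above can be reproduced numerically; the following sketch builds the gates from Eqs. (4.15)-(4.17) and verifies that the outcomes |00⟩ and |10⟩ end up with zero probability:

```python
import numpy as np

H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)   # Hadamard gate, Eq. (4.15)
I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])                 # Pauli X, Eq. (4.17)
P0 = np.array([[1., 0.], [0., 0.]])                # |0><0|
P1 = np.array([[0., 0.], [0., 1.]])                # |1><1|
CNOT = np.kron(P0, I2) + np.kron(P1, X)            # Eq. (4.16)

psi = np.zeros(4)
psi[1] = 1.0                      # |01>, basis ordered |00>,|01>,|10>,|11>
psi = np.kron(H, H) @ psi         # -> |+->
psi = CNOT @ psi                  # -> |-->
psi = np.kron(I2, H) @ psi        # -> (|01> - |11>)/sqrt(2)

# Interference: the amplitudes of |00> and |10> cancel exactly.
assert np.allclose(np.abs(psi) ** 2, [0.0, 0.5, 0.0, 0.5])
```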
If we try to replicate this short algorithm classically, we can try the following: take two bits in the state 01 and randomly flip each of them. Notice that if we make our measurements here, the results for the classical and quantum systems are indistinguishable. Then, we ask a friend of ours to look at the first bit and flip the second if the first is 1, and do nothing otherwise. Again, at this point there has been no interference of probabilities and we could not distinguish the bit system from the qubit system: each possible outcome is equally probable. Finally, randomly flip the second bit again. Clearly, after this last step, each outcome still has the same probability classically, in sharp contrast with (4.20): in the quantum system, the probability of the outcomes |00⟩ and |10⟩ is zero! Fig. 4.1 gives a graphical representation of the situation.

Figure 4.1: Paths of amplitude obtained by applying H ⊗ H, then C_not, then I ⊗ H to the initial state |01⟩. When more than one line originates from the same state, the amplitude is equally divided; when more than one line ends on the same state, the amplitudes are summed; red lines carry negative amplitude.

Another important difference between the classical and the quantum case is entanglement. A two-qubit state is said to be entangled if it cannot be written as a tensor product of two single-qubit states¹. A famous entangled two-qubit state is

|ψ⟩ = (|00⟩ + |11⟩)/√2. (4.21)

The question is: are there any differences between this state and a classical state of two strongly correlated bits? Suppose that two qubits are in the state |ψ⟩.
These two qubits are brought far apart, and then one of them is measured (in the computational basis); suppose that the outcome is 0: instantaneously we know that, whenever the other is measured, the outcome will again be 0. This is not necessarily a quantum effect: suppose that your cousin randomly writes a 0 or a 1 on a piece of paper and prepares two identical copies of it. Then she sends one copy to you and one to your brother, each in an envelope. When you look at your paper, you immediately know the content of the other envelope. So what is the point of quantum entanglement? The best explanation requires another ingredient: non-commuting observables. Actually, we need two pairs of non-commuting single-qubit observables: let us consider those associated with the Pauli matrices X and Z,

Z = ( 1  0
      0 −1 ), (4.22)

and those associated with the Hadamard gate H and with H′ defined as

H′ = XHX = (1/√2) ( −1 1
                     1 1 ). (4.23)

¹ Multiple-qubit states can be entangled or not depending on the tensor decomposition under consideration: for example, the state (|00⟩ + |01⟩ + |10⟩ + |11⟩)/2 = |+⟩ ⊗ |+⟩ is a product state in the decomposition used here.

Equipped with the ability to measure these observables, let us take two qubits in the state |ψ⟩ and give one of them to Mario and one of them to Luigi². Now, Mario can measure his qubit, let us say the first one, with the observables associated with X and Z. If he measures on his qubit the observable associated with X, his expected outcome is

⟨X ⊗ I⟩ = ⟨ψ| X ⊗ I |ψ⟩, (4.24)

and similarly for Z.
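Expectation values of this kind are straightforward to evaluate numerically. The sketch below uses the operators just defined on the state |ψ⟩, and anticipates the values of the mixed correlators that enter the discussion which follows:

```python
import numpy as np

s2 = np.sqrt(2)
Z = np.array([[1., 0.], [0., -1.]])
X = np.array([[0., 1.], [1., 0.]])
H = np.array([[1., 1.], [1., -1.]]) / s2
Hp = X @ H @ X                                # H' = XHX, Eq. (4.23)

psi = np.array([1., 0., 0., 1.]) / s2         # |psi> = (|00> + |11>)/sqrt(2)

def ev(A, B):
    """Expectation value <psi| A (x) B |psi>."""
    return psi @ np.kron(A, B) @ psi

assert np.isclose(ev(X, np.eye(2)), 0.0)      # single-qubit outcomes are random
assert np.isclose(ev(Z, H), 1 / s2)
assert np.isclose(ev(Z, Hp), -1 / s2)
assert np.isclose(ev(X, H), 1 / s2)
assert np.isclose(ev(X, Hp), 1 / s2)

W = ev(Z, H) - ev(Z, Hp) + ev(X, H) + ev(X, Hp)
assert np.isclose(W, 2 * s2)                  # <W> = 2 sqrt(2) > 2
```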
An analogous situation holds for Luigi, with H and H′ instead of X and Z. Now, let us suppose that Mario and Luigi randomly choose which measurement to perform. There are 4 possible combinations, and the expected values of the measurements are

⟨Z ⊗ H⟩ = −⟨Z ⊗ H′⟩ = ⟨X ⊗ H⟩ = ⟨X ⊗ H′⟩ = 1/√2. (4.25)

Therefore, if we take the quantity

W = Z ⊗ H − Z ⊗ H′ + X ⊗ H + X ⊗ H′, (4.26)

we expect that

⟨W⟩ = 2√2. (4.27)

But here is the best part: we ask Mario and Luigi to go very far away from each other, like a light-year or so. Then we hypothesize that whatever Mario does with his qubit, it will not change in any way Luigi’s qubit (this is called the locality hypothesis). Moreover, we assume that Mario’s qubit does have a value for the measurement of both X and Z, and analogously for Luigi’s qubit (this is the realism hypothesis). Let us elaborate a little on this: classically we could have non-commuting observables, in the sense that some measurements can interfere with others, so that the order in which we perform measurements matters. But if we have two devices which perform those measurements, from the classical point of view we do not doubt that the system does have, at any moment, definite values that we could measure. We are making exactly that hypothesis here: Mario’s qubit has a certain value for the measurement related to X, say x_M, and another value for Z, say z_M. The problem is that we do not know those values: both x_M and z_M can be −1 or 1 with probability 1/2, because the initial state is |ψ⟩ (a completely analogous situation holds for Luigi’s qubit). But under these hypotheses we can write

W = z_M h_L − z_M h′_L + x_M h_L + x_M h′_L = z_M (h_L − h′_L) + x_M (h_L + h′_L). (4.28)

Now, remember that all these quantities can only be −1 or 1.
Therefore, if h_L = h′_L we have W = 2 x_M h_L, otherwise W = 2 z_M h_L; in both cases |W| = 2. As a consequence, we obtain a form of the so-called Bell inequalities,

|⟨W⟩| ≤ 2, (4.29)

which contradicts Eq. (4.27). This “paradox” was noticed for the first time by Einstein, Podolsky and Rosen [EPR35], but today many experiments have confirmed that the inequality in (4.29) is violated: our reality is not local and realistic. Moreover, this clarifies the difference between entanglement and classical correlation: for entangled qubits Eq. (4.27) holds, while the reasoning which led us to the inequality in (4.29) is correct for classically correlated variables.

It has been shown that, for an algorithm working with pure states, entanglement among a number of qubits which scales as O(N) (N being the input size) is necessary for that algorithm not to be efficiently simulable by classical computers [JL03]. However, even though entanglement and interference are two important resources which are not available to classical computers, the power of quantum computation has more subtle origins, which are not completely understood today [RP11, Section 13.9].

² The usual names for these two guys are Alice and Bob.

Grover's algorithm is an excellent example of the power of quantum computing at work. We can state the problem as follows. We are given an oracle f such that f(i) ∈ {0, 1} for each i ∈ {1, . . . , N} ≡ [N]. We do not know anything about the internal structure of the oracle, that is, we have no idea of what the oracle is actually computing. The only thing we know is that exactly one input, k ∈ [N], is such that f(k) = 1, while f(i) = 0 for i ≠ k. Our aim is to find k.

Now, classically the only way to proceed is to try all the possible inputs: on average, we will need N/2 queries to the oracle f (and N − 1 in the worst case). The quantum algorithm works with two registers: one holds states |i⟩ with i ∈ [N] (therefore it can be represented by log₂ N qubits), and the other is an additional qubit in state |q⟩.
Therefore the state of the whole system is |i⟩ ⊗ |q⟩. Let us suppose that the oracle works as follows: it is implemented by a unitary U_f such that, for |q⟩ = |0⟩ or |1⟩,

U_f |i⟩ ⊗ |q⟩ = |i⟩ ⊗ |q ⊕ f(i)⟩,

where ⊕ denotes addition modulo 2. Equivalently,

U_f |i⟩ ⊗ |q⟩ = { |i⟩ ⊗ X|q⟩ if i = k;  |i⟩ ⊗ |q⟩ if i ≠ k. (4.30)

In other words, U_f |i⟩ ⊗ |q⟩ = |i⟩ ⊗ X^f(i) |q⟩. Therefore the oracle flips the qubit in the second register if the first register contains the value k such that f(k) = 1, and otherwise it leaves the qubit untouched (this reasoning is valid for qubits in computational basis states). Now, we prepare the first register in the superposed state

|ψ⟩ = (1/√N) Σ_{j=1}^{N} |j⟩ (4.31)

and the qubit in the second register in the state |−⟩. When we apply the oracle to the system, we obtain

U_f |ψ⟩ ⊗ |−⟩ = (1/√N) Σ_{j≠k} |j⟩ ⊗ |−⟩ − (1/√N) |k⟩ ⊗ |−⟩ = (U ⊗ I) |ψ⟩ ⊗ |−⟩, (4.32)

where the operator U is defined by

U = I − 2|k⟩⟨k|. (4.33)

Since the second qubit is left untouched by the application of U_f, we will stop writing it down (the remaining part of the algorithm works on the first register). However, keep in mind that each time we apply U, we are querying the oracle once. We also need another operator, the diffusion operator D, defined as

D = 2|ψ⟩⟨ψ| − I, (4.34)

where |ψ⟩ is given in Eq. (4.31). This operator is unitary (it can be written as −e^{iπ|ψ⟩⟨ψ|}) and can be efficiently implemented with ∼ log N elementary gates (see, for example, [RP11, Section 9.1.3]).

A great simplification in the analysis of Grover's algorithm comes from the fact that the only operators involved are those in Eq. (4.33) and in Eq.
(4.34). Since these operators can be written in terms of the projectors on |ψ⟩ and on |k⟩ (and the identity), we can restrict our analysis to the two-dimensional space spanned by these two vectors. In this space, a basis is composed of the two vectors {|k⟩, |ν⟩}, where

|ν⟩ = (1/√(N−1)) Σ_{j≠k} |j⟩. (4.35)

We represent

|k⟩ = (1, 0)ᵀ, |ν⟩ = (0, 1)ᵀ, (4.36)

and we have

DU = ( cos θ  −sin θ
       sin θ   cos θ ), (4.37)

where cos θ = 1 − 2/N; since cos θ ∼ 1 − θ²/2 for small θ, we obtain θ ∼ 2/√N. Therefore the application of the operator DU corresponds to a rotation by an angle θ.

Now, we start from the state |ψ⟩, which is close to |ν⟩ for large N. The state |k⟩ is orthogonal to |ν⟩, so their relative angle is π/2. Therefore we need to apply the operator DU

t = (π/2)/θ ∼ (π/4)√N (4.38)

times to rotate the initial state into the target state. After this operation, the probability of obtaining k with a measurement is

|⟨k| (DU)^t |ψ⟩|² ≥ cos² θ ∼ 1. (4.39)

Since each use of the operator U corresponds to a query to the oracle, we are performing ∼ (π/4)√N queries for large N, which is much less than in the classical case. Finally, we note that this algorithm is optimal, in the sense that it has been proved that ∼ (π/4)√N is the minimum number of queries to the oracle needed to solve the problem, independently of the algorithm [BBBV97, BBHT98, Zal99].

4.2 Quantum adiabatic algorithm

The gate model of quantum computing is not the only possible model. Actually, there are many others, and in this Section we will focus on the quantum adiabatic computation model. Its introduction dates back to the works of Apolloni and coworkers [ACdF89], and at the beginning it was called quantum annealing.
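(Before moving to the adiabatic setting, note that the rotation picture of the previous Section can be verified end-to-end with a small dense simulation; the values of N and of the marked item k below are arbitrary illustrative choices.)

```python
import numpy as np

N, k = 1024, 137                          # search space size and marked item
psi = np.ones(N) / np.sqrt(N)             # uniform superposition, Eq. (4.31)

U = np.eye(N)
U[k, k] = -1.0                            # oracle U = I - 2|k><k|, Eq. (4.33)
D = 2 * np.outer(psi, psi) - np.eye(N)    # diffusion operator, Eq. (4.34)

t = int(round(np.pi / 4 * np.sqrt(N)))    # ~ (pi/4) sqrt(N) iterations, Eq. (4.38)
state = psi.copy()
for _ in range(t):
    state = D @ (U @ state)               # one Grover iteration = rotation by theta

assert abs(state[k]) ** 2 > 0.99          # success probability close to 1, Eq. (4.39)
```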
The original idea was to design an algorithm similar to simulated annealing, but able to exploit quantum, rather than thermal, fluctuations to escape local minima. Only later, when experiments with quantum systems able to physically implement quantum annealing [BBRA99] started to appear, did the quantum annealing (or adiabatic) algorithm (QAA) become something which requires a dedicated (quantum) device [FGGS00]. Up to that point, the Hamiltonians used to evolve the quantum systems had non-positive off-diagonal entries in the computational basis (stoquastic Hamiltonians), but it turned out that if we allow the system to evolve with non-stoquastic Hamiltonians, then the QAA is as general as the gate model (that is, each gate-model algorithm can be recast as a QAA with a polynomial overhead) [AvDK+]. In the following we will also address the parameter setting problem [DGRM].

³ The most recent D-Wave chip, called Pegasus, has more than 5000 qubits arranged in a topology which allows each of them to be connected with 15 others.

The QAA consists in the following: consider a starting Hamiltonian, H₀, which is easy to implement and whose ground state is known and easy to prepare. Now encode the solution of a COP in the ground state of another Hamiltonian, H₁, and define the interpolating Hamiltonian

H(s) = A(s) H₀ + B(s) H₁, (4.40)

with A(0) = 1, B(0) = 0, A(1) = 0 and B(1) = 1. Now prepare the system in the ground state of H₀ and let it evolve with H(s), changing s from 0 to 1. The functions A and B are called the schedule, and the system is guaranteed to remain in the instantaneous ground state of H(s) provided that the change of H is “slow enough”. Therefore, at the end of the evolution the system will be in the ground state of H₁, and a measurement will give as outcome the result of our original problem. But how slow is “slow enough”?
The answer is in the adiabatic theorem, which we now state in its simplest form (a nice review of its various versions can be found in [AL18]). Consider a Hamiltonian H_{t_f}(t) which depends on time t and on the parameter t_f in such a way that H_{t_f}(s t_f) = H(s) with s ∈ [0, 1]; in other words, H_{t_f}(t) depends on time only through the combination s = t/t_f, which is the case for the QAA. Now, consider the set of eigenstates |ε_j(s)⟩ with j ∈ {0, 1, . . .}, such that

H(s) |ε_j(s)⟩ = ε_j(s) |ε_j(s)⟩, (4.41)

with the eigenvalues ε_j(s) sorted in increasing order. Therefore |ε₀(s)⟩ is the instantaneous ground state. Now, the adiabatic theorem [Ami09] states that, if the system is prepared in the state |ε_j(0)⟩ at s = 0, it will remain in the same instantaneous eigenstate provided that

(1/t_f) max_{s∈[0,1]} |⟨ε_i(s)| ∂_s H(s) |ε_j(s)⟩| / |ε_i(s) − ε_j(s)|² ≪ 1 (4.42)

for each i ≠ j. Since one is typically interested in the ground state, we can set j = 0 (the most relevant level being then i = 1). Moreover, notice that we can always bound the numerator from above with 1, therefore we are guaranteed to stay in the ground state if t_f ∆² ≫ 1, where

∆ = min_{s∈[0,1]} (ε₁(s) − ε₀(s)) (4.43)

is usually called the spectral gap (or simply the gap). In conclusion, the adiabatic theorem suggests to choose t_f = η ∆⁻², with η ≫ 1. Notice that the typical situation is that ∆ depends on the problem size N, as we will see in the following. Since η has to be large but can be fixed independently of N, the complexity of the QAA is entirely given by the dependence of ∆ on N.

The adiabatic version of Grover's algorithm has a nice story: it was proposed as one of the first examples of application of the QAA [FGGS00], but the result was disappointing: indeed, no speedup with respect to the classical case was found.
Only later did Roland and Cerf [RC02] understand how to recover the Grover speedup in the adiabatic setting. Here we review their results. As in the standard Grover case, we have N states |i⟩ and a marked state |k⟩, which we do not know a priori and want to find. We use as initial state the uniform superposition

|ψ⟩ = (1/√N) Σ_i |i⟩. (4.44)

The Hamiltonian we use to evolve the system is

H(s) = (1 − a(s)) H₀ + a(s) H₁, (4.45)

with

H₀ = I − |ψ⟩⟨ψ| (4.46)

and

H₁ = I − |k⟩⟨k|. (4.47)

Notice that |ψ⟩ is the ground state of H₀ with eigenvalue 0, and |k⟩ is the ground state of H₁, again with eigenvalue 0. For this problem the schedule is completely determined by the choice of a. Now we need to evaluate the eigensystem of H(s) in order to choose a proper schedule a(s). Notice that we start from the state |ψ⟩ and therefore, since the Hamiltonian only depends on the projectors on |ψ⟩ and |k⟩ and on the identity, the evolution remains in the subspace spanned by |ψ⟩ and |k⟩. A basis of this space is {|k⟩, |ν⟩}, where

|ν⟩ = (1/√(N−1)) Σ_{j≠k} |j⟩. (4.48)

In this subspace, where the non-trivial evolution of the initial state happens, we use

⟨ν|ψ⟩ = √(1 − 1/N), ⟨k|ψ⟩ = 1/√N, ⟨k|ν⟩ = 0, (4.49)

and we obtain

⟨k|H₀|k⟩ = 1 − 1/N, ⟨k|H₀|ν⟩ = ⟨ν|H₀|k⟩ = −(1/√N)√(1 − 1/N), ⟨ν|H₀|ν⟩ = 1/N,
⟨k|H₁|k⟩ = 0, ⟨k|H₁|ν⟩ = ⟨ν|H₁|k⟩ = 0, ⟨ν|H₁|ν⟩ = 1.
(4.50)

At this point, we compute the eigenvalues of the matrix H(s) restricted to this 2-dimensional space (the remaining eigenvalue is 1, with degeneracy N − 2) and we obtain

ε₀(s) = 1/2 − (1/2) √(1 − 4 (1 − 1/N) a(1 − a)),
ε₁(s) = 1/2 + (1/2) √(1 − 4 (1 − 1/N) a(1 − a)). (4.51)

Therefore we have, for the instantaneous gap,

g(s) = ε₁(s) − ε₀(s) = √(1 − 4 (1 − 1/N) a(1 − a)). (4.52)

For a = s (linear schedule), the minimum is reached at a = 1/2, so that the minimum gap is

∆ = g(1/2) = 1/√N, (4.53)

and therefore, by using Eq. (4.42) and |⟨ε₁(s)| ∂_s H(s) |ε₀(s)⟩| ≤ 1, we have N/t_f ≪ 1, i.e.

t_f ≫ N (4.54)

(notice that this means t_f = N/η with η some small, N-independent parameter). This disappointing result can be improved by a more careful choice of the schedule a(s). Indeed, let us consider again Eq. (4.42). In this case, we have

(1/t_f) max_{s∈[0,1]} |da/ds| |⟨ε₁(s)| ∂_a H(a) |ε₀(s)⟩| / g(s)² ≪ 1. (4.55)

Therefore, we can require that, for each s ∈ [0, 1],

(1/t_f) |da/ds| / g(s)² = η, (4.56)

where η ≪ 1. Solving for a(s),

da/ds = t_f η (1 − 4 (1 − 1/N) a(1 − a)), (4.57)

with the initial condition a(0) = 0, and t_f has to be chosen such that a(1) = 1. The solution of Eq. (4.57) for a(s) is plotted in Fig. 4.2, while for t_f we obtain, at large N,

t_f = (π/(2η)) √N. (4.58)

This example is instructive and allows us to see an important point: the Hamiltonian can be changed quickly where the gap is large, but the annealing schedule has to slow down where the gap is small. Unfortunately, the computation of the gap is extremely difficult in most cases of interest, and therefore some more heuristic treatment is usually applied.
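The scaling t_f ∼ √N of the local schedule can be checked without solving the ODE explicitly: separating variables in Eq. (4.57) gives t_f η = ∫₀¹ da / g(a)², and this integral can be evaluated numerically (a minimal sketch):

```python
import numpy as np

def tf_eta(N, pts=200001):
    """Numerically evaluate t_f * eta = integral_0^1 da / g(a)^2,
    with g(a)^2 = 1 - 4 (1 - 1/N) a (1 - a) as in Eq. (4.52)."""
    a = np.linspace(0.0, 1.0, pts)
    f = 1.0 / (1.0 - 4.0 * (1.0 - 1.0 / N) * a * (1.0 - a))
    h = a[1] - a[0]
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))   # trapezoidal rule

# t_f * eta approaches (pi/2) sqrt(N) for large N, in agreement with Eq. (4.58).
for N in (10**3, 10**4):
    assert abs(tf_eta(N) / np.sqrt(N) - np.pi / 2) < 0.05
```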
Since its introduction, the QAA has been thoroughly studied to understand whether it can be useful to tackle computationally hard problems faster than classical algorithms [FGGS00, KN98]. The usual comparison is with the classical Simulated Annealing (SA) algorithm and its variants, such as Parallel Tempering (PT). The general idea at the heart of the hoped-for success of the QAA is that quantum fluctuations could be more effective than thermal ones in exploring rough energy landscapes (even though there are also other possible kinds of advantage of quantum algorithms over classical ones [BZ18]). This intuition has been built mostly by using very simple toy models, such as the highly symmetric Hamming weight problem [FGG02, MAL16] or oracular problems (as the Grover problem analyzed in Sec. 4.2.2), but conceptual arguments proving any kind of quantum speedup for real-world problems are lacking to date, despite significant efforts [DBI+16, MZW+].

Figure 4.2: Plot of the solution of Eq. (4.57), with N = 64 and t_f η = (π/2)√N. The function a(s) changes faster when s is close to 0 or 1, that is (see inset, showing the gap g(a)) when the gap is large, and it changes more slowly when s is closer to 0.5, that is, near the minimum gap.

The advent of actual quantum annealing devices, such as the D-Wave machine, which allows one to control about 2000 physical qubits, provided a more pragmatic road: we are now in the exciting position of performing actual experiments using these annealers to solve certain COPs, and then comparing their performance with that of classical solvers. However, many practical issues arise in this case, most of which are related to the fundamental question “how can we do a fair comparison?” [RWJ+14, MK18]. It was soon understood that one needs to carefully choose the problems to be solved. The first step is to consider COPs that admit a rewriting as Quadratic Unconstrained Binary Optimization (QUBO) problems, that is, in the same spirit of Sec.
2.2.3, as

H = Σ_{i,j} J_{ij} x_i x_j + Σ_i h_i x_i, (4.59)

where x_i ∈ {0, 1}, and the values of the couplers J_{ij} and of the local fields h_i are used to specify the problem and the instance. However, this is not enough: to exploit the effect of quantum fluctuations in the best way possible, one has to consider problems with a sufficiently complex energy landscape. Often this is achieved by studying problems whose thermodynamics presents a spin-glass phase at low temperature. Unfortunately, the present architecture of qubit interactions in the D-Wave system does not allow one to obtain this kind of difficult problems [KHA14] without an extra step, that is, the embedding of a different interaction graph into the D-Wave qubit-interaction graph (which is called the Chimera graph for the D-Wave machines up to the 2000Q). To do that, we need to introduce an extra term in the Hamiltonian,

H(λ) = H_P + λ H_C, (4.60)

where H_P is the problem Hamiltonian, written in QUBO form, and H_C is the Hamiltonian, again in QUBO form, which enforces the constraints by giving an energy penalty to the configurations that break one or more of them. Here we address the problem of choosing the value of the parameter λ. An easy recipe for this choice does not exist: indeed, λ has to be large enough that the ground state of our problem (which is the state we are after) has no broken constraints, but it has been argued theoretically [Cho08] and observed experimentally [VMK+15] that a small value provides better performance.

Optimal choice of parameters: framework

Consider a COP defined by a cost function E : Ω → ℝ, where Ω is a discrete set. We will refer to this problem as the “logical” problem.
Consider now that this problem admits a QUBO version. This means that we also have another, “embedded”, Hamiltonian H_P : {0, 1}^N ≡ B → ℝ (N is the number of binary variables needed to encode the problem) and an invertible function φ : Ω → S ⊆ B such that H_P(φ(σ)) = E(σ) for each σ ∈ Ω. Now consider the case S ⊂ B: H_P will assign an energy also to the elements of the boolean hypercube in B \ S ≡ S^c, which do not correspond to acceptable configurations of the logical problem.

As an example, let us consider again the matching problem introduced in Sec. 3.2: given a graph G = (V, L) and a weight w_ℓ ≥ 0 for each edge ℓ ∈ L, let us call A the set of all matchings. To obtain the QUBO form of this problem, we assign to each edge ℓ a binary variable x_ℓ, which is 1 or 0 depending on whether the edge is used or not in the configuration x. As we have seen in Sec. 3.2, if we want a Hamiltonian in QUBO form, we need to introduce a soft constraint, obtaining

H_λ(x) = H_P + λ H_C = Σ_{ℓ∈L} w_ℓ x_ℓ + λ Σ_{ν∈V} (1 − Σ_{ℓ∈∂ν} x_ℓ)², (4.61)

where the quadratic term, provided that λ is large enough, enforces the fact that (at least in the ground state) each point has to be connected to exactly one other point. Let us define

E_gs = min_{σ∈Ω} E(σ), E_gs(λ) = min_{x∈B} H_λ(x). (4.62)

The “minimum” value of the parameter, λ*, is the smallest λ ∈ ℝ⁺ such that

E_gs = E_gs(λ). (4.63)

We define the “optimal” value of the parameter λ, for a fixed heuristic algorithm, as the one minimizing the time-to-solution (TTS) of that algorithm (see Appendix D.1 for a definition of TTS). Therefore the optimal parameter depends in general on the algorithm we are going to use.
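The definitions (4.61)-(4.63) can be made concrete by brute force on a tiny instance. The sketch below uses the matching problem on the complete graph K₄ with hypothetical edge weights: without the penalty the ground state breaks the constraints, a large λ restores E_gs, and scanning a grid of λ values locates the minimum parameter:

```python
import itertools
import numpy as np

# Matching on K4 with hypothetical edge weights.
V = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
w = [1.0, 5.0, 6.0, 6.0, 5.0, 2.0]

def H_lam(x, lam):
    """H_lambda(x) of Eq. (4.61): weight term plus soft constraint."""
    cost = sum(wi * xi for wi, xi in zip(w, x))
    penalty = sum((1 - sum(xi for e, xi in zip(edges, x) if v in e)) ** 2
                  for v in V)
    return cost + lam * penalty

def E_gs_of(lam):
    """E_gs(lambda) of Eq. (4.62), by exhaustive enumeration."""
    return min(H_lam(x, lam) for x in itertools.product([0, 1], repeat=6))

# True ground state: minimum-weight perfect matching (three of them on K4).
matchings = [[0, 5], [1, 4], [2, 3]]           # edge indices of each matching
E_gs = min(w[i] + w[j] for i, j in matchings)  # here 1.0 + 2.0 = 3.0

assert E_gs_of(0.0) < E_gs                 # no penalty: constraints are broken
assert np.isclose(E_gs_of(10.0), E_gs)     # large lambda enforces them

# Minimum parameter: smallest grid value with E_gs(lambda) = E_gs, Eq. (4.63).
ok = [lam for lam in np.arange(0.0, 10.01, 0.25)
      if np.isclose(E_gs_of(lam), E_gs)]
assert np.isclose(min(ok), 1.0)
```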
However, if we focus on annealing algorithms with local moves, it is possible to build some intuition suggesting that the optimal parameter is (at least close to) the minimum parameter. Indeed, heuristic algorithms of this kind are used to explore complex energy landscapes, and the idea behind classical/quantum annealing is roughly to exploit thermal/quantum fluctuations to overcome the energy barriers which separate low-energy configurations, so that we can explore these configurations and pick the optimal one.

Now consider the case in which the barrier to overcome is given by the H_C term in Eq. (4.61), that is, it is due to a penalty term: if the coupling λ is lowered, the height of the barrier is lowered, so the annealing can proceed faster. This happens, for example, when the Hamming distance between pairs of allowed configurations is always larger than 1 (if the algorithm only performs single spin flips): in this case the algorithm has to overcome a barrier given by the penalty term each time it changes the system configuration from one in S to another in S, passing through S^c. An explicit example of this is the matching problem: indeed, if the system is in an allowed configuration, the closest allowed configuration is at Hamming distance 4, and it corresponds to the swap of two matched pairs of points. Moreover, it is easy to check that this is again the case for many other combinatorial optimization problems relevant for both practical and theoretical analyses.

In Appendix D.2 we investigate the effect of changing λ in a toy-model example, where all the computations can be done analytically. In the following, on the other hand, we will first present and discuss an algorithm to find the minimum value of λ (in some cases), and then we will apply it to study the effect of the choice of λ for a specific combinatorial optimization problem.
Optimal choice of parameters: an algorithm

The usual strategy to obtain a good constraint term H_C is to find some set of constraints that the binary variables have to satisfy in order to be mapped into a logical configuration by φ⁻¹. Then H_C is built so that it increases when the number of broken constraints increases, and is zero if no constraint is broken. For example, in the matching problem, given a vertex ν, exactly one of the edges in ∂ν has to be used. So we have one constraint for each vertex, and the term we inserted in Eq. (4.61) is positively correlated with the number of broken constraints.

We denote by E₀^(k) the minimum of H_P(x) over the set of configurations x with k broken constraints; so, for example, E_gs = E₀^(0). The minimum parameter λ* is then the smallest value such that

E₀^(0) < kλ + E₀^(k) (4.64)

for k = 1, . . . , M, where M is the maximum number of constraints that can be broken in a single configuration. Therefore we have

λ* ≥ max_{k∈{1,2,...,M}} (E₀^(0) − E₀^(k))/k = max_{k∈{1,2,...,M}} λ_k, (4.65)

where λ_k = (E₀^(0) − E₀^(k))/k.

This inequality cannot be used efficiently to obtain λ* as it stands: the computation of each λ_k could be even more difficult than solving the original problem. On the other hand, one can obtain an approximation of each λ_k: to do this, one needs to approximate E₀^(0) from above and E₀^(k) from below. But even in this case, one still needs to compute all the M different λ_k's, and for most of the interesting problems M scales with the system size N. This happens, for example, for our working example, the matching problem, where the number of constraints is the number of vertices to be matched. To make things worse, the computation of E₀^(k) requires the minimization of the energy over all the possible ways of breaking k constraints, and this number can grow exponentially with N (as happens, for example, for the matching problem).
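On instances small enough for exhaustive enumeration, the quantities E₀^(k) and λ_k can be computed directly. The sketch below does this for a matching instance on K₄ with hypothetical weights, counting broken constraints through the penalty H_C = Σ_ν (1 − Σ_{ℓ∈∂ν} x_ℓ)², i.e. the quantity multiplying λ in Eq. (4.61):

```python
import itertools

# Matching on K4 with hypothetical edge weights.
V = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
w = [1.0, 5.0, 6.0, 6.0, 5.0, 2.0]

# E_0^(k): minimum of H_P(x) over configurations with penalty H_C = k.
best = {}
for x in itertools.product([0, 1], repeat=len(edges)):
    cost = sum(wi * xi for wi, xi in zip(w, x))
    k = sum((1 - sum(xi for e, xi in zip(edges, x) if v in e)) ** 2 for v in V)
    best[k] = min(best.get(k, float("inf")), cost)

assert best[0] == 3.0                 # E_gs: minimum-weight perfect matching

# lambda_k = (E_0^(0) - E_0^(k)) / k, Eq. (4.65); their maximum bounds
# the minimum parameter lambda* from below.
lam = {k: (best[0] - best[k]) / k for k in best if k > 0}
assert max(lam.values()) == 1.0       # attained at k = 2 (cheapest single edge)
assert lam[2] >= lam[4]
```

On this instance the bound is saturated: the maximum λ_k coincides with the minimum parameter found by direct scanning of E_gs(λ).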
However, if we can prove that

λ_1 ≥ λ_2 ≥ ··· ≥ λ_M,   (4.66)

λ* can easily be found by estimating λ_1 and taking the smallest value such that

λ > λ_1.   (4.67)

Let us give some qualitative arguments to understand why Eq. (4.66) is a reasonable expectation. We have that λ_k ≥ λ_{k+1} if and only if

E_0^{(0)} − E_0^{(1)} + E_0^{(1)} − ··· + E_0^{(k−1)} − E_0^{(k)} ≥ k (E_0^{(k)} − E_0^{(k+1)}).   (4.68)

If we prove that

E_0^{(n−1)} − E_0^{(n)} ≥ E_0^{(n)} − E_0^{(n+1)},   (4.69)

for each n = 1, 2, …, M − 1, then inequality (4.68) immediately follows (this is a sufficient but not necessary condition). This condition is nothing but the fact that the maximum gain in energy that we can obtain by breaking the n-th constraint is not smaller than the one that we obtain by breaking the (n+1)-th, for each value of n.

Actually, the inequality given in (4.66) is satisfied for some problems, but not for all of them. In particular, it depends on both H_P and H_C, and in Appendix D.3 we show a specific problem and a specific choice of H_C for which this condition does not hold. We will see that for the matching problem defined as in Eq. (4.61) this condition is satisfied. Finally, notice that there are other algorithms that can be used to find the minimum parameter λ*: when the algorithm we discuss here is not applicable, these methods can be an alternative strategy. However, as we discuss in Appendix D.4 using the example of the matching problem, the differences in performance among these methods can be quite relevant.

An explicit example: the matching problem

As we already discussed, the matching problem is in the P complexity class. However, it is empirically known that for many problems in the P class, heuristic algorithms such as SA still need an exponential time to find the exact solution.
When this problem is written in QUBO form, it is one of the simplest possible constrained problems: quadratic terms appear in the penalty term only, and the problem is trivial without it. On top of that, we have seen that the structure of the physical energy landscape, that is, logical configurations separated by non-logical ones, is common to many other problems. Therefore the matching problem is an ideal starting point to study the effects of the choice of the penalty-term coupling parameter.

Another, more practical, reason to choose this problem is that, since it is polynomial, we can compute λ* (as defined in Eq. (4.65)) in polynomial time, and we will see that for the QUBO form that we will use for this problem, condition (4.66) holds. Therefore we can find the minimum parameter in polynomial time, and test on a realistic problem whether that is the optimal value. Notice that we will actually use the exact solution of the problem to obtain the minimum parameter, since our objective is to understand the effect of the choice of the parameter rather than to provide an algorithm to find the minimum parameter itself. Nonetheless, for more interesting (NP-hard) problems one cannot use the solution of the problem, but, as we have discussed previously, approximate solutions together with our technique could be used to obtain good values for the parameter.

Let A be the set of all the possible matchings for our problem graph G = (V, L). We define

E_N = min_{σ ∈ A} E_σ,   (4.70)

where 2N = |V| is a measure of the problem size.

To discuss the inequality (4.69) in this case, we need to analyze how E_0^{(k)} is obtained. Firstly, notice that constraints are always broken in pairs. Now consider a configuration with 2k broken constraints, with k > 0. Suppose that there is a point x which is the endpoint of m > 1 occupied edges: then we can remove one of the edges having x as an endpoint and, as is clear from Eq. (4.61), we will have a lower cost given by the penalty term of H_λ, and a lower cost given by the weight term.
Therefore, the way to break 2k constraints which minimizes H_λ is realized by configurations which have 2k points not matched with any other point, that is, by ignoring 2k points of the initial set of 2N points in the matching. Therefore we introduce the symbol

E_{N−k} = E_0^{(k)},   (4.71)

where we dropped the subscript 0 to shorten the notation and to stress the fact that we can interpret E_0^{(k)} as the cost of the optimal matching when we can ignore 2k points. Notice that E_N = E_0^{(0)}. The inequality that we want to prove is then

E_{n+1} − E_n ≥ E_n − E_{n−1},   (4.72)

for each n = 1, 2, …, N − 1. The proof is rather technical and is given in full detail in Appendix D.5. Here we only sketch its structure and the main ideas behind it:

• we prove that if a point is ignored when 2k constraints can be broken, it will also be ignored when 2(k+1) can be broken ("stability" property);
• using the previous fact, we can prove Eq. (4.72) ("order" property).

Both points are proven by building sub-optimal matchings using pieces of the solutions with costs E_{n+1} and E_{n−1}, and using the fact that the solutions with costs E_{n+1}, E_{n−1} and E_n are optimal.

Numerical results for the matching problem

The aim of this section is to numerically study the relevance of the choice of the parameter value in terms of performance for the matching problem, where the minimum parameter can be found in polynomial time. This will also allow us to discuss the qualitative picture introduced previously, at least for this specific example. We used an exact, polynomial-time solver to compute the energy of the optimal solution, both when no constraints are broken (to obtain E_N) and when one constraint is broken (we broke it in every possible way, and the minimum of the energies obtained is E_{N−1}). Here, with one broken constraint we mean that we are ignoring 2 points to be matched, as discussed in the previous section.
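The "order" property (4.72) can also be checked directly on small instances by brute force. The sketch below uses a toy complete graph with random integer weights (not the lattice instances used later) and enumerates all matchings, which is only feasible for very small N:

```python
import itertools
import random

def min_matching(points, w):
    """Minimum-weight perfect matching of the (even-sized) tuple `points`,
    found by brute-force recursion; w[i][j] is the weight of edge (i, j)."""
    if not points:
        return 0
    first, rest = points[0], points[1:]
    return min(w[first][p] + min_matching(tuple(x for x in rest if x != p), w)
               for p in rest)

def E(N, k, w):
    """E_{N-k}: optimal cost when 2k of the 2N points may be ignored."""
    return min(min_matching(sub, w)
               for sub in itertools.combinations(range(2 * N), 2 * (N - k)))

random.seed(0)
N = 3
w = [[0] * (2 * N) for _ in range(2 * N)]
for i in range(2 * N):
    for j in range(i + 1, 2 * N):
        w[i][j] = w[j][i] = random.randint(1, 20)

costs = [E(N, k, w) for k in range(N + 1)]       # E_N, E_{N-1}, ..., E_0
gaps = [costs[k] - costs[k + 1] for k in range(N)]
# "order" property, Eq. (4.72): the successive gains are non-increasing
assert all(gaps[k] >= gaps[k + 1] for k in range(N - 1))
```

The final assertion is a known convexity property of minimum-cost matchings by cardinality, consistent with the proof sketched above.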
Therefore we can break a constraint in N(2N − 1) ways, where 2N is the number of points, and so the procedure to find the minimum parameter is still polynomial. Once we obtained the minimum parameter, we ran the classical and quantum algorithms using that value and values at a fixed distance from it. We then computed the time to solution (TTS), which is used throughout this section as the measure of performance.

Let us now give some details about our numerical analysis: we considered the matching problem on a specific graph, a two-dimensional regular lattice of vertices, where each vertex has 4 edges connecting it to its nearest neighbors (the vertices on the boundaries have fewer edges because we used open boundary conditions). We used this specific graph because in this case the problem of the minor embedding in the D-Wave 2000Q hardware graph (the Chimera graph) is moderate, so we can focus on the effects caused by the change of the penalty term of the matching problem, neglecting the fact that the penalty term of the minor-embedding problem also plays a role in determining the performance.

Figure 4.3: Histograms of the minimum coupling parameter obtained for 500 different instances of the matching problem, for system sizes of 100, 256 and 400. The optimal parameters are always multiples of 4 because of our initial choice of link weights.

The weights of the edges are randomly extracted from a small set of multiples of 8: multiples of 8 are needed to have integer numbers after the rewriting of the problem in QUBO form and then again in Ising variables (which is the form of input Hamiltonian used in both our parallel tempering algorithm and the D-Wave 2000Q). We decided to use only integers and such a small set to avoid precision problems, which are particularly severe in noisy devices such as the D-Wave 2000Q.
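The time-to-solution figure of merit is not spelled out here; a minimal sketch, assuming the standard definition (the total time needed to observe the ground state at least once with 99% probability, given the success probability of a single run), could read:

```python
import math

def time_to_solution(t_run, p_success, target=0.99):
    """Standard TTS (assumed definition): expected total time to find the
    ground state at least once with probability `target`, given the
    single-run time `t_run` and single-run success probability `p_success`."""
    if p_success <= 0.0:
        return math.inf          # failed instances: "infinite" TTS
    if p_success >= target:
        return t_run             # a single run already suffices
    return t_run * math.log(1.0 - target) / math.log(1.0 - p_success)
```

For example, a run time of 100 µs with success probability 0.5 gives a TTS of about 664 µs.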
Then, for each system size analyzed, the parameter λ is chosen as λ = λ* + 2δ, where δ takes a few integer values around zero. We did not consider a finer grid of values for δ because our computations are done with finite precision, and too-close values of δ would be indistinguishable. In Fig. 4.3 we present histograms of the values of λ* obtained for systems of various sizes. A first important consequence of this plot is that the value of λ* is not a self-averaging quantity (at least not for the system sizes explored here). On the contrary, the variance of λ* we obtained increases with the system size.

Classical heuristic algorithm. We used the Parallel Tempering (PT) algorithm included in NASA's Unified Framework for Optimization (UFO). We analyzed systems with sizes up to 484 points (that is, a lattice with 22 points of side length), which means, because of our QUBO embedding, about 900 binary variables. To choose the temperatures for the PT we considered the energy scale given by the penalty-term parameter, and we multiplied it by two constants (one for the lowest and one for the highest temperature) which are found by maximizing the number of times that the PT algorithm finds either the GS or a forbidden configuration with lower energy than the GS, with δ = 0. This is done to have the cleanest possible values of TTS close to δ = 0, and we have checked that the qualitative picture (regarding the TTS scaling with δ) does not change when varying the temperatures.

Figure 4.4: Time-to-solutions for the matching problem, shown at fixed N as a function of the distance from the optimal parameter. The solid lines connect points computed using the 50-percentile of instances, the dashed lines correspond to the 35-percentile (below solid lines) and 65-percentile (above solid lines).
To obtain the TTS we proceeded as follows: we randomly generated 500 instances, and ran the PT 500 times for each instance. For N = 400 and N = 484, the number of different instances is reduced to 250. When the PT algorithm succeeded in finding a good solution, we recorded the time used; when it failed in the time given, or it found a solution with energy lower than the GS (because of broken constraints), we recorded a failure and so an "infinite" time to find the solution. We do that because once the system is trapped in a local minimum of the energy landscape, escaping from it will (typically) require much more time than that allowed to each run of the algorithm. Using the data collected in this way, we can compute the TTS, and the results are shown in Figs. 4.4 and 4.5, as functions respectively of δ at fixed N and of N at fixed δ.

Let us now comment on the results obtained: from Fig. 4.4 we can see how, as intuitively predicted, the use of parameters close to the minimum results in faster annealing. This effect is more important as N increases. The 50-percentile shows that, at N = 484, the maximum system size analyzed here, the optimal choice is δ = −1. Notice that we do not plot the most negative value of δ: the reason is that in that case many instances are never solved by the algorithm. For the largest system sizes, then, values of the parameter slightly lower than the minimum are preferred, at least for this problem. This is investigated in more detail in Fig. 4.5, where the 50-percentile of the TTS is fitted with a function of the form Ae^{BN}.

Figure 4.5: Time-to-solutions for the matching problem, shown at fixed δ as a function of the system size N.
The solid lines connect points computed using the 50-percentile of instances, the shaded areas correspond to the 35-percentile (below) and 65-percentile (above), and the dashed lines are the best fits of the form Ae^{BN}.

The obtained values of B are in Table 4.1 and show that a moderate exponential speedup in TTS can be obtained using δ = −1; moreover, δ = 0 gives a small exponential speedup against δ = 1, which in turn gives a small exponential speedup against δ = 2, and so on. Another very important consideration is that in Fig. 4.4 we have not plotted points for which at least one percentile line exceeds the maximum allowed time. This is the reason why the curves become shorter as N increases. This means that outside an "acceptable interval" of values around λ* the performance of the PT algorithm rapidly degrades, and most of the instances are never solved. Moreover, this interval becomes smaller and smaller as the system size increases. This means that to use the PT algorithm we need to be more and more precise in finding λ*, and that this parameter has to be found with a pre-processing applied to each instance since, as discussed previously, even at large system size it depends on the specific instance.

Table 4.1: Fitting parameter B of the fit Ae^{BN}, for each value of δ. The fitted data and the fitting curves are those used in Fig. 4.5.

Quantum heuristic algorithm. The quantum computations are performed using the D-Wave 2000Q quantum annealer. In particular, we have embedded the problem in the Chimera graph, so that the logical Ising variables are mapped to ferromagnetic chains of length 4 after the minor embedding, except for the Ising variables on the boundaries of the square lattice, which correspond to shorter chains. Then we ran the QAA for systems of sizes N = 16, …, 64.
Larger systems up to N = 256 (that is, a lattice with 16 points of side length) are in principle possible on the D-Wave 2000Q chip, but they resulted in too few solved instances. Notice that an instance at N = 64 is a matching of 64 logical points, which corresponds, after the QUBO and the minor embedding, to a problem of ∼ 500 qubits. Indeed, the starting graph that we have chosen for the matching problem is such that each vertex can be mapped into an 8-qubit unit cell of the Chimera graph, and to do so each binary variable is mapped into a 4-qubit ferromagnetic chain. Notice that the chain lengths are independent of N. We used a majority-voting technique to correct chains broken at the end of the annealing (when there is no majority, the chain is randomly corrected). We set the annealing parameters (annealing time and ferromagnetic coupling for the embedded chains) such that the average number of successful annealings is maximized (we noticed that these settings do not depend in a relevant way on the value of δ that we use to build the instances).

To obtain the TTS, we generated 100 random instances as discussed in the previous section, and each instance was submitted for 10 runs of the D-Wave 2000Q. The relevant parameter is then the probability of finding the GS for a fixed instance, which is averaged over the instances and plotted in Fig. 4.6. From these probabilities one can easily obtain the TTS, which we show in Fig. 4.7 to ease the comparison with the classical case. Notice that in this case the curves correspond to percentiles, while in Fig. 4.6 we have plotted averages with standard deviations as errors.

Unfortunately, due to the small system sizes that we can analyze in this case, we cannot draw firm conclusions. However, it seems reasonable to expect that the same problems observed in the classical case are repeated here: in particular, it is still true that a choice of δ = −1 is penalized at small N, but this choice improves (i.e., it is less bad) as N increases.
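The majority-vote correction of broken chains can be sketched as follows; `decode_chain` is an illustrative helper (the random tie-break mirrors the prescription in the text):

```python
import random

def decode_chain(chain_spins, rng=random):
    """Majority-vote decoding of a ferromagnetic chain of physical Ising
    spins (each +1 or -1) into a single logical spin; when there is no
    majority, the chain is corrected randomly, as described in the text."""
    s = sum(chain_spins)
    if s > 0:
        return 1
    if s < 0:
        return -1
    return rng.choice([-1, 1])
```

An intact chain (all spins equal) is of course decoded to its common value; the vote only matters when the chain is broken at the end of the annealing.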
On the other hand, up to the sizes analyzed here, a choice of δ > 0 is less penalizing in terms of performance.

An interesting question is why the quantum annealer is not able to solve problems of size N = 100 or larger. We think that precision problems play a role, but another reason could also be the structure of the embedded (QUBO) energy landscape itself: in particular, we think that the fact that logical states are always separated by non-acceptable states might be a severe obstacle for quantum annealers. Notice that this is true also for every problem which is embedded in the hardware graph in such a way that each QUBO binary variable becomes a chain of qubits. However, in that case one can use majority voting or other methods to correct configurations with this constraint broken. In our case (as in many other interesting problems) a simple correction such as majority voting does not exist, so if this is the reason for the failure of the quantum annealer on this problem, other ways to enforce constraints have to be designed to solve this kind of problem. As a final section, we briefly mention a relatively new algorithm to tackle COPs with gate-based quantum computers.

Figure 4.6: Average probability of finding the solution for the matching problem, shown at fixed N as a function of the distance from the optimal parameter. Each point is obtained by averaging over 100 different instances, and the probability is computed by running the annealing 10 times. These results are obtained using the D-Wave 2000Q quantum annealer hosted at NASA Ames Research Center.

Figure 4.7: Time-to-solutions for the matching problem, shown at fixed N as a function of the distance from the optimal parameter. These results are obtained starting from the same data set used for Fig. 4.6.
The solid lines connect points computed using the 50-percentile of instances,the dashed lines corresponds to the 35-percentile (below solid lines) and 65 percentile (abovesolid lines). These results are obtained using the D-Wave 2000Q quantum annealer hosted atNASA Ames Research Center. .3. QUANTUM APPROXIMATE OPTIMIZATION ALGORITHM D ( β ) = e iβH (4.73)and U ( γ ) = e iγH (4.74)for generic values of β and γ , where H and H are those used in Eq. (4.40). Therefore we canapply the Suzuki-Trotter formula, e X + Y = lim m →∞ (cid:16) e X/m e X/m (cid:17) m , (4.75)so that we write our evolution as Texp (cid:20) − i (cid:126) (cid:90) dt ( A ( t ) H + B ( t ) H ) (cid:21) = lim m →∞ (cid:18) Texp (cid:20) − i (cid:126) H (cid:90) dt A ( t ) m (cid:21) Texp (cid:20) − i (cid:126) H (cid:90) dt B ( t ) m (cid:21)(cid:19) m . (4.76)Therefore, to implement our QAA, we simply need to use alternatively the gates D ( γ ) and U ( β ),with very small parameters β and γ , many times. After applying p times each gate, we obtainthe state | β , γ (cid:105) = D ( β p ) U ( γ p ) · · · D ( β ) U ( γ ) | ψ (cid:105) (4.77)where | ψ (cid:105) is our initial state, which, according to the QAA, has to be the ground state of H .Now, the intuition behind QAOA is the following [FGG14a]: give up the idea of sending p → ∞ and choose freely the sets { β , . . . , β p } and { γ , . . . , γ p } . Notice that at this point we are farfrom the adiabatic situation (where p → ∞ and β i , γ i are small). Therefore we can choose theparameters such that E ( β , γ ) = |(cid:104) β , γ | H | β , γ (cid:105)| (4.78)is maximized. Once the parameters are chosen, we can prepare the state by applying our sequenceof gates and then measure the state in the computational basis: we are not guaranteed that thefinal state will be the ground state of our system (unless we have taken p = ∞ ), but because ofour choice of the parameters we will end up in a low energy state with high probability. 
This algorithm, in a slightly generalized form that we will introduce in the remaining part of this section, is called the quantum approximate optimization algorithm (QAOA). Since all the difficulty is in choosing the 2p parameters β and γ, and we accept to be even very far from the adiabatic limit, usually the initial state is chosen to be the uniform superposition of all the states in the computational basis, independently of the Hamiltonian H_1. In particular, a typical choice is

|ψ_0⟩ = ⊗_{i=1}^N |+⟩   (4.79)

as initial state, and

H_0 = Σ_i X_i,   (4.80)

where X_i is the Pauli matrix X acting on the i-th spin. As H_1, the Hamiltonian of the COP one wants to solve is used. Notice again that this algorithm is very general, but for fixed p no guarantee that the system ends up in the solution of the problem can be given. (Notice also that the Suzuki–Trotter formula can be immediately generalized to time-ordered integrals, as needed in Eq. (4.76).) For this reason, this algorithm is mainly (but not only, see [FH16, WHT16, JRW17]) used for approximation purposes.

The operator D defined in Eq. (4.73) is often called the mixing operator, while U given in Eq. (4.74) is called the phase operator. There is a lively line of research about the performance of QAOA, encouraged by several findings:

• in [FGG14b], QAOA with p = 1 was proved to be the best algorithm known to find approximate solutions for a particular COP, the so-called E3Lin2.
Notice that shortly after that work, a new classical algorithm, which currently holds the record for this specific problem, was found [BMO+];

• under complexity-theory assumptions which are generally believed to be true (in the same sense in which P is believed to be different from NP), it can be proved [FH16] that the output of a QAOA circuit, even at p = 1, cannot be efficiently sampled by a classical computer;

• the Grover problem is simple enough to allow for an analytical treatment up to 1 ≪ p ≪ N, and it has been found [JRW17] that a periodic choice of the parameters β and γ gives a quasi-optimal algorithm (that is, for large N the number of queries to the oracle scales as for the Grover algorithm, but with a larger prefactor).

The interest in low-p QAOA algorithms is also motivated by the recent availability of general-purpose gate-based quantum computers. Indeed, shallow circuits such as those needed to implement this kind of algorithm should be supported by devices available in the near future.

Chapter 5

Conclusions

In this thesis we reviewed many recent results in the realm of combinatorial optimization problems, mainly from the point of view of statistical mechanics. This perspective allowed us to compute the average value of the solution cost in random versions of many COPs, even in the presence of Euclidean correlations.
Several original results stemmed from the research we performed to complete this work:

• we computed the average optimal length of the solution of the TSP in one dimension on complete bipartite [CDGGM18] and complete [CGMV19] graphs;

• we analyzed a problem with an exponential number of possible solutions even in one dimension, the 2-factor problem, and we have been able to provide also in this case bounds on its average solution cost [CGM18];

• we discovered a novel application of the famous Selberg integrals to RCOPs, which allowed us to give (in some cases) exact predictions at finite N [CGMM19];

• we extended our techniques and, thanks to a smart scaling argument, we have been able to compute the average optimal tour length for the NP-hard bipartite TSP and for the bipartite 2-factor problem in two dimensions [CCDGM18].

Other relevant results originated during this work have been obtained by leaving aside the study of averaged quantities and focusing on the statistics of large (and very large) fluctuations, in the context of the p-spin spherical model, where we have proved that diverse and interesting regimes of large deviations are tuned by turning on and off an external magnetic field [PDGR19]. Finally, we have discussed the possibility of using quantum computing to solve combinatorial optimization problems. The analysis of the various possible strategies to do that, and more specifically of the quantum annealing algorithm, has led to interesting findings about how to set the parameters of heuristic algorithms, which could be useful also for classical algorithms such as simulated annealing and parallel tempering [DGRM].

As usual in science, solving old problems has, as a side effect, the challenging consequence of opening new questions.
Here we comment upon some possible paths that could start from this work and would (possibly) lead to further understanding of the statistical physics of combinatorial optimization problems with correlations:

• Euclidean correlations have been successfully incorporated in the realm of random matrices, with the introduction of Euclidean Random Matrices [MPZ99, AOI10, GS13]; several COPs can be rewritten as the computation of some property (for example, the determinant or the permanent) of a matrix, and therefore the random version of the problem is connected to random matrix theory. Then we could devise a way to exploit the powerful and well-developed formalism of random matrix theory to tackle the computation of average properties of the solutions of combinatorial optimization problems;

• we have discussed here mainly Euclidean correlations. However, these are only one of the many kinds of correlations which occur in COPs. For example, the structure of real-world data used as input for many tasks, such as classification and feature extraction, is far from trivial, and some efforts have been spent to explore the nature and the effect of these correlations [GMKZ19, EGR19, RLG19]. In the same spirit, we could try to use the techniques developed in this work to deal with kinds of correlations other than the Euclidean ones, and to understand more deeply how correlations affect the "difficulty" of combinatorial optimization problems.

Our work on large deviations introduces new questions as well. One in particular, which deserves further investigation, regards the application of the magnetic field to obtain (analytically or numerically) the power of N (some measure of the system size) in the exponential suppression of fluctuations.
This is relevant for every problem where we find non-standard large-deviation scalings with N, and would head towards a quite "general" method to compute this scaling.

Finally, much work has been done, and much more remains to be done, to really understand the potential of quantum computers to solve combinatorial optimization problems. While the investigations done here on the parameter setting for the quantum annealing algorithm can be considered a (small) step in that direction, it would be extremely interesting to be able to address the question of the effects of choosing different parameters in QAOA-type algorithms to solve, or approximate, combinatorial optimization problems.

Appendix A

Volume and surface of a sphere

Consider an N-dimensional sphere of radius R. Its volume is given by

V_N = ∫_{−∞}^{∞} dx_1 ··· ∫_{−∞}^{∞} dx_N θ(R² − Σ_i x_i²) = Ω_N ∫_0^R dr r^{N−1} = Ω_N R^N / N,   (A.1)

where Ω_N is the surface area of the unit-radius sphere in N dimensions (that is, the integral over all the N − 1 angular variables). Ω_N can be computed with the following trick: consider the integral

∫_{−∞}^{∞} dx_1 ··· ∫_{−∞}^{∞} dx_N e^{−Σ_i x_i²} = (∫_{−∞}^{∞} dx e^{−x²})^N = π^{N/2}.   (A.2)

But we can also use spherical coordinates and write

∫_{−∞}^{∞} dx_1 ··· ∫_{−∞}^{∞} dx_N e^{−Σ_i x_i²} = Ω_N ∫_0^∞ dr e^{−r²} r^{N−1} = (1/2) Ω_N Γ(N/2).   (A.3)

Therefore

Ω_N = 2π^{N/2} / Γ(N/2)   (A.4)

and

V_N = (2π^{N/2} R^N) / (N Γ(N/2)).   (A.5)

Finally, as one can check from Eq. (A.1) (remembering that, in the distributional sense, ∂/∂x θ(x − x_0) = δ(x − x_0)), we obtain the surface of an N-dimensional sphere of radius R as the derivative of its volume:

S_N = ∂V_N/∂R = (2π^{N/2} R^{N−1}) / Γ(N/2).   (A.6)
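Equations (A.5) and (A.6) are easy to check numerically against the familiar low-dimensional cases; a minimal sketch using only the standard library:

```python
import math

def sphere_volume(N, R=1.0):
    """V_N = 2 pi^{N/2} R^N / (N Gamma(N/2)), Eq. (A.5)."""
    return 2.0 * math.pi ** (N / 2) * R ** N / (N * math.gamma(N / 2))

def sphere_surface(N, R=1.0):
    """S_N = 2 pi^{N/2} R^{N-1} / Gamma(N/2), Eq. (A.6)."""
    return 2.0 * math.pi ** (N / 2) * R ** (N - 1) / math.gamma(N / 2)
```

One recovers V_2 = πR², V_3 = (4/3)πR³ and, for every N, the relation S_N = N V_N / R implied by Eq. (A.1).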
Appendix B

Calculations for the p-spin spherical model

B.1 The replicated partition function

The averaged replicated partition function of the p-spin spherical model is

Z^n = ∫ DJ ∫ Dσ exp(β Σ_{a=1}^n Σ_{i_1 < ··· < i_p} J_{i_1···i_p} σ_{i_1}^a ··· σ_{i_p}^a).

We now integrate over the disorder and, exploiting again Eq. (2.23), we get

Z^n = ∫ Dσ ∏_{i_1 < ··· < i_p} exp[(β²/2) Var(J_{i_1···i_p}) (Σ_{a=1}^n σ_{i_1}^a ··· σ_{i_p}^a)²] = ∫ Dσ exp[(N β² J² / 4) Σ_{a,b=1}^n Q_ab^p],

where Q_ab = (1/N) Σ_i σ_i^a σ_i^b is the overlap matrix.
B.2 1RSB free energy

Given the 1RSB ansatz for Q, Eq. (2.55), to obtain the free energy in terms of the variational parameters q_0, q_1, m we need to compute (1/n) Σ_{a,b} Q_ab^p and (1/n) log det Q, and take the limit of small n.

About the first part, we have that Q has n entries equal to 1 on the diagonal, m(m − 1) entries equal to q_1 in each block for a total of n(m − 1) entries equal to q_1, and the remaining n² − nm entries equal to q_0; therefore

(1/n) Σ_{a,b} Q_ab^p = 1 + (m − 1) q_1^p + (n − m) q_0^p ∼ 1 + (m − 1) q_1^p − m q_0^p.   (B.12)

The second piece is slightly more tricky: one has first to notice that [E, C] = 0, so a single orthogonal matrix diagonalizing both E and C exists. Now, notice that (1/m)E and (1/n)C are both projectors: (1/m)E projects on the subspace of R^n generated by the vectors whose first, second, and so on, groups of m components are equal, and (1/n)C on the subspace generated by the constant vector. This observation makes clear that the eigenvalues of C are n with degeneracy 1 and 0 with degeneracy n − 1, while E has the eigenvalue m with degeneracy n/m and 0 with degeneracy n − n/m. Combining the eigenvectors of E and C with eigenvalues m and n respectively, we have that the matrix Q has, as eigenvalues:

• 1 − q_1 + (q_1 − q_0)m + q_0 n, with degeneracy 1, corresponding to the non-null eigenvalue of C;
• 1 − q_1 + (q_1 − q_0)m, with degeneracy n/m − 1, corresponding to the other non-null eigenvalues of E;
• 1 − q_1, with degeneracy n − n/m, corresponding to the null eigenvalues of both C and E.

Therefore we have

(1/n) log det Q = [(m − 1)/m] log(1 − q_1) + (1/m) log(m(q_1 − q_0) + 1 − q_1) + (1/n)[− log(m(q_1 − q_0) + 1 − q_1) + log(1 − q_1 + (q_1 − q_0)m + q_0 n)]
= [(m − 1)/m] log(1 − q_1) + (1/m) log(m(q_1 − q_0) + 1 − q_1) + (1/n) log(1 + n q_0 / (m(q_1 − q_0) + 1 − q_1))
∼ [(m − 1)/m] log(1 − q_1) + (1/m) log(m(q_1 − q_0) + 1 − q_1) + q_0 / (m(q_1 − q_0) + 1 − q_1).   (B.13)

Using Eqs. (B.12) and (B.13) in Eq. (2.34), we obtain Eq. (2.56).
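The spectrum derived above can be checked numerically for integer n and m; the values n = 6, m = 2, q_0 = 0.2, q_1 = 0.5 below are arbitrary illustrative choices:

```python
import numpy as np

def build_Q(n, m, q0, q1):
    """1RSB overlap matrix: 1 on the diagonal, q1 inside the n/m diagonal
    blocks of size m, q0 everywhere else."""
    Q = np.full((n, n), q0)
    for b in range(n // m):
        Q[b * m:(b + 1) * m, b * m:(b + 1) * m] = q1
    np.fill_diagonal(Q, 1.0)
    return Q

n, m, q0, q1 = 6, 2, 0.2, 0.5
ev = np.linalg.eigvalsh(build_Q(n, m, q0, q1))
# predicted spectrum, as in the derivation of Eq. (B.13)
predicted = sorted([1 - q1] * (n - n // m)
                   + [1 - q1 + (q1 - q0) * m] * (n // m - 1)
                   + [1 - q1 + (q1 - q0) * m + q0 * n])
assert np.allclose(np.sort(ev), predicted)
```

The trace check Σλ = n (the diagonal is all ones) also follows immediately.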
B.3 Rammal construction

In this appendix we report the details of the geometrical construction reproducing the solution for the SCGF obtained with a 1RSB ansatz with q_0 = 0. The following observations are traced back to Rammal's work [Ram81] and can be found in [Kon83] (similar considerations in [OK04, NH08, NH09]). We reproduce here the reasoning not only as a historical curiosity: first of all, we see it as an enlightening approach to the problem of the continuation of the replicated partition function to a real number of replicas, particularly suitable for a finite-k analysis. Moreover, we note that this interpretation, whenever it works, gives a flavor of "uniqueness" (though not in a strict mathematical sense) to the resulting solution, being based only on the properties of convexity and extremality that the SCGF ψ(k) must have. In this respect, a generalization of this result would be of great interest in order to better understand the necessity of the Parisi hierarchical RSB procedure, which has been dubbed "magic" even in relatively recent works, like [Dot11]; however, a true geometrical interpretation of the full machinery of RSB, beyond the simple case considered here, is still lacking.

[Footnote: this is another weird thing of the replica trick: we are sending n to zero, but we are also supposing 0 < m < n, and we do not want to send m to zero. Actually, the correct thing to do is to suppose that when n → 0 the relations 0 ≤ m ≤ n are reversed, becoming 0 ≤ n ≤ m ≤ 1.]
Finally, in the context of this work we are able to show a case where the construction gives the correct answer (the p-spin spherical model at zero external magnetic field) and a case where it fails (when the field is switched on).

The explicit evaluation of the SCGF ψ(k) is performed within replica theory: an ansatz is imposed on the form of the replica overlap matrix, the number of replicas k is then continued from integer to real values, the corresponding G(k) is evaluated with the saddle-point method for large N, and finally a check is performed a posteriori to verify its validity. In the SK model, the system originally considered by Rammal, at low temperatures the replica-symmetric ansatz, which still gives the correct values of the positive integer moments of the partition function, fails to produce a sensible solution for the SCGF at k < 1 in at least three ways:

• it becomes unstable under variations around the saddle point (de Almeida–Thouless instability [dAT78]) below k = k_dAT;
• it produces a G(k) that is non-concave (and so a non-convex ψ(k)) around k = k_conv, meaning that G''(k) changes sign at k_conv;
• it produces a G(k)/k that loses monotonicity at k = k_m.

In the SK model k_dAT is the largest (k_dAT > k_m > k_conv), and so it is the first problem one encounters in extrapolating the RS solution from integer values of k. However, from the point of view of convexity and monotonicity alone, Rammal proposed to build a marginally monotone G(k)/k in a minimal way, starting from the RS solution and simply keeping it constant below k_m at the value G(k_m)/k_m. While the resulting function is not the correct one for the SK model, which needs a full RSB analysis to be solved, surprisingly enough for the spherical p-spin in zero magnetic field this approach reproduces the solution obtained with a 1RSB ansatz with q_0 = 0 (see Fig. 2.4).
Notice that in the present model the RS solution suffers from the same inconsistencies as in the SK model, but now k_m is the largest of the three problematic points.

To convince the reader that the two approaches are actually equivalent we prove, as the final part of this appendix, that without an external magnetic field the 1RSB solution of the spherical p-spin and the Rammal construction coincide. In order to obtain this result, we have to prove that:

• the 1RSB solution for G(k)/k becomes a constant below k = k_c, which is defined as the point where the RS and 1RSB ansätze branch out, as we did in the main text;

• this constant is the same as the one appearing in the Rammal construction, that is G(k_m)/k_m;

• the points k_c and k_m are the same.

As k_c is the point where the RS solution stops being optimal, for k < k_c we have q̄_0 = 0, as discussed in [CS92]. Let us now consider Eq. (2.86) with q_0 = 0: differentiating with respect to q_1 and m and setting the results equal to zero, we get the equations for q̄_1 and m̄, which read

\mu \bar q_1^{\,p-2} - \frac{1}{(1-\bar q_1)\left[1-(1-\bar m)\bar q_1\right]} = 0 ,
\frac{\mu \bar m^2 \bar q_1^{\,p}}{2p} - \frac{1}{2}\log\!\left(\frac{1-(1-\bar m)\bar q_1}{1-\bar q_1}\right) + \frac{\bar m \bar q_1}{2\left[1-(1-\bar m)\bar q_1\right]} = 0 , (B.14)

where \mu = p(\beta J)^2/2. These equations can be solved numerically (as we did to obtain the plots in the main text), but to make our point here we do not really need the explicit solution. Indeed, it is enough to notice that m̄ and q̄_1 do not depend on k, and therefore g(k; 0, q̄_1, m̄)/k is a constant. Then, we need to check that it is the same constant as the one obtained by Rammal. Again starting from Eq. (2.86), by putting q_0 = q_1 = q we obtain the RS solution, which is

g(k; q) = -\frac{(\beta J)^2}{4}\left[k + k(k-1)q^p\right] - \frac{k-1}{2}\log(1-q) - \frac{1}{2}\log\left[1-(1-k)q\right] - k\,s(\infty) . (B.15)

In this case, extremizing with respect to q, we have an equation which gives the RS solution at the saddle point, q̄. To find k_m, we then require ∂/∂k [g/k] = 0.
The two resulting equations are

\mu \bar q^{\,p-2} - \frac{1}{(1-\bar q)\left[1-(1-k_m)\bar q\right]} = 0 ,
\frac{\mu k_m^2 \bar q^{\,p}}{2p} - \frac{1}{2}\log\!\left(\frac{1-(1-k_m)\bar q}{1-\bar q}\right) + \frac{k_m \bar q}{2\left[1-(1-k_m)\bar q\right]} = 0 , (B.16)

which are exactly Eqs. (B.14) with k_m in place of m̄ and q̄ in place of q̄_1. Therefore k_m = m̄ and q̄ = q̄_1, and one can check that

\frac{g(k; 0, \bar q, k_m)}{k} = \frac{g(k_m; \bar q)}{k_m} . (B.17)

It only remains to prove that k_c and k_m, which in general can be different points, are actually the same. As the 1RSB ansatz gives the correct solution for the present model, the corresponding SCGF must be convex and thus, in particular, continuous. The only way to obtain a continuous function which is equal to the RS one above k_c and to Rammal's constant below is to take k_c = k_m, and so the two functions coincide everywhere.

B.4 1RSB with magnetic field

To obtain (2.90), the starting point is the p-spin Hamiltonian with magnetic field

H = -\sum_{i_1 < \dots < i_p} J_{i_1 \dots i_p}\, \sigma_{i_1} \cdots \sigma_{i_p} - h \sum_i \sigma_i .
The matrix Q contains k diagonal elements equal to 1, k(m−1) elements equal to q_1 and k(k−m) elements equal to q_0, so that

Q_s = \sum_{ab} Q_{ab} = k + k(m-1)q_1 + k(k-m)q_0 . (B.28)

Every row (column) contains the same elements, so

Q_r = \sum_b Q_{ab} = 1 + (m-1)q_1 + (k-m)q_0 \quad \forall a . (B.29)

In this way, we arrive at the solution of the saddle-point equations for λ:

(\lambda^{-1})_{ab} = Q_{ab} - \hat q , (B.30)

with

\hat q = (\beta h)^2\, Q_r \left[1 + (\beta h)^2\, l(Q_s)\right]^{-1} . (B.31)

The structure of the matrix λ^{-1} is therefore the same as that of Q, with a constant added to each entry. Thus, the entries of λ^{-1} can be written as

\lambda^{-1} = (1-q_1)\,\mathbb{I} + (q_1-q_0)\,E + (q_0-\hat q)\,C . (B.32)

Thanks to this equation, we can compute one of the terms containing λ in Eq. (B.21):

\sum_{ab} (\lambda^{-1})_{ab} = k(1-q_1) + km(q_1-q_0) + k^2(q_0-\hat q) = k(\eta_3 - k\hat q) . (B.33)

Exploiting the fact that E/m and C/k are projectors, and using E · C = C · E = mC, one can check that the inverse of a matrix with the 1RSB structure is again a matrix with the same structure. In particular, we obtain

\lambda = \frac{1}{\eta_1}\,\mathbb{I} + \frac{q_0-q_1}{\eta_1\eta_2}\,E + \frac{\hat q - q_0}{\eta_2(\eta_3 - k\hat q)}\,C , (B.34)

where η_1 = 1 − q_1, η_2 = 1 − (1−m)q_1 − mq_0 and η_3 = 1 − (1−m)q_1 − (m−k)q_0 are the three different eigenvalues of Q. Then, as we have done for Q, we can compute the eigenvalues of λ:

\kappa_1 = 1/\eta_1 , \quad \text{degeneracy } k(m-1)/m ,
\kappa_2 = 1/\eta_2 , \quad \text{degeneracy } k/m - 1 ,
\kappa_3 = 1/(\eta_3 - k\hat q) , \quad \text{degeneracy } 1 . (B.35)

Using the eigenvalues we can easily compute, as we did in the h = 0 case, the term log det(λ). The last step is to evaluate the trace appearing in Eq. (B.21), using again the properties of the matrices λ and Q:

\mathrm{Tr}(\lambda \cdot Q) = k \left( 1 + \frac{\hat q}{\eta_3 - k \hat q} \right) . (B.36)

Now we have all the ingredients to write Eq. (2.90), which has the expected limit for k → 0.

Appendix C: Supplemental material to Chapter 3

C.1 Order statistics

In this section we collect some techniques useful to perform averages over N random points uniformly distributed on a segment of length L. As a first step, we notice that, if f(x_1, …
, x_N) is a symmetric function of its arguments, then

\int_0^L dx_1 \cdots \int_0^L dx_N\, f(x_1, \dots, x_N) = N! \int_0^L dx_1 \int_{x_1}^L dx_2 \cdots \int_{x_{N-1}}^L dx_N\, f(x_1, \dots, x_N) , (C.1)

because of the symmetry of f. Therefore, we can use this result as follows:

\int_0^{x_2} dx_1 \cdots \int_0^{x_k} dx_{k-1} = \frac{1}{(k-1)!} \int_0^{x_k} dx_1 \cdots \int_0^{x_k} dx_{k-1} = \frac{x_k^{k-1}}{(k-1)!} , (C.2)

\int_{x_\ell}^{x_{\ell+2}} dx_{\ell+1} \cdots \int_{x_{N-1}}^L dx_N = \frac{1}{(N-\ell)!} \int_{x_\ell}^L dx_{\ell+1} \cdots \int_{x_\ell}^L dx_N = \frac{(L-x_\ell)^{N-\ell}}{(N-\ell)!} , (C.3)

or, more generally,

\int_a^{x_{m+1}} dx_m \cdots \int_{x_{m+t-1}}^b dx_{m+t} = \frac{(b-a)^{t+1}}{(t+1)!} . (C.4)

From this we can compute several useful quantities. For example, given N points randomly chosen in the interval [0, 1] and labeled so that they are ordered, x_1 ≤ ⋯ ≤ x_N, the probability that the k-th point is in the interval [x, x+dx] is given by the probability that k − 1 points are smaller than x and N − k points are larger than x, so that

P_k(x)\,dx = N! \left[ \int_0^{x_2} dx_1 \cdots \int_0^{x} dx_{k-1} \right] dx \left[ \int_x^{x_{k+2}} dx_{k+1} \cdots \int_{x_{N-1}}^1 dx_N \right] = \frac{\Gamma(N+1)}{\Gamma(k)\,\Gamma(N-k+1)}\, x^{k-1} (1-x)^{N-k}\, dx . (C.5)

In a similar way, the probability that the k-th point is in [x, x+dx] and the ℓ-th is in [y, y+dy] is (for k < ℓ, and so x < y)

P_{k,\ell}(x, y)\, dx\, dy = \frac{\Gamma(N+1)}{\Gamma(k)\,\Gamma(\ell-k)\,\Gamma(N-\ell+1)}\, x^{k-1} (y-x)^{\ell-k-1} (1-y)^{N-\ell}\, dx\, dy . (C.6)

C.2 Proofs for the traveling salesman problems

C.2.1 Optimal cycle on the complete bipartite graph

Consider the tour h̃ given by Eqs. (3.62) for N even and Eqs.
(3.63) for N odd. We will now prove that this cycle is optimal. To do so, we will exhibit two moves that lower the cost of a tour, and show that the only Hamiltonian cycle that cannot be modified by these moves is h̃.

We shall make use of the following moves in the ensemble of Hamiltonian cycles. Given i, j ∈ [N] with j > i, we can partition each cycle as

h[(σ, π)] = ( C_1\, r_{σ(i)}\, b_{π(i)}\, C_2\, b_{π(j)}\, r_{σ(j+1)}\, C_3 ) , (C.7)

where the C_i are open paths in the cycle, and we can define the operator R_{ij} that exchanges the two blue points b_{π(i)} and b_{π(j)} and reverses the path between them:

h[R_{ij}(σ, π)] := ( C_1\, r_{σ(i)}\, [\,b_{π(i)}\, C_2\, b_{π(j)}\,]^{-1}\, r_{σ(j+1)}\, C_3 ) = ( C_1\, r_{σ(i)}\, b_{π(j)}\, C_2^{-1}\, b_{π(i)}\, r_{σ(j+1)}\, C_3 ) . (C.8)

Analogously, by writing

h[(σ, π)] = ( C_1\, b_{π(i-1)}\, r_{σ(i)}\, C_2\, r_{σ(j)}\, b_{π(j)}\, C_3 ) , (C.9)

we can define the corresponding operator S_{ij} that exchanges the two red points r_{σ(i)} and r_{σ(j)} and reverses the path between them:

h[S_{ij}(σ, π)] := ( C_1\, b_{π(i-1)}\, [\,r_{σ(i)}\, C_2\, r_{σ(j)}\,]^{-1}\, b_{π(j)}\, C_3 ) = ( C_1\, b_{π(i-1)}\, r_{σ(j)}\, C_2^{-1}\, r_{σ(i)}\, b_{π(j)}\, C_3 ) . (C.10)

Two couples of points (r_{σ(k)}, r_{σ(l)}) and (b_{π(j)}, b_{π(i)}) have the same orientation if (r_{σ(k)} − r_{σ(l)})(b_{π(j)} − b_{π(i)}) > 0. Remark that, as we have ordered both sets of points, this also means that (σ(k), σ(l)) and (π(j), π(i)) have the same orientation. Then:

Lemma 1. Let E[(σ, π)] be the cost defined in Eq. (3.54). Then E[R_{ij}(σ, π)] − E[(σ, π)] > 0 if the couples (r_{σ(j+1)}, r_{σ(i)}) and (b_{π(j)}, b_{π(i)}) have the same orientation, and E[S_{ij}(σ, π)] − E[(σ, π)] > 0 if the couples (r_{σ(j)}, r_{σ(i)}) and (b_{π(j)}, b_{π(i−1)}) have the same orientation.

Proof.

E[R_{ij}(σ, π)] − E[(σ, π)] = w_{(r_{σ(i)},\, b_{π(j)})} + w_{(b_{π(i)},\, r_{σ(j+1)})} − w_{(r_{σ(i)},\, b_{π(i)})} − w_{(b_{π(j)},\, r_{σ(j+1)})} , (C.11)
which is positive when (r_{σ(j+1)}, r_{σ(i)}) and (b_{π(j)}, b_{π(i)}) have the same orientation (as shown in [McC99, CLPS14] for a weight which is an increasing convex function of the Euclidean distance). The remaining part of the proof is analogous.

Lemma 2. The only couples of permutations (σ, π) with σ(1) = 1 such that, for each i, j ∈ [N], (σ(j+1), σ(i)) has the same orientation as (π(j), π(i)), and (π(j), π(i−1)) has the same orientation as (σ(j), σ(i)), are (σ̃, π̃) and its dual (σ̃, π̃)*.

Proof. We have to start our Hamiltonian cycle from r_{σ(1)} = r_1. Next we look at π(N): if we assume that π(N) > 1, there is a j such that our cycle has the form (r_1 C_1 r_{σ(j)} b_1 C_2 b_{π(N)}); if we assume j > 1, then (1, σ(j)) and (π(N), 1) have opposite orientation, so that necessarily π(N) = 1. In the case j = 1 our Hamiltonian cycle is of the form (r_1 b_1 C), that is (b_1 C r_1), and this is exactly of the other form if we exchange red and blue points. We assume that it is of the form (r_1 C b_1); the other form would give, at the end of the proof, (σ̃, π̃)*. Now we proceed by induction. Assume that our Hamiltonian cycle is of the form (r_1 b_1 r_2 ⋯ x_k C y_k ⋯ b_4 r_3 b_2) with k < N, where x_k and y_k are, respectively, a red point and a blue point when k is odd and vice versa when k is even. Then y_{k+1} and x_{k+1} must be in the walk C. If y_{k+1} is not the point to the right of x_k, the cycle has the form (r_1 b_1 r_2 ⋯ x_k y_s C_1 y_{k+1} x_l ⋯ y_k ⋯ b_4 r_3 b_2), but then (x_l, x_k) and (y_{k+1}, y_s) have opposite orientation, which is impossible; so s = k + 1, that is, y_{k+1} is the point to the right of x_k. Where is x_{k+1}?
If it is not the point to the left of y_k, the cycle has the form (r_1 b_1 r_2 ⋯ x_k y_{k+1} ⋯ y_l x_{k+1} C_1 x_s ⋯ y_k ⋯ b_4 r_3 b_2), but then (x_s, x_{k+1}) and (y_k, y_l) have opposite orientation, which is impossible; so s = k + 1, that is, x_{k+1} is the point to the left of y_k. We have thus shown that the cycle has the form (r_1 b_1 r_2 ⋯ y_{k+1} C x_{k+1} ⋯ b_4 r_3 b_2), and we can proceed until C is empty.

Now that we have understood what the optimal Hamiltonian cycle is, we can look in more detail at the two matchings entering the decomposition used in Eq. (3.55). As π̃ = σ̃ ∘ I we have that

I = σ̃^{-1} ∘ π̃ = π̃^{-1} ∘ σ̃ . (C.12)

As a consequence, both permutations associated with the matchings appearing in Eq. (3.55) for the optimal Hamiltonian cycle are involutions:

μ̃_1 ≡ π̃ ∘ σ̃^{-1} = σ̃ ∘ I ∘ σ̃^{-1} = σ̃ ∘ π̃^{-1} = [π̃ ∘ σ̃^{-1}]^{-1} , (C.13a)
μ̃_2 ≡ π̃ ∘ τ^{-1} ∘ σ̃^{-1} = σ̃ ∘ I ∘ τ^{-1} ∘ I ∘ π̃^{-1} = [π̃ ∘ τ^{-1} ∘ σ̃^{-1}]^{-1} , (C.13b)

where we used Eq. (3.57). This implies that these two permutations have at most cycles of period two, a fact which reflects a symmetry under the exchange of red and blue points.

When N is odd it happens that

I ∘ σ̃ ∘ I = σ̃ ∘ τ^{-(N-1)/2} , (C.14)

so that

I ∘ π̃ ∘ I = (I ∘ σ̃ ∘ I) ∘ I = σ̃ ∘ τ^{-(N-1)/2} ∘ I = π̃ ∘ I ∘ τ^{-(N-1)/2} ∘ I = π̃ ∘ τ^{(N-1)/2} . (C.15)

Figure C.1: Decomposition of the optimal Hamiltonian cycle h̃ for N = 5 into the two disjoint matchings μ̃_1 and μ̃_2.

Figure C.2: Decomposition of the optimal Hamiltonian cycle h̃ for N = 4 into the two disjoint matchings μ̃_1 and μ̃_2.

It follows that the two permutations in Eq. (C.13a) and Eq. (C.13b) are conjugate by I:

I ∘ π̃ ∘ τ^{-1} ∘ σ̃^{-1} ∘ I = π̃ ∘ τ^{(N-1)/2} ∘ τ ∘ τ^{(N-1)/2} ∘ σ̃^{-1} = π̃ ∘ σ̃^{-1} , (C.16)

so that, in this case, they have exactly the same number of cycles of order 2. Indeed we have

μ̃_1 = (2, 1, 4, 3, 6, 5, …, N−1, N−2, N) , (C.17a)
μ̃_2 = (1, 3, 2, 5, 4, …,
N, N−1) , (C.17b)

and they have (N−1)/2 cycles of order 2 and one fixed point. See Fig. C.1 for the case N = 5. In the case of even N the two permutations do not have the same number of cycles of order 2: indeed, one has no fixed points and the other has two. More explicitly,

μ̃_1 = (2, 1, 4, 3, 6, 5, …, N, N−1) , (C.18a)
μ̃_2 = (1, 3, 2, 5, 4, …, N−1, N−2, N) . (C.18b)

See Fig. C.2 for the case N = 4.

C.2.2 Optimal cycle on the complete graph: proofs

Proof of the optimal cycle for p > 1. Let us consider a generic σ ∈ S_N with σ(1) = 1. Taking σ(1) = 1 corresponds to the irrelevant choice of the starting point of the cycle. Let us now introduce a new set of ordered points B := {b_j}_{j=1,…,N} ⊂ [0, 1] such that

b_i = \begin{cases} r_1 & \text{for } i = 1 \\ r_{i-1} & \text{otherwise} \end{cases} (C.19)

Given the sets R and B, define

h[(σ, π_σ)] := ( r_1, b_{π_σ(1)}, r_{σ(2)}, b_{π_σ(2)}, …, r_{σ(N)}, b_{π_σ(N)}, r_{σ(1)} ) , (C.20)

so that

π_σ(i) = \begin{cases} 1 & \text{for } i = 1 \\ σ(i)+1 & \text{for } 1 < i < k \\ σ(i+1)+1 & \text{for } k ≤ i < N \\ 2 & \text{for } i = N \end{cases} (C.21)

where k is such that σ(k) = N. We have therefore

( b_{π_σ(1)}, b_{π_σ(2)}, …, b_{π_σ(k−1)}, b_{π_σ(k)}, …, b_{π_σ(N−1)}, b_{π_σ(N)} ) = ( r_1, r_{σ(2)}, …, r_{σ(k−1)}, r_{σ(k+1)}, …, r_{σ(N)}, r_1 ) . (C.22)

In other words, we are introducing a set of blue points such that we can find a bipartite Hamiltonian tour which only uses links available in our "monopartite" problem and has the same cost as σ. Therefore, by construction (using Eq. (C.22)),

E_N(h[σ]) = E_N(h[(σ, π_σ)]) ≥ E_N(h[(σ̃, π̃)]) = E_N(h[(σ̃, π_{σ̃})]) = E_N(h[σ̃]) , (C.23)

where the fact that π̃ = π_{σ̃} can be checked using Eqs. (3.60), (3.61) and (C.21).

Proof of the optimal cycle for 0 < p < 1.

Lemma 3. Given a Hamiltonian cycle with its edges drawn as arcs in the upper half-plane, let us consider two of the arcs that cannot be drawn without crossing each other.
Then, this crossing can be removed in only one way without splitting the original cycle into two disjoint cycles; moreover, the new configuration has a lower cost than the original one.

Proof. Let us consider a generic oriented Hamiltonian cycle and suppose it contains a matching as in the figure:

[Figure: four ordered points r_1 < r_2 < r_3 < r_4 joined by the two crossing arcs (r_1, r_3) and (r_2, r_4).]

There are two possible orientations for the matching, corresponding to the two oriented Hamiltonian cycles:

1. ( C_1 r_1 r_3 C_2 r_2 r_4 C_3 ) ,
2. ( C_1 r_1 r_3 C_2 r_4 r_2 C_3 ) ,

where C_1, C_2 and C_3 are paths (possibly visiting other points of our set). The other possibilities are the duals of these two, and thus they are equivalent. In both cases, a priori, there are two choices to replace the crossing matching (r_1, r_3), (r_2, r_4) with a non-crossing one: (r_1, r_2), (r_3, r_4) or (r_1, r_4), (r_2, r_3). We now show, for the two possible prototypes of Hamiltonian cycles, which is the right choice for the non-crossing matching, giving a general rule. Let us consider case 1: here, if we replace the crossing matching with (r_1, r_4), (r_2, r_3), the cycle splits; in fact, we would have the two cycles ( C_1 r_1 r_4 C_3 ) and ( r_3 C_2 r_2 ). Instead, if we use the other non-crossing matching, we would have ( C_1 r_1 r_2 [C_2]^{-1} r_3 r_4 C_3 ): this way we have removed the crossing without splitting the cycle. Let us now consider case 2: in this situation, using (r_1, r_4), (r_2, r_3) as the new matching, we would have ( C_1 r_1 r_4 [C_2]^{-1} r_3 r_2 C_3 ); the other matching, on the contrary, gives ( C_1 r_1 r_2 C_3 ) and ( r_3 C_2 r_4 ).

The general rule is the following: given the oriented matching, consider the four oriented lines going into and out of the four nodes.
Then, the right choice for the non-crossing matching is obtained by joining the two couples of lines with opposite orientation. Since the difference between the cost of the original cycle and that of the new one is simply the difference between a crossing matching and a non-crossing one, it is positive when 0 < p < 1, as shown in [BCS14].

Now we deal with the second point: given a Hamiltonian cycle, it is in general not obvious that, replacing non-crossing arcs with a crossing one, the total number of intersections increases. Indeed, there is the chance that one or more crossings are removed in the operation of substituting the matching we are interested in. Notice that two arcs form a matching of 4 points; therefore, from now on, we will use expressions like "crossing matching" ("non-crossing matching") and "two crossing arcs" ("two non-crossing arcs") interchangeably. We now show that the following holds:

Lemma 4. Given a Hamiltonian cycle with a non-crossing matching, if the matching is replaced by a crossing one, the total number of intersections always increases. Vice versa, if a crossing matching is replaced by a non-crossing one, the total number of crossings always decreases.

Proof. This is a topological property, which we prove case by case. To best visualize the crossings, we change the graphical representation of the complete graph underlying the problem: the nodes are now organized along a circle, in such a way that they are ordered clockwise (or, equivalently, anti-clockwise) according to the natural ordering given by their positions on the segment [0, 1].

We can now show that the Hamiltonian cycle h* given in Eq. (3.80) is the optimal one.

Proof. Consider a generic Hamiltonian cycle and draw the connections between the points in the upper half-plane. Suppose the Hamiltonian cycle has, say, n intersections between edges. Thanks to Lemma 3, we can swap two crossing arcs with a non-crossing pair without splitting the Hamiltonian cycle.
As shown in Lemma 4, this operation always lowers the total number of crossings between the edges, and the cost of the new cycle is smaller than the cost of the starting one. Iterating this procedure, it follows that one can find a cycle with no crossings. Now we prove that there are no cycles without crossings other than h* and its dual. This is easily seen, since h* is the only cycle that visits all the points in order, starting from the first. All the other cycles do not visit the points in order and thus have a crossing, due to the fact that a point which is skipped at first must be visited later, creating a crossing.

Proof of the optimal cycle for p < 0, odd N.

To complete the proof given in the main text, we need to discuss two points. Firstly, we identify the correct move that swaps a non-crossing matching with a crossing one; thanks to Lemma 4, by performing such a move one always increases the total number of crossings. Secondly, we prove that there is only one Hamiltonian cycle to which this move cannot be applied (and so it is the optimal solution).

We start with the first point: consider a Hamiltonian cycle with a non-crossing matching; the possible situations are the following two:

[Figure: the two non-crossing matchings on four ordered points, (r_1, r_2), (r_3, r_4) and (r_1, r_4), (r_2, r_3).]

For the first case there are two possible independent orientations:

1. ( r_1 r_2 C_1 r_4 r_3 C_2 ) ,
2. ( r_1 r_2 C_1 r_3 r_4 C_2 ) .

If we try to cross the matchings in the first cycle, we obtain the two disjoint cycles ( r_3 C_2 r_1 ) and ( r_2 C_1 r_4 ), which is no longer a Hamiltonian cycle. On the other hand, in the second cycle the non-crossing matching can be replaced by a crossing one without breaking the cycle: ( r_1 r_3 [C_1]^{-1} r_2 r_4 C_2 ). For the second case the possible orientations are:

1. ( r_1 r_4 C_1 r_2 r_3 C_2 ) ,
2. ( r_1 r_4 C_1 r_3 r_2 C_2 ) .
By means of the same procedure used in the first case, one finds that the non-crossing matching in the second cycle can be replaced by a crossing one without splitting the cycle, while in the first case the cycle is divided by this operation.

The last step is to prove that the Hamiltonian cycle given in Eq. (3.83) has the maximum number of crossings. Let us consider a Hamiltonian cycle h[σ] = ( r_{σ(1)}, …, r_{σ(N)} ) on the complete graph K_N. We want to evaluate the maximum number of crossings an edge can have, depending on the permutation σ. Consider the edge connecting the two vertices r_{σ(i)} and r_{σ(i+1)}: obviously, both the edges (r_{σ(i−1)}, r_{σ(i)}) and (r_{σ(i+1)}, r_{σ(i+2)}) share a common vertex with (r_{σ(i)}, r_{σ(i+1)}), therefore they can never cross it. So, if we have N vertices, each edge can be crossed by at most N − 3 other edges. Let us call N[σ(i)] the number of edges that cross the edge (r_{σ(i)}, r_{σ(i+1)}) and let us define the sets

A_j := \begin{cases} \{r_k\}_{k = σ(i)+1 \,(\mathrm{mod}\, N),\, \dots,\, σ(i+1)−1 \,(\mathrm{mod}\, N)} & \text{for } j = 1 \\ \{r_k\}_{k = σ(i+1)+1 \,(\mathrm{mod}\, N),\, \dots,\, σ(i)−1 \,(\mathrm{mod}\, N)} & \text{for } j = 2 \end{cases} (C.24)

These two sets contain the points lying between r_{σ(i)} and r_{σ(i+1)}. In particular, the maximum number of crossings an edge can have is given by

max(N[σ(i)]) = \begin{cases} 2 \min_j |A_j| & \text{for } |A_1| \neq |A_2| \\ 2|A_1| − 1 & \text{for } |A_1| = |A_2| \end{cases} (C.25)

This is easily seen, since the maximum number of crossings an edge can have is obtained when all the points belonging to the smaller of A_1 and A_2 contribute with two crossings each. This cannot happen when the cardinalities of A_1 and A_2 are the same, because at least one of the edges departing from the nodes of A_1, for example, must be connected to one of the ends of the edge (r_{σ(i)}, r_{σ(i+1)}) in order to have a Hamiltonian cycle. Note that this case, i.e.
|A_1| = |A_2|, can happen only if N is even.

Consider the particular case σ(i) = a and σ(i+1) = a + (N−1)/2 (mod N) or σ(i+1) = a + (N+1)/2 (mod N). Then (C.25) is in these cases exactly equal to N − 3, which means that the edges (r_a, r_{a+(N−1)/2 (mod N)}) and (r_a, r_{a+(N+1)/2 (mod N)}) can have the maximum number of crossings if the right configuration is chosen. Moreover, if there is a cycle such that every edge has N − 3 crossings, it must connect each r_a with r_{a+(N−1)/2 (mod N)} and r_{a+(N+1)/2 (mod N)}, for all a.

C.2.3 Optimal TSP and 2-factor for p < 0 and N even

We start by considering here the 2-factor problem (see Sec. 3.4 for a definition) for p < 0 in the even-N case. We will use the shape of its solution to prove that one among the cycles given in Eq. (3.86) is the solution of the TSP. In the following we will say that, given a permutation σ ∈ S_N, the edge (r_{σ(i)}, r_{σ(i+1)}) has length L ∈ ℕ if

L = L(i) := \min_j |A_j(i)| , (C.26)

where A_j(i) was defined in Eq. (C.24).

N is a multiple of 4. Let us consider the sequence R = {r_i}_{i=1,…,N} of N points, with N a multiple of 4, in the interval [0, 1], with r_1 ≤ ⋯ ≤ r_N, and consider the permutations σ_1, σ_2 whose cyclic decompositions are

σ_1 = ( r_1, r_{N/2+1}, r_2, r_{N/2+2} ) ⋯ ( r_a, r_{a+N/2}, r_{a+1}, r_{a+N/2+1} ) ⋯ ( r_{N/2−1}, r_{N−1}, r_{N/2}, r_N ) , (C.27a)
σ_2 = ( r_1, r_{N/2+1}, r_N, r_{N/2} ) ⋯ ( r_a, r_{a+N/2}, r_{a−1}, r_{a+N/2−1} ) ⋯ ( r_{N/2−1}, r_{N−1}, r_{N/2−2}, r_{N−2} ) , (C.27b)

for odd a = 1, 3, …, N/2 − 1. Having defined h*_1 := h[σ_1] and h*_2 := h[σ_2], the following holds:

Proposition C.2.1. h*_1 and h*_2 are the 2-factors that contain the maximum number of crossings between the arcs.

Proof. An edge can be involved in at most N − 3 crossings, and this maximum is attained by the edges (r_a, r_{a+N/2 (mod N)}), i.e. by the edges of length N/2 − 1; there are at most N/2 edges of this form in a 2-factor. Thus, in order to maximize the number of crossings, the other N/2 edges must be of the form (r_a, r_{a+N/2+1 (mod N)}) or (r_a, r_{a+N/2−1 (mod N)}), i.e. of length N/2 − 2.
It is immediate to verify that both h*_1 and h*_2 have this property; we have to prove they are the only ones with it. Consider, then, having already fixed the N/2 edges (r_a, r_{a+N/2 (mod N)}), a = 1, …, N/2. Suppose we also fix the edge (r_1, r_{N/2}) (the other possibility is to fix the edge (r_1, r_{N/2+2}), which leads to the other 2-factor). Consider now the point r_{N/2+1}: suppose it is not connected to the point r_N but to the point r_2, i.e., its second edge differs from the one it has in the cycle h*_2. We now show that this makes it impossible to construct all the remaining edges of length N/2 − 2. Indeed, suppose we have fixed the edges (r_1, r_{N/2}) and (r_2, r_{N/2+1}), and focus on the vertex r_{N/2+2}: in order to have an edge of length N/2 − 2, this vertex must be connected either with r_1 or with r_3; but r_1 already has two edges, thus, necessarily, there must be the edge (r_{N/2+2}, r_3). By the same reasoning, there must be the edges (r_{N/2+3}, r_4), (r_{N/2+4}, r_5), …, (r_{N−2}, r_{N/2−1}). Proceeding in this way, the only edge left for the last pair of vertices is (r_{N−1}, r_N), which has null length. Therefore the edge (r_2, r_{N/2+1}) cannot be present in the optimal 2-factor and so, necessarily, there is the edge (r_{N/2+1}, r_N); this creates the cycle (r_1, r_{N/2+1}, r_N, r_{N/2}). Proceeding in the same way on the set of the remaining vertices {r_2, r_3, …, r_{N/2−1}, r_{N/2+2}, …, r_{N−1}}, one finds that the only way of obtaining N/2 edges of length N/2 − 1 and N/2 edges of length N/2 − 2 is h*_1 or h*_2. Proposition C.2.1, together with the fact that the optimal 2-factor has the maximum number of crossing matchings, guarantees that the optimal 2-factor is either h*_1 or h*_2.

N is not a multiple of 4. Let us consider the usual sequence R = {r_i}_{i=1,…,N} of N points, with N even but not a multiple of 4, in the interval [0, 1], with r_1 ≤ ⋯ ≤ r_N, and consider the permutation π defined by the following cyclic decomposition:

π = ( r_1, r_{N/2}, r_N, r_{N/2+1}, r_2, r_{N/2+2} ) ( r_3, r_{N/2+3}, r_4, r_{N/2+4} ) ⋯
( r_{N/2−2}, r_{N−2}, r_{N/2−1}, r_{N−1} ) . (C.28)

Figure C.4: (a) One of the optimal 2-factor solutions for N = 10 and p < 0; the others are obtained by cyclic permutations of this configuration. (b) The same optimal 2-factor solution represented on a circle, where the symmetries of the solution are more easily seen.

Having defined

π_k(i) := π(i) + k (mod N) , k ∈ [0, N−1] , (C.29)

and

h*_k := h[π_k] , (C.30)

the following proposition holds:

Proposition C.2.2. The h*_k are the 2-factors that contain the maximum number of crossings between the arcs.

Proof. The observations made in the proof of Proposition C.2.1 hold in this case too. Thus, in order to maximize the number of crossing matchings, one considers, as in the previous case, the N/2 edges of length N/2 − 1, i.e. those of the form (r_a, r_{a+N/2 (mod N)}), and then tries to construct the remaining N/2 edges of length N/2 − 2, as before. Again, if one fixes the edge (r_1, r_{N/2}), the edge (r_2, r_{N/2+1}) cannot be present, by the same reasoning as in the proof of Proposition C.2.1. The fact that, in this case, N is not a multiple of 4 makes it impossible to have a 2-factor formed only by 4-vertex loops, as in the previous case. The first consequence is that, given the N/2 edges of length N/2 − 1, it is not possible to have N/2 edges of length N/2 − 2. In order to find the maximum-crossing solution, one has the following options:

• to take a 2-factor with N/2 edges of length N/2 − 1, N/2 − 1 edges of length N/2 − 2 and one edge of length N/2 − 3: in this case the theoretical maximum number of crossing matchings is (N/2)(N − 3) + (N/2 − 1)(N − 4) + (N − 6) = N² − 7N/2 − 2;

• to take a 2-factor with N/2 − 1 edges of length N/2 − 1 and N/2 + 1 edges of length N/2 − 2: in this case the theoretical maximum number of crossing matchings is (N/2 − 1)(N − 3) + (N/2 + 1)(N − 4) = N² − 7N/2 − 1.

The 2-factors h*_k belong to the second case and saturate this number of crossing matchings. Suppose, then, to be in this case.
Let us fix the N/2 − 1 edges of length N/2 − 1; this operation leaves two vertices without any edge, and these two vertices are of the form r_a, r_{a+N/2 (mod N)}, a ∈ [1, N] (this is the origin of the degeneracy of the solutions). By the reasoning above, the edges linking these vertices must be of length N/2 − 2, and so they are uniquely determined. They form the 6-point loop ( r_a, r_{a+N/2−1 (mod N)}, r_{a−1 (mod N)}, r_{a+N/2 (mod N)}, r_{a+1 (mod N)}, r_{a+N/2+1 (mod N)} ). The remaining N − 6 vertices are then arranged in 4-vertex loops as in the previous case, and one obtains h* ∈ {h*_k}_{k=1}^{N}.

Proof of the optimal cycles for p < 0, even N.

Proof. Let us begin from the permutations that define the optimal solutions of the 2-factor problem, that is, those given in Eqs. (C.27) if N is a multiple of 4 and in Eq. (C.28) otherwise. In both cases, the optimal solution is formed only by edges of length N/2 − 1 and N/2 − 2. Since the optimal 2-factor is not a TSP cycle, in order to obtain a Hamiltonian cycle from the 2-factor solution, couples of crossing edges need to become non-crossing, with one of the two edges belonging to one loop of the covering and the other to another loop. We now show that the optimal way of joining the loops is to replace two edges of length N/2 − 1 with two edges of length N/2 − 2. Let us consider two adjacent 4-vertex loops, i.e. two loops of the form

( r_a, r_{a+N/2}, r_{a+1}, r_{a+N/2+1} ) , ( r_{a+2}, r_{a+2+N/2}, r_{a+3}, r_{a+N/2+3} ) , (C.31)

and let us analyze the possible cases:

1. remove two edges of length N/2 − 2, which can be replaced in two ways:
• either with an edge of length N/2 − 2 and an edge of length N/2 − 4; in this case the maximum number of crossings decreases by 4;
• or with two edges of length N/2 − 3; also in this situation the maximum number of crossings decreases by 4;

2.
remove one edge of length N/2 − 2 and one edge of length N/2 − 1, which again can be done in two ways:
• either with an edge of length N/2 − 2 and an edge of length N/2 − 3; in this case the maximum number of crossings decreases by 3;
• or with an edge of length N/2 − 3 and an edge of length N/2 − 4; in this situation the maximum number of crossings decreases by 7;

3. the last chance is to remove two edges of length N/2 − 1, which can also be done in two ways:
• either with two edges of length N/2 − 3; here the maximum number of crossings decreases by 6;
• or with two edges of length N/2 − 2; in this situation the maximum number of crossings decreases by 2. This happens when we substitute two adjacent edges of length N/2 − 1, i.e. two edges of the form (r_a, r_{N/2+a (mod N)}) and (r_{a+1}, r_{N/2+a+1 (mod N)}), with the non-crossing edges (r_a, r_{N/2+a+1 (mod N)}) and (r_{a+1}, r_{N/2+a (mod N)}).

The last possibility is the optimal one, since our purpose is to find the TSP cycle with the maximum number of crossings, in order to conclude that it has the lowest cost. Notice that the cases discussed above also hold for the 6-vertex loop and an adjacent 4-vertex loop when N is not a multiple of 4. We have considered adjacent loops because, had they not been adjacent, the loss in maximum crossings would have been even bigger.

Now we have a constructive pattern to build the optimal TSP cycle. Let us call O the operation described in the second item of case 3. Then, starting from the optimal 2-factor solution, O has to be applied once for every pair of loops to be joined, that is, N/4 − 1 times if N is a multiple of 4 and (N − 6)/4 times otherwise. In both cases it is easily seen that O always leaves two adjacent edges of length N/2 − 1 available for the next application. The multiplicity of the solutions is given by the N ways in which one can choose the two adjacent edges of length N/2 − 1.
In particular, the Hamiltonian cycles h*_k saturate the maximum number of crossings that can be obtained, i.e., every time O is applied, exactly 2 crossings are lost. We have proved, then, that the h*_k are the Hamiltonian cycles with the maximum number of crossings. Now we prove that any other Hamiltonian cycle has a lower number of crossings. Indeed, any other Hamiltonian cycle must have

• either every edge of length N/2 − 2,
• or at least one edge of length less than or equal to N/2 − 3,

since the h*_k have two adjacent edges of length N/2 − 1 and N − 2 edges of length N/2 − 2, and it is impossible to build a Hamiltonian cycle with two non-adjacent edges of length N/2 − 1 and N − 2 edges of length N/2 − 2 (the proof is immediate). Consider then the two cases presented above: in the first case the cycle (let us call it H) is clearly not optimal, since it differs from h*_k, for every k, by a matching that is crossing in h*_k and non-crossing in H. Let us consider, then, the second case, and suppose the shortest edge, call it b, has length N/2 − 3 (the following reasoning holds equally if the considered edge is shorter). The shortest edge creates two subsets of vertices: indeed, calling x and y the endpoints of the edge and supposing x < y, we have the subsets

A = { r ∈ V : x < r < y } , (C.32)
B = { r ∈ V : r < x ∨ r > y } . (C.33)

Suppose, for simplicity, that |A| < |B|: then, necessarily, |A| = N/2 − 3 and |B| = N/2 + 1. As an immediate consequence, there is a vertex in B whose edges have both their endpoints in B. Fixing an orientation on the cycle, one of these two edges and b are obviously non-crossing and, moreover, have the right relative orientation, so that they can be replaced by two crossing edges without splitting the Hamiltonian cycle. Therefore, also in this case, the Hamiltonian cycle considered is not optimal.

C.2.4 Second moment of the optimal cost distribution on the complete graph

Here we compute the second moment of the optimal cost distribution.
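The computation below relies on the order statistics of Sec. C.1, which are easy to sanity-check numerically (a sketch independent of the analytic derivation that follows): Eq. (C.5) says that the k-th of N ordered uniform points follows a Beta(k, N−k+1) law, hence has mean k/(N+1).

```python
import random

def kth_order_stat_mean(N, k, samples=50_000, seed=0):
    # Monte Carlo estimate of the mean of the k-th smallest of N uniform points;
    # Eq. (C.5) predicts a Beta(k, N-k+1) law, hence mean k/(N+1).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        pts = sorted(rng.random() for _ in range(N))
        total += pts[k - 1]
    return total / samples

N, k = 10, 3
est = kth_order_stat_mean(N, k)
print(est, k / (N + 1))  # the two values should agree to a few parts in 10^3
```

The same sampling scheme can be used to check two-point quantities such as Eq. (C.6), or the moments of the arc lengths entering the second-moment computation below.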
We will restrict for simplicity to the case p > 0, in which the optimal cycle has cost
E_N[h*] = |r_2 − r_1|^p + |r_N − r_{N−1}|^p + Σ_{i=1}^{N−2} |r_{i+2} − r_i|^p. (C.34)
We begin by writing the probability distribution for N ordered points,
ρ_N(r_1, …, r_N) = N! Π_{i=0}^{N} θ(r_{i+1} − r_i), (C.35)
where r_0 ≡ 0 and r_{N+1} ≡ 1. The joint probability distribution of their spacings,
φ_i ≡ r_{i+1} − r_i, (C.36)
is, therefore,
ρ_N(φ_0, …, φ_N) = N! δ[Σ_{i=0}^{N} φ_i − 1] Π_{i=0}^{N} θ(φ_i). (C.37)
If {i_1, i_2, …, i_k} is a generic subset of k different indices in {0, 1, …, N}, we soon get the marginal distributions
ρ^(k)_N(φ_{i_1}, …, φ_{i_k}) = [N!/(N − k)!] (1 − Σ_{n=1}^{k} φ_{i_n})^{N−k} θ(1 − Σ_{n=1}^{k} φ_{i_n}) Π_{n=1}^{k} θ(φ_{i_n}). (C.38)
Developing the square of Eq. (C.34), one obtains N² terms, each one describing a particular configuration of two arcs connecting some points on the line. We will denote by χ_1 and χ_2 the lengths of these arcs; they can be expressed either as a sum of 2 spacings or simply as one spacing. Because the distribution (C.38) is independent of i_1, …, i_k, these terms can be grouped together on the basis of their topology on the line, with a given multiplicity. All these terms have a weight that can be written as
∫ dχ_1 dχ_2 χ_1^p χ_2^p ρ(χ_1, χ_2), (C.39)
where ρ is a joint distribution of χ_1 and χ_2. Depending on the term in the square of Eq. (C.34) one is taking into account, the distribution ρ takes different forms, but it can always be expressed in terms of the distribution of Eq. (C.38). As an example, we show how to calculate ⟨|r_3 − r_1|^p |r_4 − r_2|^p⟩.
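As an aside, the marginal distribution Eq. (C.38) implies the standard order-statistics fact that an arc spanning k consecutive spacings of N ordered uniform points is Beta(k, N+1−k) distributed, so its p-th moment is Γ(k+p)Γ(N+1)/(Γ(k)Γ(N+1+p)). A quick Monte Carlo sketch of this fact (the function names are ours, purely for illustration):

```python
import math
import random

def spacing_moment_mc(N=10, i=3, k=2, p=2.0, trials=50_000, seed=0):
    """Monte Carlo estimate of E[(r_{i+k} - r_i)^p] for N ordered uniform points."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        pts = sorted(rng.random() for _ in range(N))
        acc += (pts[i + k - 1] - pts[i - 1]) ** p  # r indices are 1-based in the text
    return acc / trials

def spacing_moment_exact(N, k, p):
    # r_{i+k} - r_i ~ Beta(k, N+1-k), hence E[X^p] = B(k+p, N+1-k) / B(k, N+1-k)
    return math.gamma(k + p) * math.gamma(N + 1) / (math.gamma(k) * math.gamma(N + 1 + p))

mc = spacing_moment_mc()
ex = spacing_moment_exact(10, 2, 2.0)
```

For N = 10, k = 2, p = 2 the exact value is Γ(4)Γ(11)/(Γ(2)Γ(13)) = 1/22, independently of i, which is exactly the translational invariance used to group the N² terms by topology.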
In this case ρ(χ_1, χ_2) takes the form
ρ(χ_1, χ_2) = ∫ dφ_1 dφ_2 dφ_3 ρ^(3)_N(φ_1, φ_2, φ_3) δ(χ_1 − φ_1 − φ_2) δ(χ_2 − φ_2 − φ_3)
= N(N − 1) [ (1 − χ_1)^{N−2} θ(χ_2) θ(χ_1 − χ_2) θ(1 − χ_1) + (1 − χ_2)^{N−2} θ(χ_1) θ(χ_2 − χ_1) θ(1 − χ_2) − (1 − χ_1 − χ_2)^{N−2} θ(χ_1) θ(χ_2) θ(1 − χ_1 − χ_2) ], (C.40)
which, plugged into Eq. (C.39), gives
⟨|r_3 − r_1|^p |r_4 − r_2|^p⟩ = Γ(N + 1) [ Γ(2p + 3) − Γ²(p + 2) ] / [ (p + 1)² Γ(N + 2p + 1) ]. (C.41)
All the other terms can be calculated in the same way; in particular, there are 7 different topological configurations that contribute. After having counted how many times each configuration appears in (E_N[h*])², the final expression that one gets is
⟨(E_N[h*])²⟩ = [Γ(N + 1)/Γ(N + 2p + 1)] [ (N − p + 2)Γ(p + 1) + ((N − N − p + 1) − N + 8)Γ(p + 1) + [N(2p + 1)(p + 5) − p(p + 5) − 8] Γ(2p + 1)/(p + 1) ]. (C.42)

C.3 2-factor problem and the plastic constant

C.3.1 The Padovan numbers

According to the discussion in Sec. 3.4, in the optimal 2-factor configuration of the complete bipartite graph there are only loops of length 2 and 3. Here we will count the number of possible optimal solutions for each value of N. Let f_N be the number of ways in which the integer N can be written as a sum in which the addenda are only 2 and 3. For example, f_4 = 1, because N = 4 can be written only as 2 + 2, but f_5 = 2, because N = 5 can be written as 2 + 3 and 3 + 2. We simply get the recursion relation
f_N = f_{N−2} + f_{N−3} (C.43)
with the initial conditions f_2 = f_3 = f_4 = 1. The N-th Padovan number Pad(N) is defined as f_{N+2}. Therefore it satisfies the same recursion relation, Eq. (C.43), but with the initial conditions Pad(0) = Pad(1) = Pad(2) = 1. A generic solution of Eq. (C.43) can be written in terms of the roots of the equation
x³ = x + 1. (C.44)
There is one real root,
p = ((9 + √69)/18)^{1/3} + ((9 − √69)/18)^{1/3} ≈ 1.3247…,
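The recursion Eq. (C.43) and the root of Eq. (C.44) are easy to check numerically; the sketch below (our own illustration, not code from the thesis) builds the Padovan sequence and verifies that Pad(N)/p^N settles to a constant:

```python
import math

def padovan(n_max):
    """Pad(0) = Pad(1) = Pad(2) = 1, Pad(N) = Pad(N-2) + Pad(N-3), Eq. (C.43)."""
    pad = [1, 1, 1]
    for _ in range(3, n_max + 1):
        pad.append(pad[-2] + pad[-3])
    return pad

# the plastic constant: real root of x^3 = x + 1, Eq. (C.44), in closed form
p = ((9 + math.sqrt(69)) / 18) ** (1 / 3) + ((9 - math.sqrt(69)) / 18) ** (1 / 3)

pad = padovan(60)
ratio = pad[60] / p ** 60  # converges to the constant of the asymptotic law Pad(N) ~ const * p^N
```

The complex roots have modulus smaller than one, so their contribution dies out and the ratio stabilizes very quickly.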
(C.45)
known as the plastic constant, and two complex conjugate roots,
z_± = [(−1 ± i√3)/2] ((9 + √69)/18)^{1/3} + [(−1 ∓ i√3)/2] ((9 − √69)/18)^{1/3} ≈ −0.6624 ± 0.5623 i, (C.46)
of modulus less than unity. Therefore
Pad(N) = a p^N + b z_+^N + b* z_−^N (C.47)
and, by imposing the initial conditions, we get
Pad(N) = [(1 − z_+)(1 − z_−)/((p − z_+)(p − z_−))] p^N + [(1 − p)(1 − z_−)/((z_+ − p)(z_+ − z_−))] z_+^N + [(1 − p)(1 − z_+)/((z_− − p)(z_− − z_+))] z_−^N. (C.48)
For large N we get
Pad(N) ∼ λ p^N (C.49)
with λ ≈ 0.7221… the real solution of the cubic equation
23t³ − 23t² + 6t − 1 = 0. (C.50)
In Fig. C.5 we plot the Padovan sequence for a range of values of N together with its asymptotic expression. There is a relation between the Padovan numbers and the binomial coefficients. If we consider k addenda equal to 3 and s addenda equal to 2, there are C(k+s, k) = C(k+s, s) possible different orderings. If we fix N = 3k + 2s, we easily get that
Pad(N − 2) = Σ_{k≥0} Σ_{s≥0} δ_{N,3k+2s} C(k+s, k) = Σ_{m≥0} Σ_{k≥0} δ_{N,k+2m} C(m, k). (C.51)

Figure C.5: Padovan numbers Pad(N) as a function of N, together with their asymptotic expansion.

C.3.2 The recursion on the complete graph

A recursion relation analogous to Eq. (C.43) can be derived for the number of possible solutions of the 2-factor problem on the complete graph K_N. Let g_N be the number of ways in which the integer N can be expressed as a sum of 3, 4 and 5. Then g_N satisfies the recursion relation given by
g_N = g_{N−3} + g_{N−4} + g_{N−5}, (C.52)
with the initial conditions g_3 = g_4 = g_5 = g_6 = 1 and g_7 = 2. The solution of this recursion relation can be written in terms of the roots of the 5th-order polynomial
x⁵ − x² − x − 1 = 0. (C.53)
This polynomial can be written as (x² + 1)(x³ − x − 1) = 0. Therefore the roots will be the same as in the complete bipartite case (p and z_±) and, in addition,
y_± = ± i.
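Both the recursion Eq. (C.52) with its initial conditions and the binomial representation Eq. (C.51) can be checked in a few lines (again an illustrative sketch with our own function names):

```python
import math

def g_seq(n_max):
    """g_N = number of ordered sums of 3, 4 and 5 adding up to N, Eq. (C.52)."""
    g = [0] * (n_max + 1)
    g[0] = 1  # the empty sum seeds the recursion
    for n in range(3, n_max + 1):
        g[n] = sum(g[n - k] for k in (3, 4, 5) if n >= k)
    return g

def padovan(n_max):
    pad = [1, 1, 1]
    for _ in range(3, n_max + 1):
        pad.append(pad[-2] + pad[-3])
    return pad

def pad_from_binomials(N):
    """Right-hand side of Eq. (C.51): sum over 3k + 2s = N of C(k+s, k)."""
    return sum(math.comb(k + (N - 3 * k) // 2, k)
               for k in range(N // 3 + 1) if (N - 3 * k) % 2 == 0)

g = g_seq(40)
# the plastic constant is also a root of x^5 = x^2 + x + 1,
# since x^5 - x^2 - x - 1 = (x^2 + 1)(x^3 - x - 1)
p = ((9 + math.sqrt(69)) / 18) ** (1 / 3) + ((9 - math.sqrt(69)) / 18) ** (1 / 3)
```

Because the extra roots ±i have unit modulus, they only add a bounded oscillation to g_N, and the plastic constant still controls the exponential growth.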
(C.54)
g_N can be written as
g_N = α_0 p^N + α_1 z_+^N + α_2 z_−^N + α_3 y_+^N + α_4 y_−^N, (C.55)
where the constants α_0, α_1, α_2, α_3, and α_4 are fixed by the initial conditions g_3 = g_4 = g_5 = g_6 = 1 and g_7 = 2. When N is large, the dominant contribution comes from the plastic constant,
g_N ≃ α_0 p^N, (C.56)
with α_0 ≈ 0.2621….

C.3.3 The plastic constant

In 1928, shortly after abandoning his architectural studies and becoming a novice monk of the Benedictine Order, Hans van der Laan discovered a new, unique system of architectural proportions. Its construction is completely based on a single irrational value, which he called the plastic number (also known as the plastic constant) [MS12]. This number had originally been studied in 1924 by a French engineer, G. Cordonnier, when he was just 17 years old; he called it the "radiant number". However, Hans van der Laan was the first to explain how it relates to the human perception of differences in size between three-dimensional objects, and he demonstrated his discovery in (architectural) design. His main premise was that the plastic-number ratio is truly aesthetic in the original Greek sense, i.e. that its concern is not beauty but clarity of perception [Pad02]. The word "plastic" was not intended, therefore, to refer to a specific substance, but rather in its adjectival sense, meaning something that can be given a three-dimensional shape [Pad02]. The golden ratio or divine proportion,
φ = (1 + √5)/2 ≈ 1.618…, (C.57)
which is a solution of the equation
x² = x + 1, (C.58)
has been studied since Euclid, for example for its appearance in the regular pentagon, and has been used to analyze the most aesthetic proportions in the arts. An example is the golden rectangle, of size (a + b) × a, which may be cut into a square of size a × a and a smaller rectangle of size b × a with the same aspect ratio:
(a + b)/a = a/b = φ. (C.59)
This amounts to the subdivision of the interval AB, of length a + b, into AC of length a and BC of length b.
By fixing a + b = 1 we get
1/a = a/(1 − a) = φ, (C.60)
which implies that φ is the solution of Eq. (C.58). The segments AC and BC, of length, respectively, 1/φ and 1/φ², are the sides of a golden rectangle. But the golden ratio fails to generate harmonious relations within and between three-dimensional objects. Van der Laan therefore lifted the definition of the golden rectangle to three spatial dimensions. He breaks the segment AB in a similar manner, but into three parts. If C and D are the points of subdivision, the plastic number p is defined by
AB/AD = AD/BC = BC/AC = AC/CD = CD/BD = p (C.61)
and, by fixing AB = 1, from AC = 1 − BC and BD = 1 − AD we get
p³ = p + 1. (C.62)
The segments AC, CD and BD, of length, respectively, 1/p³, 1/p⁴ and 1/p⁵, can be interpreted as the sides of a cuboid analogous to the golden rectangle.

Appendix D
Supplemental material to Chapter 4

D.1 Time-to-solution

The time to solution (TTS) is a widely accepted empirical measure of algorithmic performance. It is defined as the time needed to solve an instance of a problem with high probability (here we take 99%). In particular, given the probability p(t) of solving the instance in time t, the TTS is given by
TTS(t) = t log(0.01)/log(1 − p(t)). (D.1)
One is usually interested in the minimum TTS, given by
TTS = min_t TTS(t). (D.2)
When we want to test our algorithm on a set of instances I, with a probability distribution p defined on such a set, the measure of performance can be the average
⟨TTS⟩ = Σ_{I∈I} p(I) TTS_I, (D.3)
where TTS_I is the TTS for the instance I. This average is typically computed as an empirical average over a large number of instances generated with probability p. For some problems and some algorithms, there are instances that are never solved in a reasonable running time. What TTS should one use for them in Eq. (D.3)? To avoid this problem, one often uses, instead of the average TTS, the 50th percentile of the TTSs computed over a large set of instances.
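Eqs. (D.1) and (D.2) can be illustrated on a hypothetical solver whose single-run success probability p(t) grows with the run length t; the model p(t) and all names below are ours, purely for illustration:

```python
import math

def tts(t, p_t, target=0.99):
    """Eq. (D.1): time to reach success probability `target` by repeating runs of length t."""
    if p_t <= 0.0:
        return math.inf
    if p_t >= 1.0:
        return t
    return t * math.log(1.0 - target) / math.log(1.0 - p_t)

def p_success(t):
    # hypothetical single-run success probability, growing with the run time t
    return t * t / (t * t + 100.0)

times = [0.5 * k for k in range(1, 121)]   # grid of run lengths
curve = [tts(t, p_success(t)) for t in times]
tts_min = min(curve)                        # Eq. (D.2)
t_best = times[curve.index(tts_min)]
```

The TTS curve diverges both for very short runs (each run almost never succeeds) and for very long ones (time is wasted after the success probability has saturated), so the minimum is attained at an intermediate run length.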
Typically (and this is the approach used throughout Sec. 4.2.3) this 50th percentile is shown together with the 35th and 65th percentiles.

D.2 Hamming weight example

Here we analyze in detail the annealing (both classical and quantum) of a toy problem, to give a concrete example of the effect of choosing the penalty-term parameter. Consider a cost function defined on x ∈ {0, 1}^N with the symmetry E(x) = E(σ(x)) for each σ ∈ S_N, permutation of N objects. This kind of cost function characterizes the so-called Hamming weight problems, since the only thing it can depend on is the Hamming weight (i.e. the number of 1s) of the configuration. These problems have been extensively used to explore the properties of thermal and quantum annealing, mainly because of their simplicity: their high level of symmetry often allows for exact computations, and the specific form of the cost function can be chosen such that the required annealing time is either polynomial or exponential (or even exponential for classical thermal annealing and polynomial for the quantum version). Here we introduce a constrained version of the problem, with cost function
E(x) = (1/N) (W(x) − N/2)², (D.4)
where W(x) is the Hamming weight of the configuration x. The normalization is chosen to make the cost function an extensive quantity. Indeed, if we define the intensive Hamming weight as w(x) = W(x)/N, we have the density-of-cost function
e(x) = E(x)/N = (w(x) − 1/2)². (D.5)
Let us suppose that we have the following constraint: only configurations with intensive Hamming weight in [0, 1/4] ∪ [1/2, 1] are acceptable.
To implement this constraint, we consider the penalty term (which we write directly as a function of the intensive Hamming weight w)
p(w) = (1 − θ(w − 3/8))(w − 1/4) − θ(w − 3/8)(w − 1/2)   if w ∈ [1/4, 1/2],   0 otherwise, (D.6)
where the non-zero part is simply a linear interpolation between w − 1/4 (for w < 3/8) and −w + 1/2 (for w > 3/8). The total cost function is then
e_tot(w; λ) = e(w) + λ p(w), (D.7)
and our goal is to find the minimum and the optimum value of λ. Notice that this cost function is not given as a local Hamiltonian, and in particular it is not in QUBO form; this is not relevant for our discussion here, since we are only interested in understanding, in a simple example, the role of the coupling parameter of the penalty term. One can consider both SA and QAA to solve this problem: in both cases, if a too high penalty weight λ is chosen, the system remains trapped for an exponentially long time in a local minimum.

D.2.1 Classical annealing

In the simulated annealing algorithm, the probability of a configuration is fixed by its free energy F(W; λ), defined at inverse temperature β by
exp(−βF(W; λ)) = C(N, W) exp(−βE_tot(W; λ)). (D.8)
Expanding the binomial for large N, and keeping only the dominant term, we obtain the following density of free energy:
f(w, λ) = (1/β) [(1 − w) log(1 − w) + w log w] + e_tot(w; λ). (D.9)
In Fig. D.1 we plot f for various values of λ and β. Notice that the probability of configurations which do not correspond to the minimum of Eq. (D.9) is ∼ exp(−NΔ), where Δ is the free-energy density of such configurations minus the minimum free-energy density. Therefore the (local) SA algorithm takes an exponential time to leave a minimum of f, and for λ too large (such as λ = 1 in Fig. D.1) at β ≃ 5.9 the system remains trapped in the minimum at larger w, while the minimum at smaller w becomes the global one.
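The free-energy density Eq. (D.9) can be explored numerically. The sketch below (our illustration, using our tent-shaped reading of the penalty Eq. (D.6)) locates the local minima of f on a grid and shows how their number changes with λ at low temperature:

```python
import math

def penalty(w):
    """Tent-shaped penalty supported on the forbidden interval [1/4, 1/2]."""
    if 0.25 <= w <= 0.5:
        return (w - 0.25) if w < 0.375 else (0.5 - w)
    return 0.0

def free_energy_density(w, lam, beta):
    # Eq. (D.9): entropic term plus penalized density of cost
    entropy_term = (1 - w) * math.log(1 - w) + w * math.log(w)
    return entropy_term / beta + (w - 0.5) ** 2 + lam * penalty(w)

def local_minima(lam, beta, n=2000):
    """Grid search for strict local minima of f(w) on (0, 1)."""
    ws = [(i + 1) / (n + 2) for i in range(n)]
    f = [free_energy_density(w, lam, beta) for w in ws]
    return [ws[i] for i in range(1, n - 1) if f[i] < f[i - 1] and f[i] < f[i + 1]]
```

At low temperature (large β) a large penalty weight such as λ = 1 produces two competing minima, one on each side of the forbidden interval, which is precisely the trapping scenario described above; a small weight such as λ = 1/12 leaves a single minimum.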
If a lower λ is chosen, such as λ = 1/12 (which is the minimum value), the previous situation never happens, and the annealing can proceed in polynomial time. Notice that if λ is further decreased, the final minimum will be in the forbidden interval, and again we will need to wait an exponential time to reach the minimum acceptable configuration (because it is not a local minimum anymore).

D.2.2 Quantum annealing

For the quantum case, we will consider the following annealing procedure:
• the quantum problem Hamiltonian is defined by its action on the computational basis, thus starting from Eq. (D.7);
• the quantum driving term of the Hamiltonian (the one which provides quantum fluctuations) is −Σ_i σ^x_i, so the system is initialized in the ground state of this term, and the problem Hamiltonian is slowly turned on while the driving term is slowly turned off.
Because of the symmetry of the problem, we can define a semi-classical potential and suppose that its minimum describes the instantaneous ground state of the system during the quantum annealing schedule. The idea is that, because we initialize the system in the factorized superposition state |+⟩^⊗N = ⊗_i (|0⟩_i + |1⟩_i)/√2, we can look for the instantaneous ground state among the factorized states¹
|θ⟩ = ⊗_{i=1,…,N} (cos θ |0⟩ + sin θ |1⟩). (D.11)
The semi-classical potential we use is ⟨θ|H_tot|θ⟩, with
H_tot = s (H_0 + λ H_P) − (1 − s) Σ_i σ^x_i, (D.12)
where s = s(t) is the parameter which defines the annealing schedule, with s(0) = 0, s(T) = 1, and T the total annealing time; H_0 and H_P are defined by their action on the computational basis.
¹ Notice that there are also other possible states, which are entangled; however, this form of quasi-classical potential has been profitably used in many examples (see [MAL16]), so we use it also here.
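In the product state |θ⟩ the expectation value of the problem Hamiltonian reduces to a binomial sum, whose large-N behaviour can be checked exactly at moderate N; a small illustrative sketch (names ours):

```python
import math

def h0_expectation(theta, N):
    """Exact binomial sum for <theta|H0|theta> with E(W) = (W - N/2)^2 / N."""
    s2, c2 = math.sin(theta) ** 2, math.cos(theta) ** 2
    return sum(math.comb(N, W) * s2 ** W * c2 ** (N - W) * (W - N / 2) ** 2 / N
               for W in range(N + 1))

theta, N = 0.6, 400
exact = h0_expectation(theta, N)
leading = N * (math.sin(theta) ** 2 - 0.5) ** 2
# the exact sum equals the leading term plus the O(1) binomial-variance
# correction sin^2(theta) cos^2(theta), so exact / N -> (sin^2(theta) - 1/2)^2
```

This is the mechanism behind the semi-classical potential: the binomial distribution of W concentrates around N sin²θ, so the Hamming weight is replaced by its expectation value.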
Therefore, one has
⟨θ|H_0|θ⟩ = Σ_{a,b} ⟨θ|a⟩⟨b|θ⟩⟨a|H_0|b⟩ = Σ_{W=0}^{N} C(N, W) (sin θ)^{2W} (cos θ)^{2(N−W)} (1/N)(W − N/2)² ≃ N (sin²θ − 1/2)². (D.13)
Notice that one can naturally introduce the "Hamming-weight operator" as Σ_i (1 − σ^z_i)/2, with
⟨θ| Σ_i (1 − σ^z_i)/2 |θ⟩ = N sin²θ, (D.14)
therefore the semi-classical potential is identical to the classical one, where the Hamming weight becomes the expectation value of the Hamming-weight operator. It is slightly trickier to deal with the penalty term (we need to take into account also the sub-leading terms of the Stirling approximation):
⟨θ|H_P|θ⟩ = Σ_{W=N/4}^{N/2} C(N, W) (sin θ)^{2W} (cos θ)^{2(N−W)} N p(w) ∼ [N^{3/2}/√(2π)] ∫ dw e^{−N g(w,θ)} p(w)/√(w(1 − w)), (D.15)
where
g(w, θ) = w log(w/sin²θ) + (1 − w) log((1 − w)/cos²θ). (D.16)
Now the integral in Eq. (D.15) is evaluated by the saddle-point method, where the solution of the saddle-point equation is (using the fact that 1/4 ≤ w ≤ 1/2)
w* = 1/4 for 0 < θ < π/6;   w* = 1/(1 + cot²θ) = sin²θ for π/6 ≤ θ ≤ π/4;   w* = 1/2 for π/4 < θ < π/2. (D.17)
Notice that when w* = sin²θ one has g(w*, θ) = 0, so in the window π/6 ≤ θ ≤ π/4 the integral is dominated by the interior saddle and ⟨θ|H_P|θ⟩ ≃ N p(sin²θ), while outside this window the saddle point sticks to the boundary of the integration range, g(w*, θ) > 0, and ⟨θ|H_P|θ⟩ is exponentially small in N. (D.18)
Finally,
⟨θ| Σ_i σ^x_i |θ⟩ = N sin 2θ. (D.19)
Putting all the terms together, one can easily see that considerations analogous to those made for the classical case hold also here.

D.3 Failure of our method for an instance of the minor embedding problem

The results obtained for the matching problem raise the question of whether a similar analysis can be extended to all constrained problems. However, this is not the case. Remember that Eq.
(4.65) can be used to obtain efficiently an estimate for the minimum parameter if the chain of inequalities (4.66) holds (this is a necessary but not sufficient condition: we also need to be able to approximate or solve efficiently the problem under analysis). But there are problems, such as the minor embedding problem, where these inequalities are false. To show that, we briefly introduce the minor embedding problem in the formulation that is relevant for us. Then we will choose a specific instance of the problem where one can explicitly see that the inequalities (4.66) are false. When a problem is written in QUBO form, an underlying weighted graph can be defined by looking at the couplings J_{i,j} in the Hamiltonian: each qubit is associated with a vertex of the graph, and an edge of the graph is present between qubits i and j if J_{i,j} ≠ 0. This graph is of great importance because, in real quantum annealing devices, there is an effective hardware graph with qubits as vertices, and qubits can interact only if they correspond to two connected vertices of this hardware graph. If the former graph (the "problem graph") and the latter (the "hardware graph") are different, an extra step is needed: the minor embedding. In the minor embedding problem, we have a QUBO problem defined on a graph G and we want to embed G in another graph U (which is typically a fixed hardware graph), such that we obtain a QUBO problem on the graph U whose ground state corresponds, through a known map, to the ground state of our original problem. To do so we define a function φ : u → g, where g and u are the vertex sets of, respectively, G and U, such that, if we contract all those vertices of u which are sent by φ to the same vertex of g, we obtain from the graph U the graph G. In other words, the function φ defines subsets of u which correspond to the same vertex of g. These subsets of spins are often called "chains".
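The defining property of φ, namely that contracting each chain of U yields back G, can be checked mechanically. Below is a small illustrative sketch (the graphs, names and example embedding are ours, not taken from the thesis):

```python
def contract(hardware_edges, phi):
    """Contract the hardware graph chain-by-chain: an edge (i, j) of U becomes the
    edge (phi[i], phi[j]) of G; edges internal to a chain disappear."""
    return {frozenset((phi[i], phi[j])) for i, j in hardware_edges if phi[i] != phi[j]}

# problem graph G: a triangle on vertices a, b, c
G_edges = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("a", "c")]}

# hardware graph U: a square 1-2-3-4 with one diagonal; vertex "a" becomes the chain {1, 2}
U_edges = [(1, 2), (2, 3), (3, 4), (4, 1), (2, 4)]
phi = {1: "a", 2: "a", 3: "b", 4: "c"}
```

Here the edge (1, 2) lies inside the chain of "a" and is exactly the place where the ferromagnetic penalty coupling J acts, while the remaining edges of U reproduce the three edges of the triangle after contraction.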
Then the hardware graph U can be used for the QAA, with problem Hamiltonian
H' = H_0 + J Σ_{i,j ∈ u, φ(i)=φ(j)} σ_i σ_j, (D.20)
where H_0 is the Hamiltonian of the original problem, in which the interaction between two vertices a and b connected in the graph G is now between two qubits k and ℓ of u, which are connected in U and such that φ(k) = a and φ(ℓ) = b. The minor embedding problem in general consists in finding a suitable function φ. Here we take another point of view: given a suitable φ, we are interested in finding the minimum value of the parameter J in Eq. (D.20). The term with coupling J is a kind of penalty term, whose contribution is minimum when all the spins inside the same chain have equal sign. Notice that only in this situation is it possible to go back from the solution of the problem on the hardware graph to the original graph G. Therefore the search for the ground state of H' is a constrained problem, where the only acceptable configurations are those whose chains are composed of spins with the same sign. This is the problem we are interested in, and that we would like to address with the technique developed in Sec. 4.2.3 and used for the matching problem. Let us consider a specific (and trivial) example in which the order condition given in (4.66) is not respected. The starting problem graph and the hardware graph are those given in Fig. D.2, where the couplings of the starting problem are ±1: the black continuous edges correspond to −1, the dashed edge to +1, and the chain is formed by the wavy links. The analogue of Eq. (4.65) is
λ > max_{k∈{1,2}} (E_0 − E_k)/k = max_{k∈{1,2}} λ_k, (D.21)
where now E_k is the energy of the problem on the graph without wavy lines, when k kinks are permitted on the wavy lines.
It is then easy to check that λ_1 = 0 and λ_2 = 1/2, therefore λ_2 > λ_1 and the inequality given in (4.66) is not fulfilled. This allows us to observe that there are problems (such as the matching problem) in which the order relation (4.66) holds and our method can be efficiently used to estimate the minimum parameter, while in other problems (such as the minor embedding problem) this is not true. This brings in turn an interesting consequence: whatever method one uses to decide the value of the parameter, a value that prevents the breaking of (only) one constraint is in general too weak a condition. Indeed, one has to prevent the breaking of any number of constraints.

D.4 Other methods to find the minimum penalty term weight

To the best of our knowledge, there are two ways to choose the parameters of penalty terms: one is the "trial-and-error" method, the other is the one described by Choi in [Cho08]. The first method consists basically in trying many different values and solving the problem with each of them, to see if any constraint is broken in the ground state. A limitation of this method is that, even if the problem can be solved efficiently (which is not the case for hard computational problems of relevant size), one cannot be sure of having found the real minimum parameter if the number of attempts with different parameters is small. On the other hand, the strength of this method is that it can be run using the same heuristic algorithm used to solve the problem (so there is in principle no need for additional problem-dependent algorithms). However, especially for large instances of hard problems, since the heuristic method often fails, the algorithm has to be run many times to reach a reasonable precision on the parameter, so it becomes less and less efficient as the system size increases.
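The trial-and-error procedure can be sketched on a toy constrained problem small enough to solve by brute force: scan a grid of λ values and keep the first one for which the penalized ground state satisfies the constraint. Everything below (cost, constraint and grid) is our own toy example, not an instance from the thesis:

```python
from itertools import product

def ground_state(n, energy):
    """Brute-force ground state over {0,1}^n (exponential, toy sizes only)."""
    return min(product((0, 1), repeat=n), key=energy)

def smallest_valid_lambda(n, cost, violation, grid):
    """First lambda on the grid whose penalized ground state breaks no constraint."""
    for lam in grid:
        x = ground_state(n, lambda x: cost(x) + lam * violation(x))
        if violation(x) == 0:
            return lam
    return None

# toy problem: minimize -(number of ones), subject to "exactly 2 ones"
n = 6
cost = lambda x: -sum(x)
violation = lambda x: (sum(x) - 2) ** 2
lam = smallest_valid_lambda(n, cost, violation, [k / 10 for k in range(31)])
```

For this toy instance the exact threshold is easy to see by hand (the competing configuration with three ones becomes degenerate at λ = 1), which is exactly the kind of information the trial-and-error scan recovers only by repeated solving.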
Moreover, as we have discussed in the main text, for many constrained problems it is reasonable to expect that even a small error in the parameter setting causes a slowdown which becomes more and more relevant as the system size increases. The second method consists in pre-processing the instance and choosing a penalty-term weight high enough to enforce the constraints. We will not review that method in general, but we will discuss how to apply it to the matching problem in the next section. Here we will only highlight the main differences with respect to our method:
• it was built to work with the minor embedding problem, but the same idea can easily be applied to other problems; however, in [Cho08] the author also refines the method in a way that only applies to the minor embedding problem, so we will not discuss that refinement here;
• it has a different parameter for each constraint of the problem, and these parameters are individually tuned;
• it uses no information about the solution (even approximate) of the instance.
To conclude, this method can be applied to problems where the algorithm we described in Sec. 4.2.3 fails (that is, when condition (4.66) does not hold or the problem cannot be approximated in an acceptable way), but the results are often quite far from the real minimum value of the parameters.

Choi's method applied to the matching problem

Another interesting feature of the matching problem is that here we can quantify how good Choi's upper bound for the minimum parameters is. Choi's method can be applied to Hamiltonians of the form
H = H_0 + Σ_i μ_i H^(i)_P, (D.22)
where H^(i)_P enforces the local i-th constraint. The method consists in choosing the values of the μ_i individually, in such a way that the i-th constraint is never broken, irrespective of the solution of the specific instance.
More concretely, for the matching problem one would have the following term to ensure that one and only one edge connects the vertex ν to another vertex:
H^(ν)_P = (1 − Σ_{e∈∂ν} x_e)², (D.23)
and, to be sure that this constraint is not broken independently of the solution of the instance, one has to choose a value of μ_ν such that
μ_ν > max_{e∈∂ν} w_e / 2, (D.24)
where the factor 1/2 is there because we get a contribution from two penalty terms each time we break a constraint. Consider now a specific example: Euclidean matching in one dimension. In this case, for each instance, 2N points are randomly thrown on a segment of length 1. The graph of the problem is a complete graph where each vertex corresponds to a point on the segment, and the link weights are the distances on the segment between the two points corresponding to the link endpoints. Since each vertex corresponds to a point in one dimension, we can order the points, and it can be seen that the distance between the first and the last point on the typical instance goes to 1 with N. Therefore
⟨μ_i⟩ ∼ (1/2) max(i/(2N+1), 1 − i/(2N+1)), (D.25)
where μ_i is the coupling for the i-th point once the points have been ordered, and the angled brackets denote the average over the disorder. Summing all the μ_i's we obtain
⟨Σ_i μ_i⟩ ∼ N [∫_0^{1/2} dx (1 − x) + ∫_{1/2}^1 dx x] = (3/4) N. (D.26)
On the contrary, the minimum parameter given by Eq. (4.67) goes to zero with N. Indeed, for this very simple problem ⟨E_0^(0)⟩ = O(1) (that is, of the same order as the length of the segment), and by removing a couple of points we cannot change this limiting behavior; therefore lim_{N→∞} λ = 0, so the sum of N of those parameters scales differently from Eq. (D.26), and it is definitely lower than that.
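The linear growth in Eq. (D.26) is easy to confirm numerically: sample 2N uniform points, apply the bound (D.24) at every vertex (for a point in one dimension, the heaviest incident edge reaches the farthest sampled point), and sum. An illustrative sketch:

```python
import random

def choi_bound_sum(N, rng):
    """Sum over the 2N vertices of max_{e in dv} w_e / 2 for 1D Euclidean matching."""
    pts = sorted(rng.random() for _ in range(2 * N))
    lo, hi = pts[0], pts[-1]
    # the farthest point from x is one of the two extremes of the sample
    return sum(max(x - lo, hi - x) / 2 for x in pts)

rng = random.Random(1)
N = 2000
avg = sum(choi_bound_sum(N, rng) for _ in range(5)) / 5
ratio = avg / N  # Eq. (D.26) predicts convergence to 3/4
```

The sum of Choi's bounds therefore grows extensively, while (as argued above) the sum of the true minimum parameters stays bounded: the gap between the two grows linearly with the size of the instance.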
In particular, from numerical simulations we see that the minimum parameter is λ = O(1/N); therefore the sum of N of these gives a constant, rather than going to infinity.

D.5 Proof of the inequality (4.72) for the matching problem

In this appendix we give the full proof of (4.69) for the matching problem. We will use the notation introduced in Sec. 4.2.3. As a first step, we introduce the concept of signed path, which will be used many times in the proof. Then we will proceed with the actual proof.

Definition: signed paths. Consider an instance of the problem, that is, a given weighted graph with 2N vertices. Take m ≠ ℓ ≤ N and consider E_ℓ and E_m. In general, since ℓ ≠ m, the matchings of which E_ℓ and E_m are the costs (with a slight abuse of notation, from now on we will simply say "the matchings E_ℓ and E_m") can be done over two completely different sets of points. Indeed, if for example we have m = ℓ − 1, in E_m we are using 2 points fewer than in E_ℓ. But this does not necessarily mean that some of the points which are used in E_ℓ are used also in E_m, so the matching E_m can be completely different from E_ℓ. Take a vertex x used in E_ℓ but not in E_m. Consider the path on the instance graph which starts in x and which is built by using links alternately of E_ℓ and E_m. Let us call y = y(x) the ending vertex of this path. We define the "signed path" P_{ℓ,m}(x, y) as the weight of that path, which is obtained by summing the weight of each edge of the path used in E_m and by subtracting the weight of each edge of the path used in E_ℓ.

Proof, part I: stability. Let us denote by {x} the set of vertices used in E_n, by {y} those used in E_{n+1} but not in E_n, and by {z} those used in E_n but not in E_{n+1}. We prove that {z} = ∅, using a reductio ad absurdum.
To do so, we take one among the z's, z*, and build the signed path P_{n+1,n}(z*, w), where w is the point from which we cannot proceed further with the path. Notice that by construction w can only be another point of {z} or one of {y}. These two cases require separate discussions.
Case 1: consider P_{n+1,n}(z*, w) = P_{n+1,n}(z*, y), with y = y(z*) ∈ {y}. Since P_{n+1,n}(z*, y) starts with a link of E_n and ends with a link of E_{n+1}, it has the same number of links of both matchings. Moreover,
E_n − P_{n+1,n}(z*, y) (D.27)
is again an acceptable matching of 2n points (although not the same points used in E_n), so it has to be greater than or equal to E_n, because E_n is the optimal matching of 2n points. So we have P_{n+1,n}(z*, y) ≤ 0. On the other side, also
E_{n+1} + P_{n+1,n}(z*, y) (D.28)
is an acceptable matching of 2(n + 1) points, which similarly leads to P_{n+1,n}(z*, y) ≥ 0. Therefore we have P_{n+1,n}(z*, y) = 0, which is the absurdum. Notice that we can actually have paths equal to zero, and so {z} ≠ ∅, if there are "compatible sub-matchings" with the same cost. However, this kind of degeneracy can easily be taken into account with a slight modification of our arguments (for simplicity we will consider here the non-degenerate case only).
Case 2: now we consider P_{n+1,n}(z*, w) = P_{n+1,n}(z*, z′), with z′ = z′(z*) ∈ {z}. Take y_1 ∈ {y} such that the signed path P̃_{n,n+1}(y_1, y_2) ends in y_2 ∈ {y}. Such a point y_1 has to exist: indeed, a path starting from y ∈ {y} can only end in another point of {y} or in a point of {z}. However, since E_{n+1} uses two points more than E_n, the set {y} has two more points than the set {z}, so at least one of the paths starting from points in {y} has to finish in {y}.
As in case 1, P_{n+1,n}(z*, z′) − P̃_{n,n+1}(y_1, y_2) has the same number of links of both matchings, and
E_n − P_{n+1,n}(z*, z′) + P̃_{n,n+1}(y_1, y_2) (D.29)
and
E_{n+1} + P_{n+1,n}(z*, z′) − P̃_{n,n+1}(y_1, y_2) (D.30)
are acceptable matchings of, respectively, 2n and 2(n + 1) points. So the proof proceeds as in the previous case.

Proof, part II: order. We want to prove Equation (4.72). Let us denote by {x_1, x_2, x_3, x_4} the four points of E_{n+1} which are not used in E_{n−1}. Let x_i be such that P_{n−1,n+1}(x_i, x_j) is a signed path, that is, the ending point of the signed path starting in x_i is x_j. Two such points x_i and x_j have to exist by the definition of path and because of the stability property (the path starting in x_i cannot end anywhere else than in another of the x's). Then
E_{n+1} − P_{n−1,n+1}(x_i, x_j) ≥ E_n, (D.31)
because this is an acceptable matching of 2n points and E_n is the optimal among these matchings. But also
E_{n−1} + P_{n−1,n+1}(x_i, x_j) ≥ E_n, (D.32)
because this is an acceptable matching of 2n points and E_n is the optimal among these matchings. Therefore Equation (4.72) follows.

Figure D.1: Free-energy landscape for the Hamming weight problem defined in Sec. 4.2.3, for various values of the inverse temperature β (columns: β = 0.01, 5.9, 100) and of the penalty-term parameter λ (rows: λ = 0, 1/12, 1); each panel shows the free energy density as a function of the Hamming weight. In particular, the second column shows the temperature at which the roles of the local and the global minimum are exchanged, and the second row shows what happens if we use the minimum value for λ. Notice that there is no local minimum in this last case.

Figure D.2: Example of minor embedding: on the left the problem graph, on the right the hardware graph.
Each vertex is a spin; the black continuous edges are ferromagnetic couplings, the black dashed edge is an antiferromagnetic coupling, and the red wavy edges are the couplings used for the minor embedding. Therefore the blue spin in the problem graph corresponds to the three blue spins in the hardware graph.

Bibliography

[AAR99] George E. Andrews, Richard Askey, and Ranjan Roy. Special Functions, volume 71 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1999.
[ABCC06] David Applegate, Robert Bixby, Vasek Chvatal, and William Cook. Concorde TSP solver, 2006.
[ABM04] A. Andreanov, Francesca Barbieri, and O. C. Martin. Large deviations in spin-glass ground-state energies. The European Physical Journal B - Condensed Matter and Complex Systems, 41(3):365–375, 2004.
[ACdF89] B. Apolloni, C. Carvalho, and D. de Falco. Quantum stochastic optimization. Stochastic Processes and their Applications, 33(2):233–244, 1989.
[ACO08] Dimitris Achlioptas and Amin Coja-Oghlan. Algorithmic barriers from phase transitions. In , pages 793–802. IEEE, 2008.
[Aha99] Dorit Aharonov. Quantum Computation, pages 259–346. 1999.
[AKT84] M. Ajtai, J. Komlós, and G. Tusnády. On optimal matchings. Combinatorica, 4(4):259–264, Dec 1984.
[AL18] Tameem Albash and Daniel A. Lidar. Adiabatic quantum computation. Rev. Mod. Phys., 90:015002, Jan 2018.
[Ami09] M. H. S. Amin. Consistency of the adiabatic theorem. Phys. Rev. Lett., 102:220401, Jun 2009.
[AOI10] Ariel Amir, Yuval Oreg, and Yoseph Imry. Localization, anomalous diffusion, and slow relaxations: A random distance matrix approach. Phys. Rev. Lett., 105:070601, Aug 2010.
[AST19] Luigi Ambrosio, Federico Stra, and Dario Trevisan. A PDE approach to a 2-dimensional matching problem. Probability Theory and Related Fields, 173(1):433–477, Feb 2019.
[AvDK+07] D. Aharonov, W. van Dam, J. Kempe, Z. Landau, S. Lloyd, and O. Regev. Adiabatic quantum computation is equivalent to standard quantum computation.
SIAM Journal on Computing, 37(1):166–194, 2007.
[Bac84] C. P. Bachas. Computer-intractability of the frustration model of a spin glass. Journal of Physics A: Mathematical and General, 17(13):L709–L712, Sep 1984.
[Bar82] F. Barahona. On the computational complexity of Ising spin glass models. Journal of Physics A: Mathematical and General, 15(10):3241–3253, Oct 1982.
[BBBV97] C. Bennett, E. Bernstein, G. Brassard, and U. Vazirani. Strengths and weaknesses of quantum computing. SIAM Journal on Computing, 26(5):1510–1523, 1997.
[BBCZ11] Mohsen Bayati, Christian Borgs, Jennifer Chayes, and Riccardo Zecchina. Belief propagation for weighted b-matchings on arbitrary graphs and its relation to linear programs with integer solutions. SIAM Journal on Discrete Mathematics, 25(2):989–1011, 2011.
[BBHT98] Michel Boyer, Gilles Brassard, Peter Høyer, and Alain Tapp. Tight bounds on quantum searching. Fortschritte der Physik, 46(4–5):493–505, 1998.
[BBRA99] J. Brooke, D. Bitko, T. F. Rosenbaum, and G. Aeppli. Quantum annealing of a disordered magnet. Science, 284(5415):779–781, 1999.
[BCS14] Elena Boniolo, Sergio Caracciolo, and Andrea Sportiello. Correlation function for the grid-Poisson Euclidean matching on a line and on a circle. Journal of Statistical Mechanics: Theory and Experiment, 2014(11):P11023, Nov 2014.
[BHH59] J. Beardwood, J. H. Halton, and J. M. Hammersley. The shortest path through many points. Proc. Cambridge Philos. Soc., 55, 1959.
[BMO+15] Boaz Barak, Ankur Moitra, Ryan O'Donnell, Prasad Raghavendra, Oded Regev, David Steurer, Luca Trevisan, Aravindan Vijayaraghavan, David Witmer, and John Wright. Beating the random assignment on constraint satisfaction problems of bounded degree. arXiv preprint arXiv:1505.03424, 2015.
[BV97] E. Bernstein and U. Vazirani. Quantum complexity theory. SIAM Journal on Computing, 26(5):1411–1473, 1997.
[BZ18] Carlo Baldassi and Riccardo Zecchina. Efficiency of quantum vs. classical annealing in nonconvex learning problems.
Proceedings of the National Academy of Sciences, 115(7):1457–1462, 2018.
[CBB+97] Nicolas J. Cerf, Jacques H. Boutet de Monvel, Oriol Bohigas, Olivier C. Martin, and Allon G. Percus. The random link approximation for the Euclidean traveling salesman problem. Journal de Physique I, 7(1):117–136, Jan 1997.
[CC05] Tommaso Castellani and Andrea Cavagna. Spin-glass theory for pedestrians. Journal of Statistical Mechanics: Theory and Experiment, 2005(05):P05012, May 2005.
[CCDGM18] Riccardo Capelli, Sergio Caracciolo, Andrea Di Gioacchino, and Enrico M. Malatesta. Exact value for the average optimal cost of the bipartite traveling salesman and two-factor problems in two dimensions. Phys. Rev. E, 98:030101, Sep 2018.
[CDES19] Sergio Caracciolo, Matteo P. D'Achille, Vittorio Erba, and Andrea Sportiello. The Dyck bound in the concave 1-dimensional random assignment model. arXiv preprint arXiv:1904.10867, 2019.
[CDGGM18] Sergio Caracciolo, Andrea Di Gioacchino, Marco Gherardi, and Enrico M. Malatesta. Solution for a bipartite Euclidean traveling-salesman problem in one dimension. Phys. Rev. E, 97:052109, May 2018.
[CDS17] Sergio Caracciolo, Matteo P. D'Achille, and Gabriele Sicuro. Random Euclidean matching problems in one dimension. Phys. Rev. E, 96:042102, Oct 2017.
[CGM18] Sergio Caracciolo, Andrea Di Gioacchino, and Enrico M. Malatesta. Plastic number and possible optimal solutions for a Euclidean 2-matching in one dimension. Journal of Statistical Mechanics: Theory and Experiment, 2018(8):083402, Aug 2018.
[CGMM19] Sergio Caracciolo, Andrea Di Gioacchino, Enrico M. Malatesta, and Luca G. Molinari. Selberg integrals in 1D random Euclidean optimization problems. Journal of Statistical Mechanics: Theory and Experiment, 2019(6):063401, Jun 2019.
[CGMV19] Sergio Caracciolo, Andrea Di Gioacchino, Enrico M. Malatesta, and Carlo Vanoni. Average optimal cost for the Euclidean TSP in one dimension. Journal of Physics A: Mathematical and Theoretical, 52(26):264003, Jun 2019.
[Cho08] Vicky Choi. Minor-embedding in adiabatic quantum computation: I. The parameter setting problem.
Quantum Information Processing, 7(5):193–209, Oct 2008.
[Cho11] Vicky Choi. Minor-embedding in adiabatic quantum computation: II. Minor-universal graph design. Quantum Information Processing, 10(3):343–353, Jun 2011.
[CLPS14] S. Caracciolo, C. Lucibello, G. Parisi, and G. Sicuro. Scaling hypothesis for the Euclidean bipartite matching problem. Phys. Rev. E, 90:012118, Jul 2014.
[CNPV90] A. Crisanti, S. Nicolis, G. Paladin, and A. Vulpiani. Fluctuations of correlation functions in disordered spin systems. Journal of Physics A: Mathematical and General, 23(13):3083–3093, Jul 1990.
[CO09] Amin Coja-Oghlan. A better algorithm for random k-SAT. In Susanne Albers, Alberto Marchetti-Spaccamela, Yossi Matias, Sotiris Nikoletseas, and Wolfgang Thomas, editors, Automata, Languages and Programming, pages 292–303, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
[Coo71] Stephen A. Cook. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, STOC '71, pages 151–158, New York, NY, USA, 1971. ACM.
[CR92] V. Chvatal and B. Reed. Mick gets some (the odds are on his side) (satisfiability). In Proceedings, 33rd Annual Symposium on Foundations of Computer Science, pages 620–627, Oct 1992.
[CS92] A. Crisanti and H. J. Sommers. The spherical p-spin interaction spin glass model: the statics. Zeitschrift für Physik B Condensed Matter, 87(3):341–354, Oct 1992.
[CS14] Sergio Caracciolo and Gabriele Sicuro. One-dimensional Euclidean matching problem: Exact solutions, correlation functions, and universality. Phys. Rev. E, 90:042112, Oct 2014.
[CS15] Sergio Caracciolo and Gabriele Sicuro. Quadratic stochastic Euclidean bipartite matching problem. Phys. Rev. Lett., 115:230601, Dec 2015.
[dAT78] J. R. L. de Almeida and D. J. Thouless. Stability of the Sherrington–Kirkpatrick solution of a spin glass model. Journal of Physics A: Mathematical and General, 11(5):983–990, May 1978.
[DBI+16] Vasil S. Denchev, Sergio Boixo, Sergei V.
Isakov, Nan Ding, Ryan Babbush, Vadim Smelyanskiy, John Martinis, and Hartmut Neven. What is the computational value of finite-range tunneling? Phys. Rev. X, 6:031015, Aug 2016.
[DGRM] Andrea Di Gioacchino, Eleanor Rieffel, and Salvatore Mandrà. Parameter setting for classical and quantum annealing.
[DJK11] Balázs Dezső, Alpár Jüttner, and Péter Kovács. LEMON: an open source C++ graph template library. Electronic Notes in Theoretical Computer Science, 264(5):23–45, 2011. Proceedings of the Second Workshop on Generative Technologies (WGT) 2010.
[DM08] David S. Dean and Satya N. Majumdar. Extreme value statistics of eigenvalues of Gaussian random matrices. Physical Review E, 77(4):041108, 2008.
[Dot05] Viktor Dotsenko. Introduction to the replica theory of disordered statistical systems, volume 4. Cambridge University Press, 2005.
[Dot11] V. Dotsenko. Replica solution of the random energy model. EPL (Europhysics Letters), 95(5):50006, Aug 2011.
[DY83] C. De Dominicis and A. P. Young. Weighted averages and order parameters for the infinite range Ising spin glass. Journal of Physics A: Mathematical and General, 16(9):2063–2075, Jun 1983.
[EA75] S. F. Edwards and P. W. Anderson. Theory of spin glasses. Journal of Physics F: Metal Physics, 5(5):965–974, May 1975.
[Edm65] Jack Edmonds. Paths, trees, and flowers. Canadian Journal of Mathematics, 17:449–467, 1965.
[EGR19] Vittorio Erba, Marco Gherardi, and Pietro Rotondo. Intrinsic dimension estimation for locally undersampled data. arXiv preprint arXiv:1906.07670, 2019.
[EK03] Jack Edmonds and Richard M. Karp. Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems, pages 31–33. Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.
[Ell07] Richard S. Ellis. Entropy, large deviations, and statistical mechanics. Springer, 2007.
[EPR35] A. Einstein, B. Podolsky, and N. Rosen. Can quantum-mechanical description of physical reality be considered complete? Phys. Rev., 47:777–780, May 1935.
[Fey82] Richard P. Feynman.
Simulating physics with computers. International Journal of Theoretical Physics, 21(6):467–488, Jun 1982.
[FF98] F. F. Ferreira and J. F. Fontanari. Probabilistic analysis of the number partitioning problem. Journal of Physics A: Mathematical and General, 31(15):3417–3428, Apr 1998.
[FFM07] M. Fedrigo, F. Flandoli, and F. Morandin. A large deviation principle for the free energy of random Gibbs measures with application to the REM. Annali di Matematica Pura ed Applicata, 186(3):381–417, Jul 2007.
[FGG02] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. Quantum adiabatic evolution algorithms versus simulated annealing. arXiv preprint arXiv:quant-ph/0201031, 2002.
[FGG14a] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm. arXiv preprint arXiv:1411.4028, 2014.
[FGG14b] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm applied to a bounded occurrence constraint problem. arXiv preprint arXiv:1412.6062, 2014.
[FGGS00] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Michael Sipser. Quantum computation by adiabatic evolution. arXiv preprint quant-ph/0001106, 2000.
[FH16] Edward Farhi and Aram W. Harrow. Quantum supremacy through the quantum approximate optimization algorithm. arXiv preprint arXiv:1602.07674, 2016.
[GD89] E. Gardner and B. Derrida. The probability distribution of the partition function of the random energy model. Journal of Physics A: Mathematical and General, 22(12):1975, 1989.
[GMKZ19] Sebastian Goldt, Marc Mézard, Florent Krzakala, and Lenka Zdeborová. Modelling the influence of data structure on learning in neural networks. arXiv preprint arXiv:1909.11500, 2019.
[Goe90] A. Goerdt. A threshold for unsatisfiability. In I. M. Havel and V. Koubek, editors, Proceedings of the 17th International Symposium on the Mathematical Foundations of Computer Science, pages 264–274. Springer-Verlag, 1990.
[Gro97] Lov K. Grover. Quantum mechanics helps in searching for a needle in a haystack. Phys. Rev. Lett., 79:325–328, Jul 1997.
[GS13] A. Goetschy and S. E. Skipetrov.
Euclidean random matrices and their applications in physics. arXiv preprint arXiv:1303.2880, 2013.
[GT02] Francesco Guerra and Fabio Lucio Toninelli. The thermodynamic limit in mean field spin glass models. Communications in Mathematical Physics, 230(1):71–79, Sep 2002.
[GT17] A. García and J. Tejel. Polynomially solvable cases of the bipartite traveling salesman problem. European Journal of Operational Research, 257(2):429–438, 2017.
[Gue03] Francesco Guerra. Broken replica symmetry bounds in the mean field spin glass model. Communications in Mathematical Physics, 233(1):1–12, Feb 2003.
[GW96a] Ian P. Gent and Toby Walsh. Phase transitions and annealed theories: Number partitioning as a case study. In ECAI, pages 170–174. PITMAN, 1996.
[GW96b] Ian P. Gent and Toby Walsh. The TSP phase transition. Artificial Intelligence, 88(1–2):349–358, 1996.
[Hal95] John H. Halton. The shoelace problem. The Mathematical Intelligencer, 17(4):36–41, Dec 1995.
[HBM98] Jérôme Houdayer, J. H. Boutet de Monvel, and O. C. Martin. Comparing mean field and Euclidean matching problems. The European Physical Journal B, 6(3):383–393, 1998.
[JG79] David S. Johnson and Michael R. Garey. Computers and intractability: A guide to the theory of NP-completeness, volume 1. W. H. Freeman, San Francisco, 1979.
[JL03] Richard Jozsa and Noah Linden. On the role of entanglement in quantum-computational speed-up. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 459(2036):2011–2032, 2003.
[JRW17] Zhang Jiang, Eleanor G. Rieffel, and Zhihui Wang. Near-optimal quantum circuit for Grover's unstructured search using a transverse field. Phys. Rev. A, 95:062317, Jun 2017.
[Kar72] Richard M. Karp. Reducibility among Combinatorial Problems, pages 85–103. Springer US, Boston, MA, 1972.
[KGV83] Scott Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
[KHA14] Helmut G.
Katzgraber, Firas Hamze, and Ruben S. Andrist. Glassy chimeras could be blind to quantum speedup: Designing better benchmarks for quantum annealing machines. Phys. Rev. X, 4:021008, Apr 2014.
[KLM+07] Phillip Kaye, Raymond Laflamme, Michele Mosca, et al. An introduction to quantum computing. Oxford University Press, 2007.
[KM89] W. Krauth and Marc Mézard. The cavity method and the travelling-salesman problem. Europhysics Letters, 8(3):213–218, 1989.
[KN98] Tadashi Kadowaki and Hidetoshi Nishimori. Quantum annealing in the transverse Ising model. Phys. Rev. E, 58:5355–5363, Nov 1998.
[Kon83] I. Kondor. Parisi's mean-field solution for spin glasses as an analytic continuation in the replica number. Journal of Physics A: Mathematical and General, 16(4):L127, 1983.
[KS85] R. M. Karp and M. Steele. The Travelling Salesman Problem. John Wiley and Sons, New York, 1985.
[KS94] Scott Kirkpatrick and Bart Selman. Critical behavior in the satisfiability of random Boolean expressions. Science, 264(5163):1297–1301, 1994.
[KTJ76] J. M. Kosterlitz, D. J. Thouless, and Raymund C. Jones. Spherical model of a spin-glass. Phys. Rev. Lett., 36:1217–1220, May 1976.
[Kuh55] H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1–2):83–97, 1955.
[LP09] László Lovász and Michael D. Plummer. Matching theory, volume 367. American Mathematical Soc., 2009.
[LPS17] Carlo Lucibello, Giorgio Parisi, and Gabriele Sicuro. One-loop diagrams in the random Euclidean matching problem. Phys. Rev. E, 95:012302, Jan 2017.
[LSKL85] E. L. Lawler, D. B. Shmoys, A. H. G. R. Kan, and J. K. Lenstra. The Traveling Salesman Problem. John Wiley & Sons, Incorporated, 1985.
[Luc14] Andrew Lucas. Ising formulations of many NP problems. Frontiers in Physics, 2:5, 2014.
[MAL16] Siddharth Muthukrishnan, Tameem Albash, and Daniel A. Lidar. Tunneling and speedup in quantum optimization for permutation-symmetric problems. Phys. Rev. X, 6:031010, Jul 2016.
[Mal19] Enrico M. Malatesta.
Random combinatorial optimization problems: Mean field and finite-dimensional results. arXiv preprint arXiv:1902.00455, 2019.
[McC99] Robert J. McCann. Exact solutions to the transportation problem on the line. Proc. R. Soc. A: Math., Phys. Eng. Sci., 455:1341–1380, 1999.
[Men32] Karl Menger. Das Botenproblem. Ergebnisse eines mathematischen Kolloquiums, 2:11–12, 1932.
[Mer98] Stephan Mertens. Phase transition in the number partitioning problem. Phys. Rev. Lett., 81:4281–4284, Nov 1998.
[Mer01] Stephan Mertens. A physicist's approach to number partitioning. Theoretical Computer Science, 265(1):79–108, 2001. Phase Transitions in Combinatorial Problems.
[Mer07] N. David Mermin. Quantum computer science: an introduction. Cambridge University Press, 2007.
[Mis96] Michal Misiurewicz. Lacing irregular shoes. The Mathematical Intelligencer, 18(4):32–34, 1996.
[MK18] Salvatore Mandrà and Helmut G. Katzgraber. A deceptive step towards quantum speedup detection. Quantum Science and Technology, 3(4):04LT01, Jul 2018.
[MM11] Cristopher Moore and Stephan Mertens. The nature of computation. OUP Oxford, 2011.
[MP85] Marc Mézard and Giorgio Parisi. Replicas and optimization. Journal de Physique Lettres, 46(17):771–778, 1985.
[MP86a] Marc Mézard and Giorgio Parisi. A replica analysis of the travelling salesman problem. Journal de Physique, 47:1285–1296, 1986.
[MP86b] Marc Mézard and Giorgio Parisi. Mean-field equations for the matching and the travelling salesman problems. Europhysics Letters, 2(12):913–918, 1986.
[MP88] Marc Mézard and Giorgio Parisi. The Euclidean matching problem. Journal de Physique, 49:2019–2025, 1988.
[MPS19] Enrico M. Malatesta, Giorgio Parisi, and Gabriele Sicuro. Fluctuations in the random-link matching problem. arXiv preprint arXiv:1905.08529, 2019.
[MPV87] Marc Mézard, Giorgio Parisi, and Miguel A. Virasoro. Spin Glass Theory and Beyond. Lecture Notes in Physics Series. World Scientific Publishing Company, Inc., 1987.
[MPZ99] M. Mézard, G. Parisi, and A.
Zee. Spectra of Euclidean random matrices. Nuclear Physics B, 559(3):689–701, 1999.
[MS12] L. Marohnić and T. Strmečki. Plastic number: Construction and applications. ARSA (Advanced Research in Scientific Area), 2012.
[MV80] S. Micali and V. V. Vazirani. An O(√|V|·|E|) algorithm for finding maximum matching in general graphs. In , pages 17–27, Oct 1980.
[MZW+16] Salvatore Mandrà, Zheng Zhu, Wenlong Wang, Alejandro Perdomo-Ortiz, and Helmut G. Katzgraber. Strengths and weaknesses of weak-strong cluster problems: A detailed overview of state-of-the-art classical heuristics versus quantum approaches. Phys. Rev. A, 94:022337, Aug 2016.
[NC00] Michael A. Nielsen and Isaac Chuang. Quantum computation and quantum information. Cambridge University Press, 2000.
[NH08] Tetsuya Nakajima and Koji Hukushima. Large deviation property of free energy in p-body Sherrington–Kirkpatrick model. Journal of the Physical Society of Japan, 77(7):074718, 2008.
[NH09] Tetsuya Nakajima and Koji Hukushima. Thermodynamic construction of a one-step replica-symmetry-breaking solution in finite-connectivity spin glasses. Phys. Rev. E, 80:011103, Jul 2009.
[Nis01] Hidetoshi Nishimori. Statistical physics of spin glasses and information processing: an introduction. Number 111. Clarendon Press, 2001.
[OK04] Kenzo Ogure and Yoshiyuki Kabashima. Exact analytic continuation with respect to the replica number in the discrete random energy model of finite system size. Progress of Theoretical Physics, 111(5):661–688, 2004.
[OK09a] Kenzo Ogure and Yoshiyuki Kabashima. On analyticity with respect to the replica number in random energy models: I. An exact expression for the moment of the partition function. Journal of Statistical Mechanics: Theory and Experiment, 2009(03):P03010, 2009.
[OK09b] Kenzo Ogure and Yoshiyuki Kabashima. On analyticity with respect to the replica number in random energy models: II. Zeros on the complex plane.
Journal of Statistical Mechanics: Theory and Experiment, 2009(05):P05011, 2009.
[Orl85] Henri Orland. Mean-field theory for optimization problems. Le Journal de Physique - Lettres, 46(17), 1985.
[Pad02] Richard Padovan. Dom Hans van der Laan and the plastic number. In Kim Williams and Jose Francisco Rodrigues, editors, Architecture and Mathematics, volume Nexus IV, pages 181–193, Fucecchio (Florence), 2002. Kim Williams Books.
[Pap77] Christos H. Papadimitriou. The Euclidean travelling salesman problem is NP-complete. Theoretical Computer Science, 4(3):237–244, 1977.
[Pap03] Christos H. Papadimitriou. Computational complexity. John Wiley and Sons Ltd., 2003.
[Par79a] G. Parisi. Infinite number of order parameters for spin-glasses. Phys. Rev. Lett., 43:1754–1756, Dec 1979.
[Par79b] G. Parisi. Toward a mean field theory for spin glasses. Physics Letters A, 73(3):203–205, 1979.
[Par80a] G. Parisi. The order parameter for spin glasses: a function on the interval 0–1. Journal of Physics A: Mathematical and General, 13(3):1101–1112, Mar 1980.
[Par80b] G. Parisi. A sequence of approximated solutions to the S-K model for spin glasses. Journal of Physics A: Mathematical and General, 13(4):L115–L121, Apr 1980.
[Par83] Giorgio Parisi. Order parameter for spin-glasses. Phys. Rev. Lett., 50:1946–1948, Jun 1983.
[PDGR19] Mauro Pastore, Andrea Di Gioacchino, and Pietro Rotondo. Large deviations of the free energy in the p-spin glass spherical model. arXiv preprint arXiv:1909.06196, 2019.
[Pel11] Luca Peliti. Statistical mechanics in a nutshell, volume 10. Princeton University Press, 2011.
[PM96] Allon G. Percus and Olivier C. Martin. Finite size and dimensional dependence in the Euclidean traveling salesman problem. Physical Review Letters, 76(8):1188–1191, Feb 1996.
[Pol02] Burkard Polster. Mathematics: What is the best way to lace your shoes? Nature, 420(6915):476–476, 2002.
[PR91] M. Padberg and G. Rinaldi. A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems.
SIAM Review, 33(1):60–100, 1991.
[PR08] Giorgio Parisi and Tommaso Rizzo. Large deviations in the free energy of mean-field spin glasses. Phys. Rev. Lett., 101:117205, Sep 2008.
[PR09] Giorgio Parisi and Tommaso Rizzo. Phase diagram and large deviations in the free energy of mean-field spin glasses. Phys. Rev. B, 79:134205, Apr 2009.
[PR10a] Giorgio Parisi and Tommaso Rizzo. Large deviations of the free energy in diluted mean-field spin-glass. Journal of Physics A: Mathematical and Theoretical, 43(4):045001, 2010.
[PR10b] Giorgio Parisi and Tommaso Rizzo. Universality and deviations in disordered systems. Phys. Rev. B, 81:094201, Mar 2010.
[Ram81] R. Rammal. PhD thesis (unpublished), Grenoble University, 1981.
[RC02] Jérémie Roland and Nicolas J. Cerf. Quantum search by local adiabatic evolution. Phys. Rev. A, 65:042308, Mar 2002.
[Rei94] Gerhard Reinelt. The traveling salesman: computational solutions for TSP applications. Springer-Verlag, 1994.
[Riv05] Olivier Rivoire. The cavity method for large deviations. Journal of Statistical Mechanics: Theory and Experiment, 2005(07):P07004, 2005.
[RLG19] Pietro Rotondo, Marco Cosentino Lagomarsino, and Marco Gherardi. Counting the learnable functions of structured data. arXiv preprint arXiv:1903.12021, 2019.
[RP11] Eleanor G. Rieffel and Wolfgang H. Polak. Quantum computing: A gentle introduction. MIT Press, 2011.
[RRG14] Siamak Ravanbakhsh, Reihaneh Rabbany, and Russell Greiner. Augmentative message passing for traveling salesman problem and graph partitioning. Advances in Neural Information Processing Systems, 1:289–297, Jun 2014.
[RWJ+14] Troels F. Rønnow, Zhihui Wang, Joshua Job, Sergio Boixo, Sergei V. Isakov, David Wecker, John M. Martinis, Daniel A. Lidar, and Matthias Troyer. Defining and detecting quantum speedup. Science, 345(6195):420–424, 2014.
[Sel44] Atle Selberg. Remarks on a multiple integral. Norsk Mat. Tidsskr., 26:71–78, 1944.
[Sho99] P. Shor.
Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Review, 41(2):303–332, 1999.
[Sic16] Gabriele Sicuro. The Euclidean matching problem. Springer, 2016.
[SK75] David Sherrington and Scott Kirkpatrick. Solvable model of a spin-glass. Phys. Rev. Lett., 35:1792–1796, Dec 1975.
[Sou86] Nicolas Sourlas. Statistical mechanics and the travelling salesman problem. Europhysics Letters, 2(12):919–923, 1986.
[Ste81] J. Michael Steele. Subadditive Euclidean functionals and nonlinear growth in geometric probability. The Annals of Probability, 9(3):365–376, 1981.
[Ste97] J. Michael Steele. Probability theory and combinatorial optimization, volume 69. SIAM, 1997.
[Tal92] Michel Talagrand. Matching random samples in many dimensions. The Annals of Applied Probability, 2(4):846–856, 1992.
[TD81] G. Toulouse and B. Derrida. Free energy probability distribution in the SK spin glass model. In Proceedings of the Sixth Symposium on Theoretical Physics, page 217, Rio de Janeiro, Brazil, 1981.
[TFI89] Toshijiro Tanaka, Hirokazu Fujisaka, and Masayoshi Inoue. Free-energy fluctuations in a one-dimensional random Ising model. Phys. Rev. A, 39:3170–3172, Mar 1989.
[Tou09] Hugo Touchette. The large deviation approach to statistical mechanics. Physics Reports, 478(1):1–69, 2009.
[VCC+14] Angelo Vulpiani, Fabio Cecconi, Massimo Cencini, Andrea Puglisi, and Davide Vergni. Large Deviations in Physics: The Legacy of the Law of Large Numbers. Springer, 2014.
[vHP79] J. L. van Hemmen and R. G. Palmer. The replica method and solvable spin glass model. Journal of Physics A: Mathematical and General, 12(4):563, 1979.
[VM84] Jean Vannimenus and Marc Mézard. On the statistical mechanics of optimization problems of the travelling salesman type. Journal de Physique Lettres, 45(24):L1145–L1153, 1984.
[VMK+15] Davide Venturelli, Salvatore Mandrà, Sergey Knysh, Bryan O'Gorman, Rupak Biswas, and Vadim Smelyanskiy.
Quantum optimization of fully connected spin glasses. Phys. Rev. X, 5:031040, Sep 2015.
[Was10] Johan Wästlund. The mean field traveling salesman and related problems. Acta Mathematica, 204(1):91–150, 2010.
[WHT16] Dave Wecker, Matthew B. Hastings, and Matthias Troyer. Training a quantum optimizer. Phys. Rev. A, 94:022309, Aug 2016.
[Yuk06] Joseph E. Yukich. Probability theory of classical Euclidean optimization problems. Springer, 2006.
[Zal99] Christof Zalka. Grover's quantum searching algorithm is optimal. Phys. Rev. A, 60:2746–2751, Oct 1999.
[Zde09] Lenka Zdeborová. Statistical physics of hard optimization problems.