Performance Analysis on Evolutionary Algorithms for the Minimum Label Spanning Tree Problem
Xinsheng Lai, Yuren Zhou, Jun He and Jun Zhang
Abstract
Some experimental investigations have shown that evolutionary algorithms (EAs) are efficient for the minimum label spanning tree (MLST) problem. However, we know little about their performance in theory. As one step towards this issue, we theoretically analyze the performance of the (1+1) EA, a simple version of an EA, and of a multi-objective evolutionary algorithm called GSEMO on the MLST problem. We reveal that for the MLST_b problem the (1+1) EA and GSEMO achieve a (b+1)/2-approximation ratio in expected time polynomial in n, the number of nodes, and k, the number of labels. We also show that GSEMO achieves a 2 ln(n)-approximation ratio for the MLST problem in expected time polynomial in n and k. At the same time, we show that the (1+1) EA and GSEMO outperform local search algorithms on three instances of the MLST problem. We also construct an instance on which GSEMO outperforms the (1+1) EA.

Index Terms
Evolutionary algorithm; time complexity; approximation ratio; minimum label spanning tree; multi-objective
I. INTRODUCTION
The minimum label spanning tree (MLST) problem is an issue arising from practice, which seeks a spanning tree with the minimum number of labels in a connected undirected graph with labeled edges. For example, we may want to find a spanning tree that uses the minimum number of types of communication channels in a communication network connected by different types of channels. The MLST problem, proposed by Chang and Leu, is proved to be NP-hard [1].

For this problem, Chang and Leu proposed two heuristic algorithms. One is the edge replacement algorithm, ERA for short; the other is the maximum vertex covering algorithm, MVCA for short. Their experimental results showed that ERA is not stable, and that MVCA is more efficient.

The genetic algorithm, belonging to the larger class of EAs, is a general purpose optimization algorithm [2]–[4] with a strong global search capacity [5]. Hence, Xiong, Golden, and Wasil proposed a one-parameter genetic algorithm for the MLST problem. Experimental results on extensive randomly generated instances showed that the genetic algorithm outperforms MVCA [6]. Nummela and Julstrom also proposed an efficient genetic algorithm for solving the MLST problem [7].

Besides, many methods have recently been proposed for solving this NP-hard problem. Consoli et al. proposed a hybrid local search combining variable neighborhood search and simulated annealing [8]. Chwatal and Raidl presented exact methods, including branch-and-cut and branch-and-cut-and-price [9]. Cerulli et al. applied several metaheuristic methods to this problem, such as simulated annealing, reactive tabu search, the pilot method, and variable neighborhood search [10]. Consoli et al.
also proposed a greedy randomized adaptive search procedure and a variable neighborhood search for solving the MLST problem [11].

Since ERA and MVCA are the two original heuristic algorithms for the MLST problem, the worst-case performance analysis of these two algorithms, especially MVCA, has been a hot research topic in recent years. Krumke and Wirth proved that MVCA has a logarithmic performance guarantee of ln(n) + 1, where n is the number of nodes in the input graph, and presented an instance showing that ERA might perform as badly as possible [12]. Wan, Chen, and Xu further proved that MVCA has a better performance guarantee of ln(n − 1) + 1 [13]. Xiong, Golden, and Wasil proved another bound on the worst-case performance of MVCA for MLST_b problems, namely H_b = Σ_{i=1}^{b} 1/i, where the subscript b denotes that each label appears at most b times and is also called the maximum frequency of the labels [14].

The performance of MVCA on the MLST problem has been deeply investigated. However, there is still no theoretical work on EAs' performance for the MLST problem.

In fact, the theoretical analysis of EAs' performance on fundamental optimization problems has received much attention from many researchers. During the past few years, theoretical investigations of EAs focused on the runtime and/or the probability of EAs finding globally optimal solutions of fundamental optimization problems or their variants. These problems include plateaus of constant fitness [15], linear function problems [16]–[18], minimum cut problems [19], satisfiability problems [20], minimum spanning tree problems [21], Eulerian cycle problems [22], Euclidean traveling salesperson problems [23], etc.

X. Lai and Y. Zhou are with the School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China (e-mail: [email protected]). Corresponding author: Y. Zhou. X. Lai is also with the School of Mathematics and Computer Science, Shangrao Normal University, Shangrao 334001, China. J. He is with the Department of Computer Science, Aberystwyth University, Aberystwyth, SY23 3DB, UK. J. Zhang is with the Department of Computer Science, Sun Yat-Sen University, Guangzhou 510275, China, with the Key Laboratory of Digital Life, Ministry of Education, China, and also with the Key Laboratory of Software Technology, Education Department of Guangdong Province, China.

Nevertheless, since many fundamental optimization problems, including the MLST problem, are NP-hard, no polynomial-time algorithm can be expected to solve them unless P = NP.
Fortunately, in practice we usually only ask for satisfying solutions to such NP-hard problems. Thus, we are interested in whether an approximate solution of a given satisfying quality can be obtained efficiently. In fact, the approximation performance of randomized heuristics, including EAs, on NP-hard problems has received much attention.

Giel and Wegener proved that the (1+1) EA can find a (1 + ε)-approximate solution to the maximum matching problem in expected runtime O(m^{2⌈1/ε⌉}), and concluded that EAs are good approximation algorithms for this problem [24]. Subsequently, Oliveto, He, and Yao found that for minimum vertex cover problems the (1+1) EA may find arbitrarily bad approximate solutions on some instances, but can efficiently find the minimum cover of them by using a restart strategy [25]. Friedrich et al. proved that the (1+1) EA may find almost arbitrarily bad approximate solutions for minimum vertex cover problems and minimum set cover problems as well [26]. Witt proved that in the worst case the (1+1) EA and the randomized local search algorithm need an expected runtime O(n^2) to produce a 4/3-approximate solution to the partition problem [27].

On the approximation performance of multi-objective EAs, Friedrich et al. revealed that a multi-objective EA efficiently finds a ln(n)-approximate solution to the minimum set cover problem. Neumann and Reichel found that multi-objective EAs can find a k-approximate solution for the minimum multicut problem in expected polynomial time [28]. Recently, Yu, Yao, and Zhou studied the approximation performance of SEIP, a simple evolutionary algorithm with isolated population, on set cover problems. They found that SEIP can efficiently obtain an H_n-approximate solution for unbounded set cover problems, and an (H_k − (k−1)/(8^9))-approximate solution for k-set cover problems as well [29].

In this paper, we concentrate on the performance analysis of the (1+1) EA and GSEMO for the MLST problem.
We analyze the approximation performance of the (1+1) EA and GSEMO on the MLST problem. For the MLST_b problem, we prove that the (1+1) EA and GSEMO are (b+1)/2-approximation algorithms. We also reveal that GSEMO can efficiently achieve a 2 ln(n)-approximation ratio for the MLST problem. Though the MLST problem is NP-hard, we show that on three instances the (1+1) EA and GSEMO efficiently find the global optima, while local search algorithms may be trapped in local optima. Meanwhile, we construct an additional instance where GSEMO outperforms the (1+1) EA.

The rest of this paper is organised as follows. The next section describes the MLST problem and the algorithms considered in this paper. Section III analyzes the approximation performances of the (1+1) EA and GSEMO on the MLST problem, while Section IV analyzes the performances of the (1+1) EA and GSEMO on four instances. Finally, Section V presents the conclusions.

II. THE MLST PROBLEM AND ALGORITHMS
First of all, we give the concept of spanning subgraph.
Definition 1: (Spanning subgraph) Let G = (V, E) and H = (V', E') be two graphs, where V and V' are, respectively, the sets of nodes of G and H, and E and E' are, respectively, the sets of edges of G and H. If V' = V and E' ⊆ E, then H is a spanning subgraph of G.

Let G = (V, E, L) be a connected undirected graph, where V, E, and L = {1, 2, ..., k} are the set of nodes, the set of edges, and the set of labels, respectively, |V| = n, |E| = m, and clearly |L| = k. Each edge is associated with a label by a function l: E → L; thus, each edge e ∈ E has a unique label l(e) ∈ L. The MLST problem is to seek a spanning tree with the minimum number of labels in the input graph G. If the maximum frequency of the labels is b, then we denote such an MLST problem by MLST_b. Clearly, the MLST_b problem is a special case of the MLST problem.

Our goal in this paper is to seek a connected spanning subgraph with the minimum number of labels rather than a spanning tree with the minimum number of labels, since any spanning tree contained in such a spanning subgraph is an MLST. This alternative formulation of the MLST problem is also adopted in [6], [7].

We encode a solution as a bit string X = (x_1, ..., x_k) ∈ {0,1}^k, as used in [6], where bit x_i (1 ≤ i ≤ k) corresponds to label i. If x_i = 1 (i = 1, 2, ..., k), then label i is selected; otherwise it is not. Thus, a bit string X represents a label subset, and |X| denotes the number of labels contained in X.

We consider the spanning subgraph H(X) of G, where H(X) is the spanning subgraph restricted to edges with labels whose corresponding bits in X are set to 1. We call a solution X such that H(X) is a connected spanning subgraph a feasible
solution. A feasible solution with the minimum number of labels is a globally optimal solution.

For solving the MLST problem, the (1+1) EA uses a fitness function, which is defined as

fit(X) = (c(H(X)) − 1) · 2k + |X|,   (1)

where c(H(X)) is the number of connected components in H(X), k is the total number of labels in L, and |X| = Σ_{i=1}^{k} x_i, i.e., the number of labels contained in X and hence used in H(X).

The fitness function is to be minimized. Its first part ensures that H(X) is a connected spanning subgraph, and its second part ensures that the number of labels in the connected spanning subgraph is minimized. For a feasible solution X, since the number of connected components of H(X) is 1, the fitness value equals the number of labels contained in X.

We also define the fitness vector for GSEMO as a vector (c(H(X)), |X|), where c(H(X)) and |X| are simultaneously minimized by GSEMO.

The following algorithms are those considered in this paper.

Algorithm 1: The (1+1) EA for the MLST problem
01: Begin
02: Initialize a solution X ∈ {0,1}^k uniformly at random;
03: While the termination criterion is not fulfilled
04:   Obtain an offspring Y by flipping each bit in X with probability 1/k;
05:   If fit(Y) < fit(X) then X := Y;
06: End while
07: End
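As an illustration, fitness function (1) and Algorithm 1 can be sketched in Python as follows. This is a sketch, not the authors' implementation: the edge-list representation (u, v, label), the helper names, and the use of union-find for component counting are our assumptions.

```python
import random

def count_components(n, edges, X):
    """Number of connected components of H(X): the spanning subgraph of a
    graph on nodes 0..n-1 restricted to edges whose label is selected in X."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    comps = n
    for u, v, l in edges:
        if X[l - 1]:                       # labels are 1..k, X is a 0/1 vector
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                comps -= 1
    return comps

def fit(X, n, edges, k):
    # Fitness (1): the connectivity penalty dominates the label count.
    return (count_components(n, edges, X) - 1) * 2 * k + sum(X)

def one_plus_one_ea(n, edges, k, steps=20000, rng=random.Random(0)):
    """Algorithm 1: standard bit mutation, strict elitist selection."""
    X = [rng.randint(0, 1) for _ in range(k)]
    fx = fit(X, n, edges, k)
    for _ in range(steps):
        Y = [1 - b if rng.random() < 1.0 / k else b for b in X]
        fy = fit(Y, n, edges, k)
        if fy < fx:                        # accept strictly better offspring only
            X, fx = Y, fy
    return X, fx
```

On a triangle with three distinctly labeled edges, any two labels form a connected spanning subgraph, so the best fitness is 2.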
The (1+1) EA starts with an arbitrary solution and repeatedly uses the mutation operator to generate an offspring solution from the current one. If the offspring solution is strictly better than the current one, then the (1+1) EA uses it to replace the current solution.

Another algorithm, proposed by Brüggemann, Monnot, and Woeginger, is the local search algorithm with the 2-switch neighborhood. We now describe some concepts related to it.

Definition 2: [30] (h-switch neighborhood) Given an integer h ≥ 1, let X_1 and X_2 be two feasible solutions for some instance of the MLST problem. We say that X_2 is in the h-switch neighborhood of X_1, denoted by X_2 ∈ h-SWITCH(X_1), if and only if

|X_1 − X_2| ≤ h and |X_2 − X_1| ≤ h.   (2)

In other words, X_2 ∈ h-SWITCH(X_1) means that X_2 can be derived from X_1 by first removing at most h labels from X_1 and then adding at most h labels to it.

The local search algorithm with the 2-switch neighborhood: In Algorithm 1, if the initial solution X is an arbitrary feasible solution, and the offspring Y is selected from the 2-switch neighborhood of X, then the resulting algorithm is the local search algorithm with the 2-switch neighborhood [30].

GSEMO has been investigated on covering problems [26], pseudo-Boolean functions [31], [32], and minimum spanning tree problems [33], [34]. It is described as follows.

Algorithm 2: GSEMO for the MLST problem
01: Begin
02: Initialize a solution X ∈ {0,1}^k uniformly at random;
03: P ← {X};
04: While the termination criterion is not fulfilled
05:   Choose a solution X from P uniformly at random;
06:   Obtain an offspring Y by flipping each bit in X with probability 1/k;
07:   If Y is not dominated by any X' ∈ P then
08:     Q := {X' | X' ∈ P and Y dominates X'};
09:     P ← (P ∪ {Y}) \ Q;
10:   End if
11: End while
12: End
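Algorithm 2 can be sketched in Python as follows (an illustrative sketch under the same representation assumptions as before; the dominance test implements the two conditions of the domination concept defined in this section):

```python
import random

def count_components(n, edges, X):
    """Connected components of H(X) via union-find; labels are 1..k."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    comps = n
    for u, v, l in edges:
        if X[l - 1]:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                comps -= 1
    return comps

def dominates(fa, fb):
    # fa dominates fb iff fa is no worse in both objectives and better in one.
    return fa[0] <= fb[0] and fa[1] <= fb[1] and fa != fb

def gsemo(n, edges, k, steps=30000, rng=random.Random(1)):
    """Algorithm 2: keep an archive of mutually non-dominated solutions."""
    fvec = lambda X: (count_components(n, edges, X), sum(X))
    X = tuple(rng.randint(0, 1) for _ in range(k))
    pop = {X: fvec(X)}
    for _ in range(steps):
        X = rng.choice(list(pop))          # uniform choice from P
        Y = tuple(1 - b if rng.random() < 1.0 / k else b for b in X)
        fy = fvec(Y)
        if not any(dominates(f, fy) for f in pop.values()):
            # remove solutions dominated by Y, then add Y
            pop = {Z: f for Z, f in pop.items() if not dominates(fy, f)}
            pop[Y] = fy
    return pop
```

On the triangle instance used earlier, the final population contains a feasible solution (one component) with two labels, which is optimal there.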
In Algorithm 2, P is a population used to preserve those solutions that cannot be dominated by any other solution in the population. The concept of domination is defined as follows. Suppose the fitness vectors of solutions X and Y are (c(H(X)), |X|) and (c(H(Y)), |Y|), respectively. We say that X dominates Y if one of the following two conditions is satisfied:

(1) c(H(X)) < c(H(Y)) and |X| ≤ |Y|;
(2) c(H(X)) ≤ c(H(Y)) and |X| < |Y|.

For the sake of completeness, two further greedy algorithms are included. The first one is the modified MVCA. It starts with a solution containing no labels, and each time selects a label such that, when this label is chosen, the decrease in the number of connected components is the largest.

Algorithm 3: The modified MVCA for the MLST problem [14]
Input: A connected undirected graph G = (V, E, L), L = {1, ..., k}.
01: Let C be the set of used labels, C := ∅;
02: Repeat
03:   Let H be the spanning subgraph of G restricted to edges with labels from C;
04:   For all i ∈ L \ C do
05:     Determine the number of connected components when inserting all edges labeled by i in H;
06:   End for
07:   Choose the label i with the smallest resulting number of connected components: C := C ∪ {i};
08: Until H is connected.
Output: H

In Algorithm 3, if we contract each connected component in H to a supernode after step 3, then we obtain the second greedy algorithm, which is investigated in [12]; we call it the modified MVCA with contraction in this paper.

III. APPROXIMATION PERFORMANCES OF THE (1+1) EA AND GSEMO ON THE MLST PROBLEM
The following is the concept of an approximation ratio (solution). Given a minimization problem P and an algorithm A, if for an instance I of P the value of the best solution obtained in polynomial time by A is A(I), and sup_{I ∈ P} A(I)/OPT(I) = r, where OPT(I) is the value of the optimal solution of I, then we say that A achieves an r-approximation ratio (solution) for P.

Although the MLST problem is NP-hard, we reveal that the (1+1) EA and GSEMO are guaranteed to achieve an approximation ratio for the MLST_b problem in expected time polynomial in n and k, and that GSEMO is guaranteed to achieve an approximation ratio for the MLST problem in expected time polynomial in n and k.

A. The approximation guarantees of the (1+1) EA and GSEMO on the MLST_b problem

To reveal that the (1+1) EA and GSEMO are guaranteed to achieve a (b+1)/2-approximation ratio in expected time polynomial in n, the number of nodes, and k, the number of labels, we first prove that the (1+1) EA and GSEMO find a feasible solution starting from any initial solution in expected time polynomial in n and k, and then prove that, starting from any feasible solution, the (1+1) EA and GSEMO find a (b+1)/2-approximate solution in expected time polynomial in k, by simulating the following result proved by Brüggemann, Monnot, and Woeginger [30].

Theorem 1: If b ≥ 2, then for any instance of the MLST_b problem, the local search algorithm with the 2-switch neighborhood can find a local optimum with at most OPT · (b+1)/2 labels, where OPT is the number of labels in the global optimum.

We partition all feasible solutions into two disjoint sets. One is S_1 = {X | X ∈ {0,1}^k, X is a feasible solution, |X| ≤ OPT · (b+1)/2}; the other is S_2 = {X | X ∈ {0,1}^k, X is a feasible solution, |X| > OPT · (b+1)/2}.

From Theorem 1, we derive a property with respect to the 2-switch neighborhood for MLST_b problems.
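As a side illustration, the h-switch relation of Definition 2 depends only on the two set differences, so membership can be checked directly from two bit strings. The helper below is an illustrative sketch; its name is our own.

```python
def in_h_switch(X1, X2, h):
    """Definition 2: X2 is in h-SWITCH(X1) iff X2 arises from X1 by removing
    at most h labels (|X1 - X2| <= h) and adding at most h labels
    (|X2 - X1| <= h). X1, X2 are 0/1 vectors over the label set {1,...,k}."""
    removed = sum(1 for a, b in zip(X1, X2) if a == 1 and b == 0)  # |X1 - X2|
    added = sum(1 for a, b in zip(X1, X2) if a == 0 and b == 1)    # |X2 - X1|
    return removed <= h and added <= h
```

The moves used in Corollary 1 and Lemma 2 below are exactly those with h = 2.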
Corollary 1: If b ≥ 2, let G = (V, E, L) be an instance of MLST_b which has a minimum label spanning tree with OPT labels. If X is a feasible solution and X ∈ S_2, then there must exist a feasible solution X' ∈ 2-SWITCH(X) whose fitness is 1 or 2 less than that of X.

We now prove that, starting with an arbitrary initial solution for an instance G = (V, E, L) of MLST_b, the (1+1) EA can efficiently find a feasible solution.

Lemma 1:
Given an instance G = (V, E, L) of MLST_b, where |V| = n and |L| = k, the (1+1) EA starting from an arbitrary initial solution finds a feasible solution for G in expected time O(nk).

Proof: According to the fitness function (1), during the optimization process of the (1+1) EA the number of connected components will never increase. For any solution X that is not feasible, the number of connected components of the spanning subgraph H(X) is greater than 1. Since the input graph is connected, there must exist a label such that, when it is added to the current solution X, the number of connected components decreases by at least one. The probability of adding this label to X is (1/k)(1 − 1/k)^{k−1} ≥ 1/(ek), which implies that in expected time O(k) the number of connected components decreases by at least one. Note that there are at most n connected components. Hence, we obtain the upper bound of O(nk).

Then, we prove that, starting from an arbitrary feasible solution on an instance G = (V, E, L) of MLST_b, the (1+1) EA can efficiently find a (b+1)/2-approximate solution.

Lemma 2:
Given an instance G = (V, E, L) of MLST_b, where |V| = n, |L| = k, and b ≥ 2, the expected time for the (1+1) EA, starting from an arbitrary feasible solution, to find a local optimum with at most OPT · (b+1)/2 labels for G is O(k^4).

Proof: By Corollary 1, if a feasible solution X ∈ S_2, there must exist a feasible solution X' ∈ 2-SWITCH(X) whose fitness is 1 or 2 less than that of X. So, replacing X with X' decreases the fitness value by at least 1. Since a feasible solution belonging to S_2 has at most k labels, after at most k − OPT · (b+1)/2 such replacing steps a feasible solution belonging to S_1 will be found.

Now, we calculate the expected time for the (1+1) EA to find X'. Since X' ∈ 2-SWITCH(X) and |X'| < |X|, there are three cases. The first is that X' is obtained by removing exactly one label from X. The second is that X' is obtained by removing exactly two labels from X. The third is that X' is obtained by removing exactly two labels from X and adding exactly one label to it.

Obviously, the worst case is the third one, since in this case three bits of X must be flipped simultaneously by the (1+1) EA. In this case, the probability that the (1+1) EA finds X' is (1/k^3)(1 − 1/k)^{k−3} ≥ 1/(ek^3). So, the expected time for the (1+1) EA to find a feasible solution X' ∈ 2-SWITCH(X) is O(k^3), which means that the expected time for the (1+1) EA to reduce the fitness value by at least one is O(k^3).

Therefore, the expected time for the (1+1) EA, starting from an arbitrary feasible solution, to find a local optimum with at most OPT · (b+1)/2 labels is O((k − OPT · (b+1)/2) · k^3) = O(k^4), as OPT · (b+1)/2 ≤ k.

Combining Lemmas 1 and 2, we obtain the following theorem.

Theorem 2:
Given an instance G = (V, E, L) of MLST_b, where |V| = n, |L| = k, and b ≥ 2, the (1+1) EA starting with any initial solution finds a (b+1)/2-approximate solution in expected time O(nk + k^4).

It has been proved that the (1+1) EA efficiently achieves a (b+1)/2-approximation ratio for the MLST_b problem. As we will see below, GSEMO can also efficiently achieve this approximation ratio.

Theorem 3:
Given an instance G = (V, E, L) of MLST_b, where |V| = n, |L| = k, and b ≥ 2, GSEMO starting with any initial solution finds a (b+1)/2-approximate solution in expected time O(nk^2 + k^5).

Proof: Starting with an arbitrary solution X with fitness vector (c(H(X)), |X|), if c(H(X)) > 1, then there exists a label l such that, when it is added, the number of connected components is reduced by at least one, as the input graph is connected. The probability of selecting X to mutate is Ω(1/k), as the population size is O(k), and the probability of flipping the bit corresponding to label l is (1/k)(1 − 1/k)^{k−1} = Ω(1/k); so a solution X' with fitness vector (c(H(X')), |X| + 1), where c(H(X')) < c(H(X)), will be included in expected time O(k^2). Since there are at most n connected components in the spanning subgraph induced by any solution, a feasible solution will be included in expected time O(nk^2).

Once a feasible solution X is included in the population, if the number of labels contained in it is greater than (b+1)/2 · OPT, then according to Corollary 1 there exists a feasible solution X' ∈ 2-SWITCH(X) such that |X'| is at least 1 less than |X|. If such a solution X' is found, it will replace X. According to the proof of Lemma 2, the expected time for mutation to produce such a solution X' is O(k^3); combined with the expected time O(k) to select such a solution X, such a solution X' will be included in expected time O(k^4).

Therefore, a (b+1)/2-approximate solution will be included in expected time O((k − (b+1)/2 · OPT) · k^4) = O(k^5) once a feasible solution is found.

Hence, GSEMO starting with any initial solution will find a (b+1)/2-approximate solution in expected time O(nk^2 + k^5).

B. The approximation guarantee of GSEMO on the MLST problem
Here we prove the approximation guarantee of GSEMO on the MLST problem by simulating the process of the modified MVCA with contraction. Similar to Lemma 2 in [12], we prove the following lemma.
Lemma 3:
Given a connected undirected graph G = (V, E, L) having a minimum label spanning tree T_OPT with OPT labels, where |V| = n and n ≥ 2, there exists a label such that the number of connected components of the spanning subgraph restricted to edges with this label is not more than ⌊n(1 − 1/(2 · OPT))⌋.

Proof:
In fact, since the minimum label spanning tree T_OPT of G has exactly n − 1 edges, there must exist a label, say j, such that the number of edges in T_OPT labeled by j is at least ⌈(n − 1)/OPT⌉; so the number of connected components of the spanning subgraph restricted to edges of label j is not more than n − ⌈(n − 1)/OPT⌉ = ⌊n(1 − 1/OPT) + 1/OPT⌋. When n ≥ 2, we have ⌊n(1 − 1/OPT) + 1/OPT⌋ ≤ ⌊n(1 − 1/(2 · OPT))⌋.

Further, for a spanning subgraph H(X) of G = (V, E, L), we have the following corollary.

Corollary 2: If r, the number of connected components of H(X), is greater than 1, then there is a label such that, when it is added to X, the number of connected components is reduced to not more than ⌊r(1 − 1/(2 · OPT))⌋.

Proof:
Contract each connected component of H(X) to a supernode; then G is converted to a graph G' with r nodes. Suppose the number of labels in the minimum label spanning tree of G' is OPT'. According to Lemma 3, there is a label in G' such that the number of connected components of the spanning subgraph restricted to edges with this label is not more than ⌊r(1 − 1/(2 · OPT'))⌋. Noting that the number of labels of the minimum label spanning tree of G is OPT, it is clear that OPT' ≤ OPT. Thus, ⌊r(1 − 1/(2 · OPT'))⌋ ≤ ⌊r(1 − 1/(2 · OPT))⌋. In other words, there is a label such that, when it is added to X, the number of connected components of H(X) is reduced to not more than ⌊r(1 − 1/(2 · OPT))⌋.

Based on Corollary 2, we prove that GSEMO is guaranteed to find a 2 ln(n)-approximate solution in expected time polynomial in n and k.

Theorem 4:
Given an instance G = (V, E, L) of the MLST problem, where |V| = n and |L| = k, the expected time for GSEMO, starting with any initial solution, to find a 2 ln(n)-approximate solution for G is O(k^3 ln(n) + k^2 ln(k)).

Proof: We first show that GSEMO starting with any initial solution finds the all-zeros bit string in expected time O(k^2 ln(k)), and then show that GSEMO finds a 2 ln(n)-approximate solution in expected time O(k^3 ln(n)) after the all-zeros bit string has been included in the population. Combining these two bounds, we obtain the theorem.

Fig. 1. An example of instance G'.

We now investigate the expected time for GSEMO, starting from any initial solution, to find the all-zeros bit string with Pareto optimal fitness vector (n, 0). Once it is found, it can never be removed from the population. If it is not yet included, GSEMO chooses from P the solution X that contains the minimum number of labels among all solutions in the population with probability Ω(1/k), as the population size is O(k). The event of flipping exactly one of the |X| bits whose value is 1 decreases the number of labels, and the probability of this event is C(|X|, 1) · (1/k)(1 − 1/k)^{k−1} ≥ |X|/(ek). So, the expected time until GSEMO includes a solution that contains |X| − 1 labels is O(k^2/|X|). Following this way, the all-zeros bit string will be found in expected time O(Σ_{i=1}^{|X|} k^2/i) = O(k^2 ln(|X|)) = O(k^2 ln(k)).

Now the all-zeros bit string with fitness vector (n, 0) is included in the population. According to Corollary 2, there is a label such that, when it is added to the all-zeros bit string, the number of connected components is reduced to not more than n(1 − 1/(2 · OPT)). The probability of flipping the bit of this label is (1/k)(1 − 1/k)^{k−1} = Ω(1/k). Since the population size is O(k), the probability of selecting the all-zeros bit string is Ω(1/k). So a solution X_1 with fitness vector (c_1, 1), where c_1 ≤ ⌊n(1 − 1/(2 · OPT))⌋ ≤ n(1 − 1/(2 · OPT)), can be included in the population in expected time O(k^2).

If c_1 ≥ 2, then there is still a label such that, when it is added to X_1, the number of connected components is reduced to not more than n(1 − 1/(2 · OPT))^2. So a solution X_2 with fitness vector (c_2, 2), where c_2 ≤ n(1 − 1/(2 · OPT))^2, can be included in the population in expected time O(k^2) after X_1 has been included in the population.

Similarly, after a solution X_{h−1} with fitness vector (c_{h−1}, h − 1), where c_{h−1} ≤ n(1 − 1/(2 · OPT))^{h−1}, has been included in the population, if c_{h−1} ≥ 2, then a solution X_h with fitness vector (c_h, h), where c_h ≤ n(1 − 1/(2 · OPT))^h, will be included in the population in expected time O(k^2).

When h = 2 · OPT · ln(n), we have n(1 − 1/(2 · OPT))^h ≤ n · e^{−h/(2 · OPT)} ≤ 1. So, a connected spanning subgraph with at most 2 · OPT · ln(n) labels will finally be included in the population in expected time O(hk^2) = O(2 · OPT · k^2 ln(n)) = O(k^3 ln(n)) after the all-zeros bit string has been included in the population.

TABLE I
APPROXIMATION PERFORMANCES OF THE (1+1) EA AND GSEMO. 'r', 'upper bound', and '—' refer to the approximation ratio, the upper bound of the expected time, and unknown, respectively.

         | The (1+1) EA                 | GSEMO
         | r        upper bound         | r        upper bound
MLST_b   | (b+1)/2  O(nk + k^4)         | (b+1)/2  O(nk^2 + k^5)
MLST     | —        —                   | 2 ln(n)  O(k^3 ln(n) + k^2 ln(k))

Table I summarizes the approximation performances of the (1+1) EA and GSEMO for the minimum label spanning tree problem. For the MLST_b problem, the (1+1) EA and GSEMO can efficiently achieve a (b+1)/2-approximation ratio. However, the order of the expected time of GSEMO is higher than that of the (1+1) EA; the reason is that GSEMO has to select a promising solution to mutate in a population of size O(k). For the MLST problem, GSEMO efficiently achieves a 2 ln(n)-approximation ratio, but the approximation performance of the (1+1) EA is unknown.

IV. PERFORMANCES OF THE (1+1) EA AND GSEMO ON FOUR INSTANCES
In this section, we first present an instance where GSEMO outperforms the (1+1) EA, and then we show that the (1+1) EA and GSEMO outperform local search algorithms on three instances of the MLST problem.
A. An instance where GSEMO outperforms the (1+1) EA
First, we construct an instance G' = (V, E, L) to show that GSEMO is superior to the (1+1) EA, where L = {1, ..., k}. Given µ (0 < µ < 1/2) and k, the number of labels, we construct instance G' by the following steps. For simplicity, we assume that µk is an integer; thus (1 − µ)k is an integer as well. First, we construct (1 − µ)k subgraphs G'_1, ..., G'_{(1−µ)k}. Each G'_i (1 ≤ i ≤ (1 − µ)k) contains a (µk − 1)-sided regular polygon whose edges are all labeled by the same label µk + i, and an inner node in its center. From the inner node, µk − 1 edges labeled from 1 to µk − 1 connect to the µk − 1 outer nodes v_1, v_2, ..., v_{µk−1}. Then three edges are connected from G'_i (1 ≤ i ≤ (1 − µ)k − 1) to G'_{i+1}: the first, labeled by µk + i, is from the inner node of G'_i to outer node v_1 of G'_{i+1}; the second, labeled by µk + i, is from outer node v_1 of G'_i to outer node v_1 of G'_{i+1}; the third, labeled by µk, is from the inner node of G'_i to the inner node of G'_{i+1}. Finally, an additional edge labeled by k connects the inner node of G'_{(1−µ)k} with outer node v_1 of G'_1. Figure 1 shows instance G'.

For 0 < µ < 1/2, the global optimum of G' is X* = (1, ..., 1, 0, ..., 0), consisting of µk ones followed by (1 − µ)k zeros, and the local optimum for local search algorithms, such as the local search algorithm with the 2-switch neighborhood, is X_l = (0, ..., 0, 1, ..., 1), consisting of µk zeros followed by (1 − µ)k ones, since both spanning subgraphs H(X*) and H(X_l) are connected, but |X*| = µk, |X_l| = (1 − µ)k, and µk < (1 − µ)k. For instance G', the expected time for the (1+1) EA to jump out of the local optimum is exponential.

Theorem 5:
For instance G', starting from the local optimum X_l, the expected time for the (1+1) EA to find the global optimum is Ω(k^{µk}).

Proof: For instance G', when the current solution is the local optimum X_l, the (1+1) EA only accepts an offspring that adds all µk labels from {1, ..., µk} and simultaneously removes more than µk labels from {µk + 1, ..., k}. So, the probability of escaping from the local optimum is

(1/k)^{µk} Σ_{i=1}^{k−2µk} C(k − µk, µk + i) (1/k)^{µk+i} (1 − 1/k)^{k−2µk−i} < (1/k)^{µk}.

This is because Σ_{i=1}^{k−2µk} C(k − µk, µk + i) (1/k)^{µk+i} (1 − 1/k)^{k−2µk−i} ≤ Σ_{j=0}^{k−µk} C(k − µk, j) (1/k)^j (1 − 1/k)^{(k−µk)−j} = 1, where the right-hand sum, obtained by substituting j = µk + i, additionally contains the non-negative terms for j = 0, ..., µk.

Thus, starting from the local optimum, the expected time for the (1+1) EA to find the global optimum of G' is Ω(k^{µk}).

Though the (1+1) EA needs expected exponential time to jump out of the local optimum, GSEMO can efficiently find the global optimum of instance G'.

Theorem 6:
For instance $G'$, GSEMO finds the global optimum in expected time $O(k^2 \ln(k))$.

Proof:
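For reference, the GSEMO variant analyzed here can be sketched as follows. This is a simplified sketch, not the authors' pseudocode: the archive representation and the tie-breaking rule (offspring that tie an archived fitness vector are rejected) are our choices.

```python
import random

def dominates(f, g):
    """f dominates g (minimization): no worse in every objective, better in one."""
    return all(a <= b for a, b in zip(f, g)) and f != g

def gsemo(fitness, k, steps, seed=0):
    """Keep an archive of mutually non-dominated bitstrings. Each step: pick
    an archived solution uniformly at random, flip each bit with probability
    1/k, and insert the offspring unless it is dominated by (or ties) an
    archived solution; then discard archived solutions it dominates."""
    rng = random.Random(seed)
    x = tuple(rng.randint(0, 1) for _ in range(k))
    pop = {x: fitness(x)}
    for _ in range(steps):
        parent = rng.choice(list(pop))
        child = tuple(b ^ (rng.random() < 1.0 / k) for b in parent)
        fc = fitness(child)
        if any(dominates(f, fc) or f == fc for f in pop.values()):
            continue
        pop = {y: f for y, f in pop.items() if not dominates(fc, f)}
        pop[child] = fc
    return pop
```

With the bi-objective fitness $(c(H(X)), |X|)$, the archive keeps at most one solution per fitness vector, which is why the population size on a $k$-label instance stays $O(k)$.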
Adding a label from $L_1 = \{1, \dots, \mu k\}$ to the all-zeros bit string can reduce the number of connected components by $(1-\mu)k$, while adding a label from $L_2 = \{\mu k + 1, \dots, k\}$ can reduce the number of connected components by only $\mu k$. Note that $(1-\mu)k$ is larger than $\mu k$, so the Pareto front contains $\mu k + 1$ Pareto optimal solutions with fitness vectors $(n, 0)$, $(n - (1-\mu)k, 1)$, ..., $(n - (1-\mu)jk, j)$, ..., $(1, \mu k)$, respectively. It is clear that the population size is $O(k)$.

It has been proved in Theorem 4 that the expected time for GSEMO, starting with any initial solution, to include the all-zeros bit string in the population is $O(k^2 \ln(k))$.

Now we calculate the expected time to produce the whole Pareto front after the all-zeros bit string is found; the worst case is to produce the whole Pareto front starting from the all-zeros bit string. Suppose that the population contains a Pareto optimal solution $X$ with fitness vector $(n - (1-\mu)jk, j)$ which has the maximum number of labels. Another Pareto optimal solution with fitness vector $(n - (1-\mu)(j+1)k, j+1)$ can be produced by adding a label from $L_1$ which is not in $X$. The probability of adding such a label is $\binom{\mu k - j}{1} \frac{1}{k} (1 - \frac{1}{k})^{k-1} \ge \frac{\mu k - j}{ek}$. This implies that the expected time is $O(\frac{ek^2}{\mu k - j})$, as the expected time to select $X$ is $O(k)$. So, considering the worst case of starting from $(n, 0)$, the expected time for GSEMO to produce the whole Pareto front is $\sum_{j=0}^{\mu k - 1} \frac{ek^2}{\mu k - j} = O(k^2 \ln(k))$.

B. An instance where the (1+1) EA and GSEMO outperform ERA
ERA is a local search algorithm. It takes an arbitrary spanning tree as input, then considers each non-tree edge and testswhether the number of used labels can be reduced by adding this non-tree edge and deleting a tree edge on the induced cycle.
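The procedure just described can be sketched as follows. This is a minimal sketch: the edge-list representation and helper names are ours, and for brevity each candidate tree edge is validated by a spanning-tree test rather than by explicitly computing the induced cycle (only edges on that cycle can yield a valid swap, so the outcome is the same).

```python
def era(edges, nodes, tree):
    """Edge replacement: repeatedly swap a non-tree edge for a tree edge
    whenever the swap keeps a spanning tree and reduces the number of used
    labels; stop at a local optimum. edges/tree are lists of (u, v, label)."""
    def n_labels(t):
        return len({l for _, _, l in t})
    improved = True
    while improved:
        improved = False
        for e in edges:
            if e in tree:
                continue
            for f in list(tree):
                cand = [g for g in tree if g != f] + [e]
                if is_spanning_tree(cand, nodes) and n_labels(cand) < n_labels(tree):
                    tree, improved = cand, True
                    break
            if improved:
                break
    return tree

def is_spanning_tree(t, nodes):
    """Union-find acyclicity check plus edge count."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v, _ in t:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False          # adding this edge would close a cycle
        parent[ru] = rv
    return len(t) == len(nodes) - 1
```

On a triangle whose edges carry labels 1, 2, 1, starting from the tree that uses both labels, one swap reaches a single-label tree; each accepted swap strictly reduces the label count, so the loop terminates.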
Fig. 2. An example of instance $G_1$ with $n = 5$.

In this subsection, we show that the (1+1) EA and GSEMO outperform ERA on an instance proposed by Krumke and Wirth [12], which is denoted by $G_1$ in this paper.

This instance can be constructed in two steps. First, we construct a star-shaped graph with $n-1$ distinct labels, i.e., we select one node out of $n$ nodes and add $n-1$ edges from it to the other $n-1$ nodes, labeled with the $n-1$ distinct labels $1, 2, \dots, n-1$. Second, we add edges between each pair of the remaining nodes, all with the same label $k$. Thus, we get a complete graph $G_1 = (V, E, L)$, where $|V| = n$, $|E| = n(n-1)/2$, and $L = \{1, 2, \dots, k\}$ is the set of labels. It is clear that $|L| = k$ and $k = n$. Figure 2 shows an example with $n = 5$, where the dashed edges constitute the spanning tree with the minimum number of labels.

For instance $G_1$, a global optimum $X^*$ uses one label from $\{1, \dots, k-1\}$ and label $k$, i.e., $|X^*| = 2$, $\sum_{i=1}^{k-1} x_i = 1$, and $x_k = 1$.

Krumke and Wirth used instance $G_1$ to demonstrate that ERA might perform as badly as possible. In fact, $X^l = (\overbrace{1, \dots, 1}^{k-1}, 0)$ is a local optimum for ERA, since the number of labels used in $H(X^l)$ cannot be reduced by adding any non-tree edge and deleting a tree edge on the induced cycle. The local optimum uses $k-1$ labels, while the global optimum uses only $2$ labels. However, the (1+1) EA and GSEMO can efficiently find a global optimum for $G_1$.

Theorem 7:
For instance $G_1$, the (1+1) EA finds a global optimum in expected time $O(k \ln(k))$.

Proof:
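Before the case analysis, the kind of probability estimate it uses can be spot-checked numerically. For example, the event "flip exactly one of the $k-1$ bits of $L_1$ and nothing else" has probability $\binom{k-1}{1}\frac{1}{k}(1-\frac{1}{k})^{k-1} = (1-\frac{1}{k})^k$, which is bounded below by a constant; a sketch (helper name ours):

```python
def p_remove_one(k):
    """Probability that standard bit mutation flips exactly one of k-1
    designated bits while the remaining k-1 bits stay unchanged:
    C(k-1, 1) * (1/k) * (1 - 1/k)^(k-1)."""
    return (k - 1) * (1 / k) * (1 - 1 / k) ** (k - 1)

# the quantity equals (1 - 1/k)^k, which increases toward 1/e, hence Omega(1)
for k in [5, 10, 100, 1000]:
    assert p_remove_one(k) > 0.25
    assert abs(p_remove_one(k) - (1 - 1 / k) ** k) < 1e-12
```

So a mutation of this type is expected after $O(1)$ steps, which is how the $\Omega(1)$ cases below translate into constant expected waiting times.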
For simplicity, let $L_1$ denote the label set $\{1, \dots, k-1\}$. Let $A = \{X \mid c(H(X)) = 1, x_k = 1, 2 \le |X| \le k-1\}$, i.e., a solution $X \in A$ contains label $k$ and at least one but at most $k-2$ labels from $L_1$.

To find a global optimum, a solution $X \in A$ should be found first. Once a solution $X \in A$ has been found, the global optimum can be reached by removing all $|X| - 2$ redundant labels from $L_1$. According to the Coupon Collector's theorem [35], all redundant labels contained in $X$ will be removed in expected time $O(k \ln(k))$.

In order to analyze the expected time to find a solution $X \in A$, we further partition all solutions that do not belong to $A$ into five disjoint subsets $B$, $C$, $D$, $E$, $F$:

$B = \{X \mid c(H(X)) = 1, |X| = k$, and $x_k = 1\}$;
$C = \{X \mid c(H(X)) = 1, |X| = k-1$, and $x_k = 0\}$;
$D = \{X \mid c(H(X)) > 1, 1 \le |X| \le k-2$, and $x_k = 0\}$;
$E = \{X \mid c(H(X)) > 1, |X| = 1$, and $x_k = 1\}$;
$F = \{X \mid c(H(X)) > 1, |X| = 0\}$.

If $X \in B$, then $X$ will be transformed into $A$ by removing one label from $L_1$. The probability of this event is $\binom{k-1}{1} \frac{1}{k} (1 - \frac{1}{k})^{k-1} = \Omega(1)$, which implies that the expected time is $O(1)$.

If $X \in C$, then $X$ will be transformed into $A$ by adding label $k$ and simultaneously removing two labels from $L_1$. The probability of this event is $\binom{k-1}{2} (\frac{1}{k})^3 (1 - \frac{1}{k})^{k-3} = \Omega(\frac{1}{k})$, which implies that the expected time is $O(k)$.

If $X \in D$ ($E$), then $X$ will be transformed into $A$ by adding label $k$ (one label from $L_1$). The probability of this event is $\frac{1}{k} (1 - \frac{1}{k})^{k-1} = \Omega(\frac{1}{k})$, which implies that the expected time is $O(k)$.

If $X \in F$, then $X$ will be transformed into $A$ by simultaneously adding label $k$ and a label from $L_1$.
The probability of this event is $\binom{k-1}{1} (\frac{1}{k})^2 (1 - \frac{1}{k})^{k-2} = \Omega(\frac{1}{k})$, which implies that the expected time is $O(k)$.

So, any solution will be transformed into $A$ in expected time $O(k)$. Combining this with the expected time to remove all redundant labels contained in a solution belonging to $A$, the expected time for the (1+1) EA to find a global optimum is $O(k \ln(k))$.

Theorem 8:
For instance $G_1$, GSEMO finds a global optimum in expected time $O(k^2 \ln(k))$.

Proof:
Let $L_1$ denote the label set $\{1, \dots, k-1\}$. We treat the optimization process as two independent phases: the first phase lasts until a solution with fitness vector $(1, \cdot)$ is included, and the second phase ends when a global optimum is found.

To analyze the expected time of the first phase, we consider the solution $X$ with fitness vector $(c(H(X)), |X|)$ where $c(H(X))$ is the minimum among all solutions in the population. If $c(H(X)) > 1$, then there are three cases: the first is that $X$ contains no label, the second is that $X$ contains label $k$ but no label from $L_1$, and the third is that $X$ contains at least one but at most $k-2$ labels from $L_1$ and no label $k$.

In all three cases, a solution with fitness vector $(1, \cdot)$ will be included in expected time $O(k^2)$, since the probability of selecting $X$ to mutate is $\Omega(\frac{1}{k})$, and the probability of transforming $X$ into a solution with fitness vector $(1, \cdot)$ is $\Omega(\frac{1}{k})$.

Once a solution with fitness vector $(1, \cdot)$ is included, we show that a global optimum will be found in expected time $O(k^2 \ln(k))$. To this end, we partition the second phase into two subphases: the first subphase lasts until a solution belonging to $A = \{X \mid x_k = 1, 2 \le |X| \le k-1\}$ is found, i.e., a solution that contains label $k$ and at least one but at most $k-2$ labels from $L_1$; the second subphase ends when a global optimum is found.

If a solution $X$ has fitness vector $(1, \cdot)$ and $X \notin A$, then two cases need to be considered: the first is that $X$ contains label $k$ and all labels from $L_1$, the other is that $X$ contains all labels from $L_1$ but no label $k$.

For the first case, removing any one of the labels from $L_1$ will transform $X$ into $A$. The probability of this event is $\binom{k-1}{1} \frac{1}{k} (1 - \frac{1}{k})^{k-1} = \Omega(1)$, which implies that the expected time is $O(1)$. For the second case, removing two labels from $L_1$ and simultaneously adding label $k$ will transform $X$ into $A$.
The probability of this event is $\binom{k-1}{2} (\frac{1}{k})^3 (1 - \frac{1}{k})^{k-3} = \Omega(\frac{1}{k})$, which implies that the expected time is $O(k)$. Combining this with the probability $\Omega(\frac{1}{k})$ of selecting $X$ to mutate, a solution belonging to $A$ will be included in expected time $O(k^2)$ after a solution with fitness vector $(1, \cdot)$ has been included.

Now that a solution $X \in A$ is included, the global optimum will be found by removing all $|X| - 2$ redundant labels from $L_1$. Once such a label is removed from $X$, it cannot be added any more. According to the Coupon Collector's theorem [35], all redundant labels will be removed within $O(k \ln(k))$ expected mutations, and the probability of selecting a solution belonging to $A$ to mutate is $\Omega(\frac{1}{k})$, so a global optimum will be found in expected time $O(k^2 \ln(k))$.

Therefore, GSEMO finds the global optimum in expected time $O(k^2 \ln(k))$.

C. An instance where the (1+1) EA and GSEMO outperform the local search algorithm with the 2-switch neighborhood
Brüggemann, Monnot, and Woeginger proposed an instance, denoted by $G_2$ in this paper, to show that there exists a local optimum with respect to the local search algorithm with the 2-switch neighborhood [30].

As shown in Figure 3, this instance is a graph $G_2 = (V, E, L)$, where $V = \{v, x_1, x_2, \dots, x_{k-2}, y_1, y_2, \dots, y_{k-2}\}$ and $L = \{1, 2, \dots, k\}$, so that $|V| = 2k - 3$ and $|L| = k$. Figure 4 shows the minimum label spanning tree.

Fig. 3. Instance $G_2$.
Fig. 4. The MLST of instance $G_2$.

In this instance, the global optimum is $X^* = (\overbrace{0, \dots, 0}^{k-2}, 1, 1)$.

The local search algorithm with the 2-switch neighborhood might be trapped in the local optimum which contains labels $1, 2, \dots, k-2$. In fact, to jump out of this local optimum, at least three labels from $\{1, 2, \dots, k-2\}$ should be removed and simultaneously the two labels $k-1$ and $k$ should be added, but the resulting solution is not in the 2-switch neighborhood of the local optimum. However, the (1+1) EA and GSEMO can efficiently find the global optimum of $G_2$.

Theorem 9:
For instance $G_2$, the (1+1) EA finds the global optimum in expected time $O(k^2)$.

Proof:
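The slowest transition in the case analysis below is a simultaneous five-bit flip (add labels $k-1$ and $k$, remove three labels of $L_1$), whose probability $\binom{k-2}{3}(\frac{1}{k})^5(1-\frac{1}{k})^{k-5}$ is of order $1/k^2$. A numerical check of this reconstruction (helper name ours):

```python
from math import comb

def p_escape_B(k):
    """Probability of adding labels k-1 and k while removing three of the
    k-2 labels of L1 in one standard-bit-mutation step (five designated
    bit flips, all other bits unchanged)."""
    return comb(k - 2, 3) * (1 / k) ** 5 * (1 - 1 / k) ** (k - 5)

# p_escape_B(k) * k^2 tends to 1/(6e) ~ 0.061, so the rate is Theta(1/k^2)
for k in [10, 20, 100, 1000]:
    assert 0.02 < p_escape_B(k) * k ** 2 < 0.1
```

Hence the expected waiting time for this transition is $O(k^2)$, which dominates the bound of the theorem.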
Let $L_1$ denote the label set $\{1, \dots, k-2\}$, and let $A = \{X \mid c(H(X)) = 1, x_{k-1} = 1, x_k = 1, 2 \le |X| \le k-1\}$, i.e., a solution $X \in A$ contains labels $k-1$ and $k$ and at most $k-3$ labels from $L_1$.

Noting that the global optimum contains only the two labels $k-1$ and $k$, we treat the optimization process as two phases: the first phase lasts until a solution $X \in A$ is constructed from an arbitrary solution, and the second phase ends when all $|X| - 2$ redundant labels from $L_1$ are removed.

To analyze the expected time of finding a solution $X \in A$, we partition all solutions that do not belong to $A$ into seven disjoint subsets $B$, $C$, $D$, $E$, $F$, $G$, $H$:

Fig. 5. Instance $G_3$ with $b = 4$.

$B = \{X \mid c(H(X)) = 1, x_{k-1} = 0, x_k = 0, |X| = k-2\}$;
$C = \{X \mid c(H(X)) = 1, x_{k-1} = 0, x_k = 1, |X| = k-2$ or $|X| = k-1\}$;
$D = \{X \mid c(H(X)) = 1, x_{k-1} = 1, x_k = 0, |X| = k-2$ or $|X| = k-1\}$;
$E = \{X \mid c(H(X)) = 1, x_{k-1} = 1, x_k = 1, |X| = k\}$;
$F = \{X \mid c(H(X)) > 1, x_{k-1} = 0, x_k = 0\}$;
$G = \{X \mid c(H(X)) > 1, x_{k-1} = 0, x_k = 1\}$;
$H = \{X \mid c(H(X)) > 1, x_{k-1} = 1, x_k = 0\}$.

If $X \in B$, then $X$ will be transformed into $A$ by adding labels $k-1$ and $k$ and simultaneously removing three labels from $L_1$. The probability of this event is $\binom{k-2}{3} (\frac{1}{k})^5 (1 - \frac{1}{k})^{k-5} = \Omega(\frac{1}{k^2})$, which implies that the expected time is $O(k^2)$.

If $X \in C$ ($D$), then $X$ will be transformed into $A$ by adding label $k-1$ ($k$) and simultaneously removing two labels from $L_1$. The probability of this event is at least $\binom{k-2}{2} (\frac{1}{k})^3 (1 - \frac{1}{k})^{k-3} = \Omega(\frac{1}{k})$, which implies that the expected time is $O(k)$.

If $X \in E$, then $X$ will be transformed into $A$ by removing a label from $L_1$. The probability of this event is $\binom{k-2}{1} \frac{1}{k} (1 - \frac{1}{k})^{k-1} = \Omega(1)$, which implies that the expected time is $O(1)$.

If $X \in F$, then $X$ will be transformed into $A$ by simultaneously adding labels $k-1$ and $k$.
The probability of this event is $(\frac{1}{k})^2 (1 - \frac{1}{k})^{k-2} = \Omega(\frac{1}{k^2})$, which implies that the expected time is $O(k^2)$.

If $X \in G$ ($H$), then $X$ will be transformed into $A$ by adding label $k-1$ ($k$). The probability of this event is $\frac{1}{k} (1 - \frac{1}{k})^{k-1} = \Omega(\frac{1}{k})$, which implies that the expected time is $O(k)$.

So, a solution belonging to $A$ will be found in expected time $O(k^2)$.

In the second phase, removing any label from $L_1$ contained in a solution belonging to $A$ reduces the fitness value, and once such a label is removed it cannot be added any more. According to the Coupon Collector's theorem [35], the second phase ends in expected time $O(k \ln(k))$.

Altogether, the expected time for the (1+1) EA to find the global optimum is $O(k^2)$.

Theorem 10:
For instance $G_2$, the expected time for GSEMO to find the global optimum is $O(k^2 \ln(k))$.

Proof:
It has been proved in Theorem 4 that the expected time for GSEMO, starting with any initial solution, to find the all-zeros bit string is $O(k^2 \ln(k))$.

Once the all-zeros bit string is included in the population, the Pareto optimal solution $X_1$ with fitness vector $(k-1, 1)$ will be found by adding label $k-1$ or $k$ to the all-zeros bit string, since the probability of selecting the all-zeros bit string to mutate is $\Omega(\frac{1}{k})$ and the probability of flipping a bit corresponding to one of these two labels is $\frac{2}{k} (1 - \frac{1}{k})^{k-1} = \Omega(\frac{1}{k})$. So, the expected time to produce $X_1$ from the all-zeros bit string is $O(k^2)$. Then the Pareto optimal solution $X_2$ with fitness vector $(1, 2)$ will be found by adding the remaining label from $\{k-1, k\}$ to solution $X_1$, and the expected time to produce solution $X_2$ from solution $X_1$ is also $O(k^2)$.

Therefore, the expected time for GSEMO to find the global optimum is $O(k^2 \ln(k))$.

D. An instance where the (1+1) EA and GSEMO outperform the modified MVCA
In this subsection, we show that the (1+1) EA and GSEMO outperform the modified MVCA on an instance proposed by Xiong, Golden, and Wasil [14], which is denoted by $G_3$ in this paper.

Given the bound $b$ ($b \ge 2$) on the labels' frequency, let $n = b \cdot b! + 1$, and construct $G_3 = (V, E, L)$ as follows, where $V = \{1, 2, \dots, n\}$, $|V| = n$, and $L = L_b \cup L_{b-1} \cup \dots \cup L_2 \cup L_{opt}$. We construct $b!$ groups from $V$, each containing $b+1$ nodes:

$V_1 = \{1, 2, \dots, b+1\}$,
$V_2 = \{b+1, b+2, \dots, 2b+1\}$,
...,
$V_j = \{(j-1)b+1, (j-1)b+2, \dots, jb+1\}$,
...,
$V_{b!} = \{(b!-1)b+1, (b!-1)b+2, \dots, b!b+1\}$.

In $V_j$ ($j = 1, 2, \dots, b!$), the edges between consecutive nodes $((j-1)b+1, (j-1)b+2), \dots, (jb, jb+1)$ are all labeled with one label. Thus, $b!$ labels are needed, which constitute the label set $L_{opt}$. The edges with these $b!$ labels constitute the minimum label spanning tree $T_{opt}$, so in this instance $OPT = b!$.

The label subset $L_h$ ($h = b, b-1, \dots, 2$) is obtained as follows. We choose edge $((j-1)b+1, (j-1)b+1+h)$ in each $V_j$, so there are $b!$ such edges. We label the first $h$ edges with one label, the next $h$ edges with a second label, etc. So, $b!/h$ labels are needed, and they constitute $L_h$. Hence, $|L_h| = b!/h$, and the total number of labels is $k = \sum_{j=2}^{b} b!/j + b!$.

Figure 5 shows an example with $b = 4$, where the dashed edges constitute the spanning tree with the minimum number of labels. In this instance, the global optimum is $X^* = (\overbrace{0, \dots, 0}^{\sum_{j=2}^{b} b!/j}, \overbrace{1, \dots, 1}^{b!})$.

Xiong, Golden, and Wasil used this instance to show that the modified MVCA may obtain the worst-case solution, using all labels from $L_b \cup L_{b-1} \cup \dots \cup L_2 \cup L_{opt}$, which is an $H_b$-approximation solution, where $H_b = \sum_{i=1}^{b} \frac{1}{i}$. Here, we show that the (1+1) EA and GSEMO can efficiently find the global optimum.

Theorem 11:
For instance $G_3$, the (1+1) EA finds the global optimum in expected time $O(nk)$, where $n = b \cdot b! + 1$, $k = \sum_{j=2}^{b} b!/j + b!$, and $b$ is the maximum frequency of the labels.

Proof:
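Before the proof, the construction of $G_3$ and the count $k = \sum_{j=2}^{b} b!/j + b!$ can be verified with a short script. This is a sketch: the constructor and its label numbering (chord labels of $L_b, \dots, L_2$ first, then the $b!$ labels of $L_{opt}$) are our choices.

```python
from math import factorial

def build_G3(b):
    """Build instance G3 as a list of (u, v, label) edges over nodes 1..b*b!+1.
    Chord labels (L_b, ..., L_2) are numbered first, the b! labels of L_opt last."""
    fb, edges, label = factorial(b), [], 1
    for h in range(b, 1, -1):            # L_h: one chord per group V_j
        for j in range(1, fb + 1):
            u = (j - 1) * b + 1
            edges.append((u, u + h, label))
            if j % h == 0:               # h consecutive chords share a label
                label += 1
    for j in range(1, fb + 1):           # L_opt: path edges inside V_j
        for u in range((j - 1) * b + 1, j * b + 1):
            edges.append((u, u + 1, label))
        label += 1
    return edges, label - 1              # edge list and total label count k

def n_components(edges, n, labels):
    """Number of connected components of the subgraph H(X) spanned by the
    given label set, via union-find over nodes 1..n."""
    parent = list(range(n + 1))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for u, v, l in edges:
        if l in labels:
            parent[find(u)] = find(v)
    return len({find(v) for v in range(1, n + 1)})
```

For $b = 3$: $n = 19$, $k = 6/2 + 6/3 + 6 = 11$, and the last $b! = 6$ labels alone connect all nodes, matching $OPT = b!$, while the chord labels alone do not.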
We treat the optimization process as two independent phases. The first phase ends when the (1+1) EA finds a solution $X$ such that $H(X)$ is a connected spanning subgraph. The second phase lasts until the (1+1) EA removes all redundant labels from $\{1, 2, \dots, \sum_{j=2}^{b} b!/j\}$.

Let $X$ be the current solution. If the number of connected components of $H(X)$ is not $1$, then there must exist a bit $x_h$ from $\{x_i \mid \sum_{j=2}^{b} b!/j + 1 \le i \le \sum_{j=2}^{b} b!/j + b!\}$ valued $0$; otherwise the number of connected components would be $1$. So, the (1+1) EA can decrease the number of connected components by at least $1$ with probability at least $\frac{1}{k} (1 - \frac{1}{k})^{k-1} \ge \frac{1}{ek}$; this is the probability of the event that bit $x_h$ is flipped from $0$ to $1$ while the other bits keep unchanged. Hence, the expected time to decrease the number of connected components from $n$ to $1$ is $O(nk)$, i.e., a connected spanning subgraph will be created by the (1+1) EA in expected time $O(nk)$.

Once a connected spanning subgraph is constructed, each bit from $\{x_i \mid \sum_{j=2}^{b} b!/j + 1 \le i \le \sum_{j=2}^{b} b!/j + b!\}$ takes value $1$, and flipping any of them cannot be accepted by the (1+1) EA, as such a flip would create a disconnected spanning subgraph. For each bit $x_i$ ($1 \le i \le \sum_{j=2}^{b} b!/j$), if $x_i = 1$, then it can be flipped from $1$ to $0$, since this decreases the fitness value; otherwise, its flipping cannot be accepted by the (1+1) EA, as this would increase the fitness value. So, when every such bit has been selected at least once to flip, the connected spanning subgraph with the minimum number of labels will be found. According to the Coupon Collector's theorem [35], the upper bound of the expected time for this to happen is $O(k \ln(k))$.

Hence, the expected time needed for the (1+1) EA to find the global optimum is $O(nk + k \ln(k)) = O(nk)$. Note that $n = b \cdot b! + 1 > k = b!(1 + \frac{1}{2} + \dots + \frac{1}{b})$, and $k > \ln(k)$.
So, $n > \ln(k)$, and $nk > k \ln(k)$.

GSEMO can also efficiently find the global optimum for $G_3$.

Theorem 12:
For instance $G_3$, GSEMO finds the global optimum in expected time $O(k^3)$.

Proof:
We treat the optimization process as two phases: in the first phase, GSEMO, starting from any initial solution, finds a solution with fitness vector $(1, \cdot)$; the second phase ends when GSEMO finds the global optimum after such a solution has been found.

Noting that a connected spanning subgraph contains all labels from $L_{opt} = \{\sum_{j=2}^{b} b!/j + 1, \dots, \sum_{j=2}^{b} b!/j + b!\}$, if $X$ is a solution such that $c(H(X)) > 1$, then at least one label from $L_{opt}$ is not contained in it, and the number of connected components can be decreased by adding such a label.

We now analyze the expected time for GSEMO, starting from any initial solution, to find a solution with fitness vector $(1, \cdot)$. If such a solution has not been included in the population, then there is a solution $X$ in $P$ such that $c(H(X))$ is minimal, and adding some label $l$ from $L_{opt}$ to $X$ will reduce the number of connected components. The probability that GSEMO chooses $X$ to mutate is $\Omega(\frac{1}{k})$, as the population size is $O(k)$, and the probability of flipping the bit corresponding to label $l$ is $\frac{1}{k} (1 - \frac{1}{k})^{k-1} = \Omega(\frac{1}{k})$, so a solution with a smaller number of connected components will be found in expected time $O(k^2)$. After all labels from $L_{opt}$ have been added, a connected spanning subgraph is constructed. Thus a solution with fitness vector $(1, \cdot)$ will be included in expected time $O(b! \cdot k^2) = O(k^3)$.

Once a solution with fitness vector $(1, \cdot)$ is included, GSEMO can finish the second phase by removing all redundant labels from $L_b \cup L_{b-1} \cup \dots \cup L_2$ one by one. Since the probability of selecting the solution with fitness vector $(1, \cdot)$ to mutate is $\Omega(\frac{1}{k})$, and removing all redundant labels needs $O(k \ln(k))$ expected mutations, the global optimum will be found in expected time $O(k^2 \ln(k))$.

Combining the expected times of the two phases, we finish the proof.

TABLE II
UPPER BOUNDS ON THE EXPECTED TIMES FOR THE (1+1) EA AND GSEMO TO FIND THE GLOBAL OPTIMA ON FOUR INSTANCES. '—' MEANS UNKNOWN.

              | Instance $G'$      | Instance $G_1$     | Instance $G_2$     | Instance $G_3$
The (1+1) EA  | —                  | $O(k \ln(k))$      | $O(k^2)$           | $O(nk)$
GSEMO         | $O(k^2 \ln(k))$    | $O(k^2 \ln(k))$    | $O(k^2 \ln(k))$    | $O(k^3)$

Table II summarizes the upper bounds on the expected times for the (1+1) EA and GSEMO to find the global optima on all four instances. On instances $G_1$, $G_2$, and $G_3$, GSEMO needs times of higher order than the (1+1) EA. The main reason is that GSEMO selects a promising solution to mutate from a population of size $O(k)$. On $G'$, GSEMO outperforms the (1+1) EA because GSEMO behaves greedily and optimizes solutions with different numbers of labels.

V. CONCLUSION
In this paper, we investigated the performances of the (1+1) EA and GSEMO on the minimum label spanning tree problem. We found that the (1+1) EA and GSEMO guarantee certain approximation ratios. We further showed that the (1+1) EA and GSEMO defeat local search algorithms on some instances, and that GSEMO outperforms the (1+1) EA on an instance.

The approximation ratio of the (1+1) EA on the general MLST problem remains unknown. Apart from this, since the (1+1) EA and GSEMO are randomized algorithms, it is natural to ask whether they can achieve better approximation ratios than those guaranteed by some greedy algorithms. From our analysis, especially the analysis of GSEMO, it seems possible.
Acknowledgement:
This work was supported in part by the National Natural Science Foundation of China (61170081, 61165003, 61300044, 61332002), in part by the EPSRC under Grant EP/I009809/1, in part by the National High-Technology Research and Development Program (863 Program) of China under Grant 2013AA01A212, and in part by the NSFC Fund for Distinguished Young Scholars under Grant 61125205.

REFERENCES
[1] R.-S. Chang, S.-J. Leu. The minimum labeling spanning trees. Information Processing Letters, 1997, 63: 277-282.
[2] J. Holland. Adaptation in Natural and Artificial Systems. The University of Michigan Press, 1975.
[3] D. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, New York, 1989.
[4] F. Herrera, M. Lozano, J. Verdegay. Tackling real-coded genetic algorithms: Operators and tools for behavioural analysis. Artificial Intelligence Review, 1998, 12(4): 265-319.
[5] K. Gallagher, M. Sambridge. Genetic algorithms: A powerful tool for large-scale non-linear optimization problems. Computers & Geosciences, 1994, 20(7-8): 1229-1236.
[6] Y. Xiong, B. Golden, E. Wasil. A one-parameter genetic algorithm for the minimum labeling spanning tree problem. IEEE Transactions on Evolutionary Computation, 2005, 9(1): 55-60.
[7] J. Nummela, B. Julstrom. An effective genetic algorithm for the minimum-label spanning tree problem. In Proceedings of the Genetic and Evolutionary Computation Conference, 2006, pp. 553-558.
[8] S. Consoli, K. Darby-Dowman, N. Mladenović, J. Moreno-Pérez. Solving the minimum labelling spanning tree problem using hybrid local search. Technical report, 2007.
[9] A. Chwatal, G. Raidl. Solving the minimum label spanning tree problem by mathematical programming techniques. Technical Report TR 186-1-10-03, Vienna University of Technology, Institute of Computer Graphics and Algorithms, June 2010.
[10] R. Cerulli, A. Fink, M. Gentili, S. Voß. Metaheuristics comparison for the minimum labelling spanning tree problem. Operations Research/Computer Science Interfaces Series, 2005, 29: 93-106.
[11] S. Consoli, K. Darby-Dowman, N. Mladenović, J. Moreno-Pérez. Greedy randomized adaptive search and variable neighbourhood search for the minimum labelling spanning tree problem. European Journal of Operational Research, 2009, 196: 440-449.
[12] S. O. Krumke, H. Wirth. On the minimum label spanning tree problem. Information Processing Letters, 1998, 66(2): 81-85.
[13] Y. Wan, G. Chen, Y. Xu. A note on the minimum label spanning tree. Information Processing Letters, 2002, 84: 99-101.
[14] Y. Xiong, B. Golden, E. Wasil. Worst-case behavior of the MVCA heuristic for the minimum labeling spanning tree problem. Operations Research Letters, 2005, 33: 77-80.
[15] T. Jansen, I. Wegener. Evolutionary algorithms: How to cope with plateaus of constant fitness and when to reject strings of the same fitness. IEEE Transactions on Evolutionary Computation, 2001, 5(6): 589-599.
[16] J. He, X. Yao. Drift analysis and average time complexity of evolutionary algorithms. Artificial Intelligence, 2001, 127(1): 57-85.
[17] S. Droste, T. Jansen, I. Wegener. On the analysis of the (1+1) evolutionary algorithm. Theoretical Computer Science, 2002, 276(1-2): 51-81.
[18] J. He, X. Yao. Towards an analytic framework for analysing the computation time of evolutionary algorithms. Artificial Intelligence, 2003, 145: 59-97.
[19] F. Neumann, J. Reichel, M. Skutella. Computing minimum cuts by randomized search heuristics. Algorithmica, 2011, 59: 323-342.
[20] Y. Zhou, J. He, Q. Nie. A comparative runtime analysis of heuristic algorithms for satisfiability problems. Artificial Intelligence, 2009, 173: 240-257.
[21] F. Neumann, I. Wegener. Randomized local search, evolutionary algorithms, and the minimum spanning tree problem. Theoretical Computer Science, 2007, 378: 32-40.
[22] F. Neumann. Expected runtimes of evolutionary algorithms for the Eulerian cycle problem. Computers & Operations Research, 2008, 35: 2750-2759.
[23] A. Sutton, F. Neumann. A parameterized runtime analysis of evolutionary algorithms for the Euclidean traveling salesperson problem. In Proceedings of the 26th AAAI Conference on Artificial Intelligence, 2012, pp. 1105-1111.