Solving the Large Scale Next Release Problem with a Backbone Based Multilevel Algorithm


IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID


Jifeng Xuan, He Jiang, Member, IEEE, Zhilei Ren, and Zhongxuan Luo

Abstract —The Next Release Problem (NRP) aims to optimize customer profits and requirements selection for the software releases. The research on the NRP is restricted by the growing scale of requirements. In this paper, we propose a Backbone based Multilevel Algorithm (BMA) to address the large scale NRP. In contrast to direct solving approaches, BMA employs multilevel reductions to downgrade the problem scale and multilevel refinements to construct the final optimal set of customers. In both reductions and refinements, the backbone is built to fix the common part of the optimal customers. Since it is intractable to extract the backbone in practice, the approximate backbone is employed for the instance reduction while the soft backbone is proposed to augment the backbone application. In the experiments, to cope with the lack of open large requirements databases, we propose a method to extract instances from open bug repositories. Experimental results on 15 classic instances and 24 realistic instances demonstrate that BMA can achieve better solutions on the large scale NRP instances than direct solving approaches. Our work provides a reduction approach for solving large scale problems in search based requirements engineering.

Index Terms: the next release problem, backbone, soft backbone, multilevel algorithm, requirements instance generation, search based requirements engineering.

1 INTRODUCTION

For a large software project, determining the requirements assignment in the next release is an important problem in the requirements phase [12]. The customers of the software wish to purchase the products suitable for their needs while the software company wishes to select optimal requirements to maximize commercial profits [48]. Due to the complexity of customer requests and product requirements, decisions for software releases frequently conflict with efforts to maximize the profits of the project [10]. To maximize the profits of a software project, an ideal approach is to implement all the requirements to satisfy each potential customer. Limited by the software costs (e.g., the budget or the development time), only a subset of these requirements can be selected in the next release [48]. From the perspective of the software company, the goal of the next release is to select the optimal requirements to maximize the customer profits.

However, two factors restrict the development of requirements selection: the problem scale and the requirements dependency.

On one hand, when facing large scale requirements management, it is time-consuming to optimize customer profits [49]. The growth in scale has been listed as one of the nine research hotspots in the future of requirements engineering [10]. Some studies have focused on the large scale requirements analysis. For example, 1000 requirements are provided for the experiments on elicitation and triage in the SugarCRM project [9], [14]; 2400 market requirements and 1100 business requirements are handled for the next release in the Baan software framework [48]. Although some optimization technologies are introduced to balance the customer profits and requirements costs, such as the cost-value approaches [32], the linguistic-engineering approaches [48], [49], and the search based approaches [5], [57], it is still a challenge to select an optimal decision for large scale requirements problems [67].

On the other hand, the requirements dependencies increase the complexity of requirements optimization. In the modern incremental software development process, newly arriving requirements may build joint functions with previous and associated requirements [50]. An industry survey shows that about 80% of requirements are constrained by dependencies, which significantly complicate the decision for the software releases [8].

In this paper, we address large scale requirements selection with the Next Release Problem (NRP), which is proposed to model the decision for customer profits and requirements costs [4]. The NRP seeks to maximize customer profits from a set of dependent requirements, under the constraint of a predefined budget bound. Assisted by the NRP, a requirements engineer can make a decision for software requirements to balance the profits of the company and the customers. As a combinatorial optimization problem, the NRP has been proved to be "$\mathcal{NP}$-hard even when it is basic and customer requirements are independent" [4], i.e., unless $\mathcal{P} = \mathcal{NP}$, there exists no exact algorithm to select the optimal set of customers to maximize the profits in polynomial time [18]. In practice, especially for a large scale problem, it is hard to exactly optimize the decision of the NRP. Thus, it is straightforward to design approximate algorithms to generate near-optimal decisions within polynomial time. Many search-based approaches have been proposed to approximately solve the NRP and its variant problems, including greedy algorithms [4], [25], greedy randomized adaptive search procedures [4], local searches (e.g., hill climbing or simulated annealing) [4], [5], genetic algorithms [13], [57], ant colony optimizations [31], [13], etc. A few of these approaches (e.g., a simulated annealing [7] and a genetic algorithm [52]) can work effectively on the small scale NRP. However, these approaches for the small scale problems cannot be directly applied to the large scale problems. For example, Natt och Dag et al. [48] show the hardness of large scale requirements management by analyzing the relationship between customer requests and product requirements; Svahnberg et al. discuss the growing complexity as the problem scale increases [59]. For the large scale NRP, it is necessary to design an effective algorithm to cope with the increasing problem scale.

J. Xuan and Z. Ren are with the School of Software, Dalian University of Technology, Dalian 116621, China. E-mail: {xuan, ren}@mail.dlut.edu.cn.
H. Jiang is with the School of Software, Dalian University of Technology, Dalian 116621, China. E-mail: [email protected].
Z. Luo is with the School of Mathematical Sciences, Dalian University of Technology, Dalian 116621, China. E-mail: [email protected].

In this paper, we propose a Backbone based Multilevel Algorithm (BMA) to iteratively solve the large scale NRP. In contrast to direct solving approaches, BMA iteratively downgrades the scale of the problem by fixing the backbone, which can be approximately viewed as the common part of customers with optimal requirements. The backbone has been one of the effective tools in large scale combinatorial optimization in recent years [58], [64], [30]. In our work, two kinds of backbones are employed for the NRP, namely the approximate backbone (the common part of customers from a given number of local optimal solutions) and the soft backbone (the customers who add zero cost to the requirements selection). Based on the backbone, we can break a large scale problem down to small ones and refine the solution to the original problem.
To face the lack of open large requirements databases, we propose a method to mine the NRP instances from open bug repositories. Knowledge from bug repositories is extracted to generate the information of requirements. For example, we map a bug report and a user in open bug repositories to a requirement and a customer in requirements engineering. Based on our method, we can generate realistic NRP instances from open bug repositories.

In our work, first, we give the definitions of the NRP model and illustrate the NRP with an example. Then, we define the NRP backbone and propose the instance reduction approach for the NRP. In the implementation of the backbone, we use the approximate backbone to replace the backbone and present the similarity between the approximate backbone and the backbone via the fitness landscape analysis; to augment the application of the approximate backbone, we propose the new concept of the soft backbone. Next, we employ the approximate backbone and the soft backbone to design BMA, which employs a multilevel strategy to enhance the instance reduction. The framework of BMA includes three phases, namely reduction, solving, and refinement. Finally, experiments are conducted on 15 instances generated from the classic literature and 24 new instances extracted from open bug repositories. Experimental results show that our BMA can achieve better solutions than direct solving algorithms on large scale NRP instances.

Footnote: Throughout this paper, the large scale NRP can be viewed as the NRP in large scale requirements management, which is to balance the customer profits and requirements costs in large scale software projects.

Footnote: An instance is a detailed model generated by specifying particular values for all the parameters of a problem [45].

The primary contributions of this paper are as follows:

1. We present a new algorithm, BMA, to reduce the problem scale of the NRP. In this algorithm, we show how to incorporate the backbone into an approximate algorithm for solving large scale problems. To our knowledge, this is the first application of the backbone in requirements engineering.

2. We propose the soft backbone to augment the existing concept of the backbone in both software engineering and combinatorial optimization. In our work, the soft backbone is directly obtained from the instance after the instance reduction by fixing the selected near-optimal customers.

3. We generate new NRP instances from bug repositories of three open source software projects. The bug repositories are mined to cope with the lack of open requirements repositories. This method of mining new instances can provide realistic instances for the empirical research.

4. We experimentally evaluate BMA and two other existing algorithms for large scale NRP instances. Numerous experimental results on both solution quality and running time are shown to present the performance of these algorithms.

This paper is an extension of our previous work presented at the Search Based Software Engineering (SBSE) Track at the 12th Annual Conference on Genetic and Evolutionary Computation (GECCO '10) [28]. In this extension, we add the new concept of the soft backbone, the new method for instance generation, and numerous experimental results.

The remainder of this paper is organized as follows. Section 2 states the definitions of the NRP. Section 3 shows the NRP backbone and the instance reduction. In Section 4, we propose BMA for the NRP. Section 5 presents the experiments and results. Section 6 gives the threats to validity in our work. Section 7 lists the related work. Section 8 briefly concludes this paper and presents our future work.

2 PROBLEM DEFINITIONS

In this section, we present the related definitions of the NRP and illustrate the NRP with an example instance.

The NRP can be described by the following scenario [4]: in the requirements analysis phase of a software project, a necessary step is to select adequate requirements in the next release to achieve maximized commercial profits within a limited cost. Each customer requests a fraction of candidate requirements and provides a potential commercial profit for the software company. In a real-world project, the dependencies among candidate requirements restrict the selection of customer profits. The NRP aims to determine a subset of customers to achieve maximum profits under a predefined budget bound.


According to this application scenario, we give the formal definitions of the NRP as follows.

In a software project, let $R$ be the set of all the candidate requirements and let the cardinality of $R$ be $|R| = m$. Each requirement $r_j \in R$ ($1 \le j \le m$) is associated with a nonnegative cost $c_j \in C$. A directed acyclic graph $G = (R, E)$ denotes the dependencies among these requirements, where $R$ is the set of vertexes and $E$ is the set of arcs. In the dependency graph $G$, an arc $(r', r) \in E$ indicates that the requirement $r$ depends on $r'$, i.e., if $r$ is implemented in the next release, $r'$ must be implemented as well to satisfy the dependency. The requirement $r$ is called the child requirement of $r'$. Let $\text{parents}(r)$ be the set of requirements that can reach $r$ via one or more arcs. More formally, $\text{parents}(r) = \{r' \in R \mid (r', r) \in E\} \cup \bigcup_{(r'', r) \in E} \text{parents}(r'')$. All the requirements in $\text{parents}(r)$ must be implemented to ensure the implementation of $r$.

Let $S$ be all the customers related to the requirements $R$ and $|S| = n$. Each customer $s_i \in S$ ($1 \le i \le n$) requests a set of requirements $R_i \subseteq R$. Let $w_i \in W$ be the profit gained from the customer $s_i$. Let $\text{parents}(R_i) = \bigcup_{r \in R_i} \text{parents}(r)$. For a given customer $s_i$, let the set of total requirements requested by $s_i$ be $\hat{R}_i = R_i \cup \text{parents}(R_i)$. Under the above definitions, a customer $s_i$ can be satisfied by the software release decision if and only if all the requirements in $\hat{R}_i$ are implemented in the next release. Let the cost for satisfying the customer $s_i$ be $\text{cost}(\hat{R}_i) = \sum_{r_j \in \hat{R}_i} c_j$.

A subset of customers $S' \subseteq S$ can be viewed as a solution. To facilitate the following discussion, we also formulate a solution as a set of ordered pairs, i.e., the solution is denoted as $X = \{(i, 1) \mid s_i \in S'\} \cup \{(i, 0) \mid s_i \notin S'\}$. It is easy to convert the forms $S'$ and $X$ into each other. Let the cost of a solution $X$ be $\text{cost}(X) = \text{cost}(\bigcup_{(i,1) \in X} \hat{R}_i)$ and the objective function of $X$ (i.e., the profit of $X$) be $\omega(X) = \sum_{(i,1) \in X} w_i$.

Definition 1.

The next release problem (NRP).

Given a directed acyclic requirements dependency graph $G = (R, E)$, each customer $s_i \in S$ directly requests a set of requirements $R_i \subseteq R$. The profit of $s_i$ is $w_i \in W$ and the cost of requirement $r_j \in R$ is $c_j \in C$. A predefined budget bound is $b$. The goal of the NRP is to find an optimal solution $X^*$, to maximize $\omega(X)$, subject to $\text{cost}(X) \le b$.

For an NRP instance, the scale is the number of customers $n$. To simplify the statements, all the values of an NRP instance are integers unless otherwise specified. For a real-world application, it is easy to convert a non-integer NRP instance into an integer-only instance by multiplying all the values by the same factor.

An NRP instance with 7 customers and 8 requirements is illustrated as follows. The requirements are extracted from a communication company project, which is introduced in [50]. Table 1 shows the description of these 8 requirements and the dependencies. In Fig. 1, we present the dependency graph and the requirements requested by customers, where the arrows from top to bottom indicate the dependencies and the lines indicate customer requests. For the requirements set $R = \{r_1, \ldots, r_8\}$, the costs of these requirements are listed in Table 1; for the customer set $S = \{s_1, \ldots, s_7\}$, each customer $s_i$ has a profit $w_i$. Taking one customer $s_i$ as an example, Fig. 1 gives the total requirements $\hat{R}_i$ requested by $s_i$, Table 1 gives the cost $\text{cost}(\hat{R}_i)$ for satisfying these requirements, and the profit of $s_i$ is $w_i$. Given a predefined budget bound $b$, one feasible solution has a profit of 14 and a cost of 20, and another has a profit of 15 and a cost of 25. Obviously, the latter is the better solution. However, a third solution is infeasible since its cost of 29 exceeds the bound $b$.

From the definition of the NRP, the requirements requested by a customer are calculated from the dependency graph of requirements [4]. If we directly input the requirements requests for each customer, Definition 1 can be simplified [5], [69].

Definition 2.

The Simplified NRP.

Given a set of requirements $R$ and a set of customers $S$, each requirement $r_j \in R$ ($1 \le j \le m$) has a cost $c_j \in C$ and each customer $s_i \in S$ ($1 \le i \le n$) has a profit $w_i \in W$. A request $q_{ij} \in Q$ shows whether a customer $s_i$ requests a requirement $r_j$ in the next release, i.e., $q_{ij} = 1$ denotes that $s_i$ requests $r_j$ and $q_{ij} = 0$ denotes that it does not. Given a solution $X$, the requirements for $X$ are $R(X) = \{r_j \mid q_{ij} = 1, (i, 1) \in X\}$. A predefined budget bound is $b$. The goal of the NRP is to find an optimal solution $X^*$, to maximize $\omega(X) = \sum_{(i,1) \in X} w_i$, subject to $\text{cost}(X) = \sum_{r_j \in R(X)} c_j \le b$.

Based on the definitions, each NRP instance can be directly converted into a Simplified NRP instance. The dependencies among requirements are included in the requirements requests $Q$. To simplify the following statements, a Simplified NRP instance is called an NRP instance for short. We denote an NRP instance as $\Pi$.

TABLE 1
REQUIREMENTS AND DEPENDENCIES

Requirement   Description                            Cost   Arc
$r_1$         Expanding memory on BTS controller     2
$r_2$         BTS variant                            5
$r_3$         Market entry feature 1                 4
$r_4$         Market entry feature 2                 3
$r_5$         Market entry feature 3                 8
$r_6$         Next generation BTS                    1
$r_7$         Pole mount packaging                   5
$r_8$         Software quality initiative            2

Fig. 1. Requirements dependencies and customer requests.
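The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`parents`, `total_requests`, `cost`, `profit`, `is_feasible`), the dictionary encoding, and the four-requirement toy instance are all assumptions made for this example and are not the instance of Table 1 and Fig. 1. The sketch closes each customer's direct requests under the dependency graph (the conversion from Definition 1 to Definition 2) and then evaluates the cost and profit of a customer subset.

```python
from typing import Dict, Iterable, Set


def parents(req: str, arcs: Dict[str, Set[str]]) -> Set[str]:
    """All requirements that `req` depends on, directly or transitively.

    `arcs[r]` holds the direct prerequisites of r (hypothetical encoding).
    """
    seen: Set[str] = set()
    stack = list(arcs.get(req, ()))
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(arcs.get(r, ()))
    return seen


def total_requests(direct: Dict[str, Set[str]],
                   arcs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Close every customer's direct requests under the dependencies."""
    closed: Dict[str, Set[str]] = {}
    for customer, reqs in direct.items():
        full = set(reqs)
        for r in reqs:
            full |= parents(r, arcs)
        closed[customer] = full
    return closed


def cost(selected: Iterable[str], requests: Dict[str, Set[str]],
         costs: Dict[str, int]) -> int:
    """Cost of a solution: each needed requirement is paid for once."""
    needed: Set[str] = set()
    for customer in selected:
        needed |= requests[customer]
    return sum(costs[r] for r in needed)


def profit(selected: Iterable[str], profits: Dict[str, int]) -> int:
    """Objective value: sum of the profits of the selected customers."""
    return sum(profits[c] for c in selected)


def is_feasible(selected: Iterable[str], requests: Dict[str, Set[str]],
                costs: Dict[str, int], budget: int) -> bool:
    return cost(selected, requests, costs) <= budget


# Hypothetical toy instance (not the one from Table 1).
costs = {"r1": 2, "r2": 3, "r3": 4, "r4": 1}
arcs = {"r3": {"r1"}, "r4": {"r3"}}  # r3 depends on r1; r4 depends on r3
direct = {"s1": {"r3"}, "s2": {"r2"}, "s3": {"r4"}}
profits = {"s1": 5, "s2": 4, "s3": 6}
budget = 8

requests = total_requests(direct, arcs)
print(requests["s3"])                                   # r4 plus r3 and r1
print(cost({"s1", "s3"}, requests, costs))              # shared r1, r3 counted once
print(is_feasible({"s1", "s2", "s3"}, requests, costs, budget))
```

Because the requirements shared by several selected customers are paid for only once, the cost of a solution is generally smaller than the sum of the per-customer costs, which is what makes the selection problem combinatorial.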


3 BACKBONE BASED INSTANCE REDUCTION

In this paper, for the large scale NRP, our basic idea is to reduce the scale of an NRP instance to get a small instance, which is easy to solve. In this section, we will present the backbone based instance reduction and the substitutes for the backbone, namely the approximate backbone and the soft backbone.

The backbone is a useful tool for algorithm design in constraint solving and combinatorial optimization [33]. In an algorithm, the backbone is viewed as an ideal structure to model the common characteristics of the optimal solutions [30]. On one hand, if the backbone is ideally obtained, the optimal solutions to an instance can be partly constructed; on the other hand, it is usually intractable to obtain the backbone within polynomial time [58]. In practice, the approximate backbone is usually employed instead. Backbone based algorithms have been shown to be effective on some classic problems, such as the Maximum SATisfiability (Max-SAT) [64], the Travelling Salesman Problem (TSP) [33], and the Quadratic Assignment Problem (QAP) [30].

If we consider searching for a solution as finding a key part in a physical body, the backbone can be informally viewed as the common part of the global optimal solutions. A global optimal solution (called an optimal solution for short in the rest of this paper) is defined as the best solution in the whole search space while a local optimal solution is the best solution in a local part of the search space with respect to a given algorithm [45]. We define the NRP backbone in Definition 3.

Definition 3.

The NRP backbone.

Given an NRP instance $\Pi$, let $\Gamma^* = \{X^*_1, X^*_2, \ldots, X^*_k\}$ be the set of all the optimal solutions to $\Pi$. The backbone of $\Pi$ is defined as $\text{bone}(\Pi) = X^*_1 \cap X^*_2 \cap \cdots \cap X^*_k$. The backbone scale of $\Pi$ is $|\text{bone}(\Pi)|$.

Based on Definition 3, the NRP backbone contains the common characteristics of the optimal solutions. Given an NRP instance, we can reduce the instance scale by fixing its backbone.

Definition 4.

The NRP instance reduction.

Given an NRP instance $\Pi$ and its backbone $\text{bone}(\Pi)$, an instance reduction is a process to generate a new and small scale instance $\Pi'$, which is easy to solve. Meanwhile, the backbone and a solution to the new instance can be used to form the solution to the original instance.

A new instance can be constructed by removing the customers and the requirements of the backbone from the original instance. We list the parameter values for the variables of the new instance in Table 2. For an instance $\Pi$, the customers of its backbone are $S_{bone} = \{s_i \mid (i, 1) \in \text{bone}(\Pi) \text{ or } (i, 0) \in \text{bone}(\Pi)\}$ while the requirements selected in $\text{bone}(\Pi)$ are $R_{bone} = \{r_j \mid q_{ij} = 1, (i, 1) \in \text{bone}(\Pi)\}$. The request set $Q_{bone} = \{q_{ij} \mid s_i \in S_{bone} \text{ or } r_j \in R_{bone}\}$ indicates the requirements requests of the backbone. In the new instance $\Pi'$, the customers in $S_{bone}$ and the requirements in $R_{bone}$ are no longer needed. Thus, $S'$, $R'$, and $Q'$ in $\Pi'$ are generated by removing the elements in $S_{bone}$, $R_{bone}$, and $Q_{bone}$ from $S$, $R$, and $Q$, respectively. To build the budget bound $b'$ for $\Pi'$, we remove the cost of $R_{bone}$, i.e., $b' = b - \sum_{r_j \in R_{bone}} c_j$. The profits $W'$ and the costs $C'$ are the subsets related to $S'$ and $R'$. We can validate that each optimal solution to $\Pi$ can be constructed from an optimal solution to $\Pi'$ and the backbone $\text{bone}(\Pi)$. According to Table 2, each parameter value of $\Pi'$ can be obtained within polynomial time. Based on these parameter values, the new instance $\Pi'$ can be uniquely determined.

However, the NRP backbone is obtained from the optimal solutions, which cannot be obtained within polynomial time. Thus, there exists no polynomial time algorithm to find the NRP backbone. In this paper, we use the approximate backbone and the soft backbone to replace the backbone.

The approximate backbone is the set of common customers of a given number of local optimal solutions while the soft backbone is the set of optimal customers with no cost for the given instance. In Fig. 2, we summarize the relationship among the backbone, the approximate backbone, and the soft backbone. The approximate backbone is generated as the common part of a group of local optimal solutions; the soft backbone is directly extracted from the current instance.
The union of the approximate backbone and the soft backbone is called the combined backbone. We employ the combined backbone to build the near-optimal solutions for the NRP. In Section 3.2 and Section 3.3, we will present more details about the approximate backbone and the soft backbone, respectively.

TABLE 2
PARAMETER VALUES FOR THE INSTANCE REDUCTION
(rows: Instance, Backbone, New instance)

Fig. 2. The relationship among the backbone, the approximate backbone, and the soft backbone.

Since no polynomial time algorithm exists to exactly obtain the NRP backbone, we follow the existing work to replace the backbone with the approximate backbone, which is generated from a set of local optimal solutions [64]. In this section, we will present the relationship between the optimal solutions and local optimal solutions using the fitness landscape analysis. We show that the approximate backbone can partly reflect the characteristics of the backbone.

In contrast to the backbone, an approximate backbone of an NRP instance is defined as the common part of local optimal solutions. In real-world applications, a local optimal solution is generated within polynomial time by a local search algorithm, which is also called a local search operator when it is incorporated into another algorithm [45]. We give Definition 5 to describe the approximate backbone.

Definition 5.

The NRP approximate backbone.

Given an NRP instance $\Pi$, let $\Gamma = \{X_1, X_2, \ldots, X_k\}$ be a set of local optimal solutions to $\Pi$. The approximate backbone of $\Pi$ is defined as $\text{a-bone}(\Pi) = X_1 \cap X_2 \cap \cdots \cap X_k$.

We employ the fitness landscape analysis [44] to investigate the relationship between the backbone and the approximate backbone. The fitness landscape analysis is an important technology to understand the behavior of combinatorial optimization algorithms [44]. For large scale optimization, the fitness degree and the solution distribution in the fitness landscape are measured to guide the design of algorithms [64]. To analyze the fitness landscape between the backbone and the approximate backbone, we evaluate the differences between the optimal solutions and local optimal solutions by the distances of these two kinds of solutions. The distance is usually measured as the Hamming distance [44], [54]. For an NRP instance with scale $n$, the Hamming distance between a solution $X$ and an optimal solution $X^*$ is $d(X, X^*) = n - |X \cap X^*|$. Thus, the normalized Hamming distance is $d(X, X^*) / n$. To evaluate the difference of profits between these two solutions, we define the normalized profit difference as $(\omega(X^*) - \omega(X)) / \omega(X^*)$. In practice, the optimal solutions to large scale instances are hard to obtain. Therefore, we follow the existing fitness landscape analysis approaches to replace the optimal solutions with the best known solutions [44], [54]. Note that the measure of the profit difference in our work is a little different from that in some existing work on the fitness landscape (e.g., [42], [43]). In our work, we use the normalized profit differences between solutions to evaluate the relationship between the backbone and the approximate backbone while the existing work uses the fitness values of solutions to evaluate the fitness degree.

In Fig. 3, we present the fitness landscape of five classic NRP instances. The scales of these instances are 100, 500, 500, 750, and 1000, respectively (see Section 5.2 for the details of these instances).
In the fitness landscape analysis, we employ four algorithms to indicate the similarity between local optimal solutions and the optimal solutions, including Randomized Search (RS), First Found Hill Climbing (FFHC), Sampled Hill Climbing (SHC), and Simulated Annealing (SA). RS is a randomized algorithm, which randomly generates a solution. Due to the budget bound of the NRP, a solution may be infeasible; RS repairs these infeasible solutions by randomly removing a couple of selected customers. FFHC and SHC are two kinds of hill climbing algorithms proposed in [4]. As their names suggest, FFHC updates its solution with the first improved solution while SHC updates its solution with the best solution among a certain number of sampled solutions. SA (also called LMSA in [4]) is an extension of a non-linear simulated annealing algorithm, which combines the hill climbing with an acceptance temperature [4], [5]. Among these four algorithms, SA usually obtains the best solutions for the NRP [4]. In the experiments, each of these four algorithms is independently run 100 times and each fitness landscape of an algorithm consists of 100 local optimal solutions.

Footnote: To obtain a best known solution of an NRP instance, RS (in Section 3.2) and BMA (in Section 4.2) have been performed repeatedly ( times for RS and 200 times for BMA, respectively) and the best solution is selected.

Fig. 3. Landscape of five classic NRP instances with four algorithms (RS, FFHC, SHC, and SA): (a) instance nrp-1-0.5 with 100 customers; (b) instance nrp-2-0.5 with 500 customers; (c) instance nrp-3-0.5 with 500 customers; (d) instance nrp-4-0.5 with 750 customers; (e) instance nrp-5-0.5 with 1000 customers. For each instance in a sub-figure, the x-axis is the normalized Hamming distance from a local optimal solution to the optimal solution and the y-axis is the normalized profit difference of these two solutions. In each sub-figure, the point $(0, 0)$ denotes the optimal solution. A solution in the bottom-left corner is more similar to the optimal solution than one in the top-right corner.

As shown in Fig. 3, an algorithm with small distances between solutions can provide small profit differences. For FFHC, SHC, or SA, the distances between solutions are from 0.2 to 0.6 of the instance scale while the profit differences between solutions are from 0.2 to 0.7 of the profits of the best known solutions. In general, when measuring the distances for these five instances, SA is better than SHC and SHC is better than FFHC. On most of the five instances, SA is the best algorithm, which leads to both small distances and small profit differences. The exception is nrp-1-0.5, the instance with the smallest scale. On nrp-1-0.5, the landscape of SHC covers the landscape of SA. This fact is primarily due to the small scale of nrp-1-0.5, i.e., both SA and SHC can generate good solutions. Moreover, for all the instances, the distances by SA are less than 0.45 of the instance scales. Compared with FFHC and SHC, both the solution distances and profit differences by SA are stable. The other algorithm, RS, only provides large distances around 0.6 of the instance scales.

The fitness landscape analysis indicates that there is an overlap between local optimal solutions and the optimal solutions. Thus, local optimal solutions can partly represent the characteristics of the optimal solutions. Meanwhile, among these four algorithms, an algorithm with high performance leads to small solution distances.
Thus, a local optimal solution by a high-performance algorithm can do well in showing the characteristics of the optimal solutions. In summary of the fitness landscape analysis, the NRP backbone can be replaced by the approximate backbone, which is the intersection of local optimal solutions obtained by a good local search algorithm.

The instance reduction in Definition 4 can be applied to the approximate backbone. Since local optimal solutions are obtained in polynomial time, the approximate backbone based instance reduction can be conducted within polynomial time.
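The approximate backbone and the two fitness landscape measures can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: a solution is encoded as a 0/1 sequence indexed by customer (equivalent to the ordered-pair form), the function names and the five-customer data are hypothetical, and the profit difference is normalized by the profit of the best known solution, which is how the measure is read here.

```python
from typing import Dict, List, Sequence

Solution = Sequence[int]  # x[i] == 1 iff customer i is selected


def hamming(x: Solution, y: Solution) -> int:
    """Number of customers on which two solutions disagree."""
    return sum(a != b for a, b in zip(x, y))


def normalized_hamming(x: Solution, best: Solution) -> float:
    """Hamming distance divided by the instance scale (number of customers)."""
    return hamming(x, best) / len(best)


def total_profit(x: Solution, profits: Sequence[int]) -> int:
    return sum(w for w, selected in zip(profits, x) if selected)


def normalized_profit_difference(x: Solution, best: Solution,
                                 profits: Sequence[int]) -> float:
    """Profit gap to the best known solution, relative to its profit."""
    w_best = total_profit(best, profits)
    return (w_best - total_profit(x, profits)) / w_best


def approximate_backbone(local_optima: List[Solution]) -> Dict[int, int]:
    """Customers whose status (selected or not) is the same in every
    local optimal solution, mapped to that common status."""
    first = local_optima[0]
    return {i: v for i, v in enumerate(first)
            if all(sol[i] == v for sol in local_optima)}


# Hypothetical data: three local optima over five customers.
local_optima = [(1, 0, 1, 1, 0), (1, 0, 0, 1, 0), (1, 1, 1, 1, 0)]
profits = [5, 4, 6, 3, 2]
best_known = (1, 0, 1, 1, 0)
candidate = (1, 1, 0, 1, 0)

print(approximate_backbone(local_optima))            # customers 0, 3 fixed to 1; 4 fixed to 0
print(normalized_hamming(candidate, best_known))     # 2 of 5 customers differ
print(normalized_profit_difference(candidate, best_known, profits))
```

Note that the intersection keeps agreement on unselected customers as well as on selected ones, so fixing the approximate backbone can remove customers from further consideration in either direction.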

In this section, apart from the approximate backbone, we propose the soft backbone to augment the application of the backbone. Given an approximate backbone, we can generate a new and small instance after the instance reduction. The new instance can also be viewed as an NRP instance, which can be solved by an existing algorithm. However, there is one key difference between the new instance and the original one. Given an original NRP instance $\Pi$, for each customer $s_i \in S$, $R_i \neq \emptyset$, i.e., $s_i$ requests one or more requirements to be implemented in the next release. Thus, the cost of requirements requested by each customer is more than zero. However, for the new NRP instance $\Pi'$ after an instance reduction, it is possible to find a customer $s_i$ such that $R_i = \emptyset$. In other words, there may exist a customer whose requirements have been wholly reduced in the instance reduction. For the goal of the NRP, we add this kind of customer to the solution to maximize the profits. From the perspective of the problem solving, the customers who provide profits with zero cost can also be approximately considered as the common part of the optimal solutions. We define the soft backbone as such customers, who provide profits and request no requirements. In contrast to obtaining the approximate backbone from solutions, the soft backbone is a new kind of backbone obtained from instances.

Definition 6.

The NRP soft backbone.

Given an NRP instance after the instance reduction, the soft backbone is defined as the set of ordered pairs (i, 1) for all the customers i whose requirements sets are empty. The instance reduction in Definition 4 can be directly applied to the soft backbone. Thus, for an NRP instance, two instance reductions are employed to reduce the scale of the NRP instance, based on the approximate backbone and the soft backbone, respectively.

In Fig. 4, we take the instance in Fig. 1 as an example to illustrate the instance reductions. Given the approximate backbone, the three requirements requested by the selected customer are fixed in the release. Then the approximate backbone based instance reduction is built and the three requirements and the customer are removed. In the new instance, no requirement is requested by one of the remaining customers. Since this customer provides a profit without any cost of requirements, the soft backbone of the new instance consists of this customer. Based on the soft backbone, a soft backbone based instance reduction is built and the customer can be removed after this instance reduction. In summary, 2 customers and 3 requirements are removed based on these two instance reductions.

Fig. 4. An example of the approximate backbone and the soft backbone based instance reductions. (Panels: approximate backbone based instance reduction; soft backbone based instance reduction. After the first reduction, one customer requests no requirement.)

We present four differences between the approximate backbone and the soft backbone in Table 3. First, from the definition, the approximate backbone is the intersection of a given number of local optimal solutions while the soft backbone is directly extracted from the instance. No local search algorithm is needed for obtaining the soft backbone. Second, based on the first difference, the soft backbone only exists in the instance generated after an instance reduction. Since an original NRP instance does not include customers who request no requirements, the original instance cannot provide any soft backbone. The soft backbone is a by-product of the instance reduction. In other words, the instance reduction provides an application scenario for the soft backbone. In contrast, the approximate backbone can be generated for all the instances that have feasible solutions. Third, given a new instance after the instance reduction, the approximate backbone is a kind of approximation of the backbone while the soft backbone is a part of the backbone of this new instance. Based on the definition, the soft backbone can be added to any feasible solution to improve the profit of this solution without any additional cost. Since the optimal solutions have the maximum profit, each optimal solution of the given instance must include the soft backbone. Fourth, the soft backbone contains only selected customers, while both selected customers and unselected customers may appear in the approximate backbone.
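The two instance reductions described above can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: the instance is reduced to a mapping from customers to their requested requirements, costs and the budget are ignored, and the names `reduce_instance`, `soft_backbone`, and the toy data are hypothetical rather than taken from the paper.

```python
def reduce_instance(requests, backbone):
    """Fix the customers in `backbone` and return the smaller instance.

    `requests` maps a customer id to the set of requirement ids it requests.
    `backbone` is a set of ordered pairs (customer, 1 or 0).
    Assumption: requirements of selected customers are removed everywhere.
    """
    selected = {c for c, flag in backbone if flag == 1}
    fixed = {c for c, _ in backbone}
    satisfied = set().union(*(requests[c] for c in selected))
    return {c: reqs - satisfied for c, reqs in requests.items() if c not in fixed}


def soft_backbone(requests):
    """Customers that request no requirement in the (reduced) instance."""
    return {(c, 1) for c, reqs in requests.items() if not reqs}


# Hypothetical toy instance: 4 customers, 6 requirements.
requests = {1: {1, 2, 3}, 2: {2, 3}, 3: {4, 5}, 4: {6}}
approx = {(1, 1)}                           # assumed approximate backbone
reduced = reduce_instance(requests, approx)
soft = soft_backbone(reduced)               # customer 2 has nothing left
final = reduce_instance(reduced, soft)
print(soft, final)
```

In this toy run, the two reductions together remove 2 customers and 3 requirements, which mirrors the summary of the Fig. 4 example.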

4 BACKBONE BASED MULTILEVEL ALGORITHM

To address the large scale NRP, we aim to reduce the scale of the NRP instances by fixing the backbone so that the problem can be solved with the existing search based algorithms. First, we show that the multilevel strategy can be employed to iteratively reduce the instance scale. Then we propose the framework of BMA and illustrate the process of BMA with an example.

From Section 3.3, we can reduce the scale of an NRP instance using two instance reductions, based on the approximate backbone and the soft backbone, respectively. However, for a large scale instance, the instance after two instance reductions may still be hard to solve with the existing algorithms. Thus, we consider using the multilevel strategy to perform the instance reductions step by step. A multilevel strategy converts the original problem into multiple levels of sub-problems, each of which is an independent problem [60]. In combinatorial optimization, a multilevel strategy includes two kernel phases, namely reduction (reducing the hardness of the problem) and refinement (constructing the solution to the original problem) [60]. In our work, since a newly generated instance after one instance reduction can be viewed as a new NRP instance, we use the multilevel strategy to iteratively reduce the scale of an instance, i.e., the original NRP instance is handled with multiple instance reductions and the final solution to the instance is then constructed from the approximate backbones and the soft backbones. In this paper, the approximate backbone and the soft backbone based instance reductions are used alternately. More specifically, given an instance, we always conduct a soft backbone based instance reduction after an approximate backbone based instance reduction. We call these two instance reductions (based on the approximate backbone and the soft backbone) a pair of instance reductions for simplicity.

In Fig. 5, we present the experimental result of the relationship between the pairs of instance reductions and the scales of instances. The instances in Fig. 5 are the same as those in Fig. 3, except the instance nrp-1-0.5 (nrp-1-0.5 is omitted due to its small scale, 100). In this experiment, each approximate backbone is calculated from 5 local optimal solutions, which are obtained by the classic algorithm, SA (see Section 3.2 for details).
For each instance, 12 pairs of instance reductions are sequentially used to obtain new and smaller instances. As shown in Fig. 5, although a single pair of instance reductions can reduce the scales of instances, it is feasible to employ further reductions when utilizing the multilevel strategy. For example, in the instance nrp-3-0.5 with the scale 500, the scales of the two new instances generated by the first pair of instance reductions are 439 (after the approximate backbone based reduction) and 432 (after the soft backbone based reduction), respectively.

TABLE 3
DIFFERENCES BETWEEN THE APPROXIMATE BACKBONE AND THE SOFT BACKBONE

| | Approximate backbone | Soft backbone |
|---|---|---|
| Source | From local optimal solutions | Directly from the instance |
| Existence | Existing for each instance with feasible solutions | Only existing in the new instance after an instance reduction |
| Approximation | Approximation of the backbone | Part of the backbone for the current instance |
| Component | Including both the ordered pairs as (i, 1) and (i, 0) | Only including the ordered pairs as (i, 1) |

Fig. 5. The instance scales with 12 instance reductions for the approximate backbone and the soft backbone. The scales of the four instances (nrp-2-0.5, nrp-3-0.5, nrp-4-0.5, and nrp-5-0.5) are 500, 500, 750, and 1000. The x-axis shows the pair of instance reductions based on two kinds of backbones (0 to 12) and the y-axis shows the change of instance scales (%). There are two kinds of points in each curve. A solid point denotes an instance reduction based on the soft backbone while the other kind of point denotes an instance reduction based on the approximate backbone.

For all the four instances, less than 25% of the instance scales are removed after one pair of instance reductions; the scales of new instances are still too large for the solving algorithm. In contrast to a single pair of instance reductions, multiple pairs can sufficiently reduce the instance scale. The instance scale gradually decreases while the number of instance reductions increases. The curves in Fig. 5 indicate that nearly all the instance reductions can reduce the scales of instances. After 12 pairs of instance reductions, less than 15% of the scale of each instance is left, e.g., in nrp-3-0.5, only 19 customers are left after these multiple instance reductions. Moreover, Fig. 5 shows that 10 to 12 pairs of instance reductions can provide reasonable shrinkage for the scale of each instance.

To show the ability of the multilevel strategy for the instance reduction, we summarize the values of reduced scales in Table 4. For each instance, apart from the original instance scale, we show the scales reduced by all the approximate backbones, the scales reduced by all the soft backbones, and the sum of all the scales reduced by instance reductions. For example, with 12 pairs of instance reductions for nrp-3-0.5, 88.6% of the scale is reduced by the approximate backbone while 7.6% of the scale is reduced by the soft backbone. For the four instances in Table 4, the scale reduced by fixing the approximate backbone is larger than that by fixing the soft backbone. The scale reduced by fixing the soft backbone is between 7% and 33% while the one by fixing the approximate backbone is between 60% and 89%. In particular, in nrp-2-0.5, the remaining scale is 0.4%, i.e., 2 customers. Nearly the whole instance scale of nrp-2-0.5 is reduced in the multiple instance reductions.

Based on the analysis above, we conclude that multiple instance reductions can effectively reduce instance scales. Both kinds of backbones work well in the instance reductions. The approximate backbone based instance reduction can reduce a large part of the instance scale while the soft backbone based instance reduction can enhance the reduction by the approximate backbone. Thus, we employ this multilevel strategy to design our algorithm, BMA.
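The figures quoted from Table 4 can be cross-checked with a short calculation. The sketch below only re-derives the remaining scale from the percentages reported in the table; because those percentages are rounded to one decimal, the derived customer counts are approximate for some instances. The variable names are illustrative.

```python
# (original scale, % reduced by approximate backbone, % reduced by soft backbone)
table4 = {
    "nrp-2-0.5": (500, 88.4, 11.2),
    "nrp-3-0.5": (500, 88.6, 7.6),
    "nrp-4-0.5": (750, 79.9, 7.2),
    "nrp-5-0.5": (1000, 60.8, 32.4),
}

left = {}
for name, (scale, approx_pct, soft_pct) in table4.items():
    reduced_pct = approx_pct + soft_pct            # "Sum of scale reduced %"
    left[name] = round(scale * (100 - reduced_pct) / 100)
    print(f"{name}: reduced {reduced_pct:.1f}%, about {left[name]} customers left")
```

The derived values agree with the text: 2 customers remain for nrp-2-0.5 and 19 for nrp-3-0.5, and less than 15% of the scale is left in every instance.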

In Algorithm 1, we present the details of our algorithm, BMA. The framework of BMA contains 3 phases: reduction, solving, and refinement.

In the reduction phase, the algorithm reduces the scale of an NRP instance by fixing the approximate backbone and the soft backbone. The approximate backbone is generated as the intersection of a certain number of local optimal solutions, which are obtained by a specified local search operator; the soft backbone is generated from an NRP instance after the instance reduction. In the solving phase, the local search operator in the reduction phase is employed to approximately solve the final small instance. In the refinement phase, the algorithm combines the approximate backbone, the soft backbone, and the current solution to the reduced instance to construct a solution to the original instance. Both the reduction phase and the refinement phase are iterative procedures, which reduce the instances or extend the solutions using a multilevel strategy. The actual number of levels in BMA depends on two input parameters, namely the maximum number of levels and the minimum scale of instances. The other input parameter of BMA is the number of local optimal solutions in each level of the reduction phase. This parameter constrains the scale and the quality of the backbone. In Section 5.4.1, we will present an experiment on this parameter, i.e., the number of local optimal solutions.

In Fig. 6, we illustrate the process of BMA with the instance presented in Fig. 3. For this instance, the algorithm employs two-level reductions and refinements. In the first level reduction (Fig. 6(a)), the local search operator obtains a set of 3 local optimal solutions to the instance. Thus, the first level approximate backbone is {(1,1), (4,0)}, i.e., customer 1 is selected while customer 4 is not. Since the requirements for customer 1 are all satisfied in the release, all these requirements can be reduced.
By fixing the approximate backbone, a new instance with 5 customers and 5 requirements is generated after the instance reduction. Then no requirement is left for customer 2. Thus, the soft backbone is generated as {(2,1)} and the instance is further reduced. Similarly, in the second level reduction (Fig. 6(b)), a set of 3 local optimal solutions is obtained for the instance with 4 customers and 5 requirements. Thus, the second level approximate backbone is {(6,1)}. By fixing this backbone, a new instance with 3 customers and 2 requirements is generated. Then no requirement is left for customer 7. Thus, the soft backbone is generated as {(7,1)} and a new instance is generated as well. For the local search operator, the instance with 2 customers and 2 requirements is easy to solve (Fig. 6(c)). The solution is {(3,1), (5,0)}. At last, under the inverted sequence of reductions, the algorithm combines the current solution, the approximate backbones, and the soft backbones to construct the solution for each level (Fig. 6(d)). The final solution to the original instance, {(1,1), (2,1), (3,1), (4,0), (5,0), (6,1), (7,1)}, is formed within two-level reductions and refinements.

TABLE 4
TOTAL SCALES REDUCED BY INSTANCE REDUCTIONS

| Instance name | nrp-2-0.5 | nrp-3-0.5 | nrp-4-0.5 | nrp-5-0.5 |
|---|---|---|---|---|
| Original scale | 500 | 500 | 750 | 1000 |
| Scale reduced by approximate backbone (%) | 88.4 | 88.6 | 79.9 | 60.8 |
| Scale reduced by soft backbone (%) | 11.2 | 7.6 | 7.2 | 32.4 |
| Sum of scale reduced (%) | 99.6 | 96.2 | 87.1 | 93.2 |

Algorithm 1. Backbone based Multilevel Algorithm
Input: NRP instance, local search operator, maximum number of levels, minimum scale of instances, number of local optimal solutions
Output: solution
Phase I. Reduction
    for each level from 1 to the maximum number of levels do
        if the scale of the current instance is larger than the minimum scale then
            obtain a set of local optimal solutions to the current instance by the local search operator;
            calculate the approximate backbone by Approximate-Backbone;
            reduce instance by Instance-Reduction with the approximate backbone;
            calculate the soft backbone by Soft-Backbone;
            reduce instance by Instance-Reduction with the soft backbone;
        endif
    endfor
    count the actual number of levels in Phase I;
Phase II. Solving
    obtain a local optimal solution to the final reduced instance by the local search operator;
Phase III. Refinement
    for each level from the actual number of levels down to 1 do
        refine solution with the approximate backbone and the soft backbone of this level;
    endfor
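The three phases of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' C++ implementation: requirement dependencies are ignored, a randomized profit-biased greedy insertion stands in for SA, the budget is decreased by the cost of the requirements fixed in each reduction, and all function names and the toy data are hypothetical.

```python
import random


def cost_of(customers, requests, costs):
    """Total cost of the requirements requested by the given customers."""
    reqs = set().union(*(requests[c] for c in customers))
    return sum(costs[r] for r in reqs)


def local_search(inst, rng):
    """Stand-in for SA (assumption): randomized, profit-biased greedy insertion."""
    requests, costs, profits, budget = inst
    order = sorted(requests, key=lambda c: rng.random() * profits[c], reverse=True)
    chosen = set()
    for c in order:
        if cost_of(chosen | {c}, requests, costs) <= budget:
            chosen.add(c)
    return {c: int(c in chosen) for c in requests}


def approximate_backbone(solutions):
    """Ordered pairs (customer, value) shared by all local optimal solutions."""
    first = solutions[0]
    return {(c, v) for c, v in first.items() if all(s[c] == v for s in solutions)}


def soft_backbone(inst):
    """Customers of the reduced instance that request no requirement."""
    return {(c, 1) for c, reqs in inst[0].items() if not reqs}


def reduce_instance(inst, backbone):
    """Remove fixed customers and the requirements of the selected ones."""
    requests, costs, profits, budget = inst
    selected = {c for c, v in backbone if v == 1}
    fixed = {c for c, _ in backbone}
    done = set().union(*(requests[c] for c in selected))
    new_requests = {c: r - done for c, r in requests.items() if c not in fixed}
    return new_requests, costs, profits, budget - sum(costs[r] for r in done)


def bma(inst, max_levels, min_scale, k, seed=0):
    rng = random.Random(seed)
    levels = []
    for _ in range(max_levels):                       # Phase I: reduction
        if len(inst[0]) <= min_scale:
            break
        sols = [local_search(inst, rng) for _ in range(k)]
        approx = approximate_backbone(sols)
        inst = reduce_instance(inst, approx)
        soft = soft_backbone(inst)
        inst = reduce_instance(inst, soft)
        levels.append(approx | soft)
    solution = set(local_search(inst, rng).items())   # Phase II: solving
    for backbone in reversed(levels):                 # Phase III: refinement
        solution |= backbone
    return dict(solution)


# Hypothetical toy instance with 7 customers and 8 requirements.
requests = {1: {1, 2, 3}, 2: {2, 3}, 3: {4}, 4: {5, 6}, 5: {6, 7}, 6: {7}, 7: {8}}
costs = {r: r for r in range(1, 9)}
profits = {c: 10 * c for c in requests}
budget = 20
solution = bma((requests, costs, profits, budget), max_levels=3, min_scale=2, k=3)
print(solution)
```

Because every approximate backbone is a subset of feasible local optimal solutions and the budget is decreased after each reduction, the refined solution stays within the original budget in this sketch. Applying `approximate_backbone` to the three first-level solutions listed in Fig. 6 yields {(1,1), (4,0)}, as in the walkthrough.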

5 EXPERIMENTS AND RESULTS

For approximate algorithms, experimentation is a common way to evaluate the performance of algorithms. In this section, we evaluate our algorithm on 39 NRP instances. We first give the research questions in our experiments; then, we describe the instance generation rules of the classic NRP instances; next, we present the new instance generation method by mining open bug repositories; finally, we answer the research questions based on the experimental results.

We experimentally evaluate the performance of BMA for the NRP. For all the experiments in this paper, the algorithms are implemented in C++ and run on a PC with an Intel Core processor and the Ubuntu OS (Linux kernel 2.6). We design the experiments to answer the following Research Questions (RQs):

RQ1: Parameter configuration for BMA. In the framework of BMA, each approximate backbone is generated based on a given number of local optimal solutions. The scale and the quality of the backbone (the combination of the approximate backbone and the soft backbone) may depend on the number of local optimal solutions, which is set manually for BMA. How does the number of local optimal solutions in BMA affect the backbone?

RQ2: Performance evaluation. In requirements engineering, some existing algorithms have been proposed to solve the NRP. We want to compare the solution quality of BMA with other algorithms. Can BMA perform well on the large scale NRP instances?

In Section 5.2 and Section 5.3, we will give the details of the NRP instances in our experiments. The NRP instances in this paper can be found at http://ssdut.dlut.edu.cn/oscar/nrp/ .

Fig. 6. Illustration on an instance with 7 customers and 8 requirements under two-level BMA. Sub-figures (a) and (b) show each level in the two-level reduction phase; (c) shows the solving phase; and (d) shows the two-level refinement phase.
(a) The 1st level reduction. Local optimal solutions: {(1,1), (2,0), (3,1), (4,0), (5,0), (6,0), (7,1)}, {(1,1), (2,1), (3,0), (4,0), (5,1), (6,1), (7,0)}, and {(1,1), (2,1), (3,0), (4,0), (5,0), (6,1), (7,1)}; approximate backbone {(1,1), (4,0)}; soft backbone {(2,1)}.
(b) The 2nd level reduction. Local optimal solutions: {(3,1), (5,0), (6,1), (7,1)}, {(3,0), (5,1), (6,1), (7,1)}, and {(3,0), (5,1), (6,1), (7,0)}; approximate backbone {(6,1)}; soft backbone {(7,1)}.
(c) Solving the small instance. Local optimal solution: {(3,1), (5,0)}.
(d) Two levels of refinements. After the 2nd level refinement: {(3,1), (5,0), (6,1), (7,1)}; after the 1st level refinement: {(1,1), (2,1), (3,1), (4,0), (5,0), (6,1), (7,1)}.

As requirements are usually private data of software companies, no open large NRP instances can be found in the literature. In this paper, we evaluate our algorithm on two sets of the NRP instances. One set includes 15 instances generated under certain constraints based on the classic literature of the NRP experiments [4]; the other set includes 24 realistic instances mined from open bug repositories of three open source software projects.

The classic set of the NRP instances consists of 5 groups and each group includes 3 instances. In each group, instances have distinct budget bounds, each of which equals the cost ratio (0.3, 0.5, or 0.7, respectively) multiplied by the sum of all the costs. Table 5 shows the details of the 5 groups of instances. According to [4], these instances are based on Definition 1. Taking the group nrp-1 as an example, all the requirements are classified into 3 levels separated by the symbol “/”. A requirement in the 2nd level may depend on some requirements in the 1st level while a requirement in the 3rd level may depend on some requirements in the 1st and 2nd levels. An instance name is formed by the group name and the cost ratio. For example, nrp-1-0.3 is an instance in the group nrp-1 and the cost ratio is 0.3. The details of the instance nrp-1-0.3 are as follows. There are 3 levels of requirements, with 20, 40, and 80 requirements in the three levels, respectively. The costs of requirements in the three levels are from 1 to 5, from 2 to 8, and from 5 to 10, respectively. A requirement in the 1st level has at most 8 child requirements. Similarly, a requirement in the 2nd level has at most 2 child requirements. There are 100 customers, each of whom requests 1 to 5 requirements. In addition, each customer provides a profit between 10 and 50.
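The generation rules stated above for nrp-1 can be turned into a small random generator. The sketch below is only an approximation of those rules: uniform sampling, integer costs and profits, children drawn only from the next level, and the function name `generate_nrp1` are all assumptions, since the text gives ranges and limits but not the sampling procedure.

```python
import random


def generate_nrp1(cost_ratio, seed=0):
    """Generate a random instance that follows the nrp-1 ranges in the text."""
    rng = random.Random(seed)
    level_sizes = [20, 40, 80]                  # requirements per level
    cost_ranges = [(1, 5), (2, 8), (5, 10)]     # cost range per level
    max_children = [8, 2]                       # child limit for levels 1 and 2

    levels, costs, next_id = [], {}, 0
    for size, (low, high) in zip(level_sizes, cost_ranges):
        ids = list(range(next_id, next_id + size))
        next_id += size
        levels.append(ids)
        for r in ids:
            costs[r] = rng.randint(low, high)

    # Assumption: the children of a requirement lie in the next level only.
    children = {}
    for depth, limit in enumerate(max_children):
        for parent in levels[depth]:
            children[parent] = rng.sample(levels[depth + 1], rng.randint(0, limit))

    all_requirements = list(costs)
    customers = {}
    for c in range(100):
        customers[c] = {
            "requests": set(rng.sample(all_requirements, rng.randint(1, 5))),
            "profit": rng.randint(10, 50),
        }

    budget = cost_ratio * sum(costs.values())   # budget bound of the instance
    return costs, children, customers, budget


costs, children, customers, budget = generate_nrp1(0.3)
print(len(costs), len(customers), budget)
```

Calling `generate_nrp1` with the ratios 0.3, 0.5, and 0.7 would give the three budget bounds of one group; the same seed keeps the requirements and customers identical across the three calls.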

TABLE 5
GENERATION RULES OF THE CLASSIC NRP INSTANCE GROUPS

Instance group name: nrp-1, nrp-2, nrp-3, nrp-4, nrp-5

Besides the classic instances, we extract a set of NRP instances from open source bug repositories. To cope with the lack of large scale open requirements repositories, the requirements data can be mined from other databases. To our knowledge, only one requirements repository has been mined for experiments, i.e., the requirements database mined from an open source forum project by Duan et al. [14]. In their paper, requests or problems in the forum project are mapped to the requests in a requirements repository to evaluate their requirements prioritization and triage approach.

In our work, to build large NRP instances, we mine the NRP instances from bug repositories (also called bug tracking systems, e.g., a popular bug repository, Bugzilla [7]). A bug repository is a database for the storage of numerous bug reports, each of which is submitted by a user (maybe a developer, a tester, or an end user) for recording the details of suggestions or problems. One bug report may be commented on by one or more users; meanwhile, one user may make comments on one or more bug reports. The user comments on the bug reports provide a scenario similar to the requirements requests in requirements repositories. For example, if two users make comments on three bug reports in a bug repository, we can extract a software release, in which two customers request three requirements in the requirements analysis. Thus in our experiments, a bug report and a user in bug repositories are mapped to a requirement and a customer in the NRP, respectively. In addition, a user comment on a bug report is mapped to a requirement request; the severity of a bug report is mapped to the cost of a requirement. Similar to the classic set of NRP instances, the profit of a customer is randomly generated within a certain range. We present the corresponding relationship between bug repositories and the NRP in Table 6.

TABLE 6
CORRESPONDING RELATIONSHIP FOR THE ITEMS BETWEEN BUG REPOSITORIES AND THE NRP

| Item in the NRP | Item in a bug repository |
|---|---|
| Requirements | Bug reports |
| Customers | Users for the bug reports |
| Requests | User comments on bug reports |
| Costs | The severity of the bug reports |
| Profits | Random values generated within a certain range |

To mine the NRP instances, we employ the bug repositories of three open source software projects, namely Eclipse (a Java integrated development environment) [15], Mozilla (a set of web applications) [47], and Gnome (a desktop project) [19]. The XML form of these bug repositories can be found in the Mining Challenges of the IEEE Working Conference on Mining Software Repositories (MSR) [46]. To generate various instances, we set different parameters for bug repositories. In each group of instances, first, we select 10000 continuous bug reports from a bug repository. The time period of the selected bug reports is around the software release time since the bug reports in this period are usually active [2]. Then, we filter out the bug reports and users (i.e., the requirements and customers in Table 6) outside a specified range by limiting the number of user comments (i.e., the requests in Table 6). As a result, the characteristics of a group can be generated. In Table 7, we show the details of 12 groups of instances extracted for experiments. The form of instances is based on Definition 2. Each group of instances consists of two instances, with the cost ratio 0.3 or 0.5, respectively. Therefore, the budget bound of each instance equals the sum of costs multiplied by the cost ratio. Thus, 24 realistic instances are mined for the following experiments.

TABLE 7
DETAILS OF THE REALISTIC NRP INSTANCE GROUPS

| Source repository | Instance group names | Bug report IDs | Bug report time periods |
|---|---|---|---|
| Eclipse | nrp-e1, nrp-e2, nrp-e3, nrp-e4 | 150001~160000, 160001~170000 | Jul. 2006~Oct. 2006, Oct. 2006~Jan. 2007 |
| Mozilla | nrp-m1, nrp-m2, nrp-m3, nrp-m4 | 200001~210000, 210001~220000 | Mar. 2003~Jun. 2003, Jun. 2003~Sept. 2003 |
| Gnome | nrp-g1, nrp-g2, nrp-g3, nrp-g4 | 450001~460000, 460001~470000 | Jun. 2007~Jul. 2007, Jul. 2007~Aug. 2007 |
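The mapping in Table 6 and the filtering step can be illustrated with a short sketch. Everything specific in it is an assumption made for illustration: the input is a list of (bug report ID, user) comment records rather than the MSR XML files, the numeric `SEVERITY_COST` mapping and the profit range 10 to 50 are hypothetical, and only users (not bug reports) are filtered by their number of commented reports.

```python
import random
from collections import defaultdict

# Hypothetical numeric costs for severity labels (not given in the text).
SEVERITY_COST = {"trivial": 1, "minor": 2, "normal": 3, "major": 4, "critical": 5}


def mine_instance(comments, severity, min_reports, max_reports, cost_ratio, seed=0):
    """Build an NRP instance from (bug_id, user) comment records."""
    rng = random.Random(seed)
    requests = defaultdict(set)
    for bug_id, user in comments:          # a comment is mapped to a request
        requests[user].add(bug_id)
    # Keep only users whose number of commented reports is within the range.
    requests = {u: bugs for u, bugs in requests.items()
                if min_reports <= len(bugs) <= max_reports}
    requirements = set().union(*requests.values())
    costs = {b: SEVERITY_COST[severity[b]] for b in requirements}
    profits = {u: rng.randint(10, 50) for u in requests}   # assumed range
    budget = cost_ratio * sum(costs.values())
    return requests, costs, profits, budget


comments = [(101, "alice"), (102, "alice"), (101, "bob"),
            (103, "carol"), (103, "carol"), (104, "dave"),
            (105, "dave"), (106, "dave")]
severity = {101: "major", 102: "minor", 103: "normal",
            104: "normal", 105: "normal", 106: "normal"}
requests, costs, profits, budget = mine_instance(comments, severity, 1, 2, 0.5)
print(requests, costs, budget)
```

In this toy run, the user with three commented reports is filtered out, so the instance has three customers and three requirements, and the budget bound is half of the total cost.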

In this section, we will answer the research questions proposed in Section 5.1. We evaluate our algorithm BMA on the 39 NRP instances mentioned in Sections 5.2 and 5.3.

For the three input parameters of BMA, namely the maximum number of levels, the minimum scale of instances, and the number of local optimal solutions, the first two can be viewed as the termination conditions of BMA. However, the number of local optimal solutions is a key value to decide the scale of the backbone. We experimentally evaluate the relationship among the number of local optimal solutions, the scale of the backbone, and the quality of the backbone.

In the framework of BMA, any algorithm can be embedded as a local search operator. To compare the experimental results, we employ the existing best local search algorithm, SA, to obtain local optimal solutions [7]. The solid empirical results and simplicity of SA have led to a wide range of applications in combinatorial optimization. In the experiments in this paper, we set the parameters for SA according to [4], i.e., the starting temperature is set to 100 and the non-linear ratio follows the setting in [4].

In Fig. 7, we present the experimental results to visualize the relationship among the number of local optimal solutions, the scale of the backbone, and the quality of the backbone. To simplify the visualization, the backbone in Fig. 7 is a combined backbone (see Fig. 2), which is the combination of the approximate backbones and the soft backbones in all the levels of BMA. For a multilevel BMA, given the approximate backbone and the soft backbone in each level, we define the combined backbone as the union of these backbones over all the levels. We evaluate the scale and the quality of the combined backbone with two criteria, namely the ratio of the combined backbone scale and the ratio of optimal customers in the combined backbone. Given an NRP instance, the ratio of the combined backbone scale in the original instance is the size of the combined backbone divided by the instance scale; given the best known solution, the ratio of optimal customers in the combined backbone is the number of elements of the combined backbone that also appear in the best known solution, divided by the size of the combined backbone. In this experiment, the maximum number of levels and the minimum scale of instances are set to fixed values. Each point is calculated as an average of the results from ten independent runs.

In Fig. 7, the scale of the combined backbone decreases and the ratio of optimal customers increases while the number of local optimal solutions increases. Four of the curves in this experiment present the same trend when varying the number of local optimal solutions. The curve of the instance nrp-1-0.5 does not correspond with the curves of the other instances since nrp-1-0.5 is a small instance, which is much easier to solve than the other four instances. For each value of the number of local optimal solutions, the instance scale of nrp-1-0.5 can be easily reduced. For all the five instances, when each approximate backbone in a level is generated by 2 local optimal solutions, the scale of the combined backbone is nearly the same as the instance scale and the number of optimal customers is less than 0.8 of the scale of the combined backbone; on the other hand, when each approximate backbone is generated by 10 local optimal solutions, the scale of the combined backbone is less than 0.4 of the instance scale for 4 out of 5 instances and the number of optimal customers is more than 0.9 of the scale of the combined backbone for all the 5 instances. From Fig. 7, we consider that 4 to 6 local optimal solutions for each approximate backbone is a good choice for the trade-off between the scale of the combined backbone and the number of optimal customers.

Fig. 7. Relationship between the number of local optimal solutions, the scale of the combined backbone, and the ratio of optimal customers in the combined backbone. (a) Scale of the combined backbone (y-axis: ratio of combined backbone scale in the original instance). (b) Optimal customers in the combined backbone (y-axis: ratio of optimal customers in the combined backbone). In both panels, the x-axis is the number of local optimal solutions in each level, with one curve for each of nrp-1-0.5, nrp-2-0.5, nrp-3-0.5, nrp-4-0.5, and nrp-5-0.5.

Based on this part, the answer to RQ1 is that the number of local optimal solutions can affect the scale and the quality of the backbone of BMA. A large scale backbone and a high quality backbone cannot be obtained simultaneously while tuning this parameter. Thus, for the following experiments, we choose a trade-off value of 5 for the number of local optimal solutions, which can balance the scale of the combined backbone and the number of optimal customers obtained by BMA.

For the other two parameters of BMA, we choose the values as follows. In Fig. 5, we have studied the change of instance scales with the number of pairs of instance reductions. We set the maximum number of levels on this basis, since over 10 pairs of instance reductions may significantly reduce the instance scale. For the minimum scale of an instance, we set the value manually, since an instance with a scale of less than 20 could be easy to solve [13].
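The two criteria used in Fig. 7 can be computed directly from the backbones of each level. The sketch below reuses the backbones of the two-level example in Fig. 6; the best known solution `best_known` is hypothetical and is chosen only so that the second ratio is not trivially 1, and the function names are illustrative.

```python
def combined_backbone(levels):
    """Union of the approximate and soft backbones over all levels."""
    combined = set()
    for approx, soft in levels:
        combined |= approx | soft
    return combined


def scale_ratio(combined, instance_scale):
    """Ratio of the combined backbone scale in the original instance."""
    return len(combined) / instance_scale


def optimal_ratio(combined, best_known):
    """Ratio of combined backbone elements that agree with the best known solution."""
    return len(combined & best_known) / len(combined)


# Backbones of the two levels in the Fig. 6 example.
levels = [({(1, 1), (4, 0)}, {(2, 1)}), ({(6, 1)}, {(7, 1)})]
combined = combined_backbone(levels)

# Hypothetical best known solution for the 7-customer instance.
best_known = {(1, 1), (2, 1), (3, 1), (4, 0), (5, 0), (6, 1), (7, 0)}

print(scale_ratio(combined, 7))             # 5 of 7 customers are fixed
print(optimal_ratio(combined, best_known))  # 4 of 5 fixed pairs agree
```

With more local optimal solutions per level, the intersection that forms each approximate backbone can only shrink, which is consistent with the trend reported for Fig. 7(a).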

To evaluate the performance of BMA, we employ two direct solving algorithms for comparison. One algorithm is a Multi-Start strategy based SA (MSSA) [39]. In MSSA, the existing best local search algorithm, SA ([4], [5]) is run independently for multiple times and the best solution among these runs is chosen as the final solution [45]. The other algorithm is Genetic Algorithm (GA), which is a bio-inspired and population-based technology for com-plex problems, also for the NRP [69], [13]. Among many variants of GA, we choose the implementation described in [13]. This implementation uses an elitism based selec-tion strategy to construct the population and updates new population with crossover and mutation operators. We show the experimental results for the comparison among MSSA, GA and BMA on the NRP instances. SA is employed as a local search operator in both MSSA and BMA; the input parameters of SA are the same as those in Section 5.4.1. We set the parameters of MSSA and BMA as follows. In MSSA, we repeat SA for 30 times and choose the best solution; in BMA, we set the parameters accord-ing to Section 5.4.1, i.e., , , and . To our knowledge, there is no prior parameter value of GA for the large scale NRP. Thus, we tune the parameters for GA. To this end, we configure the parameters for GA with an open access tuning tool, ParamILS [26], which employs an off-line local search framework for automati-cally tuning parameters. In

ParamILS , we set the training set as three instances nrp-1-0.5 , nrp-3-0.5 , and nrp-5-0.5 ; we set the test set as the two instances nrp-2-0.5 and nrp-4-0.5 . The cutoff time of ParamILS is set to 5000 seconds. After the parameter tuning by

ParamILS , the values ob-tained are 100 for the population size, 0.3 for the elitism selection ratio, 0.3 for the crossover ratio, and 0.1 for the mutation ratio. Based on the parameters for GA, we set the maximum number of iterations to . We choose such a value for the number of iterations to sufficiently show the solution quality of GA and to balance the run-ning time of three algorithms. We independently run each of the three algorithms (MSSA, GA, and BMA) for 10 times. The results are col-lected to measure the performance and to plot the profit distributions. In Table 8 and Table 9, we show the exper-imental results of MSSA, GA, and BMA on two sets of NRP instances. Each table has five columns, including the details of instances, the results of MSSA, the results of GA, the results of BMA, and the profit distributions. In the first column, the sub-columns are the instance name and the budget bound. The following three columns include sub-columns for the best profit, the average profit, and the average running time. For BMA, the sub-column “

MSSA% ” and “

GA% ” present the rate of average profit in percentage to measure the advantage by BMA against that by MSSA and GA. For example, “

MSSA% ” is calcu-lated as , where and are the average profits obtained by BMA and MSSA, re-spectively. The average profit is used to measure the quality while the best profit is listed as a reference. Since each of the three algorithms is run for 10 times, we show the profit distributions of solutions with box plots [41] for all the algorithms in the last column. In a box plot, we measure the stability of solutions with the range between the first quartile and the third quartile. To normalize prof-its of distinct instances, the point in box plots is calculated as , where is the profit ob-tained by the solution and is the average profit of all the solutions by an algorithm. Thus, the 0% point shows that the profit equals to the average. Note that based on the normalized profit distributions, each profit distribution shows the distribution for only one algorithm on one instance while no comparison is conducted for the absolute values of profits among MSSA, GA, and BMA. In Table 8 for the classic instances, BMA obtains better solutions within less running time than MSSA and GA on most of the instances. Based on the sub-column “

MSSA%”, the average profits obtained by BMA are 2% to 51% better than those by MSSA on all the instances. Note that on only two instances, namely nrp-3-0.7 and nrp-5-0.7, the average profits by BMA are less than 10% better than those by MSSA. The reason for this result is that the large cost ratio 0.7 makes it easy for MSSA to solve these instances, i.e., the predefined cost is adequate for making the decision. Thus, on these two instances, BMA can do only a little better than MSSA. On the other hand, based on the sub-column “GA%”, the average profits by BMA are 0% to 68% better than those by GA on all but one instance. The exception is nrp-1-0.5, on which GA can obtain better solutions than BMA. Moreover, on the other two instances in the group nrp-1 (with scale 100), the profits obtained by BMA are very similar to the profits by GA. That is, GA can work well on small scale instances. Among the rates in “MSSA%” and “GA%”, both the rates less than 1% and the rates more than 60% are provided by “GA%”. As a result, we can find that the solutions of GA are in a wider range than those of MSSA. From the profit distributions of MSSA, GA, and BMA, the average profits on most of the instances fall within the ranges of the profit distributions. Moreover, among the last 9 instances (the last 3 groups) in Table 8, BMA can provide the most stable solution distributions for 6 instances (i.e., in box plots, for each of these 6 instances, the distance between the first quartile and the third quartile is short). In summary of Table 8, the results show that the backbone based instance reduction makes BMA obtain good solutions for the large scale NRP.

In Table 9 for the realistic instances, the experimental results are basically similar to those in Table 8. BMA can obtain the best solutions on all the instances. The average profits obtained by BMA are 19% to 35% better than those by MSSA on all the instances while the average profits obtained by BMA are 5% to 21% better than those by GA on all the instances. GA can beat MSSA on all these instances. From the sub-column “GA%”, the rates on the instances extracted from Gnome (the instance names starting with “nrp-g”) are smaller than those on the instances from Eclipse and Mozilla. In other words, the advantage of BMA is inconspicuous for the instances extracted from Gnome. A reason for this fact is that Gnome provides the simplest instances in our experiments, the instance scales of which are less than 500 (see Table 7 for the instance scales). On the contrary, for the large scale instances extracted from Mozilla, BMA can obtain much better profits than MSSA and GA. On the instances in Table 9, the profit distributions are also stable. On 10 instances among all the 24 realistic instances, the results obtained by BMA are the most stable among the three algorithms.

From the sub-columns “MSSA%” and “GA%” in both Table 8 and Table 9, in general, the rate of profits decreases while the cost ratio increases for the instances in each group (i.e., from 0.3 to 0.7 for the classic instances or from 0.3 to 0.5 for the realistic instances). Thus, BMA obtains much better solutions on most of the instances with small cost ratios.

Based on this part, the answer to RQ2 is that BMA can obtain better solutions than the typical algorithms MSSA and GA on the large scale NRP instances. Moreover, the profit distributions provided by BMA are stable for most of the instances.

In conclusion of experiments in this section, the results show that BMA can obtain better profits than MSSA and GA within similar time on the large scale NRP instances. From the perspective of algorithm design, the approximate backbone leads to the fast solving for BMA; the soft backbone is helpful in constructing the near-optimal solutions to the problem instances; and the multilevel reductions and refinements provide a framework to use the existing algorithms. Based on these characteristics, BMA outperforms the typical algorithms, MSSA and GA, on most of the NRP instances.
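The two quantities used in this comparison, the rate by which BMA's average profit exceeds that of another algorithm and the quartile distance that indicates stability in a box plot, can be illustrated with a short Python sketch. The rate formula below is an assumption (a relative difference of average profits), and the function names and sample profits are hypothetical rather than taken from Table 8 or Table 9.

```python
from statistics import mean, quantiles


def improvement_rate(bma_profits, other_profits):
    """Percentage by which the average BMA profit exceeds the other average.

    Assumed definition: 100 * (avg_BMA - avg_other) / avg_other.
    """
    base = mean(other_profits)
    return 100.0 * (mean(bma_profits) - base) / base


def interquartile_range(profits):
    """Distance between the first and third quartiles of the profits."""
    q1, _, q3 = quantiles(profits, n=4)
    return q3 - q1


if __name__ == "__main__":
    # Hypothetical profits from repeated runs on one instance.
    bma_runs = [1210, 1225, 1218, 1230, 1222]
    other_runs = [1100, 1180, 1020, 1150, 1075]
    print(improvement_rate(bma_runs, other_runs))
    # A shorter quartile distance means a more stable distribution.
    print(interquartile_range(bma_runs), interquartile_range(other_runs))
```

Under this reading, a smaller interquartile range for one algorithm on an instance corresponds to the "most stable" distribution in the box plots.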

6 THREATS TO VALIDITY

Our approach is a search based technology to solve the NRP in requirements engineering. There are three potential threats to validity for our work.

TABLE 8
PERFORMANCE FOR MSSA, GA, AND BMA ON CLASSIC INSTANCES

Columns: Instance (Name, Bound); MSSA (Best, Average, Time); GA (Best, Average, Time); BMA (Best, Average, Time); Profit distribution (box plots for MSSA, GA, and BMA on an axis from -3.5 to +3.5); % (MSSA%, GA%).
Instances: nrp-1-0.3, nrp-1-0.5, nrp-1-0.7, nrp-2-0.3, nrp-2-0.5, nrp-2-0.7, nrp-3-0.3, nrp-3-0.5, nrp-3-0.7, nrp-4-0.3, nrp-4-0.5, nrp-4-0.7, nrp-5-0.3, nrp-5-0.5, nrp-5-0.7.
Leading values of the first three rows, in column order:
nrp-1-0.3   257   998    976.5    108.65
nrp-1-0.5   429   1536   1505.2   98.93
nrp-1-0.7   600   2301   2273.6   91.70

In this paper, only one kind of requirements dependency is given to the NRP model following the existing definitions [4], [5], [31]. However, there are some other kinds of dependencies in requirements engineering. For example, Carlshamre et al. [8] list six kinds of requirements dependencies and the dependency in our work can be viewed as a “REQUIRES” dependency in their approach; Zhang et al. [65] explore four kinds of requirements dependencies to facilitate the requirements reuse and software design. Since the requirements dependencies in the NRP are formed as input parameters, it is straightforward to add other kinds of requirements dependencies to the model of the NRP. Based on the definition of the Simplified NRP, the model aims to handle the requirements requested by customers. As a result, the dependencies can be incorporated into the set of requirements requested by each customer. Therefore, our algorithm, BMA, can be extended to solve the NRP with various requirements dependencies.
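To make the role of a “REQUIRES” dependency as an input parameter concrete, the following Python sketch evaluates a customer selection under a cost bound. It is only an illustration under stated assumptions: the function names, the dictionary-based encoding, and the toy data are hypothetical and are not the instance format used in the experiments.

```python
def requires_closure(requirements, requires):
    """Return the given requirements plus everything they transitively require."""
    closed, stack = set(), list(requirements)
    while stack:
        r = stack.pop()
        if r not in closed:
            closed.add(r)
            stack.extend(requires.get(r, ()))
    return closed


def evaluate(selected, requests, profit, cost, requires, bound):
    """Return (total profit, total cost, feasible) for a set of customers.

    A customer is satisfied only if all requested requirements, and the
    requirements those depend on, are implemented; each requirement is
    paid for once even when several customers share it.
    """
    requested = set().union(*(requests[c] for c in selected))
    needed = requires_closure(requested, requires)
    total_cost = sum(cost[r] for r in needed)
    total_profit = sum(profit[c] for c in selected)
    return total_profit, total_cost, total_cost <= bound


if __name__ == "__main__":
    # Hypothetical toy instance: r2 REQUIRES r1.
    cost = {"r1": 3, "r2": 2, "r3": 4}
    requires = {"r2": ["r1"]}
    requests = {"c1": {"r2"}, "c2": {"r3"}}
    profit = {"c1": 10, "c2": 7}
    print(evaluate({"c1"}, requests, profit, cost, requires, bound=6))
    print(evaluate({"c1", "c2"}, requests, profit, cost, requires, bound=6))
```

Because the dependency map is just another input, adding a further kind of dependency in this sketch would mean changing how `needed` is computed, not the search procedure that calls `evaluate`.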

In this paper, we use the approximate backbone and the soft backbone to replace the backbone to construct our algorithm. The basic principle for using the approximate backbone is based on an empirical study, the fitness landscape analysis. However, the fitness landscape analysis does not characterize the relationship between the backbones and the approximate backbones exactly. A theoretical analysis could provide further knowledge for applying the approximate backbone. To our knowledge, the fitness landscape analysis is a useful empirical technology for approximately exploring the relationship between solutions [44], [35], [54]. This approximation between local optimal solutions and the optimal solutions can be viewed as a trade-off between theory and algorithm performance. In the fitness landscape analysis, we use the best known solutions to replace the optimal solutions. This replacement may bring some perturbation to the analysis results. Since the optimal solutions to large scale instances are always hard to find within polynomial time, we follow the existing approaches to choose the most similar substitutions, i.e., the best known solutions [44], [54].

The soft backbone, another approximation of the backbone in our work, is also experimentally evaluated. Experimental results on four classic instances (in Fig. 5) have indicated the necessity of the soft backbone. However, an exact theoretical analysis would be better suited to quantifying the power of the soft backbone, e.g., how to analyze the scale of the soft backbone for a given NRP instance. For both the approximate backbone and the soft backbone, a deep theoretical analysis may provide a further guideline for the design of backbone based algorithms.

In the experimental results, we evaluate our algorithm on two sets of the NRP instances, namely a set generated under given constraints and a set extracted from bug repositories. However, both of these two sets of instances may bring threats to validity of our experimental results. On one hand, the classic generated NRP instances are a series of controllable randomized instances. Compared with real requirements repositories, these generated instances could provide extra stochastic distributions for the requirements data. On the other hand, the newly extracted instances are more realistic since the items in bug repositories can be viewed as a kind of requirements information. However, the knowledge gap between bug repositories and requirements repositories may lead to a bias for the evaluation results. To avoid the bias between our instances and real requirements data, the best method is to build open large requirements repositories in the future.

TABLE 9
PERFORMANCE FOR MSSA, GA, AND BMA ON REALISTIC INSTANCES

Columns: Instance (Name, Bound); MSSA (Best, Average, Time); GA (Best, Average, Time); BMA (Best, Average, Time); Profit distribution (box plots for MSSA, GA, and BMA on an axis from -3.5 to +3.5); % (MSSA%, GA%).
Instances: nrp-e1-0.3, nrp-e1-0.5, nrp-e2-0.3, nrp-e2-0.5, nrp-e3-0.3, nrp-e3-0.5, nrp-e4-0.3, nrp-e4-0.5, nrp-m1-0.3, nrp-m1-0.5, nrp-m2-0.3, nrp-m2-0.5, nrp-m3-0.3, nrp-m3-0.5, nrp-m4-0.3, nrp-m4-0.5, nrp-g1-0.3, nrp-g1-0.5, nrp-g2-0.3, nrp-g2-0.5, nrp-g3-0.3, nrp-g3-0.5, nrp-g4-0.3, nrp-g4-0.5.

7 RELATED WORK

To our knowledge, this paper is the first work using backbone based algorithms to solve requirements engineering problems. In this section, we investigate the related work of this paper.

To balance customer profits and requirements costs, Bagnall et al. [4] first proposed the NRP in 2001. In this work, they model the NRP, provide the instance generation rules, and apply numerous search based algorithms to approximately solve the NRP. The problem most relevant to the NRP is Release Planning (RP) [21], which addresses selecting optimal releases to satisfy software requirements constraints [56] or release time [62], [40]. Both the NRP and the RP aim to find an optimal decision for requirements selection, especially dependency constraints based requirements selection. The NRP tends to address customer profits for the coming release while the RP tends to directly assign requirements for multiple releases. A recent review of the RP by Svahnberg et al. [59] lists and compares the related work of the RP.

Based on the number of problem objectives, the related work of the NRP can be divided into two categories, namely single-objective and multi-objective. In the single-objective NRP (or the NRP for short), such as the problem in this paper, the cost bound of a software release is predefined and the problem objective is to obtain the maximum profits from customers. For example, Greer & Ruhe [21] propose a genetic algorithm based approach to optimize software releases; Jiang et al. [31] propose an ant colony optimization algorithm to solve the NRP; Baker et al. [5] extend the NRP with component prioritization and solve this problem with the greedy algorithm and the simulated annealing. Moreover, for the resource allocation for software releases, Ngo-The & Ruhe [50] propose a two-phase optimization by combining integer programming to relax the search space and genetic programming to reduce the search space. In this paper, we address the large scale single-objective NRP. Our approach is to downgrade the problem scale in contrast to the existing algorithms, which solve the problems directly.

In the Multi-Objective NRP (MONRP), besides the profit, another objective is usually defined to minimize software costs. Zhang et al. [69] first proposed the MONRP and gave an empirical study with genetic algorithm based multi-objective optimization algorithms in 2007. Many extensions of the MONRP are studied to balance the benefits and resources, including fairness [17], sensitivity [25], and robustness with completion time [22]. Moreover, Saliu & Ruhe [57] detect feature coupling from both business perspectives and implementation perspectives; Zhang et al. [66] model two periods of profits to analyze the requirements under varying time. A recent work by Zhang & Harman [68] shows that the dependencies in requirements interaction management can be formulated as an extension of the MONRP.

The NRP is a combinatorial optimization model for requirements selection. Requirements selection and optimization have impacted numerous aspects of requirements engineering, including requirements management [55], [48], [49], requirements prioritization [32], [3], requirements triage [12], [14], and requirements visualization [16]. In addition, for further research in requirements selection, some work investigates requirements interdependencies to explore the relationship and conflicts between requirements [8], [65], [20], [68].

By fixing the optimal requirements for the next release, our work is a kind of Search Based Software Engineering (SBSE) approach for requirements engineering. In SBSE, software engineering problems are transformed into optimization problems for approximately solving with search technologies [24], [23]. One typical field of SBSE is search based software testing (e.g., [42], [36], [1], [43]). Some other fields of SBSE include design (e.g., [6], [53], [61]), quality (e.g., [37]), refactoring (e.g., [51]), reverse engineering (e.g., [34]), etc. Among the fields of SBSE, Search Based Requirements Engineering (SBRE) aims to manage requirements with search technologies [24]. Most of the work on the NRP and its relevant problems is a typical application of SBRE. A survey of SBRE shows the existing work and challenges in this field [67]. In our work, the backbone based algorithm is introduced to SBRE for the first time.

The backbone, a basic structure for reductions and refinements in our work, is a solving strategy for exploring the hard problems in combinatorial optimization [58], [64], [33], [30]. To our knowledge, besides our work, there is only one concept similar to the approximate backbone for search technologies in software engineering. That is, Mahdavi et al. [38] propose a building block based multiple hill climbing approach for the software module clustering problem. From the viewpoint of combinatorial optimization, a building block in [38] is also an intersection of local optimal solutions, like the approximate backbone. However, in our work, the concept of the soft backbone is first proposed in both software engineering and combinatorial optimization. The soft backbone and the multilevel strategy are combined with the approximate backbone to solve large scale search based problems.

Besides the backbone, the muscle and the fat in combinatorial optimization are two other effective technologies for constructing the solutions. The muscle of an instance is the union set of optimal solutions [29], [27] while the fat of an instance is the part without any optimal solution [11]. Drawing on the experiments from the existing work in combinatorial optimization, each of the backbone, the muscle, and the fat can be employed to further guide the algorithm design for problems in requirements engineering.
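The three structures can be stated compactly with set operations. The Python sketch below is an illustration only: solutions are modeled as sets of selected items, the function names are hypothetical, and the same intersection applied to local optimal solutions instead of optimal ones yields an approximate backbone in the sense used in this paper.

```python
def backbone(solutions):
    """Items shared by every solution (intersection); needs at least one solution."""
    return set.intersection(*(set(s) for s in solutions))


def muscle(solutions):
    """Items appearing in at least one solution (union)."""
    return set().union(*solutions)


def fat(universe, solutions):
    """Items of the instance that appear in no solution."""
    return set(universe) - muscle(solutions)


if __name__ == "__main__":
    # Hypothetical instance with five customers and two optimal solutions.
    customers = {"a", "b", "c", "d", "e"}
    optimal = [{"a", "b", "c"}, {"a", "b", "d"}]
    print(sorted(backbone(optimal)))        # ['a', 'b']
    print(sorted(muscle(optimal)))          # ['a', 'b', 'c', 'd']
    print(sorted(fat(customers, optimal)))  # ['e']
```

In this toy case, items in the backbone could be fixed into the solution and items in the fat could be discarded, which is the sense in which each structure can guide algorithm design.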

The multilevel approach is a kind of large scale optimization technology [60]. As we mentioned in Section 4.1, the key idea of the multilevel approach is to iteratively convert the original problem into multiple sub-problems so that the algorithm can downgrade the problem scale to apply existing algorithms. In this paper, our BMA is a multilevel approach for reducing the problem scale in requirements engineering.

Besides the multilevel approach, the cooperative co-evolution approach is one of the recently proposed technologies for large scale optimization [63], [52]. In contrast to the iterative reduction in the multilevel approach, the cooperative co-evolution approach employs the divide-and-conquer strategy to find the optimal solutions.
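The reduce, solve, and refine pattern of a multilevel approach can be sketched in a few lines of Python. This is not the BMA procedure of Section 4. It is a deliberately simplified illustration that assumes each customer has an independent positive cost (no shared requirements), uses a randomized greedy heuristic as the stand-in "existing algorithm", and takes the intersection of several greedy solutions as an approximate backbone. All names and parameters are hypothetical.

```python
import random


def greedy(profit, cost, bound, rng):
    """Randomized greedy: visit customers by noisy profit/cost ratio."""
    order = sorted(
        profit,
        key=lambda c: -(profit[c] / cost[c]) * rng.uniform(0.8, 1.2),
    )
    chosen, spent = set(), 0
    for c in order:
        if spent + cost[c] <= bound:
            chosen.add(c)
            spent += cost[c]
    return chosen


def multilevel_solve(profit, cost, bound, levels=2, samples=5, seed=0):
    """Fix the common part of sampled solutions, shrink the instance, repeat."""
    rng = random.Random(seed)
    fixed, remaining = set(), set(profit)
    for _ in range(levels):
        sub_p = {c: profit[c] for c in sorted(remaining)}
        sub_c = {c: cost[c] for c in sub_p}
        local = [greedy(sub_p, sub_c, bound, rng) for _ in range(samples)]
        common = set.intersection(*local)  # approximate backbone
        if not common:
            break
        # Reduction: fix the common customers and shrink the instance.
        fixed |= common
        remaining -= common
        bound -= sum(cost[c] for c in common)
    # Solve the reduced instance with the existing heuristic.
    sub_p = {c: profit[c] for c in sorted(remaining)}
    sub_c = {c: cost[c] for c in sub_p}
    reduced_solution = greedy(sub_p, sub_c, bound, rng)
    # Refinement: restore the fixed customers into the final solution.
    return fixed | reduced_solution


if __name__ == "__main__":
    # Hypothetical toy instance.
    profit = {"c1": 10, "c2": 8, "c3": 6, "c4": 3, "c5": 1}
    cost = {"c1": 4, "c2": 4, "c3": 3, "c4": 2, "c5": 5}
    selection = multilevel_solve(profit, cost, bound=9)
    print(sorted(selection), sum(cost[c] for c in selection))
```

The sketch keeps every intermediate selection within the remaining bound, so the refined solution never exceeds the original cost bound. Handling shared requirements and the soft backbone would require the full model described earlier in the paper.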

8 CONCLUSIONS AND FUTURE WORK

As an important problem in requirements engineering, the Next Release Problem (NRP) aims to balance customer profits and requirements costs for the project decision. In this paper, we propose a Backbone based Multilevel Algorithm (BMA) to solve the large scale NRP. Based on the approximate backbone and the soft backbone, BMA iteratively reduces the instance scales and refines the solutions to construct the final solution. Experimental results show that BMA can achieve better performance than direct solving approaches. In our work, we propose the soft backbone for the first time, which can be generated from the instance after the instance reduction. Moreover, we also propose a method to generate requirements data from open bug repositories. This method can be used to compensate for the lack of open requirements databases.

Our future work will focus on the application of BMA to other problems in software engineering. In requirements engineering, BMA can be used to solve many other large scale problems, such as release planning and requirements prioritization. The backbone based multilevel strategy can build a bridge between large scale problems and existing algorithms. We will explore some new problems, which may be solved by a strategy similar to that of BMA. In addition, we plan to develop BMA with a theoretical analysis, e.g., how to estimate the scale of the backbone without empirical methods. Apart from applications in requirements engineering, we want to apply BMA to the regression test case selection problem in software testing. The model of the regression test case selection problem is very similar to the NRP. Thus, the application of BMA can be extended to various fields in software engineering.

Another direction for future work is to explore the relationship between open bug repositories and requirements repositories. In this paper, we map items in the NRP to ones in open bug repositories. However, the domain knowledge behind these two kinds of repositories may bring a gap to the application from one repository to the other. We will conduct an empirical study to find out the details of this knowledge gap.

ACKNOWLEDGMENTS

We greatly thank our anonymous reviewers for their insightful comments and corrections. We thank Edward Prendergast and Qingna Fan with Intel Corporation for their helpful suggestions.

This work is partially supported by the National Natural Science Foundation of China under grants 60805024 and 61033012, and the National Research Foundation for the Doctoral Program of Higher Education of China under grant 20070141020.

REFERENCES

[1] S. Ali, L.C. Briand, H. Hemmati, and R.K. Panesar-Walawege, “A Systematic Review of the Application and Empirical Investigation of Search-Based Test Case Generation,” IEEE Trans. Software Engineering, vol. 36, no. 6, pp. 742-762, Nov./Dec. 2010, doi:10.1109/TSE.2009.52.
[2] J. Anvik, L. Hiew, and G.C. Murphy, “Who should Fix this Bug?,” Proc. 28th Intl. Conf. Software Engineering (ICSE), pp. 361-370, May 2006, doi:10.1145/1134285.1134336.
[3] J. Azar, R.K. Smith, and D. Cordes, “Value-Oriented Requirements Prioritization in a Small Development Organization,” IEEE Software, vol. 24, no. 1, Jan./Feb. 2007, doi:10.1109/MS.2007.30.
[4] A. Bagnall, V. Rayward-Smith, and I. Whittley, “The Next Release Problem,” Information and Software Technology, vol. 43, no. 14, pp. 883-890, Dec. 2001, doi:10.1016/S0950-5849(01)00194-X.
[5] P. Baker, M. Harman, K. Steinhofel, and A. Skaliotis, “Search Based Approaches to Component Selection and Prioritization for the Next Release Problem,” Proc. 22nd Intl. Conf. Software Maintenance (ICSM ’06), pp. 176-185, Sep. 2006, doi:10.1109/ICSM.2006.56.
[6] M. Bowman, L.C. Briand, and Y. Labiche, “Solving the Class Responsibility Assignment Problem in Object-Oriented Analysis with Multi-Objective Genetic Algorithms,” IEEE Trans. Software Engineering, vol. 36, no. 6, pp. 817-837, Nov./Dec. 2010, doi:10.1109/TSE.2010.70.
[8] P. Carlshamre, K. Sandahl, M. Lindvall, B. Regnell, and J. Natt och Dag, “An Industrial Survey of Requirements Interdependencies in Software Product Release Planning,” Proc. IEEE Intl. Symp. Requirements Engineering (ISRE), pp. 84-91, Aug. 2001, doi:10.1109/ISRE.2001.948547.
[9] C. Castro-Herrera, C. Duan, J. Cleland-Huang, and B. Mobasher, “A Recommender System for Requirements Elicitation in Large-scale Software Projects,” Proc. ACM Symp. Applied Computing (SAC), pp. 1419-1426, Mar. 2009, doi:10.1145/1529282.1529601.
[10] B.H.C. Cheng and J.M. Atlee, “Research Directions in Requirements Engineering,” Proc. Intl. Conf. Software Engineering Workshop Future of Software Engineering (FoSE ’07), pp. 285-303, May 2007, doi:10.1109/FOSE.2007.17.
[11] S. Climer and W. Zhang, “Searching for Backbones and Fat: A Limit-Crossing Approach with Applications,” Proc. National Conf. Artificial Intelligence (AAAI), pp. 707-712, Aug. 2002.
[12] A.M. Davis, “The Art of Requirements Triage,” Computer, vol. 36, no. 3, pp. 42-49, Mar. 2003, doi:10.1109/MC.2003.1185216.
[13] J. del Sagrado, I.M. del Águila, and F.J. Orellana, “Ant Colony Optimization for the Next Release Problem: A Comparative Study,” Proc. 2nd Intl. Symp. Search Based Software Engineering (SSBSE), pp. 67-76, Sep. 2010, doi:10.1109/SSBSE.2010.18.
[14] C. Duan, P. Laurent, J. Cleland-Huang, and C. Kwiatkowski, “Towards Automated Requirements Prioritization and Triage,” Requirements Engineering, vol. 14, no. 2, pp. 73-89, Jun. 2009, doi:10.1007/s00766-009-0079-7.
[16] M.S. Feather, S.L. Cornford, J.D. Kiper, and T. Menzies, “Experiences Using Visualization Techniques to Present Requirements, Risks to Them, and Options for Risk Mitigation,” Proc. 1st Intl. Workshop Requirements Engineering Visualization (REV ’06), pp. 10, Sep. 2006, doi:10.1109/REV.2006.2.
[17] A. Finkelstein, M. Harman, S.A. Mansouri, J. Ren, and Y. Zhang, “A Search Based Approach to Fairness Analysis in Requirement Assignments to Aid Negotiation, Mediation and Decision Making,” Requirements Engineering, vol. 14, no. 4, pp. 231-245, Dec. 2009, doi:10.1007/s00766-009-0075-y.
[18] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. New York, NY: W.H. Freeman, pp. 109-117, 1979.
[20] T. Gorschek and A.M. Davis, “Requirements Engineering: In Search of the Dependent Variables,” Information and Software Technology, vol. 50, no. 1-2, pp. 67-75, Jan. 2008, doi:10.1016/j.infsof.2007.10.003.
[21] D. Greer and G. Ruhe, “Software Release Planning: An Evolutionary and Iterative Approach,” Information and Software Technology, vol. 46, no. 4, pp. 243-253, Mar. 2004, doi:10.1016/j.infsof.2003.07.002.
[22] S. Gueorguiev, M. Harman, and G. Antoniol, “Software Project Planning for Robustness and Completion Time in the Presence of Uncertainty Using Multi Objective Search Based Software Engineering,” Proc. 11th Ann. Conf. Genetic and Evolutionary Computation (GECCO ’09), pp. 1673-1680, Jul. 2009, doi:10.1145/1569901.1570125.
[23] M. Harman, “The Current State and Future of Search Based Software Engineering,” Proc. Intl. Conf. Software Engineering Workshop Future of Software Engineering (FoSE ’07), pp. 342-357, May 2007, doi:10.1109/FOSE.2007.29.
[24] M. Harman and B.F. Jones, “Search Based Software Engineering,” Information and Software Technology, vol. 43, no. 14, pp. 833-839, Dec. 2001, doi:10.1016/S0950-5849(01)00189-6.
[25] M. Harman, J. Krinke, J. Ren, and S. Yoo, “Search Based Data Sensitivity Analysis Applied to Requirement Engineering,” Proc. 11th Ann. Conf. Genetic and Evolutionary Computation (GECCO ’09), pp. 1681-1688, Jul. 2009, doi:10.1145/1569901.1570126.
[26] F. Hutter, H.H. Hoos, K. Leyton-Brown, and T. Stützle, “ParamILS: An Automatic Algorithm Configuration Framework,” Journal of Artificial Intelligence Research, vol. 36, no. 1, pp. 267-306, Sep. 2009, doi:10.1613/jair.2861.
[27] H. Jiang and Y. Chen, “An Efficient Algorithm for Generalized Minimum Spanning Tree Problem,” Proc. 12th Ann. Conf. Genetic and Evolutionary Computation (GECCO), pp. 217-224, Jul. 2010, doi:10.1145/1830483.1830525.
[28] H. Jiang, J. Xuan, and Z. Ren, “Approximate Backbone Based Multilevel Algorithm for Next Release Problem,” Proc. 12th Ann. Conf. Genetic and Evolutionary Computation (GECCO), pp. 1333-1340, Jul. 2010, doi:10.1145/1830483.1830730.
[29] H. Jiang, J. Xuan, and X. Zhang, “An Approximate Muscle Guided Global Optimization Algorithm for the Three-Index Assignment Problem,” Proc. IEEE Cong. Evolutionary Computation (CEC), pp. 2404-2410, Jun. 2008, doi:10.1109/CEC.2008.4631119.
[30] H. Jiang, X. Zhang, G. Chen, and M. Li, “Backbone Analysis and Algorithm Design for the Quadratic Assignment Problem,” Science in China Series F: Information Science, vol. 51, no. 5, pp. 476-488, May 2008, doi:10.1007/s11432-008-0042-0.
[31] H. Jiang, J. Zhang, J. Xuan, Z. Ren, and Y. Hu, “A Hybrid ACO Algorithm for the Next Release Problem,” Proc. Intl. Conf. Software Engineering and Data Mining (SEDM), pp. 166-171, Jun. 2010.
[32] J. Karlsson and K. Ryan, “A Cost-Value Approach for Prioritizing Requirements,” IEEE Software, vol. 14, no. 5, pp. 67-74, Sep./Oct. 1997, doi:10.1109/52.605933.
[33] P. Kilby, J. Slaney, and T. Walsh, “The Backbone of the Travelling Salesperson,” Proc. 19th Intl. Joint Conf. Artificial Intelligence (IJCAI ’05), pp. 175-180, Jul. 2005.
[34] K. Krogmann, M. Kuperberg, and R. Reussner, “Using Genetic Search for Reverse Engineering of Parametric Behaviour Models for Performance Prediction,” IEEE Trans. Software Engineering, vol. 36, no. 6, pp. 865-877, Nov./Dec. 2010, doi:10.1109/TSE.2010.69.
[35] S. Kryazhimskiy, G. Tkačik, and J.B. Plotkin, “The Dynamics of Adaptation on Correlated Fitness Landscapes,” Proc. National Academy of Sciences USA (PNAS), vol. 106, no. 44, pp. 18638-18643, Nov. 2009, doi:10.1073/pnas.0905497106.
[36] Z. Li, M. Harman, and R.M. Hierons, “Search Algorithms for Regression Test Case Prioritization,” IEEE Trans. Software Engineering, vol. 33, no. 4, pp. 225-237, Apr. 2007, doi:10.1109/TSE.2007.38.
[37] Y. Liu, T.M. Khoshgoftaar, and N. Seliya, “Evolutionary Optimization of Software Quality Modeling with Multiple Repositories,” IEEE Trans. Software Engineering, vol. 36, no. 6, pp. 852-864, Nov./Dec. 2010, doi:10.1109/TSE.2010.51.
[38] K. Mahdavi, M. Harman, and R.M. Hierons, “A Multiple Hill Climbing Approach to Software Module Clustering,” Proc. 19th Intl. Conf. Software Maintenance (ICSM ’03), pp. 315-324, Sep. 2003, doi:10.1109/ICSM.2003.1235437.
[39] R. Martí, “Multi-Start Methods,” Handbook of Metaheuristics, F. Glover and G.A. Kochenberger, eds., Intl. Series Operations Research and Management Science, New York: Kluwer, pp. 355-368, 2003.
[40] J. McElroy and G. Ruhe, “When-to-Release Decisions for Features with Time-Dependent Value Functions,” Requirements Engineering, vol. 15, no. 3, pp. 337-358, Sep. 2010, doi:10.1007/s00766-010-0097-5.
[41] R. McGill, J.W. Tukey, and W.A. Larsen, “Variations of Box Plots,” The American Statistician, vol. 32, no. 1, pp. 12-16, Feb. 1978.
[42] P. McMinn, “Search-Based Software Test Data Generation: A Survey,” Software Testing, Verification and Reliability, vol. 14, no. 2, pp. 105-156, Jun. 2004, doi:10.1002/stvr.294.
[43] P. McMinn, M. Harman, K. Lakhotia, Y. Hassoun, and J. Wegener, “Input Domain Reduction through Irrelevant Variable Removal and its Effect on Local, Global and Hybrid Search-Based Structural Test Data Generation,” IEEE Trans. Software Engineering, preprint, 10 Feb. 2011, doi:10.1109/TSE.2011.18.
[44] P. Merz and B. Freisleben, “Fitness Landscape Analysis and Memetic Algorithms for the Quadratic Assignment Problem,” IEEE Trans. Evolutionary Computation, vol. 4, no. 4, pp. 337-352, Nov. 2000, doi:10.1109/4235.887234.
[45] Z. Michalewicz and D.B. Fogel, How to Solve it: Modern Heuristics. Berlin Heidelberg, Germany: Springer-Verlag, pp. 35-43, 2000.
[46] Mining Challenges 2007 and 2009 of IEEE Working Conference on Mining Software Repositories (MSR), http://msr.uwaterloo.ca/msr2007/challenge/ and http://msr.uwaterloo.ca/msr2009/challenge/, last accessed in Jul. 2011.
[48] J. Natt och Dag, B. Regnell, V. Gervasi, and S. Brinkkemper, “A Linguistic-Engineering Approach to Large-Scale Requirements Management,” IEEE Software, vol. 22, no. 1, pp. 32-39, Jan./Feb. 2005, doi:10.1109/MS.2005.1.
[49] J. Natt och Dag, T. Thelin, and B. Regnell, “An Experiment on Linguistic Tool Support for Consolidation of Requirements from Multiple Sources in Market-Driven Product Development,” Empirical Software Engineering, vol. 11, no. 2, pp. 303-329, Jun. 2006, doi:10.1007/s10664-006-6405-5.
[50] A. Ngo-The and G. Ruhe, “Optimized Resource Allocation for Software Release Planning,” IEEE Trans. Software Engineering, vol. 35, no. 1, pp. 109-123, Jan./Feb. 2009, doi:10.1109/TSE.2008.80.
[51] M. O’Keeffe and M. Ó Cinnéide, “Search-Based Refactoring for Software Maintenance,” Journal of Systems and Software, vol. 81, no. 4, pp. 502-516, Apr. 2008, doi:10.1016/j.jss.2007.06.003.
[52] M.N. Omidvar, X. Li, Z. Yang, and X. Yao, “Cooperative Co-evolution for Large Scale Optimization Through More Frequent Random Grouping,” Proc. IEEE Cong. Evolutionary Computation (CEC ’10), pp. 1754-1761, Jul. 2010, doi:10.1109/CEC.2010.5586127.
[53] K. Praditwong, M. Harman, and X. Yao, “Software Module Clustering as a Multi-Objective Search Problem,” IEEE Trans. Software Engineering, vol. 37, no. 2, pp. 264-282, Mar./Apr. 2011, doi:10.1109/TSE.2010.26.
[54] M. Qasem and A. Prügel-Bennett, “Learning the Large-Scale Structure of the MAX-SAT Landscape Using Populations,” IEEE Trans. Evolutionary Computation, vol. 14, no. 4, pp. 518-529, Aug. 2010, doi:10.1109/TEVC.2009.2033579.
[55] B. Regnell, L. Karlsson, and M. Höst, “An Analytical Model for Requirements Selection Quality Evaluation in Product Software Development,” Proc. IEEE Intl. Conf. Requirements Engineering (ICRE), pp. 254-263, Sep. 2003, doi:10.1109/ICRE.2003.1232757.
[56] G. Ruhe and M.O. Saliu, “The Art and Science of Software Release Planning,” IEEE Software, vol. 22, no. 6, pp. 47-53, Nov. 2005, doi:10.1109/MS.2005.164.
[57] M.O. Saliu and G. Ruhe, “Bi-objective Release Planning for Evolving Software Systems,” Proc. 6th Joint Meeting European Software Engineering Conf. and ACM SIGSOFT Symp. Foundations of Software Engineering (ESEC/FSE ’07), pp. 105-114, Sep. 2007, doi:10.1145/1287624.1287641.
[58] J. Slaney and T. Walsh, “Backbones in Optimization and Approximation,” Proc. 17th Intl. Joint Conf. Artificial Intelligence (IJCAI), pp. 254-259, 2001.
[59] M. Svahnberg, T. Gorschek, R. Feldt, R. Torkar, S.B. Saleem, and M.U. Shafique, “A Systematic Review on Strategic Release Planning Models,” Information and Software Technology, vol. 52, no. 3, pp. 237-248, Mar. 2010, doi:10.1016/j.infsof.2009.11.006.
[60] C. Walshaw, “A Multilevel Approach to the Traveling Salesman Problem,” Operations Research, vol. 50, no. 5, pp. 862-877, Sep. 2002.
[61] Z. Wang, K. Tang, and X. Yao, “Multi-Objective Approaches to Optimal Testing Resource Allocation in Modular Software Systems,” IEEE Trans. Reliability, vol. 59, no. 3, pp. 563-575, Sep. 2010, doi:10.1109/TR.2010.2057310.
[62] B. Yang, H. Hu, and L. Jia, “A Study of Uncertainty in Software Cost and Its Impact on Optimal Software Release Time,” IEEE Trans. Software Engineering, vol. 34, no. 6, pp. 813-825, Nov./Dec. 2008, doi:10.1109/TSE.2008.47.
[63] Z. Yang, K. Tang, and X. Yao, “Large Scale Evolutionary Optimization Using Cooperative Coevolution,” Information Sciences, vol. 178, no. 15, pp. 2985-2999, Aug. 2008, doi:10.1016/j.ins.2008.02.017.
[64] W. Zhang, “Configuration Landscape Analysis and Backbone Guided Local Search. Part I: Satisfiability and Maximum Satisfiability,” Artificial Intelligence, vol. 158, no. 1, pp. 1-26, Sep. 2004, doi:10.1016/j.artint.2004.04.001.
[65] W. Zhang, H. Mei, and H. Zhao, “Feature-Driven Requirement Dependency Analysis and High-Level Software Design,” Requirements Engineering, vol. 11, no. 3, pp. 205-220, Jun. 2006, doi:10.1007/s00766-006-0033-x.
[66] Y. Zhang, E. Alba, J.J. Durillo, S. Eldh, and M. Harman, “Today/Future Importance Analysis,” Proc. 12th Ann. Conf. Genetic and Evolutionary Computation (GECCO), pp. 1357-1364, Jul. 2010, doi:10.1145/1830483.1830733.
[67] Y. Zhang, A. Finkelstein, and M. Harman, “Search Based Requirements Optimisation: Existing Work and Challenges,” Requirements Engineering: Foundation for Software Quality, B. Paech et al., eds., LNCS 5025, Berlin Heidelberg: Springer-Verlag, pp. 88-94, 2008, doi:10.1007/978-3-540-69062-7_8.
[68] Y. Zhang and M. Harman, “Search Based Optimization of Requirements Interaction Management,” Proc. 2nd Intl. Symp. Search Based Software Engineering (SSBSE ’10), pp. 47-56, Sep. 2010, doi:10.1109/SSBSE.2010.16.
[69] Y. Zhang, M. Harman, and S.A. Mansouri, “The Multi-Objective Next Release Problem,” Proc. 9th Ann. Conf. Genetic and Evolutionary Computation (GECCO ’07), pp. 1129-1136, Jul. 2007, doi:10.1145/1276958.1277179.

Jifeng Xuan received the BSc degree in software engineering from Dalian University of Technology, China, in 2007. He is currently working toward the PhD degree at Dalian University of Technology. His research interests include search based software engineering, mining software repositories, and machine learning. He is a student member of the China Computer Federation (CCF).

He Jiang received the BSc and PhD degrees in computer science from University of Science and Technology of China (USTC), China, in 1999 and 2005, respectively. He is currently an associate professor at School of Software, Dalian University of Technology, China. His research interests include computational intelligence and its applications in software engineering and data mining. He is a member of the IEEE and the China Computer Federation (CCF).

Zhilei Ren received the BSc degree in software engineering from Dalian University of Technology, China, in 2007. He is currently working toward the PhD degree at Dalian University of Technology. His research interests include metaheuristic algorithm design, data mining, and their applications in software engineering. He is a student member of the China Computer Federation (CCF).