Efficient Simulation Budget Allocation for Subset Selection Using Regression Metamodels
Fei Gao^a, Zhongshun Shi^b, Siyang Gao^c, Hui Xiao^d

^a Network Planning Department, SF Technology, Shenzhen 518052, China
^b Department of Industrial and Systems Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
^c Department of Systems Engineering and Engineering Management, City University of Hong Kong, Hong Kong
^d School of Statistics, Southwestern University of Finance and Economics, Chengdu 611130, China
Abstract
This research considers the ranking and selection (R&S) problem of selecting the optimal subset from a finite set of alternative designs. Given the total simulation budget constraint, we aim to maximize the probability of correctly selecting the top-m designs. In order to improve the selection efficiency, we incorporate the information from across the domain into regression metamodels. In this research, we assume that the mean performance of each design is approximately quadratic. To achieve a better fit of this model, we divide the solution space into adjacent partitions such that the quadratic assumption can be satisfied within each partition. Using the large deviations theory, we propose an approximately optimal simulation budget allocation rule in the presence of partitioned domains. Numerical experiments demonstrate that our approach can enhance the simulation efficiency significantly.

Key words:
Simulation optimization; ranking and selection; OCBA; subset selection; regression.
Discrete-event systems (DES) simulation has played an important role in analyzing modern complex systems and evaluating decision problems, since these systems are usually too difficult to describe using analytical models. DES simulation has become a common analysis method of choice and is widely used in many practical applications, such as queueing systems, electric power grids, air and land traffic control systems, manufacturing plants and supply chains (Xu et al., 2015, 2016; Gao and Chen, 2016). However, running the simulation model is usually time consuming, and a large number of simulation replications are typically required to achieve an accurate estimate of a design decision (Lee et al., 2010). In addition, it can be computationally quite expensive to select the best design(s) when the number of alternatives is relatively large.

In this paper, we consider the problem of selecting the optimal subset of the top-m designs out of t alternatives, where the performance of each design is estimated based on simulation. In order to improve the selection efficiency, we aim to intelligently allocate the simulation replications to each design to maximize the probability of correctly selecting all the top-m designs. This problem setting falls in the well-established statistics branch known as ranking and selection (R&S) (Xu et al., 2015).

In the literature, several types of efficient R&S procedures have been developed. The indifference-zone (IZ) approach allocates the simulation budget to provide a guaranteed lower bound for the probability of correct selection (PCS)

Email addresses: [email protected] (Fei Gao), [email protected] (Zhongshun Shi), [email protected] (Siyang Gao), [email protected] (Hui Xiao).
25 April 2019. arXiv preprint [math.OC].

(Kim and Nelson, 2001). Chen et al. (2000) proposed an optimal computing budget allocation (OCBA) approach for R&S problems. The OCBA approach allocates the simulation replications sequentially in order to maximize
PCS under a simulation budget constraint. He et al. (2007), Gao and Shi (2015) and Gao et al. (2017a) further developed the OCBA method with the expected opportunity cost (EOC) measure, which focuses more on the consequence of a wrong selection compared to PCS. Brantley et al. (2013) proposed another approach called optimal simulation design (OSD) to select the best design with regression metamodels. It assumes that all designs fit a single quadratic curve and that the noise variance is identical across designs. The OSD approach was further extended in Brantley et al. (2014), Xiao et al. (2015) and Gao et al. (2018) to consider more general problems by dividing the solution space into adjacent partitions. Although these studies are also based on partitioning or metamodeling, they do not aim to select the top-m designs, and are therefore different in objectives from this research. Other variants of OCBA include selecting the best design considering resource sharing and allocation (Peng et al., 2013) and input uncertainty (Gao et al., 2017b).

Most of the existing R&S procedures focus on identifying the best design and return a single choice as the estimated optimum. However, decision makers may prefer to have several good alternatives instead of one and make the final selection by considering some qualitative criteria, such as political feasibility and environmental considerations, which might be neglected by computer models (Gao and Chen, 2016; Zhang et al., 2016). A selection procedure providing the top-m designs can help the decision makers make their final decision in a more flexible way.

The literature on optimal subset selection is sparse. Chen et al. (2008) and Zhang et al. (2016) considered the optimal subset selection problem using the OCBA framework, which maximizes PCS under a simulation budget constraint. In Gao and Chen (2015), an optimal subset selection procedure was proposed to minimize the measure of
EOC. The optimal subset selection problem was further extended in Gao and Chen (2016) to general underlying distributions using the large deviations theory.

The aforementioned R&S procedures can smartly allocate the computing budget given the simulation results. They estimate the performance of each design by considering only the sample information of that design itself. However, nearby designs could also provide useful information, since neighboring designs usually have similar performance. Based on this idea, we aim to improve the selection efficiency by incorporating the information from across the domain into response surfaces. Unlike traditional R&S methods, the regression based approaches require simulation experiments on only a subset of all the designs under consideration. The performances of the other designs can be inferred based on the sample information of the simulated designs. This provides us an effective way to further improve the efficiency for solving the subset selection problem, which is the motivation of this paper.

In this research, we assume the underlying function is quadratic or approximately quadratic. This assumption helps utilize the structural information of the design space and leads to significant improvement of the computational efficiency. It is commonly used in the literature, such as Brantley et al. (2013, 2014); Xiao et al. (2015); McConnell and Servaes (1990). Based on this assumption, we build a quadratic regression metamodel to incorporate the information from across the domain. The first contribution of this work is that we propose an asymptotically optimal allocation rule that determines which designs need to be simulated and the amount of simulation budget allocated to them, such that the PCS of the optimal subset is maximized. We call this procedure the optimal computing budget allocation for selecting the top-m designs with regression (OCBA-mr). In order to further extend the OCBA-mr procedure to more general cases where the underlying function is partially quadratic or non-quadratic, we divide the solution space into adjacent partitions and build a quadratic regression metamodel within each partition. The underlying function in each partition can be well approximated by a quadratic function if the solution space is properly partitioned or each partition is small enough. According to the results in Brantley et al. (2014); Xiao et al. (2015); Gao et al. (2018), the use of partitioned domains along with regression metamodels can significantly improve the simulation efficiency. That means partitioning the solution space can be an effective way to achieve further improvement. For different problems, the solution space can be divided into discrete partitions using different criteria, such as the size of corporations, the type of industries and the temperature of a chemical process (Xiao et al., 2015). Based on this idea, we develop an asymptotically optimal computing budget allocation procedure for selecting the top-m designs with regression in partitioned domains (OCBA-mrp), which is an extension of the OCBA-mr procedure to more general cases. In order to maximize the PCS of the optimal subset, the OCBA-mrp procedure determines not only the optimal simulation budget allocation within each partition but also the optimal budget allocation between partitions.

The rest of the paper is organized as follows. In Section 2, we formulate the optimal subset selection problem with a regression metamodel and derive an asymptotically optimal simulation budget allocation rule, called OCBA-mr. Section 3 extends the OCBA-mr method to more general cases with partitioned domains and derives another asymptotically optimal simulation budget allocation rule, called OCBA-mrp. The performance of the proposed methods is illustrated with numerical examples in Section 4. Section 5 concludes the paper.
In this section, we provide an optimal computing budget allocation rule for the subset selection problem based on the regression metamodel.
Without loss of generality, the best design is defined as the design with the smallest mean performance. We introduce the following notations:

- t: total number of designs;
- x_i: location of design i;
- m: index of the design with the m-th smallest mean value;
- f(x_i): mean performance value for design i;
- Y(x_i): simulation output for design i;
- β: vector (β_0, β_1, β_2)^T, representing the coefficients of the regression model;
- f̂(x_i): estimate of f(x_i) based on the regression model;
- S_o: set of the true top-m designs;
- S_n: set of designs not in S_o (complement of S_o);
- n: total number of simulation replications;
- n_i: number of simulation replications allocated to design i;
- α: vector (α_1, α_s, α_t), where α_r is the proportion of the total simulation budget n allocated to design r, i.e., α_r = n_r/n, r = 1, s, t.

In the last bulleted item, r takes values only at 1, s and t, where designs 1 and t are the first and last designs in the solution space, and design s is an intermediate design determined by (5) (the rationale will be explained in more detail in Theorem 2). The problem considered in this paper is to select the top-m designs out of the t alternatives by allocating simulation replications to these three designs. The optimal set is

S_o = { i : max_{i ∈ S_o} f(x_i) < min_{j ∈ S_n} f(x_j), i, j = 1, ..., t and |S_o| = m },

where |S| is the size of subset S. We study the problem where the expected performance value across the solution space is quadratic or approximately quadratic in nature. Then, the mean performance of design i can be written as

f(x_i) = β_0 + β_1 x_i + β_2 x_i^2, i = 1, ..., t.

The coefficients β = (β_0, β_1, β_2)^T are unknown beforehand, but can be estimated via simulation samples. Assume that the noise ε of the simulation experiments on design i follows the normal distribution N(0, σ^2). For each design, the noise is independent from replication to replication.
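As a concrete illustration of this setup, the following sketch builds a quadratic mean surface over t = 10 design locations and extracts the true top-m subset S_o. All constants (locations, coefficients, m) are illustrative, not from the paper.

```python
import numpy as np

def true_top_m(x, beta, m):
    """Return the index set S_o of the m designs with the smallest means
    f(x_i) = beta_0 + beta_1 * x_i + beta_2 * x_i^2."""
    f = beta[0] + beta[1] * x + beta[2] * x ** 2
    return set(np.argsort(f)[:m])

x = np.arange(1, 11, dtype=float)      # t = 10 design locations
beta = (10.0, -2.0, 0.21)              # convex quadratic, minimum near x = 4.8
S_o = true_top_m(x, beta, m=3)         # indices of the three smallest means
```

Here `true_top_m` is a hypothetical helper; the paper never simulates all t designs, but the true S_o defined this way is what the PCS in (2) refers to.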
The simulation output is Y(x_i) = f(x_i) + ε(x_i) with ε(x_i) ~ N(0, σ^2). Given a total of n samples, we define Y as the vector of the n simulation samples Y(x_i) and X as the n × 3 matrix whose row (1, x_i, x_i^2) corresponds to each entry Y(x_i) of Y. We estimate the coefficients β using the ordinary least squares (OLS) method (Hayashi, 2000) and denote the estimates as β̂ = (β̂_0, β̂_1, β̂_2)^T. Then, we have β̂ = (X^T X)^{-1} X^T Y and Cov[β̂] = σ^2 (X^T X)^{-1}, where ^T denotes transposition and X^T X is known as the information matrix (Kiefer, 1959). The estimate of f(x_i) can be written as

f̂(x_i) = β̂_0 + β̂_1 x_i + β̂_2 x_i^2, i = 1, ..., t. (1)

We can use equation (1), which incorporates the sample information of the simulated designs, to estimate the expected performance of each design across the solution space. As f̂(x_i) is a linear combination of β̂, we have

Var[f̂(x_i)] = σ^2 X_i^T (X^T X)^{-1} X_i, where X_i^T = (1, x_i, x_i^2).

Due to the uncertainty of the estimate of the underlying function, a correct selection of the optimal subset S_o may not always occur. Therefore, we introduce a measure, the probability of correct selection (PCS), to formulate the R&S problem considered in this paper. The
PCS is given by

PCS = P{ ∩_{i ∈ S_o} ∩_{j ∈ S_n} ( f̂(x_i) ≤ f̂(x_j) ) }.

Given a fixed simulation budget, the optimization problem can be written as follows:

max_{n_i} PCS = P{ ∩_{i ∈ S_o} ∩_{j ∈ S_n} ( f̂(x_i) ≤ f̂(x_j) ) }
s.t. Σ_{i=1}^{t} n_i = n. (2)

In this section, we aim to solve problem (2), where the mean performance of each design is estimated using the regression metamodel. Due to the uncertainty in the simulation experiments, multiple simulation replications are needed to estimate the underlying function f̂(x_i) accurately. The variance of f̂(x_i) is a function of the information matrix X^T X and can be reduced if additional simulation replications are conducted (Xiao et al., 2015). We are interested in how to intelligently allocate the simulation budget to proper designs such that f̂(x_i) can be better estimated using the regression metamodel, and the PCS in problem (2) can be maximized.
A major difficulty in solving problem (2) is that the objective function PCS does not have a closed-form expression. We seek to solve this optimization problem under an asymptotic framework in which the PCS is maximized, or equivalently the probability of false selection (PFS = 1 − PCS) is minimized, as n goes to infinity.

The PCS used in this paper is defined based on the quadratic regression model (1), which is constructed using the simulation information of only a fraction of the t designs. We call the designs receiving simulation replications the support designs. In order to construct the quadratic regression model, we need at least three support designs to obtain all of the information in X^T X (Kiefer, 1959). For simplicity, in this research we let the number of support designs be three, two of which are at the extreme locations, i.e., x_1 and x_t (for this setting, see, e.g., Brantley et al. (2013, 2014); Xiao et al. (2015); Kiefer (1959)). The case with more than three support designs can be similarly analyzed.

Lemma 1: Let d̂_{i,j} = f̂(x_i) − f̂(x_j), i, j = 1, 2, ..., t, i ≠ j. Then d̂_{i,j} follows the normal distribution N( f(x_i) − f(x_j), ς_{ij}^2 ), where

ς_{ij}^2 = σ^2 (0, x_i − x_j, x_i^2 − x_j^2) (X^T X)^{-1} (0, x_i − x_j, x_i^2 − x_j^2)^T
         = (σ^2/n) ( ρ_{ij,1}^2/α_1 + ρ_{ij,s}^2/α_s + ρ_{ij,t}^2/α_t ),

and

ρ_{ij,1} = [ (x_s − x_i)(x_t − x_i) − (x_s − x_j)(x_t − x_j) ] / [ (x_1 − x_s)(x_1 − x_t) ],
ρ_{ij,s} = [ (x_1 − x_i)(x_t − x_i) − (x_1 − x_j)(x_t − x_j) ] / [ (x_s − x_1)(x_s − x_t) ],
ρ_{ij,t} = [ (x_1 − x_i)(x_s − x_i) − (x_1 − x_j)(x_s − x_j) ] / [ (x_t − x_1)(x_t − x_s) ].

The proof is similar to that of the corresponding result for the best-design case (m = 1), and hence is omitted for brevity.

For any i, j = 1, 2, ..., t, i ≠ j, d̂_{i,j} is a normally distributed random variable. We can use the large deviations theory to derive the convergence rate function of the false selection probability.
Lemma 2: The convergence rate function of the incorrect comparison probability for each design is

− lim_{n→∞} (1/n) log P{ f̂(x_i) > f̂(x_m) }, i ∈ S_o, i ≠ m,
− lim_{n→∞} (1/n) log P{ f̂(x_i) < f̂(x_m) }, i ∈ S_n,
= R_{m,i}(α) = ( f(x_m) − f(x_i) )^2 / [ 2σ^2 ( ρ_{mi,1}^2/α_1 + ρ_{mi,s}^2/α_s + ρ_{mi,t}^2/α_t ) ].

Based on the results in Lemma 2, we can get an explicit expression of the convergence rate function of
PFS.

Lemma 3: The convergence rate function of PFS is

− lim_{n→∞} (1/n) log PFS = min { min_{i ∈ S_o, i ≠ m} R_{m,i}(α), min_{j ∈ S_n} R_{m,j}(α) }.

The main assertion of Lemma 3 is that the overall convergence rate of PFS is determined by the minimum convergence rate of the incorrect comparison for each design. Minimizing the PFS is asymptotically equivalent to maximizing the rate at which PFS goes to zero as a function of α, i.e., maximizing − lim_{n→∞} (1/n) log PFS. Based on Lemma 3, the asymptotic version of (2) becomes

max min { min_{i ∈ S_o, i ≠ m} R_{m,i}(α), min_{j ∈ S_n} R_{m,j}(α) }
s.t. α_1 + α_s + α_t = 1,
     α_1, α_s, α_t ≥ 0. (3)

In this subsection, we seek to derive the optimality conditions for (3). Since the overall convergence rate of
PFS is determined by the design with the minimum convergence rate, a false selection is most likely to happen at this key design. Therefore, it is enough for us to investigate the convergence rate of the key design across the solution space. We define i* as the key design, that is,

i* = arg min_{i = 1,...,t} { min_{i ∈ S_o, i ≠ m} R_{m,i}(α), min_{i ∈ S_n} R_{m,i}(α) }.

Theorem 1
The optimization problem (2) can be asymptotically optimized with the following allocation rule:

α_r* = |ρ_{mi*,r}| / ( |ρ_{mi*,1}| + |ρ_{mi*,s}| + |ρ_{mi*,t}| ), r = 1, s, t,
α_r* = 0, otherwise. (4)

The ρ_{mi*,r} is also known as the Lagrange interpolating polynomial coefficient (De la Garza et al., 1954; Burden and Faires, 2001). It represents the relative importance of each support design for estimating d̂_{mi} = f̂(x_m) − f̂(x_i). Theorem 1 indicates that the support design r will receive more simulation budget if it has a larger |ρ_{mi*,r}|.

Given the optimal allocation rule (Theorem 1), we next determine the optimal location for the support design s.

Theorem 2: The rate function of
PFS with an allocation rule satisfying (4) can be maximized if the support design s satisfies the following equation:

x_s = x_{i*} + x_m − x_1,   if (3x_1 + x_t)/2 ≤ x_{i*} + x_m < x_1 + x_t,
x_s = x_{i*} + x_m − x_t,   if x_1 + x_t < x_{i*} + x_m ≤ (x_1 + 3x_t)/2,
x_s = (x_1 + x_t)/2,        otherwise. (5)

When the x_s derived from (5) does not correspond to any design available, we round it to the nearest one. The expression of Theorem 2 is similar to the results in Brantley et al. (2013). The difference lies in the selection of the key design x_{i*}. We define x_{i*} as the design with the minimum convergence rate of PFS, where a false selection of the optimal subset is most likely to happen.

The regression based method above can greatly improve the subset selection efficiency compared to the traditional methods. However, it is constrained by typical assumptions such as a quadratic underlying function for the means. It is possible that the underlying function is neither quadratic nor approximately quadratic, so that we would fail to find the top-m designs. In order to extend our method to more general cases, we divide the solution space into adjacent partitions. The quadratic pattern can be expected when the solution space is properly partitioned or each partition is small enough.
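The two rules combine into a small procedure: place the middle support design by (5), then split the budget by (4). The sketch below follows our reconstruction of the piecewise thresholds in (5); all numerical inputs are illustrative.

```python
def support_location(x1, xt, x_key, x_m):
    """Rule (5): location of the middle support design s, given the extreme
    supports x1, xt, the key design x_key = x_{i*}, and x_m."""
    s = x_key + x_m
    if (3 * x1 + xt) / 2 <= s < x1 + xt:
        return s - x1
    if x1 + xt < s <= (x1 + 3 * xt) / 2:
        return s - xt
    return (x1 + xt) / 2                 # symmetric case: midpoint

def alloc(rho1, rhos, rhot):
    """Rule (4): budget fractions for support designs 1, s, t, proportional
    to the absolute Lagrange coefficients."""
    tot = abs(rho1) + abs(rhos) + abs(rhot)
    return abs(rho1) / tot, abs(rhos) / tot, abs(rhot) / tot
```

For example, with supports at 1 and 10, a key design at 6 and x_m = 5 the sum is exactly x_1 + x_t, so the rule falls back to the midpoint 5.5.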
We first add the following notations for partitioned domains:

- l: number of partitions of the entire domain;
- k_h: number of designs in partition h, h = 1, 2, ..., l;
- i_h: i-th design in partition h, i = 1, 2, ..., k_h (when i = k_h, we denote design k_h_h as design k_h for notational simplicity);
- x_{i_h}: location of design i_h;
- b: the partition containing the design with the m-th smallest mean value;
- m_b: design with the m-th smallest mean value;
- β_h: vector (β_{0h}, β_{1h}, β_{2h})^T, representing the coefficients of the regression model in partition h;
- n_{h•}: number of simulation replications allocated to partition h;
- n_{i_h}: number of simulation replications allocated to design i_h;
- θ_h: proportion of the simulation budget allocated to partition h, i.e., θ_h = n_{h•}/n;
- α_h: vector (α_{1_h}, α_{s_h}, α_{k_h}), where α_{r_h} is the proportion of the simulation budget n_{h•} allocated to design r_h, i.e., α_{r_h} = n_{r_h}/n_{h•}, r_h = 1_h, s_h, k_h.

The notations f(x_{i_h}), Y(x_{i_h}) and f̂(x_{i_h}) defined for partitioned domains are similar to those in Section 2, except that design i is replaced by i_h. The entire domain is divided into l adjacent partitions. Each partition contains k_h designs, i.e., there are Σ_{h=1}^{l} k_h = t designs in total.

We assume there exists ε ∈ R such that exactly m of the t designs have mean performances less than ε and the remaining t − m designs have mean performances greater than ε. This ensures that the optimal subset is well defined and can be distinguished. We define the optimal subset S_o as

S_o = { i_h : max_{i_h ∈ S_o} f(x_{i_h}) < min_{j_g ∈ S_n} f(x_{j_g}), i_h = 1_h, ..., k_h, j_g = 1_g, ..., k_g, h, g = 1, ..., l, and |S_o| = m }.
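The ε-separability assumption above can be checked directly from a vector of (true or estimated) means; a minimal sketch, where a tie at the m-th position makes the optimal subset ambiguous:

```python
def subset_well_defined(means, m):
    """Check the separability assumption: some eps strictly separates the m
    smallest means from the rest (fails exactly when the m-th and (m+1)-th
    smallest means tie)."""
    s = sorted(means)
    return s[m - 1] < s[m]
```

This is a hypothetical helper, useful mainly as a guard in experiments before applying the partitioned procedure.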
In this section, we assume that the expected performance value in each partition is quadratic or approximately quadratic when the solution space is properly partitioned and the mean performance within a partition is continuous and smooth. For this problem setting, the PCS is defined as

PCS = P{ ∩_{i_h ∈ S_o} ∩_{j_g ∈ S_n} ( f̂(x_{i_h}) ≤ f̂(x_{j_g}) ) }.

Given a fixed simulation budget, the optimization problem can be written as follows:

max PCS
s.t. Σ_{h=1}^{l} Σ_{i_h=1_h}^{k_h} n_{i_h} = n. (6)

In this section, we aim to solve problem (6) in the presence of regression metamodels. We are interested in how to intelligently allocate the simulation budget to proper designs such that f̂(x_{i_h}) can be better estimated using regression metamodels, and the PCS in problem (6) can be maximized. Note that this model does not require all the designs to be on the same axis, and therefore does not hinder it from being applied to multi-dimensional problems. For multi-dimensional problems, we can treat the range of the underlying function on each dimension as one or more partitions, and then apply this formulation.
In order to solve the optimization problem (6), one challenge is to derive an explicit expression of the PCS. We seek to solve (6) under an asymptotic framework in which the probability of false selection (PFS = 1 − PCS) is minimized as n goes to infinity.

Similar to the setting in Section 2, we let the number of support designs in each partition be three, two of which are at the extreme locations, i.e., x_{1_h} and x_{k_h}. We have that d̂_{ij_h} = f̂(x_{i_h}) − f̂(x_{j_h}) follows the normal distribution N( f(x_{i_h}) − f(x_{j_h}), ς_{ij_h}^2 ) and that d̂_{i_h,j_g} = f̂(x_{i_h}) − f̂(x_{j_g}) follows the normal distribution N( f(x_{i_h}) − f(x_{j_g}), ς_{i_h,j_g}^2 ), where h ≠ g. Similar to the proof of Lemma 1, we have

ς_{ij_h}^2 = σ_h^2 (0, x_{i_h} − x_{j_h}, x_{i_h}^2 − x_{j_h}^2) (X_h^T X_h)^{-1} (0, x_{i_h} − x_{j_h}, x_{i_h}^2 − x_{j_h}^2)^T
           = ( σ_h^2/(θ_h n) ) ( ρ_{ij_h,1}^2/α_{1_h} + ρ_{ij_h,s}^2/α_{s_h} + ρ_{ij_h,k}^2/α_{k_h} ),

where

ρ_{ij_h,1} = [ (x_{s_h} − x_{i_h})(x_{k_h} − x_{i_h}) − (x_{s_h} − x_{j_h})(x_{k_h} − x_{j_h}) ] / [ (x_{1_h} − x_{s_h})(x_{1_h} − x_{k_h}) ],
ρ_{ij_h,s} = [ (x_{1_h} − x_{i_h})(x_{k_h} − x_{i_h}) − (x_{1_h} − x_{j_h})(x_{k_h} − x_{j_h}) ] / [ (x_{s_h} − x_{1_h})(x_{s_h} − x_{k_h}) ],
ρ_{ij_h,k} = [ (x_{1_h} − x_{i_h})(x_{s_h} − x_{i_h}) − (x_{1_h} − x_{j_h})(x_{s_h} − x_{j_h}) ] / [ (x_{k_h} − x_{1_h})(x_{k_h} − x_{s_h}) ],

and

ς_{i_h,j_g}^2 = ( σ_h^2/(θ_h n) ) ( η_{i_h,1}^2/α_{1_h} + η_{i_h,s}^2/α_{s_h} + η_{i_h,k}^2/α_{k_h} )
              + ( σ_g^2/(θ_g n) ) ( η_{j_g,1}^2/α_{1_g} + η_{j_g,s}^2/α_{s_g} + η_{j_g,k}^2/α_{k_g} ),

where

η_{i_h,1} = (x_{s_h} − x_{i_h})(x_{k_h} − x_{i_h}) / [ (x_{1_h} − x_{s_h})(x_{1_h} − x_{k_h}) ],
η_{i_h,s} = (x_{1_h} − x_{i_h})(x_{k_h} − x_{i_h}) / [ (x_{s_h} − x_{1_h})(x_{s_h} − x_{k_h}) ],
η_{i_h,k} = (x_{1_h} − x_{i_h})(x_{s_h} − x_{i_h}) / [ (x_{k_h} − x_{1_h})(x_{k_h} − x_{s_h}) ].

For any h = 1, 2, ..., l and i_h = 1_h, 2_h, ..., k_h, f̂(x_{i_h}) is a normally distributed random variable. We can use the large deviations theory to derive the convergence rate function of the false selection probability.
Lemma 4: The convergence rate functions of the incorrect comparison probability for each design are provided as follows:

− lim_{n→∞} (1/n) log P{ f̂(x_{i_h}) > f̂(x_{m_b}) }, i_h ∈ S_o,
− lim_{n→∞} (1/n) log P{ f̂(x_{i_h}) < f̂(x_{m_b}) }, i_h ∈ S_n,
= R_{m_b,i_h}(θ_b, θ_h, α_b, α_h)
= ( f(x_{m_b}) − f(x_{i_h}) )^2 / { 2 [ (σ_b^2/θ_b)( η_{m_b,1}^2/α_{1_b} + η_{m_b,s}^2/α_{s_b} + η_{m_b,k}^2/α_{k_b} ) + (σ_h^2/θ_h)( η_{i_h,1}^2/α_{1_h} + η_{i_h,s}^2/α_{s_h} + η_{i_h,k}^2/α_{k_h} ) ] }, h ≠ b, (7)

and

− lim_{n→∞} (1/n) log P{ f̂(x_{i_b}) > f̂(x_{m_b}) }, i_b ∈ S_o, i_b ≠ m_b,
− lim_{n→∞} (1/n) log P{ f̂(x_{i_b}) < f̂(x_{m_b}) }, i_b ∈ S_n,
= R_{m_b,i_b}(θ_b, α_b)
= ( f(x_{m_b}) − f(x_{i_b}) )^2 / { 2 (σ_b^2/θ_b)( ρ_{mi_b,1}^2/α_{1_b} + ρ_{mi_b,s}^2/α_{s_b} + ρ_{mi_b,k}^2/α_{k_b} ) }. (8)

According to the Bonferroni inequality, we have

PFS ≤ Σ_{i_h ∈ S_o, i_h ≠ m_b} P{ f̂(x_{i_h}) ≥ f̂(x_{m_b}) } + Σ_{j_g ∈ S_n} P{ f̂(x_{j_g}) ≤ f̂(x_{m_b}) }.

The PFS is bounded below by

max { max_{1 ≤ h ≤ l, h ≠ b} { max_{i_h ∈ S_o} P{ f̂(x_{i_h}) ≥ f̂(x_{m_b}) }, max_{j_h ∈ S_n} P{ f̂(x_{j_h}) ≤ f̂(x_{m_b}) } },
      max { max_{i_b ∈ S_o, i_b ≠ m_b} P{ f̂(x_{i_b}) ≥ f̂(x_{m_b}) }, max_{j_b ∈ S_n} P{ f̂(x_{j_b}) ≤ f̂(x_{m_b}) } } }

and bounded above by |S_o| × |S_n| times the same maximum. Hence the convergence rate function of PFS is given by

− lim_{n→∞} (1/n) log PFS = min { min_{1 ≤ h ≤ l, h ≠ b} { min_{i_h ∈ S_o} R_{m_b,i_h}(θ_b, θ_h, α_b, α_h), min_{j_h ∈ S_n} R_{m_b,j_h}(θ_b, θ_h, α_b, α_h) },
                                  min { min_{i_b ∈ S_o, i_b ≠ m_b} R_{m_b,i_b}(θ_b, α_b), min_{j_b ∈ S_n} R_{m_b,j_b}(θ_b, α_b) } }.

Minimizing the PFS is asymptotically equivalent to maximizing the rate at which PFS goes to zero as a function of θ_h and α_h, h = 1, 2, ..., l. Similarly, the asymptotic version of (6) becomes

max min { min_{1 ≤ h ≤ l, h ≠ b} { min_{i_h ∈ S_o} R_{m_b,i_h}(θ_b, θ_h, α_b, α_h), min_{j_h ∈ S_n} R_{m_b,j_h}(θ_b, θ_h, α_b, α_h) },
          min { min_{i_b ∈ S_o, i_b ≠ m_b} R_{m_b,i_b}(θ_b, α_b), min_{j_b ∈ S_n} R_{m_b,j_b}(θ_b, α_b) } }
s.t. α_{1_h} + α_{s_h} + α_{k_h} = 1, h = 1, ..., l,
     Σ_{h=1}^{l} θ_h = 1,
     θ_h, α_{1_h}, α_{s_h}, α_{k_h} ≥ 0, h = 1, ..., l. (9)

In this section, we seek to derive the optimality conditions for (9). We want to determine (i) the number of simulation replications allocated to each partition, (ii) the locations of the designs to be simulated in each partition and (iii) the number of simulation replications allocated to those selected designs.

According to Xiao et al. (2015), we have θ_h/θ_b → 0 as l goes to infinity. That means the fraction of the simulation budget allocated to the partition b containing the m-th smallest mean value far exceeds the fraction given to any other partition when l goes to infinity. Given that, the rate function R_{m_b,i_h}(θ_b, θ_h, α_b, α_h) converges to R̃_{m_b,i_h}(θ_h, α_h), where

R̃_{m_b,i_h}(θ_h, α_h) = lim_{l→∞} R_{m_b,i_h}(θ_b, θ_h, α_b, α_h)
                       = ( f(x_{m_b}) − f(x_{i_h}) )^2 / { 2 (σ_h^2/θ_h)( η_{i_h,1}^2/α_{1_h} + η_{i_h,s}^2/α_{s_h} + η_{i_h,k}^2/α_{k_h} ) }.

Problem (9) then becomes

max min { min_{1 ≤ h ≤ l, h ≠ b} { min_{i_h ∈ S_o} R̃_{m_b,i_h}(θ_h, α_h), min_{j_h ∈ S_n} R̃_{m_b,j_h}(θ_h, α_h) },
          min { min_{i_b ∈ S_o, i_b ≠ m_b} R_{m_b,i_b}(θ_b, α_b), min_{j_b ∈ S_n} R_{m_b,j_b}(θ_b, α_b) } }
s.t. α_{1_h} + α_{s_h} + α_{k_h} = 1, h = 1, ..., l,
     Σ_{h=1}^{l} θ_h = 1,
     θ_h, α_{1_h}, α_{s_h}, α_{k_h} ≥ 0, h = 1, ..., l.
(10)

In order to better analyze problem (10) above, we decompose it as follows:

max min { min_{i_h ∈ S_o} R̃_{m_b,i_h}(θ_h, α_h), min_{j_h ∈ S_n} R̃_{m_b,j_h}(θ_h, α_h) }
s.t. α_{1_h} + α_{s_h} + α_{k_h} = 1, α_{1_h}, α_{s_h}, α_{k_h} ≥ 0, (11)

for h = 1, 2, ..., l, h ≠ b, and

max min { min_{i_b ∈ S_o, i_b ≠ m_b} R_{m_b,i_b}(θ_b, α_b), min_{j_b ∈ S_n} R_{m_b,j_b}(θ_b, α_b) }
s.t. α_{1_b} + α_{s_b} + α_{k_b} = 1, α_{1_b}, α_{s_b}, α_{k_b} ≥ 0, (12)

for h = b.

Lemma 5: Let θ_h* and α_h* be the optimal solution to (10). The θ_h* and α_h* are independent and can be solved separately. In addition, the l vectors α_h* (h = 1, 2, ..., l) corresponding to the l partitions are also mutually independent and can be solved separately using (11) and (12).

Similar to Section 2, we define a key design for each partition h, denoted as i_h*. A false selection is most likely to happen at these key designs. That is,

i_h* = arg min_{1_h ≤ i_h ≤ k_h} { min_{i_h ∈ S_o} R_{m_b,i_h}(θ_b, θ_h, α_b, α_h), min_{i_h ∈ S_n} R_{m_b,i_h}(θ_b, θ_h, α_b, α_h) }, h ≠ b,
i_b* = arg min_{1_b ≤ i_b ≤ k_b} { min_{i_b ∈ S_o, i_b ≠ m_b} R_{m_b,i_b}(θ_b, α_b), min_{i_b ∈ S_n} R_{m_b,i_b}(θ_b, α_b) }, h = b.

α_h* and locations of support designs

Given Lemma 5, we can determine α_h* separately for each partition by solving the optimization problems (11) and (12).

Theorem 3
The optimization problem (6) can be asymptotically optimized with the following allocation rule:

α_{r_h}* = |z_{r_h}| / ( |z_{1_h}| + |z_{s_h}| + |z_{k_h}| ), r_h = 1_h, s_h, k_h,
α_{r_h}* = 0, otherwise, (13)

where z_{r_h} = ρ_{mi_b*,r} if h = b, and z_{r_h} = η_{i_h*,r} otherwise, for r_h = 1_h, s_h, k_h.

Given the optimal allocation rule (Theorem 3), we next determine the optimal location for the support design s_h within each partition.

Theorem 4
The rate function of PFS with an allocation rule satisfying (13) can be asymptotically maximized if the support design s_h satisfies the following equations.

(a) For all h ≠ b, the support design

x_{s_h} = x_{i_h*},               if i_h* ≠ 1_h, k_h,
x_{s_h} = (x_{1_h} + x_{k_h})/2,  if i_h* = 1_h or k_h, (14)

and the optimal allocation rule (13) becomes

α_{r_h}* = 1, r_h = i_h*,
α_{r_h}* = 0, otherwise. (15)

(b) For h = b, the support design

x_{s_b} = x_{i_b*} + x_{m_b} − x_{1_b},   if (3x_{1_b} + x_{k_b})/2 ≤ x_{i_b*} + x_{m_b} < x_{1_b} + x_{k_b},
x_{s_b} = x_{i_b*} + x_{m_b} − x_{k_b},   if x_{1_b} + x_{k_b} < x_{i_b*} + x_{m_b} ≤ (x_{1_b} + 3x_{k_b})/2,
x_{s_b} = (x_{1_b} + x_{k_b})/2,          otherwise, (16)

and the approximate optimal allocation rule (13) is

α_{r_b}* = |ρ_{mi_b*,r}| / ( |ρ_{mi_b*,1}| + |ρ_{mi_b*,s}| + |ρ_{mi_b*,k}| ), r_b = 1_b, s_b, k_b,
α_{r_b}* = 0, otherwise. (17)

When the x_{s_h} derived from (14) and (16) does not correspond to any design available, we round it to the nearest one.

θ_h*

In this subsection, we aim to determine the optimal budget allocation rule between the l partitions. Since α_h* and θ_h* can be solved separately for each partition, R_{m_b,i_h}(θ_b, θ_h, α_b*, α_h*) is a function of θ_b and θ_h only. Similarly, R_{m_b,i_b}(θ_b, α_b*) is a function of θ_b only. Optimization problem (9) can be rewritten as

max min { min_{1 ≤ h ≤ l, h ≠ b} { min_{i_h ∈ S_o} R_{m_b,i_h}(θ_b, θ_h, α_b*, α_h*), min_{j_h ∈ S_n} R_{m_b,j_h}(θ_b, θ_h, α_b*, α_h*) },
          min { min_{i_b ∈ S_o, i_b ≠ m_b} R_{m_b,i_b}(θ_b, α_b*), min_{j_b ∈ S_n} R_{m_b,j_b}(θ_b, α_b*) } }
s.t. Σ_{h=1}^{l} θ_h = 1, θ_h ≥ 0, h = 1, ..., l. (18)

By solving the optimization model (18), we can get the optimal θ_h* for all h = 1, ..., l.

Theorem 5
The optimal allocation that asymptotically minimizes the PFS for problem (6) is

θ_h* = γ_h* / Σ_{i=1}^{l} γ_i*,

where

γ_b* = σ_b sqrt{ ( η_{m_b,1}^2/α_{1_b}* + η_{m_b,s}^2/α_{s_b}* + η_{m_b,k}^2/α_{k_b}* ) Σ_{h ≠ b} (γ_h*)^2/σ_h^2 }

and

γ_h* = σ_h^2 / ( f(x_{m_b}) − f(x_{i_h*}) )^2, h ≠ b.

The results of Theorem 5 indicate the optimal simulation budget allocation between the l partitions. The results of Theorem 4 determine which designs should be selected for simulation and the number of simulation replications allocated to these selected designs. Since the solution space is divided into adjacent partitions, this optimal budget allocation procedure can be used for more general cases where the underlying function is non-quadratic. By conducting simulation replications on fewer designs, the simulation efficiency is dramatically enhanced compared to the existing methods.

Based on the discussion above, we propose two sequential budget allocation algorithms to implement the optimality conditions: the Optimal Computing Budget Allocation for selecting the top-m designs with regression (OCBA-mr), based on Theorems 1 and 2, and the Optimal Computing Budget Allocation for selecting the top-m designs with regression in partitioned domains (OCBA-mrp), based on Theorems 4 and 5.

(1) OCBA-mr Algorithm

INITIALIZE: κ ←
0; Perform $n_0$ simulation replications at each of three design locations; by convention we use the D-optimal support designs, i.e., $n^{\kappa}_1 = n^{\kappa}_{(1+t)/2} = n^{\kappa}_t = n_0$ (round $(1+t)/2$ as needed). The incremental simulation budget is $\Delta$.
LOOP: WHILE $\sum_{i=1}^{t} n^{\kappa}_i < n$ DO
UPDATE:
Estimate a quadratic regression equation $\hat f(x_i)$ using the least squares method based on the information from all prior simulation runs. Calculate the mean and variance of each design using $\hat f(x_i) = \hat\beta_0 + \hat\beta_1 x_i + \hat\beta_2 x_i^2$. Determine the observed subsets $\hat S_o$ and $\hat S_n$; the design with the $m$-th smallest sample mean value; and the observed key design $\hat i^*$. Calculate $\hat\alpha^*$ and determine the support designs using Theorems 1 and 2.
ALLOCATE:
Increase the simulation budget by ∆ and calculate the new budget allocation rule using ˆ α ∗ . SIMULATE:
Perform $\max(0, n^{\kappa}_i - n^{\kappa-1}_i)$ simulation replications for design $i$, $i = 1, s, t$. $\kappa \leftarrow \kappa + 1$.
END OF LOOP.
(2) OCBA-mrp Algorithm
INITIALIZE: $\kappa \leftarrow$
0; Perform $n_0$ simulation replications at three design locations in each partition; by convention we use the D-optimal support designs, i.e., $n^{\kappa}_{1_h} = n^{\kappa}_{(1+k_h)/2} = n^{\kappa}_{k_h} = n_0$ (round as needed). The incremental simulation budget is $\Delta$.
LOOP: WHILE $\sum_{h=1}^{l} \sum_{i_h=1_h}^{k_h} n^{\kappa}_{i_h} < n$ DO
UPDATE:
Estimate the quadratic regression equations $\hat f(x_{i_h})$ for each partition using the least squares method based on the information from all prior simulation runs. Calculate the means and variances of each design using $\hat f(x_{i_h}) = \hat\beta_{0h} + \hat\beta_{1h} x_{i_h} + \hat\beta_{2h} x_{i_h}^2$, $i_h = 1_h, 2_h, \dots, k_h$, $h = 1, 2, \dots, l$. Determine the observed subsets $\hat S_o$ and $\hat S_n$; the design with the $m$-th smallest sample mean value; and the observed key design $\hat i^*_h$ of each partition. Calculate $\hat\alpha^*_h$ and determine the support designs in each partition using Theorem 4. Calculate $\hat\theta^*_h$ using Theorem 5.
ALLOCATE:
Increase the simulation budget by ∆ and calculate the new budget allocation rule using ˆ α ∗ h and ˆ θ ∗ h . SIMULATE:
Perform $\max(0, n^{\kappa}_{i_h} - n^{\kappa-1}_{i_h})$ simulation replications for design $i_h$ in each partition, $h = 1, 2, \dots, l$ and $i_h = 1_h, s_h, k_h$. $\kappa \leftarrow \kappa + 1$.
END OF LOOP.

5 Numerical Experiments
In this section, we compare our proposed simulation budget allocation rules, OCBA-mr and OCBA-mrp, with several existing methods on typical optimal subset selection problems. The top-$m$ designs are selected based on their mean values. We use the following budget allocation approaches for comparison.
• Equal Allocation (EA): EA is the simplest and most commonly used method; it allocates the simulation budget equally to each design.
• OCBAm+ Allocation: OCBAm+ is an efficient simulation budget allocation procedure for selecting the top-$m$ designs, developed within the OCBA framework. For details, see Zhang et al. (2016).
• OCBA-ss Allocation: The OCBA-ss procedure was proposed by Gao and Chen (2016) and seeks to solve the optimal subset R&S problem for general underlying distributions using the large deviations theory.
• OSD Allocation: The OSD procedure was proposed by Brantley et al. (2013) and seeks to solve the single-best R&S problem using a regression metamodel.
OCBA-mrp, OCBA-mr and OSD estimate the performance of each design based on regression metamodels. EA, OCBAm+ and OCBA-ss use the sample mean of each design for comparison, computed directly from the simulation output; unlike our proposed methods and OSD, they do not rely on any response surface to estimate the performance of any design. In order to compare the performance of these allocation approaches, we test them empirically on the following experiments.
• Experiment 1: Consider minimizing a quadratic objective $f(x) = (x - a)^2$ over $[0, 10]$, where $a$ is a fixed constant. The experiment assumes zero-mean normal simulation noise for the objective value. We discretize the domain of the function into 100 evenly spaced points from 0 to 10. Since the underlying function is quadratic, we can use the OCBA-mr method directly for this problem.
In order to utilize the OCBA-mrp method, we divide the 100 points into 5 adjacent partitions, with 20 points in each partition. We want to select the top $m = 5$ designs.
• Experiment 2: Consider the Griewank function $\min f(x) = 10 \times (1 + x^2/4000 - \cos(x))$. We discretize the domain of the function into 100 evenly spaced points from 0 to 20. The nature of this underlying function does not allow us to apply the OCBA-mr method directly. To test the performance of OCBA-mr on this non-quadratic problem, we make a slight modification by dividing the solution space into 5 adjacent partitions; we then allocate the simulation budget equally to the partitions and use the OCBA-mr method within each partition. In this experiment, OCBA-mrp is also tested under this partition pattern. The experiment assumes zero-mean normal simulation noise. We want to select the top $m = 3$ designs.
• Experiment 3: Consider the optimization problem $\min f(x) = \sin(x) + \sin(10x/3) + \ln(x) - 0.84x + 3$. The entire domain consists of 200 discrete points. We want to select the top $m = 5$ designs.
• Experiment 4: Consider minimizing an objective of the form $f(x) = 2(x - a)^2 + \sin(8\pi x - b)$ for fixed constants $a$ and $b$. The experiment assumes standard normal simulation noise for the objective measure. We discretize the domain of the function into 200 evenly spaced points from 0 to 2. The 200 points are further divided into 20 adjacent partitions, so that there are 10 points in each partition. Similar to the setting in Experiment 2, the OCBA-mr method is modified to handle this non-quadratic problem. We want to select the top $m = 3$ designs.
• Experiment 5: Consider the optimization problem $\min f(x_1, x_2) = \frac{1}{140}(x_1^2 + x_2^2) - \cos(x_1)\cos(x_2/\sqrt{2})$. The experiment assumes zero-mean normal simulation noise for the objective measure. $x_1$ and $x_2$ are continuous variables with $-5 \le x_1 \le 5$ and $-5 \le x_2 \le 5$. We discretize the solution space into $11 \times 11$ discrete points with $x_1 \in \{-5, -4, \dots, 5\}$ and $x_2 \in \{-5, -4, \dots, 5\}$. We divide the $11 \times 11$ designs into 11 adjacent partitions with 11 designs in each partition; partition $i$, $i = 1, 2, \dots, 11$, consists of the 11 design points that share the same value of $x_1$. We want to select the top $m = 3$ designs, which are $(0, -1)$, $(0, 0)$ and $(0, 1)$.
• Experiment 6: Consider the $(s, S)$ inventory problem in the Simulation Optimization Library (http://simopt.org/wiki/index.php?title=SS_Inventory). The decision variable $(s, S)$ is the inventory policy: when the on-hand inventory of design $(s, S)$ falls below $s$, an order is placed to raise the inventory up to $S$. The optimization problem is to minimize the expected total cost per period. The domain is discretized into $20 \times 20$ discrete points, which are divided into 20 partitions with 20 points in each partition. Partition $i$, $i = 1, 2, \dots, 20$, consists of the points $(800 + 10i, S)$ over the 20 candidate values of $S$; within a partition, $s$ is kept constant. We want to select the top $m = 3$ designs.
We let $n_0$ be 10 and the incremental budget $\Delta$ be 100 for all six experiments. Figure 1 reports the comparison results in terms of $PCS$. The estimate of $PCS$ is based on the average of 2000 independent replications of each procedure for Experiments 1-6. It is observed that all the procedures obtain higher
$PCS$ as the available simulation budget increases. Our proposed procedures, OCBA-mr and OCBA-mrp, perform best on the tested examples, showing a dramatic improvement in the selection quality of the top-$m$ designs compared to the existing methods. When the considered problem is quadratic or approximately quadratic, as in Experiment 1, the OCBA-mr method has the best performance: it is developed for the quadratic or approximately quadratic problem setting and can make full use of its advantages by simulating fewer designs once the quadratic assumption is satisfied. When the considered problem is partially quadratic or non-quadratic, as in Experiments 2-6, the OCBA-mrp method performs better than the modified OCBA-mr method, because OCBA-mrp can intelligently allocate the simulation budget both between and within the partitions.
Compared to the existing approaches, our proposed approaches incorporate the information from the simulated designs into regression metamodel(s). We only need to simulate a subset of the alternative designs to build the regression metamodel(s); the performance of the designs that have not been simulated can be inferred from the metamodel(s). This dramatically improves the selection efficiency.
Fig. 1. Comparison results for different test examples: $PCS$ versus simulation budget $n$ for OCBA-mrp, OCBA-mr, OSD, OCBA-ss, OCBAm+ and EA on (a) Experiment 1 through (f) Experiment 6.
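As a concrete illustration of how a single point on such a $PCS$ curve is produced, the sketch below estimates $PCS$ for the EA baseline by Monte Carlo on a hypothetical quadratic problem. The design grid, noise level, budget and replication count here are illustrative choices, not the paper's exact experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def pcs_equal_allocation(f_vals, m, budget, noise_sd, reps=500):
    """Estimate P(correctly selecting the top-m designs) under Equal Allocation."""
    t = len(f_vals)
    n_per = max(1, budget // t)                 # equal split of the total budget
    true_top = set(np.argsort(f_vals)[:m])      # minimization: m smallest true means
    hits = 0
    for _ in range(reps):
        # sample mean of each design after n_per replications of N(f, noise_sd^2) output
        means = f_vals + rng.normal(0.0, noise_sd / np.sqrt(n_per), size=t)
        if set(np.argsort(means)[:m]) == true_top:
            hits += 1
    return hits / reps

# Hypothetical quadratic objective on 100 evenly spaced designs in [0, 10].
f_vals = (np.linspace(0.0, 10.0, 100) - 5.03) ** 2
pcs = pcs_equal_allocation(f_vals, m=5, budget=100000, noise_sd=1.0)
```

Sweeping `budget` and repeating this for each allocation rule reproduces the shape of the curves in Fig. 1.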
6 Conclusions

In this study, we further enhanced the simulation efficiency of finding the top-$m$ designs by incorporating quadratic regression equation(s). Using the large deviations theory, we formulated the problem of selecting the top-$m$ designs as that of maximizing the minimum convergence rate of the false selection probability. We derived two asymptotically optimal budget allocation rules, OCBA-mr and OCBA-mrp, for the one-partition and multi-partition problem settings, respectively. Numerical results suggest that the proposed approaches can dramatically improve the selection efficiency compared to the existing R&S approaches. When the underlying function can be approximated by a quadratic function across the solution space, OCBA-mr is more efficient; for the non-quadratic problem setting, OCBA-mrp demonstrates the best performance.

References
Brantley, M. W., Lee, L. H., Chen, C.-H., and Chen, A. (2013). Efficient simulation budget allocation with regression. IIE Transactions, 45(3):291-308.
Brantley, M. W., Lee, L. H., Chen, C.-H., and Xu, J. (2014). An efficient simulation budget allocation method incorporating regression for partitioned domains. Automatica, 50(5):1391-1400.
Burden, R. L. and Faires, J. D. (2001). Numerical Analysis. Brooks/Cole, USA.
Chen, C.-H., He, D., Fu, M., and Lee, L. H. (2008). Efficient simulation budget allocation for selecting an optimal subset. INFORMS Journal on Computing, 20(4):579-595.
Chen, C.-H., Lin, J., Yücesan, E., and Chick, S. E. (2000). Simulation budget allocation for further enhancing the efficiency of ordinal optimization. Discrete Event Dynamic Systems, 10(3):251-270.
De la Garza, A. (1954). Spacing of information in polynomial regression. The Annals of Mathematical Statistics, 25(1):123-130.
Gao, F., Gao, S., Xiao, H., and Shi, Z. (2018). Advancing constrained ranking and selection with regression in partitioned domains. IEEE Transactions on Automation Science and Engineering.
Gao, S. and Chen, W. (2015). Efficient subset selection for the expected opportunity cost. Automatica, 59:19-26.
Gao, S. and Chen, W. (2016). A new budget allocation framework for selecting top simulated designs. IIE Transactions, 48(9):855-863.
Gao, S., Chen, W., and Shi, L. (2017a). A new budget allocation framework for the expected opportunity cost. Operations Research, 65(3):787-803.
Gao, S. and Shi, L. (2015). Selecting the best simulated design with the expected opportunity cost bound. IEEE Transactions on Automatic Control, 60(10):2785-2790.
Gao, S., Xiao, H., Zhou, E., and Chen, W. (2017b). Robust ranking and selection with optimal computing budget allocation. Automatica, 81:30-36.
Glynn, P. and Juneja, S. (2004). A large deviations perspective on ordinal optimization. In Ingalls, R. G., Rossetti, M. D., Smith, J. S., and Peters, B. A., editors, Proceedings of the 2004 Winter Simulation Conference, pages 577-585. IEEE.
Hayashi, F. (2000). Econometrics. Princeton University Press, Princeton, NJ, USA.
He, D., Chick, S. E., and Chen, C.-H. (2007). The opportunity cost and OCBA selection procedures in ordinal optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 37(5):951-961.
Kiefer, J. (1959). Optimum experimental designs. Journal of the Royal Statistical Society, Series B (Methodological), pages 272-319.
Kim, S.-H. and Nelson, B. L. (2001). A fully sequential procedure for indifference-zone selection in simulation. ACM Transactions on Modeling and Computer Simulation, 11(3):251-273.
Lee, L. H., Chen, C.-H., Chew, E. P., Li, J., Pujowidianto, N. A., and Zhang, S. (2010). A review of optimal computing budget allocation algorithms for simulation optimization problem. International Journal of Operations Research, 7(2):19-31.
McConnell, J. J. and Servaes, H. (1990). Additional evidence on equity ownership and corporate value. Journal of Financial Economics, 27(2):595-612.
Peng, Y., Chen, C.-H., Fu, M. C., and Hu, J.-Q. (2013). Efficient simulation resource sharing and allocation for selecting the best. IEEE Transactions on Automatic Control, 58(4):1017-1023.
Xiao, H., Lee, L. H., and Chen, C.-H. (2015). Optimal budget allocation rule for simulation optimization using quadratic regression in partitioned domains. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(7):1047-1062.
Xu, J., Huang, E., Chen, C.-H., and Lee, L. H. (2015). Simulation optimization: A review and exploration in the new era of cloud computing and big data. Asia-Pacific Journal of Operational Research, 32(3):1-34.
Xu, J., Huang, E., Hsieh, L., Lee, L. H., Jia, Q. S., and Chen, C.-H. (2016). Simulation optimization in the era of Industry 4.0 and the industrial internet. Journal of Simulation, 10(4):310-320.
Zhang, S., Lee, L. H., Chew, E. P., Xu, J., and Chen, C.-H. (2016). A simulation budget allocation procedure for enhancing the efficiency of optimal subset selection. IEEE Transactions on Automatic Control, 61(1):62-75.

Online Supplement
Proof of Lemma 2
Let $\Lambda^{(d)}_i(\theta) = \ln E(e^{\theta \hat d_{i,m}})$ denote the log-moment generating function of $\hat d_{i,m}$ and $I^{(d)}_i(\cdot)$ denote the Fenchel-Legendre transform of $\Lambda^{(d)}_i$, that is,
\[ I^{(d)}_i(x) = \sup_{\theta \in \mathbb{R}} \big(\theta x - \Lambda^{(d)}_i(\theta)\big). \]
For any set $A$, let $A^{\circ}$ denote its interior, and for any function $f(\cdot)$, let $f'(x)$ denote the derivative of $f$ at $x$. The effective domain of $\Lambda_i$ is $D_{\Lambda_i} = \{\theta \in \mathbb{R} : \Lambda_i(\theta) < \infty\}$ and $F_i = \{\Lambda'_i(\theta) : \theta \in D^{\circ}_{\Lambda_i}\}$. Let $f_{\min} = \min_{i \in \{1,2,\dots,t\}} f(x_i)$ and $f_{\max} = \max_{i \in \{1,2,\dots,t\}} f(x_i)$. We assume that the interval $[f_{\min}, f_{\max}] \subset \bigcap_{i=1}^{t} F^{\circ}_i$. This assumption can be satisfied by some common families of distributions such as the normal, Bernoulli, Poisson and gamma families (Glynn and Juneja, 2004). It ensures that the range of $\hat f(x_i)$ includes $[f_{\min}, f_{\max}]$ and that $P(\hat f(x_i) > \hat f(x_m))$ and $P(\hat f(x_j) < \hat f(x_m))$ are positive for $i \in S_o$ and $j \in S_n$.
Then $\hat d_{i,m} = \hat f(x_i) - \hat f(x_m)$ satisfies the large deviation principle with good rate function
\[ -\lim_{n \to \infty} \frac{1}{n} \log P\{\hat d_{i,m} > 0\}, \ i \in S_o, \ i \neq m; \qquad -\lim_{n \to \infty} \frac{1}{n} \log P\{\hat d_{i,m} < 0\}, \ i \in S_n, \]
both equal to $R_{m,i}(\alpha) = I^{(d)}_i(0)$. Since the $\hat d_{i,m} = \hat f(x_i) - \hat f(x_m)$ are all normally distributed random variables, the rate function can be expressed as follows according to the results in Glynn and Juneja (2004):
\[ R_{m,i}(\alpha) = \frac{(f(x_m) - f(x_i))^2}{2\sigma^2\big(\rho^2_{mi,1}/\alpha_1 + \rho^2_{mi,s}/\alpha_s + \rho^2_{mi,t}/\alpha_t\big)}. \]

Proof of Lemma 3
According to the definition of
P CS , we have
\[ PCS = P\Big\{\bigcap_{i \in S_o} \bigcap_{j \in S_n} \hat f(x_i) \le \hat f(x_j)\Big\} \ge P\Big\{\Big(\bigcap_{\substack{i \in S_o \\ i \neq m}} \hat f(x_i) \le \hat f(x_m)\Big) \bigcap \Big(\bigcap_{j \in S_n} \hat f(x_j) \ge \hat f(x_m)\Big)\Big\} \]
\[ \ge 1 - \sum_{\substack{i \in S_o \\ i \neq m}} P\{\hat f(x_i) \ge \hat f(x_m)\} - \sum_{j \in S_n} P\{\hat f(x_j) \le \hat f(x_m)\}. \]
The last inequality follows from the Bonferroni inequality. Since
P F S = 1 − P CS , we can get that
\[ PFS \le \sum_{\substack{i \in S_o \\ i \neq m}} P\{\hat f(x_i) \ge \hat f(x_m)\} + \sum_{j \in S_n} P\{\hat f(x_j) \le \hat f(x_m)\}. \]
Therefore, a lower bound on $PFS$ can be derived as
\[ \max\Big\{\max_{\substack{i \in S_o \\ i \neq m}} P\{\hat f(x_i) \ge \hat f(x_m)\},\ \max_{j \in S_n} P\{\hat f(x_j) \le \hat f(x_m)\}\Big\}, \]
and an upper bound on $PFS$ can be derived as
\[ |S_o| \times |S_n| \times \max\Big\{\max_{\substack{i \in S_o \\ i \neq m}} P\{\hat f(x_i) \ge \hat f(x_m)\},\ \max_{j \in S_n} P\{\hat f(x_j) \le \hat f(x_m)\}\Big\}. \]
According to Lemma 2, as $n$ goes to infinity, we can get the convergence rate function shown in Lemma 3.

Proof of Theorem 1
Given the definition of the key design $i^*$, we rewrite (3) as
\[ \max\ R_{m,i^*}(\alpha) \quad \text{s.t.} \quad \alpha_1 + \alpha_s + \alpha_t = 1, \ \alpha_1, \alpha_s, \alpha_t > 0. \tag{.1} \]
Since $R_{m,i^*}(\alpha)$ is a concave and strictly increasing function of $\alpha$ (Xiao et al., 2015; Glynn and Juneja, 2004), the optimization problem is a convex programming problem. We define a Lagrangian function $L = -R_{m,i^*}(\alpha) + \lambda(\alpha_1 + \alpha_s + \alpha_t - 1)$ and determine the optimal allocation based on its Karush-Kuhn-Tucker (KKT) conditions:
\[ \frac{\partial L}{\partial \alpha_r} = -\frac{(f(x_m) - f(x_{i^*}))^2\, \rho^2_{mi^*,r}}{2\sigma^2 (\alpha^*_r)^2 \big(\rho^2_{mi^*,1}/\alpha^*_1 + \rho^2_{mi^*,s}/\alpha^*_s + \rho^2_{mi^*,t}/\alpha^*_t\big)^2} + \lambda = 0, \quad r = 1, s, t. \tag{.2} \]
Plugging (.2) into $\alpha^*_1 + \alpha^*_s + \alpha^*_t = 1$, we can establish
\[ \alpha^*_r = \frac{|\rho_{mi^*,r}|}{|\rho_{mi^*,1}| + |\rho_{mi^*,s}| + |\rho_{mi^*,t}|}, \quad r = 1, s, t. \tag{.3} \]
Then, we can get the conclusions in Theorem 1.

Proof of Theorem 2
Plugging (4) into $R_{m,i^*}(\alpha)$, we have
\[ R_{m,i^*}(\alpha) = \frac{(f(x_m) - f(x_{i^*}))^2}{2\sigma^2\big(|\rho_{mi^*,1}| + |\rho_{mi^*,s}| + |\rho_{mi^*,t}|\big)^2}, \]
\[ \frac{\partial R_{m,i^*}(\alpha)}{\partial x_s} = -\frac{(f(x_m) - f(x_{i^*}))^2}{\sigma^2\big(|\rho_{mi^*,1}| + |\rho_{mi^*,s}| + |\rho_{mi^*,t}|\big)^3} \times \Big(\frac{\partial |\rho_{mi^*,1}|}{\partial x_s} + \frac{\partial |\rho_{mi^*,s}|}{\partial x_s} + \frac{\partial |\rho_{mi^*,t}|}{\partial x_s}\Big), \]
where
\[ \frac{\partial \rho_{mi^*,1}}{\partial x_s} = \frac{(x_{i^*} - x_m)(x_{i^*} + x_m - x_1 - x_t)}{(x_1 - x_s)^2 (x_1 - x_t)}, \qquad \frac{\partial \rho_{mi^*,s}}{\partial x_s} = -\frac{(x_{i^*} - x_m)(x_{i^*} + x_m - x_1 - x_t)(2x_s - x_1 - x_t)}{(x_s - x_1)^2 (x_s - x_t)^2}, \]
\[ \frac{\partial \rho_{mi^*,t}}{\partial x_s} = \frac{(x_{i^*} - x_m)(x_{i^*} + x_m - x_1 - x_t)}{(x_t - x_1)(x_t - x_s)^2}. \]
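The derivative formulas above can be spot-checked numerically. The sketch below takes $\rho_{mi^*,1}$ as the difference of the quadratic Lagrange interpolation weights of the support design $x_1$ evaluated at $x_{i^*}$ and $x_m$ (consistent with the expressions used in this proof), and compares the stated partial derivative with a central finite difference; the design locations are illustrative, not from the paper.

```python
# Finite-difference check of d(rho_{m i*,1})/d(x_s); locations are illustrative.
x1, xt, xm, xi = 0.0, 10.0, 2.0, 3.0

def rho1(xs):
    # Lagrange weight of support x1 at point x, for support set {x1, xs, xt}
    def w1(x):
        return (x - xs) * (x - xt) / ((x1 - xs) * (x1 - xt))
    return w1(xi) - w1(xm)          # rho_{m i*,1} = w1(x_{i*}) - w1(x_m)

xs = 4.0
analytic = (xi - xm) * (xi + xm - x1 - xt) / ((x1 - xs) ** 2 * (x1 - xt))
h = 1e-6
numeric = (rho1(xs + h) - rho1(xs - h)) / (2.0 * h)
```

The same pattern verifies the $\partial \rho_{mi^*,s}/\partial x_s$ and $\partial \rho_{mi^*,t}/\partial x_s$ expressions.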
In order to analyze the optimal location for $x_s$, we consider the following five cases.
Case I: $x_{i^*} + x_m < (3x_1 + x_t)/2$. We have $\partial R_{m,i^*}(\alpha)/\partial x_s > 0$ for $x_s < (x_1 + x_t)/2$ and $\partial R_{m,i^*}(\alpha)/\partial x_s < 0$ for $x_s > (x_1 + x_t)/2$. Therefore, when $(x_{i^*} + x_m)/2 < (3x_1 + x_t)/4$, we choose $x_s = (x_1 + x_t)/2$.
Case II: $(3x_1 + x_t)/2 \le x_{i^*} + x_m < x_1 + x_t$. We have $\partial R_{m,i^*}(\alpha)/\partial x_s > 0$ for $x_s < x_{i^*} + x_m - x_1$ and $\partial R_{m,i^*}(\alpha)/\partial x_s < 0$ for $x_s > x_{i^*} + x_m - x_1$. Therefore, when $(3x_1 + x_t)/4 \le (x_{i^*} + x_m)/2 < (x_1 + x_t)/2$, we choose $x_s = x_{i^*} + x_m - x_1$.
Case III: $x_{i^*} + x_m > (x_1 + 3x_t)/2$. We have $\partial R_{m,i^*}(\alpha)/\partial x_s > 0$ for $x_s < (x_1 + x_t)/2$ and $\partial R_{m,i^*}(\alpha)/\partial x_s < 0$ for $x_s > (x_1 + x_t)/2$. Therefore, when $(x_{i^*} + x_m)/2 > (x_1 + 3x_t)/4$, we choose $x_s = (x_1 + x_t)/2$.
Case IV: $x_1 + x_t < x_{i^*} + x_m \le (x_1 + 3x_t)/2$. We have $\partial R_{m,i^*}(\alpha)/\partial x_s > 0$ for $x_s < x_{i^*} + x_m - x_t$ and $\partial R_{m,i^*}(\alpha)/\partial x_s < 0$ for $x_s > x_{i^*} + x_m - x_t$. Therefore, when $(x_1 + x_t)/2 < (x_{i^*} + x_m)/2 \le (x_1 + 3x_t)/4$, we choose $x_s = x_{i^*} + x_m - x_t$.
Case V: $x_{i^*} + x_m = x_1 + x_t$. The location of $x_s$ has no influence on the results. Therefore, we choose $x_s = (x_1 + x_t)/2$.

Proof of Lemma 4
Let $\Lambda_{i_h}(\theta) = \ln E(e^{\theta \hat f(x_{i_h})})$ denote the log-moment generating function of $\hat f(x_{i_h})$ and $I_{i_h}(\cdot)$ denote the Fenchel-Legendre transform of $\Lambda_{i_h}$, that is,
\[ I_{i_h}(x) = \sup_{\theta \in \mathbb{R}} \big(\theta x - \Lambda_{i_h}(\theta)\big). \]
Let the scaled cumulant generating function of $(\hat f(x_{m_b}), \hat f(x_{i_h}))$ be denoted as
\[ \lim_{n \to \infty} \frac{1}{n} \Lambda_{(\hat f(x_{m_b}), \hat f(x_{i_h}))}(n\theta_b, n\theta_h) = \lim_{n \to \infty} \frac{1}{n} \ln E\big(e^{n\theta_b \hat f(x_{m_b}) + n\theta_h \hat f(x_{i_h})}\big). \]
By the Gärtner-Ellis theorem, $(\hat f(x_{m_b}), \hat f(x_{i_h}))$ satisfies the large deviation principle with good rate function
\[ -\lim_{n \to \infty} \frac{1}{n} \log P\{\hat f(x_{i_h}) > \hat f(x_{m_b})\}, \ i_h \in S_o; \qquad -\lim_{n \to \infty} \frac{1}{n} \log P\{\hat f(x_{i_h}) < \hat f(x_{m_b})\}, \ i_h \in S_n, \]
equal to $R_{m_b,i_h}(\theta_b, \theta_h, \alpha_b, \alpha_h) = \inf_v \big(\theta_b I_{m_b}(v) + \theta_h I_{i_h}(v)\big)$, $h \neq b$. Since $(\hat f(x_{m_b}), \hat f(x_{i_h}))$ are all normally distributed random variables, the rate function can be expressed as follows according to the results in Glynn and Juneja (2004):
\[ R_{m_b,i_h}(\theta_b, \theta_h, \alpha_b, \alpha_h) = \frac{(f(x_{m_b}) - f(x_{i_h}))^2}{2\Big(\dfrac{\sigma_b^2}{\theta_b}\big(\eta^2_{m_b,1}/\alpha_{1_b} + \eta^2_{m_b,s}/\alpha_{s_b} + \eta^2_{m_b,k}/\alpha_{k_b}\big) + \dfrac{\sigma_h^2}{\theta_h}\big(\eta^2_{i_h,1}/\alpha_{1_h} + \eta^2_{i_h,s}/\alpha_{s_h} + \eta^2_{i_h,k}/\alpha_{k_h}\big)\Big)}, \quad h \neq b. \]
The proof of (8) is similar to that given for Lemma 2, and hence is omitted for brevity.
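The rate function in Lemma 4 can be illustrated in a simplified two-design setting: with normal sample means $\hat f(x_{m_b}) \sim N(\mu_m, \sigma_b^2/(n\theta_b))$ and $\hat f(x_{i_h}) \sim N(\mu_i, \sigma_h^2/(n\theta_h))$, and the regression factors $\eta^2/\alpha$ set to one purely for illustration, $-\frac{1}{n}\ln P\{\hat f(x_{i_h}) < \hat f(x_{m_b})\}$ approaches $(\mu_i - \mu_m)^2/\big(2(\sigma_b^2/\theta_b + \sigma_h^2/\theta_h)\big)$ as $n$ grows. A minimal sketch under these assumptions:

```python
import math

# Two-design illustration of the Lemma 4 rate (regression factors set to 1; values illustrative).
mu_m, mu_i = 0.0, 0.5          # true means, mu_i > mu_m
sb, sh = 1.0, 1.0              # noise standard deviations
tb, th = 0.5, 0.5              # budget fractions theta_b, theta_h
rate = (mu_i - mu_m) ** 2 / (2.0 * (sb ** 2 / tb + sh ** 2 / th))

def log_tail(n):
    """ln P(fhat_i < fhat_m) after a total budget of n, via the exact normal tail."""
    se = math.sqrt(sb ** 2 / (n * tb) + sh ** 2 / (n * th))
    z = (mu_i - mu_m) / se
    return math.log(math.erfc(z / math.sqrt(2.0)) / 2.0)

n = 10_000
estimate = -log_tail(n) / n    # approaches `rate` from above as n grows
```

The gap between `estimate` and `rate` shrinks like $\ln(n)/n$, reflecting the subexponential prefactor that the large deviation principle ignores.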
Proof of Lemma 5
We first rewrite optimization model (10) as
\[ \max\ z \quad \text{s.t.} \quad \widetilde R_{i_h}(\theta_h, \alpha_h) - z \ge 0, \ h = 1, \dots, l, \ h \neq b, \ i_h = 1_h, \dots, k_h; \quad R_{i_b}(\theta_b, \alpha_b) - z \ge 0, \ i_b = 1_b, \dots, k_b; \]
\[ \alpha_{1_h} + \alpha_{s_h} + \alpha_{k_h} = 1, \ h = 1, \dots, l; \quad \sum_{h=1}^{l} \theta_h = 1; \quad \theta_h, \alpha_{1_h}, \alpha_{s_h}, \alpha_{k_h} \ge 0, \ h = 1, \dots, l. \tag{.4} \]
Similarly, the optimization models (11) and (12) can be rewritten as
\[ \max\ z \quad \text{s.t.} \quad \widetilde R_{i_h}(\theta_h, \alpha_h) - z \ge 0, \ i_h = 1_h, \dots, k_h; \quad \alpha_{1_h} + \alpha_{s_h} + \alpha_{k_h} = 1; \quad \alpha_{1_h}, \alpha_{s_h}, \alpha_{k_h} \ge 0, \tag{.5} \]
for $h = 1, 2, \dots, l$, $h \neq b$, and
\[ \max\ z \quad \text{s.t.} \quad R_{i_b}(\theta_b, \alpha_b) - z \ge 0, \ i_b = 1_b, \dots, k_b; \quad \alpha_{1_b} + \alpha_{s_b} + \alpha_{k_b} = 1; \quad \alpha_{1_b}, \alpha_{s_b}, \alpha_{k_b} \ge 0. \tag{.6} \]
Let $\widetilde\alpha^*_h$ and $\widetilde\alpha^*_b$ be the optimal solutions to (.5) and (.6). The optimization models (.4), (.5) and (.6) have the same objective function, and the feasible domain of (.4) is a subset of those of (.5) and (.6). Therefore, if $\widetilde\alpha^*_h$ and $\widetilde\alpha^*_b$ are feasible to (.4), they are optimal solutions of (.4). According to the formulas of $R_{i_b}(\theta_b, \alpha_b)$ and $\widetilde R_{i_h}(\theta_h, \alpha_h)$, we can conclude that $\widetilde\alpha^*_h$ and $\widetilde\alpha^*_b$ do not depend on the value of $\theta_h$. Therefore, $\widetilde\alpha^*_h$ and $\widetilde\alpha^*_b$ are feasible to (.4), and they are optimal to (10). Then we get the conclusions in Lemma 5.

Proof of Theorem 3
In order to determine the optimal $\alpha^*_h$ for each partition, we consider two cases ($h \neq b$ and $h = b$).
For $h \neq b$, we solve the optimization model (11). Given the definition of the key design $i^*_h$, we rewrite (11) as
\[ \max\ \widetilde R_{m_b,i^*_h}(\theta_h, \alpha_h) \quad \text{s.t.} \quad \alpha_{1_h} + \alpha_{s_h} + \alpha_{k_h} = 1, \ \alpha_{1_h}, \alpha_{s_h}, \alpha_{k_h} > 0, \tag{.7} \]
for $h = 1, 2, \dots, l$ and $h \neq b$. Since $\widetilde R_{m_b,i^*_h}(\theta_h, \alpha_h)$ is a concave and strictly increasing function of $\alpha_h$ (Xiao et al., 2015; Glynn and Juneja, 2004), the optimization problem is a convex programming problem. We define a Lagrangian function $L = -\widetilde R_{m_b,i^*_h}(\theta_h, \alpha_h) + \lambda_h(\alpha_{1_h} + \alpha_{s_h} + \alpha_{k_h} - 1)$ and determine the optimal allocation based on its Karush-Kuhn-Tucker (KKT) conditions. With
\[ \widetilde R_{m_b,i^*_h}(\theta_h, \alpha_h) = \frac{(f(x_{m_b}) - f(x_{i^*_h}))^2}{2\dfrac{\sigma_h^2}{\theta_h}\big(\eta^2_{i^*_h,1}/\alpha_{1_h} + \eta^2_{i^*_h,s}/\alpha_{s_h} + \eta^2_{i^*_h,k}/\alpha_{k_h}\big)}, \]
the conditions are
\[ \frac{\partial L}{\partial \alpha_{r_h}} = -\frac{(f(x_{m_b}) - f(x_{i^*_h}))^2\, \eta^2_{i^*_h,r}}{2\dfrac{\sigma_h^2}{\theta_h} (\alpha^*_{r_h})^2 \big(\eta^2_{i^*_h,1}/\alpha_{1_h} + \eta^2_{i^*_h,s}/\alpha_{s_h} + \eta^2_{i^*_h,k}/\alpha_{k_h}\big)^2} + \lambda_h = 0, \quad r_h = 1_h, s_h, k_h. \]
Using the fact that $\alpha^*_{1_h} + \alpha^*_{s_h} + \alpha^*_{k_h} = 1$, we can establish
\[ \alpha^*_{r_h} = \frac{|\eta_{i^*_h,r}|}{|\eta_{i^*_h,1}| + |\eta_{i^*_h,s}| + |\eta_{i^*_h,k}|}, \quad r_h = 1_h, s_h, k_h. \tag{.8} \]
For $h = b$, we solve the optimization model (12), which can be rewritten as
\[ \max\ R_{m_b,i^*_b}(\theta_b, \alpha_b) \quad \text{s.t.} \quad \alpha_{1_b} + \alpha_{s_b} + \alpha_{k_b} = 1, \ \alpha_{1_b}, \alpha_{s_b}, \alpha_{k_b} > 0. \tag{.9} \]
The process of solving (.9) is similar to that given for Theorem 1, and hence is omitted for brevity. Then, we can get the conclusions in Theorem 3.

Proof of Theorem 4
In order to determine the location of the support design $s_h$ for each partition, we consider two cases ($h \neq b$ and $h = b$).
For $h \neq b$, plugging (13) into $\widetilde R_{m_b,i^*_h}(\theta_h, \alpha_h)$, we have
\[ \widetilde R_{m_b,i^*_h}(\theta_h, \alpha_h) = \frac{(f(x_{m_b}) - f(x_{i^*_h}))^2}{2\dfrac{\sigma_h^2}{\theta_h}\big(|\eta_{i^*_h,1}| + |\eta_{i^*_h,s}| + |\eta_{i^*_h,k}|\big)^2}, \]
\[ \frac{\partial \widetilde R_{m_b,i^*_h}(\theta_h, \alpha_h)}{\partial x_{s_h}} = -\frac{(f(x_{m_b}) - f(x_{i^*_h}))^2}{\dfrac{\sigma_h^2}{\theta_h}\big(|\eta_{i^*_h,1}| + |\eta_{i^*_h,s}| + |\eta_{i^*_h,k}|\big)^3} \times \Big(\frac{\partial |\eta_{i^*_h,1}|}{\partial x_{s_h}} + \frac{\partial |\eta_{i^*_h,s}|}{\partial x_{s_h}} + \frac{\partial |\eta_{i^*_h,k}|}{\partial x_{s_h}}\Big), \]
where
\[ \frac{\partial \eta_{i^*_h,1}}{\partial x_{s_h}} = \frac{(x_{1_h} - x_{i^*_h})(x_{k_h} - x_{i^*_h})}{(x_{1_h} - x_{s_h})^2 (x_{1_h} - x_{k_h})}, \qquad \frac{\partial \eta_{i^*_h,s}}{\partial x_{s_h}} = -\frac{(x_{1_h} - x_{i^*_h})(x_{k_h} - x_{i^*_h})(2x_{s_h} - x_{1_h} - x_{k_h})}{(x_{s_h} - x_{1_h})^2 (x_{s_h} - x_{k_h})^2}, \]
\[ \frac{\partial \eta_{i^*_h,k}}{\partial x_{s_h}} = \frac{(x_{1_h} - x_{i^*_h})(x_{k_h} - x_{i^*_h})}{(x_{k_h} - x_{1_h})(x_{k_h} - x_{s_h})^2}. \]
When $x_{s_h} > x_{i^*_h}$, $\partial \widetilde R_{m_b,i^*_h}(\theta_h, \alpha_h)/\partial x_{s_h} < 0$; when $x_{s_h} < x_{i^*_h}$, $\partial \widetilde R_{m_b,i^*_h}(\theta_h, \alpha_h)/\partial x_{s_h} > 0$. Therefore, $\widetilde R_{m_b,i^*_h}(\theta_h, \alpha_h)$ is maximized when $x_{s_h} = x_{i^*_h}$. According to equation (13), we can get
\[ \alpha^*_{i_h} = \begin{cases} 1, & i_h = i^*_h, \\ 0, & \text{otherwise.} \end{cases} \]
When $i^*_h$ is at one of the extreme locations ($1_h$ or $k_h$),
\[ \widetilde R_{m_b,i^*_h}(\theta_h, \alpha_h) = \frac{(f(x_{m_b}) - f(x_{i^*_h}))^2}{2\sigma_h^2\, \eta^2_{i^*_h,r}/(\theta_h \alpha_{r_h})}, \quad r_h = i^*_h. \]
In this setting, the location of the support design $s_h$ has no effect on $\widetilde R_{m_b,i^*_h}(\theta_h, \alpha_h)$. For simplicity, we use the D-optimal support design and let $x_{s_h} = (x_{1_h} + x_{k_h})/2$.
For $h = b$, the proof is similar to that given for Theorem 2, and hence is omitted for brevity. Then, we can get the conclusions in Theorem 4.

Proof of Theorem 5
Given the definition of the key design $i^*_h$, we rewrite the optimization model (18) as
\[ \max\ \min\Big\{R_{m_b,i^*_b}(\theta_b, \alpha^*_b),\ \min_{\substack{1 \le h \le l \\ h \neq b}} R_{m_b,i^*_h}(\theta_b, \theta_h, \alpha^*_b, \alpha^*_h)\Big\} \quad \text{s.t.} \quad \sum_{h=1}^{l} \theta_h = 1, \ \theta_h \ge 0, \ h = 1, \dots, l. \]
$R_{m_b,i^*_b}(\theta_b, \alpha^*_b)$ and $R_{m_b,i^*_h}(\theta_b, \theta_h, \alpha^*_b, \alpha^*_h)$ are concave and strictly increasing functions of $\theta_b$ and $\theta_h$ (Glynn and Juneja, 2004). Therefore, the optimization problem is a convex programming problem, and the first-order condition is also the optimality condition. Rewrite the optimization model above as
\[ \max\ z \quad \text{s.t.} \quad R_{m_b,i^*_h}(\theta_b, \theta_h, \alpha^*_b, \alpha^*_h) - z \ge 0, \ h = 1, \dots, l, \ h \neq b; \quad R_{m_b,i^*_b}(\theta_b, \alpha^*_b) - z \ge 0; \quad \sum_{h=1}^{l} \theta_h = 1, \ \theta_h \ge 0. \tag{.10} \]
From the KKT conditions, there exist $\lambda_h \ge 0$, $h = 1, 2, \dots, l$, and $\mu \ge 0$ such that
\[ 1 - \sum_{h=1}^{l} \lambda_h = 0, \tag{.11} \]
\[ \mu - \lambda_h\, \frac{\partial R_{m_b,i^*_h}(\theta^*_b, \theta^*_h, \alpha^*_b, \alpha^*_h)}{\partial \theta_h} = 0, \quad h = 1, 2, \dots, l, \ h \neq b, \tag{.12} \]
\[ \mu - \sum_{\substack{h=1 \\ h \neq b}}^{l} \lambda_h\, \frac{\partial R_{m_b,i^*_h}(\theta^*_b, \theta^*_h, \alpha^*_b, \alpha^*_h)}{\partial \theta_b} - \lambda_b\, \frac{\partial R_{m_b,i^*_b}(\theta^*_b, \alpha^*_b)}{\partial \theta_b} = 0, \tag{.13} \]
\[ \lambda_h \big(z - R_{m_b,i^*_h}(\theta^*_b, \theta^*_h, \alpha^*_b, \alpha^*_h)\big) = 0, \quad h = 1, 2, \dots, l, \ h \neq b, \tag{.14} \]
\[ \lambda_b \big(z - R_{m_b,i^*_b}(\theta^*_b, \alpha^*_b)\big) = 0. \tag{.15} \]
From (.11), it can be concluded that there must exist some $\lambda_h > 0$, $h = 1, 2, \dots, l$. If $\lambda_h = 0$ for some $h = 1, 2, \dots, l$, $h \neq b$, then $\mu = 0$ based on (.12); since $\partial R_{m_b,i^*_h}(\theta_b, \theta_h, \alpha^*_b, \alpha^*_h)/\partial \theta_h$ is strictly positive, this would force all the $\lambda_h$, $h \neq b$, to be zero, contradicting (.11). Therefore, we can conclude that $\lambda_h \neq 0$, $h = 1, 2, \dots, l$, $h \neq b$. According to (.14),
\[ z^* = R_{m_b,i^*_h}(\theta^*_b, \theta^*_h, \alpha^*_b, \alpha^*_h), \quad h = 1, 2, \dots, l, \ h \neq b. \]
(.16)
In addition, since $\theta_h/\theta_b \to 0$ as $l$ goes to infinity and $R_{m_b,i^*_b}(\theta^*_b, \alpha^*_b) > R_{m_b,i^*_h}(\theta^*_b, \theta^*_h, \alpha^*_b, \alpha^*_h)$, $h = 1, 2, \dots, l$, $h \neq b$, it can be concluded that $\lambda_b = 0$ according to (.15). Therefore, we have
\[ \sum_{\substack{h=1 \\ h \neq b}}^{l} \frac{\partial R_{m_b,i^*_h}(\theta^*_b, \theta^*_h, \alpha^*_b, \alpha^*_h)/\partial \theta_b}{\partial R_{m_b,i^*_h}(\theta^*_b, \theta^*_h, \alpha^*_b, \alpha^*_h)/\partial \theta_h} = 1. \tag{.17} \]
According to Theorem 4, we have
\[ R_{m_b,i^*_h}(\theta^*_b, \theta^*_h, \alpha^*_b, \alpha^*_h) = \frac{(f(x_{m_b}) - f(x_{i^*_h}))^2}{2\Big(\dfrac{\sigma_b^2}{\theta^*_b}\big(\eta^2_{m_b,1}/\alpha^*_{1_b} + \eta^2_{m_b,s}/\alpha^*_{s_b} + \eta^2_{m_b,k}/\alpha^*_{k_b}\big) + \dfrac{\sigma_h^2}{\theta^*_h}\Big)}. \tag{.18} \]
Then, we can get
\[ \frac{\partial R_{m_b,i^*_h}(\theta^*_b, \theta^*_h, \alpha^*_b, \alpha^*_h)}{\partial \theta^*_b} = \frac{\sigma_b^2 (f(x_{m_b}) - f(x_{i^*_h}))^2 \big(\eta^2_{m_b,1}/\alpha^*_{1_b} + \eta^2_{m_b,s}/\alpha^*_{s_b} + \eta^2_{m_b,k}/\alpha^*_{k_b}\big)/(\theta^*_b)^2}{2\Big(\dfrac{\sigma_b^2}{\theta^*_b}\big(\eta^2_{m_b,1}/\alpha^*_{1_b} + \eta^2_{m_b,s}/\alpha^*_{s_b} + \eta^2_{m_b,k}/\alpha^*_{k_b}\big) + \dfrac{\sigma_h^2}{\theta^*_h}\Big)^2}, \tag{.19} \]
and
\[ \frac{\partial R_{m_b,i^*_h}(\theta^*_b, \theta^*_h, \alpha^*_b, \alpha^*_h)}{\partial \theta^*_h} = \frac{\sigma_h^2 (f(x_{m_b}) - f(x_{i^*_h}))^2/(\theta^*_h)^2}{2\Big(\dfrac{\sigma_b^2}{\theta^*_b}\big(\eta^2_{m_b,1}/\alpha^*_{1_b} + \eta^2_{m_b,s}/\alpha^*_{s_b} + \eta^2_{m_b,k}/\alpha^*_{k_b}\big) + \dfrac{\sigma_h^2}{\theta^*_h}\Big)^2}. \tag{.20} \]
As $l$