Stochastic Optimization using Polynomial Chaos Expansions
Tuhin Sahai ∗ September 18, 2020
Abstract
Polynomial chaos based methods enable the efficient computation of output variability in the presence of input uncertainty in complex models. Consequently, they have been used extensively for propagating uncertainty through a wide variety of physical systems. These methods have also been employed to build surrogate models for accelerating inverse uncertainty quantification (inferring model parameters from data) and to construct transport maps. In this work, we explore the use of polynomial chaos based approaches for optimizing functions in the presence of uncertainty. These methods enable the fast propagation of uncertainty through smooth systems. If the dimensionality of the random parameters is low, these methods provide orders of magnitude acceleration over Monte Carlo sampling. We construct a generalized polynomial chaos based methodology for optimizing smooth functions in the presence of random parameters that are drawn from known distributions. By expanding the optimization variables using orthogonal polynomials, the stochastic optimization problem reduces to a deterministic one that provides estimates for all moments of the output distribution. Thus, this approach enables one to avoid computationally expensive random sampling based approaches such as Monte Carlo and Quasi-Monte Carlo. In this work, we develop the overall framework, derive error bounds, construct the framework for the inclusion of constraints, analyze various properties of the approach, and demonstrate the proposed technique on illustrative examples.

∗ Raytheon Technologies Research Center, Berkeley, CA, USA. [email protected]

Introduction
Uncertainty quantification (UQ) is a popular area of research that focuses on propagating uncertainty through complex dynamic systems (typically represented by ordinary or partial differential equations). Typical approaches for propagating uncertainty include Monte Carlo [1], Quasi-Monte Carlo methods [2], and importance sampling [3, 4]. These methods are based on sampling the underlying input probability distributions and, consequently, are standard techniques for estimating output uncertainty. In addition to sampling based techniques, over the last couple of decades, the UQ community has actively pursued the development of non-sampling approaches such as response surface [5, 6] and polynomial chaos [7] based methodologies. Polynomial chaos methods involve the expansion of the stochastic variable of interest using an orthogonal basis associated with the underlying distribution. This step is typically followed by a projection computation that exploits the aforementioned orthogonality. Let us now briefly discuss the conditions under which various sampling and non-sampling methods are found to be useful.

Sampling based methods for UQ rely on generating samples in parameter space, propagating the points through the system (evolving the points forward using numerical integration in the case of dynamical systems), and computing statistics of the first few moments of the output distribution. Monte Carlo methods involve generating random points that correspond to independent trials. Note that the convergence of Monte Carlo is guaranteed by the strong law of large numbers. If one generates N independent samples, the error in the mean estimate converges as O(N^(−1/2)) [2]. The advantage of Monte Carlo based sampling is that this convergence rate is independent of the number of random parameters. Quasi-Monte Carlo based approaches, on the other hand, involve the generation of points using deterministic schemes.
In particular, these points are generated using low-discrepancy sequences (low-discrepancy sequences have the property that, in the limit, the fraction of points that fall into an arbitrary set A is equal to the measure of A). Quasi-Monte Carlo methods have a convergence rate of O(log^d(N)/N), where d is the dimensionality of the input random parameter vector. Note that Monte Carlo and Quasi-Monte Carlo methods are routinely used in stochastic optimization for computing sample average approximations (SAA) [8, 9].

As mentioned previously, polynomial chaos methods are non-sampling methods that rely on expanding the output random variables using an orthogonal polynomial basis with respect to the prior distribution [10]. For example, if the joint probability distribution (prior) that captures the uncertainty in the random input variables is Gaussian, one uses the Hermite basis [11]. Similarly, if the uncertainty is captured by the uniform distribution, then one uses Legendre polynomials [11]. One can generate orthogonal polynomials for arbitrary distributions; for more details see [10]. The primary advantage of the polynomial chaos based approach is that it provides exponential convergence for smooth processes with finite variance [12]. This remarkable convergence result is obtained by invoking the Cameron-Martin theorem. Note that this approach only works for random variables with finite variance whose underlying probability measure can be uniquely determined by its moments. A major drawback of the approach is that it suffers from a curse of dimensionality, which will be discussed in greater detail in subsequent sections.

Our paper is organized as follows: we start by introducing the polynomial chaos approach. We then show how one can use this method to efficiently compute the solutions of stochastic optimization problems and derive associated error estimates.
We also explore properties associated with the polynomial chaos transformation and its consequences for stochastic optimization. We demonstrate the approach on illustrative examples including a real-world human-machine task scheduling problem. This new methodology for stochastic optimization is compared to sampling based methods. Although in [13] the authors invoke polynomial chaos for multiobjective optimization, they neither integrate these expansions into the stochastic optimization framework nor derive approximation bounds. We note that in [14], the authors use polynomial chaos based surrogate models for stochastic optimization of the power grid.

Polynomial Chaos

Starting with a complete probability space Γ given by (Ω, F, P), where Ω is the sample space, F is the σ-algebra on Ω, and P is a probability measure, let L²(Γ, X) denote the Hilbert space of square-integrable, F-measurable, X-valued random elements. Then one can, in general, define a polynomial chaos basis {ψ_k(λ(ω))}, where λ(ω) is a random vector, ω ∈ Ω, and k = (k₁, k₂, ...) is a vector of non-negative indices. We denote the probability density function of the random vector λ by ρ(λ).

Generalized polynomial chaos (gPC) [15] provides a framework for representing second-order stochastic processes κ ∈ L²(Γ, X) for arbitrary distributions of λ by using the following expansion,

κ(λ) = Σ_{|k|=0}^{∞} a_k ψ_k(λ),   (1)

where |k| = Σ_i k_i is the sum of the indices of k and the ψ_k(λ) are orthonormal polynomials on Γ with respect to ρ(λ). Restricting our formalism to Euclidean spaces (relevant for this work), the orthonormality is given by,

∫_{R^p} ρ(λ) ψ_i(λ) ψ_k(λ) dλ = δ_{ik},   (2)

where δ_{ik} is the Kronecker delta. Depending on ρ(λ), one can generate an appropriate orthogonal basis for representing κ(λ).
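As an illustration of the orthonormality in Eqn. 2, the following sketch (our own, not from the paper) checks that the probabilists' Hermite polynomials, normalized by √(k!), are orthonormal under the standard Gaussian measure, using Gauss–Hermite quadrature:

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Gauss-Hermite quadrature for the weight exp(-x^2/2); dividing the weights
# by sqrt(2*pi) turns this into integration against the standard Gaussian density.
nodes, weights = hermegauss(20)
weights = weights / np.sqrt(2.0 * np.pi)

def psi(k, x):
    """Orthonormal basis: He_k(x) / sqrt(k!), He_k the probabilists' Hermite."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    return hermeval(x, coeffs) / math.sqrt(math.factorial(k))

# Gram matrix G[i, k] = integral of psi_i * psi_k * rho; should be the identity.
G = np.array([[np.sum(weights * psi(i, nodes) * psi(k, nodes))
               for k in range(5)] for i in range(5)])
print(np.allclose(G, np.eye(5), atol=1e-8))  # True
```

The quadrature is exact here because a 20-point rule integrates polynomials up to degree 39, well above the degree-8 products being tested.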
As mentioned earlier, if ρ is Gaussian, then the appropriate polynomial chaos basis is the set of Hermite polynomials; if ρ is the uniform distribution, then the basis is the set of Legendre polynomials. For details on the correspondence between distributions and polynomials see [10, 16]. A framework for generating polynomials for arbitrary distributions has been developed in [15]. The advantage of using polynomial chaos is that it provides exponential convergence for smooth processes with finite variance [12]. However, the approach suffers from a curse of dimensionality, rendering it infeasible for problems with more than a handful of random parameters. In particular, one typically truncates the order of the expansion in Eqn. 1 (to r terms). One then uses the orthogonality property in Eqn. 2 to project the original equation onto the different coefficients a_k in Eqn. 1 [15]. Typically, low order truncations are found to capture the uncertainty in smooth systems [12] (as long as the underlying probability measure can be uniquely determined by its moments [17]). If the order of expansion is r and the dimensionality of the uncertain parameters is p, then the number of terms one gets is,

d (r + p)! / (r! p!),   (3)

where d is the dimensionality of x (since we assume that x ∈ R^d). To mitigate the above curse of dimensionality, sparse grid techniques [18, 19, 20], iterative methods [21, 22, 23, 24], regression based algorithms [25, 26], hierarchical methods [27], and dimensionality reduction based techniques [28, 29] have been developed. We now explore the use of these methods for optimizing functions in the presence of uncertainty.

Stochastic Optimization using Polynomial Chaos
Without loss of generality, we assume that the optimization problem is posed in the form,

min_{x ∈ R^d} f(x, λ),
subject to g_i(x, λ) ≤ 0, i = 1, ..., m,
h_j(x, λ) = 0, j = 1, ..., n,   (4)

where λ is a vector of random variables drawn from the probability distribution ρ(λ), and f(x, λ), h(x, λ), and g(x, λ) are smooth functions of x and λ. Here m and n are the number of inequality and equality constraints, respectively.

A host of algorithms have been developed in the areas of stochastic programming [30] and distributionally robust optimization [31, 32] to tackle this problem. However, to the best of our knowledge, none of these methods or their variants exploit the exponential convergence offered by polynomial chaos based approaches [12].

Typically, the above set of equations (Eqn. 4) is solved in either expectation or the worst case. In the case of expectation minimization, the above optimization is converted to,

min_{x ∈ R^d} E[f(x, λ)],   (5)

which is usually computed using expensive Monte Carlo computations [33]. In this work, we exploit polynomial chaos expansions to approximate the optimal solution (in expectation) without resorting to expensive Monte Carlo simulations. The advantage of our approach is that one can compute several moments of the optimal solution (mean, variance, and other higher order moments) through a single optimization computation, without the need for expensive sampling. Note that the accuracy of the computed moments depends on the order of the expansion for the variables.

We now explore the use of polynomial chaos in the context of stochastic optimization. For simplicity, assume that the optimal solution for the problem in Eqn. 4, in the absence of constraints, is,

x*(λ) = argmin_x f(x, λ).   (6)

Although the constraints in Eqn. 4 have been ignored for the moment, the constrained optimization case will be revisited in Section 4.
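The reduction used in the remainder of this section — expand the optimization variable in an orthonormal basis, take expectations, and solve the resulting deterministic problem — can be sketched for a scalar x and a Gaussian λ. The helper below is our own illustration (names, the quadrature choice, and the solver are our assumptions, not from the paper): the expectation is approximated with Gauss–Hermite quadrature, and the mean and standard deviation of x* are read off the optimal coefficients.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from scipy.optimize import minimize

def pc_minimize(f, order=2, sigma=1.0, nquad=30):
    """Minimize E[f(x(lam), lam)] over Hermite coefficients a_0..a_r,
    with lam ~ N(0, sigma^2) and x(lam) = sum_k a_k psi_k(lam/sigma)."""
    xi, w = hermegauss(nquad)
    w = w / np.sqrt(2.0 * np.pi)          # standard Gaussian quadrature weights
    lam = sigma * xi                      # quadrature nodes for lam
    # orthonormal probabilists' Hermite basis evaluated at the nodes
    Psi = np.array([hermeval(xi, np.eye(order + 1)[k]) / math.sqrt(math.factorial(k))
                    for k in range(order + 1)])

    def F(a):                             # deterministic surrogate of E[f]
        return np.sum(w * f(a @ Psi, lam))

    a = minimize(F, np.zeros(order + 1)).x
    return a, a[0], np.linalg.norm(a[1:])  # coefficients, mean, std of x*
```

For instance, for f(x, λ) = (1 + λ)x² + x with σ = 0.1, the returned mean is close to the analytic value E[−1/(2(1 + λ))] ≈ −0.505.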
We now approximate the optimization variables in terms of the orthogonal polynomials ψ_k(λ) as follows,

x(λ) ≈ Σ_{k=0}^{r} a_k ψ_k(λ),   (7)

which results in the following approximate optimization problem,

[a₀*, ..., a_r*] = argmin_{[a₀,...,a_r]} f(Σ_{k=0}^{r} a_k ψ_k(λ), λ).   (8)

Taking expectations and interchanging the integral and minimization gives (the bounds on the error due to this interchange are derived in Section 3.2),

[a₀*, ..., a_r*] ≈ argmin_{[a₀,...,a_r]} ∫_{R^p} f(Σ_{k=0}^{r} a_k ψ_k(λ), λ) ρ(λ) dλ.   (9)

Denoting the integral term as F(a₀, ..., a_r) gives,

[a₀*, ..., a_r*] ≈ argmin_{[a₀,...,a_r]} F(a₀, ..., a_r).   (10)

Typically, F(a₀, ..., a_r) reduces to a simple form due to the orthogonality properties of the basis. Moreover, the coefficients a_k in the expansion can be used to compute the moments of x* [34]. In particular, the mean is µ(x*) = a₀, the standard deviation is µ₂(x*) = √(a₁² + a₂² + ... + a_r²), and so on. This expansion is guaranteed to converge to the correct answer as long as the moment problem for the input distribution is uniquely solvable [17]. As mentioned previously, the dimensionality of the optimization in Eqn. 10 is much higher than that of the problems in Eqns. 4 and 5. An important distinction of the optimization in Eqn. 5 from the one in Eqn. 10 is that, although both problems are deterministic, the latter retains information regarding the higher-order statistics of x* in the form of the coefficients. In particular, by substituting Eqn. 7 into the expressions below, one can compute the moments of x* in terms of [a₀*, ..., a_r*] using,

µ = ∫_{R^p} x* ρ(λ) dλ,
...
µ_k = ∫_{R^p} (x* − µ)^k ρ(λ) dλ,   (11)

where x* is approximated using Eqns.
7 and 10.

3.1 Convergence of Polynomial Chaos

The results of Cameron and Martin [12] show that a square integrable functional on the set of continuous functions with compact support can be expanded in a convergent series of Hermite polynomials in a countable sequence of Gaussian random variables. This result was extended to the generalized polynomial chaos setting for arbitrary distributions with finite variance, under the condition that the underlying probability measure is uniquely determined by its moments [17].

In our framework, since x* is effectively treated as a random variable, the theorems and proofs from [17] are applicable. We extract the primary results from [17] and present them here for completeness. As detailed in [17], the primary assumptions on x* are as follows.

Assumption 1-F. x* possesses finite moments of all orders, i.e., ∫ |x*(λ)|^k ρ(λ) dλ < ∞ for all k.

Assumption 2-F. The distribution function P(x* ≤ ξ) is continuous.

The primary theorems from [17] that guarantee convergence are listed below.
Theorem 3.1.
The sequence of orthogonal polynomials associated with the random variable x* satisfying Assumptions 1-F and 2-F is dense in the Hilbert space L²(Ω, σ, P) if and only if the moment problem is uniquely solvable for its distribution.

Proof. See [17].

The conditions for convergence are given by the theorem below, as shown in [17].
Theorem 3.2.
If one of the following conditions on P(x* ≤ ξ) satisfying Assumptions 1-F and 2-F is valid, then the moment problem is uniquely solvable and the set of polynomials associated with x* is dense in L²(Ω, σ, P).

1. P(x* ≤ ξ) has compact support.

2. The moment sequence {µ_k} of the distribution satisfies

lim inf_{k→∞} (µ_{2k})^{1/(2k)} / (2k) < ∞   or   Σ_{k=1}^{∞} (µ_{2k})^{−1/(2k)} = ∞.

3. The random variable is exponentially integrable,

∫_R exp(a|λ|) P(dλ) < ∞   for some a > 0.
4. If the distribution has a symmetric, differentiable, and strictly positive density l_{x*}, and for a real number λ₀ there holds

∫_{−∞}^{∞} −log(l_{x*}(λ)) / (1 + λ²) dλ = ∞   and   −λ l'_{x*}(λ) / l_{x*}(λ) ↗ ∞   (λ → ∞, λ ≥ λ₀).

Proof.
See [17].

As long as the above assumptions and conditions are satisfied, the expansion in Eqn. 7 is guaranteed to converge to the solution of the problem in Eqn. 10. We now analyze the error incurred as a result of the interchange of the integral and minimum operators in Eqn. 9.
3.2 Error Bounds

A key approximation in the derivation of Eqn. 10 was the interchange of the integral and minimum operators in Eqn. 9. We now bound the error introduced by the interchange of the two operators.
Lemma 3.3. If f(x, λ) in Eqn. 6 is Lipschitz continuous with respect to x with Lipschitz constant L, then the error due to the interchange of the integral and minimization in Eqn. 9 has the following bound,

| E[min_x f(x, λ)] − min_x E[f(x, λ)] | ≤ L ∫_{R^p} |p̂(λ) − q| ρ dλ,

where p̂(λ) = argmin_x f(x, λ) and q = argmin_x E[f(x, λ)].

Proof. The above definition of p̂(λ) gives,

E[min_x f(x, λ)] = ∫_{R^p} f(p̂(λ), λ) ρ dλ,   (12)

and using the above definition of q, we get,

min_x E[f(x, λ)] = ∫_{R^p} f(q, λ) ρ dλ.   (13)

The error due to the interchange in Eqn. 9 is given by,

| E[min_x f(x, λ)] − min_x E[f(x, λ)] |.   (14)

Since f(x, λ) is Lipschitz continuous with respect to x with constant L, we get,

| E[min_x f(x, λ)] − min_x E[f(x, λ)] | = | ∫_{R^p} (f(p̂(λ), λ) − f(q, λ)) ρ dλ |
≤ ∫_{R^p} |f(p̂(λ), λ) − f(q, λ)| ρ dλ
≤ L ∫_{R^p} |p̂(λ) − q| ρ dλ.   (15)

As can be seen above, the bound depends on the ρ-weighted deviation of p̂(λ) from q. Here p̂(λ) is the optimal solution in Eqn. 6 parameterized by λ, and q is the optimal solution in Eqn. 5 for the expected value of f(x, λ). In particular, if p̂(λ) deviates from q only in the tail of ρ, the error due to the interchange is expected to be minimal. Moreover, the Lipschitz constant L bounds the variation of f(x, λ) as a function of its first argument; thus, if L is small, the above bound is again expected to be small. Note that the error is only guaranteed to be 0 if p̂(λ) = q for all λ.

We now analyze different properties related to the polynomial chaos based stochastic optimization approach.
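A quick numerical illustration of Lemma 3.3 (our own construction, not from the paper): for f(x, λ) = |x − λ| with L = 1, the pointwise minimizer is p̂(λ) = λ and the minimizer in expectation is the median q of λ, so the bound in the lemma holds with equality.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = rng.normal(0.0, 1.0, 200_000)      # samples of lambda ~ N(0, 1)

# f(x, lam) = |x - lam| is 1-Lipschitz in x (L = 1).
# Pointwise: min_x |x - lam| = 0, so E[min_x f] = 0 and p_hat(lam) = lam.
# In expectation: min_x E[|x - lam|] is attained at the median q of lambda.
q = np.median(lam)
E_min = 0.0
min_E = np.mean(np.abs(q - lam))

gap = abs(E_min - min_E)                  # left-hand side of the lemma
bound = 1.0 * np.mean(np.abs(lam - q))    # L * E[|p_hat(lambda) - q|]
print(gap <= bound + 1e-12)               # True (here with equality)
```

Since min_E and bound are the same Monte Carlo estimate of E[|λ − q|] ≈ √(2/π) ≈ 0.798, the gap saturates the bound exactly in this example.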
3.3 Preservation of Convexity

Convexity is an important property that is frequently exploited in optimization [35]. The optimization of convex functions over convex domains is tractable using well-known polynomial time algorithms. Non-convex optimization, on the other hand, is in general NP-hard [36]. Most generic optimization software packages are able to efficiently compute globally optimal solutions in convex settings. Thus, if the underlying function f(x, λ) is convex, it is important that the polynomial chaos based approach preserve convexity. Therefore, we now explore the impact of the expansion on the convexity properties of f(x, λ).

Lemma 3.4. If f(x, λ) in Eqn. 6 is convex for all values of λ, then F(a₀, ..., a_r) in Eqn. 10 is also convex.

Proof. For brevity, let F(ā) = F(a₀, ..., a_r). Taking ā = θb̄ + (1 − θ)c̄ implies that a_k = θb_k + (1 − θ)c_k, and dropping the argument of ψ_k(λ) (for simplicity) gives,

f(θ Σ_k b_k ψ_k + (1 − θ) Σ_k c_k ψ_k, λ) ≤ θ f(Σ_k b_k ψ_k, λ) + (1 − θ) f(Σ_k c_k ψ_k, λ),   (16)

since f(x, λ) is a convex function. Note that one uses the polynomial expansion outlined in Eqn. 7 to obtain the above expression. Using the above expression in conjunction with the fact that ρ(λ) ≥ 0 gives,

∫_{R^p} f(θ Σ_k b_k ψ_k + (1 − θ) Σ_k c_k ψ_k, λ) ρ(λ) dλ ≤ θ ∫_{R^p} f(Σ_k b_k ψ_k, λ) ρ(λ) dλ + (1 − θ) ∫_{R^p} f(Σ_k c_k ψ_k, λ) ρ(λ) dλ,   (17)

which can be rewritten as,

F(ā) ≤ θ F(b̄) + (1 − θ) F(c̄),   (18)

or,

F(θb̄ + (1 − θ)c̄) ≤ θ F(b̄) + (1 − θ) F(c̄).   (19)

The above inequality proves that F(ā) is convex.

The preservation of convexity is analogous to the preservation of Hamiltonian structure by the polynomial chaos framework [37, 38].

Remark 3.5.
It is easy to show that if the underlying cost function f(x, λ) is a homogeneous polynomial [39], then the resulting cost function F(ā) is also homogeneous. This follows from the fact that the right hand side of the expansion x = Σ_{k=0}^{r} a_k ψ_k(λ) is effectively a dot product between the vectors [a₀, a₁, ..., a_r] and [ψ₀, ψ₁, ..., ψ_r]. Since this form is linear in ā, any homogeneous polynomial cost remains homogeneous with the same degree as the original system. In a similar fashion, one can show that the transformation preserves the sum-of-squares (S.O.S.) hierarchy [40]. In particular, any sum-of-squares polynomial can, without loss of generality, be written in the quadratic form z^T Q z, where z = [1, x₁, x₂, ..., x_d, x₁x₂, ..., x_d^m]^T. When one expands each variable using the orthogonal polynomials, it can be shown that the quadratic structure is preserved, implying that the S.O.S. structure is preserved by the transformation.

Inclusion of Constraints

In the previous section, we omitted the constraints in Eqn. 4 for simplicity. The imposition of constraints can be addressed using the Lagrangian framework [35] as follows,

L(x, u, v, λ) = f(x, λ) + Σ_{i=1}^{m} u_i g_i(x, λ) + Σ_{j=1}^{n} v_j h_j(x, λ).   (20)

Using the polynomial expansion in Eqn. 7 and integrating with respect to ρ(λ) dλ gives,

L̃(ā, u, v) = F(ā) + Σ_{i=1}^{m} u_i G_i(ā) + Σ_{j=1}^{n} v_j H_j(ā),   (21)

where ā = [a₀, ..., a_r]. Both G_i and H_j are the terms obtained by integrating g_i and h_j weighted by ρ(λ) (note that the definition is similar to the one for F in Eqn. 10). It is easy to show that, as a consequence of ρ ≥ 0, the inequality and equality constraints are preserved, i.e., H_j(ā) = 0 and G_i(ā) ≤ 0. Thus, L̃(ā, u, v) in the above equation is the corresponding Lagrangian for the stochastic problem. Now one can define the Lagrange dual function as follows,

L̂(u, v) = min_ā L̃(ā, u, v).   (22)

Note that Eqn. 4 is the primal form. It can be shown that L̂(u, v) ≤ F*, where F* is the optimal value of the primal formulation [35]. Thus, one gets a dual optimization problem of the form,

max_{u,v} L̂(u, v), subject to u ≥ 0.   (23)

The difference between F* and max_{u,v} L̂(u, v) is known as the duality gap, and the condition F* = max_{u,v} L̂(u, v) is known as strong duality. Both conditions hold for the stochastic setting considered here. One can also derive the corresponding Karush-Kuhn-Tucker (KKT) conditions that guarantee optimality [41, 42], of the form,

∇F(ā) + Σ_{i=1}^{m} u_i ∇G_i(ā) + Σ_{j=1}^{n} v_j ∇H_j(ā) = 0,

along with the following constraints,

G_i(ā) ≤ 0, i = 1, ..., m,
H_j(ā) = 0, j = 1, ..., n,
u_i ≥ 0, i = 1, ..., m,
u_i G_i(ā) = 0, i = 1, ..., m.   (24)

The above conditions are necessary for optimality. For convex problems, the KKT conditions are also sufficient for optimality. Thus, by computing the KKT conditions one can obtain the optimal solution (conditional on strong duality, e.g., via Slater's condition). We refer the reader to [35] for further details.

We note that, in general, a Lagrange multiplier approach can, under certain conditions, convert minima (or maxima) into saddle points, as shown below. Saddle points have both increasing and decreasing directions on the energy landscape (for further information about saddle points we refer the reader to [43]).

Lemma 4.1.
Let x* be a minimum (a similar argument holds for maxima) of the following 1-dimensional problem,

min_{x ∈ R} f(x, λ), subject to h(x, λ) = 0.   (25)

Then x* becomes a saddle point in the Lagrangian formulation,

min_{x,v} L(x, v, λ), where L(x, v, λ) = f(x, λ) + v h(x, λ),   (26)

if ∇h ≠ 0.

Proof. For Eqn. 26, the optimality conditions are,

∇f(x, λ) + v ∇h(x, λ) = 0,
L_v = h(x, λ) = 0.   (27)

Let x* satisfy the above equations; then the stability of the point is determined by the eigenvalues of the Jacobian matrix,

J = [ ∇²f + v∇²h   ∇h ]
    [ ∇h            0  ].   (28)

The characteristic equation for the eigenvalues η₁, η₂ is given by,

η² − η(∇²f + v∇²h) − (∇h)² = 0.   (29)

This implies that the product of the eigenvalues, η₁η₂ = −(∇h)², is either negative or zero. In the case where the product is non-zero, it is easy to see that x* is a saddle (since the Jacobian must have one positive and one negative eigenvalue). We note that ∇h = 0 corresponds to h = c, which is a trivial constraint that renders the minimization problem superfluous.

To address the above issue, we intend to explore the future integration of the polynomial chaos framework with barrier [44] and penalty methods [45] for optimization.

Illustrative Examples

We now demonstrate the polynomial chaos based stochastic optimization framework on a few illustrative optimization examples. We start with a simple 1-dimensional quadratic unconstrained optimization problem. We then increase the problem complexity by illustrating the approach on a standard non-convex 2-dimensional optimization problem. We finally include constraints and demonstrate the approach on a human-machine task scheduling problem that is inspired by real-world aerospace applications.
We now demonstrate the proposed approach on a simple illustrative optimization problem. Consider an objective function of the form,

min_x (1 + λ) x² + x,   (30)

where λ is a normally distributed random variable with mean µ(λ) = 0 and standard deviation µ₂(λ) = 0.1. Since λ is normally distributed, we expand x in terms of Hermite polynomials [15]. Expanding x to second order results in the following expression,

x ≈ a₀ψ₀ + a₁ψ₁ + a₂ψ₂.   (31)

We now perform the steps outlined in Eqns. 8 to 10, which results in the following optimization problem,

min_{[a₀,a₁,a₂]} a₀² + a₁² + a₂² + µ₂(λ)[2a₀a₁ + 4a₁a₂] + a₀.   (32)

Solving Eqn. 32 using standard quadratic programming solvers yields a mean of µ(x*) ≈ −0.505; for comparison, Monte Carlo sampling yields a mean of µ(x*) = −0.508, with the standard deviation estimates of the two approaches in close agreement.

Consider the Himmelblau test function [46] for optimization, given by,

f(x₁, x₂) = (x₁² + x₂ − 11)² + (x₁ + x₂² − 7)².   (33)

This well-known function has four local minima and one maximum, as shown in Fig. 1. It is frequently used to test new optimization algorithms. Let us now consider a random version of the above cost function,

f(x₁, x₂) = (x₁² + x₂ − 11 + 2λ)² + (x₁ + x₂² − 7)²,   (34)

where λ is a random variable drawn from the standard normal distribution. We compute the four minima for the deterministic case (Eqn. 33) using multiple initial conditions (one initial condition is picked in the basin of attraction of each equilibrium). The computed equilibria are depicted in Fig. 1 and tabulated in the second column of Table 1.

Figure 1: The cost function and equilibria in the nominal Himmelblau example.

We then introduce the uncertainty as captured in Eqn. 34. The baseline statistics (mean and standard deviations) for the four equilibria are computed using crude Monte Carlo sampling. In particular, the statistics of every equilibrium are computed using 1000 independent Monte Carlo samples. The computed means and standard deviations are tabulated in Table 1. We then use a simple first order polynomial chaos expansion for x₁ and x₂ of the form,

x₁ ≈ a₀ψ₀ + a₁ψ₁, x₂ ≈ b₀ψ₀ + b₁ψ₁.   (35)

Using the orthonormality conditions as described in Eqn. 2 results in the deterministic problem,

min_{[a₀,a₁,b₀,b₁]} (a₀⁴ + b₀⁴) + 3(a₁⁴ + b₁⁴) + 6(a₀²a₁² + b₀²b₁²) + 2(a₀²b₀ + a₀b₀² + a₀b₁²) + 4(a₀a₁b₁ + a₁b₀b₁) − 21(a₀² + a₁²) − 13(b₀² + b₁²) + 8a₀a₁ + 2a₁²b₀ − 14a₀ − 22b₀ + 4b₁ + 174.   (36)

Table 1: Means and standard deviations of the four Himmelblau equilibria under uncertainty, computed via Monte Carlo (1000 samples) and via the polynomial chaos expansion of Eqn. 35. The deterministic equilibria are located at approximately (3.00, 2.00), (−2.81, 3.13), (−3.78, −3.28), and (3.58, −1.85).

For the polynomial chaos expansion we perform a single optimization for each equilibrium by picking an initial condition in its basin of attraction. The optimal values of [a₀, a₁, b₀, b₁] are used to compute the means and standard deviations of the equilibrium values under uncertainty. The results are shown in Table 1. We find that, in comparison to Monte Carlo, the polynomial chaos approach for stochastic optimization fares well. In particular, we find that the approach gets within 2–
3% of the mean value with the simple linear approximation in Eqn. 35. The standard deviation values are also very close, with the exception of equilibrium 1. We believe that the cause of this is the quadratic nature of the local gradient, which would require second order expansions for x₁ and x₂.

Let us now consider an optimization problem of greater complexity: a formulation of human-machine optimization subject to constraints. We now consider the application of polynomial chaos for stochastic optimization in the context of task scheduling for human-machine interaction. In particular, we consider the problem of scheduling tasks for human-UAV missions in the presence of uncertainty. Using the formulation detailed in [47], we apply our approach to a finite horizon scheduling problem. Our goal is to optimally schedule tasks in the presence of behavioral variance that accounts for the effect of arousal or stress on cognitive performance [48]. For example, the Yerkes-Dodson law is frequently used to capture the notion that intermediate levels of stress or arousal give rise to the best human performance [48]. This law can be imposed in the scheduling algorithm as a constraint that ensures that the task load of an unmanned aerial vehicle (UAV) operator is kept within predetermined bounds [47].

The overall problem can be posed as a linear program by relaxing the assumption that each task must fit into a single slot, as done in [47]. Let x_ij denote the fraction of task i that is performed in time (task) slot j (see Fig. 2). Simplifying the formulation in [47], each task i can be specified by the following 3-tuple [t_i, r_i, δw_i].
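A nominal (deterministic) instance of this relaxed scheduling problem is a small linear program: maximize the total reward Σ_ij r_i x_ij subject to each task being completed at most once (Σ_j x_ij ≤ 1) and the per-slot task load staying below a threshold β (Σ_i δw_i x_ij ≤ β). A sketch with hypothetical task values (the task data, β, and solver choice are our assumptions, not values from the paper):

```python
import numpy as np
from scipy.optimize import linprog

n_tasks, n_slots = 3, 5
r = np.array([1.0, 1.0, 1.0])        # reward per task (hypothetical values)
dw = np.array([3.0, 3.0, 3.0])       # task-load increase per task
beta = 2.0                           # per-slot task load threshold

# Decision vector: x[i, j] flattened row-wise; linprog minimizes, so negate rewards.
c = -np.repeat(r, n_slots)

A_ub, b_ub = [], []
for i in range(n_tasks):             # sum_j x[i, j] <= 1 (task done at most once)
    row = np.zeros(n_tasks * n_slots)
    row[i * n_slots:(i + 1) * n_slots] = 1.0
    A_ub.append(row); b_ub.append(1.0)
for j in range(n_slots):             # sum_i dw[i] * x[i, j] <= beta (slot load limit)
    row = np.zeros(n_tasks * n_slots)
    row[j::n_slots] = dw
    A_ub.append(row); b_ub.append(beta)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=(0.0, 1.0), method="highs")
print(res.status, round(-res.fun, 6))  # prints: 0 3.0
```

With these values the slot constraints allow a total fraction of 5·(β/3) = 10/3, so the task caps bind and all three tasks are fully scheduled, giving a reward of 3.0. Randomizing β then yields the stochastic version treated below.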
Here t_i is the amount of time it takes to complete task i, r_i is the reward for completing the task, and δw_i is the increase in the task load of the operator due to the completion of the task.

Figure 2: Task scheduling depiction.

Adapting [47], the overall optimization problem can be posed as follows,

\max_{x_{ij}} \sum_i \sum_j r_i x_{ij}, \quad \text{subject to} \quad \sum_j x_{ij} \le 1, \quad \sum_i \delta w_i \, x_{ij} \le \beta, \qquad (37)

where β is the maximum task load threshold for the operator [48]. We introduce uncertainty in the upper bounds of all the inequality constraints. This variability corresponds to flexibility in the requirement for the completion of tasks; in other words, we model the uncertainty on the required percentage of tasks that must be completed.

Note that in this problem formulation, one maximizes the reward while ensuring that the task load does not exceed prescribed limits. In [47], the authors deal with uncertainty in constraints by using a scenario based approach, in which multiple samples of the constraints are appended to the original problem formulation. This approach suffers from multiple drawbacks, including the inability to deal with uncertainty in the cost function and scalability issues due to the number of appended constraints. In contrast, the polynomial chaos approach can handle uncertainty in both the cost and the constraints without artificially increasing the number of constraints. However, as noted previously, the polynomial chaos approach does suffer from a separate curse of dimensionality (as shown in Eqn. 3) that precludes its application to problems with a large number of uncertain parameters.

Let us assume that the operator has three available tasks in the task pool, each with identical reward r_i = 1 and identical task load increment δw_i = 3.0. We also include a "rest" task that reduces the overall accumulated task load by setting the associated δw_i to a negative value. We assume that the UAV operator has three such "rest" tasks at his/her disposal. We compute the task schedule over 10 slots. Let β (the task load threshold, i.e., the uncertain upper bound of the inequality constraints) be a random variable drawn from a Gaussian distribution with mean µ(β) and standard deviation σ(β).

Monte Carlo sampling predicts the average completion rate of the three tasks while using 91.47% of the available rest periods (spread out over 10 slots); the variance in the completion of the tasks is 3.77%, with a 3.05% variance on the resting task. Polynomial chaos based stochastic optimization exploits the orthogonality of the polynomial expansions, as shown in Eqn. 2. The method predicts that, on average, 100% of the three tasks will be completed while utilizing 92.4% of the "rest" task. Moreover, it predicts that the variance of task completion is 4%, with a 1% variance on the resting task. As can be seen from the numerical values, the polynomial chaos based method matches the mean and variance of the performance of the task scheduling linear program to the second decimal, without resorting to expensive sampling based methods.

Conclusions

Robust and stochastic optimization methods have found application in a wide variety of settings, including control theory [49], system design [50], portfolio optimization [51], and inventory optimization [52], to name a few. Despite several existing algorithms, robust and stochastic optimization in non-convex settings remains an open and challenging area of critical importance.

In this work, we take early steps towards extending uncertainty quantification methods to optimization under parametric uncertainty. In particular, we use polynomial chaos based techniques for optimizing functions in the presence of uncertainty. We treat the optimization variable as a random variable and expand it using orthogonal polynomials. Exploiting these orthonormality constraints allows one to construct a method with exponential convergence [7]. Although the approach is standard for uncertainty analysis, very little work has been done to exploit these methods for stochastic optimization. Our paper lays out a framework for using the polynomial chaos approach to optimize uncertain cost functions in the presence of constraints, which may themselves be uncertain. We include convergence proofs, derive error bounds, and study the preservation of structure (convexity and homogeneity). We then demonstrate the approach on a simple unconstrained one-dimensional optimization problem, a two-dimensional non-convex problem, and a constrained optimization problem motivated by task allocation in human-machine systems.
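As a rough illustration of the constrained example, the task-scheduling linear program of Eqn. (37) can be sketched in code. Note that this sketch swaps the paper's Galerkin-projected polynomial chaos formulation for a non-intrusive stochastic-collocation variant (Gauss-Hermite quadrature in β), and every numerical value below — the rewards, the load increments, the Gaussian mean and standard deviation of β, and the zero-reward "rest" tasks — is an assumption for illustration only, not the paper's exact setting.

```python
# Stochastic-collocation sketch of the task-scheduling LP in Eqn. (37).
# All numerical values are illustrative assumptions, not the paper's settings.
import math

import numpy as np
from scipy.optimize import linprog

n_tasks, n_rest, n_slots = 3, 3, 10
n = n_tasks + n_rest
r = np.repeat([1.0] * n_tasks + [0.0] * n_rest, n_slots)  # reward of each x_ij
dw = np.array([3.0] * n_tasks + [-1.0] * n_rest)          # task-load increments

# sum_j x_ij <= 1: each task (or rest period) scheduled at most once in total.
A_task = np.kron(np.eye(n), np.ones((1, n_slots)))
# sum_i dw_i * x_ij <= beta: task-load threshold enforced in every slot j.
A_slot = np.kron(dw.reshape(1, -1), np.eye(n_slots))
A_ub = np.vstack([A_task, A_slot])


def completed_fraction(beta):
    """Solve the LP for a fixed beta; return the fraction of task reward won."""
    b_ub = np.concatenate([np.ones(n), np.full(n_slots, beta)])
    res = linprog(-r, A_ub=A_ub, b_ub=b_ub, bounds=(0.0, 1.0))
    assert res.status == 0
    return -res.fun / n_tasks


# Gauss-Hermite collocation for beta ~ N(mu, sigma^2): with xi ~ N(0, 1),
# E[g(xi)] ~= sum_m (w_m / sqrt(pi)) * g(sqrt(2) * z_m).
mu, sigma, order = 0.8, 0.1, 4       # assumed distribution of beta
z, w = np.polynomial.hermite.hermgauss(order + 1)
xi = np.sqrt(2.0) * z
weight = w / np.sqrt(np.pi)
f = np.array([completed_fraction(mu + sigma * x) for x in xi])

# Probabilists' Hermite polynomials He_k(xi), with E[He_j He_k] = k! delta_jk.
He = np.zeros((order + 1, xi.size))
He[0] = 1.0
He[1] = xi
for k in range(2, order + 1):
    He[k] = xi * He[k - 1] - (k - 1) * He[k - 2]

# PC coefficients c_k = E[f He_k] / k!; mean = c_0, var = sum_{k>=1} k! c_k^2.
c = np.array([(weight * f * He[k]).sum() / math.factorial(k)
              for k in range(order + 1)])
mean_completion = c[0]
var_completion = sum(math.factorial(k) * c[k] ** 2
                     for k in range(1, order + 1))

# Monte Carlo baseline for comparison: 200 LP solves instead of 5.
rng = np.random.default_rng(0)
mc = np.array([completed_fraction(b) for b in rng.normal(mu, sigma, 200)])
print(mean_completion, var_completion, mc.mean(), mc.var())
```

The five quadrature nodes replace the hundreds of Monte Carlo LP solves in the final loop, which mirrors the sampling baseline the example compares against; with a smooth dependence of the optimum on β, the collocation estimate of mean and variance closely tracks the Monte Carlo statistics.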
The approach is found to accurately capture the statistics (moments) of the optimizing values in an efficient manner, without resorting to expensive sampling based computations. This results in an orders-of-magnitude reduction in the computational effort required to find statistics of the optimal solution in problems with low dimensional uncertainty.

In future work, we intend to extend this approach to the optimization of discontinuous functions using wavelet expansions [53] and to the construction of iterative optimization methods by extending the framework in [21]. The latter approach is expected to mitigate the curse of dimensionality associated with polynomial chaos expansions, thereby expanding its applicability.
Acknowledgments

This material is based on work supported by the US Air Force Research Lab (AFRL), Air Force Office of Scientific Research, under contract FA9550-14-C-0022, and the Defense Advanced Research Projects Agency (DARPA) and Space and Naval Warfare Systems Center, Pacific (SSC Pacific) under contract number N6600118C4031. The author thanks Dr. C. William Gear, Dr. Warren Powell, and Dr. Arvind Raghunathan for discussions and suggestions related to the work.
References

[1] R. E. Caflisch. Monte Carlo and Quasi-Monte Carlo methods. Acta Numerica, 7:1–49, 1998.
[2] H. Niederreiter. Quasi-Monte Carlo methods and pseudo-random numbers. Bulletin of the American Mathematical Society, 84(6):957–1041, 1978.
[3] J. Bucklew. Introduction to Rare Event Simulation. Springer, 2004.
[4] A. Budhiraja and P. Dupuis. Analysis and Approximation of Rare Events: Representations and Weak Convergence Methods, volume 94. Springer, 2019.
[5] M. S. Allen and J. A. Camberos. Comparison of uncertainty propagation/response surface techniques for two aeroelastic systems. 2009.
[6] T. Sahai, V. Fonoberov, and S. Loire. Uncertainty as a stabilizer of the head-tail ordered phase in carbon-monoxide monolayers on graphite. Physical Review B, 80(11):115413, 2009.
[7] N. Wiener. The homogeneous chaos. American Journal of Mathematics, 60(4):897–936, 1938.
[8] J. Linderoth, A. Shapiro, and S. Wright. The empirical behavior of sampling methods for stochastic programming. Annals of Operations Research, 142(1):215–241, 2006.
[9] S. S. Drew and T. Homem-de-Mello. Quasi-Monte Carlo strategies for stochastic optimization. In Proceedings of the 38th Conference on Winter Simulation, pages 774–782. Winter Simulation Conference, 2006.
[10] X. Wan and G. E. Karniadakis. Recent advances in polynomial chaos methods and extensions. In Computational Uncertainty in Military Vehicle Design Meeting Proceedings. NATO/OTAN, Paper Reference Number: RTO-MP-IST-999, 2008.
[11] G. Szegő. Orthogonal Polynomials, volume 23. American Mathematical Society, 1939.
[12] R. H. Cameron and W. T. Martin. The orthogonal development of non-linear functionals in series of Fourier-Hermite functionals. Annals of Mathematics, pages 385–392, 1947.
[13] S. Poles and A. Lovison. A polynomial chaos approach to robust multiobjective optimization. In Dagstuhl Seminar Proceedings. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2009.
[14] C. Safta, R. L.-Y. Chen, H. N. Najm, A. Pinar, and J.-P. Watson. Toward using surrogates to accelerate solution of stochastic electricity grid operations problems, pages 1–6. IEEE, 2014.
[15] X. Wan and G. E. Karniadakis. Beyond Wiener-Askey expansions: Handling arbitrary PDFs. Journal of Scientific Computing, 27:455–464, 2006.
[16] H. Ogura. Orthogonal functions of the Poisson processes. IEEE Transactions on Information Theory, 18(4):473–481, 1972.
[17] O. G. Ernst, A. Mugler, H.-J. Starkloff, and E. Ullmann. On the convergence of generalized polynomial chaos expansions. ESAIM: Mathematical Modelling and Numerical Analysis, 46(2):317–339, 2012.
[18] C. G. Webster. Sparse Grid Stochastic Collocation Techniques for the Numerical Solution of Partial Differential Equations with Random Input Data. PhD thesis, Florida State University, 2007.
[19] F. Nobile, R. Tempone, and C. G. Webster. A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM Journal on Numerical Analysis, 46:2309–2345, 2008.
[20] N. Zabaras and B. Ganapathysubramanian. A scalable framework for the solution of stochastic inverse problems using a sparse grid collocation approach. Journal of Computational Physics, 227:4697–4735, 2008.
[21] A. Surana, T. Sahai, and A. Banaszuk. Iterative methods for scalable uncertainty quantification in complex networks. International Journal for Uncertainty Quantification, 2(4), 2012.
[22] T. Sahai, A. Speranzon, and A. Banaszuk. Wave equation based algorithm for distributed eigenvector computation, pages 7308–7315. IEEE, 2010.
[23] T. Sahai, A. Speranzon, and A. Banaszuk. Hearing the clusters in a graph: A distributed algorithm. Automatica, 48:15–24, 2012.
[24] S. Klus, T. Sahai, C. Liu, and M. Dellnitz. An efficient algorithm for the parallel solution of high-dimensional differential equations. Journal of Computational and Applied Mathematics, 235:3053–3062, 2011.
[25] G. Blatman and B. Sudret. An adaptive algorithm to build up sparse polynomial chaos expansions for stochastic finite element analysis. Probabilistic Engineering Mechanics, 25(2):183–197, 2010.
[26] G. Blatman and B. Sudret. Adaptive sparse polynomial chaos expansion based on least angle regression. Journal of Computational Physics, 230(6):2345–2367, 2011.
[27] X. Ma and N. Zabaras. An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations. Journal of Computational Physics, 228(8):3084–3113, 2009.
[28] X. Ma and N. Zabaras. Kernel principal component analysis for stochastic input model generation. Journal of Computational Physics, 230(19):7311–7331, 2011.
[29] Y. M. Marzouk and H. N. Najm. Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems. Journal of Computational Physics, 228(6):1862–1902, 2009.
[30] P. Kali and S. W. Wallace. Stochastic Programming. Springer, 1994.
[31] E. Delage and Y. Ye. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research, 58(3):595–612, 2010.
[32] V. Gabrel, C. Murat, and A. Thiele. Recent advances in robust optimization: An overview. European Journal of Operational Research, 235(3):471–483, 2014.
[33] A. Shapiro and T. Homem-de-Mello. On the rate of convergence of optimal solutions of Monte Carlo approximations of stochastic programs. SIAM Journal on Optimization, 11(1):70–86, 2000.
[34] D. Xiu and G. E. Karniadakis. Modeling uncertainty in flow simulations via generalized polynomial chaos. Journal of Computational Physics, 187:137–167, 2003.
[35] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[36] T. Sahai. Dynamical systems theory and algorithms for NP-hard problems. Workshop on Set Oriented Numerics, 2020.
[37] J. M. Pasini and T. Sahai. Polynomial chaos based uncertainty quantification in Hamiltonian and chaotic systems, pages 1113–1118. IEEE, 2013.
[38] J. M. Pasini and T. Sahai. Polynomial chaos based uncertainty quantification in Hamiltonian, multi-time scale, and chaotic systems. Journal of Computational Dynamics, 1(2):357, 2014.
[39] D. Cox, J. Little, and D. O'Shea. Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra. Springer Science & Business Media, 2013.
[40] J. B. Lasserre. A sum of squares approximation of nonnegative polynomials. SIAM Review, 49(4):651–669, 2007.
[41] W. Karush. Minima of functions of several variables with inequalities as side conditions. Master's thesis, University of Chicago, 1939.
[42] H. W. Kuhn and A. W. Tucker. Proceedings of the 2nd Berkeley Symposium. Berkeley: University of California Press, 1951.
[43] J. Guckenheimer and P. Holmes. Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields, volume 42. Springer-Verlag New York, 1983.
[44] N. Karmarkar. A new polynomial-time algorithm for linear programming. In Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, pages 302–311. ACM, 1984.
[45] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, 1999.
[46] D. M. Himmelblau et al. Applied Nonlinear Programming. McGraw-Hill, 2018.
[47] J. R. Peters and L. F. Bertuccelli. Robust task scheduling for multi-operator supervisory control missions. Journal of Aerospace Information Systems, 2016.
[48] R. M. Yerkes and J. D. Dodson. The relation of strength of stimulus to rapidity of habit-formation. Journal of Comparative Neurology and Psychology, 18(5):459–482, 1908.
[49] C. E. Garcia, D. M. Prett, and M. Morari. Model predictive control: theory and practice. A survey. Automatica, 25(3):335–348, 1989.
[50] H.-G. Beyer and B. Sendhoff. Robust optimization: a comprehensive survey. Computer Methods in Applied Mechanics and Engineering, 196(33):3190–3218, 2007.
[51] D. Bertsimas, D. B. Brown, and C. Caramanis. Theory and applications of robust optimization. SIAM Review, 53(3):464–501, 2011.
[52] D. Bertsimas and A. Thiele. A robust optimization approach to inventory theory. Operations Research, 54(1):150–168, 2006.
[53] T. Sahai and J. M. Pasini. Uncertainty quantification in hybrid dynamical systems.