[PDF] Momentum-based Accelerated Mirror Descent Stochastic Approximation for Robust Topology Optimization under Stochastic Loads

Abstract

Robust topology optimization (RTO) improves the robustness of designs with respect to random sources in real-world structures, yet an accurate sensitivity analysis requires the solution of many systems of equations at each optimization step, leading to a high computational cost. To open up the full potential of RTO under a variety of random sources, this paper presents a momentum-based accelerated mirror descent stochastic approximation (AC-MDSA) approach to efficiently solve RTO problems involving various types of load uncertainties. The proposed framework can perform high-quality design updates with highly noisy stochastic gradients. We reduce the sample size to two (minimum for unbiased variance estimation) and show only two samples are sufficient for evaluating stochastic gradients to obtain robust designs, thus drastically reducing the computational cost. We derive the AC-MDSA update formula based on ℓ 1 -norm with entropy function, which is tailored to the geometry of the feasible domain. To accelerate and stabilize the algorithm, we integrate a momentum-based acceleration scheme, which also alleviates the step size sensitivity. Several 2D and 3D examples with various sizes are presented to demonstrate the effectiveness and efficiency of the proposed AC-MDSA framework to handle RTO involving various types of loading uncertainties.

Full PDF

MMomentum-based Accelerated Mirror Descent StochasticApproximation for Robust Topology Optimization under StochasticLoads

Weichen Li a , Xiaojia Shelly Zhang a,b, ∗ a Department of Civil and Environmental Engineering, University of Illinois Urbana-Champaign, 205 NorthMathews Ave, Urbana, IL 61801, USA b Department of Mechanical Science and Engineering, University of Illinois Urbana-Champaign

Abstract

Robust topology optimization (RTO) improves the robustness of designs with respect to randomsources in real-world structures, yet an accurate sensitivity analysis requires the solution of manysystems of equations at each optimization step, leading to a high computational cost. To open upthe full potential of RTO under a variety of random sources, this paper presents a momentum-basedaccelerated mirror descent stochastic approximation (AC-MDSA) approach to eﬃciently solve RTOproblems involving various types of load uncertainties. The proposed framework can perform high-quality design updates with highly noisy stochastic gradients. We reduce the sample size to two(minimum for unbiased variance estimation) and show only two samples are suﬃcient for evaluatingstochastic gradients to obtain robust designs, thus drastically reducing the computational cost. Wederive the AC-MDSA update formula based on (cid:96) -norm with entropy function, which is tailoredto the geometry of the feasible domain. To accelerate and stabilize the algorithm, we integrate amomentum-based acceleration scheme, which also alleviates the step size sensitivity. Several 2Dand 3D examples with various sizes are presented to demonstrate the eﬀectiveness and eﬃciency ofthe proposed AC-MDSA framework to handle RTO involving various types of loading uncertainties. Keywords:

Robust topology optimization, stochastic approximation, load uncertainty, mirrordescent stochastic approximation, acceleration scheme, step size strategies

1. Introduction

Topology optimization has been widely used in many disciplines, such as aerospace engineering[1, 2], biomedical engineering [3, 4], and architectural design [5]. The main goal of topology opti-mization is to ﬁnd the distribution of material to achieve optimized performance [6, 7]. While theclassical setting of topology optimization assumes problem-related parameters that are determin-istic, real-world structures are subjected to various sources of randomness, such as load, material ∗ Corresponding author

Email address: [email protected] (Xiaojia Shelly Zhang )

Preprint submitted to Journal of L A TEX Templates September 1, 2020 a r X i v : . [ c s . C E ] A ug roperty, and geometry, which can inﬂuence the layout of optimized designs. Thus, robust topologyoptimization (RTO) has been employed to improve the robustness of designs concerning randomsources [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19].One common random source comes from the loading, which includes load magnitudes, direc-tions, locations, and distributions. Many studies have contributed to the RTO with load ran-domness using various approaches, such as semideﬁnite programming [20], conversion to manyload cases [21, 22], ﬁrst-order reliability method approximation [12], Karhunen-Loeve expansion tomodel stochastic load ﬁelds [13, 14], perturbation techniques [23], stochastic collocation [15], uni-variate dimension reduction [16], polynomial chaos expansion [17], game theory[24], linear elastictheory [25], and non-probabilistic interval uncertainty [18]. These approaches successfully producerobust optimized designs. For large-scale problems, particularly in three dimensions (3D), somemay require a relatively high computational cost as they typically solve multiple systems of equa-tions at each optimization step in order to accurately estimate the sensitivity information. In thiswork, we aim to reduce the computational cost associated with RTO problems using stochasticapproximation.Stochastic approximation (SA) [26] is a family of stochastic optimization methods known for itslow computational cost and eﬀectiveness [27]. In the standard setting, SA methods solve stochasticoptimization problems with the objective function in the form of the expectation of a stochasticfunction [28]. Instead of computing the exact gradient, the classic SA method uses a stochastic oneas the gradient descent direction. Thus, SA is also known as stochastic gradient descent (SGD).The SGD was initially developed by Robbins and Monro [26] and improved in [28, 27, 29]. In[27, 28], the classic SA (or SGD) is generalized to the mirror descent stochastic approximation(MDSA) by replacing the traditional (cid:96) -norm deﬁnition of distance in SGD with a more generaldeﬁnition. With the general setting, MDSA adapts its update to the underlying geometry ofthe feasible space and obtain improvements in the convergence performance [28, 30, 31]. Oneof the most popular versions of MDSA is the entropic MDSA, which is based on the (cid:96) -normsetting [28]. In a related area, smooth convex programming, accelerated methods (also known asmomentum methods) were ﬁrst developed by Polyak [32] and signiﬁcantly improved by Nesterov[33, 34]. These methods are referred to as the accelerated gradient descent and proved to possessan unimprovable rate of convergence for convex problems as a linear Krylov subspace method [34].The accelerated methods are incorporated into SA and MDSA to speed up the convergence ofstochastic optimization[35, 36, 37, 38]. Inspired by a popular version of accelerated SA methods,accelerated mirror descent stochastic approximation (AC-MDSA) [35], this paper derives an AC-MDSA framework tailored for the topology optimization accounting for stochastic loads.In the ﬁeld of topology optimization, the idea of integrating stochastic optimization algorithmshas been recently explored in a few studies. For instance, Zhang et al.[39] proposed a stochasticsampling algorithm that requires 5 to 6 samples to estimate the gradient and solve deterministictopology optimization problems with hundreds of load cases. De et al. [40] applied SGD algorithms2o compliance minimization of RTO problems with load uncertainty and shows improvementsover GCMMA [41]. Pﬂug et al. [42] developed a continuous stochastic gradient method (CSG)that shows superiority over traditional SGD methods when applied to the expected complianceminimization (without the variance term). Both [40] and [42] treat the volume constraint as apenalization term in the objective function, thereby converting the constrained optimization toan unconstrained problem. The volume constraint represents a feasible domain bounded a planein which (cid:96) -norm-based entropic MDSA performs better than the (cid:96) -norm-based SGD (and itsvariants) [28]. Recently, the (cid:96) -norm-based entropic MDSA has been proposed and tailored fortopology optimization with many deterministic load cases [31] and requires only a single sampleat each optimization step, thereby signiﬁcantly reducing the computational cost compared to thestandard weighted average formulation [31]. Theoretical and numerical comparisons of the entropicMDSA and SGD are also carried out therein, and show better performance (objective functionvalues and computational time) of the (cid:96) -norm entropic MDSA than SGD ( (cid:96) -norm-based) forcompliance minimization with a volume constraint [31]. The advantage of entropic MDSA comesfrom the use of (cid:96) -norm and entropic distance function to mimic the underlying geometry of thefeasible design space represented by the linear volume constraint [31]. Therefore, we focus on theentropic MDSA with the (cid:96) -norm setting in this study.In this work, we propose a novel momentum-based AC-MDSA algorithm to solve RTO prob-lems with the volume constraint involving various types of loading uncertainties. The proposedAC-MDSA approach can perform high-quality design variable updates with noisy stochastic gradi-ents. As a result, we demonstrate that only two samples are suﬃcient for computing the stochasticgradients at each optimization step, which is the minimum number of samples for unbiased vari-ance estimation. Second, in order to adapt to the underlying geometry of the feasible set deﬁnedby the volume constraint, we derive the explicit update formula in the (cid:96) -norm setting by intro-ducing the entropy function as the distance-generating function in the AC-MDSA method. Third,we present adaptive step-size recalibration and damping schemes which, in conjunction with themomentum-based acceleration mechanism, to improve the convergence performance of AC-MDSAwith signiﬁcantly reduced sensitivity to various step size choices. Through numerical examples inboth 2D and 3D, we showcase that the proposed AC-MDSA approach can eﬃciently produce robustdesigns with respect to diﬀerent types of loading uncertainties and exhibits scalable performancefor RTO problems of various problem sizes and geometries.The remainder of this paper is organized as follows. Section 2 reviews the RTO formulation forcompliance minimization problem considering various load uncertainties. Section 3 introduces thetheoretical background of AC-MDSA and derives a momentum-based entropic AC-MDSA updatealgorithm for the RTO problem. Section 4 proposes algorithmic techniques for improving con-vergence performance, including adaptive step size recalibration and damping schemes. Section5 presents four numerical examples illustrating the eﬀectiveness and eﬃciency of the proposedentropic AC-MDSA algorithm in producing robust optimized designs under various loading uncer-3ainties. Finally, Section 6 provides concluding remarks.

2. Robust topology optimization formulation

In this section, the RTO formulation of compliance minimization problem considering loadinguncertainties is introduced, and the unbiased estimations of the objective function and gradient ofthe RTO formulation are presented using a ﬁnite number of samples. In this work, we focus onthe density-based approach [6].For a given mesh consisting of n ﬁnite elements, the RTO aims to minimize the weighted sumof the mean and variance of the compliance under load randomness . More speciﬁcally, the RTOformulation is introduced as follows:min x J ( x ) = κw E [ C ( x , ξ )] + 1 − κw Var [ C ( x , ξ )]s.t. V ( x ) V − V f = 0 x ( i ) ∈ [0 , , i = 1 , , ..., n with K ( E ( x )) u ( x , ξ ) = f ( ξ ) , (1)where x is the design variable vector; f ( ξ ) is the random load vector with ξ being a randomvector representing various types of load uncertainty; K and u are the global stiﬀness matrixand displacement vector, respectively; V is the total volume of the design domain; and V f is theprescribed volume fraction. For a given structure with design variable x , V ( x ) stands for the totalvolume of that structure as follows, V ( x ) = n (cid:88) i =1 v ( i ) ¯ x ( i ) = v T ¯ x = v T Hx , (2)where v ( i ) and ¯ x ( i ) is the volume and the ﬁltered/physical density of the i th element, respectively;and H is the matrix representation of density ﬁlter [43, 2], such that ¯ x = Hx , which is used toprevent the checkerboard pattern and achieve mesh-independent designs [44, 45, 46]. In addition,the modiﬁed simpliﬁed isotropic material with penalization (SIMP) [6, 47, 48] is adopted, whichinterpolates the Young’s modulus of each element as E ( i ) (cid:0) ¯ x ( i ) ( x ) (cid:1) = E min + (cid:0) ¯ x ( i ) ( x ) (cid:1) p ( E − E min ) , i = 1 , ..., n, (3)where E is the Young’s modulus of the solid material; E min is the Ersatz stiﬀness which is takento be 10 − ; and p is the SIMP penalization parameters, which is taken to be 3 [49] in this study. A similar approach used by many studies is the weighted sum of mean and standard deviation as the objectivefunction, this work focuses on the weighted sum of mean and variance. C ( x , ξ ) = f T ( ξ ) u ( x , ξ ), where E [ · ] and Var [ · ] stand for expectationand variance operators, respectively; and κ ∈ [0 ,

1] is a prescribed coeﬃcient representing therelative importance of the expectation over the variance in the objective function. Because theexpectation and the variance of the compliance have diﬀerent units, we normalize their relativeweights by w and w , respectively, where w = ¯ f T ¯ f /E with ¯ f = E [ f ( ξ )] being the expectation ofthe load vector f ( ξ ) [11].The stochastic gradient of the objective function J in formulation (1) is given by g ( x ) . = ∇ x J ( x ) = E [ G ( x , ξ )] = E (cid:104) κw G µ ( x , ξ ) + 1 − κw G Var ( x , ξ ) (cid:105) , (4)where G µ ( x , ξ ) . = ∇ x C ( x , ξ ) and G V ar ( x , ξ ) . = 2 (cid:16) C ( x , ξ ) − E [ C ( x , ξ )] (cid:17) ∇ x C ( x , ξ ) , (5)respectively. In the above expressions, the stochastic gradient of the compliance with respect tothe design variable, ∇ x C ( x , ξ ), is obtained through the chain rule as ∇ x C ( x , ξ ) = H T ∇ ˜ x C ( x , ξ ) , (6)where ∇ ¯ x C ( x , ξ ) is the stochastic gradient of the compliance with respect to the ﬁltered designvariable ¯ x , whose i th component is given by ∂C ( x , ξ ) ∂ ¯ x ( i ) = − p (¯ x ( i ) ) p − (cid:16) u ( i ) ( x , ξ ) (cid:17) T k ( i )0 u ( i ) ( x , ξ ) (7)Symbols u ( i ) and k ( i ) are the nodal displacement vector and the element stiﬀness matrix (corre-sponds to solid material) of the i th element, respectively.In this work, we employ unbiased estimators of the objective function and its stochastic gradi-ent. The unbiased estimators of E [ C ( x , ξ )] and Var[ C ( x , ξ )] using m samples are denoted by µ m ( x )and Var m ( x ) with Var m ( x ) = ( σ m ( x )) , where σ m ( x ) is the estimate of the standard deviation.The estimators µ m ( x ) and Var m ( x ) are given by µ m ( x ) = 1 m m (cid:88) j =1 C ( x , ξ j ) andVar m ( x ) = 1 m − m (cid:88) j =1 (cid:16) C ( x , ξ j ) − µ m ( x ) (cid:17) , (8)respectively, where ξ j , j = 1 , ..., m are independent and identically distributed (i.i.d.) samplesof the random vector ξ . Notice that, for the variance estimator Var m ( x ), it requires m ≥ J ( x ), denoted by J m ( x ), is given by J m ( x ) = κw µ m ( x ) + 1 − κw Var m ( x ) (9)Similarly, the unbiased estimators of ∇ x (cid:16) E [ C ( x , ξ )] (cid:17) and ∇ x (cid:16) Var[ C ( x , ξ )] (cid:17) using m samples,which are respectively denoted by G µm ( x ) and G Var m ( x ), take the forms of G µm ( x ) = 1 m m (cid:88) j =1 ∇ x C ( x , ξ j ) and G Var m ( x ) = 2 m − (cid:40) m (cid:88) j =1 (cid:16) C ( x , ξ j ) ∇ x C ( x , ξ j ) (cid:17) − µ Cm ( x ) G µm ( x ) (cid:41) (10)Accordingly, the corresponding unbiased estimator of the stochastic gradient of the objective func-tion g ( x ) using m samples, which is later denoted as G m takes the form of G m ( x ) = κw G µm ( x ) + 1 − κw G Var m ( x ) (11)We note that, as required by G Var m ( x ), at least two i.i.d. samples are needed to evaluate the aboveunbiased gradient estimator, namely m ≥ m is needed. This isbecause those update algorithms typically require higher accuracy in the estimation of gradient(11) to perform high-quality updates [31], which leads to a large sample size m and the solutionof m linear systems (in the limit of m → ∞ , we have G m ( x ) → g ( x )). Thus, the associatedcomputational cost can be prohibitive, particularly for large-scale problems.To address this challenge, we propose an accelerated MDSA algorithm in Section 3 tailoredfor the RTO formulation (1). Compared with the standard optimization algorithms in topologyoptimization, AC-MDSA is a stochastic optimization method, which can perform high-qualitydesign variable update with highly noisy gradient estimations. As we demonstrate in the designexamples, with the tailored AC-MDSA method proposed in this work, we can eﬃciently andaccurately solve RTO problems with only two samples (i.e., m = 2) at every optimization step,where 2 is the minimum sample size for the unbiased gradient estimator.

3. Accelerated Mirror Descent Stochastic Approximation: theory and algorithm

This section introduces the background of AC-MDSA and derives the update algorithm whenapplied to the RTO problem. We ﬁrst review the general framework of the MDSA [27, 28] andintroduce an accelerated MDSA using momentum-based techniques. One major advantage of theMDSA is that, through its general deﬁnition of the distance-generating function, the design variable6pdate can be adapted according to the underlying geometry of the feasible set (see [28, 30] fordetailed discussions). Exploiting this advantage, this section then derives the update formula of theAC-MDSA in the (cid:96) -norm setting with the entropy function as the distance-generating function. The MDSA is introduced in [28] to solve stochastic optimization problems of the formmin x ∈ X (cid:8) φ ( x ) = E [Φ ( x , ξ )] (cid:9) , (12)where X ⊂ R n is the feasible set of x (typically assumed to be a nonempty bounded convex set),and ξ is a random vector with a given probability distribution. The gradient of the objectivefunction is given by: ∇ φ ( x ) = E [ ∇ x Φ ( x , ξ )] = E [ G ( x , ξ )] , (13)where G ( x , ξ ) = ∇ x Φ ( x , ξ ) is the stochastic gradient. We note that, although we assume thediﬀerentiability of Φ ( x , ξ ) with respect to x , the above setting is applicable to the non-smoothcase [28].Before we introduce the general framework of MDSA, let us ﬁrst introduce the relevant nota-tions [28]. We denote (cid:107) · (cid:107) as a generalized norm deﬁned on R n with (cid:107) x (cid:107) ∗ . = sup (cid:107) y (cid:107)≤ y T x beingits dual norm. We deﬁne ω ( · ) : X → R as a distance-generating function with modulus α > (cid:107) · (cid:107) , such that ω ( · ) is convex and continuous on X , continuously diﬀerentiable,and strongly convex with parameter α with respect to (cid:107) · (cid:107) , namely,( x (cid:48) − x ) T ( ∇ ω ( x (cid:48) ) − ∇ ω ( x )) ≥ α (cid:107) x (cid:48) − x (cid:107) ∀ x (cid:48) , x ∈ X (14)Based on the distance-generating function ω ( · ), we then introduce a prox-function (also knownas the Bregman divergence [50]) B : X × X → R + as: B ( x , z ) = ω ( z ) − (cid:2) ω ( x ) + ∇ ω ( x ) T ( z − x ) (cid:3) (15)Notice that, due to the convexity of ω ( · ), we can show that B ( x , · ) is non-negative. Associatedwith the distance-generating function ω ( · ), a prox-mapping P x : R n → X can be deﬁned as: P x ( y ) = arg min z ∈ X (cid:8) y T ( z − x ) + B ( x , z ) (cid:9) (16)Notice that because of the strong convexity of B ( x , · ), the above prox-mapping is well deﬁnedand has a unique value. Making use of the prox-mapping, the MDSA update the design variable7ccording to the following formula, x k +1 = P x k (cid:16) η k G m ( x k ) (cid:17) = arg min z ∈ X (cid:40)(cid:16) G m ( x k ) (cid:17) T ( z − x k ) + 1 η k B ( x k , z ) (cid:41) , (17)where x k , η k >

0, and G m ( x k ) are the design variable, step size, and unbiased gradient estimator(using m samples) at optimization step k , respectively.To gain a better understanding, let us take a closer look at the above update formula. Theﬁrst term in the right bracket of (17) is a linear approximation of the objective function at x k using the gradient estimator G m ( x k ), and the second term is a strongly convex function scaledby 1 /η k . We can show that, at z = x k , the gradient of B ( x k , z ) vanishes and, as a result, thegradient of the entire expression in the bracket with respect to z equals to G m ( x k ). In addition,the Hessian of the expression in the bracket equals to ∇ ω ( z ) scaled by 1 /η k . This indicates thatthe expression in the bracket can be deemed as a convex approximation of the original objectivefunction using the stochastic gradient G m ( x k ), and the local curvature of the expression can becontrolled through η k .We conclude this subsection with several remarks on the MDSA framework. First, same asthe classical SA methods, the MDSA method can work with highly noisy gradient estimators. Toensure the convergence of MDSA update when applied to general stochastic optimization prob-lems, only a single sample is required with a properly chosen step size policy [28]. This is ﬁrstlydemonstrated in topology optimization by [31] for a randomized formulation to optimize structuresunder many deterministic load cases and the reduction to one sample load case. Alternatively, onecan evaluate the gradient estimator G m ( x k ) using multiple samples [39] and integrate with a com-monly use update scheme (e.g., Optimality Criteria). Second, as compared to the classical SAapproaches, the MDSA framework allows for a more general setup mainly because of the generaldeﬁnition of distance generating-function ω ( · ). In fact, if we choose distance-generating function ω ( x ) = 1 / || x || with || · || being the Euclidean norm, the MDSA update becomes the classic SA(or equivalently SGD) method [26, 28]. As demonstrated theoretically and numerically in [28]for general stochastic optimization problems, by choosing a proper distance-generating function,MDSA can adapt the update to the geometry of the problem, which leads to better performancein accuracy and convergence. This advantage is exploited in [31] for a deterministic topologyoptimization problem. In general, the performance of the MDSA is sensitive to the choice of the step size policy. Largestep size in MDSA could potentially lead to divergence, while too small step size may result inslow convergence. To alleviate this sensitivity, we introduce an accelerated version of MDSA [35],referred to as AC-MDSA, which makes use of momentum-based acceleration techniques. The AC-MDSA algorithm is proposed in [35] for general stochastic optimization problems and is shown8o achieve the optimal convergence rate for convex problems. The general update algorithm ispresented in Algorithm 1. Compared to the classic MDSA, which updates the x k sequence, theAC-MDSA includes updating two additional sequences, namely a “middle variable” x mdk and an“aggregated variable” x agk [35]. Algorithm 1

AC-MDSA Initialize: x , x ag = x , step sizes β and η . Set: x mdk = β − k x k + (1 − β − k ) x agk , with β k computed by (32) Get x k +1 using MDSA update (17) : x k +1 = P x k (cid:0) η k G m (cid:0) x mdk (cid:1)(cid:1) , with η k computed by (29) Set: x agk +1 = β − k x k +1 + (cid:0) − β − k (cid:1) x agk The modiﬁcations from the standard MDSA update (17) in the AC-MDSA algorithm mainlylie in three aspects. First, in addition to the sequence of x k , the algorithm updates the sequences x mdk and x agk . We note that the converged sequence x agk represents the ﬁnal solution. As we show inthe next subsection, x agN represents the history weighted average of x k from step 1 to step N witha linear weight [35]. The introduction of two additional sequences adds a negligible computationalcost as they are vector additions. Second, the AC-MDSA algorithm performs update x k +1 usingthe gradient estimator evaluated at x mdk instead of x k . Third, compared with MDSA, AC-MDSArequires the speciﬁcation of β k , which acts as a weight factor to compute x mdk and x agk . Having presented the general frameworks of MDSA and AC-MDSA, we now derive an AC-MDSA algorithm with the (cid:96) -norm tailored for the RTO problem (1) and propose the explicitupdate formula. With the volume (linear) and box constraints, the feasible set X of the RTOformulation (1) is give by X = { x ∈ R n : V ( x ) V − V f ≤ , x ( i ) ∈ [0 , , i = 1 , ..., n } (18)We deﬁne a scaled design variable vector ˜ x such that ˜ x ( i ) := ˜ v ( i ) x ( i ) with ˜ v ( i ) being ˜ v ( i ) := (cid:0) H T v (cid:1) ( i ) / ( V V f ). The corresponding feasible set of the scaled variable is:˜ X = { ˜ x ∈ R n : n (cid:88) i =1 ˜ x ( i ) − ≤ , x ( i ) ∈ (cid:2) , ˜ v ( i ) (cid:3) , i = 1 , ..., n } (19)Accordingly, the gradient estimator of the objective function in (1) with respect to ˜ x is obtainedas: ˜ G m ( ˜ x ) = diag (cid:18) v ( i ) (cid:19) G m ( x ) (20)9he feasible set ˜ X is similar to a standard simplex set. Thus, we choose the (cid:96) -norm withentropy function as ω in the proposed AC-MDSA algorithm, denoted as entropic AC-MDSA,because this setup leads to an improved convergence performance over the (cid:96) -norm setting (whichleads to the classical SA/SGD method) for a simplex set [31]. The distance-generating function ω of the entropic AC-MDSA takes the following form, ω ( ˜ x ) = n (cid:88) i =1 ˜ x ( i ) ln ˜ x ( i ) , (21)and the corresponding prox-function becomes B ( ˜ x , z ) = n (cid:88) i =1 (cid:18) z ( i ) ln z ( i ) ˜ x ( i ) − z ( i ) + ˜ x ( i ) (cid:19) (22)By plugging (22) into the prox-mapping (16) and dropping the constant terms, the update formula(17) becomes˜ x k +1 = P ˜ x k (cid:16) η k ˜ G m ( ˜ x k ) (cid:17) = arg min z ∈ ˜ X (cid:40)(cid:16) ˜ G m ( ˜ x k ) (cid:17) T z + 1 η k n (cid:88) i =1 (cid:32) z i ln z ( i ) ˜ x ( i ) k − z ( i ) (cid:33)(cid:41) (23)The above formula is given as a minimization problem, where ˜ x k +1 is its unique minimizer. Next,we derive an explicit update formula for (23). The Lagrangian of (23) with respect to the (scaled)volume constraint is given by L ( z , λ ) = ˜ G Tm ( ˜ x k ) z + 1 η k n (cid:88) i =1 (cid:32) z ( i ) ln z ( i ) ˜ x ( i ) k − z ( i ) (cid:33) + λ ( l T z − , (24)where λ ∈ R + is the Lagrange multiplier associated with the constraint and l is a constant vectorwhose components are all 1. Imposing the gradient condition ∇ z L ( z , λ ) = gives: ∂L ( z , λ ) ∂z i = ˜ G ( i ) m ( ˜ x k ) + 1 η k ln z ( i ) ˜ x ( i ) k + λ = 0 , (25)which can be recast as z ( i ) ( λ ) = ˜ x ( i ) k exp (cid:16) − η k (cid:16) ˜ G ( i ) m ( ˜ x k ) + λ (cid:17)(cid:17) (26)Incorporating the box constraints, we then obtain the update formula:˜ x ( i ) k +1 ( λ ∗ ) =  z ( i ) ( λ ∗ ) , if ˜ x ( i ) k +1 ≤ z ( i ) ( λ ∗ ) ≤ ¯˜ x ( i ) k +1 , ¯˜ x ( i ) k +1 , if z ( i ) ( λ ∗ ) > ¯˜ x ( i ) k +1 , ˜ x ( i ) k +1 , if z ( i ) ( λ ∗ ) < ˜ x ( i ) k +1 , (27)10here ¯˜ x ( i ) k +1 . = min { ˜ x ( i ) k + ˜ v ( i ) move, ˜ v ( i ) } is the upper bound of updated design variables and ˜ x ( i ) k +1 . =max { ˜ x ( i ) k − ˜ v ( i ) move, } is the lower bound, and λ ∗ solves equation (cid:80) ni =1 ˜ x ( i ) k +1 ( λ ) = 1 . In practice, λ ∗ is obtained using the bi-section method. Notice that, in (27), we introduce a move limit denotedas move , which will be adaptively adjusted by a damping scheme (see Section 4.3). After obtaining˜ x k +1 , we map it back to the original feasible space as x ( i ) k +1 := 1˜ v ( i ) ˜ x ( i ) k +1 (28)Finally, the proposed entropic AC-MDSA for RTO problem (1) is obtained by replacing (17) inthe step 3 of Algorithm (1) with (26), (27), and (28). Several remarks can be made regardingthe above entropic AC-MDSA update. First, the derived update (26) and (27) can handle bothpositive and negative stochastic gradients, thus it is also applicable to RTO problems with otherobjective functions, such as the compliance mechanism design [12, 19]. Second, as long as westart from a feasible initial guess, the ˜ x k (and x k ) always stays positive. Thus, the lower bound x min = 0 of the design variables is typically not active. By tailoring AC-MDSA algorithm to RTO,we ﬁnd that, compared with the MDSA method, the AC-MDSA method can lead to acceleratedconvergence performance without an increase in computational cost. We also observe that theAC-MDSA method is considerably less sensitive to the choice of step sizes, which allows us to uselarger step sizes. These advantages will be demonstrated in the numerical examples in Section 5.

4. Algorithmic parameters and implementation details of the entropic AC-MDSA forRTO

This section discusses the algorithmic and implementation details of the proposed entropic AC-MDSA algorithm for RTO problems. In particular, we present the step size policy and introducerelated techniques (i.e., step size recalibration and adaptive damping) to accelerate the convergenceperformance and reduce the step-size sensitivity of the AC-MDSA algorithm.

Typically, the step size policy is critical for stochastic optimization algorithms. The step sizepolicy adopted in this work is based on [35] and involves two sequences η k and β k . The policy for η k is given by η k = θ ¯ η k + 12 , k = 1 , , ..., N, (29) Because compliance problems have active volume constraints in practice, the proposed update formula assumesthat mapped volume constraint is active throughout the optimization process, namely, (cid:80) ni =1 ˜ x ( i ) = 1. θ is a user-deﬁned scaling factor that adjusts the step size, and ¯ η is computed according to¯ η = √ αD ω, ˜ X ( N + 2) (4 M + Σ ) , (30)where α = 1, D ω, ˜ X is the diameter of set ˜ X measured by the distance-generating function ω ,which is taken to be (cid:112) ln( n ) [35, 28]. Parameters M and Σ are estimates of the upper bounds ofthe stochastic gradients and its variance. In this work, they are estimated using sampling-basedtechniques, M = (cid:118)(cid:117)(cid:117)(cid:116) N M N M (cid:88) i =1 (cid:13)(cid:13)(cid:13) ˜ G ( i ) m ( ˜ x k ) (cid:13)(cid:13)(cid:13) ∞ and Σ = (cid:118)(cid:117)(cid:117)(cid:116) N Σ N Σ (cid:88) i =1 (cid:13)(cid:13)(cid:13) ˜ G ( i ) m ( ˜ x k ) − Q (cid:13)(cid:13)(cid:13) ∞ , (31)where ˜ G ( i ) m denotes the i th evaluation of ˜ G m using an independent set of m samples, N M and N Σ are the numbers of evaluations to estimate the upper bounds of the stochastic gradients and itsvariance, respectively, and Q = 1 /N M (cid:80) N M i =1 ˜ G ( i ) m ( ˜ x k ). The policy for β k is given by β k = k + 12 , k = 1 , , ..., N (32)If we plug (32) into the expression of aggregated variable ˜ x agk +1 in Algorithm (1), ˜ x agk +1 can be recastas [35]: ˜ x agk +1 = 2 k + 1 ˜ x k +1 + k − k + 1 ˜ x agk = (cid:80) kt =1 ( t ˜ x t +1 ) (cid:80) kt =1 t (33)The above expression indicates that the aggregated variable ˜ x agk +1 , obtained by adopting β policy(32), is the weighted average of the history of variable ˜ x t +1 with linearly varying weights. We notethat history-averaging techniques are widely used in SA methods to suppress noise and acceler-ate convergence [28, 29]. Such a noise-suppressing strategy is diﬀerent from Monte Carlo basedmethods [51], in which the noise is reduced through estimations using many samples within oneoptimization step. The history-averaging technique can lead to a small change of designs as the optimizationproceeds, leading to slow convergence. To address this, we propose a step size recalibration schemeto speed up the evolution of the design and convergence.The basic idea of the recalibration scheme is to adaptively re-initialize the acceleration methodthroughout the optimization by tracking the changes of design variables. Speciﬁcally, we monitorthe (cid:96) -norm of the change of x agk , namely, (cid:107) ∆ x agk (cid:107) = (cid:13)(cid:13) x agk +1 − x agk (cid:13)(cid:13) (34)12f (cid:107) ∆ x agk (cid:107) becomes smaller than a tolerance (cid:15) rst , then we recompute ¯ η , M , and Σ based on (30)and (31), and set k = 1 in evaluating η k and β k . To avoid frequent recalibration, we require thenumber of optimization steps between two consecutive recalibrations to be larger than a prescribedminimum step ∆ rst , and monitoring of (cid:107) ∆ x agk (cid:107) starts after the ﬁrst N rst steps. We note thatrecalibration schemes of similar forms are shown to be eﬀective for accelerated gradient descentmethods in other applications, for example, see [31]. Because of the stochastic nature of the entropic AC-MDSA algorithm, optimization terminatesat the maximum step unless a decaying step size policy is adopted. Thus, this work adopts theadaptive damping scheme proposed in [39] to eﬀectively terminate the optimization after the designhas converged. Inspired by the simulated annealing [52, 53], the adaptive damping scheme monitorsthe average progress of the design at each step and reduces the move limit when small progressis detected. The average progress of the design at k th step is characterized by the eﬀective stepratio, R k , which is deﬁned as: R k := N D (cid:107) E k − E k − N D +1 (cid:107) (cid:107) E − E k − (cid:107) , (35)where E is the vector of elemental Young’s moduli, N D is the history window size.The eﬀective step ratio, R k , represents the relative magnitude of the average design changeover the past N D steps to the current design change. A small R k indicates slow progress over theprevious N D steps, the move limit is then reduced. Speciﬁcally, if R k is lower than a tolerance (cid:15) damp , the move limit is scaled down by a factor τ , i.e. move = move/τ . Here, we use τ = 2. Theadaptive damping scheme is activated after a prescribed minimum number of steps N damp . To conclude this section, we summarize the proposed entropic AC-MDSA algorithm and itsparameters in Algorithm 2. The objective function value and design quality are generally insen-sitive to most of the algorithm parameters, i.e., N rst , ∆ rst , (cid:15) rst , N damp , (cid:15) damp , τ , N D , N max , N min ,and (cid:15) . We have investigated various parameter values and summarized the value ranges used inthis study in Table 1, which are generally recommended. The step size scaling factor, θ , has moreinﬂuence on the results, as it directly adjusts the magnitude of the step size η k . In general, a larger θ (and therefore larger η k ) leads to faster convergence and design evolution, but θ should not betoo large as it may result in instability. The proper range of θ needs to be calibrated with a fewpilot runs, but in general, the range that produces a stable and steady convergence is wide.13 able 1: Range of parameter values for AC-MDSA Parameter Value Usage N rst

100 or 300 Adaptive step recalibration∆ rst (cid:15) rst N damp ∼

450 Adaptive damping scheme (cid:15) damp τ N D N max ∼

600 Termination of optimization N min ∼ (cid:15) N M M and Σ (31) N Σ lgorithm 2 Entropic AC-MDSA algorithm for robust topology optimization Initialize: x , θ Set ˜ x = diag (cid:0) ˜ v ( i ) (cid:1) x , ˜ x ag = ˜ x ; compute ¯ η using (30); and set k in = 1. for k = 1 , ..., N max do if k ≥ N rst and k in ≥ ∆ rst and || x agk +1 − x agk || < (cid:15) rst then Set k in = 1 and ˜ x k = ˜ x agk Compute ¯ η using (30) end if Set ˜ x mdk = β − k in ˜ x k + (1 − β − k in ) ˜ x agk with β k in deﬁned in (32). Compute gradient estimator ˜ G m (cid:0) ˜ x mdk (cid:1) according to (20) and (11) using m i.i.d samples. Update ˜ x k +1 using entropic MDSA (26)–(27) Set ˜ x agk +1 = β − k in ˜ x k +1 + (cid:0) − β − k in (cid:1) ˜ x agk with β k in deﬁned in (32). Compute x agk +1 using (28) if k ≥ N min and || x agk +1 − x agk || ∞ < (cid:15) then break end if Evaluate eﬀective step ratio R k using (35) if k ≥ N damp and R k ≤ (cid:15) damp then move = move/τ end if k in = k in + 1 end for Output: x ∗ = x agk

5. Numerical examples

This section presents four examples to demonstrate the eﬀectiveness and eﬃciency of the en-tropic AC-MDSA algorithm. First, to verify the results by AC-MDSA, we compare the ﬁnaldesigns, objective function values, and computational cost of the AC-MDSA with those from theMonte Carlo (MC) method. The MC method evaluates the sensitivity using m = 1 ,

000 samplesat each optimization step to get suﬃciently accurate gradients and uses a popular optimizationupdate algorithm, MMA [54], to update the design variables with the estimated sensitivity. Thesecond example shows that the AC-MDSA, although using two samples, eﬀectively reﬂects theinﬂuence of κ (relative weight of mean and variance) through both designs and objective functionvalues. Example 3 demonstrates the AC-MDSA using problems with diﬀerent domain geometries,multiple random loads, and various mesh sizes. Finally, in Example 4, we solve a three-dimensional(3D) problem to show the applicability of the entropic AC-MDSA with an iterative linear solver.15he key information of the four examples is summarized in Table 3. The investigated κ values forthe robust designs are κ = 0 . . . w = 1, i.e. κ : √ − κ ,is 2, 1, , which are commonly used values in the RTO literature. The κ values are summarizedin Table 2. Table 2: Investigated κ values and their equivalent mean-to-s.t.d. ratios κ value Equivalent ratio of mean : s.t.d.( κ : √ − κ ) with w = 11 -0.828 1 : 0.50.618 1 : 10.282 1 : 3We implement the proposed AC-MDSA algorithm in the PolyTop code [55]. To comprehensivelyand fairly evaluate the algorithm’s performance, we carry out 50 consecutive and independent runsfor each κ studied in every 2D example and present the statistical data related to the algorithm’sperformance. Notice the 50 trials are only for evaluating statistical consistency and are not requiredfor practical use of the algorithm. The presented design for each κ is a representative design chosenfrom the 50 trials and has an objective function value close to the mean value of the 50 objectivefunction values. At the end of the optimization, denoting x ∗ as the optimized solution, we use m = 10 ,

000 samples to obtain accurate estimates of the objective function value, the mean, andthe standard deviation of the compliance for the ﬁnal design, denoted as ˆ J ( x ∗ ), ˆ µ ( x ∗ ), and ˆ σ ( x ∗ ),respectively. For comparison, we also include the deterministic designs with the objective functionbeing the compliance under deterministic loads that take the mean values of the random loads. Theˆ µ ( x ∗ ) and ˆ σ ( x ∗ ) of the optimized deterministic design is evaluated using the same random loadcorresponding to the stochastic cases. The total wall-clock time and the number of optimizationsteps are reported. All the examples are performed on a machine with an Intel(R) Xeon(R) Silver4116 CPU, 2.10GHz processor and 64 GB of RAM, running MATLAB R2018b. In this work, thestate equation is solved using the sparse direct solver and preconditioned conjugate gradient solverfor 2D and 3D problems, respectively. For most two-dimensional (2D) examples, we enforce thedesign symmetry about the vertical axis, and we study a 2D example without symmetry constraint.For the 3D example, we enforce design symmetry about the two vertical planes.16 able 3: Brief description of the numerical examples. Ex. Dim. Name Load uncertainty Feature1 2D Simple columnbenchmark Random direction ∼ U ( π, π ) - Veriﬁcation of entropicAC-MDSA with MC- Study of sample size, m = 2 , , ∼ N (0 , . ) - Study of κ values, kappa = 1, 0.618, 0.2823 2D Double hook& Torsion disk Deterministicvertical/normal &multiple randomhorizontal/tangentialcomponents ∼ N (0 , . ) - Problem size study, n = 114 k , 51 k , 13 k - Complex design geometriesand multiple independentrandom components- Comparison with MC4 3D Crane Deterministic z-direction,multiple random x- & y-directions ∼ N (0 , . ) - Combination of entropicAC-MDSA with iterativelinear solver The ﬁrst example is the simple column involving randomness in the load direction, whichis commonly studied in the literature of RTO. We ﬁrst verify the proposed entropic AC-MDSA(using two samples) by comparing its results with the ones obtained by the MC method (using1000 samples). Then, we demonstrate the robustness of the entropic AC-MDSA algorithm withrespect to diﬀerent sample sizes m (thus diﬀerent accuracy levels) for computing the gradientestimator. Finally, we compare the performance of the entropic AC-MDSA algorithm with theentropic MDSA (without acceleration).Figure 1a shows the design domain and boundary conditions of the simple column problem. Thedomain is ﬁxed at the bottom and is subjected to a load f with a deterministic magnitude of 1 anda random direction, deﬁned by α ∼ U ( π, π ) with the standard deviation being √ π , which isin the common range used in the literature [22, 13]. We consider three cases: a deterministic design( α ∼ U ( π, π )), a robust design with κ = 1, and a robust design with κ = 0 . = 100 ×

100 = 10 , R = 3. For the entropic AC-MDSAalgorithm, we use the sample size m = 2, θ = 600 n , N rst = 100, N damp = 400, (cid:15) damp = 0 . N max = 500, and N max = 400. The ﬁlter radius begins to reduce to R = 1 . m = 1000 and N max = 100, and the ﬁlter radius starts todecrease at the 60th step, which is at the same stage relative to the N max (60 /

100 = 0 .

6) as theone in AC-MDSA (300 /

500 = 0 . N max = 100 for the MC becausethe computational cost for MC with m = 1000 samples is excessive. α H W f D f y f x a b Figure 1: Geometry and boundary conditions of (a) Example 1: simple column, H = W = 100, point load f has adeterministic magnitude of 1 and a random load direction α ∼ U ( π, π ); (b) Example 2: half circle, D = 1,point load has a deterministic vertical component f y = 1 and a random horizontal component f x ∼ N (0 , . ). d = 0.618 m = 1000 MC = 1 m = 2 AC-MDSA b = 1 m = 1000 MC e = 0.618 m = 2 AC-MDSA cfa

AC-MDSA Deterministic m = 1 MC Deterministic m = 1= 14.60 = 6.74 = 9.02 = . = 9.23 = 1.47 = 9.41 = 0.54= 10.05 = 0.41= 14.60 = 6.74 = 7.01 = 9.02 = 5.93= 6.28= 9.23= 7.06 Figure 2: Final designs of (a) AC-MDSA, deterministic; (b) AC-MDSA, κ = 1; (c) AC-MDSA, κ = 0 . κ = 1; (f) MC, κ = 0 . .1.1. Veriﬁcation of AC-MDSA with MC Here, we verify the entropic AC-MDSA algorithm with MC method by comparing the repre-sentative ﬁnal designs and objective function values as shown in Figure 2 and statistics in Table4. For each κ , the representative design of the AC-MDSA in Figure 2 has an objective functionvalue close to the mean of the objective function values of the 50 trials. For the deterministiccases (Figures 2a and 2d), the entropic AC-MDSA and MC methods produce similar designs withcomparable objective values, demonstrating that the entropic AC-MDSA can also be used to solvedeterministic problems. Notice that in the deterministic case, even though the ˆ µ and ˆ σ are identi-cal for the AC-MDSA and MC, the ˆ J are diﬀerent. This is because the ˆ µ and ˆ σ are evaluated withthe random load, and ˆ J is obtained with the deterministic load, which is not computed based on ˆ µ and ˆ σ . In the robust designs with κ = 1 (Figures 2b and 2e), both methods produce similar designswith two split legs, and the design by AC-MDSA has slightly wider distances between the two legsand a slightly lower ˆ J ( x ∗ ) (and lower ˆ µ ( x ∗ ) and ˆ σ ( x ∗ )). In the robust designs with κ = 0 . J ( x ∗ ). This comparison veriﬁesthat, with only two samples in each optimization step, the entropic AC-MDSA produces similardesigns and objective function values as the MC method with 1 ,

000 samples. We note that eventhough the MC achieves slightly higher objective function values, MC’s solution can be potentiallyimproved with more optimization steps and more computational time. Comparing designs withvarious κ values, the design with higher weight in variance ( κ = 0 . σ ( x ∗ ). In terms of computational eﬃciency, AC-MDSA generally has low computational costs asindicated in Table 4 due to its use of two samples. = 1 AC-MDSA ( m = 2) = 1 MC ( m = 1000) = 0.618 MC ( m = 1000) = 0.618 AC-MDSA ( m = 2) Obj. = 9.02

Obj. = 5.93

Obj. = 6.28

Obj. = 9.23 = = . Figure 3: Performance comparison of the proposed AC-MDSA algorithm and the MC method: ˆ µ ( x ∗ ) versus ˆ σ ( x ∗ )for κ = 1 and κ = 0 . κ value. (Representative designs from each case isshown next to the highlighted markers.) a O b j e c t i v e O b j e c t i v e AC-MDSA ( m = 2)AC-MDSA final MC ( m = 1000)MC final AC-MDSA ( m = 2)AC-MDSA final MC ( m = 1000)MC final Figure 4: History of estimated objective function value for (a) κ = 1; (b) κ = 0 . J ( x ∗ ). To evaluate the overall performance and consistency of the entropic AC-MDSA, the ˆ µ ( x ∗ )versus ˆ σ ( x ∗ ) of the 50 independent trials (one trial is one run of the numerical experiment) areplotted in Figure 3. We observe that 50 independent trials with κ = 1 lead to similar ˆ µ ( x ∗ ), andthose 50 trials with κ = 0 .

618 (higher weight on Var) have similar ˆ σ ( x ∗ ), indicating the AC-MDSAalgorithm produces consistent designs. Also, the ones with κ = 0 .

618 have considerably lower ˆ σ ( x ∗ )and higher ˆ µ ( x ∗ ) than those with κ = 1, demonstrating the algorithm can eﬀectively reﬂect theimpact of κ with only two samples. In the κ = 0 .

618 case, although MC method produces a designwith the lowest ˆ σ ( x ∗ ), its ˆ µ ( x ∗ ) is considerably higher than the designs produced by AC-MDSA,resulting in an overall higher objective function value.Figure 4 shows the history of the estimated objective values of AC-MDSA and MC methodsfor κ = 1 and κ = 0 . m = 2 samples per step, and the one in the MC is estimated with m = 1000 samples. However,the true objective of AC-MDSA evaluated at the end of the optimization with 10000 samples hasa similar value to that obtained by MC as indicated in Figure 2.20 .1.2. Study of sample size Next, we study the inﬂuence of various sample sizes m , which is used to compute the stochasticgradient, on the performance of the proposed AC-MDSA. We consider m = 2 , m = 10 , m = 100samples. Figure 5 shows the history of the error (norm) of stochastic gradients estimated usingthe three m values for κ = 1 (Figures 5 a and b) and κ = 0 .

618 (Figures 5 c and d). The erroris deﬁned as the diﬀerence between the estimated gradient using m samples and the referenceestimated gradient using 1000 samples. Several observations can be made. First, as we expect, alarger m leads to a smaller diﬀerence between the estimated gradient and the reference estimatedgradient. Second, the cosine of the angle between the stochastic and the reference estimatedgradient vectors for both κ = 1 and κ = 0 .

618 are close to 1 after the ﬁrst few steps, indicatingthe estimated gradient with a small sample size has fairly accurate directions, but this observationcan be problem-dependent. ac db

Optimization step ( G m ∙ G ) / ( || G m || || G || ) ( G m ∙ G ) / ( || G m || || G || ) || G m - G || / n . -3 Optimization step || G m - G || / n .

100 200 300 400321 10 -3 m = 2 m = 10 m = 100 m = 2 m = 10 m = 100 Figure 5: Error history comparison of stochastic gradients with m = 2 , m = 10 , m = 100 samples used inAC-MDSA. (a) Error norm of the stochastic gradient: κ = 1; (b) cosine of the angle between the stochastic andthe reference gradient vectors: κ = 1; (c) error norm of the stochastic gradient: κ = 0 . κ = 0 . Various sample sizes m produce gradient estimators with diﬀerent accuracy levels; thus, westudy the sensitivity of the AC-MDSA performance to m . Table 4 summarizes the performanceand the associated computational cost of the entropic AC-MDSA with m = 2, 10, and 100 samplesand compares with the ones from the MC method. The statistics in for the entropic AC-MDSAin Table 4 are averaged over the 50 independent trials (for evaluating statistical consistency andare not needed in practice). The computational time shown in Table 4 is for reference. To have21 more comprehensive comparison of computational cost, more investigation is needed. For theentropic AC-MDSA, a larger m leads to small diﬀerences in the ﬁnal objective function valuesand convergence steps. This showcases that the proposed entropic AC-MDSA can perform high-quality updates with highly noisy gradient estimators (i.e., m = 2). Therefore, we use m = 2 forthe remaining studies. Table 4: Performance of AC-MDSA (averaged over 50 trials) and MC methods: Simple column example

Algorithm κ ˆ J ( x ∗ ) ˆ µ ( x ∗ ) ˆ σ ( x ∗ ) N step N solve WC time

WC time N step (avg.) (avg.) (avg.) (avg.) (avg.) (sec.) (sec.)AC-MDSA 1 9.02 9.02 1.13 411.3 862.6 81.1 0.2 m = 2 0.618 5.93 9.41 0.55 420.9 881.8 82.8 0.2AC-MDSA 1 9.00 9.00 1.07 407.6 855.3 136.0 0.3 m = 10 0.618 5.96 9.48 0.52 414.0 868.1 138.5 0.3AC-MDSA 1 8.99 8.99 1.07 403.4 846.7 807.3 2.0 m = 100 0.618 5.97 9.49 0.52 410.6 861.2 824.4 2.0MC 1 9.23 9.23 1.47 100 100000 . m = 1000 0.618 6.28 10.05 0.41 100 100000 . We compare the performance of the entropic AC-MDSA algorithm with the entropic MDSAalgorithm (without acceleration) to demonstrate the advantage of the acceleration technique. Inparticular, we aim to demonstrate that, with the acceleration scheme, the AC-MDSA is less sensi-tive to various step sizes. We consider the case of κ = 0 .

618 and use the same step size recalibration,damping, and ﬁlter radius reduction setup for the MDSA algorithm. The symmetry of the designsis not imposed in this comparison. For the entropic MDSA, the step size formula is adopted from[28, 31]. Figure 6 shows the ﬁnal designs of AC-MDSA and MDSA with three values of step sizescaling factor θ . Notice θ is set to a smaller value than previous cases, and this is because whenthe symmetry constraint is absent, the algorithm needs a smaller step size to guarantee stable andsteady convergence. Each design is a representative one selected from the results of 20 indepen-dent trials. The range of θ value for the entropic MDSA is determined based on pilot runs. Asshown in Figure 6, the entropic MDSA is more sensitive to diﬀerent choices of θ (i.e., diﬀerentstep sizes) than the entropic AC-MDSA. For various θ values considered, the entropic AC-MDSAyields similar results (which are also similar to Figures 2c and f) with comparable performance,whereas the entropic MDSA yields less consistent results. Besides, although the design symme-try is not imposed, the entropic AC-MDSA produces nearly-symmetric designs while the entropicMDSA yields asymmetric ones, indicating the entropic AC-MDSA is more robust and stable than22he entropic MDSA (without acceleration). Thus, the remaining of the study uses the entropicAC-MDSA algorithm. d be = 0.618 m = 2 AC-MDSA = 0.618 m = 2 AC-MDSA = 0.618 m = 2 AC-MDSA = 0.618 m = 2 MDSA = 0.618 m = 2 MDSA = 0.618 m = 2 MDSA cfa θ = 2000 θ = 10 θ = 30 θ = 50 θ = 6000 θ = 10000= 10.69 = 0.48 = 6.69 = 10.46 = 0.46 = 6.54 = 10.37 = 0.47 = 6.49= 10.62 = 1.16 = 7.07= 10.83 = 1.28 = 7.32= 10.78 = 0.93 = 6.99 Figure 6: Final design of (a) AC-MDSA: θ = 2000; (b) AC-MDSA: θ = 6000; (c) AC-MDSA: θ = 10 , θ = 10; (e) MDSA: θ = 30; (f) MDSA: θ = 50. The design in each case, respectively, is a representativedesign chosen from the 20 trials. The second example demonstrates that the entropic AC-MDSA eﬀectively captures the inﬂu-ence of various κ values (relative weight of mean and variance for compliance) on the designs. Fig-ure 1b shows the design domain and boundary conditions. The domain (discretized by n = 40 , ∼ N (0 , . ).We consider three cases: κ = 1, κ = 0 . κ = 0 . R is initialized as 0 . .

004 after 300 steps with an interval of 30 steps. We choose θ = 8000 n , θ = 100 n ,and θ = 10 n for κ = 1, κ = 0 . κ = 0 . N rst = 300, N damp = 450, (cid:15) damp = 0 . N max = 600, and N min = 450.Figure 7 shows the designs obtained by the entropic AC-MDSA for the deterministic case (i.e.,the horizontal load is 0) and three stochastic cases with a wide range of κ . For the stochasticcases, each design is a representative one from 50 independent trials with the objective functionvalues close to the mean of the 50 objective function values. The three stochastic designs showthe impact of various κ values: as κ decreases (more weight on the variance), the angle betweenthe two arms increases, improving the robustness in resisting the random horizontal load.23 m = AC-MDSA c = 1 m = AC-MDSA b AC-MDSA Determinstic m = a = 0.282 m = AC-MDSA d = 12.35 = 7.90 = 6.72 = 8.05 = 1.43 = 8.05 = 8.70 = 0.54 = 5.49 = 14.48 = 0.31 = 4.15 Figure 7: Final designs and objective function values of half circle: (a) deterministic; (b) κ = 1; (c) κ = 0 . κ = 0 . The impact of varying κ is shown in Figure 8, which plots ˆ µ ( x ∗ ) versus ˆ σ ( x ∗ ) of a total of 150independent trials (50 for each κ ) with representative designs. Several observations can be made.First, as κ decreases, ˆ σ ( x ∗ ) decreases (indicating improved robustness) and ˆ µ ( x ∗ ) increases. TheAC-MDSA produces consistent designs for each κ case. Second, the designs for larger κ typicallyhave similar ˆ µ ( x ∗ ) but widely distributed ˆ σ ( x ∗ ), while the designs for smaller κ typically havesimilar ˆ σ ( x ∗ ) but widely distributed ˆ µ ( x ∗ ). This observation is consistent with the deﬁnition ofthe objective function in (9). = 1 = 0.618 = 0.282 AC-MDSA (m = 2) AC-MDSA (m = 2) AC-MDSA (m = 2)

Obj. = 8.05

Obj. = 4.15

Obj. = 5.49

Figure 8: ˆ µ ( x ∗ ) versus ˆ σ ( x ∗ ) of the 50 trials by the entropic AC-MDSA. (Highlighted markers correspond to thepresented designs.) The third example, which includes the double hook and the disk problem, is designed to showthat the AC-MDSA algorithm can tackle problems with various problem sizes, geometries, andmultiple independent random loads. Additionally, using the double hook example, we show thatthe parameters of the AC-MDSA algorithm are insensitive to various mesh sizes. Figure 9 showsthe design domains and boundary conditions of the double hook and the disk problems.24 W f y H H f x f y f x f T f N f N f T f N f T f T f N f N f D out D in a b Figure 9: Geometry and boundary conditions of (a) double hook, W = 4 , W = 1 , H = 1 , H = 1 .

5, two pointloads have deterministic vertical components f y = f y = 1 and random horizontal components f x , f x ∼ N (0 , . ); (b) disk, D out = 2 , D in = 0 .

6, ﬁve point loads have deterministic normal components f N = f N = f N = f N = f N = 1 and random tangential components f T , f T , f T , f T , f T ∼ N (0 , . ). i = 0.618 AC-MDSA m = f = 0.618 AC-MDSA m = h = 1 AC-MDSA m = g AC-MDSA Determinstic m = d AC-MDSA Determinstic m = e = 1 AC-MDSA m = c = 0.618 AC-MDSA m = b = 1 AC-MDSA m = a AC-MDSA Determinstic m = n ≈ k n ≈ k n ≈ k = 179.60 = 55.39 = 140.39 = 144.71 = 6.58 = 72.36 = 148.43 = 5.17 = 48.42= 177.50 = 55.15 = 138.73 = 145.24 = 6.83 = 72.62 = 148.07 = 5.07 = 48.20= 178.64 = 54.88 = 138.63 = 148.22 = 7.24 = 74.11 = 154.27 = 4.80 = 49.88 Figure 10: Double hook: deterministic and robust designs ( κ = 1 and κ = 0 . n = 114 , n = 50 , n = 12 , In the double hook problem, the two point loads have deterministic vertical components withmagnitudes 1 and random horizontal components ∼ N (0 , . ). We use θ = n and θ = 0 . n for25 = 1 and κ = 0 . N rst = 100, N damp = 450, (cid:15) damp = 0 . N max = 600, and N min = 450. We ﬁrst evaluate the sensitivity of the AC-MDSA algorithmic parameters (e.g., stepsize factor θ and initial step to monitor recalibration N rst ) to various mesh sizes, n = 114 , n = 50 ,

688 and n = 12 , cba = 0.618 MC m = 1000 MC Deterministic m = 1 = 1 MC m = 1000 = 176.65 = 54.35 = 138.67 = 146.02 = 7.42 = 73.01 = 150.32 = 4.54 = 48.42 n ≈ k Figure 11: Double hook ( n = 114 , κ = 1 and κ = 0 . Comparing (vertically) the designs with three problem sizes, as shown in Figure 10, they haveconsistent geometric features and similar objective function values for each case of κ . This ob-servation demonstrates that the proposed AC-MDSA algorithm and associated parameters canlead to mesh-insensitive designs. Consistent observations can be made among designs from variousproblem sizes. The deterministic and robust designs diﬀer in the upper domain. The determin-istic design forms a single connection to the support, resulting in less resistance to moments andhorizontal loads. The robust design with κ = 1 has two separated arms without braces, which cancarry moment but is weak in resisting horizontal shear forces. The robust design with κ = 0 . σ ( x ∗ ) of the three designs fromleft to right.The ﬁnal designs and objective function values obtained by the MC method with 1000 samplesare shown in Figure 11. The main geometric features are similar to the designs from the AC-MDSAwith two samples, but with more small branches. For the objective function values, AC-MDSAachieves a slightly lower value in the κ = 1 design and an identical value in the κ = 0 .

618 designas compared to the MC method. Figure 12 (a) shows ˆ µ ( x ∗ ) and ˆ σ ( x ∗ ) of the 50 trials from theAC-MDSA and one trial from the MC method. We can observe that a lower κ value producesdesigns with lower ˆ σ ( x ∗ ). The statistics, including computational cost, is shown in Table 5, andthe data related to AC-MDSA are averaged values over the 50 trials (for evaluating statisticalconsistency and are not needed in practice). The AC-MDSA algorithm solves approximately 1240linear systems with an average wall-clock time of approximately 2.4 seconds per step.26

44 146 148 150456789 42 43 44 453.03.54.04.55.05.56.06.57.0

Obj. = 48.42

Obj. = 72.36

Obj. = 73.01

Obj. = 8.66

Obj. = 5.71 = 1

AC-MDSA (m = 2) = 1

MC (m = 1000) = 0.618

AC-MDSA (m = 2) a b

Figure 12: ˆ µ ( x ∗ ) versus ˆ σ ( x ∗ ) of the 50 trials by AC-MDSA (highlighted markers correspond to the presenteddesigns) (a) double hook (including the two designs by MC method); (b) disk.Table 5: Performance of AC-MDSA (averaged over 50 trials) and MC methods: double hook example( n = 114 , Algorithm κ ˆ J ( x ∗ ) ˆ µ ( x ∗ ) ˆ σ ( x ∗ ) N step N solve WC time

WC time N step (avg.) (avg.) (avg.) (avg.) (avg.) (sec.) (sec.)AC-MDSA 1 72.36 144.72 6.82 600.0 1240.0 1539.2 2.6 m = 2 0.618 48.50 147.88 5.41 600.0 1240.0 1291.2 2.2MC 1 73.01 146.02 7.42 100 10 m = 1000 0.618 48.42 150.32 4.54 100 10 In the disk problem, ﬁve loads are equally distributed on the outer perimeter, and each hasa deterministic normal component 1 and a random tangential component ∼ N (0 , . ). We use n = 72 ,

000 elements, θ = 3000 n and θ = 0 . n for κ = 1 and κ = 0 . N rst = 300, N damp = 450, (cid:15) damp = 0 . N max = 600, and N min = 450. Figure 13 shows the representativeoptimized designs for deterministic, κ = 1, and κ = 0 .

618 cases. The deterministic design containsrods with uniform widths, resisting normal load components, whereas the robust design with κ = 1leads to a structure with two branches that resist each random tangential loads. In the robustdesign with κ = 0 . σ ( x ∗ ) values of the three designs conﬁrm that the robustness is eﬀectivelyimproved when κ drops. Table 6 and Figure 12 (b) show the statistics of the 50 trials from AC-MDSA, which solves two linear systems per step with an average wall-clock time of 1.7 seconds. c = 0.618 m = AC-MDSA b = 1 m = AC-MDSA a AC-MDSA Determinstic m = = 49.51 = 11.12 = 32.04 = 43.29 = 5.87 = 8.66 = 44.36 = 3.84 = 5.71 Figure 13: Final designs and objective function values of disk by AC-MDSA: (a) deterministic; (b) κ = 1;(c) κ = 0 . Algorithm κ ˆ J ( x ∗ ) ˆ µ ( x ∗ ) ˆ σ ( x ∗ ) N step N solve WC time

WC time N step (avg.) (avg.) (avg.) (avg.) (avg.) (sec.) (sec.)AC-MDSA 1 8.66 43.29 5.89 600.0 1240.0 1047.8 1.7 m = 2 0.618 5.72 43.74 4.51 600.0 1240.0 1030.6 1.7 Both the double hook and disk examples show that the two-sample AC-MDSA can solve prob-lems with various mesh sizes, complex geometries, and multiple independent random loads. As κ changes, we observe apparent changes in both the designs and ˆ σ ( x ∗ ) values. The optimized designsand algorithmic parameters are insensitive to the change of mesh sizes. Finally, we show that theAC-MDSA algorithm requires a low computational cost to handle RTO problems as it needs onlytwo linear solves per step. The last example, which solves a 3D crane problem, demonstrates the applicability and eﬃ-ciency of the entropic AC-MDSA. Figure 14 shows the domain and boundary conditions. Thedomain is ﬁxed on the top and subjected to two point loads that have deterministic z compo-nents with magnitudes 1 and random x and y components ∼ N (0 , . ). The FE mesh consists of n = 352 ,

000 hexahedral elements. The ﬁlter radius R is initialized as 0 .

15 and starts to decrease af-ter 320 steps by 1 /

30 every 25 steps until 0 .

05. We use θ = 2000 and impose symmetry constraintswith respect to the x and y planes, and N rst = 100, N damp = 430, (cid:15) damp = 0 . N max = 450, and N min = 430. The objective function values of the ﬁnal designs are evaluated using 1 ,

000 samples.28e use the GPU-accelerated preconditioned conjugate gradient (PCG) built-in solver from Matlabwith the Jacobi preconditioner and choose a relatively high tolerance of 10 − for convergence asthe entropic AC-MDSA does not require accurate evaluation of sensitivity. B H H W W f z f y f x xy z f z f y f x Figure 14: Geometry and boundary conditions of Example 4: 3D crane, W = 4 , W = 1 , H = 1 . , H = 1 , B = 1,two point loads have deterministic z components f z = f z = 1 and random x and y components f x , f y , f x , f y ∼ N (0 , . ) . c = 0.828 m = 2 AC-MDSA b = 1 m = 2 AC-MDSA a AC-MDSA Deterministic m = 1 xyz xyz xyz = 1595.3 = 578.5 = 952.7 = 1483.3 = 291.9 = 741.6 = 1627.4 = 93.7 = 1051.0 Figure 15: Optimized designs and objective function values of 3D crane from AC-MDSA: (a) deterministic; (b) κ = 1; (c) κ = 0 . The optimized designs and objective function values for deterministic, κ = 1, and κ = 0 . κ values on the ﬁnal designs, both qualitatively and quantitatively. Qual-29tatively, in the deterministic design (Figure 15a), no lateral braces are formed among the fourcolumns on the upper part, resulting in poor resistance to shear in the x -direction and torquein the x − y plane. In addition, the material in the lower part is mostly distributed within the x − z plane, which also leads to poor resistance to loads in the y -direction. The robust design with κ = 1 (Figure 15b), on the other hand, forms pairs of braces in the x − z planes between the fourcolumns, which improves the resistance to the random load components in the x - and y -directionsthat potentially impose shear and torsion. However, no braces appear in the y − z planes. In thelower part, two branches are formed in the upper and middle chords of the beam. These branchescan increase the stiﬀness of resisting the random loads in the y -direction. Finally, the robust designwith κ = 0 .

828 (Figure 15c) forms four braces in both the x − z and y − z planes between thefour columns, leading to the highest resistance to the shear and torsion imposed by the randomload components in the x − and y -directions. In the lower part of the design, the upper chordbranches are further split to enhance the resistance to loads in the y -direction. The middle chordbecomes two independent members, and the lower chord splits into two branches. In addition,two members connecting the two lower chords are formed. These features clearly indicate theincrease in the structural robustness when κ decreases. Quantitatively, the inﬂuence of κ is alsorevealed by the values of ˆ µ ( x ∗ ) and ˆ σ ( x ∗ ) of the optimized designs. For the deterministic case, thedesign has both the highest ˆ µ ( x ∗ ) and ˆ σ ( x ∗ ) because the load randomness is not considered in theoptimization. For the robust designs, as κ becomes smaller, ˆ µ ( x ∗ ) increases while ˆ σ ( x ∗ ) decreasesconsiderably, which is consistent with the corresponding importance in the objective function ofthe RTO formulation (1).This 3D example shows that the proposed AC-MDSA algorithm eﬀectively produces designswith various levels of robustness. The AC-MDSA uses a relatively high tolerance for the iterativelinear solve, which may suggest high tolerance can be used to reduce computational cost furtheras AC-MDSA does not require accurate evaluation of gradients. However, more investigation isneeded to verify this potential.

6. Concluding remarks

In this work, we introduce a momentum-based accelerated mirror descent stochastic approxima-tion algorithm to solve RTO problems involving various load randomness eﬃciently and eﬀectively.Built upon MDSA, the proposed AC-MDSA framework is capable of performing high-quality de-sign variable updates with highly noisy stochastic gradients. We show that stochastic gradientsevaluated using only two samples (two being the smallest sample size for unbiased gradient es-timators) are suﬃcient to obtain robust designs in RTO. We derive the AC-MDSA update inthe (cid:96) -norm setting using the entropy function as the distance-generating function. The AC-MDSA algorithm is shown to exhibit stable convergence performance insensitive to various stepsize choices. In addition, several techniques, including an adaptive step-size recalibration scheme30nd an adaptive damping scheme, are developed to improve the convergence performance. Several2D and 3D numerical examples involving various geometries, problem sizes, uncertainties are pre-sented, demonstrating that the proposed AC-MDSA algorithm with only two samples eﬀectivelyand eﬃciently handles RTO problems involving various types of load uncertainties.In the simple column benchmark, the AC-MDSA with two samples produces designs with noworse objective function values than the MC method with 1000 samples for both robust designswith a low computational cost. The study on sample size shows that, although a larger numberof samples results in higher accuracy in sensitivity, two samples are suﬃcient to produce designswith similar objective function values. In addition, the AC-MDSA shows superior stability thanthe standard MDSA for a wide range of step sizes. The half circle example demonstrates thatAC-MDSA eﬀectively reﬂects various levels of robustness through geometric features and standarddeviation of compliance ˆ σ ( x ∗ ) of the ﬁnal designs. As κ (relative weight of mean and variance ofcompliance in the objective function) decreases, ˆ σ ( x ∗ ) become smaller consistently. The doublehook and disk examples show that the AC-MDSA can tackle various geometries and multiple inde-pendent random loads. The mesh size study further demonstrates the consistency of the AC-MDSAand insensitivity of algorithm parameters to various problem sizes. For the larger problem size( n = 114 , References [1] N. Aage, E. Andreassen, B. S. Lazarov, O. Sigmund, Giga-voxel computational morphogenesisfor structural design, Nature 550 (7674) (2017) 84–86. doi:10.1038/nature23911 .URL https://doi.org/10.1038/nature23911 doi:10.1007/s11831-015-9151-2 .URL https://doi.org/10.1007/s11831-015-9151-2 [3] A. Sutradhar, G. H. Paulino, M. J. Miller, T. H. Nguyen, Topological optimization for de-signing patient-speciﬁc large craniofacial segmental bone replacements, Proceedings of theNational Academy of Sciences 107 (30) (2010) 13222–13227. , doi:10.1073/pnas.1001208107 .URL [4] V. J. Challis, A. P. Roberts, J. F. Grotowski, L.-C. Zhang, T. B. Sercombe, Proto-types for bone implant scaﬀolds designed via topology optimization and manufacturedby solid freeform fabrication, Advanced Engineering Materials 12 (11) (2010) 1106–1110. arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/adem.201000154 , doi:10.1002/adem.201000154 .URL https://onlinelibrary.wiley.com/doi/abs/10.1002/adem.201000154 [5] L. L. Beghini, A. Beghini, N. Katz, W. F. Baker, G. H. Paulino, Connecting architecture andengineering through structural topology optimization, Engineering Structures 59 (2014) 716– 726. doi:https://doi.org/10.1016/j.engstruct.2013.10.032 .URL [6] M. P. Bendsøe, O. Sigmund, Topology Optimization: Theory, Methods, and Applications,Springer Berlin Heidelberg, Berlin, Heidelberg, 2004. doi:10.1007/978-3-662-05086-6_1 .URL https://doi.org/10.1007/978-3-662-05086-6_1 [7] O. Sigmund, K. Maute, Topology optimization approaches, Structural and MultidisciplinaryOptimization 48 (6) (2013) 1031–1055. doi:10.1007/s00158-013-0978-6 .URL https://doi.org/10.1007/s00158-013-0978-6 [8] O. Sigmund, Manufacturing tolerant topology optimization, Acta Mechanica Sinica 25 (2)(2009) 227–239. doi:10.1007/s10409-009-0240-z .URL https://doi.org/10.1007/s10409-009-0240-z [9] F. Wang, B. S. Lazarov, O. Sigmund, On projection methods, convergence and robust formu-lations in topology optimization, Structural and Multidisciplinary Optimization 43 (6) (2011)767–784. doi:10.1007/s00158-010-0602-y .URL https://doi.org/10.1007/s00158-010-0602-y [10] A. Asadpoure, M. Tootkaboni, J. K. Guest, Robust topology optimization of structures withuncertainties in stiﬀness application to truss structures, Computers & Structures 89 (11)322011) 1131 – 1141, computational Fluid and Solid Mechanics 2011. doi:https://doi.org/10.1016/j.compstruc.2010.11.004 .URL [11] P. D. Dunning, H. A. Kim, Robust topology optimization: Minimization of expected andvariance of compliance, AIAA Journal 51 (11) (2013) 2656–2664. arXiv:https://doi.org/10.2514/1.J052183 , doi:10.2514/1.J052183 .URL https://doi.org/10.2514/1.J052183 [12] N. Kogiso, W. Ahn, S. Nishiwaki, K. Izui, M. Yoshimura, Robust topology optimization forcompliant mechanisms considering uncertainty of applied loads, Journal of Advanced Mechan-ical Design, Systems, and Manufacturing 2 (1) (2008) 96–107. doi:10.1299/jamdsm.2.96 .[13] J. Zhao, C. Wang, Robust structural topology optimization under random ﬁeld loadinguncertainty, Structural and Multidisciplinary Optimization 50 (3) (2014) 517–522. doi:10.1007/s00158-014-1119-6 .URL https://doi.org/10.1007/s00158-014-1119-6 [14] S. Chen, W. Chen, S. Lee, Level set based robust shape and topology optimization underrandom ﬁeld uncertainties, Structural and Multidisciplinary Optimization 41 (4) (2010) 507–524. doi:10.1007/s00158-009-0449-2 .URL https://doi.org/10.1007/s00158-009-0449-2 [15] Q. Zhao, X. Chen, Z.-D. Ma, Y. Lin, Robust topology optimization based on stochasticcollocation methods under loading uncertainties, Mathematical Problems in Engineering 2015(2015) 1–14.URL [16] Y.-C. Chan, K. Shintani, W. Chen, Robust topology optimization of multi-material latticestructures under material and load uncertainties, Frontiers of Mechanical Engineering 14 (2)(2019) 141–152. doi:10.1007/s11465-019-0531-4 .URL https://doi.org/10.1007/s11465-019-0531-4 [17] V. Keshavarzzadeh, F. Fernandez, D. A. Tortorelli, Topology optimization under uncertaintyvia non-intrusive polynomial chaos expansion, Computer Methods in Applied Mechanics andEngineering 318 (2017) 120 – 147. doi:https://doi.org/10.1016/j.cma.2017.01.019 .URL [18] J. Wu, J. Gao, Z. Luo, T. Brown, Robust topology optimization for structures under intervaluncertainty, Advances in Engineering Software 99 (2016) 36 – 48. doi:https://doi.org/10.1016/j.advengsoft.2016.05.002 .URL arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/nme.6061 , doi:10.1002/nme.6061 .URL https://onlinelibrary.wiley.com/doi/abs/10.1002/nme.6061 [20] A. Ben-Tal, A. Nemirovski, Robust truss topology design via semideﬁnite programming,SIAM Journal on Optimization 7 (4) (1997) 991–1016. arXiv:https://doi.org/10.1137/S1052623495291951 , doi:10.1137/S1052623495291951 .URL https://doi.org/10.1137/S1052623495291951 [21] J. K. Guest, T. Igusa, Structural optimization under uncertain loads and nodal locations,Computer Methods in Applied Mechanics and Engineering 198 (1) (2008) 116 – 124, com-putational Methods in Optimization Considering Uncertainties. doi:https://doi.org/10.1016/j.cma.2008.04.009 .URL [22] P. D. Dunning, H. A. Kim, G. Mullineux, Introducing loading uncertainty in topology op-timization, AIAA Journal 49 (4) (2011) 760–768. arXiv:https://doi.org/10.2514/1.J050670 , doi:10.2514/1.J050670 .URL https://doi.org/10.2514/1.J050670 [23] G. A. da Silva, A. T. Beck, E. L. Cardoso, Topology optimization of continuum structures withstress constraints and uncertainties in loading, International Journal for Numerical Methodsin Engineering 113 (1) (2018) 153–178. arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/nme.5607 , doi:10.1002/nme.5607 .URL https://onlinelibrary.wiley.com/doi/abs/10.1002/nme.5607 [24] E. Holmberg, C.-J. Thore, A. Klarbring, Game theory approach to robust topology opti-mization with uncertain loading, Structural and Multidisciplinary Optimization 55 (4) (2017)1383–1397. doi:10.1007/s00158-016-1548-5 .URL https://doi.org/10.1007/s00158-016-1548-5 [25] J. Zhao, C. Wang, Robust topology optimization under loading uncertainty based on linearelastic theory and orthogonal diagonalization of symmetric matrices, Computer Methods inApplied Mechanics and Engineering 273 (2014) 204 – 218. doi:https://doi.org/10.1016/j.cma.2014.01.018 .URL [26] H. Robbins, S. Monro, A stochastic approximation method, Ann. Math. Statist. 22 (3) (1951)400–407. doi:10.1214/aoms/1177729586 .URL https://doi.org/10.1214/aoms/1177729586 arXiv:https://doi.org/10.1137/070704277 , doi:10.1137/070704277 .URL https://doi.org/10.1137/070704277 [29] B. T. Polyak, A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAMJournal on Control and Optimization 30 (4) (1992) 838–855. arXiv:https://doi.org/10.1137/0330046 , doi:10.1137/0330046 .URL https://doi.org/10.1137/0330046 [30] G. Lan, A. Nemirovski, A. Shapiro, Validation analysis of mirror descent stochastic ap-proximation method, Mathematical Programming 134 (2) (2012) 425–458. doi:10.1007/s10107-011-0442-6 .URL https://doi.org/10.1007/s10107-011-0442-6 [31] X. S. Zhang, E. de Sturler, A. Shapiro, Topology Optimization With Many Right-Hand Sides Using Mirror Descent Stochastic ApproximationReduction From Manyto a Single Sample, Journal of Applied Mechanics 87 (5), 051005 (02 2020). arXiv:https://asmedigitalcollection.asme.org/appliedmechanics/article-pdf/87/5/051005/6483901/jam\_87\_5\_051005.pdf , doi:10.1115/1.4045902 .URL https://doi.org/10.1115/1.4045902 [32] B. Polyak, Some methods of speeding up the convergence of iteration methods, USSRComputational Mathematics and Mathematical Physics 4 (5) (1964) 1 – 17. doi:https://doi.org/10.1016/0041-5553(64)90137-5 .URL [33] Y. Nesterov, A method for solving the convex programming problem with convergence rateadd 1/k square, Dokl. Akad. Nauk SSSR 269 (1983) 543–547.URL https://ci.nii.ac.jp/naid/10029942205/en/ [34] Y. Nesterov, Lectures on Convex Optimization, Springer Optimization and Its Applications,Springer International Publishing, Berlin, 2018.[35] G. Lan, An optimal method for stochastic composite optimization, Mathematical Program-ming 133 (1) (2012) 365–397. doi:10.1007/s10107-010-0434-y .URL https://doi.org/10.1007/s10107-010-0434-y arXiv:https://doi.org/10.1137/110848864 , doi:10.1137/110848864 .URL https://doi.org/10.1137/110848864 [37] S. Ghadimi, G. Lan, Optimal stochastic approximation algorithms for strongly convex stochas-tic composite optimization, ii: Shrinking procedures and optimal algorithms, SIAM Journalon Optimization 23 (4) (2013) 2061–2089. arXiv:https://doi.org/10.1137/110848876 , doi:10.1137/110848876 .URL https://doi.org/10.1137/110848876 [38] S. Ghadimi, G. Lan, Accelerated gradient methods for nonconvex nonlinear and stochas-tic programming, Mathematical Programming 156 (1) (2016) 59–99. doi:10.1007/s10107-015-0871-8 .URL https://doi.org/10.1007/s10107-015-0871-8 [39] X. S. Zhang, E. de Sturler, G. H. Paulino, Stochastic sampling for deterministic structuraltopology optimization with many load cases: Density-based and ground structure approaches,Computer Methods in Applied Mechanics and Engineering 325 (2017) 463 – 487. doi:https://doi.org/10.1016/j.cma.2017.06.035 .URL [40] S. De, J. Hampton, K. Maute, A. Doostan, Topology optimization under uncertainty using astochastic gradient-based approach (2019). arXiv:1902.04562 .[41] K. Svanberg, Mma and gcmma two methods for nonlinear optimization (2007).[42] L. Pﬂug, N. Bernhardt, M. Grieshammer, M. Stingl, Csg: A new stochastic gradient methodfor the eﬃcient solution of structural optimization problems with inﬁnitely many states,Structural and Multidisciplinary Optimization 61 (6) (2020) 2595–2611. doi:10.1007/s00158-020-02571-x .URL https://doi.org/10.1007/s00158-020-02571-x [43] B. Bourdin, Filters in topology optimization, International Journal for Numerical Methodsin Engineering 50 (9) (2001) 2143–2158. arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/nme.116 , doi:10.1002/nme.116 .URL https://onlinelibrary.wiley.com/doi/abs/10.1002/nme.116 [44] A. Diaz, O. Sigmund, Checkerboard patterns in layout optimization, Structural Optimization10 (1) (1995) 40–45. doi:10.1007/BF01743693 .3645] C. S. Jog, R. B. Haber, Stability of ﬁnite element models for distributed-parameter optimiza-tion and topology design, Computer Methods in Applied Mechanics and Engineering 130 (3)(1996) 203 – 226. doi:https://doi.org/10.1016/0045-7825(95)00928-0 .URL [46] O. Sigmund, P. J., Numerical instabilities in topology optimization: A survey on proceduresdealing with checkerboards, mesh-dependencies and local minima, Structural Optimization16 (1) (1998) 68–75. doi:10.1007/BF01214002 .[47] M. P. Bendsøe, Optimal shape design as a material distribution problem, Structural optimiza-tion 1 (1989) 193–202.[48] O. Sigmund, Morphology-based black and white ﬁlters for topology optimization,Structural and Multidisciplinary Optimization 33 (4) (2007) 401–424. doi:10.1007/s00158-006-0087-x .URL https://doi.org/10.1007/s00158-006-0087-x [49] E. Andreassen, A. Clausen, M. Schevnels, B. Lazarov, O. Sigmund, Eﬃcient topology op-timization in matlab using 88 lines of code, Struct Multidisc Optim 43 (1) (2010) 1–16. doi:10.1007/s00158-010-0594-7 .[50] L. Bregman, The relaxation method of ﬁnding the common point of convex sets and itsapplication to the solution of problems in convex programming, USSR Computational Math-ematics and Mathematical Physics 7 (3) (1967) 200 – 217. doi:https://doi.org/10.1016/0041-5553(67)90040-7 .URL [51] R. Y. Rubinstein, D. P. Kroese, Simulation and the Monte Carlo Method, 3rd Edition, WileyPublishing, 2016.[52] S. Kirkpatrick, C. D. Gelatt, M. P. Vecchi, Optimization by simulated annealing, Science220 (4598) (1983) 671–680. arXiv:https://science.sciencemag.org/content/220/4598/671.full.pdf , doi:10.1126/science.220.4598.671 .URL https://science.sciencemag.org/content/220/4598/671 [53] P. Salamon, P. Sibani, R. Frost, Facts, Conjectures, and Improvements for Simulated Anneal-ing, Society for Industrial and Applied Mathematics, Philadelphia, 2002.[54] K. Svanberg, The method of moving asymptotesa new method for structural optimiza-tion, International Journal for Numerical Methods in Engineering 24 (2) (1987) 359–373. arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/nme.1620240207 , doi:10.1002/nme.1620240207 .URL https://onlinelibrary.wiley.com/doi/abs/10.1002/nme.1620240207 doi:10.1007/s00158-011-0696-x .URL https://doi.org/10.1007/s00158-011-0696-xhttps://doi.org/10.1007/s00158-011-0696-x