Accelerating engineering design by automatic selection of simulation cases through Pool-Based Active Learning
José Hugo Gaspar Elsas, Nicholas A. G. Casaprima, and Ivan F. M. Menezes
Tecgraf Institute, PUC-Rio; Department of Mechanical Engineering, PUC-Rio
September 18, 2020
Abstract
A common workflow for many engineering design problems requires the evaluation of the design system to be investigated under a range of conditions. These conditions usually involve a combination of several parameters. To perform a complete evaluation of a single candidate configuration, it may be necessary to perform hundreds to thousands of simulations. This can be computationally very expensive, particularly if several configurations need to be evaluated, as in the case of the mathematical optimization of a design problem. Although the simulations are extremely complex, generally, there is a high degree of redundancy in them, as many of the cases vary only slightly from one another. This redundancy can be exploited by omitting some simulations that are uninformative, thereby reducing the number of simulations required to obtain a reasonable approximation of the complete system. The decision of which simulations are useful is made through the use of machine learning techniques, which allow us to estimate the results of "yet-to-be-performed" simulations from the ones that are already performed. In this study, we present the results of one such technique, namely active learning, to provide an approximate result of an entire offshore riser design simulation portfolio from a subset that is 80% smaller than the original one. These results are expected to facilitate a significant speed-up in the offshore riser design.
1 Introduction

Modern engineering design places a significant emphasis on simulation techniques in order to estimate the behavior of a given system under a wide variety of circumstances. Simulating all the relevant conditions the system would be subject to, prior to experiments or deployment, is the ultimate goal. Advances in computing power have allowed the routine use of finite element and finite volume simulations for a range of applications, even with modest computational resources. Nevertheless, it can still be challenging to run multiple simulations on a limited time horizon.

Many engineering design problems require the simulation of the system under a range of boundary conditions, referred to in some contexts as loading conditions, which can depend on more than one parameter. For example, to explore the design of a Y-junction mixer, we may need to investigate the junction behavior as a function of the flow rates in both inlets. This requires simulations to be performed for combinations of parameters covering the region of interest for the two flow rates; for each combination, a computational fluid dynamics (CFD) calculation needs to be performed, which in itself is extremely expensive. This implies that a full exploration of both parameters can be seriously challenging, if not prohibitively expensive.

In recent years there has been increasing interest in combining machine learning techniques with traditional engineering techniques, such as physical simulation and mathematical optimization. One of the focus points for this nexus of techniques has been the exploration of large parameter spaces in search of rare or counter-intuitive optimal configurations. Reviews of applications of machine learning in engineering design [21, 13] and in material design [8] exemplify this trend.

With regard to using machine learning to explore parameter spaces in a computationally inexpensive manner, two notable examples can be cited. Peterson et al.
[17] used Random Forest regression trained over a set of 60,000 simulations sampled with a Latin hypercube. They created a model capable of exploring the parameter space of inertial confinement fusion targets and found optimal, albeit counter-intuitive, configurations through optimization. Faber et al. [3] used kernel ridge regression to compute the formation energies of 2 million crystals from density functional theory simulations, starting from a subset of 10,000 simulations; this would be extremely challenging if a simulation were required for every material.

For the design of offshore oil risers in ultra-deep-water conditions investigated in this study, most of the mechanical loading is defined by (i) the sea currents in the installation region of the riser, and (ii) the sea waves that impact the floating unit to which the riser is attached. Sea currents may change throughout the year, and hence several currents must be investigated. There is no simple way to specify all possible currents a riser will be subject to; therefore, using a list of historically observed currents is necessary. Similarly, it is necessary to specify a list of waves using historical and meteorological data.

For this scenario, the loading cases are constructed as lists of combinations of sea currents and waves, which are defined by several parameters, such as speed and direction of the water at several depths, for currents, and amplitude, direction, and frequency, in the case of waves. Many of these combinations may be redundant, because the mechanical tensions in the riser (the output from the analyses) of many of these simulations can be estimated from the results of other simulations.
Therefore, we seek to investigate a more efficient way to search the parameter space by avoiding the execution of redundant simulations. The challenge here lies in finding the most informative cases, which allow for a rapid convergence of the prediction of the remainder of the list.

A subject of considerable interest in this field is how to enable the mathematical optimization of the riser while investigating the widest range of loading conditions. This would require executing all simulations for each configuration generated by the optimization procedure. Currently, this is very often prohibitively expensive, owing to the very high execution time required to run all the simulations. One compromise to enable the use of optimization is to hand-pick the loading cases to be run during the optimization. If a critical loading case is not added to the optimization, the optimized design may fail during a later validation phase, and this would require restarting the entire optimization process.

A viable alternative to running all simulations is to dynamically choose the most informative simulations and infer the results of the cases that are not executed. This combination would allow us to indirectly use the full portfolio without the associated cost, thereby circumventing the limitations presented by manually choosing a smaller portfolio. Active learning is the technique we demonstrate here to perform such an inference and choice process [19]. It is a sub-field of machine learning in which the learning algorithm supports or dictates the sampling decisions for further data, dealing with both labeled and unlabeled data.

In the parlance of active learning, the simulation portfolio represents a pool of "unlabeled" data, in which the simulation results have yet to be calculated and must be "queried." Each query represents an expensive operation, i.e., running a simulation, which we want to use as sparingly as possible in order to save costs.
To achieve this, we need to find an intelligent way to sample the "yet-to-be-run" simulations in the pool. We also wish to demonstrate that the improvements are not merely a result of having more points available to estimate the un-run simulations, thereby showing that a good query criterion improves the convergence of the regression.

The rest of this paper is structured as follows: In Section 2, we present the basic theory of active learning, including the concepts of pool-based learning, Gaussian process regression and uncertainty sampling. In Section 3, we present the offshore riser design problem, including the most common riser configurations and the definitions of currents and waves. We conclude this section by discussing how to prepare the data to be used in the regression. In Section 4, we present the results of the active learning selection for each of the variables of interest separately and for all of them combined. Finally, in Section 5, we present a perspective of the work and some closing remarks.

2 Active Learning

The classical problem that active learning is designed to solve is how to perform data acquisition when the labeling process might be expensive, slow, or complex. A comprehensive literature survey can be found in Settles [19], on which this work is based. In the active learning problem there are four main components, namely (i) the labeled training set, (ii) the machine learning model, (iii) the unlabeled pool/unexplored space, and (iv) the "oracle."

The machine learning model performs some form of supervised learning over an initially labeled dataset, which is the training set. The trained model is used to predict the results of some desired variable, i.e., the label(s), over the unlabeled pool, which can then be searched to produce a query candidate to be sent to the oracle. The oracle returns the exact label of the queried point, which can then be added to the labeled set and, therefore, used to augment the training data.
This cycle repeats itself.

In the context of engineering design, the points represent parameters in some variable space that determine either the design variables or, in our case, the boundary conditions which the system is subject to. The role of the oracle is performed by the simulation software, into which the boundary conditions are input. The software, in turn, returns the resulting mechanical loads. The labels themselves are the results of the simulation, which in the case of our problem are not labels in the sense of classification problems, but rather target variable values as in regression problems. Finally, the unlabeled pool is the list of yet-to-be-run simulations. A complete diagram of the active learning loop can be seen in Figure 1.

Figure 1: Schematic of the active learning loop. The components of the active learning are the 'labeled pool', or dataset, of size n (D_n), the 'unlabeled pool' (U), the machine learning model and the oracle. The physics solver fulfills the role of 'oracle' in the active learning literature. The picture of the physics solver is from the program Anflex [11] for analysis of rigid and flexible risers.

The initial training set can be formed by randomly selecting cases from the complete list or by running simulations sampled by some type of "design of experiment," for example, Latin hypercube sampling [10]. This cycle allows us to sequentially run the simulations in such a way that the quality of the predictions of the yet-to-be-run simulations (unlabeled pool) is maximized. To be able to predict the value of the labels over the unlabeled pool, we need a machine learning model to perform the regression over those points, for which we apply Gaussian process regression.

2.1 Gaussian Process Regression

Gaussian process regression [18] is an efficient technique for interpolation that can approximate arbitrary functions, while providing estimates of both the function and the uncertainty in such estimates.
In the context of active learning, the ability to provide easy estimations of uncertainty is particularly useful, as will be discussed in Section 2.2. In this study, Gaussian process regression performs the role of the machine learning model in the active learning procedure.

Gaussian process regression expresses the problem of estimating the function value by assuming that the objective function is drawn from a Gaussian process. This means that the distribution over any set of values of the function on n points, D_n = {f(x_1), f(x_2), ..., f(x_n)}, is a joint multivariate normal distribution, N(μ_n, Σ_n), with mean vector and covariance matrix defined as (μ_n)_k = m(x_k) and (Σ_n)_{ij} = k(x_i, x_j), respectively. The function m(x) is the mean, and k(x, x') the covariance, of the Gaussian process. The statement that f(x) is modeled by the Gaussian process is written as f(x) ~ GP(m(x), k(x, x')).

Regression with the Gaussian process is performed by computing the conditional probability, P(f(x) | D_n), of the value of a point f(x) given the values at the previous n points, D_n. The conditional probability is itself a normal distribution; its parameters are given by

    f(x) | D_n ~ N(μ*_n(x), σ*_n(x)^2)                          (1)
    μ*_n(x) = m(x) + k(x, X) k(X, X)^{-1} (Y − m(X))            (2)
    σ*_n(x)^2 = k(x, x) − k(x, X) k(X, X)^{-1} k(X, x)          (3)

where Y is the vector of observed evaluations, Y_k = f(x_k), or Y_k = f(x_k) + ε_k in the case of noisy evaluations with i.i.d. noise ε_k ~ N(0, σ^2); k(X, x) is the column vector with entries k(X, x)_k = k(x_k, x); k(x, X) is its transpose; and, given the covariance k used to model the objective function, k(X, X) = Σ_n is the covariance matrix associated with the dataset D_n.
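Equations (2) and (3), with a zero prior mean m(x) = 0, can be implemented directly. The following is an illustrative NumPy sketch, not the authors' code; the Gaussian kernel choice and the jitter value `noise` are assumptions for numerical stability.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Gaussian covariance k(x, x') = exp(-||x - x'||^2 / (2 l^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(x_star, X, Y, length_scale=1.0, noise=1e-8):
    # Equations (2) and (3) with zero prior mean m(x) = 0:
    #   mu*(x)      = k(x, X) k(X, X)^{-1} Y
    #   sigma*(x)^2 = k(x, x) - k(x, X) k(X, X)^{-1} k(X, x)
    K = rbf_kernel(X, X, length_scale) + noise * np.eye(len(X))
    k_star = rbf_kernel(x_star, X, length_scale)
    mu = k_star @ np.linalg.solve(K, Y)
    var = rbf_kernel(x_star, x_star, length_scale).diagonal() \
        - (k_star * np.linalg.solve(K, k_star.T).T).sum(axis=1)
    return mu, np.sqrt(np.maximum(var, 0.0))
```

At a training point the posterior mean reproduces the observation and the predicted standard deviation collapses toward zero, while far from the data it reverts to the prior; this is the behavior the uncertainty-sampling criterion below exploits.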
The quality of the interpolation depends, in practice, on how well the choice of the functions m and k encodes the assumptions on the behavior, especially the smoothness, of the modeled function f(x).

The covariance alone is very often sufficiently powerful to model the entire function; hence, a common choice is to set m(x) = 0. Nonetheless, the choice of covariance function k(x, x') cannot be avoided and constitutes one of the modeling building blocks for Gaussian process regression, and consequently, for the success of the active learning process.

For Gaussian process regression, training approximately corresponds to computing the product k(x, X) k(X, X)^{-1}. For other machine learning models this could correspond to the gradient descent procedure, as is the case for neural networks, or to any other training procedure. The key idea is to train over the set of points that have already been labeled, i.e., the points for which the simulations have already been run.

Gaussian process regression is particularly effective in cases where the number of points available for the training set is small or limited, conditions under which most learning algorithms struggle. Because the goal of accelerating engineering design by selecting the most informative simulations entails having the fewest simulations available for regression, it is reasonable to use the most efficient method available. One of the main limitations of Gaussian process regression, owing to its being an intrinsically non-parametric learning method, is the sharp rise in computational cost as the number of training points increases. However, this is not a problem in our case.

2.2 Uncertainty Sampling

Once the model is trained, it can be used to predict the values of the results of the physics simulations, i.e., to perform inference of the labels, over the unlabeled pool of possible yet-to-be-run cases.
Each inference produces a prediction of the value of the physical variable computed by the physics solver, in addition to an implied uncertainty. This uncertainty expresses, from a Bayesian point of view, a confidence interval for the predicted variables. This entails the amount of information that can be gained by running the respective simulation.

To produce a candidate query, one must choose at least one candidate from the unlabeled set U, which essentially amounts to optimizing some selection criterion over the set U.

A very simple and commonly used query criterion is called "uncertainty sampling" [7], in which the uncertainty inferred by the machine learning model is used as the objective function to optimize; this is a form of information-based objective function, as described in the seminal work by MacKay [9]. Therefore, for our finite dataset U, the optimal query amounts to finding the point x*_us with the maximum inferred uncertainty:

    x*_us = argmax_{x_i ∈ U} σ_n(x_i)                           (4)

For classification problems, alternative objective functions may include classification entropy [20], which is a measure of the information needed to encode a given distribution from the data already available. If a significant amount of information is not required to encode a new case from the old ones, the new case may not be very informative if it were to be acquired. In this study, we only used uncertainty sampling, which is elaborated in Section 4.

To demonstrate the effectiveness of the loading case selection, we ran the procedure on a reference dataset, in which all simulations were run, and executed the active learning loop. The difference between predicted and exact values in the unlabeled set U was computed at each iteration. With the exact values, it is possible to compute how the implied uncertainty, represented by the predicted standard deviation, relates to the error of the machine learning prediction.
If it were possible to bound the error by a function of the implied uncertainty, we could deduce that it is possible to omit simulations as long as the error is kept under control.

3 The Offshore Riser Design Problem

In Section 3.1 we discuss the physical system used as an example in this study, which is a Steel Lazy Wave Riser (SLWR) in an ultra-deep water environment, including references for the definitions of the physical variables of interest and the physics solver used to produce the simulation data. In Section 3.3, we present details on how the input parameters for the simulations were defined, including how sea currents and waves are parametrized.

Next, it is essential to format the data to be used in the machine learning model in such a way that the feature vectors x and target labels y can be fed to the model. The feature vectors' x components are, mostly, normalized input parameters, and we present the normalization used in this work in Section 3.4. Targets are also normalized for some of the variables; details are discussed in Section 4.

3.1 The Steel Lazy Wave Riser

In this study, we apply active learning to the design of a production riser in a SLWR configuration. Production risers are pipes, which can be rigid or flexible, used to flow oil and gas from the wellhead to the floating unit. There are multiple possible deployable configurations (Figure 2), and the decision as to which one fits best is based on water depth, environmental conditions determined by waves and currents, as well as cost.

The simplest and easiest to deploy is the free hanging catenary, in which the risers are simply attached to the floating unit at one end and to the wellhead at the opposite end, without any buoys. As the water depth increases and the environmental conditions get harsher, this configuration may no longer be feasible. This is because the mechanical stresses increase and lead to structural problems, such as buckling in the region where the riser touches the seabed, known as the touchdown zone (TDZ).
Studies indicate [6] that riser configurations that include buoyancy elements, e.g., the SLWR, have milder stresses than those without buoyancy, e.g., the free hanging catenary. Therefore, configurations such as the SLWR are more suitable for deployment in the conditions found in ultra-deep waters.

Figure 2: Riser configurations, taken from Cardoso et al. [1], adapted from Clausen and D'Souza [2].

The SLWR configuration (Figure 3) is defined mainly by the top angle (θ_top), the lengths of its segments (L_1, L_2, L_3) and the buoys' dimensions, such as their length (L_f), diameter (D_f) and inter-buoy spacing (SP). In order for a configuration to be considered feasible, it must keep key mechanical stress metrics under control in a variety of different environmental conditions.

Figure 3: SLWR configuration.

To assess the behavior of the riser under these conditions, in principle, simulations must be run for each of the possible combinations of waves and currents that are on record. The currents and waves investigated in the riser design are specified by historical data and are not controllable by the engineers responsible for the project, which motivates the application of pool-based active learning to this problem. The number of combinations of waves and currents can exceed 1000 finite element analyses.

Specifically, the case presented in this paper is a SLWR deployed in a location with a water depth of 2200 m, where the horizontal distance between the connection on the floating unit and the connection on the wellhead is 3001 m. The riser studied is composed of rigid pipes with the properties described in Table 1:

    Inner diameter       0. m
    Outer diameter       0. m
    Coating thickness    0. m
    Young's modulus      207 GPa
    Density              7919 kg/m^3
    Yield strength       404.16 MPa
    Tensile strength     483.84 MPa

Table 1: Pipe properties
3.2 Riser Configurations

Each unlabeled pool is generated from a riser configuration by performing a series of simulations with varying loading conditions. Under the same loading condition, different riser configurations perform differently, which allows us to explore how the dependency on the particular configuration affects the performance of the active learning loop. Using the parameters shown in Figure 3, the risers we used to generate the data are described in Table 2:

    Param.       Conf. 0    Conf. 1    Conf. 2    Conf. 3    Conf. 4    Conf. 5
    θ_top (°)    7.0        7.919      7.923      7.997      7.989      8.000
    L_1 (m)      1421.500   1019.069   973.240    676.622    855.062    108.000
    L_2 (m)      828.000    463.364    502.765    541.291    563.301    991.379
    L_3 (m)      2325.000   2238.995   2265.995   2252.889   2347.458   2480.278
    D_f (m)      2.400      2.788      2.808      2.795      2.838      3.000
    L_f (m)      3.000      2.585      2.564      2.437      2.538      2.293
    SP (m)       12.000     12.897     12.925     13.300     13.439     14.647

Table 2: Parameters characterizing the riser configurations utilized.

3.3 Loading Cases

Loading cases are used to establish the environmental impacts on the risers and are, therefore, vital to the accuracy of the simulations. A loading case is a combination of a current and a wave, which defines one of the states to be studied. For a study to be representative of a real deployed case, it should contain information from the location where the riser is to be deployed, and the results, such as mechanical stresses and utilization factors, should all be within safety criteria or material limits.
3.3.1 Currents

Currents are defined by the velocity profile over the water depth, as shown in Figure 4. The velocity profile is expressed discretely, and three main attributes must be defined, namely the water depth of each step of the discretization and the corresponding direction and magnitude of the water velocity. An example of a sea current is shown in Figure 4, and the actual data defining a current are given in Table 3. For the purpose of this work, currents are considered as static loads, which do not vary during the simulation.

    Depth (m)   Velocity (m/s)   Direction   Angle (θ)
    0           0.46             SW          225°
    50          0.46             SW          225°
    100         0.46             SW          225°
    150         0.42             SW          225°
    200         0.40             SW          225°
    250         0.39             WSW         247.5°
    300         0.39             WSW         247.5°
    350         0.37             W           270°
    375         0.36             W           270°
    800         0.41             NW          315°

Table 3: Example table defining a current

Figure 4: Illustrative representation of a current velocity profile
3.3.2 Waves

Oceanic waves on the surface are usually irregular, varying in length, direction, and height. To model such behavior, waves are treated as a stochastic process with a parametrized spectrum. One of the most frequently used spectra for modeling is the Joint North Sea Wave Observation Project (JONSWAP) model [5]. The power spectrum of the wave in this model is defined as:

    S(ω) = (α g^2 / ω^5) exp[ −(5/4) (ω_p/ω)^4 ] γ^r            (5)
    r = exp[ −(ω − ω_p)^2 / (2 σ^2 ω_p^2) ]                     (6)

This model has free parameters given by a base coefficient α, dominant angular frequency ω_p, secondary base γ and secondary width σ, along with the direction θ from which the wave is traveling. γ is sometimes called the enhancement factor.

In this study, the waves were defined by inserting five main properties: the wave's height, period, direction, and the spectrum parameters α and γ. With this information, the simulation algorithm is capable of calculating the spectrum coefficients. An example set of wave parameters is specified in Table 4:

    Height (m)   Period (s)   Azimuth (θ_az)   α   γ

Table 4: Example of wave parameters: height H, dominant period T_p, azimuthal angle θ_az, and power spectrum parameters α and γ.

The results of the simulation of the riser dynamics are two sets of variables related to the mechanical tension of the riser during the simulation. The first variable is the maximum axial tension (FX) at the top of the riser over the course of the simulation. This variable is meant to be negative, denoting the fact that the riser should always pull down on the floating unit.

The second variable returned by the simulation is the maximum normalized tension, named DNVUF-201, along the entire riser, computed over the course of the simulation.
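The JONSWAP spectrum of Equations (5)-(6) can be evaluated directly. The sketch below assumes the standard JONSWAP exponents and a single width parameter σ, as in the parametrization above; the function name is illustrative.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def jonswap_spectrum(omega, alpha, omega_p, gamma, sigma=0.07):
    # Equations (5)-(6):
    #   S(w) = alpha g^2 w^-5 exp(-(5/4)(w_p/w)^4) gamma^r
    #   r    = exp(-(w - w_p)^2 / (2 sigma^2 w_p^2))
    omega = np.asarray(omega, dtype=float)
    r = np.exp(-((omega - omega_p) ** 2) / (2.0 * sigma ** 2 * omega_p ** 2))
    return alpha * G ** 2 * omega ** -5 * np.exp(-1.25 * (omega_p / omega) ** 4) * gamma ** r
```

At ω = ω_p the exponent r equals 1, so the peak is enhanced by the full factor γ; away from the peak r decays rapidly and the spectrum reverts to the underlying Pierson-Moskowitz-like shape.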
The pipe elements are subjected to bending moment and effective tension; these were used to define a variable called "DNV Utilization Factor 201" (DNVUF-201), which is a combined loading criterion defined by the offshore standard DNVGL-ST-F201 [4] as follows:

    DNVUF-201 = γ_SC γ_m [ (|M_d|/M_k) √(1 − ((p_i − p_o)/p_b)^2) + (T_e/T_k)^2 ] + ((p_i − p_o)/p_b)^2,   if p_i > p_o
    DNVUF-201 = (γ_SC γ_m)^2 [ |M_d|/M_k + (T_e/T_k)^2 ]^2 + ((p_o − p_min)/p_c)^2,                        if p_i ≤ p_o     (7)

where γ_SC = material safety class factor, γ_m = material resistance factor, M_d = bending moment, M_k = plastic bending moment, T_e = effective tension, T_k = plastic axial force, p_i = internal pressure, p_o = external pressure, p_b = burst pressure (for p_i > p_o), p_c = collapse pressure (for p_i < p_o) and p_min = minimum internal pressure. In a proper installation, in which no excessive tension is applied to the riser, it is expected that the maximum of DNVUF-201 satisfies max DNVUF-201 ≤ 1.

3.4 Data Normalization

To feed the machine learning model, the data must be in an appropriate format for use in Gaussian process regression. To compute the covariance values using a standard kernel, such as the Gaussian or Matérn kernels, categorical features must be converted into some form of real-valued variable. In our case, the major transformation was to convert compass directions into angular information, and then into horizontal and vertical components of a vector.

The target variables for these simulations are the values of the DNVUF-201 and the value of the FX axial force at the top section of the riser, where the pipe connects to the floating unit. We normalized the FX variables over the dataset D_n, meaning that we repeated the normalization at each iteration of the active learning process. In doing so, we only used the data of the simulations that would have been performed for normalization. In the end, we did not normalize the DNVUF-201.
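The piecewise criterion of Equation (7) can be sketched as a simple function. This is an illustrative sketch only: the exponents follow the reconstruction of Equation (7) above, the safety-factor defaults are hypothetical placeholders rather than values from the standard, and the function name is assumed.

```python
import math

def dnv_uf_201(Md, Mk, Te, Tk, p_i, p_o, p_b, p_c, p_min,
               gamma_sc=1.04, gamma_m=1.15):
    # Combined loading (bending + tension + pressure) utilization per
    # Equation (7); gamma defaults are illustrative, not normative.
    if p_i > p_o:  # internal overpressure branch
        term = (abs(Md) / Mk) * math.sqrt(1.0 - ((p_i - p_o) / p_b) ** 2) \
               + (Te / Tk) ** 2
        return gamma_sc * gamma_m * term + ((p_i - p_o) / p_b) ** 2
    # external overpressure branch
    term = abs(Md) / Mk + (Te / Tk) ** 2
    return (gamma_sc * gamma_m) ** 2 * term ** 2 + ((p_o - p_min) / p_c) ** 2
```

A configuration is acceptable when the maximum of this utilization over all elements and time steps stays at or below 1.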
Normalizing the data, either through computing its logarithm or through typical mean subtraction and standard deviation normalization, did not improve the performance of the active learning algorithm and hence was not considered.

As mentioned previously, currents are defined by specifying their velocity field at different depths. In its original form, the data format cannot be easily processed by a machine learning model. To address this shortcoming, we transformed the information in the table into a single array that contains the information from all three columns. For vector fields, it is natural to use the L^2-norm to compute the similarity of two fields. A sea current is a vector field u : [a, b] → R^2, which defines the x and y components of the water velocity at each depth.

For a velocity field that is specified on a set of nodes {x_0, x_1, ..., x_n} ⊂ [a, b], in which u_i = u(x_i), it is possible to approximate the L^2-norm of the velocity vector u by a Riemann sum as:

    ∫_a^b ||u(x)||^2 dx ≈ Σ_{i=0}^{n−1} (1/2)(x_{i+1} − x_i)(||u_i||^2 + ||u_{i+1}||^2)
                        = || √((x_1 − x_0)/2) u_0 ||^2 + Σ_{i=1}^{n−1} || √((x_{i+1} − x_{i−1})/2) u_i ||^2 + || √((x_n − x_{n−1})/2) u_n ||^2
                        = Σ_{i=0}^{n} ||v_i||^2                                                                             (8)

Equation (8) presents a natural way to re-write the information contained in tables such as Table 3. Compass points are converted into angles θ, and then the horizontal and vertical components of the velocity field are computed as u_x = u cos θ and u_y = u sin θ, respectively, where u is the absolute value of the velocity, as given in the column "Velocity". Then, it is necessary to re-scale the velocity components according to the coefficients in Equation (8). This allows us to define x and y values for the velocity components at different depths, i.e., {vx0, vx50, vx100, ..., vx2200} and {vy0, vy50, vy100, ..., vy2200}. As, in our case, all currents are specified at the same depths, the information about the depth itself can be made purely positional, thereby allowing the first column to be suppressed after the re-scaling has been performed.

The set {vx0, vx50, ..., vx2200, vy0, vy50, ..., vy2200} comprises the sea current contribution to the feature vector utilized in the training and inference steps for each of the loading cases. The values obtained for the current feature vectors were further normalized to make the means zero and the standard deviations unitary over the set of loading conditions.

As mentioned in Section 3.3.2, the waves were defined by the JONSWAP model parameters.
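The conversion of a current table such as Table 3 into the re-scaled components of Equation (8) can be sketched as follows. The compass-bearing dictionary and function name are illustrative assumptions; the weights are the trapezoidal coefficients of Equation (8).

```python
import numpy as np

# Compass bearings in degrees, clockwise from North, as in Table 3.
COMPASS_DEG = {"N": 0.0, "NE": 45.0, "E": 90.0, "SE": 135.0,
               "S": 180.0, "SW": 225.0, "WSW": 247.5, "W": 270.0, "NW": 315.0}

def current_features(depths, speeds, directions):
    # Convert a current table into the re-scaled velocity components
    # {vx_i} followed by {vy_i}, weighted per Equation (8).
    depths = np.asarray(depths, dtype=float)
    speeds = np.asarray(speeds, dtype=float)
    theta = np.radians([COMPASS_DEG[d] for d in directions])
    ux, uy = speeds * np.cos(theta), speeds * np.sin(theta)
    n = len(depths)
    w = np.empty(n)
    w[0] = np.sqrt((depths[1] - depths[0]) / 2.0)      # one-sided endpoint
    w[-1] = np.sqrt((depths[-1] - depths[-2]) / 2.0)   # one-sided endpoint
    w[1:-1] = np.sqrt((depths[2:] - depths[:-2]) / 2.0)
    return np.concatenate([w * ux, w * uy])
```

With this weighting, the squared Euclidean norm of the feature vector approximates the integral of ||u(x)||^2 over depth, so the kernel's distance between two feature vectors approximates the L^2 distance between the two current profiles.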
An example wave is characterized in Table 4. The dominant angular frequency is computed as ω_p = 2π/T_p. To prepare the data for the machine learning model, we absorbed the direction information by multiplying the coefficient α as follows: α_x = α cos θ_az and α_y = α sin θ_az. The remaining parameters are used without any modification. Therefore, the feature vector components related to the sea wave were {α_x, α_y, σ, γ, ω_p}.

3.5 Implementation

To produce the base data comprising the results of the mechanical response of the risers to the several sea current and wave cases, we utilized the Anflex simulation software [11]. Anflex is a computational program for nonlinear, both static and dynamic, analysis of risers based on the finite element method.

The active learning loop, including the pre-processing of the simulation parameters to turn them into proper feature vectors, was written using a combination of the numpy [12], pandas [14, 22] and scikit-learn [15] open-source Python libraries. An example dataset, together with Jupyter notebooks [16] that implement the active learning procedure, is available at: https://git.tecgraf.puc-rio.br/jhelsas/active-learning-loading-case-selection

The kernel function utilized was a pure Gaussian, in which the width and the constant coefficient are left for the library to optimize over the training set. The errors were computed after transforming the data back to the original normalization, instead of using the normalized data.

The dataset as a whole comprised 526 simulations for 6 different riser configurations. The feature vector for each loading case had 33 components, 5 coming from the definition of the wave and 28 coming from the definition of the current. There were 6 target variables that the models were trained to predict, which correspond to the DNVUF-201 and FX variables for each of the filling conditions. The training dataset increases along the active learning loop, starting at 25 elements and going up to 325 elements.
The test dataset, which is the unlabeled set U, shrinks from 501 elements to 201 elements.

4 Results

We divide the results into two case studies. In Section 4.1 we present an investigation of the case in which the uncertainty estimation and selection are done for each target variable separately. Next, in Section 4.2, we present the case in which one selection was performed for all variables simultaneously and the uncertainty was computed jointly for all variables. The second case is more reflective of reality, because the simulation returns more than one variable of interest.

Finally, in Section 4.3, we present our investigation on whether the convergence was on account of a smart choice from the unlabeled dataset or was only a volume effect owing to the number of simulations run. To do so, we compared the measures of error with those of the case in which the sampling was purely random and, therefore, did not have any contribution from the uncertainty sampling component.

For a single variable k, the machine learning model predicts μ^k_n(x_i), σ^k_n(x_i) for x_i ∈ U; there are thus several uncertainties, one for each physical quantity returned by the simulation. To be able to perform uncertainty sampling, a single variable must be chosen to represent the quantity being optimized. In principle, all variables are potential candidates. This poses a difficulty in extending the sampling procedure to more than one variable.

Therefore, instead of attempting some sort of multi-objective optimization, we computed the simplest variable that aggregates the uncertainty from all variables, i.e., the geometric mean of the uncertainties, σ̄_n(x_i) = ( Π_{k=1}^{K} σ^k_n(x_i) )^{1/K}. This allowed us to perform regular uncertainty sampling over this averaged quantity in the same way we would for a regular uncertainty.
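The geometric-mean aggregation and the resulting joint query can be sketched in a few lines; the function names are illustrative.

```python
import numpy as np

def aggregated_uncertainty(stds):
    # Geometric mean over the K per-variable predicted standard
    # deviations, one row per pool point:
    #   sigma_bar(x_i) = (prod_k sigma_k(x_i))^(1/K)
    # computed in log space for numerical stability.
    stds = np.asarray(stds, dtype=float)   # shape (n_points, K)
    return np.exp(np.log(stds).mean(axis=1))

def query_index(stds):
    # Uncertainty sampling (Equation (4)) over the aggregated quantity.
    return int(np.argmax(aggregated_uncertainty(stds)))
```

Because the geometric mean is monotone in each component, a point whose uncertainty is large for all target variables is preferred over one that is extreme in a single variable only.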
We demonstrate in Section 4.2 that this choice did not degrade the performance of the active learning procedure in converging the regression over the unlabeled dataset $U$.

We present, in Table 5, a sample of the values of the target variables on $U$ mentioned in Section 4.2. Here, both the exact and predicted values are shown side by side. It is clear from Table 5 that it is possible to get reasonably good approximations to the values of the simulations through the inference process, which is one of the central tenets of this work.

DNVUF-201 criterion, Empty case:

Exact       Predicted   Error (Abs.)   Error (%)   Std. Dev.
-3743.67    -3755.80    12.1212        0.323780    21.1400
-2456.17    -2406.77    49.4048        2.011456    37.6659
-3179.94    -3113.61    66.3339        2.086011    54.6702
-4479.17    -4268.80    210.3750       4.696736    27.6913
-4080.17    -3968.35    111.8178       2.740516    50.7698
-3540.84    -3589.40    48.5529        1.371224    57.7834
-3838.46    -3827.99    10.4802        0.273032    19.4366
-4451.07    -4438.07    12.9975        0.292009    31.0689
-3156.19    -3217.21    61.0162        1.933222    41.9970
-4310.14    -4294.23    15.9168        0.369288    40.3006

Table 5: Comparison of exact and predicted values for the DNVUF-201 and FX, jointly sampled together with the other variables, as explained in Section 4.2. The results are presented for an empty riser interior. The table was extracted from data in which $D_n$ was initialized with 25 randomly selected cases and the active learning loop was then iterated another 75 times, taking the total number of points in the "ran simulations" dataset $D_n$ to 100.

To quantify the quality of the approximation and measure the speed of convergence, we computed several deviation measures from the data. Firstly, we compared the root mean square (RMS) error against the root mean square implied standard deviation (Std. Dev.) and the maximum implied standard deviation, in which the averaging process was performed over the yet-to-be-run simulations $U$.
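A minimal sketch of these deviation measures follows, assuming the "RMS implied standard deviation" is the root mean square of the per-point predictive standard deviations over $U$ (names are our own illustration):

```python
import numpy as np

def deviation_measures(y_true, y_pred, y_std):
    """RMS/max error and the corresponding implied std. devs. over U.

    y_true: exact simulation results (available here for validation);
    y_pred, y_std: predictive mean and std. dev. on the unlabeled set.
    """
    err = np.abs(y_pred - y_true)
    return {
        "eps_rms": float(np.sqrt(np.mean(err ** 2))),      # RMS error
        "eps_max": float(np.max(err)),                     # maximum error
        "sigma_rms": float(np.sqrt(np.mean(y_std ** 2))),  # RMS implied std
        "sigma_max": float(np.max(y_std)),                 # max implied std
    }
```

In deployment the two error entries are unknown (they require the exact results); only the two implied standard deviations are available, which is what motivates bounding the former by the latter.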
If it were possible to approximate or bound the RMS error, which depends on data beyond the labeled dataset $D_n$, by an affine combination of the RMS and maximum implied Std. Dev., then controlling the desired error over $U$ would be possible with only the data in the labeled dataset $D_n$. To fully control the error, it is also necessary to bound the maximum error over $U$ using one such affine combination. Sections 4.1 and 4.2 establish exactly these bounds for the cases in which each variable is queried separately and in which all variables are jointly queried, respectively.

Although there was some volume effect allowing for reasonable results, the judicious choice of the query candidate had a significant impact on the overall quality of the result. To demonstrate this observation, in Section 4.3, we compare the RMS error and Std. Dev. for active sampling and for random sampling.

Ultimately, the amount of time that can be saved in the engineering design is a function of the uncertainty that can be tolerated in the inference process. If no error can be tolerated, such as in a critical validation phase, all simulations must be run. If some error can be tolerated, as is the case in an intermediate step, it is possible to use the regression process to skip uninformative simulations. Therefore, the potential speedup can only be assessed from the analysis of the convergence of the inference process.

To exemplify and explore the effect of the methodology presented in this work, we ran the standard uncertainty sampling over the values of the DNVUF-201 and FX variables for the three filling cases, namely empty, water, and mixture of water and oil (mean). This provides a baseline case for how active learning works in our problem. It also allows us to have a reference for the convergence curves in the simplest case, against which the results presented in Section 4.2 can be compared.

The values we computed were the RMS error $\epsilon_{\mathrm{RMS},D_n}$, the maximum error $\epsilon_{\mathrm{Max},D_n}$, and the corresponding implied standard deviations $\sigma_{\mathrm{RMS},D_n}$ and $\sigma_{\mathrm{Max},D_n}$ over $U$. Furthermore, all the results were based on 6 different riser configurations and $n = 33$ runs with different seeds for the initial sampling, which we averaged over to produce the curves. In addition, we computed the 95% confidence intervals, with each variable being sampled alone. The results can be seen in Figures 5 and 6.

Figure 5: Measures of deviation for the predictions of DNVUF-201 in relation to $n$. The left-side plots show $\epsilon^{\mathrm{DNV}}_{\mathrm{RMS},D_n}$, $\sigma^{\mathrm{DNV}}_{\mathrm{RMS},D_n}$, and $\sigma^{\mathrm{DNV}}_{\mathrm{Max},D_n}$. The right-side plots show $\epsilon^{\mathrm{DNV}}_{\mathrm{Max},D_n}$ against multiples of $\sigma^{\mathrm{DNV}}_{\mathrm{Max},D_n}$. The first line refers to empty, the second to mean, and the third to water fillings. The curves correspond to averages over all the seeds and configurations, and shaded areas correspond to 95% confidence intervals. Sampling was done separately for each variable.

Figure 6: Measures of deviation for the predictions of FX in relation to $n$. The left-side plots show $\epsilon^{\mathrm{FX}}_{\mathrm{RMS},D_n}$, $\sigma^{\mathrm{FX}}_{\mathrm{RMS},D_n}$, and $\sigma^{\mathrm{FX}}_{\mathrm{Max},D_n}$. The right-side plots show $\epsilon^{\mathrm{FX}}_{\mathrm{Max},D_n}$ against multiples of $\sigma^{\mathrm{FX}}_{\mathrm{Max},D_n}$. The first line refers to empty, the second to mean, and the third to water fillings. The curves correspond to averages over all the seeds and configurations, and shaded areas correspond to 95% confidence intervals. Sampling was done separately for each variable.

From the left-side graphs in Figures 5 and 6, we observe that the RMS error decreased monotonically as the number of samples increased. Furthermore, we observe that the RMS error could be reasonably bounded by $\epsilon^{\mathrm{DNV}}_{\mathrm{RMS},D_n} < \sigma^{\mathrm{DNV}}_{\mathrm{RMS},D_n} + \sigma_0$ with $\sigma_0 \le 0.15$, and $\epsilon^{\mathrm{DNV}}_{\mathrm{Max},D_n} < k\,\sigma^{\mathrm{DNV}}_{\mathrm{Max},D_n}$ for $4 \le k \le 6$. For axial tension, $\epsilon^{\mathrm{FX}}_{\mathrm{RMS},D_n} \approx \left(\sigma^{\mathrm{FX}}_{\mathrm{RMS},D_n} + \sigma^{\mathrm{FX}}_{\mathrm{Max},D_n}\right)/2$, while being mostly bounded by $\sigma^{\mathrm{FX}}_{\mathrm{Max},D_n}$. Also, $\epsilon^{\mathrm{FX}}_{\mathrm{Max},D_n} < k\,\sigma^{\mathrm{FX}}_{\mathrm{Max},D_n}$ for $k \ge 6$ over $D_n$.

Considering reasonable RMS errors of $\epsilon^{\mathrm{DNV}}_{\mathrm{RMS},D_n} < 0.025$ and, for axial tension, around $\epsilon^{\mathrm{FX}}_{\mathrm{RMS},D_n} < 200\,\mathrm{N}$, it is possible to obtain an inference over the entire unlabeled dataset $U$ with only 100 evaluations, while also keeping the maximum errors under control, with $\epsilon^{\mathrm{DNV}}_{\mathrm{Max},D_n} < 0.10$ and $\epsilon^{\mathrm{FX}}_{\mathrm{Max},D_n}$ correspondingly bounded.

The primary interest, as explained in Section 1, was to be able to run smart sequential simulations that return multiple results, so that the values of all the variables produced by the simulation can be jointly inferred in the most efficient way possible. To do so, we adopted the standard uncertainty sampling, using the geometric mean of the uncertainties of all six target values as the uncertainty. These were the values of DNVUF-201 and FX for the 3 filling cases: empty, water, and mixture of water and oil (mean).

In this case, the error values are computed for all variables on a single sampling sequence, with the same points used to compute the inference over the same set $U$ for all target variables. The results are presented in the same format as in Section 4.1; however, they represent the results of the joint sampling of all variables. All results were based on $n_c = 6$ different riser configurations and $n = 33$ runs with different seeds for the initial sampling, which were then averaged over to produce the curves and also compute 95% confidence intervals. The results for this case can be seen in Figures 7 and 8.

Figure 7: Measures of deviation for the predictions of DNVUF-201 in relation to $n$. The left-side plots show $\epsilon^{\mathrm{DNV}}_{\mathrm{RMS},D_n}$, $\sigma^{\mathrm{DNV}}_{\mathrm{RMS},D_n}$, and $\sigma^{\mathrm{DNV}}_{\mathrm{Max},D_n}$. The right-side plots show $\epsilon^{\mathrm{DNV}}_{\mathrm{Max},D_n}$ against multiples of $\sigma^{\mathrm{DNV}}_{\mathrm{Max},D_n}$. The first line refers to empty, the second to mean, and the third to water fillings. The curves correspond to averages over all the seeds and configurations, and shaded areas correspond to 95% confidence intervals. Sampling was performed jointly for all variables.

Figure 8: Measures of deviation for the predictions of FX in relation to $n$. The left-side plots show $\epsilon^{\mathrm{FX}}_{\mathrm{RMS},D_n}$, $\sigma^{\mathrm{FX}}_{\mathrm{RMS},D_n}$, and $\sigma^{\mathrm{FX}}_{\mathrm{Max},D_n}$. The right-side plots show $\epsilon^{\mathrm{FX}}_{\mathrm{Max},D_n}$ against multiples of $\sigma^{\mathrm{FX}}_{\mathrm{Max},D_n}$. The first line refers to empty, the second to mean, and the third to water fillings.
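One iteration of this joint query selection can be sketched with scikit-learn's Gaussian process regressor, using a constant coefficient times a Gaussian (RBF) kernel as described earlier. This is an illustrative reconstruction under those assumptions, not the repository code:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def active_learning_step(X_lab, Y_lab, X_unl):
    """One iteration of the joint query selection (illustrative sketch).

    X_lab, Y_lab: features and K target values of the cases already
    simulated (the labeled set D_n); X_unl: features of the
    yet-to-be-run cases (the unlabeled set U).
    Returns the index in X_unl of the case to simulate next.
    """
    n_points, n_targets = X_unl.shape[0], Y_lab.shape[1]
    sigmas = np.empty((n_points, n_targets))
    for k in range(n_targets):
        # Constant coefficient times a Gaussian (RBF) kernel; both
        # hyperparameters are optimized on the training set during fit.
        gp = GaussianProcessRegressor(
            kernel=ConstantKernel(1.0) * RBF(length_scale=1.0),
            normalize_y=True)
        gp.fit(X_lab, Y_lab[:, k])
        _, sigmas[:, k] = gp.predict(X_unl, return_std=True)
    # Aggregate the K predictive uncertainties by their geometric mean
    # (clipped away from zero before taking logs).
    agg = np.exp(np.mean(np.log(np.maximum(sigmas, 1e-12)), axis=1))
    return int(np.argmax(agg))
```

The selected case is then simulated, its results are moved from $U$ to $D_{n+1}$, and the loop repeats until the implied uncertainty falls below the desired tolerance.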
The curves correspond to averages over all the seeds and configurations, and shaded areas correspond to 95% confidence intervals. Sampling was performed jointly for all variables.

A majority of the results obtained in Section 4.1 remain valid, as can be seen in Figures 7 and 8. We continue to have the monotonic decay of the RMS error and the mostly monotonic decay of the maximum errors. Furthermore, we continue to have the ability to approximate and bound those errors. In addition, a majority of the bounds still apply as-is in this case, showing that the performance has not been degraded too much by changing from querying each variable separately to querying them together.

As mentioned in Section 4.1, considering reasonable RMS errors for DNVUF-201 with only 100 evaluations, while also keeping the maximum error under control over all yet-to-be-run simulations, these results represent a reduction of more than 80% in the number of simulations required, compared to the total number of simulations, which was 512. If more relaxed tolerances could be accepted, such as $\epsilon^{\mathrm{DNV}}_{\mathrm{RMS},D_n} < 0.04$ and $\epsilon^{\mathrm{FX}}_{\mathrm{RMS},D_n} < 400\,\mathrm{N}$, it would be possible to achieve reductions of up to 90% in the total number of simulations, while keeping $\epsilon^{\mathrm{DNV}}_{\mathrm{Max},D_n} < 0.15$ and $\epsilon^{\mathrm{FX}}_{\mathrm{Max},D_n}$ correspondingly bounded.

To evaluate the impact of optimal querying, we compare the results of jointly querying all variables together for both uncertainty sampling and random sampling. Each random sampling case corresponds to an uncertainty sampling case with the same initial points. At each step, we randomly sample, without replacement, a case from $U$; otherwise, the procedure is the same as described in Section 4.2. Similar results can be obtained for the sampling performed in Section 4.1. The results can be seen in Figure 9.

Figure 9: Measures of deviation for DNVUF-201 and FX predictions in relation to $n$. The left-side plots show $\epsilon^{\mathrm{DNV}}_{\mathrm{RMS},D_n}$ and $\sigma^{\mathrm{DNV}}_{\mathrm{RMS},D_n}$ for the active learning and random samplings for DNVUF-201. The right-side plots show $\epsilon^{\mathrm{FX}}_{\mathrm{RMS},D_n}$ and $\sigma^{\mathrm{FX}}_{\mathrm{RMS},D_n}$ for the active learning and random samplings for FX. The first line refers to the empty case, while the second and third lines refer to the mixture and water-filled cases. The curves correspond to the average over all the seeds and configurations, and shaded areas correspond to 95% confidence intervals. Sampling was performed jointly for all variables.

The most notable feature of the results is that random sampling has not only a larger average error $\epsilon_{\mathrm{RMS},D_n}$, but also a much larger spread than the active learning results. This is not unexpected, since it is reasonable to expect that an efficient sampling criterion will be more consistent than random selection. The much higher spread shows that there was a highly uneven variety of results that could be obtained from the randomly sampled points; this was not the case with the uncertainty sampling. This further corroborates the robustness of the results obtained from uncertainty sampling.

In summary, it is clear that the average error also decreases with the increasing number of samples, and the error per se was acceptable, thus allowing for a reasonable approximation of $U$; however, this result was inferior to that from the uncertainty sampling.
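The random baseline just described differs from the active learning loop only in the query rule: instead of taking the case with maximum aggregated uncertainty, a case is drawn uniformly from $U$ without replacement, starting from the same initial points. A minimal sketch, with illustrative names:

```python
import numpy as np

def run_random_baseline(indices_U, n_queries, seed=0):
    """Draw n_queries cases from the unlabeled pool U at random,
    mimicking the active learning loop with the uncertainty
    criterion removed (sampling is without replacement)."""
    rng = np.random.default_rng(seed)
    pool = list(indices_U)
    chosen = []
    for _ in range(n_queries):
        j = int(rng.integers(len(pool)))
        chosen.append(pool.pop(j))  # remove so it cannot be drawn again
    return chosen
```

Everything else, including the regression model and the deviation measures, is kept identical so that the comparison isolates the effect of the query criterion.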
These results lend some support to the use of direct random sampling, or of obtaining some points through a design of experiments, and then feeding them directly to a machine learning model, as mentioned in Section 1. In that setting, the decrease in the RMS error comes purely from the increasing number of evaluations, which is the volume effect mentioned earlier. To obtain results similar to the ones aimed at in Section 4.2, one would need upwards of 250 evaluations, which is much less efficient than uncertainty sampling.

We validated the application of the technique of active learning in aiding the engineering design of an oil and gas riser over a set of loading conditions. The proposed method lowered the execution cost required to acquire a reasonable picture of the response of the physical system. We showed that it is feasible to obtain satisfactory results with 80% fewer simulations than required by the complete set of cases, representing a five-fold reduction in the execution time.

We validated the methodology by comparing the results of the inference from the machine learning model against a reference set of performed simulations.
We showed that the RMS and maximum errors can be robustly approximated and bounded by affine combinations of the RMS and maximum implied uncertainties. This enables the use of the inference to create usable confidence intervals for the yet-to-be-run simulations, which in turn can be chosen to be run or not, thus allowing the mentioned improvement in design time to take place.

The methodology presented here is agnostic to the origin of the data fed to the machine learning model; therefore, it is readily applicable to other cases in which a range of potentially similar simulations must be run to assess the behavior of a physical system in the context of an engineering design problem. The major difference for new cases lies in preparing the input variables that represent the simulation parameters in a way that is appropriate for the machine learning model to train and infer.

We also note that classes of engineering design problems that require experiments to be run, instead of simulations, can also be addressed by this methodology, because the machine learning model, as mentioned earlier, is agnostic to the origin of the data, and the objective is to omit certain expensive, complex, or difficult steps of the design. This step is performed in silico in our case, but could also be done in a laboratory or in a field system.

The performance can, in principle, be further improved by adjusting the machine learning model, such as by choosing a better covariance kernel in our case. Additionally, parallelization of the sampling could possibly be achieved through some variation of Thompson sampling, enabling parallel execution of the active learning loop. Finally, alternative forms of normalization or of assembling the feature vectors may have additional positive effects on the performance of the algorithm, and this is currently under investigation by the authors.
This work was partially supported by the National Council for Scientific and Technological Development (CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brazil). It was also financed by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. The authors acknowledge the support provided by the Tecgraf Institute of Technical-Scientific Software Development of PUC-Rio (Tecgraf/PUC-Rio), Brazil. We are also especially grateful to Dr. Ludimar Aguiar and Dr. Marcos A. Martins, from CENPES/Petrobras, for their constructive comments and useful discussions on riser design. The authors would also like to thank Dr. Fábio Ramos for many fruitful discussions on machine learning and its applications. Any opinions, findings, conclusions, or recommendations expressed here are those of the authors and do not necessarily reflect the views of the sponsors.
References

[1] P. H. S. Cardoso, N. A. G. Casaprima, F. V. Senhora, and I. F. M. Menezes. Optimization of catenary risers with hydrodynamic dampers. Ocean Engineering, 184:134-142, 2019.

[2] T. Clausen and R. D'Souza. Dynamic risers key component for deepwater drilling, floating production. Offshore, 61(5):89-90, 2001.

[3] Felix A. Faber, Alexander Lindmaa, O. Anatole von Lilienfeld, and Rickard Armiento. Machine learning energies of 2 million elpasolite (ABC2D6) crystals. Phys. Rev. Lett., 117:135502, Sep 2016.

[4] DNV GL. DNVGL-ST-F201 Dynamic risers, January 2018.

[5] Klaus Hasselmann, T. Barnett, E. Bouws, H. Carlson, D. Cartwright, K. Enke, J. Ewing, H. Gienapp, D. Hasselmann, P. Kruseman, A. Meerburg, Peter Müller, Dirk Olbers, K. Richter, W. Sell, and H. Walden. Measurements of wind-wave growth and swell decay during the Joint North Sea Wave Project (JONSWAP). Deut. Hydrogr. Z., 8:1-95, 01 1973.

[6] Breno P. Jacob, Marta C. T. Reyes, Beatriz S. L. P. de Lima, Ana L. F. L. Torres, Marcio M. Mourelle, Renato Silva, et al. Alternative configurations for steel catenary risers for turret-moored FPSOs. In The Ninth International Offshore and Polar Engineering Conference. International Society of Offshore and Polar Engineers, 1999.

[7] David D. Lewis and William A. Gale. A sequential algorithm for training text classifiers. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '94, pages 3-12, Berlin, Heidelberg, 1994. Springer-Verlag.

[8] Yue Liu, Tianlu Zhao, Wangwei Ju, and Siqi Shi. Materials discovery and design using machine learning. Journal of Materiomics, 3(3):159-177, 2017. High-throughput Experimental and Modeling Research toward Advanced Batteries.

[9] D. J. C. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4):590-604, July 1992.

[10] M. D. McKay, R. J. Beckman, and W. J. Conover. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239, May 1979.

[11] M. M. Mourelle, E. C. Gonzalez, B. P. Jacob, and F. L. L. Carneiro. Anflex - computational system for flexible and rigid riser analysis. In Offshore Engineering: International Symposium, 9th, pages 441-458, Chichester, 1995. Wiley.

[12] Travis E. Oliphant. A guide to NumPy, volume 1. Trelgol Publishing USA, 2006.

[13] Jitesh H. Panchal, Mark Fuge, Ying Liu, Samy Missoum, and Conrad Tucker. Special Issue: Machine Learning for Engineering Design. Journal of Mechanical Design, 141(11), 10 2019. 110301.

[14] The pandas development team. pandas-dev/pandas: Pandas, February 2020.

[15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[16] Fernando Pérez and Brian E. Granger. IPython: a system for interactive scientific computing. Computing in Science and Engineering, 9(3):21-29, May 2007.

[17] J. L. Peterson, K. D. Humbird, J. E. Field, S. T. Brandon, S. H. Langer, R. C. Nora, B. K. Spears, and P. T. Springer. Zonal flow generation in inertial confinement fusion implosions. Physics of Plasmas, 24(3):032702, 2017.

[18] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005.

[19] Burr Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison, 2009.

[20] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(3):379-423, July 1948.

[21] G. Gary Wang and S. Shan. Review of Metamodeling Techniques in Support of Engineering Design Optimization. Journal of Mechanical Design, 129(4):370-380, 05 2006.

[22] Wes McKinney. Data Structures for Statistical Computing in Python. In Stéfan van der Walt and Jarrod Millman, editors,