Deep-learning-based surrogate flow modeling and geological parameterization for data assimilation in 3D subsurface flow
Meng Tang∗, Yimin Liu, Louis J. Durlofsky
Department of Energy Resources Engineering, Stanford University, 367 Panama Street, Stanford, CA 94305
Abstract
Data assimilation in subsurface flow systems is challenging due to the large number of flow simulations often required, and to the need to preserve geological realism in the calibrated (posterior) models. In this work we present a deep-learning-based surrogate model for two-phase flow in 3D subsurface formations. This surrogate model, a 3D recurrent residual U-Net (referred to as recurrent R-U-Net), consists of 3D convolutional and recurrent (convLSTM) neural networks, designed to capture the spatial-temporal information associated with dynamic subsurface flow systems. A CNN-PCA procedure (convolutional neural network post-processing of principal component analysis) for parameterizing complex 3D geomodels is also described. This approach represents a simplified version of a recently developed supervised-learning-based CNN-PCA framework. The recurrent R-U-Net is trained on the simulated dynamic 3D saturation and pressure fields for a set of random 'channelized' geomodels (generated using 3D CNN-PCA). Detailed flow predictions demonstrate that the recurrent R-U-Net surrogate model provides accurate results for dynamic states and well responses for new geological realizations, along with accurate flow statistics for an ensemble of new geomodels. The 3D recurrent R-U-Net and CNN-PCA procedures are then used in combination for a challenging data assimilation problem involving a channelized system. Two different algorithms, namely rejection sampling and an ensemble-based method, are successfully applied. The overall methodology described in this paper may enable the assessment and refinement of data assimilation procedures for a range of realistic and challenging subsurface flow problems.

Keywords: surrogate model, deep learning, reservoir simulation, history matching, data assimilation, inverse modeling
1. Introduction
History matching entails the calibration of geological model parameters such that flow simulation predictions are in essential agreement with historical (observed) data. The prediction uncertainty associated with these calibrated (posterior) models is typically less than that from the initial (prior) models, which renders them more useful for reservoir/aquifer management. History matching involves many challenges, including the computational demands associated with the required flow simulations, and the need to preserve geological realism in the posterior models. Thus this application will benefit from the development and use of accurate surrogate models for flow predictions and effective geological parameterizations.

In recent work, a surrogate flow model that applies a recurrent residual U-Net (R-U-Net) procedure was used in conjunction with a deep-learning-based geological parameterization technique, referred to as CNN-PCA [1, 2], to successfully history match 2D channelized systems [3]. In this paper, we extend the surrogate dynamic flow model to handle 3D systems. The standalone CNN-PCA parameterization, which enables complex geomodels to be described in terms of a small set of uncorrelated variables, has recently been extended to treat 3D systems [4]. Here we introduce a simplified yet effective variant of the general 3D CNN-PCA procedure. This parameterization is then combined with the 3D surrogate flow model for uncertainty quantification and history matching.

∗ Corresponding author
Email addresses: [email protected] (Meng Tang), [email protected] (Yimin Liu), [email protected] (Louis J. Durlofsky)
Preprint submitted to Computer Methods in Applied Mechanics and Engineering, July 28, 2020

There have been a number of deep-neural-network architectures developed for surrogate flow modeling in subsurface systems. Zhu and Zabaras [5] devised a fully convolutional encoder-decoder architecture to capture pressure and velocity maps for single-phase flow problems characterized by random 2D Gaussian permeability fields. In later work, an autoregressive strategy was integrated with a fully convolutional encoder-decoder architecture for the prediction of time-dependent transport in 2D problems [6]. Tang et al. [3] proposed a combination of a residual U-Net with convLSTM to capture the saturation and pressure evolution in 2D oil-water problems, with wells operating under wellbore pressure control. Mo et al. [7] applied a 2D deep residual dense convolutional neural network to predict solution concentration distributions in 3D groundwater flow problems. In this treatment, the 3D output is a stack of output from 2D convolutional channels. Within the context of optimization (rather than history matching), Jin et al. [8] presented an autoregressive embed-to-control framework to predict well responses and dynamic state evolution with varying well controls, for a fixed geological description.

The approaches cited above all require numerical simulation data to train the neural network. In physics-informed surrogate models [9, 10, 11, 12, 13], by contrast, simulation data are not needed for training. With these procedures the governing partial differential equations (PDEs) are approximated by formulating PDE residuals, along with the initial and boundary conditions, as the objective function to be minimized by the neural network [14].
Issues with existing physics-informed procedures may arise, however, if these methods are applied for strongly heterogeneous models involving large numbers of wells (which act as 'internal' boundary conditions), or with large sets of geomodels characterized by different detailed property distributions. Both of these capabilities are required for data assimilation in realistic problems. Zhu et al. [15] proposed a physics-constrained surrogate model for uncertainty quantification with high-dimension discretized fields, where an approximate governing PDE residual is applied as the training objective. To date, this promising approach has only been implemented for steady-state problems.

Other recent applications of deep learning in subsurface flow settings include the use of a CNN-based surrogate model to predict permeability in tight rocks with strong heterogeneity [16], and the use of neural networks in the context of multicomponent thermodynamic flash calculations [17]. The latter development could be quite useful in compositional reservoir simulation, as a substantial fraction of the computational effort is often associated with the phase equilibria calculations.

As noted earlier, the effective parameterization of geomodels is also an important component of many data assimilation procedures. With such a treatment, the detailed geological model (e.g., porosity and/or permeability in every grid block in the model) can be represented with many fewer parameters than there are grid blocks. This means that a relatively small set of parameters must be determined during data assimilation, and that posterior geomodels will maintain geological realism. Tools based on deep learning have been shown to be applicable for such geological parameterizations. Specific approaches include those based on variational autoencoders (VAEs) [18, 19] and generative adversarial networks (GANs) [20, 21, 22, 23, 24, 25, 26, 27]. Algorithms based on a combination of VAE and GAN have also been devised [7].
CNN-PCA methods, which use CNNs to post-process models parameterized using principal component analysis (PCA), have been developed for 2D systems [1, 2], and have recently been extended to 3D [4]. A variant of this 3D CNN-PCA method will be presented and applied in this paper.

In this work we (1) extend the existing recurrent R-U-Net procedure to provide a surrogate model for dynamic 3D two-phase flow problems, (2) present a simplified 3D CNN-PCA parameterization for binary 'channelized' geomodels, and (3) combine the deep-learning-based 3D surrogate flow model and geomodel parameterization to perform uncertainty quantification and data assimilation. The 3D recurrent R-U-Net involves a similar layout to that used for 2D problems, though we now apply 3D CNN and convLSTM architectures. The 3D recurrent R-U-Net surrogate flow model is trained using high-fidelity simulation results for realizations generated by 3D CNN-PCA. The 3D CNN-PCA procedure applied here involves the use of a supervised-learning approach to map PCA models to channelized realizations. The combined methodology is then applied for data assimilation. Two different history matching algorithms – rejection sampling and ensemble smoother with multiple data assimilation [28] – are implemented and assessed.

This paper proceeds as follows. In Section 2 we present the equations governing two-phase subsurface flow and describe the well model used in this work. Next, in Section 3, we describe the simplified 3D CNN-PCA procedure used for geomodel parameterization. The deep-neural-network architecture and training setup for the 3D recurrent R-U-Net surrogate flow model are presented in Section 4. Detailed flow comparisons between model predictions and high-fidelity simulation results appear in Section 5. The use of the 3D recurrent R-U-Net surrogate and CNN-PCA for data assimilation is considered in Section 6. A summary and suggestions for future work are provided in Section 7.
Architecture details for 3D CNN-PCA and the 3D recurrent R-U-Net are provided in the Appendix.
2. Governing Equations for Two-phase Flow
We consider the flow of two immiscible fluids in subsurface formations. The governing equations and problem setup are applicable for oil reservoir simulation (e.g., production of oil via water injection), or for the modeling of an environmental remediation project involving water and a nonaqueous phase liquid (NAPL). Our terminology and notation are consistent with the oil-water problem. The system is 3D and includes gravitational effects.

Phases are denoted by $j = o, w$, where $o$ indicates oil phase and $w$ water phase. Due to immiscibility, the oil and water components exist only in their respective phases. Mass conservation is expressed as

$$-\nabla \cdot (\rho_j \mathbf{v}_j) - q_j^w = \frac{\partial}{\partial t}\left(\phi \rho_j S_j\right), \quad j = o, w. \quad (1)$$

Here $\rho_j$ denotes phase density, $\mathbf{v}_j$ is the phase (Darcy) velocity, $q_j^w$ is the source/sink term (superscript $w$ indicates well), $t$ is time, $\phi$ is the rock porosity (volume fraction of the pore space), and $S_j$ is the phase saturation (volume fraction occupied by phase $j$, with $S_o + S_w = 1$). Darcy velocity $\mathbf{v}_j$ is given by

$$\mathbf{v}_j = -\frac{\mathbf{k}\, k_{rj}(S_j)}{\mu_j}\left(\nabla p_j - \rho_j g \nabla z\right), \quad (2)$$

where $\mathbf{k}$ is the absolute permeability tensor (permeability can be viewed as a flow conductivity), $k_{rj}$ is the relative permeability to phase $j$, $\mu_j$ indicates phase viscosity, $p_j$ denotes phase pressure, $g$ is gravitational acceleration, and $z$ is depth. Here, $k_{rj}$ is usually a strongly nonlinear function of phase saturation $S_j$, and we may also have $\mu_j = \mu_j(p_j)$. In this work we neglect capillary pressure effects, which generally have minimal impact at field scales. This means we have $p_o = p_w = p$.

Eqs. 1 and 2 are typically discretized using finite volume treatments, and the resulting set of nonlinear algebraic equations is solved using Newton's method. The solution provides the pressure and saturation in all grid blocks in the model.
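To make Eq. 2 concrete, the following sketch evaluates the water-phase Darcy velocity at a single point. The quadratic (Corey-type) relative permeability and all numerical values are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

# Sketch: evaluating the Darcy velocity of Eq. 2 for the water phase.
# Units: k [m^2], mu [Pa*s], grad_p [Pa/m]; k_rw = S_w^2 is an assumed
# Corey-type relative permeability, not the paper's curve.
def darcy_velocity_w(k, s_w, mu_w, rho_w, grad_p, grad_z, g=9.81):
    k_rw = s_w ** 2                     # assumed quadratic rel perm
    mobility = k * k_rw / mu_w          # phase mobility k*k_rw/mu
    return -mobility * (grad_p - rho_w * g * grad_z)

# Example: ~1000 md (1e-12 m^2), 1 cP water, lateral pressure gradient
v = darcy_velocity_w(k=1e-12, s_w=0.5, mu_w=1e-3, rho_w=1000.0,
                     grad_p=np.array([1.0e4, 0.0, 0.0]),
                     grad_z=np.array([0.0, 0.0, 1.0]))
```

Note the gravity term acts only in the $z$ component here, since $\nabla z$ is the vertical unit vector.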
The problem is complicated by the fact that permeability and porosity can be highly discontinuous in space (as different geologic features may display very different properties), and because the saturation field develops shocks as a result of the near-hyperbolic character of the water transport equation. In this work, all high-fidelity simulations (HFS), i.e., the numerical solution of Eqs. 1 and 2, are performed using Stanford's Automatic Differentiation General Purpose Research Simulator, ADGPRS [29].

Well responses (phase flow rates into or out of wells, or wellbore pressure) are of primary interest in many subsurface flow problems, and these quantities typically comprise the key data to be assimilated. Wells appear in the finite volume formulation through the discretized representation of the source term $q_j^w$. In this work we consider 3D models with vertical wells that penetrate multiple blocks. For a grid block containing a well (referred to as a well block), well rates are related to the pressure and saturation in the block through the Peaceman model [30]:

$$\left(q_j^w\right)_i = \mathrm{WI}_i \left(\frac{k_{rj}\,\rho_j}{\mu_j}\right)_i \left(p - p^w\right)_i. \quad (3)$$

Here $(q_j^w)_i$ is the source/sink term for phase $j$ in block $i$, $p_i$ is well-block pressure, $p_i^w$ is the wellbore pressure in block $i$, and $\mathrm{WI}_i$ is the well index in block $i$. The well index is given by

$$\mathrm{WI}_i = \frac{2\pi k_i \Delta z}{\ln(r/r_w)}, \quad (4)$$

where $k_i$ is the grid block permeability (assumed to be isotropic), $\Delta z$ is the block thickness, $r_w$ is the wellbore radius, and $r = 0.2\,\Delta x$, where $\Delta x$ is the block dimension in the $x$-direction (here we assume $\Delta x = \Delta y$). More general expressions for $\mathrm{WI}_i$ exist for cases with anisotropic permeability and/or $\Delta x \neq \Delta y$.

For a vertical well that penetrates multiple blocks, the wellbore pressure $p_i^w$ varies with depth $z$ as a result of gravitational effects. Wellbore pressure at the first (uppermost) perforation is referred to as the bottom-hole pressure (BHP).
The wellbore pressure in the block below, $p_{i+1}^w$, is computed using

$$p_{i+1}^w = p_i^w + \rho_{i,i+1}\, g\, \Delta z_{i,i+1}. \quad (5)$$

Here $\rho_{i,i+1}$ denotes the fluid mixture density between neighboring well blocks $i$ and $i+1$, and $\Delta z_{i,i+1}$ is the depth difference. The fluid mixture density can be approximated based on the individual phase densities and the phase flow rates. See [31] for further details.

In all flow results presented in this paper, the treatment described above is applied to compute well rates for both the surrogate model and for the ADGPRS high-fidelity simulations. This procedure has been implemented as a standalone module that accepts BHP, grid geometry and well-block properties and simulation results (well-block permeability, pressure and saturation) as input, and provides $(q_j^w)_i$, for $j = o, w$, for all well blocks $i$, as output. For the ADGPRS runs we could also use the well model in the simulator, which gives very similar results. To assure identical well-model treatments, however, we compute ADGPRS (HFS) well rates from the global simulation results using Eqs. 3, 4 and 5.
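A standalone well-rate module of the kind described above can be sketched as follows. This is a minimal illustration of Eqs. 3-5 only; the function names and all input values (permeability, BHP, mixture density, block sizes) are hypothetical, not those of the paper's module.

```python
import math

# Sketch of a standalone well-rate module implementing Eqs. 3-5.
def well_index(k, dz, dx, r_w=0.1):
    """Peaceman well index (Eq. 4), isotropic k and dx = dy."""
    r = 0.2 * dx                         # Peaceman equivalent radius
    return 2.0 * math.pi * k * dz / math.log(r / r_w)

def phase_rate(wi, k_rj, rho_j, mu_j, p_block, p_wellbore):
    """Source/sink term (q_j^w)_i for one phase in one well block (Eq. 3)."""
    return wi * (k_rj * rho_j / mu_j) * (p_block - p_wellbore)

def wellbore_pressures(bhp, rho_mix, g, dz_list):
    """March Eq. 5 down the well: p^w_{i+1} = p^w_i + rho*g*dz."""
    p = [bhp]                            # BHP at uppermost perforation
    for dz in dz_list:
        p.append(p[-1] + rho_mix * g * dz)
    return p

# Example with hypothetical values (SI units)
wi = well_index(k=1e-12, dz=2.0, dx=20.0)
pw = wellbore_pressures(bhp=2.0e7, rho_mix=1000.0, g=9.81, dz_list=[2.0, 2.0])
q = phase_rate(wi, k_rj=0.5, rho_j=1000.0, mu_j=1e-3,
               p_block=2.1e7, p_wellbore=pw[0])
```

In practice such a module would loop over all perforated blocks and both phases, using the block pressures and saturations from the surrogate or simulator output.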
3. 3D CNN-PCA Low-dimensional Model Representation
In recent work, Liu and Durlofsky [4] introduced 3D CNN-PCA for the parameterization of complex geomodels in 3D. Here we apply a simplified yet effective variant of this general approach, which performs very well for the geomodels considered in this study.

The 3D CNN-PCA method represents an extension of the earlier 2D procedure [1]. The method post-processes low-dimensional PCA (principal component analysis) models such that large-scale geological structures, characterized in terms of multiple-point spatial statistics, are recovered. Parameterizations of this type are very useful in the context of uncertainty quantification and data assimilation, as we will see later.

The method developed in [4] includes three loss terms – reconstruction loss, style loss and hard data loss. Style loss can be important for large models with few wells (and thus limited conditioning data), as it penalizes realizations that do not include the requisite large-scale features. In the system considered here, however, there is a sufficient amount of conditioning data, and models developed using only reconstruction loss and hard data loss are fully adequate. Therefore, here we present this simplified version of 3D CNN-PCA, which includes only these two losses.

A geological realization can be represented as a vector $\mathbf{m} \in \mathbb{R}^{n_b}$ of geologic variables, where $n_b$ is the number of grid blocks in the model. Geological parameterizations map $\mathbf{m}$ to a new low-dimensional variable $\boldsymbol{\xi} \in \mathbb{R}^l$, where $l \ll n_b$. The first step in our procedure is to construct a PCA parameterization of the geomodel. As described in detail in [1], this entails generating a set of $n_r$ realizations of $\mathbf{m}$ using geomodeling tools such as Petrel [32], and assembling these realizations in a centered data matrix $Y \in \mathbb{R}^{n_b \times n_r}$,

$$Y = \frac{1}{\sqrt{n_r - 1}}\left[\,\mathbf{m}_1 - \bar{\mathbf{m}},\; \mathbf{m}_2 - \bar{\mathbf{m}},\; \cdots,\; \mathbf{m}_{n_r} - \bar{\mathbf{m}}\,\right], \quad (6)$$

where $\mathbf{m}_i \in \mathbb{R}^{n_b}$ represents realization $i$ and $\bar{\mathbf{m}} \in \mathbb{R}^{n_b}$ is the mean of the $n_r$ realizations.
Note that these realizations are all conditioned to available data at well locations (this so-called 'hard' data could correspond to rock type, porosity, and/or permeability). Singular value decomposition of $Y$ allows us to write $Y = U \Sigma V^\intercal$, where $U \in \mathbb{R}^{n_b \times n_r}$ and $V \in \mathbb{R}^{n_r \times n_r}$ are left and right singular matrices, and $\Sigma \in \mathbb{R}^{n_r \times n_r}$ is a diagonal matrix containing the singular values.

New (PCA) realizations $\mathbf{m}_{\mathrm{pca}} \in \mathbb{R}^{n_b}$ can be generated through application of

$$\mathbf{m}_{\mathrm{pca}} = \bar{\mathbf{m}} + U_l \Sigma_l \boldsymbol{\xi}_l, \quad (7)$$

where $U_l \in \mathbb{R}^{n_b \times l}$ contains the $l$ columns in $U$ associated with the largest singular values and $\Sigma_l \in \mathbb{R}^{l \times l}$ contains the $l$ largest singular values. Standalone PCA provides accurate geomodels $\mathbf{m}_{\mathrm{pca}}$ when the initial realizations $\mathbf{m}_i$ follow a multi-Gaussian distribution. With more complex systems such as channelized models, which are characterized by multipoint spatial statistics, PCA does not fully preserve the features appearing in $\mathbf{m}_i$.

In CNN-PCA, we post-process PCA models using a deep convolutional neural network (CNN) to better preserve the complex spatial correlations. This post-processing, referred to as a model transform net, is represented as

$$\mathbf{m}_{\mathrm{cnnpca}} = f_W(\mathbf{m}_{\mathrm{pca}}), \quad (8)$$

where $f_W$ indicates the model transform net and the subscript $W$ denotes the trainable weights. Here we apply a supervised-learning approach to train the 3D model transform net.

Our method proceeds as follows. Given an original geomodel $\mathbf{m}$, we map $\mathbf{m}$ onto the first $\hat{l}$ principal components,

$$\hat{\boldsymbol{\xi}}_{\hat{l}} = \Sigma_{\hat{l}}^{-1} U_{\hat{l}}^T (\mathbf{m} - \bar{\mathbf{m}}). \quad (9)$$

The model $\mathbf{m}$ can then be approximately reconstructed using

$$\hat{\mathbf{m}}_{\mathrm{pca}} = \bar{\mathbf{m}} + U_{\hat{l}} \Sigma_{\hat{l}} \hat{\boldsymbol{\xi}}_{\hat{l}}, \quad (10)$$

where $\hat{\mathbf{m}}_{\mathrm{pca}} \in \mathbb{R}^{n_b}$ is the reconstructed PCA model, and $\hat{\boldsymbol{\xi}}_{\hat{l}}$ is a low-dimensional variable with $\hat{l} < l$ (for reasons explained below). The difference between $\hat{\mathbf{m}}_{\mathrm{pca}}$ and the corresponding $\mathbf{m}$ can be viewed as the reconstruction error. The goal of the supervised training is to determine the parameters in the transform net $f_W$ such that this error is minimized.
In other words, we seek to minimize the difference between $f_W(\hat{\mathbf{m}}_{\mathrm{pca}})$ and $\mathbf{m}$.

We use the same $n_r$ realizations of $\mathbf{m}$ as were used for constructing the PCA representation to generate the corresponding $\hat{\mathbf{m}}^i_{\mathrm{pca}}$ models. This forms the training set, with each pair $(\mathbf{m}^i, \hat{\mathbf{m}}^i_{\mathrm{pca}})$, $i = 1, \ldots, n_r$, representing one training sample. The supervised-learning training loss for training sample $i$ is defined as

$$L^i_{\mathrm{rec}} = \| f_W(\hat{\mathbf{m}}^i_{\mathrm{pca}}) - \mathbf{m}^i \|. \quad (11)$$

This quantity is referred to as the reconstruction loss.

During the test stage, 3D CNN-PCA will be used to generate geomodels for history matching. This entails post-processing new PCA models $\mathbf{m}_{\mathrm{pca}}(\boldsymbol{\xi}_l)$, with $\boldsymbol{\xi}_l$ determined by the history matching algorithm, or sampled from the standard normal distribution. By using $\hat{l} < l$ during training, the resulting reconstruction error is more representative of what will be encountered at the test stage, where we do not have corresponding pairs $(\mathbf{m}^i, \hat{\mathbf{m}}^i_{\mathrm{pca}})$. We found this approach to lead to better performance at test time, when $f_W$ is applied for new PCA models. Note that in [4], an additional perturbation is applied in Eq. 10. This does not appear to be required for the system considered here, possibly because our model is slightly less challenging than those considered in [4], as discussed earlier.

Hard data loss is also required to ensure that hard data at well locations are honored. This loss is of the form $L^i_h = \left[\mathbf{h}^T(\mathbf{m}^i - f_W(\hat{\mathbf{m}}^i_{\mathrm{pca}}))\right]$, where $\mathbf{h}$ is an indicator vector, with $h_i = 1$ indicating the presence of hard data at grid block $i$, and $h_i = 0$ the absence of hard data. Note that hard data are captured in the PCA models, but without the inclusion of $L^i_h$ these data would not necessarily be maintained in the $\mathbf{m}_{\mathrm{cnnpca}}$ models. The final training loss for training sample $i$ is now

$$L^i = L^i_{\mathrm{rec}} + \gamma_h L^i_h, \quad (12)$$

where $\gamma_h$ is a weighting factor for the hard data loss.
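The PCA steps above (Eqs. 6, 7, 9 and 10) can be sketched directly with numpy. The sizes here are tiny illustrative values, not the dimensions used in the paper, and the random "realizations" stand in for actual Petrel models.

```python
import numpy as np

# Sketch of the PCA parameterization (Eqs. 6-10) via SVD.
rng = np.random.default_rng(0)
n_b, n_r, l, l_hat = 500, 60, 20, 5      # illustrative sizes

M = rng.random((n_b, n_r))               # columns are realizations m_i
m_bar = M.mean(axis=1, keepdims=True)
Y = (M - m_bar) / np.sqrt(n_r - 1)       # centered data matrix (Eq. 6)
U, s, _ = np.linalg.svd(Y, full_matrices=False)

# New PCA realization from a random low-dimensional xi (Eq. 7)
xi = rng.standard_normal(l)
m_pca = m_bar[:, 0] + U[:, :l] @ (s[:l] * xi)

# Project an original model onto the first l_hat components (Eq. 9),
# then reconstruct it (Eq. 10); the (m, m_pca_hat) pair is one
# training sample for the transform net
m = M[:, 0]
xi_hat = (U[:, :l_hat].T @ (m - m_bar[:, 0])) / s[:l_hat]
m_pca_hat = m_bar[:, 0] + U[:, :l_hat] @ (s[:l_hat] * xi_hat)
recon_err = np.linalg.norm(m_pca_hat - m)
```

Since Eqs. 9-10 amount to projecting $\mathbf{m} - \bar{\mathbf{m}}$ onto the leading $\hat{l}$ left singular vectors, the reconstruction error never exceeds the norm of the centered model itself.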
We determine $\gamma_h$ by considering a range of values and selecting a value that is sufficiently large such that essentially all hard data are honored in the output models. Specifically, in the case considered in this paper, $\gamma_h$ is set such that, over all new 3D CNN-PCA geomodels, hard data in well blocks are honored more than 99.9% of the time.

Construction of the 3D model transform net entails replacing the 2D convolutional layers, upsampling layers and padding layers in the 2D model transform net [1] with their 3D counterparts. This can be readily accomplished using the Pytorch deep-learning framework [33]. The architecture details of the 3D CNN-PCA transform net are provided in the Appendix.

The underlying geomodels used in this work are defined on grids containing 40 × 40 × 20 blocks. A total of $n_r = 3000$ realizations of $\mathbf{m}$ are generated using Petrel. Fig. 1 displays maps of rock type (referred to as facies) for three of these realizations. Although the three models resemble one another in terms of their channelized geology, they clearly differ in the detailed channel locations and the resulting 'linkages' between injection and production wells. For example, in the model in Fig. 1b, wells I2 and P3 are connected by a high-permeability channel, while in Fig. 1a these wells are not connected. These differences in connectivity result in very different flow behavior.

The $n_r = 3000$ realizations of $\mathbf{m}$ are used to construct the PCA representation and the corresponding $\hat{\mathbf{m}}_{\mathrm{pca}}$ models. Based on limited numerical experimentation, we use a reduced dimension of $l = 200$ for the new PCA models and a dimension of $\hat{l} = 50$ for the reconstructed PCA models. The weighting factor for hard data loss is $\gamma_h = 0.02$. We use a batch size of 4 and train the model transform net for 10 epochs. This requires about 30 minutes using one Tesla V100 GPU.

Three random test-set CNN-PCA models are shown in Fig. 2. The CNN-PCA facies maps are thresholded such that the average rock-type proportions match those in the original Petrel models. The 3D CNN-PCA realizations display channel features (width, sinuosity) in approximate visual agreement with those in Fig. 1. We also observe variability in channel connectivity between realizations. In fact, for an oil-water flow problem defined for this model, the resulting flow statistics for 200 random CNN-PCA realizations were found to be in very close agreement with those from 200 reference Petrel geomodels. These results, not presented here for brevity, demonstrate that the level of geological variability, in terms of its impact on flow, is essentially the same between the two sets of models.
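The facies-map thresholding described above can be sketched as follows. The continuous field (standing in for the transform-net output) and the target sand fraction of 0.3 are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch: threshold a continuous CNN-PCA output field so that the
# resulting sand proportion matches a target fraction.
rng = np.random.default_rng(1)
field = rng.normal(size=(40, 40, 20))        # stand-in for f_W output

target_sand_frac = 0.3                       # illustrative target
thr = np.quantile(field, 1.0 - target_sand_frac)
facies = (field >= thr).astype(int)          # 1 = sand, 0 = mud

sand_frac = facies.mean()
```

Using the empirical quantile as the cutoff guarantees the binary model reproduces the target proportion to within one grid block.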
Figure 1: Petrel realizations honoring rock-type data at the six wells. Black cylinders denote production wells, and white cylinders denote injection wells.
Figure 2: 3D CNN-PCA realizations honoring rock-type data at the six wells. Model in (a) is used in the 3D recurrent R-U-Net assessment below. Model in (b) is the true model used in the history matching example. Black cylinders denote production wells, and white cylinders denote injection wells.
4. Recurrent R-U-Net Surrogate Model and Training
In this section, we present the detailed 3D recurrent R-U-Net surrogate model and training setup. This surrogate model is intended to provide predictions for the saturation and pressure fields, at particular time steps, for any (random) geological realization generated by the 3D CNN-PCA procedure described in Section 3.

4.1. Recurrent R-U-Net Architecture
In the context of history matching or uncertainty quantification, many simulation runs must be performed to solve the discretized versions of Eqs. 1 and 2 with different geomodels $\mathbf{m}$. We express a simulation run in this setting as

$$\mathbf{x} = f(\mathbf{m}), \quad (13)$$

where $f$ denotes the reservoir simulator and $\mathbf{x} \in \mathbb{R}^{n_b \times n_t}$ represents a dynamic state. Here $n_b$ indicates the number of grid blocks and $n_t$ the number of time steps at which solution variables are predicted (here we take $n_t = 10$, which is much less than the total number of time steps in the simulation). Because in this work we consider oil-water systems, $\mathbf{x}$ contains either water saturation or pressure variables. We use $\mathbf{x}^t \in \mathbb{R}^{n_b}$ to denote the saturation or pressure field at time step $t$.

The 3D recurrent R-U-Net, illustrated in Fig. 3, is applied to replace the expensive numerical simulation $f$. This model is denoted as $\hat{\mathbf{x}} = \hat{f}(\mathbf{m})$, where $\hat{f}$ indicates the surrogate operator and $\hat{\mathbf{x}} \in \mathbb{R}^{n_b \times n_t}$ denotes the predicted states from the surrogate model. The recurrent R-U-Net consists of a 3D residual U-Net [34], which itself entails encoding and decoding nets, and convLSTM neural networks [35]. The encoding net extracts a hierarchy of feature maps from the input geomodel $\mathbf{m}$. The deepest of these feature maps, denoted $F$, represents $\mathbf{m}$ in the most compressed manner. For this reason, $F$ is propagated in time by the 3D convLSTM neural network to provide feature representations of the state maps (i.e., we have $F^1, \cdots, F^{n_t}$). Combined with the extracted local (shallower) features, each propagated $F^t$ ($t = 1, \cdots, n_t$) is upsampled to the corresponding (approximated) state map $\hat{\mathbf{x}}^t \in \mathbb{R}^{n_b}$ by the decoding net.

The 3D residual U-Net (R-U-Net) is the key module for capturing the spatial correlations between geological properties and the predicted states. A schematic of the 3D R-U-Net is shown in Fig. 4.
Figure 3: Recurrent R-U-Net architecture incorporating 3D convLSTM into the R-U-Net (see Appendix for detailed specifications). Here the convLSTM net takes the global feature map $F$ from the encoding net and generates a sequence of feature maps $F^t$ ($t = 1, \ldots, n_t$). These are decoded, separately, into a sequence of predictions for the states $\hat{\mathbf{x}}^t$ ($t = 1, \ldots, n_t$) using the same decoding net (figure modified from [3]).

Figure 4: Schematic illustration of 3D residual U-Net (R-U-Net) (detailed architecture is provided in the Appendix). This network entails encoding and decoding nets. Local features extracted in the encoding net are concatenated with the upsampled features in the decoding net to generate predictions for the states.

Fig. 4 illustrates how the local features extracted in the encoding net are combined with the propagated global features during upsampling. The R-U-Net output is the target state (either pressure or saturation) at a particular time step. The concatenation of local features extracted in the encoding net to the corresponding upsampled features in the decoding net facilitates the transfer of multiscale information, which leads to improved state predictions.

The 3D convLSTM net enables the surrogate model to capture temporal dynamics. This network effectively incorporates the advantages of LSTM [36] in modeling temporal evolution, while maintaining the spatial information extracted by CNNs. The internal cell state $C_t$ in the 3D convLSTM net carries the spatial-temporal information flow.
At time $t$, this quantity is given by

$$C_t = \mathbf{f}_t \circ C_{t-1} + \mathbf{i}_t \circ \tilde{C}_t, \quad (14)$$

where the forget gate $\mathbf{f}_t$ controls which information to discard from the previous cell state $C_{t-1}$, the input gate $\mathbf{i}_t$ specifies which information to update from the new candidate cell state $\tilde{C}_t$, and $\circ$ denotes the Hadamard product. The output state at time $t$, $H_t$, is computed from $C_t$ and the output gate $\mathbf{o}_t$ via

$$H_t = \mathbf{o}_t \circ \tanh(C_t), \quad (15)$$

where $\mathbf{o}_t$ controls which information in the cell state $C_t$ is transferred to the output state $H_t$. The three gates $\mathbf{f}_t$, $\mathbf{i}_t$ and $\mathbf{o}_t$, as well as the new candidate cell state $\tilde{C}_t$, are determined from the previous output state $H_{t-1}$ and the current input $\chi_t$ using the following expressions:

$$\mathbf{f}_t = \sigma(W_{xf} * \chi_t + W_{hf} * H_{t-1} + b_f), \quad (16)$$
$$\mathbf{i}_t = \sigma(W_{xi} * \chi_t + W_{hi} * H_{t-1} + b_i), \quad (17)$$
$$\mathbf{o}_t = \sigma(W_{xo} * \chi_t + W_{ho} * H_{t-1} + b_o), \quad (18)$$
$$\tilde{C}_t = \tanh(W_{xc} * \chi_t + W_{hc} * H_{t-1} + b_c), \quad (19)$$

where the $W$ and $b$ terms are 3D convolution kernel weights and biases. The parameters associated with these quantities, which are tuned during the training process, are shared across convLSTM cells. This reduces the number of tunable parameters that must be learned during training.

The detailed architecture of the overall recurrent R-U-Net is provided in the Appendix. The architectures of the encoding and decoding nets are very similar to those described in [3], with 2D convolutional kernels replaced with 3D convolutional kernels. This combination of convolutional and recurrent neural network architectures enables the surrogate model to accurately capture spatial-temporal information in the subsurface flow problems of interest.

The recurrent R-U-Net is trained using saturation and pressure solutions from a set of high-fidelity simulation runs. We now describe the simulation setup.

The models contain 40 × 40 × 20 grid blocks, with each block of dimension 20 m × 20 m in the areal ($x$ and $y$) directions. The well locations are shown in Figs. 1 and 2. One injector and two of the producers are completed (open to flow) in the top 10 layers, while the other injector and two producers are completed in the bottom 10 layers. The rock type for grid block $i$ is denoted by $m_i$ ($m_i = 1$ for sand and $m_i = 0$ for mud). For grid blocks corresponding to sand, we specify $k_i = 2000$ md and $\phi_i = 0.25$; blocks corresponding to mud are assigned $k_i = 20$ md, with a correspondingly lower porosity. The water phase density is 1038 kg/m³ at reservoir conditions.

As discussed in [3], problems with BHPs specified are often more challenging for surrogate models than those with fixed rates. This is because, with BHPs prescribed, different volumes of fluid (as a function of time) are injected into, and flow through, each geomodel. This additional variability, along with the high degree of sensitivity of the well rates to well-block quantities, evident from Eq. 3, acts to significantly complicate the surrogate model.

The high-fidelity simulations are all performed using ADGPRS [29]. Well rates are computed from the states and wellbore pressure in each well block using the procedure described in Section 2 (see Eqs. 3, 4 and 5). The problem setup, fluid properties and well settings described here are used in all flow simulations in this paper.

Figure 5: Oil-water relative permeability curves used in all simulations.
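The convLSTM cell update of Eqs. 14-19 can be sketched in numpy for a single-channel 3D field. The 3 × 3 × 3 kernel size, the random weights, and the small grid shape are illustrative assumptions, not the paper's network settings (which use multi-channel PyTorch layers).

```python
import numpy as np

def conv3d_same(x, w):
    """'Same' 3D convolution of field x with a 3x3x3 kernel w (zero padding)."""
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    nx, ny, nz = x.shape
    for i in range(3):
        for j in range(3):
            for k in range(3):
                out += w[i, j, k] * xp[i:i + nx, j:j + ny, k:k + nz]
    return out

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def convlstm_step(chi, h_prev, c_prev, W, b):
    """One convLSTM update (Eqs. 14-19) for a single-channel 3D field."""
    f = sigmoid(conv3d_same(chi, W['xf']) + conv3d_same(h_prev, W['hf']) + b['f'])
    i = sigmoid(conv3d_same(chi, W['xi']) + conv3d_same(h_prev, W['hi']) + b['i'])
    o = sigmoid(conv3d_same(chi, W['xo']) + conv3d_same(h_prev, W['ho']) + b['o'])
    c_tilde = np.tanh(conv3d_same(chi, W['xc']) + conv3d_same(h_prev, W['hc']) + b['c'])
    c = f * c_prev + i * c_tilde          # Eq. 14
    h = o * np.tanh(c)                    # Eq. 15
    return h, c

rng = np.random.default_rng(0)
shape = (8, 8, 4)                          # small illustrative grid
W = {k: 0.1 * rng.standard_normal((3, 3, 3))
     for k in ['xf', 'hf', 'xi', 'hi', 'xo', 'ho', 'xc', 'hc']}
b = {k: 0.0 for k in ['f', 'i', 'o', 'c']}
h, c = convlstm_step(rng.standard_normal(shape), np.zeros(shape), np.zeros(shape), W, b)
```

Rolling this step forward $n_t$ times, with the same weights reused at every step, is what propagates the global feature map in time in the recurrent R-U-Net.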
A total of 2500 simulation runs are performed, and the resulting HFS saturation and pressure solutions are used for training. The neural network parameters are tuned, in an automated manner, to minimize the $L^p$ norm of the difference between the recurrent R-U-Net output $\hat{\mathbf{x}}$ and the simulation reference $\mathbf{x}$. Here $\hat{\mathbf{x}}^t_i$ and $\mathbf{x}^t_i$ denote either saturation or pressure in every grid block, at time $t$, for training sample $i$. Training thus entails finding the tunable network parameters $\theta$ (with the optimum denoted by $\theta^*$) via

$$\theta^* = \underset{\theta}{\operatorname{argmin}} \; \frac{1}{n_s n_t} \sum_{i=1}^{n_s} \sum_{t=1}^{n_t} \left\| \hat{\mathbf{x}}^t_i - \mathbf{x}^t_i \right\|_p^p + \frac{\lambda}{n_s n_t n_{\mathrm{w}}} \sum_{i=1}^{n_s} \sum_{t=1}^{n_t} \sum_{\mathrm{w}=1}^{n_{\mathrm{w}}} \left\| \hat{\mathbf{x}}^{t,\mathrm{w}}_i - \mathbf{x}^{t,\mathrm{w}}_i \right\|_p^p. \quad (20)$$

Here, $n_s$ denotes the number of training samples, $n_t$ indicates the number of time steps considered, and $n_{\mathrm{w}}$ is the number of well blocks. The second term on the right-hand side of Eq. 20 acts to add weight (with weighting factor $\lambda$) to well-block quantities, indicated by superscript w. As discussed earlier, these values are essential for predicting well rates, which are the primary quantities used for data assimilation.

We found that better predictions of the global saturation and pressure fields were achieved by using different norms (values of $p$) for saturation and for pressure. Therefore, two separate recurrent R-U-Nets are used in this work. We use a batch size of 8 and train each of the recurrent R-U-Nets for 260 epochs using the Adam optimizer [37] with a learning rate decay schedule. These trainings each require about 7 hours using a Tesla V100 GPU. As discussed in [3], training could also be accomplished using a single network, with appropriate (tuned) weights for the pressure and saturation losses. The use of multiple GPUs would accelerate training.

Proper data preparation and scaling contribute to the effective training of the recurrent R-U-Net.
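The two-term objective of Eq. 20 can be sketched as follows. The array shapes, the weighting factor $\lambda$, and the choice $p = 2$ are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np

# Sketch of the training objective (Eq. 20): a global state-mismatch
# term plus a lambda-weighted well-block term.
def surrogate_loss(x_hat, x, well_idx, lam=1.0, p=2):
    """x_hat, x: (n_s, n_t, n_b) arrays; well_idx: indices of well blocks."""
    n_s, n_t, _ = x.shape
    n_w = len(well_idx)
    global_term = np.sum(np.abs(x_hat - x) ** p) / (n_s * n_t)
    well_term = np.sum(np.abs(x_hat[:, :, well_idx] - x[:, :, well_idx]) ** p)
    return global_term + lam * well_term / (n_s * n_t * n_w)

rng = np.random.default_rng(0)
x = rng.random((4, 10, 100))             # 4 samples, 10 steps, 100 blocks
x_hat = x + 0.01 * rng.standard_normal(x.shape)
loss = surrogate_loss(x_hat, x, well_idx=[3, 50, 97], lam=2.0)
```

In the actual training loop this scalar would be minimized over the network weights with Adam; the second term up-weights exactly the well-block values that feed Eq. 3.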
In this work, we use the same data pre-processing technique as described in [3]. Specifically, the input binary geomodel is naturally represented by rock-type block values (0 denotes mud and 1 denotes sand), and saturation maps are physically constrained between 0 and 1, so these sets of values are used directly. Pressure maps are normalized at each time step based on the maximum and minimum grid-block pressures observed at that time step in the high-fidelity simulation. We found this detrending treatment to lead to better overall performance than normalization based on the time-invariant injector and producer BHPs. See [3] for details.

Evaluation of Recurrent R-U-Net Performance

A total of 400 new CNN-PCA geomodels (test cases) are used to evaluate the performance of the 3D recurrent R-U-Net. The surrogate model predicts pressure and saturation fields at n_t = 10 time steps (50, 100, 150, 300, 400, 500, 600, 700, 850 and 1000 days). We first present predictions for saturation at three particular time steps, 50, 400 and 1000 days, for a single geomodel. This geomodel, shown in Fig. 2(a), is characterized by saturation and pressure prediction errors (quantified below) that are slightly greater than the median errors over the 400 test cases. Thus we view this case as representative.

The predicted and reference saturation fields are shown in Fig. 6. The top row displays the recurrent R-U-Net 3D state map predictions and the bottom row shows the high-fidelity (ADGPRS) simulation results. The progress of the saturation front is, as expected, strongly impacted by the high-permeability channels. The 3D recurrent R-U-Net is able to capture the evolution of the saturation field reasonably accurately, though some error is evident. Differences can be observed, for example, in the top layer, in the progress of the front originating from injector I1 (compare Fig. 6(c) to Fig. 6(f)). We also see some water in the vicinity of producer P2 in Fig. 6(c) that does not appear in Fig. 6(f).
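The per-time-step min-max normalization (detrending) of the pressure maps described in the pre-processing discussion above can be sketched as follows. This is an illustrative sketch, not the authors' code, and the array layout is an assumption.

```python
import numpy as np

def normalize_pressure(p):
    """Normalize pressure maps to [0, 1] separately at each time step.

    p : array of shape (n_t, n_b), grid-block pressures at n_t time steps.
    Each time step is scaled by its own min and max, which removes the
    slow drift (trend) in the global pressure level over time.
    """
    p_min = p.min(axis=1, keepdims=True)
    p_max = p.max(axis=1, keepdims=True)
    return (p - p_min) / (p_max - p_min)

# two time steps with different pressure levels both map to [0, 1]
p_scaled = normalize_pressure(np.array([[1.0, 3.0], [2.0, 6.0]]))
```

Because the scaling is computed per time step, the network sees comparable input magnitudes at every step, which is the detrending effect noted above.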
We note finally that the 3D aspect of the saturation field appears to be predicted accurately, as is evident from the cross-sectional views in Fig. 6.

Because the global pressure field changes only slightly in time, instead of showing the evolution of pressure for a single realization, we present the pressure fields for three different test cases. These results are displayed in Fig. 7. All results are at 400 days. The global pressure fields for the three different realizations show significant differences, though the surrogate model provides an accurate visual 3D representation in all cases.

In addition to visual comparisons, it is of interest to quantify the errors associated with the saturation and pressure predictions. We thus compute relative saturation and pressure errors.

Figure 6: Saturation fields from the 3D recurrent R-U-Net surrogate model (top row) and high-fidelity simulator (bottom row) at three different times.
Figure 7: Pressure maps from the 3D recurrent R-U-Net surrogate model (top row) and high-fidelity simulator (bottom row) for three different test-case realizations. All results are at 400 days. Saturation results in Fig. 6 are for Realization 1, shown in Fig. 2(a).

\delta_S = \frac{1}{n_e n_b n_t} \sum_{i=1}^{n_e} \sum_{j=1}^{n_b} \sum_{t=1}^{n_t} \frac{\left| (\hat{S}_w)_{i,j}^t - (S_w)_{i,j}^t \right|}{(S_w)_{i,j}^t}, \quad (21)

\delta_p = \frac{1}{n_e n_b n_t} \sum_{i=1}^{n_e} \sum_{j=1}^{n_b} \sum_{t=1}^{n_t} \frac{\left| \hat{p}_{i,j}^t - p_{i,j}^t \right|}{p_{i,\max}^t - p_{i,\min}^t}, \quad (22)

where n_e = 400 is the number of test models considered and p_{i,max}^t and p_{i,min}^t are the maximum and minimum grid-block pressures for test case i at time step t. The denominator in Eq. 21 is well behaved since the initial (and minimum) saturation is 0.1.

Applying Eqs. 21 and 22, we obtain δ_S = 5.7%, with δ_p below 1%. Well-rate results for the geomodel in Fig. 2(a) are shown in Fig. 8, at the n_t = 10 time steps, though in these and subsequent figures the points are connected linearly to provide continuous curves. The general level of agreement in well rates is quite close, and trends and water breakthrough (i.e., the appearance of water at production wells) are captured with reasonable accuracy by the surrogate model. Slight discrepancies are apparent in some instances, however, such as in the late-time water rate in Fig. 8(b) and (h).

We now present statistical results for well responses over the full set of test cases. The P10, P50 and P90 oil and water production rates, evaluated over all 400 test cases, are shown in Fig. 9. The P50 (solid) curves depict the median response at each time step, while the P10 and P90
Figure 8: Comparison of oil (left) and water (right) production rates, for all four production wells, for the geomodel shown in Fig. 2(a). Red and black curves represent results from the 3D recurrent R-U-Net surrogate model and high-fidelity simulation, respectively.
Figure 9: Comparison of oil (left) and water (right) production rate statistics, for all four production wells, over the full ensemble of 400 test cases. Red and black curves represent results from the recurrent R-U-Net surrogate model and high-fidelity simulation, respectively. Solid curves correspond to P50 results, lower and upper dashed curves to P10 and P90 results.

(dashed) curves show the responses corresponding to the 10th and 90th percentiles. Agreement in these statistical quantities is seen to be very close, which again demonstrates the accuracy of the surrogate model. It is noteworthy that accuracy is maintained over large ranges in rates, as is particularly evident in the water rate plots in Fig. 9.

The average relative errors in oil and water production rates, across all production wells and over all time steps, are given by

\delta_{r,j} = \frac{1}{n_e n_p n_t} \sum_{i=1}^{n_e} \sum_{k=1}^{n_p} \sum_{t=1}^{n_t} \frac{\left| (\hat{r}_j)_{i,k}^t - (r_j)_{i,k}^t \right|}{(r_j)_{i,k}^t + \epsilon}, \quad (23)

where n_p denotes the number of production wells, (r̂_j)_{i,k}^t denotes the phase (j = o, w) production rate from the surrogate model for well k at time step t in test sample i, and (r_j)_{i,k}^t denotes the corresponding HFS result. A constant ε = 10 is introduced in the denominator to avoid division by very small values. Over n_e = 400 test samples, the relative errors for oil and water production rates are found to be δ_{r,o} = 6.1% and δ_{r,w} of about 8%. Under an alternative error assessment, δ_{r,w} is about 5%, while δ_{r,o} is about the same. As we will see, this level of accuracy is indeed sufficient for the data assimilation studies considered in the next section.
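The relative error measures of Eqs. 21-23 can be sketched as below. The array shapes are illustrative assumptions; the `eps` guard follows Eq. 23, whose constant is quoted as printed in the text.

```python
import numpy as np

def relative_saturation_error(S_hat, S):
    """Eq. 21: mean relative saturation error over cases, blocks and steps.
    S_hat, S : arrays of shape (n_e, n_t, n_b); S > 0 (initial S_w = 0.1)."""
    return np.mean(np.abs(S_hat - S) / S)

def relative_pressure_error(p_hat, p):
    """Eq. 22: pressure error normalized by the per-case, per-step range."""
    rng = p.max(axis=-1, keepdims=True) - p.min(axis=-1, keepdims=True)
    return np.mean(np.abs(p_hat - p) / rng)

def relative_rate_error(r_hat, r, eps=10.0):
    """Eq. 23: mean relative well-rate error, with eps in the denominator
    to avoid division by near-zero rates. r_hat, r : (n_e, n_p, n_t)."""
    return np.mean(np.abs(r_hat - r) / (r + eps))

# tiny checks with hand-computable values
dS = relative_saturation_error(np.full((1, 1, 2), 0.55), np.full((1, 1, 2), 0.5))
dp = relative_pressure_error(np.array([[[1.5, 3.5]]]), np.array([[[1.0, 3.0]]]))
dr = relative_rate_error(np.array([[[100.0]]]), np.array([[[90.0]]]))
```

The per-step range normalization in Eq. 22 mirrors the detrending used when scaling pressure inputs for training.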
6. Use of Deep-Learning Procedures for History Matching
We now apply the 3D recurrent R-U-Net surrogate model for a challenging data assimilation problem. Two different history matching algorithms are considered: a rigorous rejection sampling (RS) procedure, and ensemble smoother with multiple data assimilation (ES-MDA). The 3D CNN-PCA algorithm is used in both cases to generate the geomodels evaluated by the surrogate flow model.

In RS, 3D CNN-PCA is used only to provide a very large number of prior models; i.e., ξ is not manipulated to achieve a history match. Overall timing is reduced since the generation of prior models is much faster with 3D CNN-PCA than with geological modeling software. In the ES-MDA framework, by contrast, the low-dimensional variable ξ is updated by the history matching algorithm. The use of CNN-PCA in this setting ensures that posterior models (for any ξ) are consistent geologically with the original set of object-based Petrel realizations.

Rejection sampling can be used to provide a reliable reference for posterior uncertainty. RS is usually very expensive computationally, however, as it requires generating and evaluating an extremely large number of prior samples. Thus the application of RS is clearly facilitated through the use of the 3D recurrent R-U-Net surrogate model and CNN-PCA. RS accepts or rejects prior samples independently, thus assuring that the accepted models are independent. In this work, we combine the RS approach described in [38] with our deep-learning-based treatments. The overall workflow is as follows:

• Sample the low-dimensional variable ξ ∈ R^l from its prior distribution, given by N(μ_ξ, C_ξ). Construct m_cnnpca(ξ) using CNN-PCA.

• Sample a probability p from a uniform distribution in [0, 1].

• Compute the likelihood function L(m_cnnpca(ξ)), given by

L(m_{cnnpca}(\xi)) = c \exp\left( -\frac{1}{2} \left[ \hat{f}(m_{cnnpca}(\xi)) - d_{obs} \right]^{\mathsf{T}} C_D^{-1} \left[ \hat{f}(m_{cnnpca}(\xi)) - d_{obs} \right] \right), \quad (24)

where c is a normalization constant, f̂(m_cnnpca(ξ)) indicates the surrogate model predictions for well rates for geomodel m_cnnpca(ξ), d_obs denotes the observed well rate data, and C_D is the covariance matrix of data measurement error.

• Accept m_cnnpca(ξ) if p ≤ L(m_cnnpca(ξ))/S_L, where S_L is the maximum likelihood value over all prior models considered.
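The rejection sampling workflow above can be sketched as follows. The Gaussian likelihood follows Eq. 24; the `forward_model` callable stands in for the CNN-PCA plus recurrent R-U-Net pipeline and is an illustrative assumption.

```python
import numpy as np

def log_likelihood(d_pred, d_obs, C_D_inv):
    """Log of Eq. 24 (up to the constant c): -0.5 * r^T C_D^{-1} r."""
    r = d_pred - d_obs
    return -0.5 * r @ C_D_inv @ r

def rejection_sample(prior_samples, forward_model, d_obs, C_D_inv, rng):
    """Accept each prior sample with probability L(xi) / S_L,
    where S_L is the maximum likelihood over all prior samples."""
    logL = np.array([log_likelihood(forward_model(xi), d_obs, C_D_inv)
                     for xi in prior_samples])
    accept_prob = np.exp(logL - logL.max())   # L(xi) / S_L
    u = rng.uniform(size=len(prior_samples))  # p ~ U[0, 1]
    return [xi for xi, p, a in zip(prior_samples, u, accept_prob) if p <= a]

# tiny 1D illustration: a sample matching the data is always accepted,
# a sample far from the data is (almost always) rejected
rng = np.random.default_rng(0)
priors = [np.array([0.0]), np.array([5.0])]
accepted = rejection_sample(priors, lambda xi: xi, np.array([0.0]),
                            np.array([[1.0]]), rng)
```

Because each sample is accepted or rejected independently, the accepted models form an independent posterior sample, as noted above.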
Figure 10: Oil (left) and water (right) production rates for all four production wells. Gray regions represent the prior P10-P90 range, red circles and curves denote observed and true data, and blue dashed curves denote the RS posterior P10 (lower), P50 (middle) and P90 (upper) predictions. Vertical dashed line indicates the latest time at which data are collected.

6.2. Problem Setup and Rejection Sampling Results

The 'true' model considered here, which is shown in Fig. 2(b), is a randomly selected 3D CNN-PCA realization. Recall that all realizations (including the true model) are conditioned to hard data at well locations. The true data d_true are obtained by performing high-fidelity simulation using the true geomodel. The observed data d_obs comprise the true data with random error added,

d_{obs} = d_{true} + \epsilon, \quad (25)

where ε is the measurement error vector, with all components taken to be Gaussian random variables with mean of zero and covariance consistent with the C_D matrix defined above.

Figure 11: Three (randomly selected) 3D CNN-PCA realizations accepted by the rejection sampling procedure.
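The construction of the observed data in Eq. 25, with measurement-error standard deviation set to a fraction of the true data (15% is used in Section 6.2), can be sketched as follows. The diagonal covariance and the `make_observations` helper are illustrative assumptions consistent with that setup.

```python
import numpy as np

def make_observations(d_true, rel_std=0.15, seed=0):
    """d_obs = d_true + eps, with eps ~ N(0, diag((rel_std * d_true)^2)).

    A diagonal C_D built from a relative measurement error is an
    illustrative choice consistent with the 15% setting in the text."""
    rng = np.random.default_rng(seed)
    std = rel_std * np.abs(d_true)
    return d_true + rng.normal(0.0, std)

# two observed rates perturbed by 15% relative Gaussian noise
d_true = np.array([100.0, 200.0])
d_obs = make_observations(d_true)
```

The same C_D used to generate the noise then enters the likelihood of Eq. 24, keeping the synthetic experiment internally consistent.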
With RS, very few models will be accepted if we attempt to match a large number of data points, particularly if the data error is small. Therefore, we use data at only two time steps, specifically at day 100 and day 150. Oil and water production rates are collected at the four producers, which means we have a total of 16 observations. The standard deviation of the measurement error is set to 15% of the corresponding true data. One million prior models are generated using CNN-PCA. The 3D recurrent R-U-Net is then applied to provide well response predictions for all of these models. The generation of 10^6 3D CNN-PCA models requires about 3 hours, and the 10^6 surrogate model predictions require about 8 hours. We estimate that high-fidelity simulation for these 10^6 models would require about 170,000 hours
Figure 12: Oil (left) and water (right) production rates for all four production wells. Gray regions represent the prior P10-P90 range, blue dashed curves denote surrogate model posterior P10 (lower), P50 (middle) and P90 (upper) predictions, and red dashed curves denote high-fidelity simulation posterior P10 (lower), P50 (middle) and P90 (upper) predictions. Vertical dashed line indicates the latest time at which data are collected. Surrogate model posterior samples determined using RS.
of serial computation (as each HFS requires around 10 minutes). A total of 151 (out of 10^6) models are accepted by the RS procedure.

The history matching results using this procedure are shown in Fig. 10. The gray areas display the P10-P90 interval for the prior models (i.e., 80% of the prior model results fall within the gray regions). The red points and curves denote the observed data and the true model flow response. The blue dashed curves indicate the P10 (lower), P50 (middle) and P90 (upper) posterior RS results.

The posterior P10-P90 ranges are in most cases significantly smaller than the prior P10-P90 ranges. For this reason, predictions based on the posterior models would be expected to be much more useful than prior forecasts. It is evident that the posterior uncertainty ranges cover the true well responses even when these results are on the edge of the prior P10-P90 interval, e.g., the well P2 oil rate at late time (Fig. 10(c)). We also see that the posterior predictions result in significant uncertainty reduction in well P1 and P2 water rates (Fig. 10(b) and (d)), even though water production at these wells has not yet occurred at day 150.

In Fig. 11 we present three of the realizations accepted by the RS procedure. These realizations all show a lack of connectivity (through sand) between injector I1 and producer P2, at least in the layers that are visible in these images. This is consistent with the true 3D CNN-PCA model (shown in Fig. 2(b)) and with the low water production rates in Fig. 10(d). We also see that injector I2 is connected to producer P3 through sand in all three posterior realizations. This is again consistent with the true model, and with the early breakthrough and high water production rates in P3, evident in Fig. 10(f).

In the results above, realizations are accepted or rejected based on the surrogate flow model predictions.
It is useful to evaluate the accepted (posterior) models by applying HFS to assure that these models do indeed provide numerical simulation results in agreement with observed data. We therefore simulate the 151 models accepted by RS using ADGPRS. Results for oil and water production rates for the four producers are shown in Fig. 12. There we see very close agreement between the recurrent R-U-Net predictions (blue dashed curves) and the high-fidelity simulation predictions (red dashed curves). This close correspondence between 3D recurrent R-U-Net and HFS results for posterior models clearly demonstrates the applicability of the 3D recurrent R-U-Net for this challenging history matching problem.

Ensemble smoother with multiple data assimilation (ES-MDA) [28] is an iterative form of the ensemble smoother (ES) [39] procedure. ES-MDA breaks the single global update applied in ES into multiple data assimilation steps. In ES-MDA, observed data are perturbed, consistent with an inflated data covariance matrix, at each data assimilation step. ES-MDA is an efficient and widely used history matching algorithm, though there are no guarantees that it provides correct posterior quantification in complex nonlinear problems. In addition, the method may experience ensemble collapse [40] in some cases. A key usage for the RS results presented in Section 6.2 is to assess the performance of much more efficient algorithms such as ES-MDA. We now proceed with this evaluation.

The overall workflow for ES-MDA, with geomodels parameterized using 3D CNN-PCA and flow evaluations performed using the 3D recurrent R-U-Net, is as follows:

1. Specify N_a, the number of data assimilation steps, and corresponding inflation coefficients α_i (i = 1, ..., N_a), where \sum_{i=1}^{N_a} \alpha_i^{-1} = 1. Specify N_e, the number of ensemble members.

For each ensemble member:

2. Sample the low-dimensional variable ξ ∈ R^l from its prior distribution N(μ_ξ, C_ξ).

3.
For i = 1, ..., N_a:

• Construct m_cnnpca(ξ) using 3D CNN-PCA. Then generate surrogate flow predictions f̂(m_cnnpca(ξ)).

• Perturb the observation data through application of d_uc = d_obs + √α_i C_D^{1/2} z_d, with z_d ∼ N(0, I).

• Update the low-dimensional variable ξ_i to ξ_{i+1} using

\xi_{i+1} = \xi_i + C_{MD} (C_{DD} + \alpha_i C_D)^{-1} \left[ d_{uc} - \hat{f}(m_{cnnpca}(\xi_i)) \right], \quad (26)

where C_MD is the cross-covariance matrix between ξ_i and the predicted data f̂(m_cnnpca(ξ_i)), C_DD is the auto-covariance matrix of the predicted data, and C_D is the covariance matrix of data measurement error.

We specify N_e = 200 and N_a = 10. The same true model, the same 16 data observations, and the same standard deviation of measurement error as in Section 6.2 are used here. A comparison of the ES-MDA and (reference) RS history matching results is presented in Fig. 13. We see that ES-MDA is able to provide posterior estimates of many well quantities that are in close agreement with the RS results. ES-MDA does, however, overestimate posterior uncertainty in some well responses, as is evident in Fig. 13(c), (d) and (f).

We note that ES-MDA performance can be sensitive to the values specified for N_a and the inflation coefficients α_i. The determination of 'optimal' choices for these parameters can be facilitated through comparisons to reference RS results, which would be very difficult to generate in the absence of a surrogate model for flow. We plan to assess different history matching algorithms, along with parameter tuning procedures, in future work.
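A single ES-MDA data assimilation step (Eq. 26), with C_MD and C_DD estimated from the ensemble, can be sketched as below. This is a sketch under assumed conventions; the ensemble array layout and the `esmda_update` helper are illustrative, not the authors' implementation.

```python
import numpy as np

def esmda_update(Xi, D_pred, d_obs, C_D, alpha, rng):
    """One ES-MDA step: update each xi via Eq. 26.

    Xi     : (N_e, l)   ensemble of low-dimensional variables
    D_pred : (N_e, n_d) predicted data f(m(xi)) for each member
    d_obs  : (n_d,)     observed data
    C_D    : (n_d, n_d) measurement-error covariance
    alpha  : inflation coefficient for this step (sum of 1/alpha_i = 1)
    """
    N_e = Xi.shape[0]
    Xc = Xi - Xi.mean(axis=0)
    Dc = D_pred - D_pred.mean(axis=0)
    C_MD = Xc.T @ Dc / (N_e - 1)   # cross-covariance of xi and data
    C_DD = Dc.T @ Dc / (N_e - 1)   # auto-covariance of predicted data
    K = C_MD @ np.linalg.inv(C_DD + alpha * C_D)
    # perturb observations: d_uc = d_obs + sqrt(alpha) * C_D^{1/2} z_d
    L = np.linalg.cholesky(C_D)
    D_uc = d_obs + np.sqrt(alpha) * (rng.standard_normal((N_e, len(d_obs))) @ L.T)
    return Xi + (D_uc - D_pred) @ K.T

# tiny linear example: d = 2*xi, so the data-matching value is xi = 5;
# the update should pull the ensemble mean from 1.5 toward 5
rng = np.random.default_rng(1)
Xi = np.array([[0.0], [1.0], [2.0], [3.0]])
Xi_new = esmda_update(Xi, 2.0 * Xi, np.array([10.0]), np.array([[0.01]]),
                      alpha=1.0, rng=rng)
```

In the full workflow this update is repeated N_a times, with each updated ξ decoded through 3D CNN-PCA and re-evaluated by the surrogate before the next step.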
7. Concluding Remarks
In this work the recurrent R-U-Net surrogate model, which is capable of predicting dynamic state quantities in subsurface flow systems, was extended to 3D. The method was applied, in combination with the 3D CNN-PCA geological parameterization, to a challenging data assimilation problem involving a channelized system.
Figure 13: Oil (left) and water (right) production rates for all four production wells. Gray regions represent the prior P10-P90 range, blue dashed curves denote rejection sampling posterior P10 (lower), P50 (middle) and P90 (upper) predictions, and red dashed curves denote ES-MDA posterior P10 (lower), P50 (middle) and P90 (upper) predictions. Vertical dashed line indicates the latest time at which data are collected.

One million prior models were generated and evaluated (which would have been extremely time consuming using geological modeling software and HFS), and 151 were accepted, for the problem considered. Significant uncertainty reduction was achieved, and high-fidelity simulation of the accepted geomodels provided results that closely tracked 3D recurrent R-U-Net predictions. ES-MDA was then applied. The low-dimensional 3D CNN-PCA variables were updated iteratively, and flow evaluations were again accomplished using the 3D recurrent R-U-Net. RS provides 'target' posterior predictions, and the tuning of ES-MDA parameters can be accomplished with reference to these results.

There are many important directions that should be considered in future work in this general area. A key challenge is the extension of the methods presented here to practical models containing, e.g., O(10 - ) cells. Because training time can scale with problem size, this may require the development and evaluation of different network architectures. It will also be of interest to extend our surrogate model to treat coupled flow-geomechanics systems. These problems can be very expensive to simulate, so surrogate models could be highly useful in this context. Finally, additional history matching algorithms, including those considered too expensive for use in traditional settings, can be evaluated and tuned using our deep-learning-based parameterization and surrogate flow modeling capabilities.
8. Acknowledgements
We are grateful to the Stanford Smart Fields Consortium and to Stanford-Chevron CoRE for partial funding of this work. We also thank the Stanford Center for Computational Earth & Environmental Science (CEES) for providing computational resources.
Appendix A. Model Architecture Details
Appendix A.1. 3D CNN-PCA Transform Net Architecture
The architecture of the 3D CNN-PCA transform net is given in Table 1. Here, n_x, n_y and n_z refer to the geomodel dimensions, 'conv' represents a 3D convolutional layer followed by batch normalization and ReLU nonlinear activation, while 'deconv' denotes a 3D deconvolutional (upsampling) layer followed by batch normalization and ReLU. The last 'conv layer' only contains one 3D convolutional layer. A 'residual block' contains a stack of two convolutional layers, each with 128 filters of size 3 × 3 × 3.

Table 1: Network architecture for the 3D CNN-PCA model transform net

Layer | Output size
Input | (n_x, n_y, n_z, 1)
conv, 32 filters of size 9 × 9 × 9 × 1, stride (1, 1, 1) | (n_x, n_y, n_z, 32)
conv, 64 filters of size 3 × 3 × 3 × 32, stride (2, 2, 2) | (n_x/2, n_y/2, n_z/2, 64)
conv, 128 filters of size 3 × 3 × 3 × 64, stride (2, 2, 1) | (n_x/4, n_y/4, n_z/2, 128)
residual block, 128 filters | (n_x/4, n_y/4, n_z/2, 128)
residual block, 128 filters | (n_x/4, n_y/4, n_z/2, 128)
residual block, 128 filters | (n_x/4, n_y/4, n_z/2, 128)
residual block, 128 filters | (n_x/4, n_y/4, n_z/2, 128)
residual block, 128 filters | (n_x/4, n_y/4, n_z/2, 128)
deconv, 64 filters of size 3 × 3 × 3 × 128, stride (2, 2, 1) | (n_x/2, n_y/2, n_z/2, 64)
deconv, 32 filters of size 3 × 3 × 3 × 64, stride (2, 2, 2) | (n_x, n_y, n_z, 32)
conv layer, 1 filter of size 9 × 9 × 9 × 32, stride (1, 1, 1) | (n_x, n_y, n_z, 1)

Appendix A.2. 3D R-U-Net Schematic
A detailed schematic of the 3D R-U-Net used in this study is provided in Fig. 14. The boxes represent 3D multichannel feature maps and the arrows denote the various operations. The values to the left of the boxes give the corresponding 3D feature map dimensions, and the values above the boxes indicate the number of channels. The operations 'conv,' 'residual block,' 'deconv' and 'conv layer' are as defined in Appendix A.1, though here they have different numbers of filters and strides. Fig. 14 shows that the extracted features in the encoding net (left) are copied and concatenated to the upsampled features in the decoding net (right).
Appendix A.3. 3D Recurrent R-U-Net Architecture
The detailed architecture of the recurrent R-U-Net is shown in Table 2. In the table, a single number is used to represent each stride because the strides are the same in all directions. The convLSTM3D block, which employs 64 filters of size 3 × 3 × 3, performs all of the LSTM gate operations. The convLSTM net generates (n_x/4, n_y/4, n_z/4, 64) activation maps for all n_t time steps. The decoder processes these n_t activation maps separately to produce the state predictions.

Figure 14: Schematic of the 3D R-U-Net. Boxes represent different multichannel feature maps, with values at the left indicating feature map dimensions and values above providing the number of channels (figure modified from [3]).

Table 2: 3D recurrent R-U-Net architecture details
Net | Layer | Output size
Encoder | Input | (n_x, n_y, n_z, 1)
 | conv, filters of size 3 × 3 × 3, stride 2 | (n_x/2, n_y/2, n_z/2, ·)
 | conv, filters of size 3 × 3 × 3, stride 1 | (n_x/2, n_y/2, n_z/2, ·)
 | conv, filters of size 3 × 3 × 3, stride 2 | (n_x/4, n_y/4, n_z/4, ·)
 | conv, filters of size 3 × 3 × 3, stride 1 | (n_x/4, n_y/4, n_z/4, ·)
 | residual block, filters of size 3 × 3 × 3, stride 1 | (n_x/4, n_y/4, n_z/4, ·)
 | residual block, filters of size 3 × 3 × 3, stride 1 | (n_x/4, n_y/4, n_z/4, ·)
 | convLSTM3D, 64 filters of size 3 × 3 × 3, stride 1 | (n_x/4, n_y/4, n_z/4, 64, n_t)
Decoder | residual block, 64 filters of size 3 × 3 × 3, stride 1 | (n_x/4, n_y/4, n_z/4, 64, n_t)
 | residual block, 64 filters of size 3 × 3 × 3, stride 1 | (n_x/4, n_y/4, n_z/4, 64, n_t)
 | deconv, 64 filters of size 3 × 3 × 3, stride 1 | (n_x/4, n_y/4, n_z/4, 64, n_t)
 | deconv, 32 filters of size 3 × 3 × 3, stride 2 | (n_x/2, n_y/2, n_z/2, 32, n_t)
 | deconv, 32 filters of size 3 × 3 × 3, stride 1 | (n_x/2, n_y/2, n_z/2, 32, n_t)
 | deconv, 16 filters of size 3 × 3 × 3, stride 2 | (n_x, n_y, n_z, 16, n_t)
 | conv layer, 1 filter of size 3 × 3 × 3, stride 1 | (n_x, n_y, n_z, 1, n_t)

References

[1] Y. Liu, W. Sun, L. J. Durlofsky, A deep-learning-based geological parameterization for history matching complex models, Mathematical Geosciences 51 (6) (2019) 725-766.
[2] Y. Liu, L. J. Durlofsky, Multilevel strategies and geological parameterizations for history matching complex reservoir models, SPE Journal 25 (01) (2020) 81-104.
[3] M. Tang, Y. Liu, L. J. Durlofsky, A deep-learning-based surrogate model for data assimilation in dynamic subsurface flow problems, Journal of Computational Physics 413 (2020) 109456.
[4] Y. Liu, L. J. Durlofsky, 3D CNN-PCA: a deep-learning-based parameterization for complex geomodels, arXiv preprint arXiv:2007.08478 (2020).
[5] Y. Zhu, N. Zabaras, Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification, Journal of Computational Physics 366 (2018) 415-447.
[6] S. Mo, N. Zabaras, X. Shi, J. Wu, Deep autoregressive neural networks for high-dimensional inverse problems in groundwater contaminant source identification, Water Resources Research 55 (5) (2019) 3856-3881.
[7] S. Mo, N. Zabaras, X. Shi, J. Wu, Integration of adversarial autoencoders with residual dense convolutional networks for estimation of non-Gaussian hydraulic conductivities, Water Resources Research 56 (2) (2020) 2019WR026082.
[8] Z. L. Jin, Y. Liu, L. J. Durlofsky, Deep-learning-based surrogate model for reservoir simulation with time-varying well controls, Journal of Petroleum Science and Engineering 192 (2020) 107273.
[9] J. Han, A. Jentzen, E. Weinan, Solving high-dimensional partial differential equations using deep learning, Proceedings of the National Academy of Sciences 115 (34) (2018) 8505-8510.
[10] J. Berg, K. Nyström, A unified deep artificial neural network approach to partial differential equations in complex geometries, Neurocomputing 317 (2018) 28-41.
[11] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686-707.
[12] L. Sun, H. Gao, S. Pan, J.-X. Wang, Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data, Computer Methods in Applied Mechanics and Engineering 361 (2020) 112732.
[13] Z. Mao, A. D. Jagtap, G. E. Karniadakis, Physics-informed neural networks for high-speed flows, Computer Methods in Applied Mechanics and Engineering 360 (2020) 112789.
[14] L. Lu, X. Meng, Z. Mao, G. E. Karniadakis, DeepXDE: a deep learning library for solving differential equations, arXiv preprint arXiv:1907.04502 (2019).
[15] Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, P. Perdikaris, Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data, Journal of Computational Physics 394 (2019) 56-81.
[16] J. Tian, C. Qi, Y. Sun, Z. M. Yaseen, Surrogate permeability modelling of low-permeable rocks using convolutional neural networks, Computer Methods in Applied Mechanics and Engineering 366 (2020) 113103.
[17] T. Zhang, Y. Li, Y. Li, S. Sun, X. Gao, A self-adaptive deep learning algorithm for accelerating multi-component flash calculation, Computer Methods in Applied Mechanics and Engineering 369 (2020) 113207.
[18] E. Laloy, R. Hérault, J. Lee, D. Jacques, N. Linde, Inversion using a new low-dimensional representation of complex binary geological media based on a deep neural network, Advances in Water Resources 110 (2017) 387-405.
[19] S. W. A. Canchumuni, A. A. Emerick, M. A. C. Pacheco, Towards a robust parameterization for conditioning facies models using deep variational autoencoders and ensemble smoother, Computers and Geosciences 128 (2019) 87-102.
[20] S. Chan, A. H. Elsheikh, Parametrization and generation of geological models with generative adversarial networks, arXiv preprint arXiv:1708.01810 (2017).
[21] S. Chan, A. H. Elsheikh, Parametric generation of conditional geological realizations using generative neural networks, arXiv preprint arXiv:1807.05207 (2018).
[22] E. Dupont, T. Zhang, P. Tilke, L. Liang, W. Bailey, Generating realistic geology conditioned on physical measurements with generative adversarial networks, arXiv preprint arXiv:1802.03065 (2018).
[23] L. Mosser, O. Dubrule, M. J. Blunt, Conditioning of three-dimensional generative adversarial networks for pore and reservoir-scale models, arXiv preprint arXiv:1802.05622 (2018).
[24] E. Laloy, R. Hérault, D. Jacques, N. Linde, Training-image based geostatistical inversion using a spatial generative adversarial neural network, Water Resources Research 54 (1) (2018) 381-406.
[25] E. Laloy, N. Linde, C. Ruffino, R. Hérault, G. Gasso, D. Jacques, Gradient-based deterministic inversion of geophysical data with generative adversarial networks: is it feasible?, Computers and Geosciences 133 (2019) 104333.
[26] S. Chan, A. H. Elsheikh, Parametrization of stochastic inputs using generative adversarial networks