[PDF] Discovering state-parameter mappings in subsurface models using generative adversarial networks

Abstract

A fundamental problem in geophysical modeling is related to the identification and approximation of causal structures among physical processes. However, resolving the bidirectional mappings between physical parameters and model state variables (i.e., solving the forward and inverse problems) is challenging, especially when parameter dimensionality is high. Deep learning has opened a new door toward knowledge representation and complex pattern identification. In particular, the recently introduced generative adversarial networks (GANs) hold strong promises in learning cross-domain mappings for image translation. This study presents a state-parameter identification GAN (SPID-GAN) for simultaneously learning bidirectional mappings between a high-dimensional parameter space and the corresponding model state space. SPID-GAN is demonstrated using a series of representative problems from subsurface flow modeling. Results show that SPID-GAN achieves satisfactory performance in identifying the bidirectional state-parameter mappings, providing a new deep-learning-based, knowledge representation paradigm for a wide array of complex geophysical problems.

Full PDF

DDiscovering state-parameter mappings in subsurfacemodels using generative adversarial networks

Alexander Y. Sun Bureau of Economic Geology, Jackson School of Geosciences, The University of Texas at Austin, Austin,TX

Corresponding author: A. Y. Sun, [email protected] –1– a r X i v : . [ phy s i c s . d a t a - a n ] O c t bstract A fundamental problem in geophysical modeling is related to the identiﬁcation and approx-imation of causal structures among physical processes. However, resolving the bidirectionalmappings between physical parameters and model state variables (i.e., solving the forwardand inverse problems) is challenging, especially when parameter dimensionality is high.Deep learning has opened a new door toward knowledge representation and complex pat-tern identiﬁcation. In particular, the recently introduced generative adversarial networks(GANs) hold strong promises in learning cross-domain mappings for image translation.This study presents a state-parameter identiﬁcation GAN (SPID-GAN) for simultaneouslylearning bidirectional mappings between a high-dimensional parameter space and the cor-responding model state space. SPID-GAN is demonstrated using a series of representativeproblems from subsurface ﬂow modeling. Results show that SPID-GAN achieves satisfac-tory performance in identifying the bidirectional state-parameter mappings, providing anew deep-learning-based, knowledge representation paradigm for a wide array of complexgeophysical problems.

Plain Language Summary

Development of physically-based models requires two steps, mathematical abstraction(forward modeling) and parameter estimation (inverse modeling). A high-ﬁdelity modelrequires high-quality parameter support. The need for identifying forward and reverse map-pings (i.e., a function that associates element of one set to another) is thus ubiquitous ingeophysical research. A signiﬁcant challenge in geosciences is that geoparameters are spa-tially heterogeneous and high dimensional, and yet can only be observed at limited locations.The conventional workﬂow, built on minimizing the model-observation mismatch at mea-surement locations, does not oﬀer an eﬃcient way for estimating the spatial structure ofhigh-dimensional parameter ﬁelds. This work presents a deep-learning-based framework foridentifying the state-parameter bidirectional mappings using the recently introduced gen-erative adversarial networks (GANs). GANs have been shown to be adept at associatingimages from one domain to another. Its potential for discovering mappings in physicallybased models has not been demonstrated so far. This work shows that GAN can achievehigh performance in learning bidirectional parameter-to-state mappings in physically basedmodels, thus providing a new way of thinking and doing things in geosciences. The impli-cation for additional applications in subsurface modeling is signiﬁcant. –2–

Introduction

Deep learning (DL) has achieved great success in image recognition and business intel-ligence over the past decade, continuously narrowing the gap between artiﬁcial intelligenceand human intelligence. Tremendous interests exist in the geophysical research communityto leverage the strength of DL for solving similar image recognition and prediction problems,such as land cover and land use classiﬁcation [

Castelluccio et al. , 2015], extreme weatherevent forecasting [

Shi et al. , 2015;

Liu et al. , 2016], estimation of particulate matter levels[

Li et al. , 2017], and data imputation [

Fang et al. , 2017]. To achieve high accuracy, many DLalgorithms require a large amount of labeled training data (i.e., co-observed predictors andpredictands), which is generally hard to acquire in geoscience domains due to the invisibilityof subsurface processes and sparsity of in situ monitoring networks.In machine learning, semi-supervised learning has been used to tackle the issue oflimited labeled data [

Chapelle et al. , 2006]. As its name suggests, semi-supervised learningsits in between the traditional unsupervised learning (all training data are unlabeled) andsupervised learning (all training data are labeled). Semi-supervised learning methods useunlabeled data, together with limited labeled data, to build better machine learning models,under the assumption that unlabeled data are more abundant and carry information thatis useful for the inference of target variables [

Chapelle et al. , 2006].Semi-supervised learning may help solve a fundamental problem in geosciences, namely,estimating the underlying generative model of sampled data so that new samples can besynthesized from the learned model. In fact, generative process modeling is at the core of allphysical sciences, where mechanistic models have long been applied to extract, abstract, andapproximate the observed causal structures in order to simulate samples of the underlyingphysical processes. In geosciences in particular, parametric forward modelings are carriedout by solving partial diﬀerential equations (PDEs) governing the spatially and/or tem-porally varying subsurface physical processes, whereas inverse problems are formulated toidentify the model parameters by using observations of state variables [

Sun and Sun , 2015].Ideally, both the forward and inverse modeling should be done in a closed loop manner suchthat new information can be continuously assimilated to reduce uncertainty. Thus, the needfor resolving the bidirectional mappings between state and parameter spaces always exists.An outstanding challenge is that many subsurface processes are nonlinear, multiscale, andhigh-dimensional, making it nontrivial to establish such mappings in practice. –3– he recently introduced generative adversarial networks (GANs), which may be con-sidered a subclass of DL for semi-supervised learning, hold strong promises not only forlearning the generative processes of high-dimensional images with limited labeled data, butalso for translating seemingly unrelated images across diﬀerent domains [

Goodfellow et al. ,2014]. An open question is whether these interesting features of GANs can beneﬁt thegeophysical modeling community. Here I explore the bidirectional mapping capability ofGANs and hypothesize that GANs may provide a new workﬂow for inferring parameter-state mappings. Many of the conventional parameter estimation methods are built on theminimization of certain distance measures between observed and simulated values by run-ning a forward model iteratively, while uncertainty quantiﬁcation is usually done a posteriori.Under the GAN framework, the model parameter space and state space are regarded as twoinherently related image domains, and DL-based functional relationships are obtained tofacilitate the cross-domain learning, namely, estimating parameters for given model states,and vice versa.The main purpose of this study is to formulate a state-parameter identiﬁcation GAN(SPID-GAN) for obtaining deep bidirectional representations of geophysical models. In thefollowing, I ﬁrst introduce the proposed SPID-GAN framework, which combines the tra-ditional geostatistical simulation, physically-based forward modeling, and the point-basedparameter estimation workﬂow with cross-domain deep learning. The framework is demon-strated using two diﬀerent examples from subsurface ﬂow modeling, in which the model pa-rameters are spatially heterogeneous, representing high-dimensional samples obtained fromsingle- and bimodal distributions. I show that the DL-based SPID-GAN is well adept atlearning the subtle spatial patterns in model states and parameter ﬁelds, thus representinga powerful tool for approximating the causal structures of physical models.

The original GAN introduced by

Goodfellow et al. [2014] consists of a pair of dis-criminator and generator, designed to compete with each other as in a two-player game,thus the word adversarial. The role of the generator is to create “fake” samples that areindistinguishable from the training data, while the role of the discriminator is to classifythe generated samples to determine whether they are real or fake. Let G denote a generator –4– odel that deﬁnes a mapping G : X → Y , namely, it takes x ∈ X as input and generatesa fake sample G ( x ) that has the same support as the training data y ∈ Y . In addition, let D denote a discriminator that determines whether a sample is drawn from the empiricaldistribution of training data p data ( y ), or from the generator distribution p model ( G ( x )). Thegoal of the generator is thus to push its sample distribution p model ( G ( x )) toward the datadistribution p data ( y ). At optimality the discriminator is maximally confused and cannotdistinguish real samples from ones that are fake (i.e., predicting with a probability of 0.5for all inputs) [ Goodfellow , 2016]. Note that similar principles are behind many Bayesianstatistical inversion and ensemble-based data assimilation algorithms, in which the commongoal is updating a prior distribution to a posterior distribution that reﬂects the newer infor-mation. In practice, many of the conventional methods are either limited to low-dimensionalproblems (e.g., particle ﬁlter and Markov chain Monte Carlo) or to multivariate Gaussiandistributions (e.g., the ensemble Kalman ﬁlter). In comparison, GANs make no such as-sumptions on the distributions of domain data.Existing GANs diﬀer by how domains are deﬁned and what cross-domain mappingsare to be learned. The generator in the original GAN takes a sample from a low-dimensionallatent space (i.e., a random noise vector) and turns it into a real image (e.g., a car). Such agenerative process resembles the PCA-based random ﬁeld simulation algorithm commonlyused in geostatistics [

Satija and Caers , 2015]. The main diﬀerence is that GANs trainDL models to learn complex structural patterns embedded in the training data, while theeigenvector-based representation in PCA is linear and restricted to the 2nd-order statistics.Since the work of

Goodfellow et al. [2014], a large number of GAN models have been in-troduced for cross-domain learning. So far, GANs have been demonstrated in a number ofinspiring applications, such as (a) image superresolution, where low-resolution images areused to generate their high-resolution counterparts; examples include deep convolution GAN(dcGAN) [

Radford et al. , 2015] and superresolution GAN (SRGAN) [

Ledig et al. , 2017]; (b)cross-domain image-to-image translation, where labeled information in the form of eithertext descriptions or images is used to generate images in another domain; examples includethe conditional GAN [

Mirza and Osindero , 2014], coupled GAN [

Liu and Tuzel , 2016],DiscoGAN [

Kim et al. , 2017], and DualGAN [

Yi et al. , 2017]. Newer GANs can performdirect cross-domain learning without using low-dimensional latent space vectors as done inthe original GAN. –5– uilding on the existing cross-domain learning GANs, SPID-GAN approaches thestate-parameter bidirectional mapping problem by using two pairs of generators and dis-criminators Forward mapping G P S : R P → R S , D S : R S → [0 , , Reverse mapping G SP : R S → R P , D P : R P → [0 , . (1)where G P S deﬁnes a forward mapping from the parameter space P to the model state space S , while G SP provides a reverse mapping from S to P . The two discriminators D S and D P are used to determine the authenticity of samples generated for the respective domains interms of probability. A practical working assumption on G P S and G SP is that they are bijec-tive, meaning each element of the domain P is mapped by exactly one element of the domain S . This helps prevent the many-to-one mappings during training, which is also known asthe mode collapse problem [ Zhu et al. , 2017]. Another assumption is the continuity of map-pings, namely, if two elements in domain P are close, then also should be the correspondingelements in domain S . The same assumption is also implied by the stability requirement ofinverse solutions [ Sun and Sun , 2015]. Thus, to arrive at meaningful solutions, one needsnot only a proper algorithm design (detailed below), but also an appropriate experimentaldesign (next section).The loss function used for training G P S consists of three terms [

Kim et al. , 2017;

Zhuet al. , 2017] J ( G ) P S (cid:16)

P, S, θ ( D S ) , θ ( G PS ) , θ ( G SP ) (cid:17) = J ( D S ) P S + d tran (cid:16) G P S ( x , θ ( G PS ) ) , y (cid:17) + d cyc (cid:16) G SP (cid:16) G P S ( x , θ ( G PS ) ) , θ ( G SP ) (cid:17) , x (cid:17) , (2)where x ∈ P , y ∈ S , and θ ( · ) denote the unknown parameters of the respective genera-tors/discriminators. Before training, a standard practice in DL is to scale all training datato the same range (e.g., [0 , J ( D S ) P S (cid:16) θ ( D s ) , θ ( G PS ) (cid:17) = 12 E x ∼ p data ( x ) (cid:13)(cid:13)(cid:13) D S (cid:16) G P S (cid:16) x , θ ( G PS ) (cid:17) , θ ( D s ) (cid:17)(cid:13)(cid:13)(cid:13) +12 E y ∼ p data ( y ) (cid:13)(cid:13)(cid:13) D S (cid:16) y , θ ( D S ) (cid:17) − (cid:13)(cid:13)(cid:13) , (3)for which the goal is to minimize the error rate of the discriminator on the fake sample(toward 0) and on the real sample (toward 1). In Eq. (3), the expectation ( E ) is taken overall training samples. Note that the mean square error (MSE) is used here, instead of thebinary entropy loss used by Goodfellow et al. [2014]. –6– he second term on the right-hand-side of Eq. (2) measures the translation loss interms of mean absolute error (MAE), d tran (cid:16) G P S ( x , θ ( G PS ) ) , y (cid:17) = E y ∼ p data ( y ) (cid:12)(cid:12)(cid:12) G P S (cid:16) x , θ ( G PS ) (cid:17) − y (cid:12)(cid:12)(cid:12) , (4)where expectation is calculated based on all pairs of generated and training images. The lastterm on the right-hand-side of Eq. (2) quantiﬁes the cycle consistency (or reconstructionloss) between the two generators using MAE d cyc (cid:16) G SP (cid:16) G P S ( x , θ ( G PS ) ) , θ ( G SP ) (cid:17) , x (cid:17) = E x ∼ p data ( x ) (cid:12)(cid:12)(cid:12) G SP (cid:16) G P S ( x , θ ( G PS ) ) , θ ( G SP ) (cid:17) − x (cid:12)(cid:12)(cid:12) . (5)By minimizing the reconstruction loss, the cycle consistency term helps to mitigate the modecollapse problem [ Kim et al. , 2017;

Zhu et al. , 2017]. The loss function of the generator G SP can be deﬁned similarly to Eq. (2) by switching P and S . The total generator loss function J ( G ) for SPID-GAN is the average of the two generator losses. Each discriminator uses aloss function in the same form as Eq. (3) but with the opposite sign. The generator anddiscriminator loss functions are highly coupled and need to be solved from the followingminimax optimization problemˆ θ ( G PS ) , ˆ θ ( D S ) , ˆ θ ( G SP ) , ˆ θ ( D P ) = arg min θ ( G PS ) ,θ ( G SP ) max θ ( D S ) ,θ ( D P ) J ( G ) (cid:16) θ ( G PS ) , θ ( D S ) , θ ( G SP ) , θ ( D P ) (cid:17) . (6)In practice, the optimization problem in Eq. (6) is solved by using alternating gradientupdating steps for generators and discriminators, with parameters of one group ﬁxed whenparameters of the other are being updated in each iteration.Figure 1a illustrates the workﬂow of SPID-GAN. The ﬁrst step shown on the left isrelated to data preparation, which may be facilitated by a rich set of tools available from thegeostatistics literature [ Deutsch et al. , 1998]. For example, if parameter measurements areavailable, they may be integrated to generate the so-called “conditional realizations” of theparameter ﬁeld to honor prior information. If measurements of state variables are available,they may be used to generate plausible images of the state ﬁeld via kriging. Model statemeasurements may also be used to select the most probable parameter ﬁelds by choosingthe parameter sets that minimize the diﬀerences between GAN-predicted state values andthe actual observed values. Assuming point measurements of parameter and state variablesare available, the SPID-GAN workﬂow implies an additional loss term on the generatorsthat is enforced through pre- and post-processing, –7– obs = E S kG P S ( x )( u S ) − y ( u S ) k + E P kG SP ( y )( u P ) − x ( u P ) k , (7)where u P and u S are locations of P and S measurements. These diﬀerent use cases will beillustrated in the Result section.In this work, SPID-GAN is implemented by using convolutional neural networks(CNN), a class of deep feed-forward neural networks specially designed for image patternrecognition [ LeCun et al. , 2015]. A brief introduction of common CNN terminologies isgiven in Supporting Information (SI) S1. The two generators share a deep learning neuralnetwork design that is similar to DiscoGAN [

Kim et al. , 2017], which includes a series ofconvolutional and deconvolutional layers to help uncover features at multiple scales (Figure1b). The input images have dimensions 128 × × Ulyanovet al. , 2016]. The two discriminators use the same design as shown in Figure 1c. For thehidden layers, the layer conﬁguration is repeated twice using alternating stride sizes 1 and 2.All codes are written in Python using the open-source deep learning package, Keras [

Cholletet al. , 2015]. The Adam optimization solver [

Kingma and Ba , 2014] is used for training, witha learning rate 0.0002 and a forgetting factor 0.5. Unless otherwise noted, the number ofepochs used in training is 125 and the batch size is 10. The computing time taken for eachepoch is about 23 s on a Cloud-based Ubuntu instance running on an Intel Xeon E5-2580CPU node with GPU acceleration (NVIDIA Tesla K40).To quantify the skill of trained generators, the structural similarity index (SSIM)commonly used in image analysis [

Wang et al. , 2004] is adopted as a metric. For two slidingwindows u and v of dimensions n p × n p , SSIM is deﬁned as SSIM ( u , v ) = (2 µ u µ v + c )(2 σ uv + c )( µ u + µ v + c )( σ u + σ v + c ) , (8) –8– here u and v represent two patches from the simulated image (using GAN) and testingimage (from numerical model), respectively, µ represents the mean, σ represents the vari-ance, and c (0.01) and c (0.03) are small constants used to stabilize the denominator. Themean value of SSIM, averaged over all sliding windows, ranges from -1 to 1, with 1 beingattainable when two images are identical. The size of the sliding window used in this studyis n p = 7. To demonstrate the usefulness of SPID-GAN for learning the bidirectional mappings,I consider groundwater ﬂow in a spatially heterogeneous aquifer, which is a representativegeoscience problem and has been studied extensively. The governing equation is given bythe following PDE S s ( z ) ∂h ( z , t ) ∂t = ∇ · ( K ( z ) ∇ h ( z , t )) + q w ( z , t ) , (9)where h [L] is hydraulic head, S s [1/L] is speciﬁc storage, K [L/T] is hydraulic conductivity, z denotes spatial coordinates, t is time, and q w [L /T] is the source/sink term. For thepurpose of illustration, the state variable is h and the parameter is K , both are spatiallydistributed variables. All other quantities are assumed deterministic. The lateral dimensionsof the aquifer are 1280 m × ×

10 mgrid blocks. The thickness of the aquifer is 20 m. The goal of SPID-GAN is to train twogenerators simultaneously to approximate the physical model speciﬁed in Eq. (9). Twodiﬀerent problem settings are considered, single- and multimodal parameter distributions.

In the ﬁrst problem, K is assumed to be a random ﬁeld following log-normal distri-bution. The mean and standard deviation of log K are 2 × − and 1.0. The variogrammodel of log K is Gaussian with max and min ranges of 500 and 100 m, respectively. S s isdeterministic with a value of 2 . × − m − . In the baseline case, constant-head (Dirich-let) boundary conditions are imposed on both the west (21 m) and east (10 m) sides ofthe aquifer. Four pumping wells are put in grid blocks (25, 25), (25,106), (106, 106), and(106,25), with pumping rates of 10, 10, 50, and 20 m /day, respectively. Stochastic realiza- –9– ions of log K are generated by using the sequential Gaussian simulator ( sgsim ), availablefrom the open-source geostatistical package SGeMS [ Remy et al. , 2009]. The ﬂow ﬁeld isﬁrst run to the steady state, followed by a transient simulation period of 3600 s. For trainingand validation, the corresponding head distributions are obtained by using the open-sourcegroundwater ﬂow solver MODFLOW via its Python wrapper, flopy [ Bakker et al. , 2016].The computing time for each forward simulation is 0.04 s. Such geostatistical processingsteps have been broadly used in previous studies, such as ensemble-based data assimila-tion[

Chen and Zhang , 2006;

Franssen et al. , 2009;

Sun et al. , 2009a], hydraulic tomography[

Lee and Kitanidis , 2014], and surrogate modeling and uncertainty quantiﬁcation [

Li andZhang , 2007;

Nowak et al. , 2010]. Training of the SPID-GAN is done using 400 pairs oflog K (parameter domain) and h (state domain) ﬁelds, each having dimensions of 128 × K ﬁeld and the correspondinghead ﬁeld generated using the trained forward generator G P S , while the right two imagesof each row show the input state (head) and corresponding log K ﬁeld predicted using thetrained reverse generator G SP . For this base case, SPID-GAN captures the spatial patternsin head distributions well, on the basis of visual comparisons between Figures 2b and 2c,and between Figures 2f and 2g. Results of testing using 1000 test realizations of log K give an ensemble mean SSIM value of about 0.98 (Figure 2i). In comparison, by visualexaminations of Figures 2a and 2d, and Figures 2e and 2h, it can be seen that the trainedreverse mapping G SP captures the dominant spatial patterns in the original log K images,but tends to underestimate some local maxima or minima. The ensemble mean SSIMobtained from testing on 1000 h ﬁeld samples (not used in training) is 0.78.In general, learning a high-dimensional reverse mapping is a signiﬁcantly more chal-lenging problem to solve, depending on not only the design of the GAN algorithm, butalso the experimental design and quality of training samples. In this case, the learning ofreverse mapping would beneﬁt from any conditions (e.g., boundary conditions and forcings)that can help improve the information content of head ﬁelds and the uniqueness of crossmappings. In an experimental design, if the head ﬁeld is not sensitive to certain parts ofthe parameter ﬁeld, SPID-GAN may give ambiguous inverse solutions in those areas. Toelaborate this latter point, in SI S2 the number of pumping wells is reduced from four to two(Figure S2), and then to none (Figure S3). As a result, the parameter ﬁelds in those casesbecome less identiﬁable—more artifacts start to appear in reversely mapped log K ﬁelds in –10– hose cases, especially in the central part of the domain that is less stimulated than theparts near the west and east boundaries. Nevertheless, the overall SPID-GAN performancestays relatively robust, as can be seen by the SSIM statistics in the respective plots.In addition to diﬀerent boundary/forcing conditions, the eﬀect of training samplequantity on SPID-GAN performance is also investigated. The results, shown in SI S3,suggest that the GAN is relatively robust when the number of training samples varies,indicating the capability of GAN to learn dominant cross-domain patterns.GANs operate with images, while the traditional workﬂow in hydrogeology typicallyinvolves point measurements. This next example demonstrates that the two workﬂows areactually complementary under the new SPID-GAN workﬂow proposed in Figure 1a. Figures3a, b show the “true” log K and head ﬁelds, which are sampled only at limited locations.For simplicity, it is assumed that point observations of K and h are collected from thesame monitoring network (open circles in Figures 3a and 3b, a total of 36 conditioningpoints), although they may well be diﬀerent. First, 1000 log K conditional realizationsare generated using sgsim that honor the prior information on K . For each conditionallog K realization, the trained forward generator G AB from the base case is used to predicta head ﬁeld, which is then sampled at monitoring locations to calculate the MSE betweenthe simulated and actual head observations. The resulting MSE values are sorted in anascending order. Figures 3c,e,g show the top 3 head ﬁelds that have the minimal MSE,while the corresponding log K ﬁelds are plotted in Figures 3d,f,h. The identiﬁed ﬁeldsshow strong resemblance to the synthetic truth, especially where conditional informationis available (i.e., comparing SPID-GAN results to the synthetic truth at the monitoringlocations). Similarly, the workﬂow presented here can be used to form an ensemble ofmodels to perform the DL-based, uncertainty quantiﬁcation in the sense of the recentlyintroduced data space inversion, in which the goal of modeling is not calibration in thetraditional sense, but to establish a data-driven statistical relationship between the observedand forecast variables and to quantify the predictive uncertainty of the forecast variables,by using an ensemble of uncalibrated prior models [ Satija and Caers , 2015;

Sun et al. , 2017;

Jeong et al. , 2018]. –11– .2 Bimodal parameter distribution

In the second problem setting, the feasibility of using SPID-GAN for learning multi-modal distributions is investigated. The aquifer is assumed to consist of two hydrofacies, apermeable channel facies and a background matrix, making the distribution of K bimodal.Identiﬁcation of facies shapes is a representative and yet challenging inversion problem thathas also been studied extensively in the literature [ Liu and Oliver , 2005;

Sun et al. , 2009b;

Zhou et al. , 2011]. In this example, the channel facies has a K value of 2 × − m/s and an S s value of 1 × − m − , while the matrix has a K value of 1 × − m/s and an S s valueof 1 × − m − . The facies realizations are generated using snesim , which is a multipointgeostatistical simulator also available from SGeMS [ Remy et al. , 2009]. Constant heads of11 and 10 m are imposed on the west and east boundaries, and the transient simulationperiod is 5400 s. All other settings are the same as used in the previous two examples.Figures 4a-d and e-h show example results from two test realizations. The forwardgenerator predicts the connected ﬂow paths well, including the fading color pattern in headdistribution from the west boundary to the east boundary (comparing Figures 4b and 4c,and Figures 4f and 4g). The reverse mapping from the head distribution identiﬁes thewell-connected backbone of the ﬂow network (comparing Figure 4a with 4d, and Figure 4ewith 4h), but tends to omit small channel segments that are not connected to the mainﬂow pathways. This is because the information of disconnected segments is not discoverablefrom head distributions. In other words, the head measurements are not sensitive to theparameter information in disconnected channel segments. From a practical standpoint, theconnected features are more important to identify for risk assessment purposes. Finally,Figures 4(i) and (j) show the SSIM statistics on running 1000 pairs of randomly generatedtest samples. Again SPID-GAN gives a reasonable performance in this bimodal distributioncase, with an ensemble mean SSIM value of 0.8 for forward mapping ( K to h ) and 0.71 forreverse mapping ( h to K ). The recently introduced generative adversarial networks (GANs) have shown strongperformance in image-to-image translation. Questions remain about whether GANs can beused to learn the deep causal structures embedded in physical models. In this study, I showthat the usefulness of GANs goes beyond the simple picture mapping and demonstrate the –12– easibility of using a state-parameter identiﬁcation GAN (SPID-GAN) to approximate bidi-rectional mappings between the parameter space and state space of PDEs. Through a seriesof representative problems selected from groundwater modeling, I show that SPID-GANdoes a satisfactory job (with a reasonable amount of computing time) in linking physicalparameters to model states, and is able to capture the complex spatial patterns in parameterand state distributions that are otherwise challenging to obtain. The need for identifyingforward and reverse mappings is ubiquitous in geophysical research. Thus, ﬁndings of thisstudy may have important implications for many similar tasks, such as geostatistical simu-lation, history matching, uncertainty quantiﬁcation, surrogate modeling, optimal design ofexperiments, or any problem that requires sampling from a high-dimensional distributionin a computationally eﬃcient manner.

Acknowledgement

The work was partly supported by the U.S. Department of Energy, National EnergyTechnology Laboratory (NETL) under grant numbers de-fe0026515 and de-fe0031544. Theauthor is grateful to the handling editor, Prof. Valeriy Ivanov, and two anonymous reviewersfor their constructive comments. All the data used are listed in the references or archivedin repository, https://utexas.box.com/s/uq74paot1a2ns3v9hbll01ifev408uus . References

Bakker, M., V. Post, C. D. Langevin, J. D. Hughes, J. White, J. Starn, and M. N. Fienen(2016), Scripting modﬂow model development using python and ﬂopy,

Groundwater , (5), 733–739.Castelluccio, M., G. Poggi, C. Sansone, and L. Verdoliva (2015), Land use classiﬁcation in re-mote sensing images by convolutional neural networks, arXiv preprint arXiv:1508.00092 .Chapelle, O., B. Schölkof, and A. Zien (2006), Semi-Supervised Learning , MIT Press.Chen, Y., and D. Zhang (2006), Data assimilation for transient ﬂow in geologic formationsvia ensemble kalman ﬁlter,

Advances in Water Resources , (8), 1107–1122.Chollet, F., et al. (2015), Keras, https://keras.io.Deutsch, C. V., A. G. Journel, et al. (1998), Geostatistical software library and user’s guide, Oxford University Press, New York .Fang, K., C. Shen, D. Kifer, and X. Yang (2017), Prolongation of smap to spatiotemporallyseamless coverage of continental us using a deep learning neural network,

Geophysical –13– esearch Letters , (21).Franssen, H. H., A. Alcolea, M. Riva, M. Bakr, N. Van der Wiel, F. Stauﬀer, andA. Guadagnini (2009), A comparison of seven methods for the inverse modelling of ground-water ﬂow. application to the characterisation of well catchments, Advances in WaterResources , (6), 851–872.Goodfellow, I. (2016), Nips 2016 tutorial: Generative adversarial networks, arXiv preprintarXiv:1701.00160 .Goodfellow, I., J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville,and Y. Bengio (2014), Generative adversarial nets, in Advances in Neural InformationProcessing Systems , pp. 2672–2680.Jeong, H., A. Y. Sun, J. Lee, and B. Min (2018), A learning-based data-driven forecastapproach for predicting future reservoir performance,

Advances in Water Resources , ,95–109.Kim, T., M. Cha, H. Kim, J. Lee, and J. Kim (2017), Learning to discover cross-domainrelations with generative adversarial networks, arXiv preprint arXiv:1703.05192 .Kingma, D. P., and J. Ba (2014), Adam: A method for stochastic optimization, arXivpreprint arXiv:1412.6980 .LeCun, Y., Y. Bengio, and G. Hinton (2015), Deep learning, Nature , (7553), 436.Ledig, C., L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Te-jani, J. Totz, Z. Wang, et al. (2017), Photo-realistic single image super-resolution using agenerative adversarial network, arXiv preprint .Lee, J., and P. K. Kitanidis (2014), Large-scale hydraulic tomography and joint inversionof head and tracer data using the principal component geostatistical approach (pcga), Water Resources Research , (7), 5410–5427.Li, H., and D. Zhang (2007), Probabilistic collocation method for ﬂow in porous media:Comparisons with other stochastic methods, Water Resources Research , (9).Li, T., H. Shen, Q. Yuan, X. Zhang, and L. Zhang (2017), Estimating ground-level pm2.5 by fusing satellite and station observations: A geo-intelligent deep learning approach, Geophysical Research Letters , (23).Liu, M.-Y., and O. Tuzel (2016), Coupled generative adversarial networks, in Advances inNeural Information Processing Systems , pp. 469–477.Liu, N., and D. S. Oliver (2005), Ensemble kalman ﬁlter for automatic history matching ofgeologic facies,

Journal of Petroleum Science and Engineering , (3-4), 147–161. –14– iu, Y., E. Racah, J. Correa, A. Khosrowshahi, D. Lavers, K. Kunkel, M. Wehner,W. Collins, et al. (2016), Application of deep convolutional neural networks for detectingextreme weather in climate datasets, arXiv preprint arXiv:1605.01156 .Mirza, M., and S. Osindero (2014), Conditional generative adversarial nets, arXiv preprintarXiv:1411.1784 .Nowak, W., F. De Barros, and Y. Rubin (2010), Bayesian geostatistical design: Task-drivenoptimal site investigation when the geostatistical model is uncertain, Water ResourcesResearch , (3).Radford, A., L. Metz, and S. Chintala (2015), Unsupervised representation learning withdeep convolutional generative adversarial networks, arXiv preprint arXiv:1511.06434 .Remy, N., A. Boucher, and J. Wu (2009), Applied Geostatistics with SGeMS: A User’sGuide , Cambridge University Press.Satija, A., and J. Caers (2015), Direct forecasting of subsurface ﬂow response from non-linear dynamic data by linear least-squares in canonical functional principal componentspace,

Advances in Water Resources , , 69–81.Shi, X., Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo (2015), Convolutionallstm network: A machine learning approach for precipitation nowcasting, in Advances inNeural Information Processing Systems , pp. 802–810.Sun, A. Y., A. Morris, and S. Mohanty (2009a), Comparison of deterministic ensemblekalman ﬁlters for assimilating hydrogeological data,

Advances in Water Resources , (2),280–292.Sun, A. Y., A. P. Morris, and S. Mohanty (2009b), Sequential updating of multimodal hy-drogeologic parameter ﬁelds using localization and clustering techniques, Water ResourcesResearch , (7).Sun, N.-Z., and A. Sun (2015), Model Calibration and Parameter Estimation: For Environ-mental and Water Resource Systems , Springer.Sun, W., M.-H. Hui, and L. J. Durlofsky (2017), Production forecasting and uncertaintyquantiﬁcation for naturally fractured reservoirs using a new data-space inversion proce-dure,

Computational Geosciences , (5-6), 1443–1458.Ulyanov, D., A. Vedaldi, and V. Lempitsky (2016), Instance normalization: The missingingredient for fast stylization. arxiv 2016, arXiv preprint arXiv:1607.08022 .Wang, Z., A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004), Image quality assessment:from error visibility to structural similarity, IEEE Transactions on Image Processing , –15– (4), 600–612.Yi, Z., H. Zhang, P. Tan, and M. Gong (2017), Dualgan: Unsupervised dual learning forimage-to-image translation, arXiv preprint arXiv:1704.02510 .Zhou, H., J. J. Gomez-Hernandez, H.-J. H. Franssen, and L. Li (2011), An approach tohandling non-gaussianity of parameters and state variables in ensemble kalman ﬁltering, Advances in Water Resources , (7), 844–864.Zhu, J.-Y., T. Park, P. Isola, and A. A. Efros (2017), Unpaired image-to-image translationusing cycle-consistent adversarial networks, arXiv preprint arXiv:1703.10593 . –16– igure Captions

1. (a) SPID-GAN consists of two pairs of generators and discriminators trained to-gether, one pair for identifying the forward parameter-to-state mapping ( G P S , D S ) (ﬁrstrow), and the other for identifying the reverse state-to-parameter mapping ( G SP , D P ) (sec-ond row); cycle consistency is enforced by minimizing the reconstruction loss (third column);observations of parameter and state variables can be fused through pre- and post-processingsteps (e.g., kriging and geostatistical simulation); (b) the design of generator model followsa downsampling-upsampling pattern using convolutional and deconvolutional layers to learnfeatures at multiple scales; (c) the discriminator uses repeating convolutional layers to im-prove learning of ﬁne-scale features.2. SPID-GAN results for two random realizations (a-d and e-h): (a) and (e) are originallog K realizations generated using sgsim , (c) and (d) are original head images simulatedusing MODFLOW; head ﬁelds (b) and (f) are generated using the trained forward generator G P S , while parameter ﬁelds (d) and (h) are generated by the trained reverse generator G SP .All contours are normalized for visualization purposes. Subplots (i) and (j) show histogramsof structural similarity indices (SSIM) calculated on 1000 test samples not used duringtraining. The mean SSIM of G P S is 0.98 and the mean SSIM of G SP is 0.78.3. Illustration of the use of prior information in SPID-GAN: (a) and (b) show the“true” log K and head ﬁelds, which are sampled only at monitoring locations (open circles);(c), (e), (g) show the top three log K realizations identiﬁed as the “closest” to the true headﬁeld, as measured using MSE between simulated and observed head values; (d), (f), (h)show the corresponding head ﬁelds, which resemble the true head ﬁeld.4. Illustration of the use of SPID-GAN to identify bidirectional mappings for a bimodalparameter distribution: subplot (a)-(d) show results from a test realization; subplot (e)-(h)show results from another test realization. The SSIM histograms obtained using 1000 testrealizations are shown in subplots (i)-(j). The mean SSIM of the forward generator is 0.8,and the mean SSIM of the reverse generator is 0.71. –17– igure 1. (a) SPID-GAN consists of two pairs of generators and discriminators trained together,one pair for identifying the forward parameter-to-state mapping ( G PS , D S ) (ﬁrst row), and the otherfor identifying the reverse state-to-parameter mapping ( G SP , D P ) (second row); cycle consistencyis enforced by minimizing the reconstruction loss (third column); observations of parameter andstate variables can be fused through pre- and post-processing steps (e.g., kriging and geostatisticalsimulation); (b) the design of generator model follows a downsampling-upsampling pattern usingconvolutional and deconvolutional layers to learn features at multiple scales; (c) the discriminatoruses repeating convolutional layers to improve learning of ﬁne-scale features.–18– igure 2. SPID-GAN results for two random realizations (a-d and e-h): (a) and (e) are originallog K realizations generated using sgsim , (c) and (d) are original head images simulated usingMODFLOW; head ﬁelds (b) and (f) are generated using the trained forward generator G PS , whileparameter ﬁelds (d) and (h) are generated by the trained reverse generator G SP . All contours arenormalized for visualization purposes. Subplots (i) and (j) show histograms of structural similarityindices (SSIM) calculated on 1000 test samples not used during training. The mean SSIM of G PS is 0.98 and the mean SSIM of G SP is 0.78. –19– igure 3. Illustration of the use of prior information in SPID-GAN: (a) and (b) show the “true”log K and head ﬁelds, which are sampled only at monitoring locations (open circles); (c), (e), (g)show the top three log K realizations identiﬁed as the “closest” to the true head ﬁeld, as measuredusing MSE between simulated and observed head values; (d), (f), (h) show the corresponding headﬁelds, which resemble the true head ﬁeld. –20– igure 4.igure 4.