[PDF] Space-Time Nonlinear Upscaling Framework Using Non-local Multi-continuum Approach

Abstract

In this paper, we develop a space-time upscaling framework that can be used for many challenging porous media applications without scale separation and high contrast. Our main focus is on nonlinear differential equations with multiscale coefficients. The framework is built on nonlinear nonlocal multi-continuum upscaling concept and significantly extends the results in the proceeding paper. Our approach starts with a coarse space-time partition and identifies test functions for each partition, which plays a role of multi-continua. The test functions are defined via optimization and play a crucial role in nonlinear upscaling. In the second stage, we solve nonlinear local problems in oversampled regions with some constraints defined via test functions. These local solutions define a nonlinear map from macroscopic variables determined with the help of test functions to the fine-grid fields. This map can be thought as a downscaled map from macroscopic variables to the fine-grid solution. In the final stage, we seek macroscopic variables in the entire domain such that the downscaled field solves the global problem in a weak sense defined using the test functions. We present an analysis of our approach for an example nonlinear problem. Our unified framework plays an important role in designing various upscaled methods. Because local problems are directly related to the fine-grid problems, it simplifies the process of finding local solutions with appropriate constraints. Using machine learning (ML), we identify the complex map from macroscopic variables to fine-grid solution. We present numerical results for several porous media applications, including two-phase flow and transport.

Full PDF

SSpace-Time Nonlinear Upscaling Framework Using Non-localMulti-continuum Approach

Wing T. Leung ∗ Eric T. Chung † Yalchin Efendiev ‡ Maria Vasilyeva § Mary Wheeler ¶ September 4, 2019

Abstract

In this paper, we develop a space-time upscaling framework that can be used for many challengingporous media applications without scale separation and high contrast. Our main focus is on nonlineardiﬀerential equations with multiscale coeﬃcients. The framework is built on nonlinear nonlocal multi-continuum upscaling concept [16] and signiﬁcantly extends the results in the proceeding paper [17].Our approach starts with a coarse space-time partition and identiﬁes test functions for each partition,which play a role of multi-continua. The test functions are deﬁned via optimization and play a crucial rolein nonlinear upscaling. In the second stage, we solve nonlinear local problems in oversampled regionswith some constraints deﬁned via test functions. These local solutions deﬁne a nonlinear map frommacroscopic variables determined with the help of test functions to the ﬁne-grid ﬁelds. This map can bethought as a downscaled map from macroscopic variables to the ﬁne-grid solution. In the ﬁnal stage, weseek macroscopic variables in the entire domain such that the downscaled ﬁeld solves the global problemin a weak sense deﬁned using the test functions. We present an analysis of our approach for an examplenonlinear problem.Our uniﬁed framework plays an important role in designing various upscaled methods. Becauselocal problems are directly related to the ﬁne-grid problems, it simpliﬁes the process of ﬁnding localsolutions with appropriate constraints [16]. Using machine learning (ML), we identify the complex mapfrom macroscopic variables to ﬁne-grid solution. We present numerical results for several porous mediaapplications, including two-phase ﬂow and transport.

Many porous media models are nonlinear and deriving these nonlinear macroscopic equations rely on someassumptions. For example, the well-known two-phase ﬂow and transport model assumes that the relativepermeabilities are functions of local saturations [6]. Similarly, for unsaturated ﬂows, the nonlinear relationsbetween pressures and capillary curves use local relations. All these problems have space-time heterogeneities.Some rigorous upscaling tools are needed to generalize these models and understand the errors associated inthese macroscopic models. This is one of our goals in this paper.Many approaches are suggested for nonlinear upscaling in the past, e.g., [2, 26, 3, 25, 13, 7, 9, 22, 1,20, 31, 32, 42, 40, 4, 37, 12, 15, 10, 44, 5, 11, 36, 41, 46]. For multi-phase ﬂows, these techniques includepermeability or transmissibility upscaling [21, 45, 8, 38] for single-phase ﬂow and pseudo-relative permeabilityapproach [8, 39, 6]. The pseudo-relative permeability approach computes nonlinear relative permeabilityfunctions. These nonlinear approaches are known to lack robustness and are process dependent [23, 24]. To ∗ ICES, University of Texas, Austin, TX, USA ( [email protected] ) † Department of Mathematics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China( [email protected] ) ‡ Department of Mathematics & Institute for Scientiﬁc Computation (ISC), Texas A&M University, College Station, Texas,USA ( [email protected] ) § Institute for Scientiﬁc Computation (ISC), Texas A&M University, College Station, Texas, USA ( [email protected] ) ¶ ICES, University of Texas, Austin, TX, USA ( [email protected] ) a r X i v : . [ m a t h . NA ] S e p igure 1: Schematic description of the method.overcome these diﬃculties, one needs a better understanding of nonlinear upscaling methods for space-timeheterogeneous problems. Nonlinear upscaling methods for scale separation cases are rigorously treated in[43, 27]. Among these approaches, some deal with problems that have both space and time heterogeneities.Our proposed approaches take their origin in the Constraint Energy Minimizing Generalized MultiscaleFinite Element Method (GMsFEM) and Nonlocal Multi-Continua upscaling, which are related. The mainidea of these approaches is to use multiple macroscopic parameters to represent the solution over eachcoarse-grid block. We refer to these degrees of freedom as continua, which are important for achievinga high order accuracy. We note that generalized continua concepts are also introduced in computationalmechanics [28], which include generalized continuum theories (e.g., [28]), computational continua framework(e.g., [35]), and other approaches. Computational continua ([35, 29]), which use nonlocal quadrature tocouple the coarse scale system stated on unions of some disjoint computational unit cells, are introduced fornon-scale-separation heterogeneous media. In [34, 33, 30], the computational continua with model reductiontechnique is combined.An important step that connects multiscale methods and upscaling techniques includes using basis func-tions such that the resulting degrees of freedom have physical meanings, typically averages of the solution.For nonlinear problems, using linear basis functions is not very suitable. The local problems are nonlin-ear problems. For this reason, in our ﬁrst work [17], we provided a framework for NLMC for stationaryproblems. In this paper, we provide a uniﬁed framework for nonlinear NLMC for problems with space-timeheterogeneities, analysis, and machine learning based simpliﬁed local solves.In Figure 1, we illustrate the main steps of our approach. Below, we brieﬂy describe them. In the ﬁrst step,we identify continua in each coarse block. This is done with the help of test functions, which can separatethe features that can not be localized within the region of inﬂuence (oversampling region designated withgreen color in Figure 1). For nonlinear problems, each continua is deﬁned by a corresponding test function.Continua play the role of macroscale variables. In our examples, macroscale variables are average solutionvalues in some selected heterogeneous regions (such as channels).In Step 2, once we identify the continua, we use oversampling regions to deﬁne downscaling maps. Theoversampling region represents the region of inﬂuence and thus, the macroscopic parameter interactionsare deﬁned within oversampling regions. The local nonlinear problems are formulated in the oversampledregions using constraints. However, these computations are expensive and require appropriate local problems.Instead, we propose to use local space-time models of the original PDEs and perform many tests withvarious boundary conditions and sources. These local solutions are used to train macroscopic parameters asa function of multiple macroscale continua variables. For machine learning, we use deep learning algorithms,which allow approximating complex multi-continua dependent functions.In Step 3, we seek a coarse-grid solution (the values in each continua) such that the downscaled global ﬁne-2 i K i,1 Figure 2: Schematic of the coarse grid K i , the oversampling region K i, and the ﬁne grids.scale solution satisﬁes the variational formulation that uses the test functions deﬁned in Step 1. An exampleof test functions that we use is piecewise constant functions in each subregions (deﬁned as channels). Then,the macroscale variables are average solutions deﬁned in these subregions. The corresponding downscaledmaps represent the local ﬁne-grid solutions given these constraints. The global coarse-grid formulation canbe thought as a mass balance equation formulated for each continua.The main contributions of this paper are the following: • Novel upscaled model for space-time; • Uniﬁed framework using test functions; • Easy local problems and machine learning calculations; • Numerical results that uses machine learning and nonlinear upscaled models.In the paper, we present an analysis of our approach for a model problem, which consists of heterogeneousp-Laplacian ( p = 2). This model problem requires nonlinear upscaling and some oversampling in order toshow an optimal convergence of our proposed approach.In conclusion, the paper is organized as follows. In Section 2, we give some preliminary results of thenonlocal multicontinua approach. In Section 3, we present our approach, which uses the space-time nonlocalmulticontinua approach. In this section, we present examples and convergence results. The numerical resultsare presented in Section 4. In this section, we will give a brief overview of the NLMC method for linear problems [16]. Our goal is tosummarize the key ideas and motivate our new space-time nonlinear NLMC method. We consider a modelelliptic equation with a heterogeneous coeﬃcient − ∇ · ( κ ∇ u ) = f, in Ω . (1)Here κ is the heterogeneous ﬁeld, f is a given source and Ω is the physical domain.The NLMC method is deﬁned on a coarse mesh, T H , of the domain Ω. We write T H = (cid:83) { K i | i =1 , · · · , N } , where K i denotes the i -th coarse element and N denotes the number of coarse elements in T H .For each coarse element K i , we deﬁne an oversampled region K + i , which is obtained by enlarging the coarseblock K i by a few coarse grid layers. We will also denote K + i = K i,l when the oversampling region is obtainedby enlarging K i by l coarse grid layers. See Figure 2 for an illustration of coarse grid and oversample region.In particular, a structured coarse grid is shown with boundaries of coarse elements are denoted red. A coarsecell K i is denoted green and its oversampled region K + i obtained by enlarging K i by one coarse grid layer isenclosed by black lines.The NLMC method consists of three main ingredients:3. Choice of continua.2. Local basis functions.3. Global coupling.For each coarse element K i , we will identify multiple continua corresponding to various solution features.This can be done via a local spectral problem or a suitable weight function. Using the deﬁnition of continua,we will deﬁne a set of local basis functions by solving some local problems on oversample regions. Then, theﬁnal NLMC system is deﬁned using these multiscale basis functions and a suitable variational formulation.In the following, we will discuss these concepts in detail.Now we will specify the deﬁnition of continuum that is used in our studies. For each coarse block K i , wewill identify a set of continua which are represented by a set of auxiliary basis functions φ ji , where j denotesthe j -th continuum. There are multiple ways to construct these functions φ ji .One way is to follow the idea proposed in CEM-GMsFEM [18]. In this framework, the auxiliary basisfunctions φ ji are obtained as the dominant eigenfunctions of a local spectral problem deﬁned on K i . Theseeigenfunctions can capture the heterogeneities and the contrast of the medium. We can also follow theframework in the original NLMC method [16], designed for ﬂows in fractured media, which can be easilymodiﬁed for general heterogeneous media. In this approach, one identiﬁes explicit information of fracturenetworks. The auxiliary basis functions φ ji are piecewise constant functions, namely, they equal one withinone fracture network and zero otherwise. Moreover, one can deﬁne the continua by using properties of theheterogeneous media. In this case, the auxiliary basis functions are piecewise constant functions deﬁned withrespect to a partition of the coarse cell K i , such as the medium coeﬃcients have a bounded contrast in eachsubregion [47].Once the auxiliary basis functions φ ji are speciﬁed, we can construct the required basis functions. Theidea generalizes the original energy minimization framework in CEM-GMsFEM. First, we denote the spaceof auxiliary basis functions as V aux . Consider a given coarse element K i and a given continuum j within K i .We will use the corresponding auxiliary basis function φ ji to construct our required multiscale basis function ψ ji by solving a problem in an oversampled region K + i . Speciﬁcally, we ﬁnd ψ ji ∈ H ( K + i ) and µ ∈ V aux such that (cid:90) K + i κ ∇ ψ ji · ∇ v + (cid:90) K + i ˜ κµv = 0 , ∀ v ∈ H ( K + i ) , (cid:90) K (cid:96) ˜ κψ ji φ (cid:96)m = δ j(cid:96) δ im , ∀ K (cid:96) ⊂ K + i , (2)where δ im denotes the standard delta function and ˜ κ is a weight function. We remark the function µ servesas a Lagrange multiplier for the constraints in the second equation of (2). We also remark that the basisfunction ψ ji has mean value one on the j -th continuum within K i and has mean value zero in all othercontinua in all coarse elements within K + i . In practice, the above system (2) is solved in K + i using a ﬁnemesh, which is typically a reﬁnement of the coarse grid. See Figure 2 for an illustration.Finally, we can derive the NLMC system. Let V ms be the space spanned by the basis functions { ψ ji } .We will represent the approximate solution u ms ∈ V ms as a linear combination of basis functions, namely, u ms = N (cid:88) i =1 (cid:88) j U ji ψ ji . Then, we will ﬁnd u ms by the following variational formulation a ( u ms , ψ ) = ( g, ψ ) , ∀ ψ ∈ V ms . This variational formulation results in the following upscaled model for the solution U = ( U ji ): A T U = F where the upscaled stiﬀness matrix A T is deﬁned as( A T ) ( i,(cid:96) ) jm = a ( ψ ij , ψ (cid:96)m ) := (cid:90) Ω κ ∇ ψ ij · ∇ ψ (cid:96)m , (3)4nd the upscaled source term F is deﬁned as ( F ) ( j ) i = ( g, ψ ji ) . We remark that the nonlocal connections of the continua are coupled by the matrix A T . We also remarkthat the local computation in (2) results from a spatial decay property of the multiscale basis function, see[18, 19, 14] for the theoretical foundation.The above NLMC idea can be extended to nonlinear elliptic problems, resulting in a nonlinear NLMCmethod (14)-(15). See Section 3.4 for the derivation and the convergence analysis. In this section, we present the nonlinear non-local multicontinua (NLMC) method. We will ﬁrst give somegeneral concept of the methodology in Section 3.1. Then, in Section 3.2, we give some illustrative examplesincluding linear problems and pseudomonotone problems. The main methodological details of the methodare presented in Section 3.3. Finally, we present a convergence analysis of the method for a model ellipticproblem in Section 3.4.

We will ﬁrst present some general concepts of our nonlinear NLMC using the following model nonlinearproblem

M U t + ∇ · G ( x, t, U ) = g, (4)where G is a nonlinear operator that has a multiscale dependence with respect to space (and time, in general)and M is a linear operator. In the above equation, U is the solution and g is a given source term. Ourmethod has three key ingredients, namely, the choice of continua, the construction of local downscaling mapand the construction of the coarse scale model. We will summarize these concepts in the following. • The choice of continua

The continua serve as our macroscopic variables in each coarse element. Our approach uses a set oftest functions to deﬁne the continua. To be more speciﬁc, we consider a coarse element K i . We willchoose a set of test functions { ψ ( j ) i ( x, t ) } to deﬁne our continua, where j denotes the j -th continuum.Using these test functions, we can deﬁne our macroscopic variables as U ( j ) i = (cid:104)(cid:104) U, ψ ( j ) i (cid:105)(cid:105) where (cid:104)(cid:104)· , ·(cid:105)(cid:105) is a space-time inner product. • The construction of local downscaling map

Our upscale model uses a local downscaling map to bring microscopic information to the coarse gridmodel. The proposed downscaling map is a function deﬁned on an oversampling region subject to someconstraints related to the macroscopic variables. In time-dependent problems, the oversampling regioncan be regarded as a zone of inﬂuence for coarse-grid variables deﬁned on the target coarse block K i .More precisely, we consider a coarse element K i , and an oversampling region K + i such that K i ⊂ K + i .Then we ﬁnd a function φ by solving the following local problem M φ t + ∇ · G ( x, t, φ ) = µ, in K + i . (5)The above equation (5) is solved subjected to constraints deﬁned by the following functionals I φ ( ψ ( j ) i ( x, t )) . This constraint ﬁxes some averages of φ with respect to ψ ( j ) i ( x, t ). We remark that the function µ serves as the Lagrange multiplier for the above constraints. This local solution builds a downscalingmap F msi : I φ ( ψ ( j ) i ( x, t )) → φ. The construction of coarse scale model

We will construct the coarse scale model using the test functions { ψ ( j ) i ( x, t ) } and the local downscalingmap. Our upscaling solution U ms is deﬁned as a combination of the local downscaling maps. Tocompute U ms , we use the following variational formulation (cid:104)(cid:104) M U mst + ∇ · G ( x, t, U ms ) , ψ ( j ) i (cid:105)(cid:105) = (cid:104)(cid:104) g, ψ ( j ) i (cid:105)(cid:105) . (6)The above equation (6) is our coarse scale model.We would like to brieﬂy summarize above steps. The ﬁrst step deﬁnes multicontinua, which play therole of macroscopic variables. They are critical in multiscale modeling and need to be deﬁned apriori. Thesecond step constructs downscaling maps and can be computationally intensive. We will propose a machinelearning technique in combination with solving local problems of the original equation subject to variousboundary conditions. From here, the macroscale ﬂuxes will be deﬁned as a function of macroscopic variablesin oversampled regions. This high dimensional functions will be learned using machine learning techniquesduring coarse-grid solution step (Step 3). Next, we will give some examples (Section 3.2) and then presenta more detailed description of the algorithm (Section 3.3). We will present two model problems, and discuss how our nonlinear NLMC is applied.

In this section, we will construct our upscaling model for a case that G is a linear operator. We will followthe general concepts in Section 3.1. First, we discuss the choice of continua. For each coarse element K i , weconsider a set of test functions { ψ ( j ) i ( x, t ) } deﬁned for x ∈ K i . Here the index j denotes the j -th continuum.One choice of these test functions is a set of piecewise constant functions. Another choice of these testfunctions is the ﬁrst j dominant eigenfunctions of an appropriate spectral problem.Next, we discuss the construction of the local downscaling map. We ﬁx a continuum ψ ( j ) i ( x, t ) in thecoarse region K i . Let K + i be an oversampling region. With the assumption that G is linear, we can representthe downscaling map, denoted by φ ( j ) i , as a linear combination of some generic local solutions { φ ( j,l ) i,m } . Toﬁnd these functions { φ ( j,l ) i,m } , we solve the following M ( φ ( j,l ) i,m ) t + ∇ · G ( x, t, φ ( j,l ) i,m ) = µ ( j,l ) i,m (cid:104)(cid:104) φ ( j,l ) i,m , ψ ( r ) s (cid:105)(cid:105) = δ lr δ ms (7)on the oversample region K + i , where (cid:104)(cid:104)· , ·(cid:105)(cid:105) is an inner product and µ ( j,l ) i,m plays the role of Lagrange multiplier.Using these functions { φ ( j,l ) i,m } , we can represent the local downscaling map ψ ( j ) i ( x, t ) as ψ ( j ) i ( x, t ) = (cid:88) m,l U ( l ) m φ ( j,l ) i,m . Since G is linear, we have G ( x, t, (cid:88) m,l U ( l ) m φ ( j,l ) i,m ) = (cid:88) m,l U ( l ) m G ( x, t, φ ( j,l ) i,m ) . Let { χ i } be a set of partition of unity functions corresponding to the partition { K + i } of the domain Ω. Theﬁnal upscale solution is then deﬁned as the combination φ := (cid:80) i (cid:80) j χ i φ ( j ) i . Using the test functions ψ ( j ) i ,we can compute the macroscopic value { U ( l ) m } by the following variational formulation (cid:104)(cid:104) M φ t + ∇ · G ( x, t, φ ) , ψ ( j ) i (cid:105)(cid:105) = (cid:104)(cid:104) g, ψ ( j ) i (cid:105)(cid:105) , ∀ ψ ( j ) i . (8)6 .2.2 Pseudomonotone case Next, we consider another example for which G is a pseudo-monotone operator. In this case, to computethe downscaling map, F ms , we will need to solve the following local problem: ﬁnd F ms ( U ) and µ such that M ( F ms ( U )) t + ∇ · G ( x, t, F ms ( U )) = µ (cid:104)(cid:104)F ms ( U ) , ψ ( j ) i (cid:105)(cid:105) = U ( j ) i (9)The coarse grid system is then deﬁned as (cid:88) l,m (cid:104)(cid:104) M ( F ms ( U )) t + ∇ · G ( x, t, F ms ( U )) , ψ ( j ) i (cid:105)(cid:105) = (cid:104)(cid:104) g, ψ ( j ) i ( x, t ) (cid:105)(cid:105) ∀ ψ ( j ) i . (10) In this section, we give the details of our nonlinear NLMC framework. We consider the following modelproblem of ﬁnding u ∈ V such that ∂ t u + L ( u ) = f, in Ω × (0 , T ]with u ( · ,

0) = 0, where L is a nonlinear diﬀerential operator, T > V is a suitable functionspace. We use a diﬀerent notation for nonlinear diﬀerential operator as in (4) to simplify the notations, andour methodology remains applicable to the problem described by (4).Next, we discuss the mesh. We assume that Ω is partitioned by a coarse mesh T H (see Figure 2) withmesh size H > , T ] is partitioned into coarse time intervals denoted as T T = { ( t i , t i +1 ] } . A space-time element K ( n,i ) is then deﬁned by K i × ( t n , t n +1 ] for a coarse cell K i ∈ T H and the n -th time interval( t n , t n +1 ]. The construction of our nonlinear NLMC method follows the three steps explained in Section 3.1. Approximation by global basis

The discussion of our method starts with the use of global basis functions. In this case, the basis functionsare global in space and in time. The motivation of this follows from the global basis of CEM-GMsFEM [18],for which coarse grid convergence is obtained. • Choice of continua

The continua is deﬁned using a set of test functions. Consider a space-time element K ( n,i ) , we willintroduce a set of test functions V aux = { ψ ( n,i ) j } which corresponding to diﬀerent continua of theproblem. We notice that ψ ( n,i ) j is supported in K ( n,i ) . We let N c be the number of such test functions.Then we will deﬁne macroscopic variables by U ( n,i ) j = s ( u, ψ ( n,i ) j ) = (cid:90) T (cid:90) Ω ˜ κψ ( n,j ) j u where s ( · , · ) is a weighted L inner product with weighting function ˜ κ such that c H − ≤ ˜ κ ≤ c H − .Note that this condition for the weighting function is motivated by the weighting function used inCEM-GMsFEM. • Global downscaling map

We will deﬁne a downscaling map. This downscaling map will give a function deﬁned globally in spaceand in time with constraints deﬁned using a given set of macroscopic values. More precisely, we ﬁx aset of macroscopic values { U ( n,i ) j } . We will then deﬁne a function F = ( F , F ) such that F ∈ V and F ∈ V aux . These functions are obtained by solving (cid:90) T (cid:90) Ω ( ∂ t + L ) F ( U ) v − s ( F ( U ) , v ) = 0 , ∀ v ∈ V,s ( F ( U ) , ψ ( n,i ) j ) = U ( n,i ) j , ∀ ψ ( n,i ) j ∈ V aux . We notice that the global function F has macroscopic values equal to the given values { U ( n,i ) j } andthe function F serves as the Lagrange multiplier for these constraints.7 Coarse grid model

Next, using the downscaling map, we can deﬁne the global coarse grid problem as: ﬁnding U ∈ R N c such that s ( F ( U ) , ψ ( n,i ) j ) = (cid:90) T (cid:90) Ω f ψ ( n,i ) j , ∀ ψ ( n,i ) j ∈ V aux . Then, the global numerical solution u glo is deﬁned by u glo = F ( U ). Nonlinear NLMC method

Now we will present the nonlinear NLMC method. The key ingredient is that we will replace the globaldownscaling map above by a local downscaling map. • Local downscaling map

We will introduce the localized downscaling operator F ms = ( F ms, , F ms, ). Consider a space-timeelement K ( n,i ) . We deﬁne a space-time oversampling region K ( n,i )+ = K + i × ( t − n , t n +1 ] where t − n < t n .We will then deﬁne a function F ( n,i ) loc = ( F ( n,i ) loc, , F ( n,i ) loc, ) such that F ( n,i ) loc, ∈ V ( K ( n,i )+ ) and F ( n,i ) loc, ∈ V aux ( K ( n,i )+ ), where V ( K ( n,i )+ ) and V aux ( K ( n,i )+ ) are restrictions of V and V aux on K ( n,i )+ respectively.These functions are obtained by solving (cid:90) t n +1 t − n (cid:90) K i + ( ∂ t + L ) F ( n,i ) loc, ( U ) v − s ( F ( n,i ) loc, ( U ) , v ) = 0 , ∀ v ∈ V ( K ( n,i )+ ) ,s ( F ( n,i ) loc, ( U ) , ψ ( n,i ) j ) = U ( n,i ) j , ∀ ψ ( n,i ) j ∈ V aux ( K ( n,i )+ ) . Finally the localized downscale operator is deﬁned by F ms,p ( U ) = (cid:80) n,l χ ( n,i ) F ( n,i ) loc,p ( U ) where p = 1 , χ ( n,i ) is a partition of unity such that (cid:80) n,i χ ( n,i ) ≡ . • Coarse grid model

The coarse grid problem is then deﬁned as: ﬁnding U ∈ R N c such that s ( F ms, ( U ) , ψ ( n,i ) j ) = (cid:90) T (cid:90) Ω f ψ ( n,i ) j , ∀ ψ ( n,i ) j ∈ V aux and the nonlinear NLMC solution u ms is deﬁned by u ms = F ms, ( U ). In this section, we present a concept of the analysis for the method. We will use a simple monotone ellipticequation to illustrate the main ideas. We consider the following problem: ﬁnd u such that ∇ · ( κ ( x, ∇ u )) = f, in Ω ,u = 0 , on ∂ Ω , (11)where κ ( x, v ) is a heterogeneous function. The weak formulation of the above equation can be written as:ﬁnd u ∈ V = H (Ω) such that A Ω ( u, w ) = (cid:90) Ω f w, ∀ w ∈ H (Ω) , where, for any open subset ω ⊂ Ω of the domain, the operator A ω is deﬁned by A ω ( u, w ) = (cid:90) ω κ ( x, ∇ u ) · ∇ w. We will assume that the heterogeneous function κ ( x, v ) satisﬁes the following two properties. Assumption on κ ( x, v ) 8. If the vector ﬁeld v = 0, then κ ( x, v ) = 0.2. Lipschitz continuity with respect to v :We assume there exist a function κ ∈ L ∞ (Ω) such that | κ ( x, z ) − κ ( x, v ) | ≤ C κ ( x ) | z − v | . (12)3. Monotonicity:We assume that the following coercivity condition holds κ ( x, v ) · v ≥ C κ ( x ) | v | . (13)Next, for any open subset ω ⊂ Ω of the domain, we deﬁne two inner products a ω ( · , · ) and s ω ( · , · ) asfollows a ω ( u, w ) = (cid:90) ω κ ∇ u · ∇ w and s ω ( u, w ) = (cid:90) ω ˜ κuw where ˜ κ ( x ) = κ (cid:80) i |∇ χ i | and { χ i } Ni =1 is a set of partition of unity functions corresponding to the coarsemesh such that 0 ≤ χ i ≤

1. The norms (cid:107) · (cid:107) a ( ω ) and (cid:107) · (cid:107) s ( ω ) corresponding to these inner products aredeﬁned as (cid:107) u (cid:107) a ( ω ) = a ω ( u, u ) and (cid:107) u (cid:107) s ( ω ) = s ω ( u, u )respectively. To simplify the notation, we use A , (cid:107) · (cid:107) a and (cid:107) · (cid:107) s to denote A Ω , (cid:107) · (cid:107) a (Ω) and (cid:107) · (cid:107) s (Ω) respectively.In the following Lemma, we will show that the operator A ω satisﬁes some coercivity and continuityproperties. Lemma 1.

For ω ⊂ Ω , u, v, w ∈ H ( ω ) , we have A ω ( u, u ) ≥ C (cid:107) u (cid:107) a ( ω ) and (cid:12)(cid:12)(cid:12) A ω ( u, w ) − A ω ( v, w ) (cid:12)(cid:12)(cid:12) ≤ C (cid:107) u − v (cid:107) a ( ω ) (cid:107) w (cid:107) a ( ω ) . Moreover, we have (cid:12)(cid:12)(cid:12) A ω ( u, w ) (cid:12)(cid:12)(cid:12) ≤ C (cid:107) u (cid:107) a ( ω ) (cid:107) w (cid:107) a ( ω ) . Proof.

By the assumption (13), we have A ω ( u, u ) = (cid:90) ω κ ( x, ∇ u ) · ∇ u ≥ C (cid:90) ω κ ( x ) |∇ u | = C (cid:107) u (cid:107) a ( ω ) . which proves the ﬁrst inequality. By the assumption (12), we have (cid:12)(cid:12)(cid:12) A ω ( u, w ) − A ω ( v, w ) (cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12) (cid:90) ω (cid:16) κ ( x, ∇ u ) − κ ( x, ∇ v ) (cid:17) · ∇ w (cid:12)(cid:12)(cid:12) ≤ (cid:90) ω (cid:12)(cid:12) κ ( x, ∇ u ) − κ ( x, ∇ v ) (cid:12)(cid:12) · (cid:12)(cid:12) ∇ w (cid:12)(cid:12) ≤ C (cid:90) ω κ (cid:12)(cid:12) ∇ ( u − v ) (cid:12)(cid:12) · (cid:12)(cid:12) ∇ w (cid:12)(cid:12) ≤ C (cid:107) u − v (cid:107) a ( ω ) (cid:107) w (cid:107) a ( ω ) which gives the second inequality. Recall that κ ( x,

0) = 0. Thus we have (cid:12)(cid:12)(cid:12) A ω ( u, w ) (cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12) A ω ( u, w ) − A ω (0 , w ) (cid:12)(cid:12)(cid:12) ≤ C (cid:107) u (cid:107) a ( ω ) (cid:107) w (cid:107) a ( ω ) which shows the third inequality. This completes the proof of this lemma.We next prove the following technical result. 9 emma 2. For ω ⊂ Ω , u, v ∈ H ( ω ) , we have (cid:12)(cid:12)(cid:12) A ω ( u, v ) (cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12) (cid:90) ω κ ( x, ∇ u ) · ∇ ( vχ i ) (cid:12)(cid:12)(cid:12) ≤ C (cid:107) u (cid:107) a ( ω ) (cid:16) (cid:107) v (cid:107) a ( ω ) + (cid:107) v (cid:107) s ( ω ) (cid:17) . Proof.

Notice that ∇ ( χ i v ) = v ∇ χ i + χ i ∇ v . Thus, we have (cid:90) ω κ ( x, ∇ u ) · ∇ ( vχ i ) = (cid:90) ω vκ ( x, ∇ u ) · ∇ χ i + (cid:90) ω χ i κ ( x, ∇ u ) · ∇ v. We will ﬁrst estimate the term (cid:82) ω vκ ( x, ∇ u ) · ∇ χ i . By the Cauchy-Schwarz inequality, we have | (cid:90) ω vκ ( x, ∇ u ) · ∇ χ i | ≤ (cid:16) (cid:90) ω κ − | κ ( x, ∇ u ) | (cid:17) (cid:16) (cid:90) ω κ |∇ χ i | | v | (cid:17) and, by using (12), we have κ − | κ ( x, ∇ u ) | ≤ C κ |∇ u | . Combining the above, we have | (cid:90) ω vκ ( x, ∇ u ) · ∇ χ i | ≤ C (cid:107) u (cid:107) a ( ω ) (cid:107) v (cid:107) s ( ω ) . To estimate the second term (cid:82) ω χ i κ ( x, ∇ u ) ∇ v , we use the fact that | χ i | ≤ (cid:12)(cid:12)(cid:12) (cid:90) ω χ i κ ( x, ∇ u ) ∇ v (cid:12)(cid:12)(cid:12) ≤ (cid:90) ω | κ ( x, ∇ u ) ||∇ v | ≤ C (cid:90) ω κ |∇ u ||∇ v | ≤ C (cid:107) u (cid:107) a ( ω ) (cid:107) v (cid:107) a ( ω ) . This completes the proof of this lemma.In the following, we will formulate our nonlinear NLMC method for the equation (11). The coarse scaledegrees of freedom (continua) U of the solution is deﬁned as U i,j = (cid:90) Ω ˜ κuµ i,j for some µ i,j ∈ L ∞ (Ω) where µ i,j | K m = 0 if m (cid:54) = i . The auxiliary space V aux is then deﬁned as V aux = span i,j { µ i,j } We remark that the functions in V aux deﬁnes the continua. In particular, µ i,j deﬁnes the j -th continuum inthe coarse cell K i .Next, to construct the numerical upscaling equation for (11), we will deﬁne a global downscaling operator F such that F ( U ) ∈ H (Ω) and (cid:90) Ω κ ( x, ∇ F ( U )) · ∇ v − s ( F ( U ) , v ) = 0 , ∀ v ∈ H (Ω) , (14) s ( F ( U ) , µ i,j ) = ¯ U i,j , ∀ µ i,j ∈ V aux . (15)Next, we will deﬁne a projection operator Π : V → V aux such that (cid:90) Ω ˜ κ Π( u ) µ = (cid:90) Ω ˜ κuµ, ∀ µ ∈ V aux . The global solution U glo ∈ V aux is deﬁned by (cid:90) Ω F ( U glo ) v = (cid:90) Ω f v, ∀ v ∈ V aux and the global downscaled solution u glo is deﬁned as u glo = F ( U glo ). Approximation by global basis

We summarize the main steps: 10. Find U glo ∈ V aux (cid:90) Ω F ( U glo ) v = (cid:90) Ω f v, ∀ v ∈ V aux . (16)2. Deﬁne u glo = F ( U glo ) . (17)Next, we will construct the nonlinear NLMC method. For each K ∈ T H , we will deﬁne a local downscalingoperator F loc,K such that F loc,K ( ¯ U ) ∈ H ( K + ) and (cid:90) K + κ ( x, ∇ F loc,K ( ¯ U )) · ∇ v − (cid:90) K + ˜ κF loc,K ( ¯ U ) v = 0 , ∀ v ∈ H ( K + ) , (cid:90) K + ˜ κF loc,K ( ¯ U ) µ i,j = ¯ U i,j , for µ i,j ∈ V aux . The multiscale solution U ms ∈ V aux is deﬁned by (cid:88) K (cid:90) K F loc,K ( U ms ) v = (cid:90) Ω f v, ∀ v ∈ V aux and the downscaled multiscale solution u ms is deﬁned as u ms = F ms ( U ms ) := (cid:80) K χ K F loc,K ( U ms ) where χ K is a partition of unity such that (cid:80) K χ K ≡ (cid:80) K |∇ χ K | ≤ C (cid:80) i |∇ χ i | and supp { χ K } ⊂ K + = ∪ K i ∩ K (cid:54) = ∅ K i . Nonlinear NLMC method

We summarize the main steps:1. Find U ms ∈ V aux (cid:88) K (cid:90) K F loc,K ( U ms ) v = (cid:90) Ω f v, ∀ v ∈ V aux . (18)2. Deﬁne u ms = F ms ( U ms ) . (19)The analysis of our scheme is based on three assumptions. We summarize them below. Assumption 1:

For all K ∈ T H , v ∈ V ( K ), we have (cid:107) ( I − π ) v (cid:107) s (cid:107) v (cid:107) a ≤ CH.

Assumption 2:

For K ∈ T H , v aux ∈ V aux ( K ), there exist a function w ∈ H ( K ) such that (cid:107) v aux (cid:107) s ( K ) ≤ s K ( v aux , w ) , (cid:107) w (cid:107) a ( K ) ≤ C (cid:107) v aux (cid:107) s ( K ) . Assumption 3:

There exist a C κ > (cid:107) v (cid:107) s ≤ C κ (cid:107) v (cid:107) a , ∀ v ∈ V. We will prove the following lemma for the stability of the downscale map.

Lemma 3.

By assumption 2, we have (cid:107) F ( U ) (cid:107) a ≤ CC C − (cid:107) U (cid:107) s and (cid:107) F loc,K ( U ) (cid:107) a ≤ CC C − (cid:107) U (cid:107) s roof. First, by Lemma 1 and (14), we have (cid:107) F ( U ) (cid:107) a ≤ C − A ( F ( U ) , F ( U )) = s ( F ( U ) , F ( U ))and by (15), we have s ( F ( U ) , F ( U )) = s ( U , F ( U )) . Therefore, we have (cid:107) F ( U ) (cid:107) a ≤ C − (cid:107) U (cid:107) s (cid:107) F ( U ) (cid:107) s . By Assumption 2, there exist a function w ∈ H ( K ) such that (cid:107) F ( U ) (cid:107) s ( K ) ≤ s ( F ( U ) , w ) and (cid:107) w (cid:107) a ≤ (cid:107) F ( U ) (cid:107) s ( K ) . Hence, we have (cid:107) F ( U ) (cid:107) s ( K ) ≤ s ( F ( U ) , w ) = (cid:90) Ω κ ( x, ∇ F ( U )) · ∇ w ≤ C (cid:107) F ( U ) (cid:107) a (cid:107) w (cid:107) a ≤ CC (cid:107) F ( U ) (cid:107) s ( K ) (cid:107) F ( U ) (cid:107) a . This shows the ﬁrst required inequality. Using a similar argument, we can prove that (cid:107) F loc,K ( U ) (cid:107) a ≤ CC C − (cid:107) U (cid:107) s . This completes the proof of the lemma.In the following lemma, we will give an error bound for the solution F ( U glo ). Lemma 4.

Let u be the solution of (11) and F ( U glo ) be the solution of (16)-(17). We have (cid:107) u − F ( U glo ) (cid:107) a ≤ CC − H (cid:107) ( I − Π)( f ˜ κ − ) (cid:107) s . Proof.

First of all, we note that F ( U glo ) = Π( f ˜ κ − ). So, we have A ( F ( U glo ) , v ) = (cid:90) Ω F ( U glo ) v = s (Π( f ˜ κ − ) , v ) , ∀ v ∈ H (Ω)and A ( u, v ) − A ( F ( U glo ) , v ) = s ( f ˜ κ − − Π( f ˜ κ − ) , v ) , ∀ v ∈ H (Ω) . Therefore, by (13), we have C (cid:107) u − F (Π u ) (cid:107) a ≤ A ( u, u − F ( U glo )) − A ( F ( U glo ) , u − F ( U glo ))= s (cid:16) f ˜ κ − − Π( f ˜ κ − ) , ( I − Π)( u − F ( U glo ) (cid:17) ≤ (cid:107) f ˜ κ − − Π( f ˜ κ − ) (cid:107) s (cid:107) ( I − Π)( u − F ( U glo ) (cid:107) s ≤ CH (cid:107) f ˜ κ − − Π( f ˜ κ − ) (cid:107) s (cid:107) ( u − F ( U glo ) (cid:107) a where the last inequality follows from Assumption 1. This completes the proof.In the next lemma, we give a localization result. To do so, we need some notations for the oversamplingdomain and the cutoﬀ function with respect to these oversampling domains. For each coarse cell K , wedenote K + m ⊂ Ω as the oversampling coarse region by enlarging K by m coarse grid layers. For M > m , wedeﬁne χ M,m ∈ span { χ i } such that 0 ≤ χ M,m ≤ χ M,m = 1 , in K + m , (20) χ M,m = 0 , in Ω \ K + M . (21)Note that, we have K + m ⊂ K + M . 12 emma 5. Assume K + M is an oversampling region obtained by enlarging the coarse cell K by M coarse gridlayers. Let η i = F i ( U ) − F loc,Ki ( U ) . We have (cid:107) η (cid:107) a ( K ) ≤ (1 − C − C − C ) M (cid:107) η (cid:107) a ( K + M ) and (cid:107) η (cid:107) s ( K ) ≤ CC (cid:107) η (cid:107) a ( K ) ≤ CC (1 − C − C − C ) M (cid:107) η (cid:107) a ( K + M ) . Proof.

The ﬁrst step of the proof is to show the following inequality (cid:90) K + m +1 κ |∇ ( F ( ¯ U ) − F locK ( ¯ U )) | ≤ C (cid:90) K + m +1 \ K + m κ |∇ ( F ( ¯ U ) − F locK ( ¯ U )) | , ∀ m ≤ M. (22)To do so, we denote f i = F i ( U ) and g i = F loc,Ki ( U ). By (13), we obtain C (cid:90) K + m +1 κ |∇ ( f − g ) | ≤ (cid:90) K + m +1 (cid:16) κ ( x, ∇ f ) − κ ( x, ∇ g ) (cid:17) · ∇ (cid:16) f − g (cid:17) . Recalling that η i = f i − g i . We notice that (cid:90) K + m +1 (cid:16) κ ( x, ∇ f ) − κ ( x, ∇ g ) (cid:17) · ∇ (cid:16) χ m +1 ,m η (cid:17) = s ( η , χ m +1 ,m η ) . Therefore, we have C (cid:90) K + m +1 κ |∇ ( f − g ) | ≤ s ( η , χ m +1 ,m η ) + (cid:90) K + m +1 (cid:16) κ ( x, ∇ f ) − κ ( x, ∇ g ) (cid:17) · ∇ (cid:16) (1 − χ m +1 ,m ) η (cid:17) = (cid:90) K m +1 \ K m ˜ κη χ m +1 ,m η − (cid:90) K m +1 \ K m (cid:16) κ ( x, ∇ f ) − κ ( x, ∇ g ) (cid:17) · ∇ (cid:16) χ m +1 ,m η (cid:17) . Next, we deﬁne K (cid:48) m = K + m +1 \ K + m and obtain − (cid:90) K (cid:48) m (cid:16) κ ( x, ∇ f ) − κ ( x, ∇ g ) (cid:17) · ∇ (cid:16) χ m +1 ,m η (cid:17) ≤ C (cid:90) K (cid:48) m κ |∇ f − ∇ g | · (cid:16) η ∇ χ m +1 ,m + χ m +1 ,m ∇ η (cid:17) ≤ C (cid:107) η (cid:107) a ( K (cid:48) m ) (cid:16) (cid:107) η (cid:107) a ( K (cid:48) m ) + (cid:107) η (cid:107) s ( K (cid:48) m ) (cid:17) . Therefore, we have C (cid:90) K + m +1 κ |∇ ( f − g ) | ≤ C (cid:107) η (cid:107) a ( K (cid:48) m ) ( (cid:107) η (cid:107) a ( K (cid:48) m ) + (cid:107) η (cid:107) s ( K (cid:48) m ) ) + (cid:90) K m +1 \ K m ˜ κη χ m +1 ,m η ≤ CC (cid:107) η (cid:107) a ( K (cid:48) m ) + ( (cid:90) K m +1 \ K m ˜ κη ) ( (cid:90) K m +1 \ K m ˜ κη ) ≤ CC (cid:107) η (cid:107) a ( K (cid:48) m ) + C (cid:107) η (cid:107) a ( K (cid:48) m ) (cid:107) η (cid:107) s ( K (cid:48) m ) . (23)Next we will estimate (cid:107) η (cid:107) s . By Assumption 2, for K ∈ T H , there exist a v ∈ H ( K ) such that (cid:107) η (cid:107) s ( K ) ≤ s ( η , v ) and (cid:107) v (cid:107) a ≤ C (cid:107) η (cid:107) s ( K ) . Thus, for K ⊂ K + M , we have (cid:107) η (cid:107) s ( K ) ≤ C (cid:82) K (cid:16) κ ( x, ∇ f ) − κ ( x, ∇ g ) (cid:17) · ∇ v (cid:107) v (cid:107) a ≤ CC (cid:107) η (cid:107) a ( K ) . (24)13ombining (23) and (24), we have C (cid:107) η (cid:107) a ( K + m +1 ) ≤ CC (cid:107) η (cid:107) a ( K (cid:48) m ) . This shows (22).By using (22), we have (cid:107) η (cid:107) a ( K + m ) = (cid:107) η (cid:107) a ( K + m +1 ) − (cid:107) η (cid:107) a ( K (cid:48) m ) ≤ (1 − C − C − C ) (cid:107) η (cid:107) a ( K + m +1 ) and we therefore obtain (cid:107) η (cid:107) a ( K ) ≤ (1 − C − C − C ) m (cid:107) η (cid:107) a ( K + m ) . This gives the ﬁrst required inequality. The second required inequality follows from (24). This completesthe proof of tis lemma.The following result gives an error estimate for our nonlinear NLMC solution.

Theorem 1.

Consider the oversampling domain K + M obtained by enlarging K by M coarse cell layers. Let u be the solution of (11) and F ms ( U ms ) be the solution of (18)-(19). Then we have (cid:107) F ms ( U ms ) − u (cid:107) a ≤ CH + C ( M ) + C ( M ) where C ( M ) = CC C − (1 − C − C − C ) M M d (cid:107) U ms (cid:107) s ,C ( M ) = CC κ C C − (1 − C − C − C ) M M d (cid:107) U ms (cid:107) s . Moreover, if M ∼ O (cid:16) log( H − ) + log( C κ ) (cid:17) and H ≤ , then we have (cid:107) F ms ( U ms ) − u (cid:107) a ≤ CH.

Proof.

We will analyze the error by ﬁrst separating the error into three parts as follows (cid:107) F ms ( U ms ) − u (cid:107) a ≤ (cid:107) F ( U glo ) − u (cid:107) a + (cid:107) F ( U ms ) − F ms ( U ms ) (cid:107) a + (cid:107) F ( U glo ) − F ms ( U ms ) (cid:107) a . By Lemma 4, we have (cid:107) F ( U glo ) − u (cid:107) a ≤ CH (cid:107) ( I − Π)( f ˜ κ − ) (cid:107) s and by Lemma 5, we have (cid:107) F ( U ms ) − F ms ( U ms ) (cid:107) a ≤ (cid:88) K (cid:107) F ( U ms ) − F ms ( U ms ) (cid:107) a ( K ) ≤ (1 − C − C − C ) M (cid:88) K (cid:107) F ( U ms ) − F ms ( U ms ) (cid:107) a ( K M ) ≤ − C − C − C ) M (cid:88) K (cid:16) (cid:107) F ( U ms ) (cid:107) a ( K M ) + (cid:107) F ms ( U ms ) (cid:107) a ( K M ) (cid:17) . By Lemma 3, we have (cid:107) F ( U ms ) (cid:107) a ≤ CC C − (cid:107) U ms (cid:107) s and (cid:107) F ms ( U ms ) (cid:107) a ( K M ) ≤ CC C − (cid:107) U ms (cid:107) s ( K M ) . Therefore, we obtain (cid:88) K (cid:16) (cid:107) F ( U ms ) (cid:107) a ( K M ) + (cid:107) F ms ( U ms ) (cid:107) a ( K M ) (cid:17) ≤ CC C − M d (cid:107) U ms (cid:107) s . (cid:107) F ( U glo ) − F ( U ms ) (cid:107) a . By Lemma 1 and Assumption 3, we have (cid:107) F ( U glo ) − F ( U ms ) (cid:107) a ≤ C − s ( F ( U glo ) − F ( U ms ) , F ( U glo ) − F ( U ms )) ≤ C − (cid:107) F ( U glo ) − F ( U ms ) (cid:107) s (cid:107) F ( U glo ) − F ( U ms ) (cid:107) s ≤ C κ C − (cid:107) F ( U glo ) − F ( U ms ) (cid:107) s (cid:107) F ( U glo ) − F ( U ms ) (cid:107) a and by Lemma 5, we have (cid:88) K (cid:107) F ( U glo ) − F ( U ms ) (cid:107) s ( K ) = (cid:88) K (cid:107) F loc,K ( U ms ) − F ( U ms ) (cid:107) s ( K ) ≤ CC (1 − C − C − C ) M (cid:88) K (cid:107) F loc,K ( U ms ) − F ( U ms ) (cid:107) a ( K + M ) ≤ CC C − (1 − C − C − C ) M M d (cid:107) U ms (cid:107) s . Therefore, we have (cid:107) F ( U glo ) − F ( U ms ) (cid:107) a ( K ) ≤ CC κ C C − (1 − C − C − C ) M M d (cid:107) U ms (cid:107) s . Combining the above results, we obtain (cid:107) F ms ( U ms ) − u (cid:107) a ≤ CH (cid:107) ( I − Π) f ˜ κ (cid:107) s + C ( C C − + C κ C C − )(1 − C − C − C ) M M d (cid:107) U ms (cid:107) s . To show the second part of the theorem, we notice that (cid:107) U ms (cid:107) s ≤ (cid:107) F ms ( U ms ) (cid:107) s ≤ C κ (cid:107) F ms ( U ms ) (cid:107) a ≤ C κ (cid:16) (cid:107) F ms ( U ms ) − u (cid:107) a + (cid:107) u (cid:107) a (cid:17) . If M is large enough such that M ≥ (cid:16) C ( C C − + C κ C C − ) (cid:17) + 2 log( C κ ) + log( H − ) − d log( M )log (cid:16) (1 − C − C − C ) − (cid:17) , then we have CC κ ( C C − + C κ C C − )(1 − C − C − C ) M M d ≤ H ≤ (cid:107) F ms ( U ms ) − u (cid:107) a ≤ CH (cid:107) ( I − Π)( f ˜ κ − ) (cid:107) s + C ( C C − + C κ C C − )(1 − C − C − C ) M M d (cid:107) U ms (cid:107) s ≤ CH (cid:107) ( I − Π)( f ˜ κ − ) (cid:107) s + 12 (cid:107) F ms ( U ms ) − u (cid:107) a + H (cid:107) u (cid:107) a . This completes the proof of the theorem.

In this section, we present numerical results for the proposed method. In our examples, we will use simpliﬁedlocal problems to compute macroscale parameters. These local computations will involve machine learningalgorithms. We consider following model problems in fractured and heterogeneous porous media:

Test 1 : Nonlinear ﬂow problem (unsaturated ﬂow problem)

Test 2 : Nonlinear transport and ﬂow problem (two-phase ﬂow problem)15igure 3: Coarse mesh with source term and fracture positions (left). Heterogeneous porous matrix perme-ability in Ω (right)We solve model problem in Ω = [0 , × [0 ,

1] with no ﬂux boundary conditions. Heterogeneous porousmatrix permeability and location of the source terms and fracture position are depicted in Figure 3. We setsource terms f ± = ± q , q = 10 . We use 10 ×

10 coarse grid and 160 ×

160 ﬁne grid.

Test 1.

We consider the solution of the nonlinear equation in fractured heterogeneous media. For thenonlinear coeﬃcients, we use k αβ ( x, u ) = k s ( x ) k r ( u ) with k r ( u ) = exp( − a | u | ), a = 0 . α, β = m, f ). Weset c m = 1, c f = 0, k fs = 10 and T max = 10 − with 20 time steps. Test 2.

We consider the solution of the two-phase ﬂow problem in fractured and heterogeneous porousmedia. For nonlinear coeﬃcients, we set λ w ( s ) = s and λ n ( s ) = (1 − s ) . We set φ α = 1 ( α = m, f ), k f = 10 and T max = 6 . · − with 700 time steps.MSE RMSE (%) MAE (%) Test 1

N N N N N N N N Test 2

N N N N N N N N Test 1 and

Test 2

Figure 4: Reference ﬁne grid solution ( u fine ), mean value on coarse grid of the ﬁne grid solution ( u fine ),coarse grid solution using upscaling method ( u UP ) and coarse grid solution using nonlinear nonlocal machinelearning method ( u NL ). Nonlinear ﬂow problem ( Test 1 ). Pressure on ﬁnal time t m , m = 20Each sample X l contains the information about heterogeneous permeability and fracture positions up to16igure 5: Reference ﬁne grid solution ( s fine , p fine ), mean value on coarse grid of the ﬁne grid solution ( s fine , p fine ), coarse grid solution using upscaling method ( s UP , p UP ) and coarse grid solution using nonlinearnonlocal machine learning method ( s NL , p NL ). Nonlinear ﬂow and transport problem ( Test 2 ). First row:saturation for time t m , m = 300. Second row: saturation for time t m , m = 700. Third row: pressure fortime t m , m = 700the ﬁne grid resolution in local domain, coarse grid mean value of the solution in oversampled local domain Test 1 : X l = ( X kl , X fl , X p m l + ) , Test 2 : X l = ( X kl , X fl , X p α l + , X s α l + , X p β l + , X s β l + )and output Test 1 : Y l = ( T αβ,NLl ) , α, β = m, , Test 2 : Y l = ( T αβ,NLl , T w,αβ,NLl ) , α, β = m, f. Each dataset is divided into training and validation sets with 80 : 20 ratio.For the training of the neural networks, we use a global dataset, where we extract local informationfrom the ﬁne grid calculations on the global domain Ω. We train four neural networks for each type oftransmissibility:

N N for horizontal coarse edges for matrix-matrix ﬂow, N N for vertical coarse edges s formatrix-matrix ﬂow, N N for matrix - fracture ﬂow and N N for fracture - fracture ﬂow. For calculations, weuse 150 epochs with a batch size N b = 90 and Adam optimizer with learning rate (cid:15) = 0 . × × X k and X f , and 3 × X p m . For17ach input data, we have 2 layers of CNN with one ﬁnal fully connected layer. Convolution layer contains 8and 16 feature maps for X k and X f ; and 4 and 8 feature maps for X p m . We use dropout with rate 10 %in each layer in order to prevent over-ﬁtting. Finally, we combine CNN output and perform two additionalfully connected layers with size 200 and 1(one ﬁnal output). Presented algorithm is used to learn dependencebetween multi-input data and upscaled nonlinear transmissibilities.For error calculation on the dataset, we used mean square errors, relative mean absolute and relativeroot mean square errors M SE = (cid:88) i | Y i − ˜ Y i | , RM SE = (cid:115) (cid:80) i | Y i − ˜ Y i | (cid:80) i | Y i | , M AE = (cid:80) i | Y i − ˜ Y i | (cid:80) i | Y i | , where Y i and ˜ Y i denotes reference and predicted values for sample X i Learning performance for neuralnetworks are presented in Table 1 for

Test 1 and

Test 2 . We observe a good convergence with small errorfor each neural network.Next, we consider errors between solution of the coarse grid problem with the reference and predictedupscaled transmissibilities. To measure diﬀerence between reference solution and coarse grid solution, wecompute relative L error e ( u ) = (cid:118)(cid:117)(cid:117)(cid:116) (cid:80) N H i =1 ( u finei − u i ) (cid:80) N H i =1 ( u finei ) , where u = p, s , u fine is the reference solution (mean value on coarse grid of the ﬁne grid solution) and u isthe solution on the coarse grid. In Figure 4, we depict solution of the problem for Test 1 on the ﬁne grid,coarse grid upscaled solution using classic approach and for new method presented ( u fine , u fine , u UP and u NL ). We have e ( u UP ) = 11 . e ( u NL ) = 2 . Test 2 . On the ﬁrst column, we depict a referenceﬁne grid solution ( s fine , p fine ), mean value on coarse grid of the ﬁne grid solution ( s fine , p fine ) on the secondcolumn, coarse grid solution using upscaling method ( s UP , p UP ) on the third column and coarse grid solutionusing nonlinear nonlocal machine learning method ( s NL , p NL ) on the fourth column. On the ﬁrst, secondand third rows, we show a saturation for time t m , m = 300 ,

700 and on fourth row, we have pressure for time t m , m = 700. Fine grid (reference) solution is performed using ﬁnite volume approximation with embeddeddiscrete fracture model, where for error calculations, we used a mean values of the reference solution on thecoarse grid, p fine and s fine . On the last column of the Figure 5, we depict a coarse grid solution usingnonlinear nonlocal transmissibilities that calculate based on the machine learning approach. For machinelearning approach, we have e ( p NL ) = 0 . e ( s NL ) = 3 . e ( p UP ) = 14 . e ( s UP ) = 13 . t m , m = 700. In the paper, we present a general nonlinear upscaling framework for nonlinear diﬀerential equations withmultiscale coeﬃcients. The framework is built on nonlinear nonlocal multi-continuum upscaling concept. Theapproach ﬁrst identiﬁes test functions for each coarse block, which are used to identify macroscale variables(called continua). In the second stage, we solve nonlinear local problems in oversampled regions with someconstraints deﬁned via test functions. Simpliﬁed local problems are proposed for numerical results. Deeplearning algorithms are used to approximate the nonlinear ﬂuxes that are derived in nonlinear upscaling. Inthe ﬁnal stage, macroscale formulation is given and it seeks the values of macroscopic variables such that thedownscaled ﬁeld solves the global problem in a weak sense deﬁned using the test function. We present ananalysis of our approach for an example nonlinear problem. We present numerical results for several porousmedia applications, including two-phase ﬂow and transport.

Acknowledgements

The research of Eric Chung is partially supported by the Hong Kong RGC General Research Fund (Projectnumbers 14304217 and 14302018) and CUHK Faculty of Science Direct Grant 2018-19.18 eferences [1] Assyr Abdulle and Yun Bai. Adaptive reduced basis ﬁnite element heterogeneous multiscale method.

Comput. Methods Appl. Mech. Engrg. , 257:203–220, 2013.[2] G. Allaire and R. Brizzi. A multiscale ﬁnite element method for numerical homogenization.

SIAM J.Multiscale Modeling and Simulation , 4(3):790–812, 2005.[3] T. Arbogast. Implementation of a locally conservative numerical subgrid upscaling scheme for two-phaseDarcy ﬂow.

Comput. Geosci , 6:453–481, 2002.[4] T. Arbogast, G. Pencheva, M.F. Wheeler, and I. Yotov. A multiscale mortar mixed ﬁnite elementmethod.

SIAM J. Multiscale Modeling and Simulation , 6(1):319–346, 2007.[5] T. Arbogast, G. Pencheva, M.F. Wheeler, and I. Yotov. A multiscale mortar mixed ﬁnite elementmethod.

Multiscale Model. Simul. , 6(1):319–346, 2007.[6] J.W. Barker and S. Thibeau. A critical review of the use of pseudorelative permeabilities for upscaling.

SPE Reservoir Eng. , 12:138–143, 1997.[7] Donald L Brown and Daniel Peterseim. A multiscale method for porous microstructures. arXiv preprintarXiv:1411.1944 , 2014.[8] Y. Chen, L. Durlofsky, M. Gerritsen, and X. Wen. A coupled local-global upscaling approach forsimulating ﬂow in highly heterogeneous formations.

Advances in Water Resources , 26:1041–1060, 2003.[9] E. Chung, Y. Efendiev, and S. Fu. Generalized multiscale ﬁnite element method for elasticity equations.

International Journal on Geomathematics , 5(2):225–254, 2014.[10] E. Chung, Y. Efendiev, and W. T. Leung. Generalized multiscale ﬁnite element method for wavepropagation in heterogeneous media.

SIAM Multicale Model. Simul. , 12:1691–1721, 2014.[11] E. Chung and W. T. Leung. A sub-grid structure enhanced discontinuous galerkin method for multiscalediﬀusion and convection-diﬀusion problems.

Communications in Computational Physics , 14:370–392,2013.[12] E. T. Chung, Y. Efendiev, W.T. Leung, M. Vasilyeva, and Y. Wang. Online adaptive local multiscalemodel reduction for heterogeneous problems in perforated domains.

Applicable Analysis , 96(12):2002–2031, 2017.[13] E. T. Chung, Y. Efendiev, and G. Li. An adaptive GMsFEM for high contrast ﬂow problems.

J.Comput. Phys. , 273:54–76, 2014.[14] Eric Chung, Yalchin Efendiev, and Wing Tat Leung. Constraint energy minimizing generalized mul-tiscale ﬁnite element method in the mixed formulation.

Computational Geosciences , 22(3):677–693,2018.[15] Eric Chung, Maria Vasilyeva, and Yating Wang. A conservative local multiscale model reduction tech-nique for stokes ﬂows in heterogeneous perforated domains.

Journal of Computational and AppliedMathematics , 321:389–405, 2017.[16] Eric T Chung, Efendiev, Wing Tat Leung, Maria Vasilyeva, and Yating Wang. Non-local multi-continuaupscaling for ﬂows in heterogeneous fractured media. arXiv preprint arXiv:1708.08379 , 2018.[17] Eric T Chung, Yalchin Efendiev, Wing T Leung, and Mary Wheeler. Nonlinear nonlocal multicon-tinua upscaling framework and its applications.

International Journal for Multiscale ComputationalEngineering , 16(5), 2018.[18] Eric T Chung, Yalchin Efendiev, and Wing Tat Leung. Constraint energy minimizing generalizedmultiscale ﬁnite element method.

Computer Methods in Applied Mechanics and Engineering , 339:298–319, 2018. 1919] Eric T Chung, Yalchin Efendiev, and Wing Tat Leung. Fast online generalized multiscale ﬁnite elementmethod using constraint energy minimization.

Journal of Computational Physics , 355:450–463, 2018.[20] Martin Drohmann, Bernard Haasdonk, and Mario Ohlberger. Reduced basis approximation for nonlinearparametrized evolution equations based on empirical operator interpolation.

SIAM J. Sci. Comput. ,34(2):A937–A969, 2012.[21] L.J. Durlofsky. Numerical calculation of equivalent grid block permeability tensors for heterogeneousporous media.

Water Resour. Res. , 27:699–708, 1991.[22] W. E and B. Engquist. Heterogeneous multiscale methods.

Comm. Math. Sci. , 1(1):87–132, 2003.[23] Y. Efendiev and L.J. Durlofsky. Numerical modeling of subgrid heterogeneity in two phase ﬂow simu-lations.

Water Resour. Res. , 38(8):1128, 2002.[24] Y. Efendiev and L.J. Durlofsky. A generalized convection-diﬀusion model for subgrid transport in porousmedia.

SIAM J. Multiscale Modeling and Simulation , 1(3):504–526, 2003.[25] Y. Efendiev, J. Galvis, and T. Y. Hou. Generalized multiscale ﬁnite element methods (gmsfem).

Journalof Computational Physics , 251:116–135, 2013.[26] Y. Efendiev, J. Galvis, and X.H. Wu. Multiscale ﬁnite element methods for high-contrast problemsusing local spectral basis functions.

Journal of Computational Physics , 230:937–955, 2011.[27] Y. Efendiev and A. Pankov. Numerical homogenization of nonlinear random parabolic operators.

SIAMJ. Multiscale Modeling and Simulation , 2(2):237–268, 2004.[28] DA Fafalis, SP Filopoulos, and GJ Tsamasphyros. On the capability of generalized continuum theoriesto capture dispersion characteristics at the atomic scale.

European Journal of Mechanics-A/Solids ,36:25–37, 2012.[29] Dimitrios Fafalis and Jacob Fish. Computational continua for linear elastic heterogeneous solids onunstructured ﬁnite element meshes.

International Journal for Numerical Methods in Engineering ,115(4):501–530, 2018.[30] Jacob Fish.

Practical multiscaling . John Wiley & Sons, 2013.[31] Jacob Fish and Wen Chen. Space–time multiscale model for wave propagation in heterogeneous media.

Computer Methods in applied mechanics and engineering , 193(45):4837–4856, 2004.[32] Jacob Fish and Rong Fan. Mathematical homogenization of nonperiodic heterogeneous media subjectedto large deformation transient loading.

International Journal for numerical methods in engineering ,76(7):1044–1064, 2008.[33] Jacob Fish, Vasilina Filonova, and Dimitrios Fafalis. Computational continua revisited.

InternationalJournal for Numerical Methods in Engineering , 102(3-4):332–378, 2015.[34] Jacob Fish, Vasilina Filonova, and Zheng Yuan. Reduced order computational continua.

ComputerMethods in Applied Mechanics and Engineering , 221:104–116, 2012.[35] Jacob Fish and Sergey Kuznetsov. Computational continua.

International Journal for NumericalMethods in Engineering , 84(7):774–802, 2010.[36] Jacob Fish, Kamlun Shek, Muralidharan Pandheeradi, and Mark S Shephard. Computational plasticityfor composite structures based on mathematical homogenization: Theory and practice.

ComputerMethods in Applied Mechanics and Engineering , 148(1-2):53–73, 1997.[37] Patrick Henning and Mario Ohlberger. The heterogeneous multiscale ﬁnite element method for elliptichomogenization problems in perforated domains.

Numerische Mathematik , 113(4):601–629, 2009.2038] L. Holden and B.F. Nielsen. Global upscaling of permeability in heterogeneous reservoirs: the OutputLeast Squares (OLS method.

Transport in Porous Media , 40:115–143, 2000.[39] J.R. Kyte and D.W. Berry. New pseudofunctions to control numerical dispersion.

Society of PetroleumEngineers Journal , 15(4):269–276, 1975.[40] Ana-Maria Matache and Christoph Schwab. Two-scale fem for homogenization problems.

ESAIM:Mathematical Modelling and Numerical Analysis , 36(04):537–572, 2002.[41] Caglar Oskay and Jacob Fish. Eigendeformation-based reduced order homogenization for failure analysisof heterogeneous materials.

Computer Methods in Applied Mechanics and Engineering , 196(7):1216–1243, 2007.[42] H. Owhadi and L. Zhang. Metric-based upscaling.

Comm. Pure. Appl. Math. , 60:675–723, 2007.[43] A. Pankov. G -convergence and homogenization of nonlinear partial diﬀerential operators . Kluwer Aca-demic Publishers, Dordrecht, 1997.[44] M. Peszy´nska, M. Wheeler, and I. Yotov. Mortar upscaling for multiphase ﬂow in porous media. Comput.Geosci. , 6(1):73–100, 2002.[45] X.H. Wu, Y. Efendiev, and T.Y. Hou. Analysis of upscaling absolute permeability.

Discrete andContinuous Dynamical Systems, Series B. , 2:158–204, 2002.[46] Zheng Yuan and Jacob Fish. Multiple scale eigendeformation-based reduced order homogenization.

Computer Methods in Applied Mechanics and Engineering , 198(21-26):2016–2038, 2009.[47] Lina Zhao and Eric T Chung. An analysis of the NLMC upscaling method for high contrast problems. arXiv preprint arXiv:1904.11124arXiv preprint arXiv:1904.11124