[PDF] Improving prediction of the terrestrial water storage anomalies during the GRACE and GRACE-FO gap with Bayesian convolutional neural networks

Abstract

The Gravity Recovery and Climate Experiment (GRACE) satellite and its successor GRACE Follow-On (GRACE-FO) provide valuable and accurate observations of terrestrial water storage anomalies (TWSAs) at a global scale. However, there is an approximately one-year observation gap of TWSAs between GRACE and GRACE-FO. This poses a challenge for practical applications, as discontinuity in the TWSA observations may introduce significant biases and uncertainties in the hydrological model predictions and consequently mislead decision making. To tackle this challenge, a Bayesian convolutional neural network (BCNN) driven by climatic data is proposed in this study to bridge this gap at a global scale. Enhanced by integrating recent advances in deep learning, including the attention mechanisms and the residual and dense connections, BCNN can automatically and efficiently extract important features for TWSA predictions from multi-source input data. The predicted TWSAs are compared to the hydrological model outputs and three recent TWSA prediction products. The comparison suggests the superior performance of BCNN in providing improved predictions of TWSAs during the gap in particular in the relatively arid regions. The BCNN's ability to identify the extreme dry and wet events during the gap period is further discussed and comprehensively demonstrated by comparing with the precipitation anomalies, drought index, ground/surface water levels. Results indicate that BCNN is capable of offering a reliable solution to maintain the TWSA data continuity and quantify the impacts of climate extremes during the gap.

Full PDF

mmanuscript submitted to

AGU Journal

Filling the gap between GRACE- andGRACE-FO-derived terrestrial water storage anomalieswith Bayesian convolutional neural networks

Shaoxing Mo , Yulong Zhong , Xiaoqing Shi , Wei Feng , , Xin Yin , andJichun Wu Key Laboratory of Surﬁcial Geochemistry of Ministry of Education, School of Earth Sciences andEngineering, Nanjing University, Nanjing, China School of Geography and Information Engineering, China University of Geosciences (Wuhan), Wuhan,China State Key Laboratory of Geodesy and Earth’s Dynamics, Institute of Geodesy and Geophysics, InnovationAcademy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan, China School of Geospatial Engineering and Science, Sun Yat-Sen University, Zhuhai, China State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Nanjing HydraulicResearch Institute, Nanjing, China

Key Points: • A Bayesian deep learning method is proposed for ﬁlling the gap between GRACE-and GRACE-FO-derived terrestrial water storage anomalies • A state-of-the-art performance is obtained in bridging the gap at a global scale (ex-cluding Antarctica) • Extreme dry and wet events during the gap are successfully identiﬁed

Corresponding author: Xiaoqing Shi, [email protected]

Corresponding author: Wei Feng, [email protected] –1– a r X i v : . [ phy s i c s . a o - ph ] J a n anuscript submitted to AGU Journal

Abstract

There is an approximately one-year observation gap of terrestrial water storage anomalies(TWSAs) between the Gravity Recovery and Climate Experiment (GRACE) satellite andits successor GRACE Follow-On (GRACE-FO). This poses a challenge for water resourcesmanagement, as discontinuity in the TWSA observations may introduce signiﬁcant biasesand uncertainties in hydrological model predictions and consequently mislead decision mak-ing. To tackle this challenge, a Bayesian convolutional neural network (BCNN) is proposedin this study to bridge this gap using climatic data as inputs. Enhanced by integrating re-cent advances in deep learning, BCNN can eﬃciently extract important features for TWSApredictions from multi-source input data. The predicted TWSAs are compared to the hy-drological model outputs and three recent TWSA prediction products. Results suggest thesuperior performance of BCNN in bridging the gap. The extreme dry and wet events duringthe gap period are also successfully identiﬁed by BCNN.

Plain Language Summary

The remote sensing satellites Gravity Recovery and Climate Experiment (GRACE)provide valuable and accurate observations of terrestrial water storage changes at a globalscale. These observations have been used widely for sustainable water resources manage-ment and understanding water cycle. However, there is an about one-year gap that theGRACE observations are missing due to a break-in period between two satellites. Fillingthis gap is thus of crucial signiﬁcance for practical hydrological, agricultural, and ecologicalapplications. We propose in this work a Bayesian convolutional neural network (BCNN)by leveraging recent advances in deep learning to bridge this gap based on climatic data.Results show that our BCNN achieves a state-of-the-art performance in terms of ﬁllingaccuracy compared to previous studies.

The Gravity Recovery and Climate Experiment (GRACE) satellite and its successorGRACE Follow-On (GRACE-FO) provide unprecedentedly accurate observations of thespatio-temporal dynamics of terrestrial water storage anomalies (TWSAs) at a global scale.These TWSA observations have been widely utilized together with hydrological models toassess water cycle, droughts and ﬂoods, the impacts of changing climate on terrestrial waterstorage, etc (Feng et al., 2018; Gentine et al., 2019; Rateb et al., 2020; Richey et al., 2015;Rodell et al., 2018; Soltani et al., 2021; Tapley et al., 2019). Particularly, their applications ingroundwater-related studies (e.g., the groundwater depletion and land subsidence problemsin North China Plain and California’s Central Valley) are more extensive due to the diﬃcult-to-observe nature of groundwater systems, substantially augmenting our knowledge towardthe systems and thus beneﬁting groundwater resources management (e.g., Chang et al.,2020; Chen et al., 2014; Famiglietti et al., 2011; Feng et al., 2013, 2018; B. Li et al., 2019;Smith & Majumdar, 2020; Zhong et al., 2018).However, there is an approximately one-year gap of TWSA observations between theGRACE and GRACE-FO missions. Considering that the TWSA observations are usuallyassimilated in the hydrological models for higher reliability (B. Li et al., 2019; Soltani et al.,2021; Yin et al., 2020; Zaitchik et al., 2008), discontinuity in the time series observations mayintroduce signiﬁcant biases and uncertainties in model predictions and consequently misleaddecision making (A. Y. Sun et al., 2020). Bridging this gap is thus of crucial importancefor practical applications. There have been many eﬀorts undertaken to predict/reconstructGRACE TWSAs at regional or global scales using data-driven methods (e.g., Ahmed et al.,2019; Forootan et al., 2014; Forootan et al., 2020; Humphrey et al., 2017; Humphrey &Gudmundsson, 2019; Jing et al., 2020; F. Li et al., 2020; Long et al., 2014; Richter et al.,2021; A. Y. Sun et al., 2019, 2020; Z. Sun et al., 2020; Wang et al., 2021). While these studieshave generally obtained relatively good performances in humid regions, the performance in –2–anuscript submitted to

AGU Journal arid and semiarid regions remains relatively poor (a global aridity index map is shownin Figure A1), calling for innovative solutions. The poor performance in arid/semiaridregions is mainly due to the diﬃculty in modeling the long-term TWSA trends caused byanthropogenic activities (Humphrey & Gudmundsson, 2019; F. Li et al., 2020; Z. Sun et al.,2020).Recent years have witnessed a rapid development of deep learning and its impressiveperformance in various applications (Gu et al., 2018; Reichstein et al., 2019; Shen, 2018). Inthis study, we propose a Bayesian convolutional neural network (BCNN) driven by climaticinputs to bridge the TWSA observation gap between GRACE and GRACE-FO at a globalscale (excluding Antarctica). In this framework, the input climatic data and the targetGRACE TWSAs are treated as images to leverage the excellent capability of convolutionalneural networks (CNNs) in image processing (Gu et al., 2018; Mo, Zhu, et al., 2019; Mo,Zabaras, et al., 2019; Mo et al., 2020; A. Y. Sun et al., 2019). A. Y. Sun et al. (2019)applied CNNs for prediction of TWSAs in India and the CNN outperformed the hydrolog-ical models in providing more accurate TWSA estimates. The development of our BCNNleverages and combines the recent advances in deep learning, including the residual (He etal., 2016) and dense connection (Huang et al., 2017) modules, channel and spatial atten-tion mechanisms (Woo et al., 2018), Bayesian training strategy (Liu & Wang, 2016; Zhu& Zabaras, 2018) and other (Gu et al., 2018). By comparing with the hydorlogical modeloutputs and three recent TWSA prediction products, we will show that the combinationof these strategies enables the network to automatically extract informative features frommulti-source driving data, oﬀer predictive uncertainties, and consequently achieve state-of-the-art performances in ﬁlling the gap.

The monthly GRACE Mascon product released by the Jet Propulsion Laboratory(JPL) (available at https://podaac.jpl.nasa.gov/dataset/TELLUS GRAC-GRFO MASCONCRI GRID RL06 V2 ), which has a spatial resolution of 0 . ◦ × . ◦ (Watkins et al., 2015), isused in this study. The JPL GRACE TWSA data are provided as anomalies with respectto the 2004 to 2009 average. The observations cover two periods, that is, April 2002-June2017 (i.e. the GRACE mission) and June 2018-present (i.e. the GRACE-FO mission), witha 11-month gap in between. In addition, there are some intermittent one or two monthswith missing TWSA data within each mission, these missing months are interpolated usingthe data of neighboring months. Our aim is to ﬁll the 11-month gap (i.e. July 2017-May2018) for the land areas with the BCNN method. To facilitate the comparison with previousGRACE prediction studies, we resampled averagely the data to 1 ◦ × ◦ grid. The driving data used to predict the GRACE TWSAs are extracted from the EuropeanCentre for Medium-Range Weather Forecasts (ECMWF) ERA5-Land (ERA5L) dataset(available at https://cds.climate.copernicus.eu ). The ERA5L dataset contains animproved version of the land component of the ERA5 climate reanalysis, making it moreaccurate for land applications. The data are provided at a spatial resolution of 0 . ◦ × . ◦ and cover a period from January 1981 to near present. The driving data contain four predic-tor variables, including the monthly precipitation, temperature, cumulative water storagechanges (CWSCs), and ERA5L-derived TWSAs. The spatial resolution of these data isaveragely resampled into to 1 ◦ × ◦ to be consistent with GRACE TWSAs.The water storage change (WSC) is calculated as the diﬀerence between the inﬂow (i.e.precipitation P ) and outﬂows (i.e., evapotranspiration ET and runoﬀ RO ) based on the –3–anuscript submitted to AGU Journal water balance: WSC = P − ET − RO. (1)The CWSC is thus written as follows:CWSC t = t (cid:88) i =1 WSC i = t (cid:88) i =1 ( P i − ET i − RO i ) , (2)where t denotes the month index. CWSC correlates well to GRACE TWSA in most regionsas shown in Figure A2. Therefore, it is used as an additional predictor of GRACE TWSAs.The ERA5L dataset includes water storage in soil moisture (in four layers spanningfrom 0 to 289 cm), snow, and canopy. Thus, the ERA5L TWSAs are calculated by summingthese components and then subtracting the long-term mean between 2004 and 2009 to beconsistent with the GRACE TWSAs, as represented by:TWSA ERA5L = SMS + SWS + CWS − TWS , (3)where SMS, SWS, and CWS are soil moisture storage, snow water storage, and canopy waterstorage, respectively, TWS denotes mean of the three components during 2004-2009. The GRACE TWSA time series in arid and semiarid regions often exhibits a long-termdeclining/rising trend caused by the human intervention and/or changing climate. Thisposes challenges for the TWSA prediction tasks, as the driving data (e.g., the climaticdata or outputs of hydrological models) may not be able to well reﬂect the inﬂuences ofthese factors (Humphrey & Gudmundsson, 2019; F. Li et al., 2020; Z. Sun et al., 2020).For the 11-month gap (July 2017-May 2018) ﬁlling task considered here, fortunately, theGRACE TWSA data before (April 2002-June 2017) and after (June 2018-) the gap areavailable. Thus, we can obtain directly the long-term trend signals for the missing intervalfrom existing data. Then we predict in the gap ﬁlling task the detrended TWSA signalsinstead, which are generally less challenging relative to predicting the original TWSA sig-nals (Humphrey & Gudmundsson, 2019; F. Li et al., 2020). Mathematically, the GRACETWSAs are decomposed via linear detrending into two components:TWSA

GRACE = TWSA detrendGRACE + trend

GRACE , (4)where TWSA detrendGRACE and trend GRACE are the detrended data and a linear trend term, re-spectively. Correspondingly, the driving data described in section 2.2 are also detrended. Inthe prediction task, the BCNN model learns to predict the TWSA detrendGRACE signals. Finally,the predictions for the original TWSAs are obtained by adding the GRACE trend:TWSA pred = TWSA detrendpred + trend

GRACE . (5) We propose a BCNN method to learn the underlying relationship between TWSA detrendGRACE and its predictors. Without loss of generality, here we use x and y to denote the networkinputs (i.e. predictors) and outputs (i.e. TWSA detrendGRACE ), respectively. In BCNN, the inputsand outputs are both organized as images (matrices) and the learning task becomes animage regression problem: η : x ∈ R n x × H × W −→ y ∈ R n y × H × W , (6)where η = η ( x , w ) is a BCNN model, with w denoting all trainable network parameters,including the weights and biases. The inputs x and outputs y become n x and n y images,respectively, all with H × W pixels (grids). –4–anuscript submitted to AGU Journal

The network predictions are inevitably associated with epistemic uncertainties inducedby lack of training data. Failing to estimate the predictive uncertainty may lead to overcon-ﬁdent results. This is especially the case for the GRACE TWSA prediction task consideredhere, as the available training data are limited. Contrary to vanilla CNNs which treatthe network parameters w as deterministic unknowns and thus fails to oﬀer predictive un-certainties, in BCNN w are treated as random variables. Given a set of training data D = { x i , y i } N train i =1 , the network training is to infer the posterior distribution of w , p ( w |D ).Consequently, one can obtain the prediction distribution of the target variables y : p ( y | w ), w ∼ p ( w |D ), and in particular the mean E ( y | w ) and standard deviation Std( y | w ).In BCNN, the Bayesian training strategy proposed in Zhu and Zabaras (2018) is em-ployed. Mathematically, the BCNN model is expressed as follows:ˆ y = η ( x , w ) + n ( x , w ) , (7)where n ( · ) is an additive Gaussian noise term modeling the aleatoric uncertainty. A vari-ational Bayesian inference algorithm called stein variational gradient descent (SVGD) (Liu& Wang, 2016), which is similar to standard gradient descent while maintaining the par-ticle methods’ high eﬃciency, is utilized to estimate the posterior distribution p ( w |D ). Inimplementation, we use N S samples of w to approximate the posterior distribution p ( w |D ).The N S samples { w i } N S i =1 are respectively optimized using the Adam optimizer (Kingma& Ba, 2015) whose gradient derives from SVGD. The predictive mean and uncertainty ofBCNN for an arbitrary input x are then calculated based on the N S predictions (ˆ y ( i ) = η ( x , w i ) + n ( x , w i ) , i = 1 , . . . , N S ). For more details, one can refer to Liu and Wang (2016)and Zhu and Zabaras (2018). The BCNN network architecture is depicted in Figure A3. The convolutional blockattention module (CBAM) proposed in Woo et al. (2018) is used as the basic block. Given n x images with a resolution of H × W as inputs to the network, they are passed through analternating cascade of convolutional/transposed convolutional layers and CBAMs, each ofwhich produces n f H (cid:48) × W (cid:48) feature maps, to extract multi-scale features to ﬁnally predict n y images for the targets. The CBAM block contains two attention modules, namely thechannel and spatial attention modules (Figures A3 and A4). More speciﬁcally, the channelmodule outputs n f weights between 0 and 1 assigning to the n f feature maps to tell thenetwork ‘what’ (i.e. which maps) to attend; the spatial module outputs a H (cid:48) × W (cid:48) weightmatrix assigning to the H (cid:48) × W (cid:48) pixel feature maps to tell the network ‘where’ (i.e. whichregions) to emphasize or suppress. As such, the network is able to automatically focus onimportant features and suppress unnecessary ones (Woo et al., 2018).The residual connection (He et al., 2016) and dense connection (Huang et al., 2017)strategies have been shown to be eﬀective in enhancing information ﬂow through the networkand thus substantially improving the performance. Therefore, they are implemented in ourBCNN model. In particular, in the residual connection structure, the feature maps with thesame shape ( n f × H (cid:48) × W (cid:48) ) but at diﬀerent layers are connected by applying element-wiseaddition (He et al., 2016). In the dense connection structure, the feature maps with thesame size ( H (cid:48) × W (cid:48) ) but at diﬀerent layers are cascaded together and subsequently fed asinputs into the next layer (Huang et al., 2017) (Figure A3). The Mish function (Misra, 2019)is employed in BCNN as the activation function of hidden layers unless otherwise stated.We use twelve years of monthly GRACE TWSA data from April 2002 to March 2014(i.e. 144 months, ∼ ∼ t , the inputs to BCNNare P i , T i , CWSC i , and TWSA ERA5L ,i with i = t − , ..., t . Thus, each sample contains n x = 12 input images and n y = 1 output image (TWSA GRACE ,t ). The region spanning –5–anuscript submitted to AGU Journal from 60 ◦ S to 84 ◦ N and 180 ◦ W to 180 ◦ E represented by a H × W = 144 ×

360 image isconsidered. For non-land pixels (grids), the pixel values are set to a constant of 0. Duringnetwork training, we use N S = 20 samples of w in the SVGD algorithm to approximate theposterior distribution, as suggested in Zhu and Zabaras (2018). The network is trained for200 epochs, with a mean squared error loss function quantifying the predictive accuracy, aninitial learning rate of 0.0025 in the Adam optimizer and a batch size of 12. The trainingis performed on a single GPU (NVIDIA Tesla V100) and takes ∼

80 minutes. The networkperformance is evaluated based on the test dataset using three commonly used metrics:correlation coeﬃcient ( R ∈ [ − , ∈ ( −∞ , ∈ [0 , + ∞ )), where R or NSE values closer to 1.0 orlower RMSE values indicate better performances. To illustrate the performance of BCNN, the three metrics (i.e., R , NSE, and RMSE) arealso computed for the ERA5L TWSAs and the hydrological model Noah-derived TWSAs(1 ◦ × ◦ , version 2.1) (Rodell et al., 2004). Figure 1.

Spatial maps of R (row 1), NSE (row 2), and RMSE (row 3) values between theGRACE TWSAs and Noah- (column 1), ERA5L- (column 2), and BCNN-derived (column 3)TWSAs during the test period (April 2014-June 2017, June 2018-August 2020). (j-m) The densityscatter plots between the GRACE and modeled TWSAs. Figure 1 shows the spatial distribution of the accuracy metrics obtained by Noah,ERA5L, and BCNN. While Noah and ERA5L TWSAs both show relatively good correlationswith GRACE TWSAs in most regions except Greenland and the extremely arid areas likeSahara, Gobi, and Arabian (Figures 1a and 1b), BCNN TWSAs exhibit clearly bettercorrelations with GRACE TWSAs in almost all regions (Figure 1c). For the NSE metric,which measures directly the matching quality between the predicted and observed values,both Noah and ERA5L obtain relatively low values ( <

0) in most regions except in some –6–anuscript submitted to

AGU Journal humid regions like Amazon and Southeastern United States (Figures 1d and 1e). In contrast,BCNN achieves relatively high values ( > m = detrend (TWSA m )+trend GRACE , where detrend( · ) denotes the linear detrending operation, TWSA m represents the Noah or ERA5L TWSAs.Figure A5 shows the same metrics as in Figure 1 for the corrected Noah and ERA5L TWSAs.The comparison between Figures 1(c,f,i,m) and A5 again clearly indicates BCNN’s higherperformance, although the consistency between the corrected Noah/ERA5L TWSAs andGRACE TWSAs has been improved signiﬁcantly as expected.Figure 2 depicts BCNN’s TWSA predictions for three test months in June 2014, June2017, and June 2020. The three months cover the early-, mid-, and late-term of the testperiod, with June 2017 being the last month before the missing gap. For comparison, we alsoshow the reference GRACE TWSAs. It can be seen that BCNN successfully captures thespatial patterns of GRACE TWSAs and provides close predictions in the three months. Thepredicted errors in regions with high-TWSA signals (e.g., Amazon, Central Africa, SouthAsia, and Greenland) are generally larger compared to other regions with relatively low-TWSA signals (Figure 2(g-i)). This is consistent with the spatial distributions of BCNN’spredictive uncertainties shown in Figure 2(j-m), where the regions with high-TWSA signalsgenerally have larger predictive uncertainties. Note the predicted accuracy in these high-TWSA regions is still relatively good as the high-TWSA signals result in a high percentageaccuracy. This can also be revealed by the NSE map shown in Figure 1f.The performance of BCNN in providing reliable predictions for TWSAs is furtherdemonstrated in Figures 2(n-y) and A7, which depict the basin/region-averaged TWSAtime series derived from GRACE, Noah, ERA5L, and BCNN in diﬀerent basins/regions(Locations of these basins/regions are shown in Figure A6). For completeness, we showthe time series in both training and test periods. Again, the BCNN TWSAs ﬁt obviouslybetter with the reference GRACE TWSAs than Noah and ERA5L TWSAs. While BCNNmay slightly underestimate/overestimate some peak/valley values of GRACE TWSAs dur-ing the test period, the GRACE TWSA curves are almost completely enveloped within the95% prediction interval. The GRACE TWSA time series are eﬀective indicators for detection of extreme dry/wetevents, which cause unusual decreases/increases in the TWSA signals, and quantifying thewater loss/gain during the events (Feng et al., 2018; Humphrey et al., 2016; F. Li et al.,2020). For example, the GRACE successfully identiﬁed the 2016/2017 and 2018/2019droughts in Central Europe (Figure 2o) (Boergens et al., 2020), the droughts from 2012to 2016 in Central Valley (Figure 2p) (Xiao et al., 2017), and the ﬂood in Summer 2020in Yangtze River Basin (Figure 2w) (Wei et al., 2020). As shown in Figure 2(n-y), the –7–anuscript submitted to

AGU Journal

Figure 2.

The GRACE (a-c) and BCNN (d-f) TWSAs in three test months (June 2014, June2017, and June 2020). (g-i) The BCNN predicted error (i.e. TWSA

GRACE - TWSA

ERA5L ) and (j-m) standard deviation (Std). (n-y): The region/basin-averaged GRACE, Noah, ERA5L and BCNNTWSA time series in diﬀerent basins/regions. The red shaded area denotes the 95% conﬁdenceinterval (CI) of BCNN predictions.

BCNN TWSAs agree well with GRACE TWSAs during these extreme events, suggestingthat BCNN is capable of detecting the drought- and ﬂood-induced abnormal TWSA signals.We further analyze BCNN’s performance in detecting dry and wet events during thegap. To this end, the trend and seasonal signals are removed from the original BCNNTWSAs, which is done by ﬁtting a linear trend via unweighted least squares, plus annualand semiannual sinusoids to the TWSA time series (Chen et al., 2014; Zhong et al., 2018).The detrended and deseasonalized TWSAs at each grid point is then standardized usingthe z -score formula. The resulting TWSA is denoted as sTWSA detrend , deseasonBCNN and its spa-tial maps in the eleven missing months are shown in Figure 3(a-k). It is observed thatthere are four regions (labeled with (l-o) in Figure 3h) exhibiting abnormal signals, withextremely negative and positive signals indicating dry (regions l and m) and wet events(regions n and o), respectively, where the dry/wet events in regions m and n were also re-ported by European State of the Climate ( https://climate.copernicus.eu/ESOTC ). To –8–anuscript submitted to AGU Journal

Figure 3. (a-k) The standardized signals of the detrended and deseasonalized BCNN TWSAs(sTWSA detrend , deseasonBCNN ) in the missing gap (July 2017-May 2018). The four regions within therectangles (labeled l-o in h) have signiﬁcant abnormal signals during this period. (l-o) The region-averaged time series of sTWSA detrend , deseasonGRACE , sTWSA detrend , deseasonBCNN , standardized precipitationanomalies (s P anom ), and 6-month SPEI in the four regions. The shaded area denotes the gapperiod. examine this, we plot in Figure 3(l-o) the region-averaged times series of standardized pre-cipitation anomalies (s P anom ) and 6-month standardized precipitation-evapotranspirationindex (SPEI). When calculating P anom , we ﬁrst apply moving average on the precipita-tion time series ( P t = (cid:80) i = − P t + i ) to smooth out short-term ﬂuctuations, and then com-pute the anomalies for each month with respect to its respective average during 2002 and2018. The SPEI dataset, which covers a period from 2002 to 2018, is downloaded from https://spei.csic.es/spei database/ . For comparison, the sTWSA detrend , deseasonGRACE timeseries are also displayed in the plot. It is observed that the two sTWSA detrend , deseason linesagree well with the s P anom and SPEI lines over the period 2002-2018. This demonstratesthe capability of sTWSA detrend , deseason in detecting extreme climate events. More notably,sTWSA detrend , deseasonBCNN successfully identiﬁes the extreme dry events in regions l and m (Fig-ures 3l and 3m) and the extreme wet conditions in regions n and o (Figures 3n and 3o)during the 11-month gap (July 2017-May 2018). –9–anuscript submitted to AGU Journal

The results presented in section 4.1 and this section indicate that BCNN can successfullycaptures the complex spatio-temporal behaviors of GRACE TWSAs as well as the abnormalsignals caused by extreme climate conditions. It is worth noting that the test data fromGRACE (April 2014-June 2017) and GRACE-FO (June 2018-Aug 2020) cover the periodsbefore and after the missing gap (i.e. July 2017-May 2018). The good agreement betweenBCNN and GRACE TWSAs suggests the reliability of BCNN in bridging the gap betweenGRACE and GRACE-FO at a global scale.

As presented in section 1, there have been many eﬀorts undertaken to model GRACETWSAs using data-driven methods. Here we restrict the comparison with Humphrey andGudmundsson (2019), F. Li et al. (2020), and Z. Sun et al. (2020) who provided publiclyaccessible global-scale TWSA prediction products. Note that the predicted TWSA productreleased by Humphrey and Gudmundsson (2019) is known as GRACE-REC. For a faircomparison, the GRACE TWSA data and the training periods used for BCNN networktraining are respectively the same as those employed in the three studies. The detaileddescriptions of the three TWSA products are summarized in Table A1.The comparison results are shown in Figures A8-A10, which illustrate the R , NSE,and (normalized) RMSE maps and the density scatter plots between the predicted andGRACE TWSAs. Note that the original GRACE-REC dataset provides the detrended anddeseasonalized TWSAs. The trend and seasonal signals obtained from the GRACE TWSAsand Humphrey et al. (2017), respectively, have been added to the original GRACE-RECTWSAs for consistency. It can be observed that BCNN obtains better metric values inthe vast majority of regions (especially in the arid and semiarid regions) and shows higheragreements with GRACE in comparison to the three previous studies. Recall that theGRACE trend information has been added to the original GRACE-REC TWSAs and wasalso utilized in F. Li et al. (2020) when predicting TWSAs. The results suggest the superiorcapability of BCNN in feature mining and thus providing reliable TWSA predictions tobridge the gap between GRACE and GRACE-FO. Two additional noteworthy advantages ofBCNN over existing methods are the relatively few assumptions and pre-processing involvedand its ability to handle simultaneously the global scale. The GRACE/GRACE-FO TWSA observations, together with the hydrological models,have been vital tools for water-related studies at regional or even global scales. However, theapproximately one-year gap of TWSA observations between the two GRACE missions mayintroduce signiﬁcant biases and uncertainties in models and consequently lead to misleadingpredictions. In this study, we propose a deep learning-based BCNN model driven by climaticdata to ﬁll this gap. By leveraging and implementing recent advances of deep learning (e.g.,the residual and dense connections, attention mechanisms, and Bayesian training strategy),the BCNN model is able to eﬀectively and eﬃciently extract important information forTWSA predictions from multi-source input data. Results show that BCNN can successfullycapture the complex spatio-temporal behaviors of TWSAs and identify the extreme dry/wetevents during the gap. The comparisons with hydrological model outputs and previousstudies further suggest that BCNN obtains a state-of-the-art performance in bridging thegap at a global scale.The outperformance of BCNN is mainly attributed to the use of TWSA trends, whichare derived from the available GRACE/GRACE-FO data before and after the gap, and itsoutstanding performance in feature extraction. The long-term TWSA trends are inducedby anthropogenic and/or natural factors and usually challenging-to-learn (Humphrey &Gudmundsson, 2019; F. Li et al., 2020; Z. Sun et al., 2020). The utilization of this trendinformation makes full use of the existing data and essentially eases the learning task for –10–anuscript submitted to

AGU Journal

BCNN. The BCNN’s robust capability in extracting key features for TWSA predictions isespecially illustrated by the comparison with the TWSA prediction products in which thetrend information was also employed. Note that we are concerned with bridging the gapbetween GRACE and GRACE-FO in the current work. For the TWSA reconstruction task,which aims to reconstruct the TWSAs for pre-2002 and is beyond the scope of this study,the trend information is unavailable. The performance of BCNN for such a task remains tobe investigated.

Acknowledgments

The JPL GRACE Mascon data used in this study are available at https://podaac.jpl.nasa.gov/dataset/TELLUS GRAC-GRFO MASCON CRI GRID RL06 V2 ; ERA5-Land data areavailable at https://cds.climate.copernicus.eu ; Noah TWSA dataset is downloadedfrom https://disc.gsfc.nasa.gov/ . This work was funded by the National Key Re-search and Development Program of China (2018YFC1800600), National Natural ScienceFoundation of China (41730856, 41874095, 41977157, 42002248, 42004073), China Post-doctoral Science Foundation (2020M681550), Jiangsu Planned Projects for PostdoctoralResearch Funds (2020Z133), and Fundamental Research Funds for the Central Universities(020614380106). The authors acknowledge Dr. Yinhao Zhu from Qualcomm AI Researchfor his valuable suggestions on implementing the Bayesian training strategy. The BCNNcodes and the predicted TWSA dataset generated in this work will be made available at https://github.com/njujinchun/bcnn4grace upon publication of this manuscript.

References

Ahmed, M., Sultan, M., Elbayoumi, T., & Tissot, P. (2019). Forecasting GRACE dataover the African watersheds using artiﬁcial neural networks.

Remote Sensing , (15),1769. doi: https://doi.org/10.3390/rs11151769Boergens, E., G¨untner, A., Dobslaw, H., & Dahle, C. (2020). Quantifying the CentralEuropean droughts in 2018 and 2019 with GRACE Follow-On. Geophysical ResearchLetters , (14), e2020GL087285. doi: https://doi.org/10.1029/2020GL087285Chang, L.-L., Yuan, R., Gupta, H. V., Winter, C. L., & Niu, G.-Y. (2020). Why isthe terrestrial water storage in dryland regions declining? A perspective based onGravity Recovery and Climate Experiment observations and Noah land surface modelwith multiparameterization schemes model simulations. Water Resources Research , (11), e2020WR027102. doi: https://doi.org/10.1029/2020WR027102Chen, J., Li, J., Zhang, Z., & Ni, S. (2014). Long-term groundwater variations in NorthwestIndia from satellite gravity measurements. Global and Planetary Change , , 130-138. doi: https://doi.org/10.1016/j.gloplacha.2014.02.007Famiglietti, J. S., Lo, M., Ho, S. L., Bethune, J., Anderson, K. J., Syed, T. H., . . . Rodell,M. (2011). Satellites measure recent rates of groundwater depletion in California’sCentral Valley. Geophysical Research Letters , (3), L03403. doi: https://doi.org/10.1029/2010GL046442Feng, W., Shum, C. K., Zhong, M., & Pan, Y. (2018). Groundwater storage changes inchina from satellite gravity: An overview. Remote Sensing , (5), 674. doi: https://doi.org/10.3390/rs10050674Feng, W., Zhong, M., Lemoine, J.-M., Biancale, R., Hsu, H.-T., & Xia, J. (2013). Evalua-tion of groundwater depletion in North China using the Gravity Recovery and ClimateExperiment (GRACE) data and ground-based measurements. Water Resources Re-search , (4), 2110-2118. doi: https://doi.org/10.1002/wrcr.20192Forootan, E., Kusche, J., Loth, I., Schuh, W.-D., Eicker, A., Awange, J., . . . Shum, C. K.(2014). Multivariate prediction of total water storage changes over West Africa frommulti-satellite data. Surveys in Geophysics , (4), 913-940. doi: https://doi.org/10.1007/s10712-014-9292-0Forootan, E., Schumacher, M., Mehrnegar, N., Bezdˇek, A., Talpe, M. J., Farzaneh, S., –11–anuscript submitted to AGU Journal . . . Shum, C. K. (2020). An iterative ICA-based reconstruction method to produceconsistent time-variable total water storage ﬁelds using GRACE and Swarm satellitedata.

Remote Sensing , (10), 1639. doi: https://doi.org/10.3390/rs12101639Gentine, P., Green, J. K., Gu´erin, M., Humphrey, V., Seneviratne, S. I., Zhang, Y., &Zhou, S. (2019). Coupling between the terrestrial carbon and water cycles—A review. Environmental Research Letters , (8), 083003. doi: https://doi.org/10.1088/1748-9326/ab22d6Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., . . . Chen, T. (2018). Recentadvances in convolutional neural networks. Pattern Recognition , , 354-377. doi:https://doi.org/10.1016/j.patcog.2017.10.013He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition.In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR).

Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2017). Densely connectedconvolutional networks. In

Proceedings of the IEEE Conference on Computer Visionand Pattern Recognition (CVPR).

Humphrey, V., & Gudmundsson, L. (2019). GRACE-REC: A reconstruction of climate-driven water storage changes over the last century.

Earth System Science Data , (3),1153–1170. doi: https://doi.org/10.5194/essd-11-1153-2019Humphrey, V., Gudmundsson, L., & Seneviratne, S. I. (2016). Assessing global water storagevariability from GRACE: Trends, seasonal cycle, subseasonal anomalies and extremes. Surveys in Geophysics , (2), 357-395. doi: https://doi.org/10.1007/s10712-016-9367-1Humphrey, V., Gudmundsson, L., & Seneviratne, S. I. (2017). A global reconstructionof climate-driven subdecadal water storage variability. Geophysical Research Letters , (5), 2300-2309. doi: https://doi.org/10.1002/2017GL072564Jing, W., Zhao, X., Yao, L., Di, L., Yang, J., Li, Y., . . . Zhou, C. (2020). Can terres-trial water storage dynamics be estimated from climate anomalies? Earth and SpaceScience , (3), e2019EA000959. doi: https://doi.org/10.1029/2019EA000959Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Y. Ben-gio & Y. LeCun (Eds.), Li, B., Rodell, M., Kumar, S., Beaudoing, H. K., Getirana, A., Zaitchik, B. F., . . . Bet-tadpur, S. (2019). Global GRACE data assimilation for groundwater and droughtmonitoring: Advances and challenges.

Water Resources Research , (9), 7564-7586.doi: https://doi.org/10.1029/2018WR024618Li, F., Kusche, J., Rietbroek, R., Wang, Z., Forootan, E., Schulze, K., & L¨uck, C.(2020). Comparison of data-driven techniques to reconstruct (1992–2002) and predict(2017–2018) GRACE-like gridded total water storage changes using climate inputs. Water Resources Research , (5), e2019WR026551. doi: https://doi.org/10.1029/2019WR026551Liu, Q., & Wang, D. (2016). Stein variational gradient descent: A general purpose Bayesianinference algorithm. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, & R. Garnett(Eds.), Advances in Neural Information Processing Systems (NeurIPS) (Vol. 29, pp.2378–2386). Curran Associates, Inc.Long, D., Shen, Y., Sun, A., Hong, Y., Longuevergne, L., Yang, Y., . . . Chen, L. (2014).Drought and ﬂood monitoring for a large karst plateau in Southwest China usingextended GRACE data.

Remote Sensing of Environment , , 145-160. doi: https://doi.org/10.1016/j.rse.2014.08.006Misra, D. (2019). Mish: A self regularized non-monotonic activation function. arXive-prints , arXiv:1908.08681.Mo, S., Zabaras, N., Shi, X., & Wu, J. (2019). Deep autoregressive neural networks for high-dimensional inverse problems in groundwater contaminant source identiﬁcation. WaterResources Research , (5), 3856-3881. doi: https://doi.org/10.1029/2018WR024638Mo, S., Zabaras, N., Shi, X., & Wu, J. (2020). Integration of adversarial autoencoders –12–anuscript submitted to AGU Journal with residual dense convolutional networks for estimation of non-Gaussian hydraulicconductivities.

Water Resources Research , (2), e2019WR026082. doi: https://doi.org/10.1029/2019WR026082Mo, S., Zhu, Y., Zabaras, N., Shi, X., & Wu, J. (2019). Deep convolutional encoder-decodernetworks for uncertainty quantiﬁcation of dynamic multiphase ﬂow in heterogeneousmedia. Water Resources Research , (1), 703-728. doi: https://doi.org/10.1029/2018WR023528Rateb, A., Scanlon, B. R., Pool, D. R., Sun, A., Zhang, Z., Chen, J., . . . Zell, W.(2020). Comparison of groundwater storage changes from GRACE satellites withmonitoring and modeling of major U.S. aquifers. Water Resources Research , (12),e2020WR027556. doi: https://doi.org/10.1029/2020WR027556Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., &Prabhat. (2019). Deep learning and process understanding for data-driven Earthsystem science. Nature , (7743), 195-204. doi: https://doi.org/10.1038/s41586-019-0912-1Richey, A. S., Thomas, B. F., Lo, M.-H., Reager, J. T., Famiglietti, J. S., Voss, K., . . .Rodell, M. (2015). Quantifying renewable groundwater stress with GRACE. WaterResources Research , (7), 5217-5238. doi: https://doi.org/10.1002/2015WR017349Richter, H. M. P., L¨uck, C., Klos, A., Sideris, M. G., Rangelova, E., & Kusche, J. (2021). Re-constructing GRACE-type time-variable gravity from the Swarm satellites. ScientiﬁcReports , , 1117. doi: https://doi.org/10.1038/s41598-020-80752-wRodell, M., Famiglietti, J. S., Wiese, D. N., Reager, J. T., Beaudoing, H. K., Landerer,F. W., & Lo, M. H. (2018). Emerging trends in global freshwater availability. Nature , (7707), 651-659. doi: https://doi.org/10.1038/s41586-018-0123-1Rodell, M., Houser, P. R., Jambor, U., Gottschalck, J., Mitchell, K., Meng, C.-J., . . .Toll, D. (2004). The global land data assimilation system. Bulletin of the AmericanMeteorological Society , (3), 381-394. doi: https://doi.org/10.1175/BAMS-85-3-381Scanlon, B. R., Zhang, Z., Save, H., Sun, A. Y., M¨uller Schmied, H., van Beek, L. P. H.,. . . Bierkens, M. F. P. (2018). Global models underestimate large decadal decliningand rising water storage trends relative to GRACE satellite data. Proceedings of theNational Academy of Sciences , (6), E1080–E1089. doi: https://doi.org/10.1073/pnas.1704665115Shen, C. (2018). A transdisciplinary review of deep learning research and its relevancefor water resources scientists. Water Resources Research , (11), 8558-8593. doi:https://doi.org/10.1029/2018WR022643Smith, R. G., & Majumdar, S. (2020). Groundwater storage loss associated with landsubsidence in Western United States mapped using machine learning. Water ResourcesResearch , (7), e2019WR026621. doi: https://doi.org/10.1029/2019WR026621Soltani, S. S., Ataie-Ashtiani, B., & Simmons, C. T. (2021). Review of assimilating GRACEterrestrial water storage data into hydrological models: Advances, challenges andopportunities. Earth-Science Reviews , , 103487. doi: https://doi.org/10.1016/j.earscirev.2020.103487Sun, A. Y., Scanlon, B. R., Save, H., & Rateb, A. (2020). Reconstruction of GRACEtotal water storage through automated machine learning. Water Resources Research ,e2020WR028666. doi: https://doi.org/10.1029/2020WR028666Sun, A. Y., Scanlon, B. R., Zhang, Z., Walling, D., Bhanja, S. N., Mukherjee, A., & Zhong,Z. (2019). Combining physically based modeling and deep learning for fusing GRACEsatellite data: Can we learn from mismatch?

Water Resources Research , (2),1179-1195. doi: https://doi.org/10.1029/2018WR023333Sun, Z., Long, D., Yang, W., Li, X., & Pan, Y. (2020). Reconstruction of GRACEdata on changes in total water storage over the global land surface and 60 basins. Water Resources Research , (4), e2019WR026250. doi: https://doi.org/10.1029/2019WR026250Tapley, B. D., Watkins, M. M., Flechtner, F., Reigber, C., Bettadpur, S., Rodell, M.,. . . Velicogna, I. (2019). Contributions of GRACE to understanding climate change. –13–anuscript submitted to AGU Journal

Nature Climate Change , , 358–369. doi: https://doi.org/10.1038/s41558-019-0456-2Wang, F., Shen, Y., Chen, Q., & Wang, W. (2021). Bridging the gap between GRACEand GRACE Follow-on monthly gravity ﬁeld solutions using improved multichannelsingular spectrum analysis. Journal of Hydrology , 125972. doi: https://doi.org/10.1016/j.jhydrol.2021.125972Watkins, M. M., Wiese, D. N., Yuan, D.-N., Boening, C., & Landerer, F. W. (2015).Improved methods for observing Earth’s time variable mass distribution with GRACEusing spherical cap mascons.

Journal of Geophysical Research: Solid Earth , (4),2648-2671. doi: https://doi.org/10.1002/2014JB011547Wei, K., Ouyang, C., Duan, H., Li, Y., Chen, M., Ma, J., . . . Zhou, S. (2020). Reﬂec-tions on the catastrophic 2020 Yangtze River Basin ﬂooding in southern China. TheInnovation , (2), 100038. doi: https://doi.org/10.1016/j.xinn.2020.100038Woo, S., Park, J., Lee, J.-Y., & Kweon, I. S. (2018). CBAM: Convolutional block attentionmodule. In Proceedings of the European Conference on Computer Vision (ECCV).

Xiao, M., Koppa, A., Mekonnen, Z., Pag´an, B. R., Zhan, S., Cao, Q., . . . Lettenmaier,D. P. (2017). How much groundwater did California’s Central Valley lose during the2012–2016 drought?

Geophysical Research Letters , (10), 4872-4879. doi: https://doi.org/10.1002/2017GL073333Yin, W., Han, S.-C., Zheng, W., Yeo, I.-Y., Hu, L., Tangdamrongsub, N., & Ghobadi-Far, K. (2020). Improved water storage estimates within the North China Plain byassimilating GRACE data into the CABLE model. Journal of Hydrology , , 125348.doi: https://doi.org/10.1016/j.jhydrol.2020.125348Zaitchik, B. F., Rodell, M., & Reichle, R. H. (2008). Assimilation of GRACE terres-trial water storage data into a land surface model: Results for the Mississippi RiverBasin. Journal of Hydrometeorology , (3), 535-548. doi: https://doi.org/10.1175/2007JHM951.1Zhong, Y., Zhong, M., Feng, W., Zhang, Z., Shen, Y., & Wu, D. (2018). Groundwaterdepletion in the West Liaohe River Basin, China and its implications revealed byGRACE and in situ measurements. Remote Sensing , (4), 493. doi: https://doi.org/10.3390/rs10040493Zhu, Y., & Zabaras, N. (2018). Bayesian deep convolutional encoder–decoder networks forsurrogate modeling and uncertainty quantiﬁcation. Journal of Computational Physics , , 415-447. doi: https://doi.org/10.1016/j.jcp.2018.04.018 –14–anuscript submitted to AGU Journal

Appendix A Supporting Information

Table A1.

Summaries of Experimental Settings in Previous Studies.

Authors Study area GRACE data a Trainingperiod Test periodHumphrey andGudmundsson(2019) b Global JPL masconRL06 - Apr. 2014-June 2017;June 2018-July 2019F. Li et al.(2020) c Global CSR masconRL06 d Apr. 2002-June 2017 June 2018-Dec. 2018Z. Sun et al.(2020) 26 basins JPL masconRL06 Apr. 2002-Jan. 2014 Feb. 2014-June 2017 a There might be multiple GRACE data products used in the referenced studies. Here we list theproduct used in the comparison. b The generated data product is known as GRACE-REC and available at https://doi.org/10.6084/m9.figshare.7670849 . The GRACE-REC product driven by the ERA5 climatic data isused for comparison. c The generated data product is available at https://github.com/strawpants/twsc recon . a The CSR GRACE product is downloaded from . Figure A1.

A global map of the aridity index, with -1 and 1 indicating the most aridand humid conditions, respectively. The data is available at https://doi.org/10.5523/bris.16ctquxqxk46h2v61gz7drcdz3 . Figure A2.

Spatial distribution of the absolute correlation coeﬃcient ( | R | ) between the cumu-lative water storage changes (CWSCs) and GRACE TWSAs (April 2002-June 2017, June 2018-August 2020), where ( t − i ) denotes i -month lag.–15–anuscript submitted to AGU Journal

Figure A3.

Illustration of the Bayesian convolutional neural network (BCNN) architecture.BCNN takes n x images with a size of H × W as inputs and generates n y images with the samesize. It is an alternating cascade of convolutional (Conv)/transposed convolutional (ConvT) layersand convolutional block attention modules (CBAM), each of which outputs n f = 48 feature maps.The size of feature maps is sequentially halved in each Conv layer from H × W to H × W soas to extract multi-scale features, and then sequentially recovered to H × W using the ConvTlayers. The symbols + ○ , c ○ , and × ○ denote the addition (i.e. residual connection), concatenation(i.e. dense connection), and multiplication (i.e. attention connection) operations, respectively. TheMish activation (Act) function (Misra, 2019) is used in the network. BN=batch normalization. Figure A4.

Diagrams of the channel and spatial attention modules. The inputs to each moduleare n f feature maps with a size of H (cid:48) × W (cid:48) . The channel attention module utilizes both max-pooling(MaxPool) outputs ( n f × ×

1) and average-pooling (AvgPool) outputs ( n f × ×

1) with a sharedmulti-layer perceptron (MLP) to produce a channel attention map. The spatial attention moduleutilizes similar two outputs (each with a shape of 1 × H (cid:48) × W (cid:48) ) to produce a spatial attention map.The sigmoid activation is used to guarantee the values in the attention maps are between 0 and 1.Conv=Convolution; Act=Activation. –16–anuscript submitted to AGU Journal

Figure A5.

Spatial maps of R (row 1), NSE (row 2), and RMSE (row 3) between the GRACETWSAs and Noah- (left) and ERA5L-derived (right) TWSAs during the test period (April 2014-June 2017, June 2018-August 2020). Row 4: The density scatter plots between the GRACE andmodeled TWSAs. The Noah and ERA5L TWSAs have been corrected with GRACE TWSAs’ trend(trend GRACE ). That is, TWSA m = detrend (TWSA m )+trend GRACE , where detrend( · ) denotes thelinear detrending operation, and TWSA m represents the Noah or ERA5L TWSAs.–17–anuscript submitted to AGU Journal

Figure A6.

Locations of the regions and river basins considered for result analysis.

Figure A7.

The basin-averaged GRACE, Noah, ERA5L, and BCNN TWSA time serials indiﬀerent river basins. Shaded areas represent the 95% condence interval (CI) of BCNN predictions.–18–anuscript submitted to

AGU Journal

Figure A8.

The predictive R (row 1), NSE (row 2), and RMSE (row 2) accuracy of GRACE-REC (left) and our BCNN (right) for the GRACE TWSAs (test period: April 2014-June 2017, June2018-July 2019). Row 4: The density scatter plots between the GRACE and modeled TWSAs.The trending and seasonal signals extracted from GRACE TWSAs and Humphrey et al. (2017),respectively, have been added to the original GRACE-REC TWSAs for consistency.–19–anuscript submitted to AGU Journal

Figure A9.

The predictive R (row 1), NSE (row 2), and RMSE (row 3) accuracy for the GRACETWSAs obtained by F. Li et al. (2020) (left) and our BCNN (right). Row 4: The density scatterplots between the GRACE and modeled TWSAs. The test period is June 2018-December 2018.F. Li et al. (2020) provided the predicted TWSAs for 26 major river basins over the world.–20–anuscript submitted to AGU Journal

Figure A10.