Physics-Guided Recurrent Graph Networks for Predicting Flow and Temperature in River Networks
Xiaowei Jia, Jacob Zwart, Jeffrey Sadler, Alison Appling, Samantha Oliver, Steven Markstrom, Jared Willard, Shaoming Xu, Michael Steinbach, Jordan Read, Vipin Kumar
University of Pittsburgh; U.S. Geological Survey; University of Minnesota
[email protected], {jzwart,jsadler,aappling,soliver,markstro,jread}@usgs.gov, {willa099,xu000114,stei0062,kumar001}@umn.edu

Abstract—This paper proposes a physics-guided machine learning approach that combines advanced machine learning models and physics-based models to improve the prediction of water flow and temperature in river networks. We first build a recurrent graph network model to capture the interactions among multiple segments in the river network. Then we present a pre-training technique which transfers knowledge from physics-based models to initialize the machine learning model and learn the physics of streamflow and thermodynamics. We also propose a new loss function that balances the performance over different river segments. We demonstrate the effectiveness of the proposed method in predicting temperature and streamflow in a subset of the Delaware River Basin. In particular, we show that the proposed method brings a 33%/14% improvement over the state-of-the-art physics-based model and 24%/14% over traditional machine learning models (e.g., the Long-Short Term Memory Neural Network) in temperature/streamflow prediction using very sparse (0.1%) observation data for training. The proposed method has also been shown to produce better performance when generalized to different seasons or river segments with different streamflow ranges.
I. INTRODUCTION
Machine learning (ML) models, which have found immense success in commercial applications, e.g., computer vision and natural language processing, are beginning to play an important role in advancing scientific discovery [1]–[3]. Given their power in automatically learning from observation data, ML models are particularly promising in scientific problems involving complex processes that are not completely understood by our current body of knowledge. However, scientific problems often involve non-stationary relationships among physical variables which can change over space and time. In the absence of adequate information about the physical mechanisms of real-world processes, traditional ML approaches are prone to false discoveries because it is difficult to capture these complex relationships solely from data. Moreover, the data available for many scientific problems is far smaller than what is needed to effectively train advanced ML models.

The focus of this paper is on modeling physical systems that have multiple interacting processes. In particular, we consider the application of predicting flow and temperature in river networks for both observed and unobserved river segments. In this problem, segments in the river network can show different flow and thermodynamic patterns driven by differences in catchment characteristics (e.g., slope, soil characteristics) and meteorological drivers (e.g., temperature, precipitation). These segments also interact with each other through the water advected from upstream to downstream segments. Moreover, there are often only a handful of river segments in a network that are monitored; thus there is limited data to train ML models. Accurate prediction of streamflow and water temperature aids in decision making for resource managers, establishes relationships between ecological outcomes and streamflow or water temperature, and helps predict other biogeochemical or ecological processes.
For example, drinking water reservoir operators in the Delaware River Basin need to supply safe drinking water to New York City while also maintaining sufficient streamflow and cool water temperatures in the river network downstream of the reservoirs [4]. Accurate prediction of streamflow and water temperature helps managers optimize when and how much water to release downstream to maintain the flow and temperature regimes.

In scientific domains, physics-based models are often used to study engineering and environmental systems. Even though these models are based on known physical laws that govern relationships between input and output variables, most physics-based models are necessarily approximations of reality due to incomplete knowledge of certain processes or omission of processes to maintain computational efficiency. In particular, existing physics-based approaches for modeling river networks simulate the internal distribution of target variables (e.g., streamflow and temperature) based on general physical relationships such as energy and mass conservation. However, the model predictions still rely on qualitative parameterizations (approximations) based on soil and surficial geologic classification along with topography, land cover and climate input. Hence, such models can only provide sub-optimal prediction performance. Furthermore, calibration of physics-based models for river networks is often extremely time intensive due to interactions among parameters within segments and across segments, and also requires expert knowledge of the system and model to calibrate successfully. The limitations of physics-based models cut across discipline boundaries and are well known in the scientific community; e.g., see a series of debate papers in hydrology [5]–[7].

Intuitively, we can model each river segment independently by an individual ML model such as a Recurrent Neural Network (RNN).
However, this approach has two major limitations: 1) In a river network, there exist many different types of river segments with diverse catchment characteristics (e.g., slopes, elevation, etc.). Note that most of the segments are poorly observed or completely unobserved, which makes it impossible to build a purely data-driven model for each segment separately. 2) The individual models may ignore the rich spatial and temporal contextual information, e.g., how the streamflows from upstream segments impact the water temperature in downstream segments in the next few days.

The first issue could be partly addressed by pre-training the ML model using simulation data produced by physics-based models, but such pre-trained ML models still need some observations for refinement [8], [9]. In particular, for unobserved river segments, the performance of pre-trained ML models can be no better than physics-based models, which can have rather poor performance. Addressing the second issue will require the development of a sophisticated ML architecture that can leverage latent information that is transferred across river segments.

In this paper, we propose a global model, Physics-Guided Recurrent Graph Networks (PGRGrN), which is applied to all the river segments. The architecture of PGRGrN is based on Recurrent Neural Networks (RNN) and Graph Convolutional Neural Networks (GCN), which explicitly captures the spatial interactions among different river segments as well as their temporal dynamics. Modeling of the spatial and temporal context is critical for the global ML model as it enables learning of different behavior patterns for different river segments even when they have similar input features on certain dates.

Design of such an architecture for this application faces two challenges.
First, existing GCN-based models extract abstract latent variables (i.e., graph embeddings) to propagate over the networks but do not explicitly incorporate the prior physical knowledge about the interactions among different nodes. Such latent variables can become less informative when they are learned from sparser and less representative observation data, which can make the GCN model not generalizable. To address this challenge, we propose to utilize the intermediate variables simulated by the physics-based model to guide the learning process of the graph neural networks. This approach aims to enforce a physical interpretation of the latent variables learned from each river segment by transferring the prior knowledge encoded by the physics-based model to the proposed ML model. Our experimental evaluation shows that this strategy is effective in initializing the ML model, which can then be fine-tuned using observations from the river network.

The second challenge is that target variables can vary drastically across different processes of a complex system. For example, streamflow can vary by several orders of magnitude across segments in a river network. When we train a global ML model over the entire river network, the training process using traditional loss functions (e.g., mean squared loss) tends to optimize the overall performance over training data available for the entire river network, and thus can be dominated by river segments that contribute most to the overall error (e.g., segments with high streamflow). However, it is also important to accurately predict river segments with lower streamflow, as accurate prediction for these segments provides important information regarding the habitat for aquatic life and aquatic biogeochemical cycling. To address this challenge, we design a new loss function to ensure that the global ML model can simultaneously capture the local patterns of all the different segments.
The local patterns of each segment can be extracted using an individual ML model trained only for this segment using simulation data (which is plentiful). Then, during the training of the global ML model, we use a distance-based loss function, the contrastive loss function, to enforce its consistency with the extracted local patterns.

We apply our proposed method to a real-world dataset collected over 36 years from the Delaware River Basin located in the Northeast United States and demonstrate our method's superior predictive performance over existing methods. Moreover, we show that the proposed method produces much better prediction performance using sparse observations and also has better generalizability.

Our contributions can be summarized as follows:
• We introduce a new recurrent graph network architecture to model a river network with interacting river segments.
• We leverage knowledge from physics-based models to guide ML models for extracting latent variables, which helps initialize the model while enforcing consistency with known physical relationships amongst different processes.
• We propose a new contrastive loss function that balances the prediction performance over different river segments.
• We evaluate the utility of our method in the context of an ecologically and societally relevant problem of monitoring river networks.

II. RELATED WORK
Recent works have shown the promise of integrating physics into ML models to improve predictive performance and generalizability in scientific problems. This is commonly conducted in several ways, including developing new model architectures [10], [11], applying additional loss functions [8], [9], and modeling prediction residuals [12], [13]. When applied to systems with interacting processes, ML models are expected to have sufficient capacity to model such interactions. New ML architectures have been designed to enforce known physical relationships among multiple internal processes that jointly convert inputs to outputs [10], [11], [14], thus reducing the space for searching parameters. ML models have also shown great potential in modeling river networks [15], [16]. For example, Moshe et al. [16] propose HydroNets, which combines the information from each river segment and its upstream segments to improve streamflow predictions. It also learns local patterns for each basin by introducing basin-specific model layers in addition to the global model. This method focuses on predicting basins that are well monitored, and it remains limited in generalizing to different scenarios or learning with less data. In contrast, we leverage prior physical knowledge to learn latent variables that make the model more generalizable.

The Graph Convolutional Networks (GCN) model has proven to be effective in automatically extracting latent factors that influence the neighbors in a graph. The use of GCN has also shown improved prediction accuracy in several scientific problems [17]–[19]. However, the information propagated amongst nodes in GCN is essentially an abstract representation learned by end-to-end training. Such abstract representations are not meant to enforce consistency with known physical relationships among different processes, such as in river networks.

Simulation data have been used to assist in training ML models.
Since many ML models require an initial choice of model parameters before training, researchers have explored different ways to physically inform a model's starting state. Poor initialization can cause models to anchor in local minima, which is especially true for deep neural networks. One way to harness physics-based modeling knowledge is to use the physics-based model's simulated data to pre-train the ML model, which also alleviates data paucity issues. Jia et al. extensively discuss this strategy [8]. They pre-train their Physics-Guided Recurrent Neural Network (PGRNN) models for lake temperature modeling on simulated data generated from a physics-based model and fine-tune them with little observed data. They show that pre-training can significantly reduce the training data needed for a quality model. In addition, Read et al. [20] show that such models are able to generalize better to unseen scenarios than pure physics-based models.

III. PROBLEM DEFINITION AND PRELIMINARIES
A. Problem definition
Our objective is to model the dynamics of temperature and streamflow in a set of connected river segments that together form a river network. The connections amongst these river segments can be represented in a graph structure G = {V, E, A}, where V represents the set of river segments and E represents the set of connections amongst river segments. Specifically, we create an edge (i, j) ∈ E if the segment i is anywhere upstream of the segment j. The matrix A represents the adjacency level between each pair of segments, i.e., A_ij = 0 means there is no edge from the segment i to the segment j, and a higher value of A_ij indicates that the segment i is closer to the segment j in terms of river distance. More details of how we generate the adjacency matrix are discussed in Section V-A.

For each river segment i, we have access to its input features at multiple time steps X_i = {x_i^1, x_i^2, ..., x_i^T}. The input features x_i^t are a D-dimensional vector, which include meteorological drivers, geometric parameters of the segments, etc. (more details can be found in Section V-A). We also have a set of observed target variables Y = {y_i^t}, but they are only available for certain time steps t ∈ {1, ..., T} and certain segments i ∈ {1, ..., N}.

B. Physics-based Streamflow and Temperature Model
The Precipitation-Runoff Modeling System (PRMS) [21] with the coupled Stream Network Temperature Model (SNTemp) [22] is a physics-based model that simulates daily streamflow, water temperature, and other variables for river networks. PRMS is a one-dimensional, distributed-parameter modeling system that translates spatially-explicit meteorological information into water information including evaporation, transpiration, runoff, infiltration, groundwater flow, and streamflow. PRMS has been used to simulate catchment hydrologic variables at regional [23] to national scales [24] in support of resource management decisions, among other applications. The SNTemp module for PRMS simulates mean daily stream water temperature for each river segment by solving an energy mass balance model which accounts for the effect of inflows (upstream, groundwater, surface runoff), outflows, and surface heating and cooling on heat transfer in each stream segment. The SNTemp module is driven by the same meteorological drivers used in PRMS and also by the hydrologic information simulated by PRMS (e.g., streamflow, groundwater flow). Calibration of PRMS-SNTemp is extremely time-consuming because it involves a large number of parameters (84 parameters) that interact with each other both within segments and across segments.
C. Recurrent Neural Networks and Long-Short Term Memory
The RNN model has been widely used to model the temporal patterns in sequential data. The RNN model defines transition relationships for the extracted hidden representation through a recurrent cell structure. In this work, we adopt the Long-Short Term Memory (LSTM) cell, which has proven to be effective in capturing long-term dependencies. The LSTM cell combines the input features x^t at each time step and the inherited information from previous time steps. Here we omit the subscript i as we do not target a specific river segment.

Each LSTM cell has a cell state c^t, which serves as a memory and allows preserving information from the past. Specifically, the LSTM first generates a candidate cell state c̄^t by combining x^t and the hidden representation at the previous time step h^{t-1}, as follows:

  c̄^t = tanh(W_hc h^{t-1} + W_xc x^t + b_c).   (1)

Then the LSTM generates a forget gate f^t, an input gate g^t, and an output gate o^t via the sigmoid function σ(·), as follows:

  f^t = σ(W_hf h^{t-1} + W_xf x^t + b_f),
  g^t = σ(W_hg h^{t-1} + W_xg x^t + b_g),   (2)
  o^t = σ(W_ho h^{t-1} + W_xo x^t + b_o).

The forget gate is used to filter the information inherited from c^{t-1}, and the input gate is used to filter the candidate cell state at t. Then we compute the new cell state and the hidden representation as follows:

  c^t = f^t ⊗ c^{t-1} + g^t ⊗ c̄^t,
  h^t = o^t ⊗ tanh(c^t),   (3)

where ⊗ denotes the entry-wise product. According to the above equations, we can observe that the computation of h^t combines the information at the current time step (x^t) and the previous time step (h^{t-1} and c^{t-1}), and thus encodes the temporal patterns learned from data.

IV. METHOD
In this section, we describe the details of the PGRGrN method. We start by introducing the model architecture. Then we discuss a strategy to help enforce physical relationships by leveraging the physical knowledge embedded in physics-based models. Finally, we introduce a contrastive loss function that attempts to ensure that the model performance on individual river segments is not compromised while optimizing the performance on the entire set of segments.

Fig. 1. (a) The RGrN architecture for an example river network with three segments (b). The segments a and b are upstream of the segment i. The grey arrows indicate the modeling components for upstream segments.

A. Recurrent Graph Network
In a river network, most river segments are poorly observed or completely unobserved. To this end, we introduce a global ML model architecture, the Recurrent Graph Network (RGrN), which is trained using data collected from all the river segments. Effective modeling of river segments requires the ability to capture their temporal dynamics and the influence received from upstream segments. Hence, we incorporate the information from both previous time steps and neighbors (i.e., upstream segments) when modeling each segment (Fig. 1). Here we describe the recurrent process of generating the hidden representation h^t from h^{t-1}, and we repeat this process for the entire sequence from t = 2 to T (h^1 is learned from an LSTM model). For each river segment i at time t-1, the model extracts latent variables which contain relevant information to pass to its downstream segments. We refer to these latent variables as transferred variables. For example, the amount of water advected from this segment and its water temperature can directly impact the change of water temperature for its downstream segments. We generate the transferred variables q_i^{t-1} from the hidden representation h_i^{t-1}, as follows:

  q_i^{t-1} = tanh(W_q h_i^{t-1} + b_q),   (4)

where W_q and b_q are model parameters that are used to convert the hidden representation to transferred variables.

After gathering the transferred variables for all the segments, we develop a new recurrent cell structure for each segment i that integrates the transferred variables from its upstream segments into the computation of the cell state c_i^t. This can be expressed as follows:

  c_i^t = f_i^t ⊗ (c_i^{t-1} + Σ_{(j,i)∈E} A_ji q_j^{t-1}) + g_i^t ⊗ c̄_i^t   (5)

We can observe that the forget gate not only filters the previous information from the segment i itself but also from its neighbors (i.e., upstream segments).
Each upstream segment j is weighted by the adjacency level A_ji between j and i. When a river segment has no upstream segments (i.e., headwater), the computation of c_i^t is the same as with the standard LSTM. In Eq. 5, we use q_j^{t-1} from the previous time step because of the time delay in transferring the influence from upstream to downstream segments (the maximum travel time is approximately one day according to PRMS). We also discuss the impact of increasing the time delay in Section V-G.

After obtaining the cell state, we can compute the hidden representation h_i^t by following Eq. 3. Finally, we generate the predicted output from the hidden representation as follows:

  ŷ_i^t = W_y h_i^t + b_y,   (6)

where W_y and b_y are model parameters.

After applying this recurrent process to all the time steps, we define a loss using true observations Y = {y_i^t} that are available at certain time steps and certain segments, as follows:

  L_RGrN = (1/|Y|) Σ_{(i,t) | y_i^t ∈ Y} (y_i^t − ŷ_i^t)^2.   (7)
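As an illustration, the cell-state update of Eq. 5 can be sketched in NumPy as below. This is a minimal sketch, not the authors' implementation; the function and argument names are our own, and the gate activations (f_t, g_t, c̄_t) are assumed to be computed beforehand as in Eqs. 1–2.

```python
import numpy as np

def rgrn_cell_state(f_t, g_t, c_bar_t, c_prev, q_prev, A, i, edges):
    """Cell-state update of Eq. (5) for segment i.

    c_prev, q_prev: arrays of shape (N, H) holding c^{t-1} and the
    transferred variables q^{t-1} for all N segments; A is the N x N
    adjacency matrix; edges lists (j, k) pairs with j upstream of k.
    """
    # Sum the weighted transferred variables from upstream segments.
    upstream = np.zeros_like(c_prev[i])
    for (j, k) in edges:
        if k == i:
            upstream += A[j, i] * q_prev[j]
    # Eq. (5): the forget gate filters both the segment's own past
    # state and the information advected from upstream.
    return f_t * (c_prev[i] + upstream) + g_t * c_bar_t
```

For a headwater segment (no incoming edges) the upstream sum is zero and the update reduces to the standard LSTM cell-state update of Eq. 3.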
The RGrN architecture has the capacity to model the latent information that is transferred across river segments. However, training RGrN directly in an end-to-end fashion can only learn an abstract representation for transferred variables while ignoring their physical interpretation. The transferred variables can become less informative when they are learned from sparser and less representative observation data. To this end, we introduce a new strategy to enforce the prior physical relationships amongst different river segments which are encoded by physics-based models. It helps make the RGrN model more generalizable and also reduces the amount of observation data required to train a high-quality model. This strategy can be applied to a wide range of scientific problems that are modeled as a set of interacting processes.

We use river temperature modeling as an example in this section to better illustrate the proposed strategy. For each river segment in the network, the temperature change is driven by energy exchanges caused by solar radiation, rainfall, evaporation, conductive and convective heat transfer, and net heat advected into river segments (e.g., groundwater flow, upstream flow, downstream flow). These energy fluxes can be summarized into three categories, and the process of temperature change conforms to the following equation:

  ΔTemperature ∝ F_in − F_out + F_up,   (8)

where F_in denotes the incoming energy fluxes from solar radiation, rainfall and other natural sources, F_out denotes the outgoing energy fluxes including long-wave emission, evaporation, conductive and convective heat transfer, and F_up denotes the net heat advected into the river segment from upstream segments. The term F_up can be estimated by a set of intermediate physical variables from upstream segments, which include upstream flow, upstream water temperature, relative humidity, and other physical characteristics.
These intermediate physical variables can be simulated internally by PRMS-SNTemp.

Our goal is to ensure that the transferred variables q_i^t at each segment contain sufficient information to represent these intermediate physical variables so that downstream segments can gather all the information needed for capturing F_up. For each segment i at time t, we first simulate the set of intermediate variables s_i^t by running PRMS-SNTemp. Then we use s_i^t to add supervision on the transferred variables q_i^t such that we can extract s_i^t from q_i^t. More formally, we define a loss function on the transferred variables as follows:

  L_trans = (1/NT) Σ_i Σ_t || s_i^t − (W_s q_i^t + b_s) ||^2,   (9)

where W_s and b_s are model parameters that convert transferred variables to the intermediate physical variables. We call this model Physics-Guided Recurrent Graph Networks (PGRGrN) because we leverage the physical relationships between different river segments.

Since the computation of the loss L_trans does not require observation data, we can use it to pre-train RGrN to enforce physical relationships. This pre-training method not only explicitly enforces the physical relationships among river segments, but also enables full usage of physical intermediates obtained from physics-based models to enhance the representation learned for q_i^t and its previous layer h_i^t. In particular, the intermediate physical variables used in this work include streamflow, stream temperature, relative humidity, cloud cover, groundwater and shallow subsurface flow, and surface runoff.

It is noteworthy that the model extracts the intermediate variables from the transferred variables q_i^t rather than forcing the transferred variables to be exactly the intermediate physical variables.
Also, we optimize the loss L_trans in pre-training rather than using it as a regularizer in supervised learning. In other words, the ML model is guided but not constrained by the physics-based model output, which allows for more flexibility to automatically learn information in q_i^t that is poorly known or not yet discovered while also remaining helpful for modeling the interactions among river segments.

At the same time, we can run PRMS-SNTemp to simulate the final target variables corresponding to y_i^t. Here we use ỹ_i^t to represent the target variables simulated by PRMS-SNTemp. Although the simulated data are not an accurate reflection of the observation data, we can generate adequate simulations on every day and for every segment. The simulation data also follow many general physical relationships used to build the physics-based model. Given that observation data is often scarce, we can use simulated target variables to initialize the model via pre-training. Hence, we define another pre-training loss on target variables as follows:

  L_tar = (1/NT) Σ_i Σ_t (ỹ_i^t − ŷ_i^t)^2.   (10)

Combining Eqs. 9 and 10, we get the final pre-training loss as follows:

  L_pre = L_tar + λ L_trans,   (11)

where λ is a hyper-parameter to balance the two losses. The back-propagation of the two losses is shown in Fig. 2.

Fig. 2. Error back-propagation during the pre-training stage.

In summary, our pre-training strategy helps initialize PGRGrN using both simulated intermediate physical variables and simulated target variables. By enforcing known physical relationships between segments, the model becomes more generalizable. Besides, the ML model pre-trained with adequate simulated ỹ can get much closer to its final optimal solution, and thus requires fewer observation data for fine-tuning. We call this model, which is pre-trained using simulation data and then fine-tuned with true observations, PGRGrN_ptr.

C. Segment-wise contrastive loss
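The combined pre-training objective of Eq. 11 can be sketched as follows. This is a hypothetical NumPy illustration with our own function and argument names, assuming the transferred variables, predictions, and simulated quantities are already computed and stacked into arrays.

```python
import numpy as np

def pretrain_loss(s_sim, q, y_sim, y_hat, W_s, b_s, lam=1.0):
    """Pre-training objective of Eq. (11): L_pre = L_tar + lam * L_trans.

    s_sim:  (N, T, S) intermediate variables simulated by the physics model
    q:      (N, T, H) transferred variables from the network
    y_sim:  (N, T)    simulated target variables
    y_hat:  (N, T)    network predictions
    W_s, b_s: linear readout mapping q to the intermediate variables (Eq. 9)
    """
    NT = q.shape[0] * q.shape[1]
    # Eq. (9): supervise the transferred variables with simulated intermediates.
    l_trans = np.sum((s_sim - (q @ W_s.T + b_s)) ** 2) / NT
    # Eq. (10): supervise the predictions with simulated targets.
    l_tar = np.sum((y_sim - y_hat) ** 2) / NT
    return l_tar + lam * l_trans
```

Note that both terms use only simulated quantities, which is why this loss can be minimized before any observation data is touched.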
The relationship between input features and target variables can be very complex in environmental systems, e.g., slight changes in segment slope and catchment size can drastically alter the streamflow. Traditional loss functions for regression problems, such as mean squared loss, tend to be dominated by river segments with larger errors while degrading the performance on other segments with smaller errors. This issue can be further exacerbated given limited observation data on most river segments, especially low-flow segments. Although improving the segments with smaller errors does not contribute much to reducing the overall error, accurately predicting streamflow at these segments provides important information regarding habitat for aquatic life and biogeochemical cycling.

We introduce a new loss function to balance the model performance over different segments, with the goal that the global ML model trained on all the river segments should also be consistent with the local patterns extracted from each river segment. In particular, we train N individual LSTM models, M_1 to M_N, one for each segment, using the simulation data. Each individual model M_i is trained to predict simulated target variables (i.e., ỹ_i) for a specific segment i. Even though there is a gap between simulation data and true observation data, these individual models have a better chance of capturing the local temporal patterns of each river segment.

When we apply the global PGRGrN model to a specific segment i, the patterns predicted by the PGRGrN model should be similar to the local patterns learned by the individual model M_i. Specifically, we compute the hidden representation h_i^t from PGRGrN at each time t, which encodes the dynamic patterns that directly produce the target variables.
Similarly, we run the individual model M_i to compute its hidden representation h̃_i^t, which encodes the local temporal patterns for this segment. The hidden representation h_i^t from PGRGrN should be close to the corresponding local hidden representation h̃_i^t and different from the local representations of other segments, i.e., h̃_j^t for j ≠ i. More formally, we define a contrastive loss as follows:

  L_ctr = −(1/NT) Σ_i Σ_t log [ exp(h_i^tᵀ W_ctr h̃_i^t) / Σ_j exp(h_i^tᵀ W_ctr h̃_j^t) ],   (12)

where W_ctr contains the model parameters for computing the similarity of hidden representations. To ensure that the hidden representations produced by PGRGrN (h) and by the individual models (h̃) are comparable, we use shared parameters {W_y, b_y} in the last layer (Eq. 6) for the individual models and the global PGRGrN model, and they are fitted when training the individual models.

By combining the contrastive loss and the loss of RGrN (Eq. 7), we get the fine-tuning loss as follows:

  L_finetune = L_RGrN + γ L_ctr,   (13)

where γ is a hyper-parameter to balance the supervised loss of PGRGrN and the contrastive loss.

The proposed contrastive loss provides an alternative way in which different segments are comparable with each other. Traditional loss functions do not perform well for every segment because they are defined on target variables, which may vary drastically across different segments. Instead, the contrastive loss matches temporal patterns encoded in the space of hidden representations, which alleviates this issue.

V. EXPERIMENTAL RESULTS
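The contrastive loss of Eq. 12 at a single time step can be sketched as a softmax over bilinear similarity scores. This is an illustrative sketch with hypothetical names, not the authors' code; the hidden states from the global and local models are assumed to be stacked row-wise.

```python
import numpy as np

def contrastive_loss(h, h_local, W_ctr):
    """Segment-wise contrastive loss of Eq. (12) at a single time step.

    h:       (N, H) hidden states from the global model, one per segment
    h_local: (N, H) hidden states from the per-segment local models
    W_ctr:   (H, H) bilinear similarity parameters
    """
    # Bilinear similarity scores: scores[i, j] = h_i^T W_ctr h_local_j.
    scores = h @ W_ctr @ h_local.T
    # Log-softmax over candidate local models; the matching pair (i, i)
    # should score highest.
    log_probs = scores - np.log(np.sum(np.exp(scores), axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pushes each global hidden state toward its own segment's local representation and away from those of the other segments.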
We evaluate the proposed method for predicting stream temperature and streamflow using real-world data collected from the Delaware River Basin, which is an ecologically diverse region and a societally important watershed along the east coast of the United States, as it provides drinking water to over 15 million people [25]. We first describe our dataset and baselines. Then we discuss the results on predictive performance using sparse data, the effectiveness of pre-training, the spatial distribution of errors, and model generalization.
A. Dataset and baselines
The dataset is pulled from U.S. Geological Survey’s Na-tional Water Information System [26] and the Water QualityPortal [27], the largest standardized water quality data set forinland and coastal waterbodies [27]. Observations at a specificlatitude and longitude were matched to river segments thatvary in length from 48 to 23,120 meters. The river segmentswere defined by the national geospatial fabric used for theNational Hydrologic Model as described by Regan et al. [24],and the river segments are split up to have roughly a one daywater travel time. We match observations to river segments bysnapping observations to the nearest stream segment withina tolerance of 250 meters. Observations farther than 5,000m along the river channel to the outlet of a segment wereomitted from our dataset. Segments with multiple observationsites were aggregated to a single mean daily streamflow orwater temperature value.We study a subset of the Delaware River Basin with 42river segments that feed into the mainstream Delaware Riverat Wilmington, DE. We use input features at the daily scalefrom Oct 01, 1980 to Sep 30, 2016 (13,149 dates). Theinput features have 10 dimensions which include daily averageprecipitation, daily average air temperature, date of the year,solar radiation, shade fraction, potential evapotranspirationand the geometric features of each segment (e.g., elevation,length, slope and width). Air temperature and precipitationvalues were derived from the Daymet gridded dataset. Otherinput features (e.g., shade fraction, solar radiation, potentialevapotranspiration) are difficult to measure frequently, andwe use values produced by the PRMS-SNTemp model asits internal variables. Water temperature observations wereavailable for 32 segments but the temperature was observedonly on certain dates. The number of temperature observationsavailable for each segment ranges from 1 to 9,810 with a totalof 51,103 observations across all dates and segments. 
Streamflow observations were available for 18 segments. The number of streamflow observations available for each segment ranges from 4,877 to 13,149, with a total of 206,920 observations across all dates and segments.

We generate the adjacency matrix A based on the river distance between each pair of river segment outlets, represented as dist(i, j). We standardize the stream distance and then compute the affinity level as A_ij = 1/(1 + exp(dist(i, j))) for each edge (i, j) ∈ E.

We compare model performance to multiple baselines, including the physics-based PRMS-SNTemp model, artificial neural networks (ANN), an RNN with the LSTM cell, and the state-of-the-art PGRNN method [8], which uses simulation data to pre-train an LSTM model and then fine-tunes it with true observation data (denoted RNN_ptr). Since a region-specific calibration of PRMS-SNTemp is extremely time-consuming, a version with default parameter values is widely used in the hydrologic domain [28]. We provide a comparison with this version, referred to as the PRMS-SNTemp model. We evaluate three variants of the proposed method: RGrN (trained to minimize L_RGrN), PGRGrN_ptr (pre-training using the strategy in Section IV-B and fine-tuning to minimize L_RGrN, Eq. 7), and PGRGrN_ptr,ctr (pre-training using the strategy in Section IV-B and fine-tuning to minimize L_finetune, Eq. 13). All the ML models are trained on and applied to all the river segments (i.e., all models are global). In the following experiments, we train each ML model using data from the first 24 years (Oct 01, 1980 to Sep 30, 2004) and then test on the next 12 years (Oct 01, 2004 to Sep 30, 2016). The hidden representation in these ML models has 20 dimensions (the same for q_i^t). We set the learning rate to 0.0005 and update the model for 200 epochs for modeling water temperature and 300 epochs for modeling streamflow.

B. Overall prediction performance
We report the testing performance of different methods for temperature prediction and streamflow prediction in Table I and Table II, respectively. We also test the capacity of each model to learn from less training data by randomly selecting 0.1% and 2% of the labeled data from the first 24 years for training. For RNN_ptr, PGRGrN_ptr, and PGRGrN_ptr,ctr, we assume access to simulation data on every single date from Oct 01, 1980 to Sep 30, 2016, because such data can be generated by simply running the PRMS-SNTemp model on input drivers. We repeat each experiment five times with random model initialization and random selection of the sparser data (0.1%, 2%) and report the mean and standard deviation of the root mean squared error (RMSE).

We observe that the proposed method outperforms the baselines by a considerable margin (Tables I and II). The improvement from ANN to RNN shows that the recurrent component is helpful for capturing temporal patterns. RGrN performs better than RNN because it utilizes upstream-downstream dependencies, which are critical for an accurate accounting of temperature and streamflow.

We also observe that PGRGrN_ptr has much better performance than RGrN using just 0.1% or 2% of the data. This is because we leverage the physical knowledge to add supervision on multiple model components (i.e., target variables and transferred variables), and thus the model can learn representative latent variables without risking overfitting to a small amount of observations. PGRGrN_ptr,ctr further improves the performance by adjusting the pre-trained model to conform to the local patterns of each segment. The standard pre-training method is also helpful for the RNN model, as we can see from the improvement from RNN to RNN_ptr using simulation data.

In Fig. 3, we show several examples of predictions made by PRMS-SNTemp (SIM), RNN, and the proposed model. In temperature prediction (Figs. 3 (a) and (b)), PRMS-SNTemp captures the overall dynamic patterns of temperature change but always has a bias relative to true observations. RNN gets closer to true observations but does not perform as well as PGRGrN_ptr,ctr in capturing fluctuations of temperature changes.
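The sparse-label experiments above can be sketched with two small helpers; the function names and the NaN-masking convention for missing observations are ours, not from the paper.

```python
import numpy as np

def masked_rmse(pred, obs):
    """RMSE over only the (segment, date) pairs that have observations."""
    mask = ~np.isnan(obs)
    return float(np.sqrt(np.mean((pred[mask] - obs[mask]) ** 2)))

def subsample_labels(obs, frac, rng):
    """Keep a random fraction of the available labels (e.g., 0.1% or 2%)."""
    out = np.full_like(obs, np.nan)
    idx = np.argwhere(~np.isnan(obs))  # positions that have observations
    keep = rng.choice(len(idx), size=max(1, int(frac * len(idx))),
                      replace=False)
    for i, j in idx[keep]:
        out[i, j] = obs[i, j]
    return out
```

Repeating the subsampling with different random generators mirrors the five repeated runs behind the reported mean ± standard deviation.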
TABLE I. Root mean squared error (± standard deviation) for temperature modeling using 0.1% and 2% training labels. Rows in grey represent methods using simulation data. Here our method is compared with artificial neural networks (ANN), recurrent neural networks (RNN), and the RNN pre-trained using simulation data and fine-tuned using observations (RNN_ptr).

TABLE II. Prediction RMSE for streamflow modeling using 0.1% and 2% training labels. Here our method is compared with artificial neural networks (ANN), recurrent neural networks (RNN), and the RNN pre-trained using simulation data and fine-tuned using observations (RNN_ptr).
We can see similar results for predicting streamflow in segments with high flows (Fig. 3 (c)). Here PRMS-SNTemp produces a large bias and also a slow response. PGRGrN_ptr,ctr performs much better than both PRMS-SNTemp and RNN. However, for segments with low streamflows (Fig. 3 (d)), PRMS-SNTemp matches observations better than the ML models. This is because ML models optimize the overall performance, while low-flow stream segments (mostly headwaters) are a minority in the entire river network and contribute less to the loss function. In Fig. 3 (d), we show the results produced by variants of our method to study how the use of the contrastive loss and pre-training alters the predictions. We can observe that the predictions made by RGrN are almost constant over time, while PGRGrN_ptr can better capture the dynamics through pre-training but has an even larger gap with true observations. By adopting the contrastive loss, PGRGrN_ptr,ctr effectively reduces the bias on this low-flow segment.

To better understand how the performance varies across different types of river segments, we show the streamflow prediction errors for four types of segments, low (< 0.5 m³/s), medium-low (0.5-2 m³/s), medium-high (2-5 m³/s), and high (> 5 m³/s), in Fig. 4. RGrN and PGRGrN_ptr generally perform better than the other methods in medium-high and high-flow river segments but perform much worse in low-flow segments. In particular, we can see that RGrN has much larger errors than RNN over low-flow river segments. This is because neighboring river segments tend to have similar embeddings after graph convolution, and thus the training of RGrN pays even less attention to low-flow segments given that there are only a few low-flow segments in the river network. As shown in Fig. 4 (a), this issue is partly addressed by using the contrastive loss. An alternative solution to this issue is to intelligently select the most suitable model (e.g., PRMS-SNTemp or ML models) for different types of river segments, which we suggest as future work.

Fig. 3. (a)(b) Examples of temperature predictions by different methods. (c) An example of streamflow predictions in a high-flow river segment. (d) An example of streamflow predictions by variants of the proposed method in a low-flow river segment. Here SIM represents the simulated data produced by the PRMS-SNTemp model and OBS represents the true observation data. Here we compare the performance of recurrent neural networks (RNN), the proposed recurrent graph neural network (RGrN), the RGrN model using the pre-training strategy (PGRGrN_ptr), and the RGrN model using both pre-training and the contrastive loss (PGRGrN_ptr,ctr).

Fig. 4. Distribution of prediction errors in (a) low and medium-low segments, and (b) medium-high and high-flow segments.

C. Assessing performance on unobserved segments
Here we aim to test the performance of the models on segments that have no observation data (Tables III and IV). Such segments commonly exist in a real-world basin system. Seg A to Seg E are five river segments which have sufficient observation data for both stream temperature and streamflow. Each row shows the results for an individual experiment where we intentionally remove the temperature or streamflow observations for a specific segment during the training period (Oct 01, 1980 to Sep 30, 2004). Then we report the prediction performance of RNN, RGrN, and PGRGrN_ptr,ctr only on this segment during the testing period (Oct 01, 2004 to Sep 30, 2016).
TABLE III. RMSE of temperature prediction on individual segments after removing training observation data. Here we compare the performance of recurrent neural networks (RNN), the proposed recurrent graph neural network (RGrN), and the RGrN model using both pre-training and the contrastive loss (PGRGrN_ptr,ctr).
We can observe larger errors produced by all three models after we remove the training data for a segment. This is expected because segment-specific observation data has not been used to refine the model. However, we observe that the drop in performance of PGRGrN_ptr,ctr is consistently, and often significantly, smaller than that of the RNN model.

We notice that RGrN may not produce better streamflow predictions than RNN for every segment. For example, Seg A is a headwater and has no upstream river segments. Hence, the graph convolution is not helpful for modeling Seg A. Besides, the information propagated from neighbors may not always be helpful if neighboring segments have very different streamflow patterns. A potential area of future research is to develop new graph operators so that each segment can learn only from the neighbors that are most relevant.

Fig. 5. The predictions of the pre-trained (PGRGrN_pretrain) and fine-tuned (PGRGrN_ptr) models for (a) temperature and (b) streamflow, using 2% training data. Here we compare the performance of the RGrN model that is only pre-trained using simulation data (PGRGrN_pretrain) and the RGrN model that is pre-trained and then fine-tuned with observation data (PGRGrN_ptr).

D. Effectiveness of pre-training
In Fig. 5, we randomly select two example segments to show how predictions change from the pre-trained model (PGRGrN_pretrain) to the fine-tuned model (PGRGrN_ptr) using 2% training data.
TABLE IV. RMSE of streamflow prediction on individual segments after removing training observation data. Here we compare the performance of recurrent neural networks (RNN), the proposed recurrent graph neural network (RGrN), and the RGrN model using both pre-training and the contrastive loss (PGRGrN_ptr,ctr).
Here the model PGRGrN_ptr is the same as the PGRGrN_ptr model used in the previous results but is fine-tuned using only 2% of the observations. In contrast, PGRGrN_pretrain is only pre-trained using the strategy proposed in Section IV-B, without using observations for fine-tuning. Fig. 5 (a) shows that the pre-trained model matches the simulated temperatures very well and thus can capture general temperature patterns even without using observation data. There is still a gap between true observations and PGRGrN_pretrain, since PGRGrN_pretrain emulates PRMS-SNTemp, which has an inherent bias due to an incomplete representation of physics. Nevertheless, after learning general patterns from simulation data, the model can be fine-tuned to match true observations using much less training data. This can be verified as we show that PGRGrN_ptr fine-tuned with 2% of the data closes the gap between PGRGrN_pretrain and true observations. Similarly, Fig. 5 (b) shows that pre-training helps capture general streamflow patterns and fine-tuning effectively fixes the bias and the slow response of PRMS-SNTemp.
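The pre-train-then-fine-tune workflow just described can be sketched with a toy linear model standing in for the RGrN; all names are ours, and plain MSE stands in for the paper's loss functions.

```python
import numpy as np

def fit_mse(w, X, y, lr=0.05, epochs=500):
    """Plain gradient descent on the mean-squared-error loss."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def pretrain_then_finetune(X_sim, y_sim, X_obs, y_obs, dim):
    """Stage 1: learn general dynamics from plentiful simulation output.
    Stage 2: fine-tune the same weights on sparse true observations."""
    w = fit_mse(np.zeros(dim), X_sim, y_sim)      # pre-training on simulations
    w = fit_mse(w, X_obs, y_obs, epochs=100)      # fine-tuning on observations
    return w
```

Even when the "simulator" is biased, starting fine-tuning from the pre-trained weights lets a small observation set correct the bias, which is the effect Fig. 5 illustrates.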
E. Spatial distribution of errors
In Fig. 6, we show the distribution across segments of the improvements of ML temperature and streamflow predictions over the physics-based PRMS-SNTemp simulations. In Fig. 6 (a), we can observe that RGrN and PGRGrN_ptr,ctr produce smaller temperature errors than RNN in many segments. We find one segment (in the dashed circle) where RNN produces worse predictions than PRMS-SNTemp but RGrN greatly improves the prediction. This is the only segment in the dataset for which we have no training data but sufficient testing data. The reason RGrN can produce better predictions for this unlabeled segment is that it leverages the dependencies with other segments to learn the temperature patterns even without training data from this specific segment.

For streamflow prediction (Fig. 6 (b)), it can be seen that RGrN and PGRGrN_ptr,ctr have lower RMSE in several high-flow segments (e.g., the segments in red dashed circles). However, RGrN performs worse than RNN on headwater segments (in black dashed circles), and these segments have lower streamflow. Compared with RGrN, PGRGrN_ptr,ctr alleviates this issue and produces smaller errors in these low-flow segments by using the contrastive loss.
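A map like Fig. 6 rests on a per-segment error comparison, which can be sketched as follows; this is an illustrative helper with our own naming, where negative values mean the ML model beats the physics-based simulation.

```python
import numpy as np

def rmse_improvement(pred_ml, pred_sim, obs):
    """Per-segment RMSE(ML) - RMSE(physics), skipping missing observations.

    All inputs are (n_segments, n_dates) arrays; obs uses NaN for missing.
    """
    diffs = []
    for s in range(obs.shape[0]):
        mask = ~np.isnan(obs[s])
        if not mask.any():                 # no test data for this segment
            diffs.append(np.nan)
            continue
        rmse_ml = np.sqrt(np.mean((pred_ml[s, mask] - obs[s, mask]) ** 2))
        rmse_sim = np.sqrt(np.mean((pred_sim[s, mask] - obs[s, mask]) ** 2))
        diffs.append(rmse_ml - rmse_sim)
    return np.array(diffs)
```

Segments with too few test observations (such as the single-observation segments excluded from Fig. 6) would be filtered before plotting.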
F. Generalization test
Generalizability is important for scientific problems because most observation data can be collected only in certain periods or at locations where it is easier to deploy sensors. Hence, we test model generalizability for both streamflow and stream temperature prediction. In particular, we test the model performance on streamflow prediction using training data only from segments with higher average streamflows in the first 24 years (Oct 1980 to Sep 2004). Then we report the RMSE over the next 12 years for segments with lower average streamflows in Table V. We also include the testing performance of the model trained using all the observations from the first 24 years as a baseline in the second column of Table V. Similarly, it is challenging for traditional ML models to generalize to predicting temperatures in a season that was not included in the training data. We train each model using data only from colder seasons (spring, fall, and winter) in the first 24 years and then test on summers in the next 12 years, as shown in Table VI.

Note that PGRGrN_ptr,ctr always performs better than the other methods because incorporating physical knowledge and minimizing the contrastive loss forces the model to learn generalizable patterns for each segment. It can be seen that ANN has large errors (especially in Table V) when applied to scenarios that are very different from the training data. This is because ANN focuses only on the mapping from input features to target variables, while physical phenomena are often driven by spatial and temporal processes. Such a gap makes ANN prone to overfitting the training data. In contrast, RGrN performs better than ANN and RNN in temperature prediction because it leverages dependencies over both space and time, and thus has a better chance of learning generalizable patterns. However, the performance of RGrN becomes worse when predicting segments with lower streamflows. The graph convolution makes the extracted hidden representation (h_i^t) more similar for neighboring segments, and thus RGrN simulates streamflows with dynamic patterns similar to those of high-flow segments when it is trained using observation data from high-flow segments only.

G. Sensitivity tests
Here we test the sensitivity of the model to different hyper-parameter settings. We also test the performance of the model with different time delays (Eq. 5).

In Figs. 7 (a) and (b), we show the prediction performance using 2% of the data with different values of λ (Eq. 11) and γ (Eq. 13). We can see that the model is relatively robust to different values of the hyper-parameters, but there is a slight increase in RMSE if we set γ and λ to be either too small or too large. With a small γ, the model puts less weight on the contrastive loss and thus produces worse performance using limited (2%) observation data. With a large γ value, we put a higher weight on the contrastive loss, but the model is more likely to be impacted by the bias in the simulation data. On the other hand, a smaller value of λ makes the model ignorant of the supervision on transferred variables, while a much larger value of λ results in less focus on the target variables. It is worth mentioning that our adopted hyper-parameters may not be optimal on the testing set because they are tuned using the simulation data.

In Figs. 7 (c) and (d), we use different lengths of time delay, from one time step to five time steps, in Eq. 5. The time delay indicates how many days prior the information from upstream segments influences the downstream segments. We can see a general increase in RMSE when we set a larger time delay.

Fig. 6. Prediction errors across different segments for (a) temperature prediction and (b) streamflow prediction. Here we show the error as the RMSE of each method minus the RMSE of the PRMS-SNTemp simulation data to get a better contrast. Darker blue indicates better performance of the ML model over the physics-based model. In (a), we do not discuss the red-colored segment at the bottom (predicted poorly by all three methods) and the headwater (75.7°W, 39.95°N) predicted poorly by RNN because they have just one test observation.

TABLE V. Streamflow RMSE on segments with lower average streamflows from Oct 2004 to Sep 2016. Each model is trained using observation data from segments with higher average streamflows, or using all the observation data, from Oct 1980 to Sep 2004. Here our method is compared against artificial neural networks (ANN), recurrent neural networks (RNN), and the RNN that is pre-trained using simulation data and then fine-tuned using observation data (RNN_ptr).

TABLE VI. Temperature RMSE in summers from Oct 2004 to Sep 2016. Each model is trained using observation data from colder seasons, or using all the observation data, from Oct 1980 to Sep 2004. Here our method is compared against artificial neural networks (ANN), recurrent neural networks (RNN), and the RNN that is pre-trained using simulation data and then fine-tuned using observation data (RNN_ptr).

Fig. 7. Performance using different values of (a) γ and (b) λ. Performance with different time delays for (c) temperature and (d) streamflow prediction. In (c) and (d) we show the performance of the proposed recurrent graph neural network (RGrN) and the RGrN model using both pre-training and the contrastive loss (PGRGrN_ptr,ctr).

VI. CONCLUSION
In this paper, we propose a novel method, PGRGrN, for modeling interacting segments in river networks. We leverage the prior physical knowledge about segment-to-segment interactions embedded in physics-based models to enhance the learning of latent representations in the proposed ML model. Moreover, we improve the loss function to optimize both the overall performance over the river network and the local performance on each individual river segment. We have demonstrated the superiority of the proposed method in handling the scarcity of labeled data and in generalizing to unseen scenarios.

In addition to modeling variables in river networks, the proposed method can be adjusted to model other complex systems that involve interacting processes. For example, this method could potentially be used in materials discovery, biological research, and quantum chemistry to capture interactions between different atoms or molecules.

While our method performs much better than existing models, it remains limited in precisely predicting special segments (e.g., unobserved segments and segments with extremely low streamflows). To advance understanding, future ML modeling efforts may consider the uncertainty of the global ML model and determine whether ML should be used to replace physics-based models in different situations.

VII. ACKNOWLEDGMENTS
Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

REFERENCES

[1] D. Graham-Rowe, D. Goldston, C. Doctorow, M. Waldrop, C. Lynch, F. Frankel, R. Reid, S. Nelson, D. Howe, S. Rhee et al., "Big data: science in the petabyte era," Nature, vol. 455, no. 7209, pp. 8–9, 2008.
[2] T. Jonathan, A. Gerald, B. Sandrine et al., "Special online collection: dealing with data," Science, vol. 331, no. 6018, pp. 639–806, 2011.
[3] T. J. Sejnowski et al., "Putting big data to good use in neuroscience," Nature Neuroscience, vol. 17, no. 11, p. 1440, 2014.
[4] A. Ravindranath, N. Devineni, and P. Kolesar, "An environmental perspective on the water management policies of the upper Delaware River basin," Water Policy, vol. 18, no. 6, pp. 1399–1419, 2016.
[5] U. Lall, "Debates: the future of hydrological sciences: A (common) path forward? One water. One world. Many climes. Many souls," Water Resources Research, vol. 50, no. 6, pp. 5335–5341, 2014.
[6] H. V. Gupta and G. S. Nearing, "Debates: the future of hydrological sciences: A (common) path forward? Using models and data to learn: A systems theoretic perspective on the future of hydrological science," Water Resources Research, 2014.
[7] J. J. McDonnell and K. Beven, "Debates: the future of hydrological sciences: A (common) path forward? A call to action aimed at understanding velocities, celerities and residence time distributions of the headwater hydrograph," Water Resources Research, vol. 50, no. 6, pp. 5342–5350, 2014.
[8] X. Jia, J. Willard, A. Karpatne, J. Read, J. Zwart, M. Steinbach, and V. Kumar, "Physics guided RNNs for modeling dynamical systems: A case study in simulating lake temperature profiles," in Proceedings of the 2019 SIAM International Conference on Data Mining. SIAM, 2019, pp. 558–566.
[9] J. S. Read, X. Jia, J. Willard, A. P. Appling, J. A. Zwart, S. K. Oliver, A. Karpatne, G. J. Hansen, P. C. Hanson, W. Watkins et al., "Process-guided deep learning predictions of lake water temperature," Water Resources Research, 2019.
[10] B. Anderson, T. S. Hy, and R. Kondor, "Cormorant: Covariant molecular neural networks," in Advances in Neural Information Processing Systems, 2019, pp. 14510–14519.
[11] N. Muralidhar, M. R. Islam, M. Marwah, A. Karpatne, and N. Ramakrishnan, "Incorporating prior domain knowledge into deep neural networks," in . IEEE, 2018, pp. 36–45.
[12] J.-X. Wang, J.-L. Wu, and H. Xiao, "Physics-informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data," Physical Review Fluids, 2017.
[13] D. Liu and Y. Wang, "Multi-fidelity physics-constrained neural network and its application in materials modeling," Journal of Mechanical Design, vol. 141, no. 12, 2019.
[14] J. Park and J. Park, "Physics-induced graph neural network: An application to wind-farm power estimation," Energy, 2019.
[15] F. Kratzert, D. Klotz, M. Herrnegger, A. K. Sampson, S. Hochreiter, and G. S. Nearing, "Toward improved predictions in ungauged basins: Exploiting the power of machine learning," Water Resources Research, 2019.
[16] Z. Moshe, A. F. Metzger, F. Kratzert, G. Elidan, S. Nevo, and R. El-Yaniv, "HydroNets: Leveraging river structure for hydrologic modeling," 2020.
[17] Y. Qi, Q. Li, H. Karimian, and D. Liu, "A hybrid model for spatiotemporal forecasting of PM2.5 based on graph convolutional neural network and long short-term memory," Science of the Total Environment, 2019.
[18] T. Xie and J. C. Grossman, "Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties," Physical Review Letters, vol. 120, no. 14, p. 145301, 2018.
[19] D. Zhu et al., "Understanding place characteristics in geographic contexts through graph convolutional neural networks," Annals of the American Association of Geographers, 2020.
[20] J. S. Read et al., "Process-guided deep learning predictions of lake water temperature," Water Resources Research, 2019.
[21] S. L. Markstrom et al., "PRMS-IV, the precipitation-runoff modeling system, version 4," USGS Techniques and Methods, no. 6-B7, 2015.
[22] M. J. Sanders, S. L. Markstrom, R. S. Regan, and R. D. Atkinson, "Documentation of a daily mean stream temperature module: an enhancement to the precipitation-runoff modeling system," U.S. Geological Survey, Tech. Rep., 2017.
[23] J. H. LaFontaine et al., "Application of the precipitation-runoff modeling system (PRMS) in the Apalachicola-Chattahoochee-Flint river basin in the southeastern United States," USGS Scientific Investigations Report, 2013.
[24] R. S. Regan, S. L. Markstrom, L. E. Hay, R. J. Viger, P. A. Norton, J. M. Driscoll, and J. H. LaFontaine, "Description of the national hydrologic model for use with the precipitation-runoff modeling system (PRMS)," U.S. Geological Survey, Tech. Rep., 2018.
[25] T. N. Williamson, J. G. Lant, P. Claggett, E. A. Nystrom, P. C. Milly, H. L. Nelson, S. A. Hoffman, S. J. Colarullo, and J. M. Fischer, "Summary of hydrologic modeling for the Delaware River basin using the water availability tool for environmental resources (WATER)," U.S. Geological Survey, Tech. Rep., 2015.
[26] U.S. Geological Survey, "National water information system data available on the world wide web (USGS water data for the nation)," 2016.
[27] E. K. Read et al., "Water quality data for national-scale aquatic research: The water quality portal," Water Resources Research, 2017.
[28] R. S. Regan et al.