Accurate and Clear Precipitation Nowcasting with Consecutive Attention and Rain-map Discrimination
Ashesh, Buo-Fu Chen, Treng-Shi Huang, Boyo Chen, Chia-Tung Chang, Hsuan-Tien Lin
Ashesh [email protected] Taiwan UniversityTaipei, Taiwan
Buo-Fu Chen [email protected] Taiwan UniversityTaipei, Taiwan
Treng-Shi Huang [email protected] Weather BureauTaipei, Taiwan
Boyo Chen [email protected] Taiwan UniversityTaipei, Taiwan
Chia-Tung Chang [email protected] Taiwan UniversityTaipei, Taiwan
Hsuan-Tien Lin [email protected] Taiwan UniversityTaipei, Taiwan
ABSTRACT
Precipitation nowcasting is an important task for weather forecasting. Many recent works aim to predict high rainfall events more accurately with the help of deep learning techniques, but such events are relatively rare. The rarity is often addressed by formulations that re-weight the rare events. However, such a formulation carries a side effect of making "blurry" predictions in low rainfall regions and cannot convince meteorologists to trust its practical usability. We fix the trust issue by introducing a discriminator that encourages the prediction model to generate realistic rain maps without sacrificing predictive accuracy. Furthermore, we extend the nowcasting time frame from one hour to three hours to further address the needs of meteorologists. The extension is based on consecutive attentions across different hours. We propose a new deep learning model for precipitation nowcasting that includes both the discrimination and attention techniques. The model is examined on a newly built benchmark dataset that contains both radar data and actual rain data. The benchmark, which will be publicly released, not only establishes the superiority of the proposed model, but is also expected to encourage future research on precipitation nowcasting.
KEYWORDS
convolutional neural networks, rainfall prediction, precipitation nowcasting, attention, discriminator, sequence models
1 INTRODUCTION

Short-term (usually referring to less than 12 hours) precipitation forecasting is one of the most important weather forecasting topics due to the ever-growing need for real-time, large-scale, and fine-grained precipitation nowcasting. Better short-term forecasts facilitate more efficient and safer daily lives; they help provide road conditions, traffic jams, aviation weather reports, and flood alert information to society. Generally, for 6 to 12-hour forecasts, the numerical weather prediction (NWP) models driven by physics simulation provide superior and more stable predictions than conventional data-driven statistical techniques due to the refinement of model physics and computational schemes [11, 19]. For 0 to 1-hour quantitative precipitation nowcasting (QPN), on the other hand, radar echo extrapolation remains a powerful and highly relevant method [3, 4, 6, 7] because of the high temporal and spatial resolutions of radar maps whenever they are available. However, the major drawbacks of these extrapolation-based QPN techniques include the difficulty of capturing the growth and decay of storms, the uncertainty in converting radar reflectivity to actual rainfall amounts, and the limitation of anticipating storm motion at larger lead times (i.e., the 2nd and 3rd hours).

The deep learning community has recently shown great interest in the QPN problem, with several recent works [5, 17, 18, 20]. Notably, Shi et al. [17] tackled this problem by modeling it as a spatiotemporal sequence forecasting problem (i.e., predicting the animation of the radar echo), introducing the encoding-forecasting structure of ConvLSTM. Another recent work [18] replaced ConvLSTM with a novel recurrent module, TrajGRU, while using a similar encoding-forecasting structure. The TrajGRU module was claimed to learn location-variant structure. Although these studies have demonstrated great potential for applying deep learning to QPN, two critical issues need to be addressed. First, the deep learning model should predict heavy rainfall and simultaneously keep good performance in drizzle areas. If the model demonstrates good performance on heavy rainfall but outputs an unrealistically large rain area, the forecaster may get confused (why is it raining everywhere?) and lose trust in the deep-learning-driven QPN system. In addition, most of the previous QPN models [5, 17, 18, 20] provide only a one-hour prediction, which is not long enough for forecasters. Extending the prediction period from one hour to three hours can be a unique selling proposition of deep learning models compared to the conventional radar echo extrapolation method.

In this work, we tackle both issues by designing a novel deep learning model. We first address the issue of producing predictions that humans can trust in both low rainfall regions and high rainfall ones. We observe that it is easy for a human to distinguish between a real rain map and a predicted (generated) rain map, where visual blurriness appears to be the primary distinguishing factor. We thus design a discriminator that learns to distinguish between real and generated rain maps. The discriminator loss nudges the deep model to generate more realistic-looking rain maps without compromising model performance. We observe both qualitatively and quantitatively that this indeed leads to generated rain maps with less blurriness, winning human trust.

Then, we address the issue of making accurate 1 to 3-hour predictions by proposing an attention-based model.
The attention mechanism creates focus regions on the rain maps that can be carried from the earlier hours to later ones, and the focus regions are used to rescale the predicted rainfall values. We observe that this leads to consistently improved performance in both low and high rainfall regions.

Combining the discriminator and the attention mechanism results in a novel deep learning model, which, to the best of our knowledge, is the first deep learning model that tackles the extreme precipitation nowcasting problem using both radar and real rain data. The dataset that contains both parts of the data is publicly released to encourage more research in this direction. Experiments on the dataset justify the validity of our designs of the discriminator and attention mechanism, and demonstrate the superiority of the proposed deep learning model.

The paper is organized as follows. Section 2 describes the related work regarding QPN models and the deep learning components related to our proposed model architecture. Section 3 first describes the radar and rain rate data used in this study and then describes the different components of our model, which involve the discriminator-based architecture, the attention module, and the different loss components. Section 4 evaluates the performance of our model using statistical analysis and a case study. Finally, our concluding remarks and future work are stated in Section 5.

2 RELATED WORK

As mentioned in the first section, Shi et al. [17] introduced an encoding-forecasting structure to model QPN as a spatiotemporal sequence forecasting problem. Their follow-up work [18] replaced ConvLSTM with a TrajGRU module to learn the location-variant features in the radar map. To ensure better prediction of less frequent but heavy rainfall, they introduced a weighted loss, which gave more weight to the loss incurred in heavy rainfall regions. However, these studies did not use rainfall observations as the target. Their models only predicted the future radar echo and converted the predicted radar echo to rainfall using simplified relationships. This is a serious shortcoming in QPN since it questions its real-world applicability.

Additionally, the Shi et al. model [18] suffers from under-prediction in high rainfall regions despite having the weighted loss. Another issue is that the prediction contains a large number of pixels with a small amount of rainfall, which gives a hazy appearance. This is a natural consequence of giving low weight to low rainfall regions. They also use a mask to ignore pixels where the rainfall is below a threshold. While this mask-based weighted loss formulation is very useful for training on the highly skewed rain data, the haziness in the prediction is not desirable. It makes meteorologists skeptical about using the deep learning model, since the conventional QPN models used by meteorologists do not suffer from such artifacts.

Subsequently, Tran and Song [20] tried to improve the ConvLSTM and TrajGRU QPN models by modifying the loss function design. They employed visual image quality assessment techniques, including structural similarity (SSIM) and multi-scale SSIM, to train the models and significantly reduce the blurry image issue. Furthermore, Franch et al. [5] proposed a solution based on model stacking to improve deep learning QPN prediction skills. They used a convolutional neural network to combine an ensemble of deep learning models focusing on various rain regimes, doubling the prediction skill. However, both of these studies only provide one-hour predictions, while the meteorological community genuinely needs deep learning QPN models with a longer prediction time.

To manage the blurriness of the prediction and improve the QPN quality in high rainfall regions, this study touches upon two important techniques from deep learning, namely adversarial learning, greatly popularized by generative adversarial networks (GANs), and attention. While GANs [8] are primarily used for image generation, their discriminator-based adversarial learning approach has been used in a standalone fashion to make the distribution of generated images similar to a desired distribution [10]. Thus adversarial training leads to clarity in generated images when clarity is part of the desired distribution [12, 21]. Attention-based approaches, while initially developed for natural language processing tasks, have found decent popularity in the computer vision community [15, 22]. Their ability to focus differently on different pixels has been shown to do well in cases where the features of interest span a small number of pixels and their occurrences are rare [1, 16, 23].
3 DATA AND METHOD

Two datasets are used in this work: rain rate and radar reflectivity in the Taiwan region. Both datasets are produced by the Quantitative Precipitation Estimation and Segregation Using Multiple Sensors (QPESUMS) system [2] of the Central Weather Bureau (CWB), Taiwan. QPESUMS provides high-resolution and rapidly updated rainfall data and radar reflectivities based on observations from different radar sites.
Rain rate. The rainfall data is a 2D map of shape 561 x 441, with values in mm/hr.

Radar reflectivity. The 3D radar data has a horizontal size of 561 x 441 and 21 levels in height. We take the maximum value of radar reflectivity (unit: dBZ) over the 21 channels to get the 2D data that we work with.

Both rain and radar data have a time resolution of 10 minutes. We have this data from January 2015 till December 2018, comprising 203K frames. We divide the data into train, test, and validation sets by timestamp:
(1) Training set: 2015-2017.
(2) Validation set: first 15 days of each month in 2018.
(3) Test set: last 15 days of each month in 2018.
This validation-test allocation ensures that both our test and validation data cover all seasons. The distribution of rainfall can be seen in Table 1.

RR (mm/hr)   0-1     1-3    3-5    5-10   10-20   20-30   30-40   40+
%            95.92   2.18   0.71   0.67   0.37    0.09    0.03    0.03

Table 1: Distribution of rainfall in the QPESUMS dataset. RR is rain rate in mm/hr.
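As a rough sketch of the preprocessing and the temporal split described above (the function names and array layout are our own assumptions, not from the released code):

```python
import numpy as np

def radar_to_2d(radar_3d: np.ndarray) -> np.ndarray:
    """Collapse the 21 height levels of radar reflectivity (dBZ)
    into one 2D map by taking the per-pixel maximum."""
    # radar_3d has shape (21, 561, 441); the result has shape (561, 441).
    return radar_3d.max(axis=0)

def split_of(timestamp) -> str:
    """Train/val/test assignment by timestamp: 2015-2017 -> train;
    first 15 days of each 2018 month -> val; the rest of 2018 -> test."""
    if timestamp.year <= 2017:
        return "train"
    return "val" if timestamp.day <= 15 else "test"
```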
Figure 1: Data flow in the Encoder (R: RNN layer; C: convolution layer with Leaky ReLU activation). The figure is shown for sequence length b = 6.

Benchmark QPESUMS extrapolation model. The QPESUMS system also provides a 1-hour rain rate prediction, which serves as our benchmark. The rain rate product is made by a radar extrapolation method similar to optical flow [13, 14]. This extrapolation technique first analyzes the moving vector of each convection cell from the 2D reflectivity mosaic and then applies the vectors to the previous rainfall map. Because this technique does not take the evolution of convective cells into consideration, the growth and decline of weather systems are not reflected in this benchmark. Thus, the effective forecast time is only up to 1 hour. This QPN product is currently being used operationally by weather forecasters at the CWB, Taiwan.
Having described the data, we now explain our proposed approach. First, we formally introduce the problem setup. Next, we describe the different components of our model. Thereafter, we describe our different loss components.

Our model predicts 3 frames, one for each hour. As input, we feed the last 60 minutes of rain and radar data to the model. Let $R^{rain}_t$ and $R^{radar}_t$ denote the 2D rain rate data and the 2D radar data at time $t$ respectively, where $t$ is measured in minutes. Let

$Y_t = \frac{1}{b} \sum_{i=1}^{b} R^{rain}_{t + 10 \cdot i}$

denote the average hourly rain rate over the hour after time $t$. Given $\{R^{rain}_{t-10}, R^{rain}_{t-20}, \ldots, R^{rain}_{t-10b}\}$ and $\{R^{radar}_{t-10}, \ldots, R^{radar}_{t-10b}\}$, the task is to predict $[Y_t, Y_{t+60}, Y_{t+120}]$. In our setting, $b = 6$. Put simply, given one hour of rain and radar data, we want to predict the average hourly rain rate for the next three hours.
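Concretely, the three hourly targets can be built from the 10-minute frames as in the following sketch (array layout assumed):

```python
import numpy as np

def hourly_targets(rain_10min: np.ndarray, t: int) -> np.ndarray:
    """Average b = 6 consecutive 10-minute rain maps into one hourly
    rain-rate map, for each of the three hours after frame index t.
    rain_10min: array of shape (T, H, W), one frame per 10 minutes."""
    b = 6
    return np.stack([rain_10min[t + h * b : t + (h + 1) * b].mean(axis=0)
                     for h in range(3)])  # (3, H, W): [Y_t, Y_{t+60}, Y_{t+120}]
```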
We next describe the three modules that comprise our model. The overall architecture is shown in Figure 4.
Prediction Module. Our prediction module is inspired by the one used in [18] and is an encoder-forecaster architecture. We use ConvGRU (Convolutional GRU) instead of their proposed TrajGRU for the RNN module, as the latter did not give any extra benefit. We also introduce a spatial attention component, described in the next subsection. The encoder is composed of 3 layers of RNN modules with one 2D convolution layer between every consecutive pair of RNN modules. The purpose of the 2D CNN module is to downsample the spatial dimension, thereby allowing feature extraction at multiple scales. The forecaster has a similar structure, with 3 RNN modules and two 2D transposed convolution modules sandwiched between them. In the forecaster, the transposed convolutions perform upsampling. Schematics for the encoder and decoder are given in Figure 1 and Figure 2.

Figure 2: Data flow in the Decoder (R: RNN layer; U: transposed convolution layer). The figure shows a 2-hour prediction.

Figure 3: Attention Module (S: Sigmoid layer; C: Convolution layer).
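For illustration, a generic ConvGRU cell can be sketched as follows (a minimal sketch; the kernel size and hidden width are assumptions, not the exact configuration used here). A ConvGRU is a GRU whose gates are computed with 2D convolutions, so the hidden state keeps its spatial layout:

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Minimal ConvGRU cell: GRU gates computed by 2D convolutions."""
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        pad = k // 2
        # One conv produces both the update (z) and reset (r) gates.
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=pad)
        # Another conv produces the candidate hidden state.
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=pad)
        self.hid_ch = hid_ch

    def forward(self, x, h=None):
        if h is None:  # zero-initialize the hidden state spatially
            h = x.new_zeros(x.size(0), self.hid_ch, x.size(2), x.size(3))
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_cand = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_cand
```

In the encoder, three such cells are stacked with strided convolutions between them for downsampling; the forecaster mirrors this with transposed convolutions.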
Attention Module. The attention module, shown in Figure 3, pixel-wise rescales the prediction obtained from the Prediction module. It predicts an attention map for the next hour's rain rate and takes as input the attention map of the previous hour. For predicting the first-hour attention map, a sigmoid applied to the latest available rain map is taken as the input. The module is composed of 2D convolution layers and uses leaky ReLU as the activation. In this work, we refer to the presence of the Attention module by the token "Atn".
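A sketch of such an attention module, using the layer configuration reported later in the implementation details (5 convolutions with output channels [16, 32, 32, 32, 1], kernel 5, padding 2); the module interface and the exact placement of the sigmoid are our assumptions:

```python
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """Maps the previous hour's attention map to the next hour's
    attention map, which then rescales the predicted rain map pixel-wise."""
    def __init__(self, in_ch: int = 1):
        super().__init__()
        chans, layers, prev = [16, 32, 32, 32, 1], [], in_ch
        for i, c in enumerate(chans):
            layers.append(nn.Conv2d(prev, c, kernel_size=5, padding=2))
            if i < len(chans) - 1:      # LeakyReLU after all but the last conv
                layers.append(nn.LeakyReLU())
            prev = c
        self.net = nn.Sequential(*layers)

    def forward(self, prev_attention, prediction):
        # For the first hour, prev_attention = sigmoid(latest rain map).
        attn = torch.sigmoid(self.net(prev_attention))
        return attn * prediction, attn  # rescaled prediction, carried map
```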
Figure 4: Overall Model
Discriminator Module. Together with the Prediction module, we simultaneously train a Discriminator. It learns to discriminate between real rain maps and generated rain maps. The loss from the discriminator nudges the Prediction module towards generating realistic rain maps. This leads to much clearer and therefore more informative predictions, even with weak input signals, as is the case for the second- and third-hour predictions. Our Discriminator has a simple structure and is composed of dense layers.
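A sketch of the discriminator, using the layer sizes reported later in the implementation details (dense layers of 128, 128, and 1, LeakyReLU in between, a sigmoid at the end); flattening the input map is our assumption:

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Flattens a rain map and scores it: output in (0, 1) is the
    estimated probability that the map is a real observation."""
    def __init__(self, n_pixels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_pixels, 128), nn.LeakyReLU(),
            nn.Linear(128, 128), nn.LeakyReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, rain_map):
        return self.net(rain_map)
```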
Prediction Loss. Following [18], we adopt a weighted loss scheme. Let $Y_t$ represent the actual $t$-th one-hour-averaged rain rate 2D map in the sequence and $\hat{Y}_t$ represent the predicted version. We use a weighted MAE loss defined as follows:

$L_{pred} = \frac{1}{3HW} \sum_{t} \sum_{i} \sum_{j} W(Y_t[i,j]; 0.5) \cdot |Y_t[i,j] - \hat{Y}_t[i,j]|$   (1)

Here, $H \times W$ is the spatial size of the maps and the weight $W$ is defined as follows:

$W(x; th) = \begin{cases} 0 & x < th \\ 1 & th \le x < 2 \\ 2 & 2 \le x < 5 \\ 5 & 5 \le x < 10 \\ 10 & 10 \le x < 30 \\ 30 & 30 \le x \end{cases}$   (2)

We refer to this loss component as "WMAE" (Weighted MAE). For comparison, we also use an MSE-based version where the MAE is replaced with MSE; we call it "WMSE". In both cases, unless explicitly stated, the threshold $th$ is set to 0.5.

Discriminator Loss. From Equations 1 and 2, one can infer that the model simply does not care about predictions for pixels where the target rain is less than 0.5 mm/hr. As seen in Figure 8 (WMAE column), this causes blurry predictions for the second and third hours. We use a Discriminator to encourage the model to generate rain maps that look like real ones. The Discriminator itself is trained using the cross-entropy formulation:

$L_D = -\frac{1}{3} \sum_{t} \left[ \log(D(Y_t)) + \log(1 - D(\hat{Y}_t)) \right]$   (3)

The adversarial loss for the Prediction module is shown below. Note that this loss component does not change the Discriminator's weights; it changes $\hat{Y}_t$ by updating the Prediction module's weights so that $D(\hat{Y}_t)$ gets closer to 1.

$L_{GD} = -\frac{1}{3} \sum_{t} \log(D(\hat{Y}_t))$   (4)

We refer to the Discriminator module and the two associated loss components $L_D$, $L_{GD}$ with the token "Adv". The final loss for the Prediction module is the weighted sum of the adversarial loss and the prediction loss:

$L_G = (1 - w_{Adv}) \cdot L_{pred} + w_{Adv} \cdot L_{GD}$   (5)

Balanced Loss. This loss component was developed for comparison with the Discriminator-based approach, and we do not use it in our final model. The motivation is to account for the pixels that are not considered in Equations 1 and 2, i.e., pixels with rainfall values less than 0.5 mm/hr. Let $mask_t$ be a $\{0,1\}$-valued tensor of the same shape as $Y_t$ with $mask_t[i,j] = \mathbb{1}[Y_t[i,j] < 0.5]$. $L_{Bal}$ is then defined as

$L_{Bal} = \frac{1}{3HW} \sum_{t} \sum_{i} \sum_{j} mask_t[i,j] \cdot |Y_t[i,j] - \hat{Y}_t[i,j]|.$

The loss for this model (GRU+WMAE+Bal) is $(1 - w_{Bal}) \cdot L_{pred} + w_{Bal} \cdot L_{Bal}$.
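A minimal PyTorch sketch of the weighted MAE of Equations 1 and 2 (the helper names are ours):

```python
import torch

def weight(x: torch.Tensor, th: float = 0.5) -> torch.Tensor:
    """Pixel-wise weights of Eq. (2): heavier rain gets a larger weight;
    pixels below the threshold th get weight 0 and are ignored."""
    w = torch.zeros_like(x)
    w[(x >= th) & (x < 2)] = 1
    w[(x >= 2) & (x < 5)] = 2
    w[(x >= 5) & (x < 10)] = 5
    w[(x >= 10) & (x < 30)] = 10
    w[x >= 30] = 30
    return w

def wmae(y: torch.Tensor, y_hat: torch.Tensor, th: float = 0.5) -> torch.Tensor:
    """Weighted MAE of Eq. (1); use (y - y_hat)**2 instead for WMSE.
    The mean over all elements supplies the 1/(3HW) normalization."""
    return (weight(y, th) * (y - y_hat).abs()).mean()
```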
4 EXPERIMENTS AND RESULTS

For memory and file-system management, we converted the rain and radar values to integer values. We center-cropped the rain and radar data to size 540 x 420. The code is written using PyTorch-Lightning and PyTorch.
We used Adam as the optimizer with a learning rate of 0.0001. For all experiments, the batch size is set to 16 and we train for a maximum of 15 epochs. We pick the best-performing model on the validation data using the $L_{pred}$ metric. Our Discriminator is composed of 3 dense layers of sizes 128, 128, and 1. LeakyReLU is used as the non-linearity, except for the last layer, where a sigmoid is used. For the attention module, we use 5 convolution layers whose output channel counts are [16, 32, 32, 32, 1]. A kernel size of 5 with a padding of 2 is used. LeakyReLU is used as the non-linearity after every convolution layer except the last one. For models having the Discriminator, $w_{Adv} = 0.05$, and for the model using the Balanced loss, $w_{Bal} = 0.01$. These values were obtained by looking at the performance on the validation set with the WMAE metric.
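Putting Equations 3 to 5 together, one alternating training step can be sketched as follows (a minimal sketch with assumed names; it reuses the wmae helper sketched earlier and guards the logarithms with a small epsilon):

```python
import torch

w_adv = 0.05  # weight of the adversarial term in the prediction loss

def train_step(predictor, discriminator, opt_g, opt_d, inputs, y):
    y_hat = predictor(inputs)              # (B, 3, H, W) hourly predictions

    # 1) Discriminator step (Eq. 3): real maps -> 1, generated maps -> 0.
    d_real = discriminator(y)
    d_fake = discriminator(y_hat.detach())  # detach: do not update the predictor
    loss_d = -(torch.log(d_real + 1e-8) + torch.log(1 - d_fake + 1e-8)).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Predictor step (Eqs. 4-5): weighted MAE plus the adversarial term.
    loss_gd = -torch.log(discriminator(y_hat) + 1e-8).mean()
    loss_g = (1 - w_adv) * wmae(y, y_hat) + w_adv * loss_gd
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_g.item(), loss_d.item()
```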
Model                      | 2018                                  | August 2018
                           | WMAE(Th=0.5)     | WMAE(Th=0.0)       | WMAE(Th=0.5)     | WMAE(Th=0.0)
                           | H0   H1   H2     | H0   H1   H2       | H0   H1   H2     | H0   H1   H2
Last 10min                 | 0.75 1.2  1.3    | 0.76 1.27 1.38     | 4.36 7.16 7.82   | 4.4  7.36 8.1
QPESUMS                    | 0.96 1.19 1.32   | 1.06 1.3  1.49     | 5.55 6.89 7.9    | 5.81 7.21 8.41
TrajGRU + WMSE+WMAE [18]   | 0.57 0.89 0.99   | 2.97 5.85 7.23     | 3.18 5.09 5.64   | 5.21 9.4  11.07
TrajGRU + WMAE             | 0.48 0.84 0.95   | 1.99 3.7  4.71     | 2.65 4.84 5.56   | 3.91 7.5  9.19
GRU + WMAE                 | 0.49 0.84 0.95   | 1.78 3.51 4.57     | 2.69 4.84 5.55   | 3.78 7.28 8.96
GRU + WMAE+Bal             | 0.49 0.84 0.95   | 0.66 1.4  1.91     | 2.7  4.8  5.52   | 3.13 6.31 7.99
GRU + WMAE+Adv             | 0.49 0.85 0.96   | 0.58 1.26 1.85     | 2.72 4.83 5.52   | 2.96 5.98 7.68
GRU + WMAE+Adv+Atn         |                  |                    |                  |

Table 2: Performance on the WMAE metric for first (H0), second (H1), and third (H2) hour predictions.
We use multiple metrics to evaluate the performance of our model quantitatively, and show a case study for qualitative evaluation. First, we use the WMAE with $th = 0.5$. As mentioned before, one issue with this metric is that it hides poor performance in low-rain regions. We therefore also evaluate the WMAE with $th = 0$. Since meteorologists are interested in assessing how the model performs for different amounts of rain, we binarize our prediction with different thresholds and compute the CSI and the Heidke Skill Score (HSS) [9]. To compute these metrics for a threshold, both the prediction and the target are binarized using the threshold. We then count the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). CSI and HSS are computed as

$CSI = \frac{TP}{TP + FP + FN}$ and $HSS = \frac{2(TP \cdot TN - FN \cdot FP)}{(TP + FN)(FN + TN) + (TP + FP)(FP + TN)}$,

respectively.
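A small sketch of this metric computation from binarized maps (the helper name is ours):

```python
import numpy as np

def csi_hss(y: np.ndarray, y_hat: np.ndarray, thresh: float):
    """Binarize target and prediction at `thresh` (mm/hr) and compute
    CSI = TP/(TP+FP+FN) and the Heidke Skill Score."""
    t, p = y >= thresh, y_hat >= thresh
    tp = float(np.sum(t & p)); fp = float(np.sum(~t & p))
    fn = float(np.sum(t & ~p)); tn = float(np.sum(~t & ~p))
    csi = tp / (tp + fp + fn)
    hss = 2 * (tp * tn - fn * fp) / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))
    return csi, hss
```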
For both metrics, the higher the value, the better the model. Since our focus is on heavy rainfall, for which we introduced the attention module, we evaluate the performance over two date ranges: the whole of 2018, and August 2018, the rainiest month of 2018. We also plot a performance diagram depicting the model's performance at different rainfall levels.

For benchmarks, we train the model developed in [18]. We refer to it as "TrajGRU+WMSE+WMAE" since it uses the TrajGRU [18] layer for the RNN module and uses the average of weighted MAE and weighted MSE as the loss. With [18], we observe that the WMSE loss component caused a significant degradation in performance, so we train another variant where we keep just WMAE as the loss function, which we refer to as "TrajGRU+WMAE". As discussed previously, we obtain another benchmark from the CWB, Taiwan, and refer to it as QPESUMS. This benchmark provides the prediction for the first hour only; for evaluating the second and third hours, we use its first-hour prediction as the prediction for all 3 hours. Finally, as yet another benchmark, we use the latest available rain map as the prediction, which we refer to as "Last 10min".

Performance on the first-hour prediction in terms of the HSS and CSI metrics is given for 2018 in Figure 5 and for August 2018 in Figure 6. The corresponding tables (Tables 5 and 7) are given in the supplemental data. For thresholds greater than 5, we observe the best performance from our attention-based model in both cases. The benefit is quite significant if one compares it with [18] and with QPESUMS, a benchmark used by professional meteorologists. The evidence of our attention-based model outperforming the other deep learning models on all rain-rate thresholds (both high-rain and low-rain) can be seen in both figures. The benefit of the Attention module in isolation can be observed by comparing "GRU+WMAE+Adv+Atn" vs "GRU+WMAE+Adv"; everything, including all hyperparameters, is identical for these two models except the presence/absence of the attention module.

Our claim that the discriminator-based approach yields clearer predictions can be inferred from the superior performance of the "GRU+WMAE+Adv" model at the threshold of 1 with respect to the models without the discriminator (the "Adv" token). It is worth noting that it achieves similar if not better performance at higher thresholds in both Figures 5 and 6.
Figure 7: Blended model performance on August 2018 data for the first-hour prediction.

We developed another model, "GRU+WMAE+Bal", for obtaining clarity in prediction, whose details are given in the Balanced Loss description above. We see that the discriminator-based model outperforms this model as well at the threshold of 1. All this can also be observed from the WMAE performance reported in Table 2, where performance is shown for $th = 0.5$ and $th = 0.0$. Note that at $th = 0.0$, one can observe that using the discriminator (models with "Adv") improves clarity.

On the HSS and CSI metrics, we observe decent performance from the "Last 10min" benchmark, where we use the latest available rain map as the prediction. We note that this works especially well for non-moving rainfall occurrences and low thresholds, because with low thresholds the model need not account for the change in rainfall intensity with time. On the other hand, with the WMAE metric shown in Table 2, where the change in rainfall intensity is directly penalized, we naturally see worse performance from the Last 10min benchmark, especially with
WMAE($th = 0.5$). With WMAE($th = 0.0$), the deep-learning-based models are at a disadvantage, since their uncertainty manifests as blurry predictions, which mostly hampers performance on the significant number of non-rainy pixels.

A Blended Model. Beyond clarity, which our model achieves (Figure 10), meteorologists are not concerned with very low rain-rate thresholds. Nonetheless, we argue that, if needed, it is relatively easy to get even better performance at lower thresholds, and to that end we showcase a blended version of our model. We first create a classification model that predicts whether or not a pixel will have an hourly rain rate exceeding 0.5 mm/hr. Our final prediction for the blended version is the weighted average of the prediction of the GRU+WMAE+Adv+Atn model and the last 10-minute rain rate, with weights computed from the classifier's prediction. As seen in Figure 7, the blended model outperforms the Last 10min benchmark at lower thresholds and achieves performance similar to our best model at higher thresholds. We do not prefer the blending since, similar to the Last 10min benchmark, it is slightly inferior on the WMAE metric. Please refer to the supplemental material for more details.
Figure 8: The observed rainfall (target) and the predictions conducted one, two, and three hours earlier for four of our deep learning QPN models.

Figure 9: Performance diagram of the +1 h prediction for the benchmark QPESUMS radar echo extrapolation (black text) and four of our deep learning QPN models for the front event. Red, gold, blue, and purple text indicates the scores at different thresholds for the GRU+WMAE, GRU+WMAE+Bal, GRU+WMAE+Adv, and GRU+WMAE+Adv+Atn models, respectively.

A case study of the frontal system near Taiwan on 7th May 2018 illustrates the performance of the QPN models developed in this study (Fig. 10). The benchmark operational QPESUMS radar echo extrapolation (Fig. 10, second row) fairly captures the motion of the frontal rainband but overpredicts the maximum rainfall (over 70 mm), while the maximum rainfall in the observation (Fig. 10, first row) is around 40 mm. The GRU+WMAE model (Fig. 10, third row) captures the rainband movement and the maximum rainfall but overforecasts the light rain, leading to unrealistically large raining areas. Furthermore, the GRU+WMAE+Bal model (Fig. 10, fourth row) and the GRU+WMAE+Adv model (Fig. 10, fifth row) handle the problem of predicting too large raining areas, with GRU+WMAE+Adv doing a better job visually as well. After fixing this issue, the deep learning QPN model becomes competitive with the state-of-the-art operational model.

By adding the attention mask to the model, the GRU+WMAE+Adv+Atn model performs better for the second- and third-hour predictions. Specifically, the light rain area is better depicted by the GRU+WMAE+Adv+Atn model (Fig. 8), while the other models produce overly blurred predictions in which the frontal system is less identifiable. Moreover, the GRU+WMAE+Adv+Atn model retains the frontal rainband cells in the third-hour prediction, as shown in Fig. 8 (red ellipse).

The performance diagram (Fig. 9) also suggests that, for this frontal case, our models outperform the operational QPESUMS extrapolation technique. For every rainfall threshold except 1 mm, our models have higher CSI scores (Fig. 9, green contours) than the QPESUMS. For the 1, 3, 5, and 10-mm thresholds, our modifications to the model (e.g., balanced loss, discriminator, and attention) gradually improve the CSI by increasing the success ratio (Fig. 9, X-axis) while keeping the probability of detection high. For the higher 30, 40, and 50-mm thresholds, the GRU+WMAE+Adv+Atn model (Fig. 9, purple text) also outperforms the other models.
5 CONCLUSION AND FUTURE WORK

In this work, we used the last hour of rain and radar data, with a granularity of 10 minutes, to predict 3 hourly rain rate maps, one for each hour. Deep learning approaches in general respond to uncertainty by producing blurred predictions. After inspecting the latest available rain map, it is relatively easy for a human to locate regions where the probability of rain is very low; deep learning models predict non-zero rain rates in such regions as well, which we refer to as redundant uncertainty. We made use of a Discriminator to lessen this redundant uncertainty, thereby generating clearer rain maps without adversely affecting the prediction quality at most rain-rate thresholds. We further improved the performance at different thresholds by dynamically rescaling the prediction using an attention module; we see the improvement at multiple rain-rate thresholds. We observed that when using real rain data, as in our case, an MSE-based loss led to inferior performance, and thus we easily outperformed [18]. We still outperformed [18] even after equipping it with an improved loss function (WMAE). Our model also performed better than QPESUMS, a benchmark used by meteorologists. Finally, we provide 3-hour, hourly rain rates, which meteorologists prefer over the short-duration predictions produced by many approaches, including [18]. Note that while the multiple sequence predictions of such models could be averaged to obtain what we are predicting, we argue that directly optimizing the hourly rain rate gives our approach an edge in estimating it.

There are a number of areas where more work needs to be done. One area of improvement could be to predict more hours into the future. Another is to tackle uncertainty with multiple predictions: instead of predicting one blurry rain map, an architecture may be developed that predicts K probable clear rain maps. Generally speaking, the field of precipitation nowcasting has immense scope for improvement from a machine learning perspective, and we hope to be part of it.
Figure 10: The hourly rainfall observation (first row) and predictions based on the benchmark operational QPESUMS radar echo extrapolation (second row) and four of our deep learning QPN models for the front event near Taiwan on 7th May 2018.
Model                      | 2018                                  | August 2018
                           | WMAE(Th=0.5)       | WMAE(Th=0.0)       | WMAE(Th=0.5)     | WMAE(Th=0.0)
                           | H0    H1    H2     | H0    H1    H2     | H0   H1   H2     | H0   H1   H2
TrajGRU + WMSE+WMAE        | 0.015 0.011 0.015  | 0.3   0.708 0.71   | 0.08 0.04 0.08   | 0.2  0.57 0.61
TrajGRU + WMAE             | 0.009 0.002 0.003  | 0.252 0.372 0.397  | 0.05 0.01 0.03   | 0.12 0.01 0.13
GRU + WMAE                 | 0.005 0.005 0.006  | 0.102 0.152 0.159  | 0.03 0.03 0.02   | 0.07 0.08 0.06
GRU + WMAE+Bal             | 0.003 0.007 0.014  | 0.017 0.027 0.026  | 0.03 0.06 0.1    | 0.06 0.09 0.1
GRU + WMAE+Adv             | 0.007 0.007 0.012  | 0.017 0.061 0.119  | 0.04 0.05 0.09   | 0.06 0.12 0.21
GRU + WMAE+Atn             | 0.003 0.004 0.01   | 0.293 0.38  0.275  | 0.03 0.05 0.09   | 0.15 0.21 0.28
GRU + WMAE+Adv+Atn         | 0.002 0.004 0.007  | 0.011 0.045 0.093  | 0.02 0.02 0.03   | 0.04 0.11 0.15

Table 3: Standard error on the WMAE metric for first (H0), second (H1), and third (H2) hour predictions.

REFERENCES
[1] Ching-Yuan Bai, Buo-Fu Chen, and Hsuan-Tien Lin. 2020. Benchmarking Tropical Cyclone Rapid Intensification with Satellite Images and Attention-based Deep Models. arXiv:1909.11616 [cs, stat] (Sept. 2020). http://arxiv.org/abs/1909.11616
[2] Pao-Liang Chang, Jian Zhang, Yu-Shuang Tang, Lin Tang, Pin-Fang Lin, Carrie Langston, Brian Kaney, Chia-Rong Chen, and Kenneth Howard. 2020. An Operational Multi-Radar Multi-Sensor QPE System in Taiwan. Bulletin of the American Meteorological Society (2020), 1-56. https://journals.ametsoc.org/view/journals/bams/aop/bamsD200043/bamsD200043.xml
[3] Kao-Shen Chung and I-An Yao. 2020. Improving Radar Echo Lagrangian Extrapolation Nowcasting by Blending Numerical Model Wind Information: Statistical Performance of 16 Typhoon Cases. Monthly Weather Review (2020).
[4] Michael Dixon and Gerry Wiener. 1993. TITAN: Thunderstorm Identification, Tracking, Analysis, and Nowcasting - A Radar-based Methodology. Journal of Atmospheric and Oceanic Technology 10, 6 (1993), 785-797.
[5] Gabriele Franch, Daniele Nerini, Marta Pendesini, Luca Coviello, Giuseppe Jurman, and Cesare Furlanello. 2020. Precipitation Nowcasting with Orographic Enhanced Stacked Generalization: Improving Deep Learning Predictions on Extreme Events. Atmosphere 11, 3 (March 2020), 267. https://doi.org/10.3390/atmos11030267
[6] Urs Germann and Isztar Zawadzki. 2002. Scale-Dependence of the Predictability of Precipitation from Continental Radar Images. Part I: Description of the Methodology. Monthly Weather Review (2002).
[7] Urs Germann and Isztar Zawadzki. 2004. Scale-Dependence of the Predictability of Precipitation from Continental Radar Images. Part II: Probability Forecasts. Journal of Applied Meteorology 43, 1 (2004), 74-89.
[8] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Networks. arXiv:1406.2661 [cs, stat] (June 2014). http://arxiv.org/abs/1406.2661
[9] Robin J. Hogan, Christopher A. T. Ferro, Ian T. Jolliffe, and David B. Stephenson. 2010. Equitability Revisited: Why the "Equitable Threat Score" Is Not Equitable. Weather and Forecasting 25, 2 (April 2010), 710-726. https://doi.org/10.1175/2009WAF2222350.1
[10] Tomas Jakab, Ankush Gupta, Hakan Bilen, and Andrea Vedaldi. 2020. Self-Supervised Learning of Interpretable Keypoints From Unlabelled Videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 8787-8797. https://openaccess.thecvf.com/content_CVPR_2020/html/Jakab_Self-Supervised_Learning_of_Interpretable_Keypoints_From_Unlabelled_Videos_CVPR_2020_paper.html
[11] John S. Kain, Ming Xue, Michael C. Coniglio, Steven J. Weiss, Fanyou Kong, Tara L. Jensen, Barbara G. Brown, Jidong Gao, Keith Brewster, Kevin W. Thomas, et al. 2010. Assessing Advances in the Assimilation of Radar Data and Other Mesoscale Observations within a Collaborative Forecasting-Research Environment. Weather and Forecasting 25, 5 (2010), 1510-1521.
[12] Y. Kwon and M. Park. 2019. Predicting Future Frames Using Retrospective Cycle GAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 1811-1820. https://doi.org/10.1109/CVPR.2019.00191
[13] Valliappa Lakshmanan, Roy Rabin, and Victor Debrunner. 2003. Multiscale Storm Identification and Forecast. Atmospheric Research 67 (2003), 367-380. https://doi.org/10.1016/S0169-8095(03)00068-1
[14] Valliappa Lakshmanan and Travis Smith. 2010. An Objective Method of Evaluating and Devising Storm-Tracking Algorithms. Weather and Forecasting 25 (2010), 701-709. https://doi.org/10.1175/2009WAF2222330.1
[15] Wei Li, Kai Liu, Lizhe Zhang, and Fei Cheng. 2020. Object Detection Based on an Adaptive Attention Mechanism. Scientific Reports 10, 1 (July 2020), 11307. https://doi.org/10.1038/s41598-020-67529-x
[16] Jeong-Seon Lim, Marcella Astrid, Hyun-Jin Yoon, and Seung-Ik Lee. 2019. Small Object Detection using Context and Attention. arXiv:1912.06319 [cs] (Dec. 2019). http://arxiv.org/abs/1912.06319
[17] Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, and Wang-chun Woo. 2015. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. Advances in Neural Information Processing Systems 28 (2015), 802-810. https://papers.nips.cc/paper/2015/hash/07563a3fe3bbe7e3ba84431ad9d055af-Abstract.html
[18] Xingjian Shi, Zhihan Gao, Leonard Lausen, Hao Wang, Dit-Yan Yeung, Wai-kin Wong, and Wang-chun Woo. 2017. Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model. Advances in Neural Information Processing Systems 30 (2017), 5617-5627. https://papers.nips.cc/paper/2017/hash/a6db4ed04f1621a119799fd3d7545d3d-Abstract.html
[19] Juanzhen Sun, Ming Xue, James W. Wilson, Isztar Zawadzki, Sue P. Ballard, Jeanette Onvlee-Hooimeyer, Paul Joe, Dale M. Barker, Ping-Wah Li, Brian Golding, et al. 2014. Use of NWP for Nowcasting Convective Precipitation: Recent Progress and Challenges. Bulletin of the American Meteorological Society 95, 3 (2014), 409-426.
[20] Quang-Khai Tran and Sa-kwang Song. 2019. Computer Vision in Precipitation Nowcasting: Applying Image Quality Assessment Metrics for Training Deep Neural Networks. Atmosphere 10, 5 (2019), 244.
[21] Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba. 2016. Generating Videos with Scene Dynamics. arXiv:1609.02612 [cs] (Oct. 2016). http://arxiv.org/abs/1609.02612
[22] Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon. 2018. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV).
[23] Fan Zhang, Licheng Jiao, Lingling Li, Fang Liu, and Xu Liu. 2020. MultiResolution Attention Extractor for Small Object Detection. arXiv:2006.05941 [cs] (June 2020). http://arxiv.org/abs/2006.05941
Model                        H0    H1    H2
GRU+WMAE(Th=0)               0.49  0.90  1.08
GRU+WMAE+Adv (just radar)    0.71  0.94  1.02
Blended                      2.66  4.92  5.70

Table 4: Performance on WMAE(Th=0.5).
A SUPPLEMENTAL

A.1 Utility of the Threshold in WMAE
The threshold is used in [18], where it is articulated in their definition of the mask. To establish the utility of using a threshold, we trained the model "GRU+WMAE" with WMAE($th = 0$) used for optimization. In this situation, as can be seen from Equation 2, the model is also penalized for its mis-predictions on non-rainy pixels. We see significantly worse performance on WMAE($th = 0.5$), our metric for evaluating heavy rainfall conditions, in Table 4 (first row).

A.2 Utility of Radar and Rain Data
We trained "GRU+WMAE+Adv" in two configurations where we varied the input data: in one, the model was trained using just rain data; in the other, using just radar data. We found that the model trained with just radar data has considerably worse performance on the WMAE metric, as can be seen in Table 4. We observed that the model trained with just rain data suffers from visual artefacts (very high intensity rain predictions in very small isolated regions) for one configuration of $w_{Adv}$ (Figure 11).

A.3 Blended Model
To demonstrate that it is relatively easy to get good performance at low thresholds, we created a blended version of the model. We first create a classification model. It has the same encoder-decoder structure as our Prediction module, with a sigmoid as the final activation, and is trained using the binary cross-entropy loss. For creating pixel-wise 0-1 labels, we use a threshold of 0.5 mm/hr: a pixel has label 1 if its hourly rain rate is greater than the threshold and 0 otherwise. The classifier generates 3 probability maps, one for each hour. For every consecutive hour, due to the increase in uncertainty, the probability maps have consecutively lower values, so we rescale them and clip them to the [0, 1] region. If $p$ is the rescaled probability map, and $\hat{Y}_{Atn}$ and $\hat{Y}_{last}$ are the predictions of our attention-based model and the last 10-minute rain map respectively, the final prediction is $\hat{Y} = p \cdot \hat{Y}_{Atn} + (1 - p) \cdot \hat{Y}_{last}$. Note that we did not prefer the blending in the first place: besides being a patchy solution, it is also inferior on the WMAE metric, as shown in Table 4 (last row).

Figure 11: Artefacts observed in predicted rain maps for the GRU+WMAE+Adv model trained on just rain data.
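A minimal sketch of the blending step (the rescaling factor `scale` is hypothetical; the text only states that the later-hour probability maps are rescaled and clipped to [0, 1]):

```python
import numpy as np

def blend(p_rain: np.ndarray, y_atn: np.ndarray, y_last: np.ndarray,
          scale: float = 1.5) -> np.ndarray:
    """Blended prediction: a pixel-wise weighted average of the
    attention-model prediction and the last observed 10-minute rain map,
    weighted by the (rescaled, clipped) rain/no-rain classifier output."""
    p = np.clip(p_rain * scale, 0.0, 1.0)   # rescale and clip to [0, 1]
    return p * y_atn + (1.0 - p) * y_last
```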
Model                    | CSI                                             | HSS
                         | 1     3     5     10    15    20    30    40   | 1     3     5     10    15    20    30    40
Last 10min               | 0.27  0.25  0.23  0.19  0.17  0.16  0.14  0.11 | 0.41  0.36  0.33  0.26  0.23  0.21  0.18  0.14
Last 20min               | 0.26  0.23  0.21  0.16  0.14  0.13  0.11  0.08 | 0.39  0.32  0.29  0.22  0.19  0.17  0.14  0.11
QPESUMS                  | 0.19  0.16  0.13  0.12  0.11  0.1   0.08  0.07 | 0.28  0.22  0.18  0.16  0.14  0.12  0.10  0.08
TrajGRU + WMSE+WMAE      | 0.00  0.05  0.11  0.14  0.14  0.13  0.11  0.07 | 0.04  0.07  0.15  0.20  0.19  0.18  0.15  0.10
TrajGRU + WMAE           | 0.01  0.14  0.19  0.19  0.17  0.16  0.13  0.10 | 0.04  0.19  0.28  0.27  0.25  0.22  0.19  0.14
GRU + WMAE               | 0.01  0.17  0.19  0.19  0.17  0.16  0.14  0.10 | 0.05  0.25  0.28  0.27  0.25  0.23  0.19  0.14
GRU + WMAE+Bal           | 0.15  0.19  0.19  0.18  0.17  0.15  0.13  0.09 | 0.22  0.28  0.28  0.27  0.24  0.22  0.18  0.13
GRU + WMAE+Adv           | 0.22  0.21  0.20  0.18  0.17  0.15  0.12  0.09 | 0.33  0.31  0.28  0.27  0.24  0.21  0.17  0.13
GRU + WMAE+Atn           | 0.01  0.11  0.2   0.2   0.19  0.18  0.15  0.11 | 0.04  0.16  0.29  0.3   0.27  0.25  0.21  0.15
GRU + WMAE+Adv+Atn       | 0.22  0.22  0.21  0.20  0.19  0.17  0.15  0.11 | 0.32  0.32  0.31  0.30  0.27  0.25  0.21  0.15

Table 5: HSS and CSI scores for 2018 on the first-hour prediction. Column headers are rain-rate thresholds (mm/hr).
Model                    | CSI                                       | HSS
                         | 1     5     10    15    20    30          | 1     5     10    15    20    30
TrajGRU + WMSE+WMAE      | 0     0.014 0.003 0.003 0.005 0.007       | 0     0.018 0.006 0.005 0.007 0.01
TrajGRU + WMAE           | 0.004 0.007 0.006 0.007 0.008 0.009       | 0.004 0.011 0.009 0.011 0.012 0.014
GRU + WMAE               | 0.003 0.002 0.005 0.006 0.007 0.009       | 0.003 0.003 0.007 0.009 0.011 0.013
GRU + WMAE+Bal           | 0.009 0.003 0.003 0.003 0.002 0.003       | 0.014 0.005 0.004 0.005 0.004 0.004
GRU + WMAE+Adv           | 0.012 0.004 0.006 0.007 0.007 0.007       | 0.019 0.007 0.008 0.011 0.01  0.01
GRU + WMAE+Atn           | 0.003 0.003 0.005 0.006 0.006 0.01        | 0.003 0.006 0.007 0.009 0.008 0.014
GRU + WMAE+Adv+Atn       | 0.008 0.003 0.004 0.007 0.008 0.007       | 0.013 0.006 0.006 0.01  0.011 0.01

Table 6: Standard error of the HSS and CSI scores for 2018 on the first-hour prediction. Column headers are rain-rate thresholds (mm/hr).
Model                    | CSI                                             | HSS
                         | 1     3     5     10    15    20    30    40   | 1     3     5     10    15    20    30    40
Last 20min               | 0.35  0.33  0.31  0.27  0.23  0.2   0.15  0.12 | 0.6   0.53  0.49  0.4   0.32  0.26  0.19  0.15
Last 10min               | 0.36  0.34  0.33  0.29  0.25  0.22  0.17  0.14 | 0.62  0.57  0.52  0.43  0.36  0.3   0.22  0.18
QPESUMS                  | 0.31  0.3   0.28  0.24  0.2   0.17  0.12  0.09 | 0.55  0.48  0.44  0.35  0.27  0.22  0.15  0.11
TrajGRU + WMSE+WMAE      | 0.00  0.16  0.27  0.29  0.28  0.25  0.18  0.12 | 0.18  0.28  0.42  0.45  0.41  0.35  0.25  0.17
TrajGRU + WMAE           | 0.05  0.31  0.36  0.35  0.31  0.28  0.21  0.16 | 0.21  0.51  0.59  0.56  0.48  0.41  0.3   0.22
GRU + WMAE               | 0.06  0.33  0.36  0.34  0.31  0.28  0.21  0.15 | 0.23  0.56  0.59  0.55  0.48  0.41  0.3   0.22
GRU + WMAE+Bal           | 0.25  0.34  0.36  0.34  0.31  0.28  0.21  0.15 | 0.44  0.57  0.59  0.55  0.47  0.41  0.29  0.21
GRU + WMAE+Adv           | 0.34  0.36  0.36  0.34  0.31  0.27  0.2   0.14 | 0.59  0.6   0.6   0.54  0.47  0.4   0.28  0.2
GRU + WMAE+Atn           | 0.04  0.31  0.36  0.35  0.32  0.29  0.22  0.16 | 0.21  0.51  0.59  0.56  0.49  0.43  0.32  0.23
GRU + WMAE+Adv+Atn       | 0.32  0.36  0.36  0.35  0.32  0.29  0.23  0.16 | 0.56  0.6   0.61  0.57  0.5   0.43  0.32  0.23

Table 7: HSS and CSI scores for the highest-rainfall month of 2018 (August) on the first-hour prediction. Column headers are rain-rate thresholds (mm/hr).