Classification and understanding of cloud structures via satellite images with EfficientUNet
Tashin Ahmed, Dhaka, [email protected]
Noor Hossain Nuri Sabab, Dhaka, [email protected]
Abstract—Climate change has been a common interest and at the forefront of crucial political discussion and decision-making for many years. Shallow clouds play a significant role in understanding the Earth's climate, but they are challenging to interpret and represent in a climate model. Classifying these cloud structures offers a better chance of understanding their physical structure, which would improve climate model generation and result in better prediction of climate change or weather forecasting. Clouds organise in many forms, which makes it challenging to build traditional rule-based algorithms to separate cloud features. In this paper, classification of cloud organization patterns was performed using a new scaled-up version of Convolutional Neural Network (CNN) named EfficientNet as the encoder and UNet as the decoder, where they worked as feature extractor and reconstructor of a fine-grained feature map and were used as a classifier, which will help experts understand how clouds will shape the future climate. By using a segmentation model in a classification task, it was shown that with a good encoder alongside UNet, it is possible to obtain good performance on this dataset. The Dice coefficient was used as the final evaluation metric, giving scores of 66.26% and 66.02% on the public and private leaderboards of the Kaggle competition, respectively.
Index Terms—EfficientNet, UNet, clouds, Dice coefficient
I. INTRODUCTION
Clouds play an important role in climate by controlling the amount of solar energy that reaches the surface of the Earth and the amount of the Earth's energy that is radiated back into space. The more energy that is trapped inside the planet, the warmer the atmosphere becomes, raising sea levels through the melting of polar ice caps and contributing to global warming. The less energy that is trapped, the colder the temperature becomes. Understanding the structure of clouds gives better insight into the planet's weather; hence it is crucial to climatologists [1].
Albedo is a measure of how much energy is reflected without being absorbed [2]. White surfaces reflect the most energy and hence have a high albedo, while dark surfaces absorb the most energy, indicating a low albedo. Earth's albedo is 0.3, which indicates a warming climate [3]. Interpreting cloud structures provides useful insight into the deterioration of Earth's climate and the risks associated with it. Satellite images of clouds give a broader picture of the atmosphere, and interpreting them provides information on the current state of the planet. It is assumed that as the overall temperature of Earth increases, more water will evaporate from the oceans, resulting in more clouds with different structures and variations [4].
Fig. 1. Masked images of four different class samples.

The general effect of clouds on climate change depends on which cloud types change, and whether they become more or less abundant, thicker or thinner, and higher or lower in altitude. There are several types of cloud structures.
Cirrus clouds are the most abundant of all top-level clouds. Cirrus means a "curl of hair" [5]. These feathery clouds are composed of ice and consist of long, thin streamers that are also known as mare's tails. A few scattered cirrus clouds are a good sign of fair weather. However, a gradually increasing cover of web-like cirrus clouds is a sign of a more humid air mass and an approaching storm.
Cirrostratus clouds look like thin scattered lines that spread themselves across the sky. When these icy fragments cover the sky, they give the air a subtle, white appearance. These clouds can indicate the approach of rain. They are translucent, so the sun and moon can be readily seen through them. Cirrostratus clouds are usually visible 12 to 24 hours before a period of rain or snow [6].
Cirrocumulus is another type of high cloud, which tends to appear as broad groupings of white streaks that are sometimes seemingly neatly aligned. During the summertime in the tropics, they can indicate an incoming hurricane [7]. There are many more forms of clouds which determine the weather and provide an indication of natural disasters, which can be handled in a low-risk manner if correctly detected beforehand. Detecting cloud formations, structures and patterns beforehand allows many high-risk activities to be avoided. Aircraft flights are a high-risk activity carrying thousands of passengers at a given interval of time, and flying through clouds is similar to driving a car through thick fog: it is difficult to see what is ahead, making it a challenging maneuver to accomplish. As a consequence, learning about cloud formations and their potential dangers when flying is a vital part of pilot training in some countries [8]. Cloud structures like cumulonimbus are a direct threat to aircraft [9]. Cloud-borne updrafts and downdrafts result in rapid and unpredictable changes to the lift force on aircraft wings. These changes cause the plane to lurch and jump about during flight, known as turbulence, which sometimes makes less experienced pilots lose control of the craft, often with fatal results and high casualties. Cargo ships, on the other hand, rely heavily on the weather at sea, which is often unpredictable and ever-changing. Cargoes usually run on a tight schedule, and delays cause losses of hundreds of thousands of dollars in fuel consumption. Storms also delay shipping, which contributes to losses of millions of dollars. An early prediction of storms or changing weather saves many lives and millions of dollars. Interpretation of satellite images of cloud structures requires the expertise of a well-trained meteorologist, which is not always feasible or readily available.
An intelligent automated system to interpret satellite images therefore becomes a promising alternative with ease of access, and hence is quite desirable for the understanding of cloud structures. In this paper, a model has been developed to classify clouds into four categories using satellite images, with a classification architecture, EfficientNet, and a segmentation architecture, UNet [10] [11].

II. DATASET
The dataset consists of satellite images, courtesy of NASA Worldview, gathered from the Kaggle competition "Understanding Clouds from Satellite Images". The images contain clouds of four classes, namely Fish, Flower, Sugar and Gravel. The images were taken from three regions, spanning 21° longitude and 14° latitude. The true-color images were taken from two polar-orbiting satellites, Terra and Aqua. An image might be composed from two orbits, due to the small footprint of the Moderate Resolution Imaging Spectroradiometer (MODIS) on board these satellites. The remaining area, which has not been covered by two succeeding orbits, is marked black, as shown in Figure 1. The dataset was split into train and validation sets at a ratio of 80:20. The training sample was perfectly balanced with 22184 images, where each class has 5546 images. All images are of the same size. 68 scientists labelled the images in the train set, and 3 individuals labelled each image. Several augmentation techniques were applied to the images; augmentation is a process used to artificially expand the size of a training dataset by creating modified versions of its images, which improves the performance and generalisation ability of the model. The Albumentations library has been used for augmentation [12]. This library efficiently implements an abundant variety of image transform operations that are optimized for performance. The images were augmented in four ways: horizontal flip, vertical flip, random rotation of up to 20° and grid distortion. Some adjustments were also made to the image size to feed the EfficientUNet architecture mentioned in Section V.
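The paper uses the Albumentations library for these transforms; as a minimal illustration of what the two flip augmentations do to an image array (the function names here are our own, not from the paper or the library), the flips reduce to simple axis reversals:

```python
import numpy as np

def horizontal_flip(img: np.ndarray) -> np.ndarray:
    """Mirror an H x W (x C) image left-to-right."""
    return img[:, ::-1].copy()

def vertical_flip(img: np.ndarray) -> np.ndarray:
    """Mirror an H x W (x C) image top-to-bottom."""
    return img[::-1].copy()

# Tiny 2x3 "image" to show the effect.
img = np.array([[1, 2, 3],
                [4, 5, 6]])
print(horizontal_flip(img))  # [[3, 2, 1], [6, 5, 4]]
print(vertical_flip(img))    # [[4, 5, 6], [1, 2, 3]]
```

Rotation and grid distortion are less trivial and are best left to the library itself.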
III. MATERIALS
A. Public and Private Leaderboard Score
Since the dataset was gathered from a Kaggle competition, there were two sets of scores for competing with others. Public LB (leaderboard) scores are shown while the competition is ongoing and reflect the outcome on a subset of the test dataset. Private LB scores are generated after the competition is over and provide scores on the remaining test dataset. As a result, public LB scores are usually better than private ones. For this particular competition, the private score was calculated on 75% of the test data. A run-length pixel encoding technique was used for competition submissions, since the image sizes were too large for the Kaggle system [13]. As a result, instead of submitting an exhaustive list of indices for segmentation, pairs of values were submitted, containing the start position and the run length of the image pixels. For example, a pair value of (1, 3) indicates that the run starts at pixel 1 and extends for 3 pixels. The competition also required a space-delimited list of pairs. The predicted encodings were scaled by 0.25 per side, which scaled down the images in both the train and test sets, allowing reasonable submission evaluation times.
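The run-length encoding just described can be sketched as follows. This is an illustrative implementation, not the authors' code; the column-major, 1-indexed pixel ordering is an assumption based on common Kaggle segmentation conventions:

```python
import numpy as np

def rle_encode(mask: np.ndarray) -> str:
    """Encode a binary mask as space-delimited (start, run-length) pairs.
    Pixels are numbered from 1 in column-major order (an assumption here,
    matching the usual Kaggle submission format)."""
    pixels = mask.flatten(order="F")
    # Pad with zeros so every run has a detectable start and end.
    padded = np.concatenate([[0], pixels, [0]])
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    starts, ends = changes[::2], changes[1::2]
    return " ".join(f"{s} {e - s}" for s, e in zip(starts, ends))

mask = np.array([[1, 0],
                 [1, 1]])
# Column-major order reads the mask as [1, 1, 0, 1]:
# one run starting at pixel 1 of length 2, one at pixel 4 of length 1.
print(rle_encode(mask))  # "1 2 4 1"
```

Compared with listing every foreground pixel, this keeps submissions compact when masks contain long contiguous runs.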
B. Dice coefficient (DSC)

In this paper, the evaluation metric used is the Dice coefficient, which measures the quality of the model, as instructed by the Kaggle competition. It was used to compare the pixel-wise agreement between a predicted segmentation and the corresponding ground truth, using the following equation:

DSC = 2 · |X ∩ Y| / (|X| + |Y|)    (1)

Here X is the predicted set of pixels and Y is the ground truth. The Dice coefficient is defined to be 1 when both X and Y are empty. The LB score is the mean of the Dice coefficients for each (Image, Label) pair in the test set. The Dice coefficient differs slightly from the more popular evaluation metric, model accuracy. It is used to quantify the performance of image segmentation methods: some ground truth regions are annotated in the images, and an automated algorithm then produces its own segmentation. The algorithm is validated by calculating the Dice score, a measure of how similar the objects are, calculated as the overlap of the two segmentations divided by the total size of the two objects [14].
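Equation (1) translates directly into a few lines of numpy (an illustrative sketch; the function name is ours):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient between two binary masks, per Equation (1).
    Defined as 1.0 when both masks are empty."""
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0
    return 2.0 * intersection / total

pred  = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [1, 0]])
# |X ∩ Y| = 1 and |X| + |Y| = 4, so DSC = 2 * 1 / 4 = 0.5
print(dice_coefficient(pred, truth))  # 0.5
```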
Fig. 2. Architecture of EfficientUNet with the EfficientNet-B0 framework for semantic segmentation. The blocks of EfficientNet-B0 used as the encoder are shown in Figure 3.
C. Optimizer
Rectified Adam (RAdam) was used instead of Adam as the optimizer, for higher accuracy in fewer epochs [15].

Fig. 3. Architecture of EfficientNet-B0 with MBConv as the basic building block.
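RAdam's advantage over Adam comes from a variance-rectification term that effectively warms up the adaptive learning rate. A sketch of that term, reconstructed from the RAdam paper [15] (our function name; the threshold rho_t > 4 follows the paper):

```python
import math

def radam_rectification(step: int, beta2: float = 0.999):
    """Variance-rectification term r_t from the RAdam paper [15].
    Returns None during the implicit warmup (rho_t <= 4), where RAdam
    falls back to an un-adapted, momentum-style update."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * step * beta2**step / (1.0 - beta2**step)
    if rho_t <= 4.0:
        return None
    return math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )

# Early steps are rectified away; r_t then grows toward 1 (plain Adam).
print(radam_rectification(1))      # None: adaptive rate still untrustworthy
print(radam_rectification(10000))  # very close to 1.0
```

This built-in warmup is one plausible reason RAdam reached good accuracy in fewer epochs here.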
D. Loss Function
Categorical Cross-Entropy (CCE) has been applied as the loss function, as this is a multi-label classification task. It is designed to quantify the difference between two probability distributions.

Loss = − Σ_{i=1}^{output size} y_i · log(ŷ_i)

Here ŷ_i is the i-th scalar value in the model output, y_i denotes the corresponding target value, and output size gives the number of scalar values in the model output. This function measures the difference between two discrete probability distributions. y_i specifies the probability that event i occurred, and the sum of all y_i is 1, indicating that precisely one event occurred. The negative sign ensures that the loss gets smaller as the distributions get closer to each other. The softmax activation function is recommended with categorical cross-entropy; it rescales the model output to ensure it has the right properties, producing positive outputs so that the logarithm of every output value ŷ_i exists. The main appeal of this loss function is comparing two probability distributions. For the classification part, the loss score was calculated from the CCE, and for the segmentation part it was calculated as a weighted sum of CCE and DICE. DSC is a measure of overlap between corresponding pixel values of the prediction and the ground truth. The range of DSC is between 0 and 1, as is understandable from Subsection III-B, and the larger the better. Dice Loss (DICE) therefore tries to maximize the overlap between the two sets mentioned above (prediction and ground-truth pixel values) [16].
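The two loss terms and their weighted combination can be sketched in numpy. This is illustrative only: the function names are ours, and since the exact CCE/Dice weights are not given in the source, 0.5/0.5 is a placeholder:

```python
import numpy as np

def categorical_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """CCE from the equation above; y_pred is assumed to be a softmax output."""
    return float(-np.sum(y_true * np.log(y_pred)))

def dice_loss(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-7) -> float:
    """1 - soft Dice, so minimising it maximises the overlap [16]."""
    intersection = np.sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

def combined_loss(y_true, y_pred, w_cce=0.5, w_dice=0.5):
    """Weighted CCE/Dice sum for the segmentation head. The weights used
    in the paper are not stated, so 0.5/0.5 here is an assumption."""
    return w_cce * categorical_cross_entropy(y_true, y_pred) + w_dice * dice_loss(y_true, y_pred)

y_true = np.array([0.0, 1.0, 0.0, 0.0])  # one-hot target
y_pred = np.array([0.1, 0.7, 0.1, 0.1])  # softmax output
print(categorical_cross_entropy(y_true, y_pred))  # -log(0.7), about 0.357
```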
IV. DESCRIPTION OF APPLIED ARCHITECTURES
A. EfficientNet
EfficientNet, presented by Google AI research, is considered a family of CNN models which, with subtle improvements, works better than its predecessors [10]. It consists of 8 models from B0 to B7, where each subsequent model number refers to a variant with more parameters and higher accuracy. EfficientNet works in three ways:
• Depthwise + pointwise convolution: depthwise convolution operates independently over each channel of the input; this is a spatial convolution. Pointwise convolution projects the channels output by the depthwise convolution onto a new channel space; this is a 1×1 convolution.
• Inverted residuals: each block consists of a layer that squeezes the channels and a layer that expands the channels. In this way, skip connections are linked to the rich channel layers [17].
• Linear bottleneck: each block uses linear activation in its last layer to prevent loss of information caused by ReLU [18].
Among the 8 models of EfficientNet, the first 6, namely B0 to B5, were explored in this paper. Due to the rise in complexity, the remaining models were ignored, as they produced underwhelming results with poor performance while absorbing precious runtime. The layers in each of the models can be created using the 5 standard modules shown in Figure 4.
• Module 1 acts as the starting block for the sub-blocks.
• Module 2 acts as the initializing point for the first sub-block of all the 7 main blocks except the 1st block.
• Module 3 is used as a skip connection block for all the sub-blocks.
• Module 4 combines the skip connections that occurred in the first sub-blocks.
• Module 5 combines each sub-block that is connected to its previous sub-block in a skip connection.
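The depthwise + pointwise factorization described above can be sketched with plain numpy loops (illustrative only, with 'valid' padding and stride 1; real MBConv blocks add padding, strides, batch normalisation and activations):

```python
import numpy as np

def depthwise_pointwise(x, dw_kernels, pw_weights):
    """Depthwise-separable convolution, the core of the MBConv block.
    x: (H, W, C); dw_kernels: (k, k, C), one spatial filter per channel;
    pw_weights: (C, C_out), the 1x1 pointwise projection."""
    H, W, C = x.shape
    k = dw_kernels.shape[0]
    Ho, Wo = H - k + 1, W - k + 1
    # Depthwise: each channel is convolved independently (spatial mixing only).
    dw = np.zeros((Ho, Wo, C))
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                dw[i, j, c] = np.sum(x[i:i + k, j:j + k, c] * dw_kernels[:, :, c])
    # Pointwise: a 1x1 convolution mixes channels at every spatial position.
    return dw @ pw_weights

x = np.ones((4, 4, 2))
dw = np.ones((3, 3, 2))  # each 3x3 filter sums a 3x3 patch -> 9 per channel
pw = np.ones((2, 1))     # the pointwise step sums the 2 channels -> 18
out = depthwise_pointwise(x, dw, pw)
print(out.shape)     # (2, 2, 1)
print(out[0, 0, 0])  # 18.0
```

Splitting spatial and channel mixing this way is what keeps EfficientNet's parameter count low relative to ordinary convolutions.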
Fig. 4. Common modules used to implement the layers of all 8 models of EfficientNet.
The individual modules are further combined in various orders to create sub-blocks, as shown in Figure 5. It is easy to observe the difference among the models: a gradual increase in the number of sub-blocks. The main building block of EfficientNet is the MBConv layer, an inverted residual block originally applied in MobileNetV2 [19]. The basic building blocks of EfficientNet-B0 in terms of MBConv layers are shown in Figure 3. The 8 models of EfficientNet (B0 - B7) share the common blocks, with subtle complexities in their architectures. EfficientNet is a scaled-up neural network architecture, where the models scale all dimensions with a compound coefficient, a newly proposed method known as compound scaling [20]. Here scaling up is defined as a systematic, principled scaling of three factors: depth, width and resolution.
• Width scaling adds more feature maps to each layer.
• Depth scaling adds more layers to the network.
• Resolution scaling increases the resolution of input images.
Fig. 5. Sub-blocks built from the individual modules presented in Figure 4.
Every architecture is similar to its earlier versions; the only difference is in the feature maps, which increase the number of parameters. Each model has the same architecture as the previous one, except for the multiplied block (x2) that expands and covers more blocks. This gives a lot of parameters to use in computation, making it a very robust model. It is not difficult to observe the changes across the models, which gradually increase the number of sub-blocks [10]. Starting from EfficientNet-B0, the compound scaling method was used to scale up in two steps:
• Step 1: the compound coefficient was fixed to 1, assuming twice as many resources to be available, and a small grid search was performed for the network's depth, width and resolution constants.
• Step 2: the constants were then fixed and the baseline network was scaled up with different compound coefficients to obtain the successive versions from B1 to B7.
EfficientNet was also prioritized in this paper due to the limitations of the Kaggle notebook. This was also the core reason why the remaining EfficientNet models were avoided: they require a substantial number of parameters, which takes a lot of processing power and time while producing a disappointing outcome.
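The two steps above can be made concrete. In the EfficientNet paper [10], the Step-1 grid search on B0 fixed the constants alpha = 1.2 (depth), beta = 1.1 (width) and gamma = 1.15 (resolution) under the constraint alpha · beta² · gamma² ≈ 2; Step 2 then raises each to the compound coefficient phi. A small arithmetic sketch:

```python
# Constants from the Step-1 grid search in the EfficientNet paper [10].
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def scale_factors(phi: float):
    """Multipliers applied to B0's depth, width and input resolution."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

for phi in range(4):
    d, w, r = scale_factors(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
# Total FLOPs grow roughly by (ALPHA * BETA**2 * GAMMA**2) ** phi, i.e. ~2 ** phi.
```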
B. U-Net
U-Net is based on the fully convolutional network; its architecture was modified and extended to work with fewer training images and to produce more precise segmentations. The idea is to complement a contracting path with successive layers in which pooling operations are replaced by upsampling operators, increasing the resolution of the output. A successive convolutional layer can then learn to assemble a precise output based on this information [11]. U-Net has a large number of feature channels in the upsampling part, which allows the network to propagate context information to higher-resolution layers. As a result, the expansive path is more or less symmetric to the contracting part, producing a u-shaped architecture [21]. The network consists of a contracting path and an expansive path, which gives it the u-shaped architecture. The contracting path is a typical convolutional network consisting of repeated convolutions, each followed by a rectified linear unit (ReLU) and a max pooling operation. During the contraction, spatial information is reduced while feature information is increased. The expansive pathway combines the feature and spatial information through a sequence of up-convolutions and concatenations with high-resolution features from the contracting path [11].

V. EFFICIENTUNET
In conventional UNet, expansion path is nearly symmetricto the contracting path. In this work, EfficientNet was used asan encoder in contracting path instead of conventional set ofconvolution layers. The decoder module is similar to the orig-inal UNet. Details of the proposed architecture are illustratedin
Figure 2 . The original input image size is × thenresized the images to × for further processing. Thenumber of levels, resolution and number of channels of eachfeature map is also shown in the Figure 2 . First, the featuremap of last logit of the encoder was upsampled bi-linearly bya factor of two and then concatenated with the feature mapfrom encoder having same spatial resolution. It was followedby 33 convolution layers before again upsampling by the factorof two. The process was repeated till the segmentation mapof size equal to input image was reconstructed. The proposedarchitecture is asymmetric unlike the original UNet. Here, thecontracting path is deeper than the expansion path. Inclusion ofpowerful CNN like EfficientNet as encoder improved overallperformance of the algorithm [22].VI. M
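One decoder step described above — upsample by two, then concatenate the same-resolution encoder feature map — can be sketched shape-wise in numpy. This is illustrative only: nearest-neighbour upsampling stands in for the bilinear upsampling the paper uses, the channel counts are made up, and the 3×3 convolutions that follow are omitted:

```python
import numpy as np

def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map
    (the paper upsamples bilinearly; nearest-neighbour keeps this short)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def decoder_step(deep: np.ndarray, skip: np.ndarray) -> np.ndarray:
    """One EfficientUNet decoder step: upsample the deeper feature map by 2,
    then concatenate the encoder feature map of the same spatial size."""
    up = upsample2x(deep)
    assert up.shape[:2] == skip.shape[:2], "spatial sizes must match"
    return np.concatenate([up, skip], axis=-1)  # 3x3 convs would follow here

deep = np.zeros((8, 8, 320))    # deepest encoder output (shapes illustrative)
skip = np.zeros((16, 16, 112))  # encoder feature map at the next level up
print(decoder_step(deep, skip).shape)  # (16, 16, 432)
```

Repeating this step level by level is what reconstructs a segmentation map at the input resolution.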
VI. METHODS, RESULTS AND DISCUSSION
Experimentation was undertaken by applying the six versions of the EfficientNet architecture mentioned previously. Mean Validation Accuracy (mVA) was measured for all EfficientNet architectures, for both the baseline and fine-tuned versions. The outcome of this part of the inspection is presented in Table I.

TABLE I
MEAN VALIDATION ACCURACY SCORES

EfficientNet Version   Baseline   Fine-Tuned
B0                     0.670      0.836
B1                     0.656      0.829
B2                     0.657      0.827
B3                     0.659      0.825
B4                     0.640      0.798
B5                     0.601      0.732
Cross-validation results, along with the public and private LB scores (DSC), are presented in Table II.

TABLE II
CROSS VALIDATION, PRIVATE AND PUBLIC DSC LB SCORES OF EFFICIENTNET ARCHITECTURES FOR CLASSIFICATION

EfficientNet Version   Cross Validation   Private LB   Public LB
B0                     0.6389             0.64595      0.65936
B1                     0.6423             0.64640      0.65849
B2                     0.6351             0.64551      0.65801
B3                     0.6311             0.64585      0.65820
B4                     0.6294             0.63911      0.65563
B5                     0.6255             0.64059      0.65325
The mean validation accuracy (mVA) score was best for the B0 model of the EfficientNet architecture, as shown in Table I, although the other versions (B1-B5) came close on both private and public leaderboard scores, as shown in Table II. The cross-validation scores were not stable, since the image segmentation was not reliable, with fluctuations in the outcome: only 63.89% for B0 while the other models had higher values. It was therefore important to use an efficient segmentation approach to reach a stable and accurate result, as shown in Table III. An improved segmentation of the images visibly stabilized the scores, showing a gradual trend, with model B0 giving the best response in both cross-validation and LB values: 66.54% in cross validation, 66.02% on the private and 66.26% on the public leaderboard. The other models were not far behind either, with scores decreasing in order; B1 was the second-best model with 66.01% on cross validation, 65.53% on the private and 65.98% on the public leaderboard.
TABLE III
CROSS VALIDATION, PRIVATE AND PUBLIC DSC LB SCORES OF EFFICIENTUNET FOR CLASSIFICATION

EfficientNet Version   Cross Validation   Private LB   Public LB
B0                     0.6654             0.6602       0.6626
B1                     0.6601             0.6553       0.6598
B2                     0.6589             0.6570       0.6578
B3                     0.6588             0.6500       0.6582
B4                     0.6425             0.6417       0.6421
B5                     0.6322             0.6319       0.6333
VII. CONCLUSION
This paper implements classification of satellite images of cloud structures into four different classes: Sugar, Gravel, Fish and Flower. Six versions of EfficientNet, from B0 to B5, were used as the encoder, with UNet applied as the decoder. The Dice coefficient was used as the evaluation metric, and the scores were compared with both the public and private LB scores of the Kaggle competition. By using a segmentation model like UNet in a classification problem, it was shown that with a good encoder it is possible to achieve good performance on this dataset. Although EfficientNet was used in this paper, it could be replaced with a different model; this was not tested in this research. It was also shown that a good segmentation of the images boosts the output of the classification drastically. In future, estimation of the distribution of classes and corresponding adjustment of the validation set could be implemented, although this would only be ideal for the competition. As the more complex EfficientNet architectures gave less appreciable results because of their default coefficients, altering these hyperparameters according to the dataset will hopefully improve the outcome. Exploring gradient-weighted class activation mapping, a class explainability technique, to generate a baseline could also be pursued.

ACKNOWLEDGMENT
We thank Kaggle for hosting such a great contest and for their kernels. We also thank the Max Planck Institute for Meteorology for providing the dataset. We also thank Pavel Yakubovskiy for sharing his GitHub repository, from which we accessed the pretrained ImageNet weights.
REFERENCES
[1] D. D. Turner, A. Vogelmann, R. T. Austin, J. C. Barnard, K. Cady-Pereira, J. C. Chiu, S. A. Clough, C. Flynn, M. M. Khaiyer, J. Liljegren, et al., "Thin liquid water clouds: Their importance and our challenge," Bulletin of the American Meteorological Society, vol. 88, no. 2, pp. 177-190, 2007.
[2] S. Twomey et al., "Pollution and the planetary albedo," Atmos. Environ., vol. 8, no. 12, pp. 1251-1256, 1974.
[3] E. Pallé, P. Goode, P. Montanes-Rodriguez, and S. Koonin, "Changes in earth's reflectance over the past two decades," Science, vol. 304, no. 5675, pp. 1299-1301, 2004.
[4] J. Leconte, F. Forget, B. Charnay, R. Wordsworth, and A. Pottier, "Increased insolation threshold for runaway greenhouse processes on earth-like planets," Nature, vol. 504, no. 7479, pp. 268-271, 2013.
[5] D. R. Dowling and L. F. Radke, "A summary of the physical properties of cirrus clouds," Journal of Applied Meteorology, vol. 29, no. 9, pp. 970-978, 1990.
[6] M. Gadsden and W. Schröder, "Noctilucent clouds," in Noctilucent Clouds, pp. 1-12, Springer, 1989.
[7] G. S. McLean, "Cloud distributions in the vicinity of jet streams," Bulletin of the American Meteorological Society, vol. 38, no. 10, pp. 579-583, 1957.
[8] Q. Zhang, J. Quan, X. Tie, M. Huang, and X. Ma, "Impact of aerosol particles on cloud formation: Aircraft measurements in china," Atmospheric Environment, vol. 45, no. 3, pp. 665-672, 2011.
[9] J. Mason, W. Strapp, and P. Chow, "The ice particle threat to engines in flight," p. 206, 2006.
[10] M. Tan and Q. V. Le, "Efficientnet: Rethinking model scaling for convolutional neural networks," arXiv preprint arXiv:1905.11946, 2019.
[11] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241, Springer, 2015.
[12] A. Buslaev, V. I. Iglovikov, E. Khvedchenya, A. Parinov, M. Druzhinin, and A. A. Kalinin, "Albumentations: fast and flexible image augmentations," Information, vol. 11, no. 2, p. 125, 2020.
[13] N. D. Richards, "Method of encoding image pixel values for storage as compressed digital data and method of decoding the compressed digital data," Dec. 14, 1993. US Patent 5,270,812.
[14] L. R. Dice, "Measures of the amount of ecologic association between species," Ecology, vol. 26, no. 3, pp. 297-302, 1945.
[15] L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and J. Han, "On the variance of the adaptive learning rate and beyond," arXiv preprint arXiv:1908.03265, 2019.
[16] C. H. Sudre, W. Li, T. Vercauteren, S. Ourselin, and M. J. Cardoso, "Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 240-248, Springer, 2017.
[17] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
[18] A. F. Agarap, "Deep learning using rectified linear units (relu)," arXiv preprint arXiv:1803.08375, 2018.
[19] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "Mobilenetv2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520, 2018.
[20] J. Lee, T. Won, and K. Hong, "Compounding the performance improvements of assembled techniques in a convolutional neural network," arXiv preprint arXiv:2001.06268, 2020.
[21] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3440, 2015.
[22] B. Baheti, S. Innani, S. Gajre, and S. Talbar, "Eff-unet: A novel architecture for semantic segmentation in unstructured environment," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 358-359, 2020.
APPENDIX
PR Curves
The Precision-Recall (PR) curve demonstrates the relationship between precision (positive predictive value) and recall (sensitivity) of a machine learning model. Precision gives the percentage of relevant outcomes among predictions, while recall indicates the fraction of relevant results retrieved. The PR curve is a vital evaluation metric as it provides a more informative picture of an algorithm's performance. The X-axis shows recall while the Y-axis shows precision. The precision-recall curve shows the trade-off between precision and recall at different thresholds. A high area under the curve indicates both high recall and high precision, where high precision relates to a low false-positive rate and high recall to a low false-negative rate. Since cloud structures are complex, it was essential to understand whether the implemented model was correctly detecting the shapes and evaluating true-positive and true-negative scores appropriately; hence the PR curve was a crucial evaluation metric for understanding the behaviour of the model. Precision-recall curves for all four classes are shown below. The highest precision (0.74) at the corresponding recall threshold was obtained for the class Sugar and the lowest (0.55) for the class Fish, while the maximum recall (0.49) was obtained for the class Flower and the minimum (0.22) for the class Fish.
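A PR curve of this kind is built by sweeping a score threshold over the predictions; a minimal numpy sketch (our function name, not from the paper's pipeline):

```python
import numpy as np

def precision_recall_curve(scores, labels):
    """Precision and recall at each successive score threshold.
    scores: predicted probabilities; labels: 0/1 ground truth."""
    order = np.argsort(-np.asarray(scores))  # highest score first
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)        # true positives accepted so far
    fp = np.cumsum(1 - labels)    # false positives accepted so far
    precision = tp / (tp + fp)
    recall = tp / labels.sum()
    return precision, recall

scores = [0.9, 0.8, 0.6, 0.3]
labels = [1, 0, 1, 0]
p, r = precision_recall_curve(scores, labels)
print(p)  # [1.0, 0.5, 0.667, 0.5]
print(r)  # [0.5, 0.5, 1.0, 1.0]
```

Plotting recall against precision from such arrays yields curves like those in Fig. 6.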
Fig. 6. PR-Curves for All Four Classes
PR-AUC; EfficientNet-B0
It takes nearly 25 epochs of training for the model to reach its best result, while the mean PR AUC still increases even after 30 epochs for both the train and validation sets, as shown in Figure 7.

Fig. 7. Training and Validation PR-AUC
Loss Graph; EfficientNet-B0
The train and validation loss graph of EfficientNet-B0 clearly shows that the EfficientNet architecture alone is not enough to train on this data, as the loss does not decrease.
Fig. 8. Training and Validation Loss (CCE) for EfficientNet-B0

Loss Graph; EfficientUNet-B0
Applying EfficientNet as an encoder within UNet was far more successful than using EfficientNet alone, since UNet was able to perform the segmentation much better than a regular classification model.