Classification and understanding of cloud structures via satellite images with EfficientUNet
Tashin Ahmed, Dhaka, [email protected]
Noor Hossain Nuri Sabab, Dhaka, [email protected]
Abstract—Climate change has been a common interest and at the forefront of crucial political discussion and decision-making for many years. Shallow clouds play a significant role in understanding the Earth's climate, but they are challenging to interpret and represent in a climate model. Classifying these cloud structures offers a better chance of understanding their physical structure, which would improve climate model generation and result in better prediction of climate change or weather forecasting. Clouds organise in many forms, which makes it challenging to build traditional rule-based algorithms to separate cloud features. In this paper, classification of cloud organization patterns was performed using a new scaled-up version of Convolutional Neural Network (CNN) named EfficientNet as the encoder and UNet as the decoder, where they worked as feature extractor and reconstructor of a fine-grained feature map and were used as a classifier, which will help experts understand how clouds will shape the future climate. By using a segmentation model in a classification task, it was shown that with a good encoder alongside UNet, it is possible to obtain good performance on this dataset. The Dice coefficient was used as the final evaluation metric, giving scores of 66.26% and 66.02% on the public and private leaderboards of the Kaggle competition, respectively.
Index Terms—EfficientNet, UNet, clouds, Dice coefficient
I. INTRODUCTION
Clouds play an important role in climate by controlling the amount of solar energy that reaches the surface of the Earth and the amount of the Earth's energy that is radiated back into space. The more energy that is trapped inside the planet, the warmer the atmosphere becomes, raising sea levels through the melting of polar ice caps and contributing to global warming. The less energy that is trapped, the colder the temperature becomes. Understanding the structure of clouds gives better insight into the planet's weather; hence it is crucial to climatologists [1].
Albedo is a measure of how much energy is reflected without being absorbed [2]. White surfaces reflect the most energy and hence have a high albedo, while dark surfaces absorb the most energy, indicating a low albedo. Earth's albedo is 0.3, which indicates a warming climate [3]. Interpreting cloud structures provides useful insight into the deterioration of Earth's climate and the risks associated with it. Satellite images of clouds give a broader picture of the atmosphere, and interpreting them provides information on the current state of the planet. It is assumed that as the overall temperature of Earth increases, more water will evaporate from the oceans, resulting in more clouds with different structures and variations [4].
Fig. 1. Masked images of four different class samples.

The general effect of clouds on climate change depends on which cloud types change, and whether they become more or less abundant, thicker or thinner, and higher or lower in altitude. There are several types of cloud structures.
Cirrus clouds are the most abundant of all top-level clouds. Cirrus means a "curl of hair" [5]. These feathery clouds are composed of ice and consist of long, thin streamers that are also known as mare's tails. A few scattered cirrus clouds are a good sign of fair weather. However, a gradually increasing cover of web-like cirrus clouds is a sign of a more humid air mass and an approaching storm.
Cirrostratus clouds look like thin scattered lines that spread themselves across the sky. When these icy fragments cover the sky, they give the air a subtle, white appearance. These clouds can indicate the approach of rain. They are translucent, so the sun and moon can be readily seen through them. Cirrostratus clouds are usually visible 12 to 24 hours before a period of rain or snow [6].
Cirrocumulus is another type of high cloud, which tends to appear as broad groupings of white streaks that are sometimes seemingly neatly aligned. During the summertime in the tropics, they can indicate an incoming hurricane [7]. There are many more forms of clouds which determine the weather and provide an indication of natural disasters, which can be handled in a low-risk manner if correctly detected beforehand. Detecting cloud formations, structures and patterns beforehand allows many high-risk activities to be avoided. Aircraft flights are a high-risk activity carrying thousands of passengers at a given interval of time, and flying through clouds is similar to driving a car through thick fog: it is difficult to see what is ahead, making it a challenging maneuver to accomplish. As a consequence, learning about cloud formations and their potential dangers when flying is a vital part of pilot training in some countries [8]. Cloud structures like cumulonimbus are a direct threat to aircraft [9]. Cloud-borne updrafts and downdrafts result in rapid and unpredictable changes to the lift force on aircraft wings. These changes cause the plane to lurch and jump about during flight, known as turbulence, which sometimes makes less experienced pilots lose control of the craft, often with fatal results and high casualties. Cargo ships, on the other hand, rely heavily on the weather at sea, which is often unpredictable and ever-changing. Cargoes usually run on a tight schedule, and delays cause losses of hundreds of thousands of dollars in fuel consumption. Storms also delay shipping, which contributes to losses of millions of dollars. An early prediction of storms or changing weather saves many lives and millions of dollars. Interpretation of satellite images of cloud structures requires the expertise of a well-trained meteorologist, which is not always feasible or readily available.
An intelligent automated system to interpret satellite images therefore becomes a promising alternative with ease of access, and hence is quite desirable for the understanding of cloud structures. In this paper, a model has been developed to classify clouds into four categories using satellite images, with a classification architecture, EfficientNet, and a segmentation architecture, UNet [10] [11].

II. DATASET
The dataset consists of satellite images, courtesy of NASA Worldview, gathered from the Kaggle competition "Understanding Clouds from Satellite Images". The images contain clouds of four classes, namely Fish, Flower, Sugar and Gravel. The images were taken from three regions, spanning 21° longitude and 14° latitude. The true-color images were taken from two polar-orbiting satellites, Terra and Aqua. An image might be composed from two orbits, due to the small footprint of the Moderate Resolution Imaging Spectroradiometer (MODIS) on board these satellites. The remaining area, which has not been covered by two succeeding orbits, is marked black, as shown in Figure 1. The dataset was split into train and validation sets at a ratio of 80:20. The training sample was perfectly balanced with 22184 images, where each class has 5546 images. All images are of the same size. 68 scientists labelled the images in the train set, and 3 individuals labelled each image. Several augmentation techniques were applied to the images; augmentation is a process used to artificially expand the size of a training dataset by creating modified versions of its images, which improves the performance and generalisation ability of the model. The Albumentations library has been used for augmentation [12]. This library efficiently implements an abundant variety of image transform operations that are optimized for performance. The images were augmented in four ways: horizontal flip, vertical flip, random rotation of up to 20° and grid distortion. Some adjustments were also made to the image size to feed the EfficientUNet architecture mentioned in Section V.
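The paper uses the Albumentations library for these transforms; as a minimal illustration of what the two flip augmentations do to an image array (the function names here are our own, not from the paper or the library), the flips reduce to simple axis reversals:

```python
import numpy as np

def horizontal_flip(img: np.ndarray) -> np.ndarray:
    """Mirror an H x W (x C) image left-to-right."""
    return img[:, ::-1].copy()

def vertical_flip(img: np.ndarray) -> np.ndarray:
    """Mirror an H x W (x C) image top-to-bottom."""
    return img[::-1].copy()

# Tiny 2x3 "image" to show the effect.
img = np.array([[1, 2, 3],
                [4, 5, 6]])
print(horizontal_flip(img))  # [[3, 2, 1], [6, 5, 4]]
print(vertical_flip(img))    # [[4, 5, 6], [1, 2, 3]]
```

Rotation and grid distortion are less trivial and are best left to the library itself.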
III. MATERIALS
A. Public and Private Leaderboard Score
Since the dataset was gathered from a Kaggle competition, there were two sets of scores for competing with others. Public LB (leaderboard) scores are shown while the competition is ongoing and reflect the outcome on a subset of the test dataset. Private LB scores are generated after the competition is over and provide scores on the remaining test dataset. As a result, public LB scores are usually better than private ones. For this particular competition, the private score was calculated on 75% of the test data. A run-length pixel encoding technique was used for competition submissions, since the image sizes were too large for the Kaggle system [13]. As a result, instead of submitting an exhaustive list of indices for segmentation, pairs of values were submitted, containing the start position and the run length of the image pixels. For example, a pair value of (1, 3) indicates that the run starts at pixel 1 and extends for 3 pixels. The competition also required a space-delimited list of pairs. The predicted encodings were scaled by 0.25 per side, which scaled down the images in both the train and test sets, allowing reasonable submission evaluation times.
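The run-length encoding just described can be sketched as follows. This is an illustrative implementation, not the authors' code; the column-major, 1-indexed pixel ordering is an assumption based on common Kaggle segmentation conventions:

```python
import numpy as np

def rle_encode(mask: np.ndarray) -> str:
    """Encode a binary mask as space-delimited (start, run-length) pairs.
    Pixels are numbered from 1 in column-major order (an assumption here,
    matching the usual Kaggle submission format)."""
    pixels = mask.flatten(order="F")
    # Pad with zeros so every run has a detectable start and end.
    padded = np.concatenate([[0], pixels, [0]])
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    starts, ends = changes[::2], changes[1::2]
    return " ".join(f"{s} {e - s}" for s, e in zip(starts, ends))

mask = np.array([[1, 0],
                 [1, 1]])
# Column-major order reads the mask as [1, 1, 0, 1]:
# one run starting at pixel 1 of length 2, one at pixel 4 of length 1.
print(rle_encode(mask))  # "1 2 4 1"
```

Compared with listing every foreground pixel, this keeps submissions compact when masks contain long contiguous runs.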
B. Dice coefficient (DSC)

In this paper, the evaluation metric used is the Dice coefficient, which measures the quality of the model, as instructed by the Kaggle competition. It was used to compare the pixel-wise agreement between a predicted segmentation and the corresponding ground truth, using the following equation:

DSC = 2 · |X ∩ Y| / (|X| + |Y|)    (1)

Here X is the predicted set of pixels and Y is the ground truth. The Dice coefficient is defined to be 1 when both X and Y are empty. The LB score is the mean of the Dice coefficients for each (Image, Label) pair in the test set. The Dice coefficient differs slightly from the more popular evaluation metric, model accuracy. It is used to quantify the performance of image segmentation methods: some ground truth regions are annotated in the images, and an automated algorithm then produces its own segmentation. The algorithm is validated by calculating the Dice score, a measure of how similar the objects are, calculated as the overlap of the two segmentations divided by the total size of the two objects [14].
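Equation (1) translates directly into a few lines of numpy (an illustrative sketch; the function name is ours):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient between two binary masks, per Equation (1).
    Defined as 1.0 when both masks are empty."""
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0
    return 2.0 * intersection / total

pred  = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [1, 0]])
# |X ∩ Y| = 1 and |X| + |Y| = 4, so DSC = 2 * 1 / 4 = 0.5
print(dice_coefficient(pred, truth))  # 0.5
```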
Fig. 2. Architecture of EfficientUNet with the EfficientNet-B0 framework for semantic segmentation. The blocks of EfficientNet-B0 used as the encoder are shown in Figure 3.
C. Optimizer
Rectified Adam (RAdam) was used instead of Adam as the optimizer, for higher accuracy in fewer epochs [15].

Fig. 3. Architecture of EfficientNet-B0 with MBConv as the basic building block.
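RAdam's advantage over Adam comes from a variance-rectification term that effectively warms up the adaptive learning rate. A sketch of that term, reconstructed from the RAdam paper [15] (our function name; the threshold rho_t > 4 follows the paper):

```python
import math

def radam_rectification(step: int, beta2: float = 0.999):
    """Variance-rectification term r_t from the RAdam paper [15].
    Returns None during the implicit warmup (rho_t <= 4), where RAdam
    falls back to an un-adapted, momentum-style update."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * step * beta2**step / (1.0 - beta2**step)
    if rho_t <= 4.0:
        return None
    return math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )

# Early steps are rectified away; r_t then grows toward 1 (plain Adam).
print(radam_rectification(1))      # None: adaptive rate still untrustworthy
print(radam_rectification(10000))  # very close to 1.0
```

This built-in warmup is one plausible reason RAdam reached good accuracy in fewer epochs here.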
D. Loss Function
Categorical Cross-Entropy (CCE) has been applied as the loss function, as this is a multi-label classification task. It is designed to quantify the difference between two probability distributions.

Loss = − Σ_{i=1}^{output size} y_i · log(ŷ_i)

Here ŷ_i is the i-th scalar value in the model output, y_i denotes the corresponding target value, and output size gives the number of scalar values in the model output. This function measures the difference between two discrete probability distributions. y_i specifies the probability that event i occurred, and the sum of all y_i is 1, indicating that precisely one event occurred. The negative sign ensures that the loss gets smaller as the distributions get closer to each other. The softmax activation function is recommended with categorical cross-entropy; it rescales the model output to ensure it has the right properties, producing positive outputs so that the logarithm of every output value ŷ_i exists. The main appeal of this loss function is comparing two probability distributions. For the classification part, the loss score was calculated from the CCE, and for the segmentation part it was calculated as a weighted sum of CCE and DICE. DSC is a measure of overlap between corresponding pixel values of the prediction and the ground truth. The range of DSC is between 0 and 1, as is understandable from Subsection III-B, and the larger the better. Dice Loss (DICE) therefore tries to maximize the overlap between the two sets mentioned above (prediction and ground-truth pixel values) [16].
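The two loss terms and their weighted combination can be sketched in numpy. This is illustrative only: the function names are ours, and since the exact CCE/Dice weights are not given in the source, 0.5/0.5 is a placeholder:

```python
import numpy as np

def categorical_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """CCE from the equation above; y_pred is assumed to be a softmax output."""
    return float(-np.sum(y_true * np.log(y_pred)))

def dice_loss(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-7) -> float:
    """1 - soft Dice, so minimising it maximises the overlap [16]."""
    intersection = np.sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

def combined_loss(y_true, y_pred, w_cce=0.5, w_dice=0.5):
    """Weighted CCE/Dice sum for the segmentation head. The weights used
    in the paper are not stated, so 0.5/0.5 here is an assumption."""
    return w_cce * categorical_cross_entropy(y_true, y_pred) + w_dice * dice_loss(y_true, y_pred)

y_true = np.array([0.0, 1.0, 0.0, 0.0])  # one-hot target
y_pred = np.array([0.1, 0.7, 0.1, 0.1])  # softmax output
print(categorical_cross_entropy(y_true, y_pred))  # -log(0.7), about 0.357
```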
IV. DESCRIPTION OF APPLIED ARCHITECTURES
A. EfficientNet
EfficientNet, presented by Google AI research, is considered a family of CNN models which, with subtle improvements, works better than its predecessors [10]. It consists of 8 models from B0 to B7, where each subsequent model number refers to a variant with more parameters and higher accuracy. EfficientNet works in three ways:
• Depthwise + pointwise convolution: depthwise convolution operates independently over each channel of the input; this is a spatial convolution. Pointwise convolution projects the channels output by the depthwise convolution onto a new channel space; this is a 1×1 convolution.
• Inverted residuals: each block consists of a layer that squeezes the channels and a layer that expands the channels. In this way, skip connections are linked to the rich channel layers [17].
• Linear bottleneck: each block uses linear activation in its last layer to prevent loss of information caused by ReLU [18].
Among the 8 models of EfficientNet, the first 6, namely B0 to B5, were explored in this paper. Due to the rise in complexity, the remaining models were ignored, as they produced underwhelming results with poor performance while absorbing precious runtime. The layers in each of the models can be created using the 5 standard modules shown in Figure 4.
• Module 1 acts as the starting block for the sub-blocks.
• Module 2 acts as the initializing point for the first sub-block of all the 7 main blocks except the 1st block.
• Module 3 is used as a skip connection block for all the sub-blocks.
• Module 4 combines the skip connections that occurred in the first sub-blocks.
• Module 5 combines each sub-block that is connected to its previous sub-block in a skip connection.
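The depthwise + pointwise factorization described above can be sketched with plain numpy loops (illustrative only, with 'valid' padding and stride 1; real MBConv blocks add padding, strides, batch normalisation and activations):

```python
import numpy as np

def depthwise_pointwise(x, dw_kernels, pw_weights):
    """Depthwise-separable convolution, the core of the MBConv block.
    x: (H, W, C); dw_kernels: (k, k, C), one spatial filter per channel;
    pw_weights: (C, C_out), the 1x1 pointwise projection."""
    H, W, C = x.shape
    k = dw_kernels.shape[0]
    Ho, Wo = H - k + 1, W - k + 1
    # Depthwise: each channel is convolved independently (spatial mixing only).
    dw = np.zeros((Ho, Wo, C))
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                dw[i, j, c] = np.sum(x[i:i + k, j:j + k, c] * dw_kernels[:, :, c])
    # Pointwise: a 1x1 convolution mixes channels at every spatial position.
    return dw @ pw_weights

x = np.ones((4, 4, 2))
dw = np.ones((3, 3, 2))  # each 3x3 filter sums a 3x3 patch -> 9 per channel
pw = np.ones((2, 1))     # the pointwise step sums the 2 channels -> 18
out = depthwise_pointwise(x, dw, pw)
print(out.shape)     # (2, 2, 1)
print(out[0, 0, 0])  # 18.0
```

Splitting spatial and channel mixing this way is what keeps EfficientNet's parameter count low relative to ordinary convolutions.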
Fig. 4. Common modules used to implement the layers of all 8 models of EfficientNet.
The individual modules are further combined in various orders to create sub-blocks, as shown in Figure 5. It is easy to observe the difference among the models: a gradual increase in the number of sub-blocks. The main building block of EfficientNet is the MBConv layer, an inverted residual block originally applied in MobileNetV2 [19]. The basic building blocks of EfficientNet-B0 in terms of MBConv layers are shown in Figure 3. The 8 models of EfficientNet (B0 - B7) share the common blocks, with subtle complexities in their architectures. EfficientNet is a scaled-up neural network architecture, where the models scale all dimensions with a compound coefficient, a newly proposed method known as compound scaling [20]. Here scaling up is defined as a systematic, principled scaling of three factors: depth, width and resolution.
• Width scaling adds more feature maps to each layer.
• Depth scaling adds more layers to the network.
• Resolution scaling increases the resolution of input images.
Fig. 5. Sub-blocks built from the individual modules presented in Figure 4.
Every architecture is similar to its earlier versions; the only difference is in the feature maps, which increase the number of parameters. Each model has the same architecture as the previous one, except for the multiplied block (x2) that expands and covers more blocks. This gives a lot of parameters to use in computation, making it a very robust model. It is not difficult to observe the changes across the models, which gradually increase the number of sub-blocks [10]. Starting from EfficientNet-B0, the compound scaling method was used to scale up in two steps:
• Step 1: the compound coefficient was fixed to 1, assuming twice as many resources to be available, and a small grid search was performed for the network's depth, width and resolution constants.
• Step 2: the constants were then fixed and the baseline network was scaled up with different compound coefficients to obtain the successive versions from B1 to B7.
EfficientNet was also prioritized in this paper due to the limitations of the Kaggle notebook. This was also the core reason why the remaining EfficientNet models were avoided: they require a substantial number of parameters, which takes a lot of processing power and time while producing a disappointing outcome.
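The two steps above can be made concrete. In the EfficientNet paper [10], the Step-1 grid search on B0 fixed the constants alpha = 1.2 (depth), beta = 1.1 (width) and gamma = 1.15 (resolution) under the constraint alpha · beta² · gamma² ≈ 2; Step 2 then raises each to the compound coefficient phi. A small arithmetic sketch:

```python
# Constants from the Step-1 grid search in the EfficientNet paper [10].
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def scale_factors(phi: float):
    """Multipliers applied to B0's depth, width and input resolution."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

for phi in range(4):
    d, w, r = scale_factors(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
# Total FLOPs grow roughly by (ALPHA * BETA**2 * GAMMA**2) ** phi, i.e. ~2 ** phi.
```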
B. U-Net
U-Net is based on the fully convolutional network; its architecture was modified and extended to work with fewer training images and to produce more precise segmentations. The idea is to complement a contracting path with successive layers in which pooling operations are replaced by upsampling operators, increasing the resolution of the output. A successive convolutional layer can then learn to assemble a precise output based on this information [11]. U-Net has a large number of feature channels in the upsampling part, which allows the network to propagate context information to higher-resolution layers. As a result, the expansive path is more or less symmetric to the contracting part, producing a u-shaped architecture [21]. The network consists of a contracting path and an expansive path, which gives it the u-shaped architecture. The contracting path is a typical convolutional network consisting of repeated convolutions, each followed by a rectified linear unit (ReLU) and a max pooling operation. During the contraction, spatial information is reduced while feature information is increased. The expansive pathway combines the feature and spatial information through a sequence of up-convolutions and concatenations with high-resolution features from the contracting path [11].

V. EFFICIENTUNET
In conventional UNet, expansion path is nearly symmetricto the contracting path. In this work, EfficientNet was used asan encoder in contracting path instead of conventional set ofconvolution layers. The decoder module is similar to the orig-inal UNet. Details of the proposed architecture are illustratedin
Figure 2 . The original input image size is × thenresized the images to × for further processing. Thenumber of levels, resolution and number of channels of eachfeature map is also shown in the Figure 2 . First, the featuremap of last logit of the encoder was upsampled bi-linearly bya factor of two and then concatenated with the feature mapfrom encoder having same spatial resolution. It was followedby 33 convolution layers before again upsampling by the factorof two. The process was repeated till the segmentation mapof size equal to input image was reconstructed. The proposedarchitecture is asymmetric unlike the original UNet. Here, thecontracting path is deeper than the expansion path. Inclusion ofpowerful CNN like EfficientNet as encoder improved overallperformance of the algorithm [22].VI. M
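One decoder step described above — upsample by two, then concatenate the same-resolution encoder feature map — can be sketched shape-wise in numpy. This is illustrative only: nearest-neighbour upsampling stands in for the bilinear upsampling the paper uses, the channel counts are made up, and the 3×3 convolutions that follow are omitted:

```python
import numpy as np

def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map
    (the paper upsamples bilinearly; nearest-neighbour keeps this short)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def decoder_step(deep: np.ndarray, skip: np.ndarray) -> np.ndarray:
    """One EfficientUNet decoder step: upsample the deeper feature map by 2,
    then concatenate the encoder feature map of the same spatial size."""
    up = upsample2x(deep)
    assert up.shape[:2] == skip.shape[:2], "spatial sizes must match"
    return np.concatenate([up, skip], axis=-1)  # 3x3 convs would follow here

deep = np.zeros((8, 8, 320))    # deepest encoder output (shapes illustrative)
skip = np.zeros((16, 16, 112))  # encoder feature map at the next level up
print(decoder_step(deep, skip).shape)  # (16, 16, 432)
```

Repeating this step level by level is what reconstructs a segmentation map at the input resolution.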
VI. METHODS, RESULTS AND DISCUSSION
Experimentation was undertaken by applying the six versions of the EfficientNet architecture mentioned previously. Mean Validation Accuracy (mVA) was measured for all EfficientNet architectures, for both the baseline and fine-tuned versions. The outcome of this part of the inspection is presented in Table I.

TABLE I
MEAN VALIDATION ACCURACY SCORES

EfficientNet Version   Baseline   Fine-Tuned
B0                     0.670      0.836
B1                     0.656      0.829
B2                     0.657      0.827
B3                     0.659      0.825
B4                     0.640      0.798
B5                     0.601      0.732
Cross-validation results, along with the public and private LB scores (DSC), are presented in Table II.

TABLE II
CROSS VALIDATION, PRIVATE AND PUBLIC DSC LB SCORES OF EFFICIENTNET ARCHITECTURES FOR CLASSIFICATION

EfficientNet Version   Cross Validation   Private LB   Public LB
B0                     0.6389             0.64595      0.65936
B1                     0.6423             0.64640      0.65849
B2                     0.6351             0.64551      0.65801
B3                     0.6311             0.64585      0.65820
B4                     0.6294             0.63911      0.65563
B5                     0.6255             0.64059      0.65325
The mean validation accuracy (mVA) score was best for the B0 model of the EfficientNet architecture, as shown in Table I, although the other versions (B1-B5) came close on both private and public leaderboard scores, as shown in Table II. The cross-validation scores were not stable, since the image segmentation was not reliable, with fluctuations in the outcome: only 63.89% for B0 while the other models had higher values. It was therefore important to use an efficient segmentation approach to reach a stable and accurate result, as shown in Table III. An improved segmentation of the images visibly stabilized the scores, showing a gradual trend, with model B0 giving the best response in both cross-validation and LB values: 66.54% in cross validation, 66.02% on the private and 66.26% on the public leaderboard. The other models were not far behind either, with scores decreasing in order; B1 was the second-best model with 66.01% on cross validation, 65.53% on the private and 65.98% on the public leaderboard.
TABLE III
CROSS VALIDATION, PRIVATE AND PUBLIC DSC LB SCORES OF EFFICIENTUNET FOR CLASSIFICATION

EfficientNet Version   Cross Validation   Private LB   Public LB
B0                     0.6654             0.6602       0.6626
B1                     0.6601             0.6553       0.6598
B2                     0.6589             0.6570       0.6578
B3                     0.6588             0.6500       0.6582
B4                     0.6425             0.6417       0.6421
B5                     0.6322             0.6319       0.6333
VII. CONCLUSION
This paper implements classification of satellite images of cloud structures into four different classes: Sugar, Gravel, Fish and Flower. Six versions of EfficientNet, from B0 to B5, were used as the encoder, with UNet applied as the decoder. The Dice coefficient was used as the evaluation metric, and the scores were compared with both the public and private LB scores of the Kaggle competition. By using a segmentation model like UNet in a classification problem, it was shown that with a good encoder it is possible to achieve good performance on this dataset. Although EfficientNet was used in this paper, it could be replaced with a different model; this was not tested in this research. It was also shown that a good segmentation of the images boosts the output of the classification drastically. In future, estimation of the distribution of classes and corresponding adjustment of the validation set could be implemented, although this would only be ideal for the competition. As the more complex EfficientNet architectures gave less appreciable results because of their default coefficients, altering these hyperparameters according to the dataset will hopefully improve the outcome. Exploring gradient-weighted class activation mapping, a class explainability technique, to generate a baseline could also be pursued.

ACKNOWLEDGMENT
We thank Kaggle for hosting such a great contest and for their kernels. We also thank the Max Planck Institute for Meteorology for providing the dataset. We also thank Pavel Yakubovskiy for sharing his GitHub repository, from which we accessed the pretrained ImageNet weights.
REFERENCES
[1] D. D. Turner, A. Vogelmann, R. T. Austin, J. C. Barnard, K. Cady-Pereira, J. C. Chiu, S. A. Clough, C. Flynn, M. M. Khaiyer, J. Liljegren, et al., "Thin liquid water clouds: Their importance and our challenge," Bulletin of the American Meteorological Society, vol. 88, no. 2, pp. 177-190, 2007.
[2] S. Twomey et al., "Pollution and the planetary albedo," Atmos. Environ., vol. 8, no. 12, pp. 1251-1256, 1974.
[3] E. Pallé, P. Goode, P. Montanes-Rodriguez, and S. Koonin, "Changes in earth's reflectance over the past two decades," Science, vol. 304, no. 5675, pp. 1299-1301, 2004.
[4] J. Leconte, F. Forget, B. Charnay, R. Wordsworth, and A. Pottier, "Increased insolation threshold for runaway greenhouse processes on earth-like planets," Nature, vol. 504, no. 7479, pp. 268-271, 2013.
[5] D. R. Dowling and L. F. Radke, "A summary of the physical properties of cirrus clouds," Journal of Applied Meteorology, vol. 29, no. 9, pp. 970-978, 1990.
[6] M. Gadsden and W. Schröder, "Noctilucent clouds," in Noctilucent Clouds, pp. 1-12, Springer, 1989.
[7] G. S. McLean, "Cloud distributions in the vicinity of jet streams," Bulletin of the American Meteorological Society, vol. 38, no. 10, pp. 579-583, 1957.
[8] Q. Zhang, J. Quan, X. Tie, M. Huang, and X. Ma, "Impact of aerosol particles on cloud formation: Aircraft measurements in china," Atmospheric Environment, vol. 45, no. 3, pp. 665-672, 2011.
[9] J. Mason, W. Strapp, and P. Chow, "The ice particle threat to engines in flight," p. 206, 2006.
[10] M. Tan and Q. V. Le, "Efficientnet: Rethinking model scaling for convolutional neural networks," arXiv preprint arXiv:1905.11946, 2019.
[11] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241, Springer, 2015.
[12] A. Buslaev, V. I. Iglovikov, E. Khvedchenya, A. Parinov, M. Druzhinin, and A. A. Kalinin, "Albumentations: fast and flexible image augmentations," Information, vol. 11, no. 2, p. 125, 2020.
[13] N. D. Richards, "Method of encoding image pixel values for storage as compressed digital data and method of decoding the compressed digital data," Dec. 14, 1993. US Patent 5,270,812.
[14] L. R. Dice, "Measures of the amount of ecologic association between species," Ecology, vol. 26, no. 3, pp. 297-302, 1945.
[15] L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and J. Han, "On the variance of the adaptive learning rate and beyond," arXiv preprint arXiv:1908.03265, 2019.
[16] C. H. Sudre, W. Li, T. Vercauteren, S. Ourselin, and M. J. Cardoso, "Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 240-248, Springer, 2017.
[17] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
[18] A. F. Agarap, "Deep learning using rectified linear units (relu)," arXiv preprint arXiv:1803.08375, 2018.
[19] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "Mobilenetv2: Inverted residuals and linear bottlenecks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520, 2018.
[20] J. Lee, T. Won, and K. Hong, "Compounding the performance improvements of assembled techniques in a convolutional neural network," arXiv preprint arXiv:2001.06268, 2020.
[21] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3440, 2015.
[22] B. Baheti, S. Innani, S. Gajre, and S. Talbar, "Eff-unet: A novel architecture for semantic segmentation in unstructured environment," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 358-359, 2020.
APPENDIX
PR Curves
The Precision-Recall (PR) curve demonstrates the relationship between precision (positive predictive value) and recall (sensitivity) of a machine learning model. Precision gives the percentage of relevant outcomes among predictions, while recall indicates the fraction of relevant results retrieved. The PR curve is a vital evaluation metric as it provides a more informative picture of an algorithm's performance. The X-axis shows recall while the Y-axis shows precision. The precision-recall curve shows the trade-off between precision and recall at different thresholds. A high area under the curve indicates both high recall and high precision, where high precision relates to a low false-positive rate and high recall to a low false-negative rate. Since cloud structures are complex, it was essential to understand whether the implemented model was correctly detecting the shapes and evaluating true-positive and true-negative scores appropriately; hence the PR curve was a crucial evaluation metric for understanding the behaviour of the model. Precision-recall curves for all four classes are shown below. The highest precision (0.74) at the corresponding recall threshold was obtained for the class Sugar and the lowest (0.55) for the class Fish, while the maximum recall (0.49) was obtained for the class Flower and the minimum (0.22) for the class Fish.
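A PR curve of this kind is built by sweeping a score threshold over the predictions; a minimal numpy sketch (our function name, not from the paper's pipeline):

```python
import numpy as np

def precision_recall_curve(scores, labels):
    """Precision and recall at each successive score threshold.
    scores: predicted probabilities; labels: 0/1 ground truth."""
    order = np.argsort(-np.asarray(scores))  # highest score first
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)        # true positives accepted so far
    fp = np.cumsum(1 - labels)    # false positives accepted so far
    precision = tp / (tp + fp)
    recall = tp / labels.sum()
    return precision, recall

scores = [0.9, 0.8, 0.6, 0.3]
labels = [1, 0, 1, 0]
p, r = precision_recall_curve(scores, labels)
print(p)  # [1.0, 0.5, 0.667, 0.5]
print(r)  # [0.5, 0.5, 1.0, 1.0]
```

Plotting recall against precision from such arrays yields curves like those in Fig. 6.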
Fig. 6. PR-Curves for All Four Classes
PR-AUC; EfficientNet-B0
It takes nearly 25 epochs of training for the model to reach its best result, while the mean PR AUC still increases even after 30 epochs for both the train and validation sets, as shown in Figure 7.

Fig. 7. Training and Validation PR-AUC
Loss Graph; EfficientNet-B0
The train and validation loss graph of EfficientNet-B0 clearly shows that the EfficientNet architecture alone is not enough to train on this data, as the loss does not decrease.
Fig. 8. Training and Validation Loss (CCE) for EfficientNet-B0

Loss Graph; EfficientUNet-B0
Applying EfficientNet as an encoder within UNet was far more successful than using EfficientNet alone, since UNet was able to perform the segmentation much better than a regular classification model.