Unsupervised Segmentation Algorithms for Infrared Cloud Images
Guillermo Terrén-Serrano
Department of Electrical and Computer Engineering, The University of New Mexico, Albuquerque, NM 87131, United States. [email protected]
Manel Martínez-Ramón
Department of Electrical and Computer Engineering, The University of New Mexico, Albuquerque, NM 87131, United States. [email protected]
March 4, 2021

ABSTRACT
The increasing number of Photovoltaic (PV) systems connected to the power grid makes it vulnerable to the projection of shadows from moving clouds. Global Solar Irradiance (GSI) forecasting allows smart grids to optimize the energy dispatch, preventing energy shortages caused by occlusion of the Sun. This investigation compares the performances of unsupervised learning algorithms (not requiring labelled images for training) for real-time segmentation of clouds in images acquired using a ground-based infrared sky-imaging system. Real-time segmentation is utilized to extract cloud features using only the pixels in which clouds are detected.

Keywords Cloud Segmentation · Machine Learning · Solar Forecasting · Sky Imaging · Unsupervised Algorithms
Passing clouds produce interruptions in the generation of energy from PV systems which are outside the grid operators' acceptable range [1]. The implementation of very short-term GSI forecasting is necessary to attenuate the effects of energy interruptions caused by clouds. Algorithms that do not include information extracted from sky images are not effective for very short-term forecasting [2]. The inclusion of cloudiness information from satellite images in a forecasting algorithm is useful when the horizon is from 15 minutes to an hour [3]. For these reasons, ground-based sky-imaging systems are the most suitable method for very short-term GSI forecasting [4].

When using visible-light sky-imaging systems, the pixels in the circumsolar area of the image are saturated. This is particularly problematic for very short-term GSI forecasting algorithms, because relevant forecasting information is suppressed in the saturated pixels. Blocking the Sun in the sky images is a solution, but information is removed in the part of the image that is blocked by the shade structure. The same problem occurs when a total sky imager is used. The saturated area in the images is reduced when using a thermal sky-imaging system [5]. Thermal imaging systems allow the derivation of the temperature and altitude of clouds [6]. In previous investigations, ground-based radiometric infrared sky-imaging systems [7] have been utilized to analyze the dynamics of clouds [8] and to establish visual links for Earth-space communications [9].

Previous research regarding cloud segmentation has shown that the accuracy of segmentation models increases when information extracted from neighboring pixels is included [10]. Operations using the dense Gram matrix in kernel learning methods are a problem for real-time cloud segmentation [11]. While the super-pixel approach reduces the computation time, it produces a coarse segmentation [12].
Convolutional neural networks may be used in cloud segmentation, but require a large amount of training samples to avoid overfitting [13].

The unsupervised learning algorithms implemented in this investigation are the Gaussian Mixture Model (GMM), k-means and Markov Random Fields (MRF). The GMM is a clustering algorithm that infers the distribution of the clusters using a dense covariance matrix. The training and testing time of the GMM may be improved by simplifying the covariance matrix. The k-means clustering algorithm uses an identity covariance matrix for simplification. MRF models are computationally expensive but suitable for segmentation problems, because information from the classification of neighboring pixels is included in the prior. The Iterated Conditional Modes (ICM) algorithm allows for training of MRF models in an unsupervised manner. With the aim of reducing the classification time, the Simulated Annealing (SA) algorithm is implemented to perform an intelligent optimization of the ICM-MRF.
This data was acquired by a sky-imaging system mounted on a solar tracker which maintains the Sun in the center of the images. The sky-imaging system is equipped with a Lepton 2.5 radiometric infrared camera that measures temperature in centi-kelvin units. The resolution of the camera is × pixels and the diagonal FOV is ◦. The thermal sky-imaging system is located at the UNM Central Campus. The weather parameters are measured by a weather station at UNM Hospital.

A pixel of the camera frame is defined by its pair of Euclidean coordinates i, j. The temperatures in an image are defined as T = {T_{i,j} ∈ R^+ | ∀i = 1, …, M, ∀j = 1, …, N}. The height of the pixels H = {H_{i,j} ∈ R^+ | ∀i = 1, …, M, ∀j = 1, …, N} is computed using the Moist Adiabatic Lapse Rate, assuming a linear decrease of the temperature in the troposphere. The images are processed to remove stationary artifacts such as water stains. The temperatures of the pixels after applying the processing algorithm are T′ = {T′_{i,j} ∈ R^+ | ∀i = 1, …, M, ∀j = 1, …, N}, and the heights are H′ = {H′_{i,j} ∈ R^+ | ∀i = 1, …, M, ∀j = 1, …, N}. After processing the images to remove stationary artifacts, they are processed again to remove the effect of the Sun and the atmospheric background radiation. The obtained values of the pixels are the difference of temperature with respect to the tropopause, defined as ∆T = {∆T_{i,j} ∈ R | ∀i = 1, …, M, ∀j = 1, …, N}. The temperature differences are multiplied by the average atmospheric background temperature to compute the heights H′′ = {H′′_{i,j} ∈ R^+ | ∀i = 1, …, M, ∀j = 1, …, N}. After applying both processing algorithms, the temperature differences are normalized to an 8-bit image I = {i_{i,j} ∈ N | ∀i = 1, …, M, ∀j = 1, …, N}.
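The lapse-rate height estimate and the 8-bit normalization can be sketched as follows. This is a minimal illustration, not the paper's calibrated pipeline: the lapse rate, the reference air temperature, and the min-max normalization are assumptions.

```python
import numpy as np

# Hypothetical constants: the paper's exact lapse rate and reference
# temperature are not given here, so these values are assumptions.
LAPSE_RATE = 6.5e-3      # K per meter, assumed constant in the troposphere
T_AIR_SURFACE = 293.15   # surface air temperature in K (assumption)

def pixel_heights(T):
    """Estimate per-pixel cloud height from its temperature, assuming a
    linear decrease of temperature with altitude (constant lapse rate)."""
    return (T_AIR_SURFACE - T) / LAPSE_RATE  # meters

def to_uint8(dT):
    """Min-max normalize temperature differences to an 8-bit image."""
    lo, hi = dT.min(), dT.max()
    scaled = (dT - lo) / (hi - lo) if hi > lo else np.zeros_like(dT)
    return np.rint(255 * scaled).astype(np.uint8)

T = np.array([[283.15, 273.15], [263.15, 253.15]])  # per-pixel temperatures, K
H = pixel_heights(T)
I = to_uint8(T_AIR_SURFACE - T)
```

A colder pixel maps to a larger height and a brighter 8-bit intensity under this convention.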
The velocity vectors were computed applying the Lucas-Kanade algorithm to two consecutive normalized images.

Four different combinations of features extracted from the images are used as input vectors in the segmentation models. The first vector is x_{i,j} = {T_{i,j}, H_{i,j}}. The second vector is x_{i,j} = {T′_{i,j}, H′_{i,j}}. The third vector is x_{i,j} = {∆T_{i,j}, H′′_{i,j}}. The fourth vector is x_{i,j} = {mag(v_{i,j}), i_{i,j}, ∆T_{i,j}}. Other combinations of feature vectors were explored and were found to under-perform in cloud classification.

Features extracted from neighboring pixels are included in the feature vectors of pixel i, j:
• Single pixel: {x_{i,j}}, ∀i, j = i_1, j_1, …, i_M, j_N.
• 1st order: {x_{i−1,j}, x_{i,j−1}, x_{i,j+1}, x_{i+1,j}}.
• 2nd order: {x_{i−1,j}, x_{i,j−1}, x_{i,j+1}, x_{i+1,j}, x_{i−1,j−1}, x_{i−1,j+1}, x_{i+1,j−1}, x_{i+1,j+1}}.

When single pixels are used, the vectors do not contain features from other pixels. The 1st order neighborhood contains features of the 4 closest pixels. The 2nd order neighborhood contains features of the 8 closest pixels.

The methods described below can be classified as generative when they have the capacity of generating new samples from a likelihood model, that is, when the model implements a density approximation of the form p(x | C_k), where C_k is the segmentation label of the pixel.

Feature distributions can be approximated by a mixture of multivariate normal distributions x_i ∼ N(µ_k, Σ_k). Under the hypothesis that a sample x_i belongs to class C_k, its class conditional likelihood is

f(x; µ_k, Σ_k) = [(2π)^d |Σ_k|]^{−1/2} exp{−(1/2)(x − µ_k)^⊤ Σ_k^{−1} (x − µ_k)}.   (1)
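The neighborhood feature vectors described above can be sketched as follows. The handling of border pixels (edge padding here) is an assumption, since the text does not specify it.

```python
import numpy as np

def neighborhood_features(X, order=1):
    """Concatenate each pixel's features with those of its 4 (1st order)
    or 8 (2nd order) closest pixels. X has shape (M, N, d); border pixels
    reuse their own edge values (an assumption)."""
    first = [(-1, 0), (0, -1), (0, 1), (1, 0)]            # 4-connected
    diag = [(-1, -1), (-1, 1), (1, -1), (1, 1)]           # diagonals
    offsets = first if order == 1 else first + diag
    P = np.pad(X, ((1, 1), (1, 1), (0, 0)), mode="edge")
    M, N = X.shape[:2]
    # Shifted views of the padded image pick out each neighbor's features.
    stacks = [X] + [P[1 + di:1 + di + M, 1 + dj:1 + dj + N]
                    for di, dj in offsets]
    return np.concatenate(stacks, axis=-1)

X = np.arange(6, dtype=float).reshape(2, 3, 1)  # toy 2x3 image, d = 1
F = neighborhood_features(X, order=1)           # shape (2, 3, 5)
```

Each output pixel carries its own features first, then its neighbors' features in a fixed order, so the single-pixel, 1st-order and 2nd-order cases yield d, 5d and 9d input dimensions respectively.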
The expected complete data log-likelihood is defined as [14]

Q(θ^t, θ^{t−1}) = Σ_{i=1}^{N} Σ_{k=1}^{K} γ_{i,k} log π_k + Σ_{i=1}^{N} Σ_{k=1}^{K} γ_{i,k} log p(x_i | θ^t),   (2)

where γ_{i,k} ≜ p(y_i = k | x_i, θ^{t−1}) is the responsibility of the cluster k in the sample i.

The parameters in the clustering of multivariate normal distributions can be directly computed applying the Expectation Maximization (EM) algorithm. In the E stage of the algorithm a prior is established and then, by using the likelihood function (1), a posterior γ_{i,k} = p(C_k | x_i) can be assigned to each sample. In the M stage, the mean and covariance of each cluster that maximize the log-likelihood are computed as

µ_k = Σ_{i=1}^{N} γ_{i,k} x_i / γ_k,   Σ_k = Σ_{i=1}^{N} γ_{i,k} x_i x_i^⊤ / γ_k − µ_k µ_k^⊤,   (3)

where γ_k = Σ_{i=1}^{N} γ_{i,k}. The priors are updated as well using the posterior probabilities,

π_k = p(C_k) = (1/N) Σ_{i=1}^{N} γ_{i,k},   (4)

where N is the number of available samples. A class is assigned to each sample by the Maximum A Posteriori (MAP) criterion ŷ_i = argmax_k p(C_k | x_i, µ_k, Σ_k). The theory behind mixture models, as well as the EM algorithm, is fully developed in [14].
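The EM recursion of Eqs. (2)-(4) can be sketched as follows; this is a minimal version, not the paper's implementation. The small ridge `reg` on the covariances is an assumption added for numerical stability.

```python
import numpy as np

def em_gmm(X, K=2, iters=50, seed=0, reg=1e-6):
    """Unsupervised EM for a Gaussian mixture. X has shape (N, d).
    Returns mixing weights pi, means mu, covariances Sigma, and the
    responsibilities gamma (Eqs. (2)-(4))."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, K, replace=False)]          # means start at data points
    Sigma = np.stack([np.cov(X.T).reshape(d, d) + reg * np.eye(d)] * K)
    for _ in range(iters):
        # E step: responsibilities gamma_{i,k} = p(C_k | x_i), in log space.
        log_r = np.empty((N, K))
        for k in range(K):
            diff = X - mu[k]
            inv = np.linalg.inv(Sigma[k])
            _, logdet = np.linalg.slogdet(Sigma[k])
            log_r[:, k] = np.log(pi[k]) - 0.5 * (
                d * np.log(2 * np.pi) + logdet + np.sum(diff @ inv * diff, axis=1))
        log_r -= log_r.max(axis=1, keepdims=True)
        gamma = np.exp(log_r)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M step: update pi_k, mu_k, Sigma_k as in Eqs. (3)-(4).
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
    return pi, mu, Sigma, gamma

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (40, 2)), rng.normal(5, 0.1, (40, 2))])
pi, mu, Sigma, gamma = em_gmm(X, K=2, iters=25)
```

Pixels would then be labeled by the MAP criterion, `gamma.argmax(axis=1)`.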
The k-means algorithm can be seen as a particularization of the algorithm above, where the posteriors γ_{i,k} are approximated by 1 if the distance ‖x_i − µ_k‖ < ‖x_i − µ_{k′}‖ for all k ≠ k′, and zero otherwise. The mean is computed as in Eq. (3), and the covariance is approximated by an identity matrix.

The energy function of a MRF is composed of two functions [15]: the function ϕ, which is the joint distribution of a class, and the function ψ, which is the potential energy of the system's configuration (a term from statistical mechanics),

E(y_i, x_i) = Σ_i ϕ(x_i, y_i) + Σ_{i,j} ψ(y_i, y_j),   (5)

where x_i is the feature vector of sample i and y_i is its class. In the graph G, a sample i has a set of neighboring pixels, and each neighboring sample j has class y_j. Sample x_i is classified using Bayes' theorem as

p(y_i = C_k | x_i, θ_k) ∝ p(x_i | y_i = C_k, θ_k) p(y_i = C_k),   (6)

where the corresponding likelihood is approximated by a normal distribution x_i ∼ N(µ_k, Σ_k) of class C_k, and θ_k = {µ_k, Σ_k} is the parameter set of the feature distribution in class C_k. The log-likelihood of class C_k is defined as ϕ(x_i, y_i) ≜ log p(x_i | y_i = C_k, θ_k) in the energy function (5). The prior can be expressed as

p(y_i) = (1/Z) exp(−ψ(y_i)).   (7)

By applying the Hammersley–Clifford theorem [16], the potential function ψ(y_i) in the exponential form can be factorized in cliques of a graph G. A clique is defined as a set of nodes that are all neighbors of each other [14]. In this way, the potential function can be independently evaluated for each clique in the factorized graph,

ψ(y_i) = Σ_{ℓ=1}^{L} Σ_{i,j ∈ Ω_ℓ} y_i β y_j,   (8)

where the set of maximal cliques in the graph is defined as Ω = Ω_1 ∪ Ω_2 ∪ … ∪ Ω_L, ℓ represents the order of the neighboring pixels to sample i in the graph network G, and Ω_L is the maximal clique, as it cannot be made any larger without losing the clique property [14]. The cliques considered in our problem are Ω_1 and Ω_2, which represent the 1st and 2nd order neighborhood cliques respectively. Parameter β needs to be cross-validated.
By applying expression (7) in the logarithm of (6), the energy function for a pixel i of class y_i and features x_i is

E(y_i = C_k | x_i, µ_k, Σ_k) = −(1/2) log |Σ_k| − (1/2)(x_i − µ_k)^⊤ Σ_k^{−1} (x_i − µ_k) + ψ(y_i),   (9)

plus constant terms. Finally, probability (6) can be written as

p(y_i = C_k | x_i, θ_k) = exp E(y_i = C_k | x_i, θ_k) / Σ_{k′=1}^{K} exp E(y_i = C_{k′} | x_i, θ_{k′}).   (10)

A class C_k is assigned to the sample x_i by the MAP criterion.

Parameters θ_k in a MRF can be inferred with the ICM algorithm [17]. The algorithm initially assigns a class to each pixel from a uniform distribution. The samples with label C_k are defined within the set S_k^{(0)}. At iteration t + 1 the mean and covariance of a class are computed as

µ_k^{t+1} = (1/|S_k^t|) Σ_{x_{i,j} ∈ S_k^t} x_{i,j},   (11)

Σ_k^{t+1} = (1/(|S_k^t| − 1)) Σ_{x_{i,j} ∈ S_k^t} (x_{i,j} − µ_k^{t+1})(x_{i,j} − µ_k^{t+1})^⊤.   (12)

A class is reassigned to each pixel according to the parameters computed at iteration t + 1 with the MAP criterion

y_{i,j}^{t+1} = argmax_k E(y_{i,j}^t | x_{i,j}, µ_k^{t+1}, Σ_k^{t+1}).   (13)

When the total energy stops increasing, so that Σ_{i,j} E(y_{i,j}^{t+1} | x_{i,j}, θ_k^{t+1}) ≤ Σ_{i,j} E(y_{i,j}^t | x_{i,j}, θ_k^t), the algorithm has converged to a stable configuration and the optimal set of parameters θ_k has been found. The distribution of class C_k is defined as N(µ_k^t, Σ_k^t).

The standard optimization goes through all the pixels, calculating their potential and classifying them in each iteration of the algorithm. The computational cost of this method is high, but we can assume that it is not necessary to evaluate the pixels whose state has high energy, because their classification will not change. The computation cost can be reduced by sampling the pixels that are likely to be misclassified, and applying the optimization procedure only to them. We propose to optimize the configuration of the pixels in an IR image applying the Simulated Annealing (SA) algorithm [18] to the MRF models [19].
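The ICM iteration of Eqs. (11)-(13) can be sketched as follows. This is a simplified sketch: the initial labeling is passed in by the caller (the text initializes it uniformly at random), the pairwise potential is reduced to a β-weighted count of agreeing 1st-order neighbors, and empty classes are not handled.

```python
import numpy as np

def icm(X, Y0, K=2, iters=10, beta=0.5):
    """ICM for an MRF segmentation. X has shape (M, N, d), Y0 is an
    initial label image in {0, ..., K-1}. Each iteration re-estimates the
    class means/covariances (Eqs. (11)-(12)) and reassigns every pixel by
    MAP over the energy of Eq. (9) (Eq. (13))."""
    Y = Y0.copy()
    M, N, d = X.shape
    first = [(-1, 0), (0, -1), (0, 1), (1, 0)]
    for _ in range(iters):
        mu, inv, logdet = [], [], []
        for k in range(K):
            Sk = X[Y == k].reshape(-1, d)                 # pixels in class k
            C = np.cov(Sk.T).reshape(d, d) + 1e-6 * np.eye(d)
            mu.append(Sk.mean(axis=0))
            inv.append(np.linalg.inv(C))
            logdet.append(np.linalg.slogdet(C)[1])
        E = np.empty((K, M, N))
        P = np.pad(Y, 1, mode="edge")
        for k in range(K):
            diff = X - mu[k]
            quad = np.einsum("mnd,de,mne->mn", diff, inv[k], diff)
            # Simplified clique term: count of 1st-order neighbors in class k.
            same = sum((P[1 + di:1 + di + M, 1 + dj:1 + dj + N] == k)
                       .astype(float) for di, dj in first)
            E[k] = -0.5 * logdet[k] - 0.5 * quad + beta * same
        Y = E.argmax(axis=0)                              # MAP reassignment
    return Y

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 0.1, (8, 4, 1)),
                    rng.normal(5, 0.1, (8, 4, 1))], axis=1)  # two regions
Y0 = (X[..., 0] > X[..., 0].mean()).astype(int)              # threshold init
Y = icm(X, Y0, K=2, iters=5)
```

With well-separated feature distributions the likelihood term dominates and the labeling stabilizes in a few iterations; β then only smooths the class boundary.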
The SA algorithm is applied in the implementation after the inference of the class distributions. The class distributions N(µ_k, Σ_k) were previously inferred applying a supervised or unsupervised learning algorithm. The optimization is initialized to the maximum likelihood classification of the pixels, y_{i,j}^{(0)} = argmax_k p(y_{i,j} = C_k | x_{i,j}, θ_k). The likelihood of a pixel belonging to a class C_k is only evaluated at the initialization of the algorithm.

The objective is to evaluate the potential function of the samples that have low energy. For that, a sample x_{i,j} with label y_{i,j} = C_k is randomly selected and its classification is changed in each iteration t, so that ȳ_{i,j}^t = 1 − y_{i,j}^t. The probability of selecting a sample x_{i,j} is weighted by its energy. The weights of the samples in an image are defined as

w_{i,j} = [E(ȳ_{i,j}^t | x_{i,j}^t, θ_k) − max_k E(ȳ_{i,j}^t | x_{i,j}^t, θ_k)] / Σ_{i,j} [E(ȳ_{i,j}^t | x_{i,j}^t, θ_k) − max_k E(ȳ_{i,j}^t | x_{i,j}^t, θ_k)],   (14)

and the cumulative distribution of the weights is computed as w̄_{n,m} = {{Σ_{i=1}^n Σ_{j=1}^m w_{i,j}}_{n=1}^N}_{m=1}^M. Then, a sample is drawn from a uniform distribution ŵ ∼ U(0, 1). The sample whose weight has the minimum distance to the drawn sample is selected, i, j = argmin |w̄_{i,j} − ŵ|.
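The weighted pixel selection described above can be sketched as follows, on a flattened weight array; the function name is illustrative.

```python
import numpy as np

def select_pixel(weights, u):
    """Select the pixel whose cumulative weight (the \bar{w} of Eq. (14))
    is closest to a draw u ~ U(0, 1). `weights` is a flat array of
    non-negative selection weights, one per pixel."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalized weights, Eq. (14)
    cdf = np.cumsum(w)                   # cumulative weights \bar{w}
    return int(np.argmin(np.abs(cdf - u)))
```

In use, `u` would be drawn each iteration, e.g. `u = rng.uniform(0.0, 1.0)`, so that pixels whose current label is energetically weak are selected more often.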
The algorithm follows with a Metropolis step, which is computed with the energy of the changed sample ȳ_{i,j} and the energy of the original label y_{i,j}, ∆E = E(y_{i,j}^t | x_{i,j}, θ_k) − E(ȳ_{i,j}^t | x_{i,j}, θ_k). The new label ȳ_{i,j}^t is directly accepted iff ∆E ≤ 0. Otherwise, it will be accepted with probability ρ = exp(−∆E/T^t), in an analogous way to thermodynamics with the Gibbs distribution,

y_{i,j}^{t+1} = ȳ_{i,j}^t if ∆E ≤ 0;  ȳ_{i,j}^t if ∆E > 0 and ρ > u;  y_{i,j}^t otherwise,   (15)

where the acceptance probability is compared with a draw from a uniform distribution u ∼ U(0, 1). We propose to cool down the acceptance rate through the temperature parameter, so that T^{t+1} = αT^t. The optimal parameter α is a trade-off between accuracy and speed.

Youden's J-statistic [20] is a test that evaluates the performance of a binary classification. It is defined as J = sensitivity + specificity − 1. To compensate for possible class imbalance, we define a prior of the classification function λ that is validated in each model,

p(D | C_k) = p(C_k | D) · p(C_k) / p(D) ∝ p(C_k | D) · λ.   (16)

The classification probabilities are defined as p(D | C_1) = p(C_1 | D) · λ and p(D | C_0) = 1 − p(D | C_1). The J-statistic score is maximized finding the optimal binary classification threshold λ. For that, the J-statistic is applied to the conventional Receiver Operating Characteristic (ROC) analysis [21], and it is computed at each point of the ROC. We propose to use the point of maximum J-statistic on the ROC curve as the optimal operating point, so that ŷ* = argmax_k p(C_k | x*, D) · λ.

The dataset is formed by 12 samples of infrared sky images and their respective labels, ordered chronologically. The samples belong to different days in each of the four seasons. The pixels were manually labelled as either clear sky (y_{i,j} = 0) or cloudy (y_{i,j} = 1). 7 of the images (earlier dates) are used for training and the remaining 5 (later dates) are used for testing.
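The search for the operating point of maximum Youden's J over the ROC can be sketched as a brute-force sweep over the unique classifier scores; the function name is illustrative, and the paper's validation of λ is analogous.

```python
import numpy as np

def youden_threshold(scores, labels):
    """Sweep the ROC operating points of a binary classifier and return
    the threshold maximizing Youden's J = sensitivity + specificity - 1."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    best_j, best_t = -1.0, None
    for t in np.unique(scores):          # each unique score is an ROC point
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fn = np.sum(~pred & (labels == 1))
        tn = np.sum(~pred & (labels == 0))
        fp = np.sum(pred & (labels == 0))
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_j, best_t = j, float(t)
    return best_t, best_j

t_opt, j_opt = youden_threshold([0.1, 0.2, 0.6, 0.8], [0, 0, 1, 1])
```

For a perfectly separable score distribution the sweep recovers J = 1 at the separating threshold; with overlap, it returns the threshold trading sensitivity against specificity.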
The training dataset (33,600 pixels) has 5 images featuring different types of clouds, 1 with clear sky and 1 with covered sky. The testing dataset (24,000 pixels) has 3 images featuring different types of clouds, 1 with clear sky and 1 with covered sky.

Leave-One-Out Cross-Validation (LOO-CV) is applied to validate the parameters of the models and the prior λ in Eq. (16). In each iteration of the LOO-CV routine an image is left out for validation and the remaining 6 images are used for training the model. The validation J-statistic is the average of the J-statistics obtained in the 7 LOO-CV routines. The set of parameters and λ that achieved the highest J-statistic were selected.

The set of parameters is different for each model: k-means clustering does not have parameters, but the GMM has a covariance matrix regularization term γ which needs cross-validation in Eq. (1). The ICM-MRF and SA-ICM-MRF require the cross-validation of the clique potential β in Eq. (8). The ICM algorithm is computationally expensive, so the regularization term of the covariance matrix was set to γ = 1. The cooling parameter of the SA-ICM-MRF was also set to α = 0.…. In the k-means clustering algorithm, the feature vectors were standardized, x̄_{i,j} = [x_{i,j} − E(X)] / Var(X). The rest of the models required neither normalization nor standardization of the feature vectors.

ICM-MRF reached the highest J-statistic of 92.55%, but its average testing time was the highest, at 641 ms. SA-ICM-MRF achieved a lower J-statistic of 90.06%, but the implementation of the SA algorithm reduced the average testing time to 214 ms. The GMM has the best compromise, with an average testing time of 4 ms and a J-statistic of 89.39%. The k-means algorithm reached the fastest average testing time of 1 ms, but had the lowest J-statistic of 80.73%. The preprocessing time of the feature vectors is 0.1 ms for x¹, 4.7 ms for x², 99.9 ms for x³ and 1079 ms for x⁴.
When the preprocessing time is considered, the GMM (1083 ms) is much slower than the rest of the algorithms. Taking the preprocessing time into account, the optimal models are the ICM-MRF (740.9 ms) or the SA-ICM-MRF (313.9 ms), depending on the time constraints of the user.
Figure 1: This figure shows the features extracted from three testing images. The images in the first row show the increments of temperature with respect to the height of the tropopause. The images in the second row show the heights of the clouds. The images in the third row show the normalized intensity of the pixels. The images in the fourth row show the magnitude of the velocity vectors.
Figure 2: This figure shows three images from the testing set in the columns. The rows are the segmentations performed by the models. The highest J-statistic was achieved by ICM-MRF in the first image and by GMM in the second and third images.
Table 1: This table shows the J-statistics and average testing times achieved by the different models, organized by neighborhood (single pixel, 1st order and 2nd order). ICM-MRF and SA-ICM-MRF models have two groups of feature vectors: those with a potential function of 1st order cliques Ω_1(·) or 2nd order cliques Ω_2(·).

This investigation aims to find an optimal unsupervised learning algorithm for real-time segmentation of clouds in thermal images acquired using an infrared sky-imaging system. The thermal images were preprocessed to extract the most informative features for segmentation. Preprocessing removes the scattering effect produced by debris on the outdoor germanium window of the camera, and the direct and scattered irradiance produced by the Sun and the atmosphere.

The performance of the classification models increases when the extracted features are preprocessed and information from neighboring pixels is included in the feature vectors. Further research could investigate the performance of supervised Bayesian methods for cloud segmentation, and compare the classification performance of generative and discriminative models.
Acknowledgment
This work was partially supported by NSF EPSCoR grant number OIA-1757207 and the King Felipe VI endowed Chair. The authors would like to thank the UNM Center for Advanced Research Computing (CARC) for providing the high performance computing and large-scale storage resources used in this work.
References

[1] Kari Lappalainen and Seppo Valkealahti. Output power variation of different PV array configurations during irradiance transitions caused by moving clouds. Applied Energy, 190:902–910, 2017.
[2] O. García-Hinde, G. Terrén-Serrano, M.Á. Hombrados-Herrera, V. Gómez-Verdejo, S. Jiménez-Fernández, C. Casanova-Mateo, J. Sanz-Justo, M. Martínez-Ramón, and S. Salcedo-Sanz. Evaluation of dimensionality reduction methods applied to numerical weather models for solar radiation forecasting. Engineering Applications of Artificial Intelligence, 69:157–167, 2018.
[3] Sobrina Sobri, Sam Koohi-Kamali, and Nasrudin Abd. Rahim. Solar photovoltaic generation forecasting methods: A review. Energy Conversion and Management, 156:459–497, 2018.
[4] Weicong Kong, Youwei Jia, Zhao Yang Dong, Ke Meng, and Songjian Chai. Hybrid approaches based on deep whole-sky-image learning to photovoltaic generation forecasting. Applied Energy, 280:115875, 2020.
[5] Andrea Mammoli, Guillermo Terren-Serrano, Anthony Menicucci, Thomas P. Caudell, and Manel Martínez-Ramón. An experimental method to merge far-field images from multiple longwave infrared sensors for short-term solar forecasting. Solar Energy, 187:254–260, 2019.
[6] H. Escrig, Francisco Batlles, Joaquín Alonso-Montesinos, F.M. Baena, Juan Bosch, I. Salbidegoitia, and Juan Burgaleta. Cloud detection, classification and motion estimation using geostationary satellite imagery for cloud cover forecast. Energy, 55, 2013.
[7] Joseph A. Shaw and Paul W. Nugent. Physics principles in radiometric infrared imaging of clouds in the atmosphere. European Journal of Physics, 34(6):S111–S121, 2013.
[8] B. Thurairajah and J. A. Shaw. Cloud statistics measured with the infrared cloud imager. IEEE Transactions on Geoscience and Remote Sensing, 43(9):2000–2007, 2005.
[9] Paul W. Nugent, Joseph A. Shaw, and Sabino Piazzolla. Infrared cloud imaging in support of earth-space optical communication. Optics Express, 17(10):7862–7872, 2009.
[10] Cunzhao Shi, Yu Wang, Chunheng Wang, and Baihua Xiao. Ground-based cloud detection using graph model built upon superpixels. IEEE Geoscience and Remote Sensing Letters, 14(5):719–723, 2017.
[11] Alireza Taravat, F. Del Frate, Cristina Cornaro, and Stefania Vergari. Neural networks and support vector machine algorithms for automatic cloud classification of whole-sky ground-based images. IEEE Geoscience and Remote Sensing Letters, 12, 2015.
[12] C. Deng, Z. Li, W. Wang, S. Wang, L. Tang, and A. C. Bovik. Cloud detection in satellite images based on natural scene statistics and gabor features. IEEE Geoscience and Remote Sensing Letters, 16(4):608–612, 2019.
[13] Johannes Drönner, Nikolaus Korfhage, Sebastian Egli, Markus Mühling, Boris Thies, Jörg Bendix, Bernd Freisleben, and Bernhard Seeger. Fast cloud segmentation using convolutional neural networks. Remote Sensing, 10:1782, 2018.
[14] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[15] Stan Z. Li. Markov Random Field Modeling in Image Analysis. Springer-Verlag, Berlin, Heidelberg, 2001.
[16] J. M. Hammersley and P. Clifford. Markov fields on finite graphs and lattices. Unpublished, 1971.
[17] Julian Besag. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society B, 48(3):259–302, 1986.
[18] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
[19] Zoltan Kato and Ting-Chuen Pong. A Markov random field image segmentation model using combined color and texture features. In Władysław Skarbek, editor, Computer Analysis of Images and Patterns, pages 547–554. Springer Berlin Heidelberg, 2001.
[20] W. J. Youden. Index for rating diagnostic tests. Cancer, 3(1):32–35, 1950.
[21] Tom Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861–874, 2006.