Leveraging Domain Knowledge using Machine Learning for Image Compression in Internet-of-Things
Prabuddha Chakraborty, Jonathan Cruz, and Swarup Bhunia
Department of Electrical & Computer Engineering, University of Florida, Gainesville, FL, USA
Abstract—The emergent ecosystems of intelligent edge devices in diverse Internet of Things (IoT) applications, from automatic surveillance to precision agriculture, increasingly rely on recording and processing a variety of image data. Due to resource constraints, e.g., energy and communication bandwidth requirements, these applications require compressing the recorded images before transmission. For these applications, image compression commonly requires: (1) maintaining features for coarse-grain pattern recognition instead of the high-level details for human perception, due to machine-to-machine communications; (2) a high compression ratio that leads to improved energy and transmission efficiency; (3) a large dynamic range of compression and an easy trade-off between compression factor and quality of reconstruction to accommodate a wide diversity of IoT applications as well as their time-varying energy/performance needs. To address these requirements, we propose MAGIC, a novel machine learning (ML) guided image compression framework that judiciously sacrifices visual quality to achieve much higher compression than traditional techniques, while maintaining accuracy for coarse-grained vision tasks. The central idea is to capture application-specific domain knowledge and efficiently utilize it to achieve high compression. We demonstrate that the MAGIC framework is configurable across a wide range of compression/quality and is capable of compressing beyond the standard quality factor limits of both JPEG 2000 and WebP. We perform experiments on representative IoT applications using two vision datasets and show up to 42.65x compression at similar accuracy with respect to the source. We highlight the low variance in compression rate across images using our technique as compared to JPEG 2000 and WebP.
Index Terms—Computer vision, edge intelligence, image compression, Internet-of-Things (IoT), machine learning, sensor signal processing.
I. Introduction
In the Internet of Things (IoT) era, humans have been increasingly removed from the surveillance loop in favor of a connected ecosystem of edge devices performing vision-based tasks [1]. Automatic analysis is the only viable option given the huge amount of data continuously collected from different IoT edge devices. For example, resource-constrained unmanned aerial vehicles (UAVs) or image sensors can be used as surveillance devices for detecting forest fires [2] or infrastructure damage after natural disasters [3]. In these scenarios, autonomous UAVs or edge devices collect data that may be sent to other edge devices or to the cloud for automated machine learning (ML) based analysis. According to the 2019 Embedded Markets Study [4], 43% of IoT applications incorporating advanced technologies are using embedded vision and 32% are using machine learning. However, using these IoT devices often requires meeting tight storage, energy, and/or communication bandwidth constraints, while maintaining the effectiveness of surveillance.

Image compression can address these needs in edge devices that operate in constrained environments and at the same time reduce network traffic [5]. Compressed images are easier to store and more energy efficient to transmit long-range. An ideal image compression technique for IoT applications should:
• Optimize for machine-to-machine communication and machine-based interpretation in diverse IoT applications, i.e., pattern recognition or feature extraction on the image. Visual perception by human users should be given less importance.
• Aim to minimize the communication bandwidth, as IoT is creating 1000X more dense networking requirements [6], [7], often driven by image/video communication.
• Gear towards minimizing the overall energy and space requirements on resource-constrained edge devices.
The standard image compression methods, such as JPEG [8], JPEG 2000 [9], and WebP [10], are tailored to maintain good human-perceivable visual quality and were not designed with IoT applications in mind. Properties of IoT applications that can be leveraged to obtain increased compression are as follows:
• The image domain is biased based on the application and on each specific edge image sensor device. The bias can be divided into two categories: (1) color distribution bias and (2) common pattern bias. We define patterns as segment outlines in an image. This information can be learned and utilized.
• Depending on the application, specific entities of the images may hold greater value with respect to the rest of the image. Such applications, therefore, have a region of interest bias which can be learned and utilized.
• Coarse-grained ML tasks prevalent in IoT applications can tolerate extreme levels of compression.
Building on these observations, we propose MAGIC, a Machine leArning Guided Image Compression framework for achieving extreme levels of image compression in IoT systems while maintaining sufficient accuracy for coarse-grained AI tasks.

Fig. 1: Overall flow of MAGIC framework.

MAGIC consists of three major steps: (1) knowledge acquisition, (2) encoding, and (3) decoding. During knowledge acquisition, different application and domain-specific information such as color distribution, common pattern bias, and region of interest bias can be extracted in the form of (1) a color quantization dictionary, (2) a common pattern dictionary, and (3) a machine learning model which can intelligently represent image segments as a set of common pattern dictionary entries. During the encoding stage, an image is segmented into non-overlapping triangles using an efficient Delaunay triangulation (DT) method.
The ML model, which we name the pattern prediction model, and the common pattern dictionary from the knowledge acquisition stage are used to guide the image segmentation process. Finally, the colors are assigned by averaging the pixel colors within each triangle and quantizing them based on the color quantization dictionary, which is constructed by analyzing the color distribution from the domain using k-means. The decode phase operates similarly, reconstructing the segments using DT and assigning colors from the color quantization dictionary.

We have implemented MAGIC as a completely configurable framework that can be used to compress images from a given dataset. We evaluate MAGIC extensively using two publicly available datasets, fire detection [11] and building crack detection [12], and observe promising performance. For the building crack detection dataset, at a 1.06% accuracy loss, we obtained 22.09x more compression with respect to the source images. For the fire detection dataset, at a 2.99% accuracy loss, we obtained 42.65x more compression with respect to the source images.

II. Background & Related Works
In this section, we give a brief introduction to vision tasks in IoT applications and discuss state-of-the-art compression techniques.
A. Computer Vision in IoT Applications
IoT applications are gaining popularity in several spheres such as industry, home, healthcare, retail, transport, and even security [13]. Many applications in these domains involve capturing images at the edge device and transmitting the image to the cloud or other edge devices for analysis. For example:
• UAV-based fire detection techniques have been proposed which use optical remote sensing [14].
• Detecting infrastructure damage in a post-disaster scenario using UAV imaging is being investigated in [15], [16].
• IoT image sensors and computer vision techniques are widely used for flood monitoring, warning, and damage mitigation [17].
Image sensors and intelligent data analysis are two key aspects of surveillance-based IoT applications. Additionally, security-oriented IoT surveillance applications actively rely on computer vision to detect anomalies [13].
B. Need for Image Compression in IoT Vision
Different IoT applications require sensing image data at the edge and transmitting it to other edge devices or the cloud for analysis. These edge devices operate with strict space, energy, and bandwidth requirements. Compressing images not only has the direct effect of reducing the space and network traffic requirements but can also reduce energy consumption.

Fig. 2: Pixel color distribution for forest fire and building crack detection datasets [11], [12]. Red, green, and blue lines represent the R, G, and B channels, respectively.

Fig. 3: DT guided segmentation for a sample building crack detection image [12].

IoT-based communication is expected to reach 50% of network traffic by 2025 [6]. For example, a typical 4G network is designed to support thousands of devices' worth of traffic in a region. However, with the increase in the number of IoT devices being connected to the network, it may become impossible to efficiently serve all devices simultaneously. Therefore, compression at the edge can help reduce network stress.

The energy required to transmit data increases with distance [18]. For long-range transmission devices such as the MaxStream XTend (at 500 mW transmit power), the energy required for one byte of transmission can be higher than 1 million clock cycles' worth of computation [18]. Hence, even with the cost of additional computation, compression can ultimately lead to less overall energy expenditure. For all these reasons, image compression is a vital step for any IoT vision application.
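This trade-off can be made concrete with a back-of-the-envelope calculation. The sketch below treats the one-byte-per-million-cycles equivalence quoted above as a rough constant; the function and the example numbers are illustrative assumptions, not measurements.

```python
# Back-of-the-envelope energy trade-off for compressing before transmitting.
# Hedged assumption: transmitting one byte costs roughly as much energy as
# ~1e6 clock cycles of computation (the XTend figure cited above).
CYCLES_PER_BYTE_TX = 1_000_000  # illustrative equivalence, not a measured constant

def compression_worthwhile(raw_bytes, compressed_bytes, compute_cycles):
    """True if the cycles spent compressing cost less energy than the
    transmission energy saved by sending fewer bytes."""
    bytes_saved = raw_bytes - compressed_bytes
    return compute_cycles < bytes_saved * CYCLES_PER_BYTE_TX

# Example: a 100 kB image compressed 40x, spending 2e9 cycles on encoding,
# still comes out far ahead on energy under this model.
print(compression_worthwhile(100_000, 2_500, 2_000_000_000))  # True
```

Under this model, even multi-second encoding on a slow MCU is justified whenever the transmitted payload shrinks by more than a few kilobytes.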
C. State-of-the-art Image Compression Techniques
Several image compression techniques proposed over the years can be divided primarily into two categories: (1) lossless compression techniques and (2) lossy compression techniques. Lossless compression techniques such as arithmetic coding [22], [23] and Huffman coding [24] aim to completely preserve the content under compression, but generally at the cost of significant mathematical computation [25]. For coarse-grained ML tasks, such quality is not needed; therefore, lossy compression techniques are preferred. As the name suggests, lossy compression allows for variable data loss to achieve higher rates of compression. The data lost is generally not perceivable by humans. Some lossy compression techniques include JPEG [8], JPEG 2000 [9], and WebP [10], which perform quantization in the frequency domain using techniques such as the discrete wavelet transform and the discrete cosine transform. Another class of lossy compression performs quantization in the spatial domain, such as the triangulation-based image compression most recently proposed in [21]. This compression technique relies on the DT of a set of points in the image matrix to construct the image out of non-overlapping triangles during both encoding and decoding. In this way, triangulation allows for sending minimal amounts of information at the cost of slightly more encoding/decoding time. While we use DT, our compression algorithm and compression goals are vastly different from those of [21].

Machine learning has been used to further improve compression [26], [27], [28], [29]. In all these works, the goal is to maximize visual quality metrics (PSNR, MS-SSIM, and artifact removal) while minimizing bits per pixel (BPP). However, complex, large neural networks are not ideal for use in edge devices. More recently, work has targeted image compression optimized for ML accuracy over human-perceived quality. Liu et al.
propose DeepN-JPEG [19], which modifies JPEG's quantization table for deep neural network (DNN) accuracy over human visual quality. DeepN-JPEG is targeted at generalized AI models and can achieve only 3.5x compression compared to source images, whereas our approach can achieve up to 42.65x compression with respect to the source. Similarly, in [30], Liu et al. modify JPEG 2000 to extract frequencies relevant for neural network (NN) based segmentation of 3D medical images. Weber et al. develop a recurrent neural network (RNN) based compression with the aim of maximizing the accuracy of generalized classifiers and investigate the accuracies of several classifiers for images compressed for human perception versus machine perception [20].

In Table I, we qualitatively compare MAGIC with different state-of-the-art relevant image compression techniques. MAGIC distinguishes itself as the only image compression technique targeted at coarse ML vision tasks in IoT applications. The compression range of MAGIC is higher than that of other techniques because it is designed to leverage domain knowledge.

TABLE I: Qualitative comparison of MAGIC with different state-of-the-art image compression techniques.
Technique | Type | Target Application | Domain Knowledge Leveraged | ROI Support | Encoder Space Complexity | Encoder Time & Energy Requirement | Compression Range
JPEG 2000 [9] | Wavelet | Human Vision | No | Yes | Low | Low | Medium
WebP [10] | Frequency + Spatial | Human Vision | No | No | Low | Low | Medium
DeepN-JPEG [19] | Frequency | Complex ML Task | Limited | No | Low | Low | Medium
Weber et al. [20] | Frequency | Complex ML Task | Limited | No | Medium | High | Medium
Marwood et al. [21] | Spatial | Human Vision | No | No | Low | High | High
MAGIC | Spatial | Coarse ML Task | Yes | Yes | Low | Medium | Extreme
III. Motivation
Most IoT applications designed to perform a particular automated vision task will have some bias in the images being captured and analyzed. The amount of bias will depend on the application and the sensory edge device in use. For a given application, the images will have (1) a pixel color distribution bias depending on the environment where the image is captured and (2) a pattern bias due to the prevalence of certain common objects in the images. Apart from the image set bias, the IoT application may have its own bias for certain objects and features which are relevant for the ML analysis task.
A. Color Distribution Bias
Image color bias will exist to some extent in any domain-specific IoT application. Apart from the application-level color distribution bias, there may be bias attributed to the physical location of the device. Such location bias can be more easily observed for stationary devices. Harnessing the bias for each device separately may be beneficial, but in this paper we limit our study to the application-level image color distribution bias. We plot the pixel color distributions for the forest fire dataset [11] and the building crack dataset [12] in Fig. 2. We can clearly observe that certain regions of the red, green, and blue spectrum are more represented than others. This bias appears even more prominent if we consider the joint red-green-blue distribution. If we can take advantage of this bias by limiting the color space tuned to the specific application, then we may be able to compress more.
B. Common Pattern Bias
The images captured and analyzed by task-specific IoT applications will have pattern (image segment outline) bias because of the nature of the objects present in the images. For a building crack detection application, the images will consist of cracked and uncracked surfaces (Fig. 3), and for a forest fire surveillance application, the images will consist of trees and occasional fires (Fig. 4). Just like color distribution bias, common pattern bias will exist both at the application level and at the device location level. If we can capture and store these domain-specific repeating patterns in a dictionary, for example, then we can potentially save space by storing dictionary entry indices instead of concrete data.

Fig. 4: DT guided segmentation for a sample forest fire detection image [11].
C. Region of Interest Bias
Certain objects/regions in an image may hold more importance depending on the IoT task. If the image can be compressed based on application-specific requirements, then we can save important regions at higher quality while sacrificing other regions. For example, assume an IoT application designed to detect green cars among green and blue cars. Using common pattern bias knowledge alone, we cannot distinguish between green and blue cars; both will be preserved at the same level of quality. With the extra region of interest bias knowledge, however, we can save space by learning to represent only the green cars at high quality.
IV. Methodology
In this section, we present our learning guided compression technique (MAGIC), targeted at coarse-grained ML vision tasks in intelligent IoT ecosystems. Fig. 1 illustrates the overall flow. Just like any other compression technique, there is a procedure for encoding the image and a procedure for decoding the image. Additionally, to take advantage of the bias present in the application domain, we propose a knowledge acquisition procedure. In this paper, we focus on the first aspect of domain knowledge learning, namely, color distribution bias. The other two areas of domain knowledge (pattern bias and ROI bias) are not strictly learned: the common pattern dictionary (for segmentation bias) is statically generated, and the pattern prediction model (for ROI bias) is trained based on automated supervision. However, the algorithms are implemented such that future inclusion of human supervision and learning in the other two domain knowledge areas can be easily performed. We will now describe the three major steps of MAGIC in greater detail.

Algorithm 1 Knowledge Acquisition
procedure learn(bDim, iterLimit, pw, imgList, grid, th, cb)
  Initialize colorFreq = ∅, trainX = ∅, trainY = ∅
  patDict = generatePatternDict(imgList, bDim)
  for each img ∈ imgList do
    pointArr = ∅
    pointArr = gridSpray(pointArr, grid, img.rows, img.cols)
    edgePoints = cannyEdgeDetection(img)
    pointArr.append(edgePoints)
    iter = 0
    while iter < iterLimit do
      pointArr = split(pointArr, img, th)
      iter = iter + 1
    prunePoint(pointArr, pw)        ▷ In every (pw x pw) window, keep at most 1 point
    triangleList = delaunay_triangulation(pointArr)
    for each t ∈ triangleList do
      avgColor = findAvgColor(t, img)
      if avgColor in colorFreq then
        colorFreq[avgColor] = colorFreq[avgColor] + 1
      else
        colorFreq[avgColor] = 1
    blockList = tiling(img, bDim)
    j = 0
    while j < length(blockList) do
      dictInd = assignDictInd(blockList, j, pointArr, patDict)
      trainX.append(blockList[j])
      trainY.append(dictInd)
      j = j + 1
  colorDict = weighted_kmeans(colorFreq, k = 2^cb)
  model = trainPointPredictionModel(trainX, trainY)
  return colorDict, model, patDict

A. Knowledge Acquisition
Before compression is carried out, the knowledge acquisition procedure is used to analyze a set of sample images from the given use case and learn common features that can be reused during compression. This learning stage allows for more efficient image compression. To capture the application-specific domain knowledge, we use the following constructs and techniques.
1) Color Quantization Dictionary:
We construct a dictionary of the most frequently occurring colors for a specific application. Colors are then represented as entries in the dictionary instead of the standard 24-bit RGB value. The number of entries in the dictionary can be controlled by the user. To construct the color dictionary, we first extract the color distribution from a set of domain-specific sample images and then apply unsupervised machine learning (k-means) to extract the colors which are strong representatives of the entire color space. The color quantization dictionary is used during the encoding and decoding phases for representing the image. Algo. 1 describes in detail how the color quantization dictionary is constructed.

Algorithm 2 Triangle Split
procedure split(pointArr, img, th)
  triangleList = delaunay_triangulation(pointArr)
  for each t ∈ triangleList do
    stdDevColor = calculateColorStdDev(img, t)
    if stdDevColor > th then
      pointArr.append(barycenter(t))
  return pointArr
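The dictionary construction can be sketched as a small frequency-weighted k-means in NumPy. The function name mirrors the weighted_kmeans step of Algo. 1, but the implementation and the toy data below are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def weighted_kmeans(colors, weights, k, iters=20, seed=0):
    """colors: (n, 3) float array of RGB values; weights: (n,) frequencies.
    Returns a (k, 3) array of representative colors (the dictionary)."""
    rng = np.random.default_rng(seed)
    centers = colors[rng.choice(len(colors), size=k, replace=False)]
    for _ in range(iters):
        # Assign each observed color to its nearest current center.
        d = np.linalg.norm(colors[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # Move each center to the frequency-weighted mean of its cluster.
        for c in range(k):
            m = assign == c
            if m.any():
                w = weights[m]
                centers[c] = (colors[m] * w[:, None]).sum(0) / w.sum()
    return centers

# Toy color frequency table with two obvious clusters: near-black, near-white.
colors = np.array([[0, 0, 0], [10, 10, 10], [250, 250, 250], [240, 240, 240]], float)
weights = np.array([5.0, 1.0, 1.0, 5.0])
dictionary = weighted_kmeans(colors, weights, k=2)
```

With k = 2^cb, the returned rows become the dictionary entries, and each triangle color is later replaced by the index of its nearest row.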
2) Common Pattern Dictionary:
Compressing an image with MAGIC involves segmenting the image into representative triangles using Delaunay triangulation (DT). The triangle segments are determined from the points sprayed on the 2D image plane. Hence, patterns in an image segment can be represented as a set of points in a 2D plane. The forest fire images in Fig. 4 illustrate this process. The common pattern dictionary is a data structure for saving the regularly occurring spray point patterns that occur in an image segment. The patterns are indexed in the dictionary such that a higher index is associated with more complex details. The pattern dictionary can be statically generated to increase compression robustness across different image domains, or learned during the knowledge acquisition phase to be more in tune with the application domain.
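A statically generated dictionary of this kind, in which entry i holds exactly i randomly sprayed points, can be sketched as follows. The generator, its parameter names, and the small entry count below are our illustrative reading; Section V reports a 4096-entry dictionary over 64x64 blocks.

```python
import random

def generate_pattern_dict(num_entries=4096, block_dim=64, seed=0):
    """Entry i holds exactly i (row, col) points sprayed uniformly at random
    inside a (block_dim x block_dim) block; higher index -> finer detail."""
    rng = random.Random(seed)  # fixed seed so encoder and decoder agree
    pat_dict = []
    for i in range(num_entries):
        pat_dict.append([(rng.randrange(block_dim), rng.randrange(block_dim))
                         for _ in range(i)])
    return pat_dict

# A small dictionary for illustration: 16 entries over 64x64 blocks.
pat_dict = generate_pattern_dict(num_entries=16, block_dim=64)
```

Because the generator is deterministic given the seed, sender and receiver can rebuild identical dictionaries and exchange only log2(num_entries) bits per block.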
3) Machine Learning Model for Pattern Prediction:
We train a machine learning model that learns to represent the segments of an image as a set of patterns from the common pattern dictionary. Similar to other compression techniques, we operate on 'blocks' of an image and must partition the image. Each block needs to be assigned a point spray pattern entry from the common pattern dictionary during encoding. The assignment can be based on how much texture detail the image block has or on the importance of the image block for a given application. MAGIC employs the trained ML model (pattern prediction model) for assigning an image block to an entry from the common pattern dictionary.

Iterative heuristic-driven DT segmentation methods have time complexity O(I M log M), where I is the number of iterations and M is the maximum number of points used for computing DT. Our pattern prediction model can provide the points in O(1), followed by a single DT of complexity O(M log M). Therefore, the pattern prediction model has two benefits: (1) the ML guided assignment of an image block to a specific pattern dictionary entry is faster than determining the segmentation pattern of the image block by iterative heuristic means, and (2) the ML model can be trained to retain more details for specific image blocks which may be important for the specific visual task.
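The single O(M log M) triangulation pass that follows the model's prediction can be performed with an off-the-shelf Delaunay routine. Below is a minimal sketch using SciPy's Delaunay; the library choice and the toy point set are our assumptions for illustration.

```python
import numpy as np
from scipy.spatial import Delaunay

# Points for one 64x64 block: the four block corners plus one
# (hypothetical) interior point supplied by the pattern dictionary.
points = np.array([[0, 0], [0, 63], [63, 0], [63, 63], [20, 30]], float)

# One Delaunay pass yields the non-overlapping triangles used for
# segmentation; each row of `simplices` indexes three input points.
tri = Delaunay(points)
triangles = points[tri.simplices]  # shape: (num_triangles, 3, 2)
```

A convex block with h hull points and i interior points always yields 2i + h - 2 triangles, so the example above produces four triangles around the interior point.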
4) Knowledge Acquisition Algorithm:
Before communication can start between a sender entity and a receiver entity, we must construct the above three components during the knowledge acquisition phase. The pattern prediction model (1) must reside on the sender (encoder) side. The common pattern dictionary (2) and color quantization dictionary (3) should reside on both the sender and receiver sides.

Algo. 1 defines the knowledge acquisition process used to construct these components. We collect a set of sample images (the learning dataset) that approximately represents the nature of the images to be communicated. In line 3, the common pattern dictionary is generated. For this iteration of MAGIC, the generation is such that the entry indexed i has exactly i points sprayed randomly in a (bDim x bDim) block. For each image, we construct the pointArr (the set of points on the 2D image plane), which determines the segmentation. The pointArr is initially populated with grid points sprayed uniformly based on the parameter grid (line 6, using Algo. 3) and edge points determined by an edge detection algorithm (line 7); in our case, we use Canny edge detection. We add more points to the pointArr by repeatedly splitting triangles whose standard deviation of pixel intensity is greater than th (lines 10-12, using Algo. 2). This process is done to capture more information, but we note that it may in some cases result in unnecessary details and ultimately less compression. Therefore, we keep at most 1 point in the pointArr for every (pw x pw) non-overlapping window (line 13). We then perform DT to obtain the triangle list (line 15). For each triangle in the triangle list, we obtain the average color and update the colorFreq, which holds the frequency of each triangle color encountered across all the images (lines 16-21). cb (the number of bits for representing colors) is a user input to control the size of the color quantization dictionary.
We divide the image into blocks of dimension (bDim x bDim) and compute the common pattern dictionary (patDict) entry index which best corresponds to the point spray pattern of each block (line 25). The dictInd and the RGB block (blockList[j]) act as the label and input data, respectively, for training our point prediction model (lines 26-27). We cluster the entries (weighted by their frequency) in the colorFreq using the k-means algorithm [31]; the number of clusters is 2^cb. The cluster representatives are assigned an index and collectively form the color quantization dictionary (colorDict). In this way, we employ unsupervised machine learning to leverage domain-specific color distribution information. The model training process depends on the ML model architecture selected for the domain-specific point prediction task. After the knowledge acquisition phase completes, the application is ready to encode (compress) and decode images.

Algorithm 3 Grid Spray Points
procedure gridSpray(pointArr, grid, rows, cols)
  i = 0
  while i < rows do
    j = 0
    while j < cols do
      pointArr.append((i, j))
      j = j + grid
    i = i + grid
  return pointArr

Algorithm 4 Image Encoding
procedure encode(bDim, d, img, model, colorDict, patDict, grid)
  blockList = tiling(img, bDim)
  Initialize pointArr = ∅, labelsArr = ∅, bIndex = 0
  for each block ∈ blockList do
    label = predict(block, model, bDim) / d
    labelsArr.append(label)
    points = patDict[label]
    for each p(r, c) ∈ points do
      p.c = p.c + (bIndex % bDim) * bDim
      p.r = p.r + (bIndex / bDim) * bDim
    pointArr.append(points)
    bIndex = bIndex + 1
  pointArr = gridSpray(pointArr, grid, img.rows, img.cols)
  triangleList = delaunay_triangulation(pointArr)
  colorList = ∅
  for each t ∈ triangleList do
    avgColor = findAvgColor(t, img)
    quantColor = findClosestMatch(avgColor, colorDict)
    colorList.append(quantColor)
  encImg = cast_to_bits(img.rows, img.cols, grid, bDim, labelsArr, colorList)
  return encImg

B. Encoding Procedure
Algo. 4 defines the image encoding process at the sender side. We divide the given image into blocks based on the dimension specified by bDim (line 2). For each block, we predict the pattern dictionary entry to use with the help of the point prediction model (line 5). The label predicted by the ML model is divided by the input d, a tunable parameter that allows for dynamic image quality; higher values of d are associated with higher compression rates. The predicted labels for each block are appended to the labelsArr (line 6). For the label predicted for a specific block, we fetch the associated point spray pattern from the common pattern dictionary (patDict) and append the points to the pointArr after computing their absolute positions with respect to the image (lines 8-11). The pointArr is next populated with grid points sprayed uniformly based on the parameter grid (line 13, using Algo. 3). We perform DT to obtain the triangleList in line 14. For each triangle in the triangleList, we compute the average color (avgColor) and find its closest match (quantColor) from the color quantization dictionary (colorDict). The quantColor is appended to the colorList. The final encoded image consists of the following, converted and packed as bits:
• img.rows: the number of pixel rows in the image (16 bits).
• img.cols: the number of pixel columns in the image (16 bits).
• grid: the number of pixels to skip between 2 sprayed grid points (16 bits).
• bDim: the dimension of the image block to use (16 bits).
• labelsArr: log2(patDict size) bits for each entry.
• colorList: log2(colorDict size) bits for each entry.
The encoded image (encImg) is returned.

C. Decoding Procedure

Algorithm 5 Image Decoding
procedure decode(encImg, colorDict, patDict)
  rows, cols, grid, bDim, labelsArr, colorList = unpack(encImg)
  Initialize bIndex = 0, pointArr = ∅
  for each label ∈ labelsArr do
    points = patDict[label]
    for each p(r, c) ∈ points do
      p.c = p.c + (bIndex % bDim) * bDim
      p.r = p.r + (bIndex / bDim) * bDim
    pointArr.append(points)
    bIndex = bIndex + 1
  pointArr = gridSpray(pointArr, grid, rows, cols)
  triangleList = delaunay_triangulation(pointArr)
  i = 0
  recImg = array of zeros of dimension (rows, cols)
  while i < size(triangleList) do
    trueColor = colorDict[colorList[i]]
    drawTriangle(triangleList[i], trueColor, recImg)
    i = i + 1
  return recImg
Algo. 5 defines the image decoding process at the receiver side. Based on the encoding format, rows, cols, grid, bDim, labelsArr, and colorList are extracted from the encoded image (encImg) in line 2. For each label in the labelsArr, we fetch the associated point spray pattern from the pattern dictionary and append the points to the pointArr after computing their absolute positions with respect to the image and the block index (bIndex) (lines 6-8). The pointArr is next populated with grid points sprayed uniformly based on the parameter grid (line 11, using Algo. 3). We perform DT to obtain the triangleList in line 12. We initialize a blank image with the obtained dimensions in line 14. For each triangle in the triangleList, we obtain the RGB color (trueColor) from the color quantization dictionary using the corresponding entry from the colorList (line 16) and color the pixels in recImg for the given triangle using trueColor (line 17). The final decoded/recovered image (recImg) is returned.

Fig. 5: Comparison of building crack detection accuracy vs BPP of JPEG 2000, WebP, and MAGIC (proposed).

Fig. 6: Comparison of fire detection accuracy vs BPP of JPEG 2000, WebP, and MAGIC (proposed).

V. Results
MAGIC compression is designed to excel in autonomous task-specific IoT applications where the analysis of the images is done by machine learning models. To quantitatively analyze the effectiveness of MAGIC for IoT applications, we pick two use cases:
1) Forest fire surveillance [11].
2) Infrastructure analysis [12].
In the next few subsections, we describe the experimental setup and compare the accuracy of MAGIC compressed images to JPEG 2000 and WebP under different quality factor (QF) settings. We use ImageMagick's convert command for JPEG 2000 and WebP compression, which takes a quality factor from 1 to 100, with 1 resulting in the highest compression [32]. We explore the effect the MAGIC input parameters pw, d, and cb have on the rate of compression and accuracy. Finally, we introduce a computation/transmission energy cutoff for analyzing the energy efficiency of MAGIC.

A. Experimental Setup
The neural network architecture for the domain-specific ML models is shown in Fig. 9. We obtain separate model weights by training on each dataset and each set of knowledge acquisition parameters (controlling the level of compression) using Keras [33]. The input to the neural network is the flattened per-pixel local entropy features of the 64x64 image blocks. The entropy of a pixel is defined as the number of bits required to represent the local grey-scale distribution of a defined neighbourhood [34]. A higher entropy value is correlated with higher diversity and higher information density. We use a neighbourhood of 5 pixels to train our models. Fig. 7 shows a visual representation of the entropy feature of a sample image. The output of the neural network domain-specific point prediction model is used to compute the entry in the common pattern dictionary that is to be assigned to the input image block.

Fig. 7: Entropy feature of a sample forest fire dataset [11] image, visualized by scaling the values between 0 and 255. Brighter pixels have higher entropy.

TABLE II: Classification results for the fire dataset [11] and building crack dataset [12] at low and ultra-low BPP.

For both the building crack detection and forest fire detection tasks, we use a statically generated point spray pattern dictionary containing 4096 entries such that entry i has exactly i points sprayed randomly in a 64x64 block. Hence, using an entry with a high value of i is equivalent to capturing more information in the image block.

B. Evaluation up to Lowest Quality Factor

1) Infrastructure Analysis:
We construct two randomly sampled, disjoint sets of 2000 images for knowledge acquisition and evaluation, respectively. Each set contains 1000 images from the positive (with crack) class and 1000 images from the negative (no crack) class. For knowledge acquisition parameters (Algo. 1), we use block dimension (bDim) 64, number of iterations (iterLimit) 10, prune window size (pw) (4 and 8), grid dimension (grid) ceil((rows + cols)/20), threshold (th) 5, and cb. We compress the evaluation set images using MAGIC with compression parameters (Algo. 4): block dimension (bDim) 64, d (1 up to 12 in separate instances), grid dimension (grid) ceil((rows + cols)/
20), along with the domain-specific point prediction model (model) and the color quantization dictionary obtained from the knowledge acquisition stage. To compare with MAGIC, we compress the same images with JPEG 2000 and WebP from QF 1 to 10. We obtain a separate dataset for each JPEG 2000, WebP, and MAGIC setting. Fig. 10 shows sample images from the compressed datasets. For each dataset, we extract the features from the second fully connected (fc2) layer of pretrained VGG-16 [35] to train and test a support vector machine for the classification task using 30-fold cross-validation (20/80 test/train splits). As seen in Fig. 5, MAGIC is able to compress beyond JPEG 2000 QF=1 while maintaining nearly the same classification accuracy. The MAGIC images in the dataset compressed with d = 12 and pw = 8 are on average 22.09x smaller (1.06% accuracy loss) than the source dataset (ACC=98.97%, BPP=0.9479), 2.51x smaller (0.24% accuracy loss) than JPEG 2000 QF=1 (ACC=98.15%, BPP=0.1080), and 1.98x smaller (1.69% accuracy loss) than WebP QF=1 (ACC=99.60%, BPP=0.0851).
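The classification protocol above can be sketched as follows. The random matrix stands in for the 4096-dimensional VGG-16 fc2 features (extracting real features requires a deep learning framework and the image datasets, both omitted here); the split scheme mirrors the 30 repetitions of 20/80 test/train splits:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import ShuffleSplit, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4096))   # stand-in for VGG-16 fc2 features
y = rng.integers(0, 2, size=100)   # crack / no-crack labels

# 30 random 20/80 test/train splits, as in the evaluation above
cv = ShuffleSplit(n_splits=30, test_size=0.2, random_state=0)
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=cv)
print(f"mean accuracy: {scores.mean():.3f}")
```

In the actual pipeline, the same features and splits are reused across the source, JPEG 2000, WebP, and MAGIC datasets so that accuracy differences are attributable to the compression alone.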
2) Forest Surveillance:
From the forest fire dataset [11], we extract 643 images, of which 227 have fire and 416 have no fire. We ignore images that are not relevant to forests. We use 20 images from the dataset (10 from each class) to perform the knowledge acquisition procedure. As knowledge acquisition parameters (Algo. 1), we use block dimension (bDim) 64, number of iterations (iterLimit) 10, prune window size (pw) (5 and 8), grid dimension (grid) ceil((rows + cols)/20), threshold (th) 5, and cb
8. The domain-specific point prediction model is trained in the same manner as for the infrastructure analysis task. We compress the remaining 623 images (excluding the knowledge acquisition learning set) using MAGIC with compression parameters (Algo. 4): block dimension (bDim) 64, d (1 through 12), grid dimension (grid) ceil((rows + cols)/20), along with the domain-specific point prediction model (model) and the color quantization dictionary obtained from the knowledge acquisition stage. Again, we obtain a separate dataset for each JPEG 2000 (QF 1 to 10), WebP (QF 1 to 10), and MAGIC setting. Fig. 8 shows sample images from the fire dataset for JPEG 2000, WebP, and MAGIC.

Fig. 8: Comparison of source, WebP, JPEG 2000, and MAGIC compressed fire detection images.

Fig. 9: Lightweight neural network architecture used for spray point prediction.

We extract the features for each dataset as for the building crack dataset and carry out classification using a support vector machine with 30-fold cross-validation (20/80 test/train splits). As seen in Fig. 6, we observe the same trend as for the previous dataset. The MAGIC images compressed with d = 8 and pw = 8 are on average 42.65x smaller (2.99% accuracy loss) than the source dataset (ACC=97.17%, BPP=1.864), 2.32x smaller (1.20% accuracy loss) than JPEG 2000 QF=1 (ACC=95.38%, BPP=0.1014), and 5.85x smaller (3.18% accuracy loss) than WebP QF=1 (ACC=97.36%, BPP=0.2559).

C. Evaluation beyond Lowest Quality Factor
WebP and JPEG 2000 are unable to compress beyond QF=1 without some level of pre-processing. MAGIC, on the other hand, naturally achieves a very large compression range. In Table II, we evaluate MAGIC at extreme levels of compression using smaller cb bit sizes. We can compress well beyond the source at ∼13% accuracy loss for the fire dataset, and up to ∼69x more than the source for the crack dataset.

D. MAGIC Time & Energy Analysis
MAGIC, in its current implementation, takes longer to compress images than JPEG 2000 and WebP. However, as shown above, MAGIC achieves a higher compression rate while still performing well on coarse-grained machine vision classification tasks.

Fig. 10: Comparison of source, WebP, JPEG 2000, and MAGIC compressed building crack images.

To explore the potential energy savings of MAGIC compression, we introduce a threshold, C/T Cutoff (inspired by Sadler et al. [18]), for determining the computation-to-transmission energy consumption ratio beyond which MAGIC becomes beneficial for overall energy consumption in a given resource-constrained computing system. The C/T Cutoff for MAGIC compression (for a specific set of parameters) can be computed using Equation 1, where E_M is the average MAGIC encoding time, E_C is the average encoding time of the competitor method (JPEG 2000, WebP), I_M is the average image size of MAGIC, I_C is the average image size of the competitor method, and f is the CPU clock frequency. The setup time during encoding is due to loading the libraries and initializing the Python environment. In an amortized analysis for a batch operation, the setup time can be considered negligible. For MAGIC compression (for a specific set of parameters) to save energy when compared to other compression standards, the operating device must have a C/T value greater than MAGIC's C/T Cutoff. Tables III and IV show the C/T Cutoffs for different MAGIC compression settings for the building crack detection and forest fire detection datasets, respectively. We use f = 3.7 GHz for computing the C/T cutoff values. Any device with a C/T value greater than the cutoff will benefit (in terms of operational power consumption) from using MAGIC with respect to the method being compared against (JPEG 2000, WebP).
For example, in Table IV, with MAGIC (pw=8, cb=2, d=1) the JPEG 2000 (JP2K) C/T cutoff is 0.497, which means the energy for 1 byte of transmission must be greater than the execution energy of 0.497 million clock cycles (CC) in a system for MAGIC to have higher energy savings than JPEG 2000.

C/T Cutoff = |(E_M − E_C) / (I_M − I_C)| × f    (1)

TABLE III: C/T Cutoff of different MAGIC settings for the building crack detection dataset [12]. Columns: MAGIC with compression parameters (pw=8, cb=2, d=1), (pw=8, cb=8, d=2), (pw=4, cb=8, d=10), (pw=4, cb=8, d=2); Source; JP2K QF-1; WebP QF-1.
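Equation 1 can be encoded directly. The argument names below follow the definitions in the text; the M/C subscripts for MAGIC vs. competitor are our notation (the extraction dropped the original subscripts), and the example numbers are hypothetical:

```python
def ct_cutoff(e_magic, e_comp, size_magic, size_comp, f_hz):
    """C/T Cutoff in clock cycles per byte (Eqn. 1): the extra encoding
    time of MAGIC, converted to cycles, divided by the bytes it saves
    per image relative to the competitor method."""
    return abs((e_magic - e_comp) / (size_magic - size_comp)) * f_hz

# hypothetical numbers: MAGIC encodes 0.4 s slower but emits images
# 400 bytes smaller, on a 1 MHz device
cutoff = ct_cutoff(0.5, 0.1, 100, 500, f_hz=1e6)
print(cutoff)  # cycles-per-byte transmission cost needed to break even
```

A device whose per-byte transmission energy exceeds the energy of this many clock cycles would come out ahead using MAGIC at these settings.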
[Table III reports Avg BPP, Avg Size (Byte), Encode + Setup Time (sec), Encode Time (sec), Decode Time (sec), Detection Accuracy (%), and the C/T Cutoff vs. Source, JP2K-QF1, and WebP-QF1 (×10⁶ CC/Byte) for each setting.]
VI. Discussion
In this section, we investigate the properties of the current embodiment of MAGIC. We note that the MAGIC framework can be improved along many dimensions; we present preliminary studies of these possibilities and explore future extensions of MAGIC.
A. Variation in Compression Ability
To maintain consistent overall system performance, the image compression technique in use should produce compressed sizes with low variability. We compress the images using JPEG 2000, WebP, and MAGIC to generate box plots showing the variation of BPP for the sampled distributions in Fig. 12 and Fig. 11. We observe that MAGIC provides lower variation in BPP than JPEG 2000 and WebP. Due to the different parameters in the knowledge acquisition and encoding phases, specifically pw, MAGIC has fine control over the compressed image size. Hence, MAGIC can provide steady performance even in biased scenarios where other techniques may not compress well.

B. Improving Prediction Accuracy
Post-processing the MAGIC images or using a more powerful pattern prediction model can improve prediction accuracy by about 1-2%. Images compressed using MAGIC contain triangulation artifacts. One way to remove these artifacts is to recursively subdivide the triangles and compute the approximate color of each sub-triangle from the colors of the triangle and its neighbours. Using this technique, we were able to increase classification accuracy; however, it adds computation at the decoder end. If the decoder system resides in the cloud, this step can be used to squeeze out extra performance.

As explained earlier, we use entropy features for training and running our neural network models, but we have noticed that VGG-16 fc2 features perform slightly better. Using a VGG-inspired large convolutional neural network for the domain-specific point prediction task also improves performance slightly. However, we intentionally use simple entropy features and a small neural network to boost speed and to reduce energy consumption and space requirements. In applications where time, space, and energy are not constrained, more complex feature extraction methods and larger neural network architectures can be used for domain-specific point prediction.
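The geometric half of the artifact-smoothing idea above (recursive triangle subdivision) can be sketched as follows; the neighbour-aware color blending is omitted, and the vertex coordinates are hypothetical:

```python
def subdivide(tri):
    """Split one triangle (three (x, y) vertices) into 4 sub-triangles
    by connecting the midpoints of its edges."""
    a, b, c = tri
    mid = lambda p, q: ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
    # three corner triangles plus the central midpoint triangle
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

tris = subdivide(((0, 0), (64, 0), (0, 64)))
print(len(tris))  # each sub-triangle would then get a blended color
```

Applying this recursively (and averaging each sub-triangle's color with its neighbours) softens the flat-shaded look of the decoded triangles at the cost of extra decoder computation.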
C. Time Complexity and Performance Improvements
Time complexity analysis of the encoder (Algo. 4) and decoder (Algo. 5) algorithms simplifies to O(N + M log M + TR). The major contributors in encoding are O(N) for tiling (line 2), O(M log M) for DT (line 14), and O(TR) for triangle color calculation (line 17; the pixels associated with a triangle are determined by searching in a rectangle circumscribing the triangle), where N is the number of pixels in the image, M is the number of points sprayed, T is the number of triangles, and R is the dimension of the bounding rectangle of the biggest triangle. For decoding, the contributors are O(N) for predicted point absolute position computation (lines 6-8), O(M log M) for DT (line 12), and O(TR) for triangle color assignment/drawing (line 17). In both algorithms, we expect the O(M log M) DT step to consume the most time.

Time complexity analysis of the knowledge acquisition algorithm (Algo. 1) simplifies to O(KN log N + KIM log M + KITR + SVC + PQ). The major contributors are O(KN log N) for Canny edge detection across all K images (line 7), O(KIM log M + KITR) for the split operation across all K images (line 11), O(SVC) for color dictionary computation using the k-means algorithm (line 29), and O(PQ) for training the point prediction model (line 30). N, M, T, and R hold the same meaning as before; additionally, K is the number of images in imgList, I is the iterLimit, S is the iteration limit for the k-means algorithm, V is the number of points in the colorFreq map, C is the number of centroids specified for k-means, P is the number of training samples in trainX and trainY, and Q is the number of training epochs for the point prediction model.

TABLE IV: C/T Cutoff of different MAGIC settings for the forest fire detection dataset [11]. Columns: MAGIC with compression parameters (pw=8, cb=8, d=12), (pw=8, cb=8, d=4), (pw=8, cb=8, d=3), (pw=8, cb=8, d=1); Source; JP2K QF-1; WebP QF-1.
[Table IV reports Avg BPP, Avg Size (Byte), Encode + Setup Time (sec), Encode Time (sec), Decode Time (sec), Detection Accuracy (%), and the C/T Cutoff vs. Source, JP2K-QF1, and WebP-QF1 (×10⁶ CC/Byte) for each setting.]
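The Delaunay triangulation (DT) step that dominates the complexity analysis above can be exercised with SciPy; this is a stand-in, since the paper does not name its DT implementation, and the point count is arbitrary:

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)
pts = rng.random((500, 2)) * 64   # M sprayed points inside a 64x64 block
tri = Delaunay(pts)               # expected O(M log M) for 2D point sets

# T, the number of triangles, grows only linearly with M in the plane
print(len(tri.simplices), "triangles from", len(pts), "points")
```

Each row of `tri.simplices` indexes the three vertices of one triangle, which is the structure the encoder colors and the decoder redraws.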
Fig. 11: Comparison of BPP variance for MAGIC, JPEG 2000, and WebP for the building crack dataset.

The runtime performance of both the encoder and decoder can be improved through parallelization, hardware implementation, and code tweaking. Many of the block operations, such as block feature extraction and point spray pattern prediction, can be easily parallelized. Hardware implementation can provide the most speedup and may help reduce energy consumption as well. Future work will focus on improving the time and energy performance of MAGIC using these means.
D. Manual Region-of-Interest Guided Compression
As previously described, ROI bias can be automatically captured by training the pattern prediction model with supervised images. Beyond learning the region-of-interest bias, MAGIC offers manual ROI-based compression. With this feature, users can specify additional regions of an image to be retained at higher quality. In Fig. 13, we see an example of ROI-guided image compression where the fire region is designated manually as a region of interest. Note that the image region with the fire retains much more information than the remaining regions.

Fig. 12: Comparison of BPP variance for MAGIC, JPEG 2000, and WebP for the fire dataset.
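One way the manual ROI could bias compression is by raising the spray-dictionary index of blocks inside the user-marked region (since entry i sprays i points, a larger index retains more detail). The block grid, baseline index, and boost value below are hypothetical, not the paper's Algo. 4 parameters:

```python
import numpy as np

# hypothetical 4x4 grid of 64x64 blocks; each cell holds the index of
# the spray-pattern dictionary entry assigned to that block (0..4095)
labels = np.full((4, 4), 128)    # assumed baseline detail level
roi = np.zeros((4, 4), dtype=bool)
roi[1:3, 1:3] = True             # user-marked fire region
labels[roi] = 4095               # near-maximum point count inside the ROI
print(labels)
```

Blocks outside the ROI keep the model-predicted (low) index, so the extra bits are spent only where the user asked for fidelity.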
E. Extension of MAGIC to Video Compression
A video can be thought of as a collection of images, so MAGIC can be extended to process videos as well. Depending on the sampling rate of the image sensor, adjacent video frames often have very little content difference. Taking this into consideration, we can save more in terms of space, computation, and transmission. The two main components of a MAGIC-encoded image are labelsArr and colorDict. As shown in Eqn. 2, we can represent frame[N] by reusing the labelsArr of frame[N − 1], where OP is the set of obsolete point spray patterns which are no longer present in the new frame and NP is the set of new point spray patterns which are introduced in the new frame. Similarly, as shown in Eqn. 3, colorDict[N − 1] can be modified by removing the obsolete triangle colors (OC) and introducing the colors of the new triangles (NC) in frame N. Future work will investigate and formalize the MAGIC flow applied to video.

labelsArr[N] = labelsArr[N − 1] − OP + NP    (2)
colorDict[N] = colorDict[N − 1] − OC + NC    (3)

Fig. 13: Region-of-Interest Guided MAGIC Compression on a sample Fire Detection Dataset image [11].
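The set-style updates of Eqns. 2 and 3 can be sketched with plain dictionaries; the block keys and entry values here are hypothetical stand-ins for labelsArr contents:

```python
# labelsArr for consecutive frames: block position -> dictionary entry
prev = {(0, 0): 17, (0, 1): 256, (1, 0): 1024}   # frame N-1
curr = {(0, 0): 17, (0, 1): 300, (1, 0): 1024}   # frame N

OP = {k: v for k, v in prev.items() if curr.get(k) != v}  # obsolete patterns
NP = {k: v for k, v in curr.items() if prev.get(k) != v}  # new patterns

# frame N is reconstructed from frame N-1 plus the small (OP, NP) delta,
# so only the delta needs to be transmitted
rebuilt = {k: v for k, v in prev.items() if k not in OP}
rebuilt.update(NP)
assert rebuilt == curr
```

The colorDict update of Eqn. 3 follows the same pattern with obsolete (OC) and new (NC) triangle colors.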
VII. Conclusion
The increasing use of intelligent edge devices in diverse IoT applications, specifically for a multitude of computer vision tasks, calls for innovative image compression techniques that meet the unique requirements of these applications. These applications primarily require a high compression ratio while maintaining machine vision accuracy at an acceptable level. The MAGIC framework presented in this paper addresses this need. We have shown that effective use of domain knowledge learned with ML can provide high compression in resource-constrained edge applications while preserving the features needed for machine vision. The proposed framework is flexible for application in diverse domains and scalable to large image sizes. Our experiments for coarse-grained ML tasks using two datasets highlight the effectiveness of MAGIC. We achieve up to 42.65x higher compression than the source (beyond JPEG 2000 and WebP) while achieving similar accuracy. We compute the computation/transmission energy cutoffs to demonstrate at what level MAGIC compressed images are more energy efficient than standard techniques. Further, we show low compression variance compared to standard image compression techniques. With the use of a common pattern dictionary, the proposed ML-based compression procedure can be easily extended for recognizing coarse-grain patterns in edge devices. Moreover, it can potentially be extended to video compression, where domain knowledge is expected to play an even stronger role. Future work will investigate these extensions as well as further improvement in the performance of MAGIC in a variety of edge applications.
References

[1] A. Al-Kaff, D. Martin, F. Garcia, A. de la Escalera, and J. M. Armingol, "Survey of computer vision algorithms and applications for unmanned aerial vehicles," Expert Systems with Applications, 2018.
[2] Á. Restás, "Thematic division and tactical analysis of the UAS application supporting forest fire management," in Viegas, X.D., Ed., Advances in Forest Fire Research, 2014.
[3] C. Koch, K. Georgieva, V. Kasireddy, B. Akinci, and P. Fieguth, "A review on computer vision based defect detection and condition assessment of concrete and asphalt civil infrastructure," Advanced Engineering Informatics.
[4]–[6] …, Future Generation Computer Systems, …
[7] …, IEEE Wireless Communications, vol. 26, no. 3, pp. 132–139, 2019.
[8] W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard. Springer Science & Business Media, 1992.
[9] M. Rabbani, "JPEG2000: Image compression fundamentals, standards and practice," Journal of Electronic Imaging, 2002.
[10] Google, "WebP," https://developers.google.com/speed/webp, accessed: 02-28-2020.
[11] Dataturks, "Forest fire," https://dataturks.com/projects/qdyzl1013/Forest%20Fire, accessed: 02-20-2020.
[12] Ç. F. Özgenel and A. G. Sorguç, "Performance comparison of pretrained convolutional neural networks on crack detection in buildings," in ISARC: Proceedings of the International Symposium on Automation and Robotics in Construction, 2018.
[13] I. Aydin and N. A. Othman, "A new IoT combined face detection of people by using computer vision for security application," IEEE, 2017, pp. 1–6.
[14] C. Yuan, Z. Liu, and Y. Zhang, "Aerial images-based forest fire detection for firefighting using optical remote sensing techniques and unmanned aerial vehicles," Journal of Intelligent & Robotic Systems, vol. 88, no. 2-4, pp. 635–654, 2017.
[15] S. Li, H. Tang, S. He, Y. Shu, T. Mao, J. Li, and Z. Xu, "Unsupervised detection of earthquake-triggered roof-holes from UAV images using joint color and shape features," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 9, pp. 1823–1827, 2015.
[16] D. Duarte, F. Nex, N. Kerle, and G. Vosselman, "Towards a more efficient detection of earthquake induced facade damages using oblique UAV imagery," The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 42, p. 93, 2017.
[17] B. Arshad, R. Ogie, J. Barthelemy, B. Pradhan, N. Verstaevel, and P. Perez, "Computer vision and IoT-based sensors in flood monitoring and mapping: A systematic review," Sensors, vol. 19, no. 22, p. 5012, 2019.
[18] C. M. Sadler and M. Martonosi, "Data compression algorithms for energy-constrained devices in delay tolerant networks," in Proceedings of the 4th International Conference on Embedded Networked Sensor Systems, 2006, pp. 265–278.
[19] Z. Liu, T. Liu, W. Wen, L. Jiang, J. Xu, Y. Wang, and G. Quan, "DeepN-JPEG: A deep neural network favorable JPEG-based image compression framework," in Proceedings of the 55th Annual Design Automation Conference, 2018.
[20] M. Weber, C. Renggli, H. Grabner, and C. Zhang, "Lossy image compression with recurrent neural networks: From human perceived visual quality to classification accuracy," arXiv preprint arXiv:1910.03472, 2019.
[21] D. Marwood, P. Massimino, M. Covell, and S. Baluja, "Representing images in 200 bytes: Compression via triangulation," in IEEE International Conference on Image Processing, 2018.
[22] R. J. Clarke, Digital Compression of Still Images and Video. Academic Press, Inc., 1995.
[23] G. G. Langdon, "An introduction to arithmetic coding," IBM Journal of Research and Development, vol. 28, no. 2, pp. 135–149, 1984.
[24] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, 1952.
[25] A. J. Hussain, A. Al-Fayadh, and N. Radi, "Image compression techniques: A survey in lossless and lossy algorithms," Neurocomputing, vol. 300, pp. 44–69, 2018.
[26] O. Rippel and L. Bourdev, "Real-time adaptive image compression," in Proceedings of the 34th International Conference on Machine Learning, vol. 70. JMLR.org, 2017.
[27] G. Toderici, D. Vincent, N. Johnston, S. Jin Hwang, D. Minnen, J. Shor, and M. Covell, "Full resolution image compression with recurrent neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[28] M. Li, W. Zuo, S. Gu, D. Zhao, and D. Zhang, "Learning convolutional networks for content-weighted image compression," in The IEEE Conference on Computer Vision and Pattern Recognition, June 2018.
[29] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, "Variational image compression with a scale hyperprior," arXiv preprint arXiv:1802.01436, 2018.
[30] Z. Liu, X. Xu, T. Liu, Q. Liu, Y. Wang, Y. Shi, W. Wen, M. Huang, H. Yuan, and J. Zhuang, "Machine vision guided 3D medical image compression for efficient transmission and accurate segmentation in the clouds," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[31] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[32] The ImageMagick Development Team, "ImageMagick." [Online]. Available: https://imagemagick.org
[33] F. Chollet et al., "Keras," https://keras.io, 2015.
[34] S. van der Walt, J. L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J. D. Warner, N. Yager, E. Gouillart, T. Yu, and the scikit-image contributors, "scikit-image: Image processing in Python," PeerJ, vol. 2, p. e453, 2014. [Online]. Available: https://doi.org/10.7717/peerj.453
[35] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.