Leveraging Domain Knowledge using Machine Learning for Image Compression in Internet-of-Things
Prabuddha Chakraborty, Jonathan Cruz, and Swarup Bhunia
Department of Electrical & Computer Engineering, University of Florida, Gainesville, FL, USA
Abstract—The emergent ecosystems of intelligent edge devices in diverse Internet of Things (IoT) applications, from automatic surveillance to precision agriculture, increasingly rely on recording and processing a variety of image data. Due to resource constraints, e.g., energy and communication bandwidth requirements, these applications require compressing the recorded images before transmission. For these applications, image compression commonly requires: (1) maintaining features for coarse-grain pattern recognition instead of the high-level details for human perception, due to machine-to-machine communications; (2) a high compression ratio that leads to improved energy and transmission efficiency; (3) a large dynamic range of compression and an easy trade-off between compression factor and quality of reconstruction to accommodate a wide diversity of IoT applications as well as their time-varying energy/performance needs. To address these requirements, we propose MAGIC, a novel machine learning (ML) guided image compression framework that judiciously sacrifices visual quality to achieve much higher compression than traditional techniques, while maintaining accuracy for coarse-grained vision tasks. The central idea is to capture application-specific domain knowledge and efficiently utilize it to achieve high compression. We demonstrate that the MAGIC framework is configurable across a wide range of compression/quality and is capable of compressing beyond the standard quality factor limits of both JPEG 2000 and WebP. We perform experiments on representative IoT applications using two vision datasets and show up to 42.65x compression at similar accuracy with respect to the source. We highlight the low variance in compression rate across images using our technique as compared to JPEG 2000 and WebP.
Index Terms—Computer vision, edge intelligence, image compression, Internet-of-Things (IoT), machine learning, sensor signal processing.
I. Introduction
In the Internet of Things (IoT) era, humans have been increasingly removed from the surveillance loop in favor of a connected ecosystem of edge devices performing vision-based tasks [1]. Automatic analysis is the only viable option given the huge amount of data continuously collected from different IoT edge devices. For example, resource-constrained unmanned aerial vehicles (UAVs) or image sensors can be used as surveillance devices for detecting forest fires [2] or infrastructure damage after natural disasters [3]. In these scenarios, autonomous UAVs or edge devices collect data that may be sent to other edge devices or to the cloud for automated machine learning (ML) based analysis. According to the 2019 Embedded Markets Study [4], 43% of IoT applications incorporating advanced technologies are using embedded vision and 32% are using machine learning. However, using these IoT devices often requires meeting tight storage, energy, and/or communication bandwidth constraints, while maintaining the effectiveness of surveillance.

Image compression can address these needs in edge devices that operate in constrained environments and at the same time reduce network traffic [5]. Compressed images are easier to store and more energy efficient to transmit long-range. An ideal image compression technique for IoT applications should:
• Optimize for machine-to-machine communication and machine-based interpretation in diverse IoT applications, i.e., pattern recognition or feature extraction on the image. Visual perception by human users should be given less importance.
• Aim to minimize the communication bandwidth, as IoT is creating 1000X more dense networking requirements [6], [7], often driven by image/video communication.
• Gear towards minimizing the overall energy and space requirements on resource-constrained edge devices.
The standard image compression methods, such as JPEG [8], JPEG 2000 [9], and WebP [10], are tailored to maintain good human-perceivable visual quality and were not designed with IoT applications in mind. Properties of IoT applications that can be leveraged to obtain increased compression are as follows:
• The image domain is biased based on the application and on each specific edge image sensor device. The bias can be divided into two categories: (1) color distribution bias and (2) common pattern bias. We define patterns as segment outlines in an image. This information can be learned and utilized.
• Depending on the application, specific entities of the images may hold greater value with respect to the rest of the image. Such applications, therefore, have a region of interest bias which can be learned and utilized.
• Coarse-grained ML tasks prevalent in IoT applications can tolerate extreme levels of compression.
Building on these observations, we propose MAGIC, a Machine leArning Guided Image Compression framework for achieving extreme levels of image compression in IoT systems while maintaining sufficient accuracy for coarse-grained AI tasks.

Fig. 1: Overall flow of MAGIC framework.

MAGIC consists of three major steps: (1) knowledge acquisition, (2) encoding, and (3) decoding. During knowledge acquisition, different application and domain-specific information such as color distribution, common pattern bias, and region of interest bias can be extracted in the form of (1) a color quantization dictionary, (2) a common pattern dictionary, and (3) a machine learning model which can intelligently represent image segments as a set of common pattern dictionary entries. During the encoding stage, an image is segmented into non-overlapping triangles using an efficient Delaunay triangulation (DT) method.
The ML model, which we name the pattern prediction model, and the common pattern dictionary from the knowledge acquisition stage are used to guide the image segmentation process. Finally, the colors are assigned by averaging the pixel colors within each triangle and quantizing them based on the color quantization dictionary, which is constructed by analyzing the color distribution from the domain using k-means. The decode phase operates similarly, reconstructing the segments using DT and assigning colors from the color quantization dictionary.

We have implemented MAGIC as a completely configurable framework that can be used to compress images from a given dataset. We evaluate MAGIC extensively using two publicly available datasets, fire detection [11] and building crack detection [12], and observe promising performance. For the building crack detection dataset, at a 1.06% accuracy loss, we obtained 22.09x more compression with respect to the source images. For the fire detection dataset, at a 2.99% accuracy loss, we obtained 42.65x more compression with respect to the source images.

II. Background & Related Works
In this section, we give a brief introduction to vision tasks in IoT applications and discuss state-of-the-art compression techniques.
A. Computer Vision in IoT Applications
IoT applications are gaining popularity in several spheres such as industry, home, healthcare, retail, transport, and even security [13]. Many applications in these domains involve capturing images at the edge device and transmitting the image to the cloud or other edge devices for analysis. For example:
• UAV-based fire detection techniques have been proposed which use optical remote sensing [14].
• Detecting infrastructure damage in a post-disaster scenario using UAV imaging is being investigated in [15], [16].
• IoT image sensors and computer vision techniques are widely used for flood monitoring, warning, and damage mitigation [17].
Image sensors and intelligent data analysis are two key aspects of surveillance-based IoT applications. Additionally, security-oriented IoT surveillance applications actively rely on computer vision to detect anomalies [13].
B. Need for Image Compression in IoT Vision
Different IoT applications require sensing image data at the edge and transmitting it to other edge devices or the cloud for analysis. These edge devices operate with strict space, energy, and bandwidth requirements. Compressing images not only has the direct effect of reducing the space and network traffic requirements but can also reduce energy consumption.

Fig. 2: Pixel color distribution for forest fire and building crack detection datasets [11], [12]. Red, green, and blue lines represent the R, G, and B channels, respectively.

Fig. 3: DT guided segmentation for a sample building crack detection image [12].

IoT-based communication is expected to reach 50% of network traffic by 2025 [6]. For example, a typical 4G network is designed to support thousands of devices' worth of traffic in a region. However, with the increase in the number of IoT devices being connected to the network, it may become impossible to efficiently serve all devices simultaneously. Therefore, compression at the edge can help reduce network stress.

The energy required to transmit data increases with distance [18]. For long-range transmission devices such as the MaxStream XTend (at 500 mW transmit power), the energy required for one byte of transmission can be higher than 1 million clock cycles' worth of computation [18]. Hence, even with the cost of additional computation, compression can ultimately lead to less overall energy expenditure. For all these reasons, image compression is a vital step for any IoT vision application.
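This trade-off can be made concrete with a back-of-the-envelope calculation. The sketch below treats the one-byte-per-million-cycles equivalence quoted above as a rough constant; the function and the example numbers are illustrative assumptions, not measurements.

```python
# Back-of-the-envelope energy trade-off for compressing before transmitting.
# Hedged assumption: transmitting one byte costs roughly as much energy as
# ~1e6 clock cycles of computation (the XTend figure cited above).
CYCLES_PER_BYTE_TX = 1_000_000  # illustrative equivalence, not a measured constant

def compression_worthwhile(raw_bytes, compressed_bytes, compute_cycles):
    """True if the cycles spent compressing cost less energy than the
    transmission energy saved by sending fewer bytes."""
    bytes_saved = raw_bytes - compressed_bytes
    return compute_cycles < bytes_saved * CYCLES_PER_BYTE_TX

# Example: a 100 kB image compressed 40x, spending 2e9 cycles on encoding,
# still comes out far ahead on energy under this model.
print(compression_worthwhile(100_000, 2_500, 2_000_000_000))  # True
```

Under this model, even multi-second encoding on a slow MCU is justified whenever the transmitted payload shrinks by more than a few kilobytes.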
C. State-of-the-art Image Compression Techniques
Several image compression techniques proposed over the years can be divided primarily into two categories: (1) lossless compression techniques and (2) lossy compression techniques. Lossless compression techniques such as arithmetic coding [22], [23] and Huffman coding [24] aim to completely preserve the content under compression, but generally at the cost of significant mathematical computation [25]. For coarse-grained ML tasks, such quality is not needed; therefore, lossy compression techniques are preferred. As the name suggests, lossy compression allows for variable data loss to achieve higher rates of compression. The data lost is generally not perceivable by humans. Some lossy compression techniques include JPEG [8], JPEG 2000 [9], and WebP [10], which perform quantization in the frequency domain using techniques such as the discrete wavelet transform and the discrete cosine transform. Another class of lossy compression performs quantization in the spatial domain, such as the triangulation-based image compression most recently proposed in [21]. This compression technique relies on the DT of a set of points in the image matrix to construct the image out of non-overlapping triangles during both encoding and decoding. In this way, triangulation allows for sending minimal amounts of information at the cost of slightly more encoding/decoding time. While we use DT, our compression algorithm and compression goals are vastly different from those of [21].

Machine learning has been used to further improve compression [26], [27], [28], [29]. In all these works, the goal is to maximize visual quality metrics (PSNR, MS-SSIM, and artifact removal) while minimizing bits per pixel (BPP). However, complex, large neural networks are not ideal for use in edge devices. More recently, work has targeted image compression optimized for ML accuracy over human-perceived quality. Liu et al.
propose DeepN-JPEG [19], which modifies JPEG's quantization table for deep neural network (DNN) accuracy over human visual quality. DeepN-JPEG is targeted at generalized AI models and can achieve only 3.5x compression compared to source images, whereas our approach can achieve up to 42.65x compression with respect to the source. Similarly, in [30], Liu et al. modify JPEG 2000 to extract frequencies relevant for neural network (NN) based segmentation of 3D medical images. Weber et al. develop a recurrent neural network (RNN) based compression with the aim of maximizing the accuracy of generalized classifiers and investigate the accuracies of several classifiers for images compressed for human perception versus machine perception [20].

In Table I, we qualitatively compare MAGIC with different state-of-the-art relevant image compression techniques. MAGIC distinguishes itself as the only image compression technique targeted at coarse ML vision tasks in IoT applications. The compression range of MAGIC is higher than that of other techniques because it is designed to leverage domain knowledge.

TABLE I: Qualitative comparison of MAGIC with different state-of-the-art image compression techniques.
Technique | Type | Target Application | Domain Knowledge Leveraged | ROI Support | Encoder Space Complexity | Encoder Time & Energy Requirement | Compression Range
JPEG 2000 [9] | Wavelet | Human Vision | No | Yes | Low | Low | Medium
WebP [10] | Frequency + Spatial | Human Vision | No | No | Low | Low | Medium
DeepN-JPEG [19] | Frequency | Complex ML Task | Limited | No | Low | Low | Medium
Weber et al. [20] | Frequency | Complex ML Task | Limited | No | Medium | High | Medium
Marwood et al. [21] | Spatial | Human Vision | No | No | Low | High | High
MAGIC | Spatial | Coarse ML Task | Yes | Yes | Low | Medium | Extreme
III. Motivation
Most IoT applications designed to perform a particular automated vision task will have some bias in the images being captured and analyzed. The amount of bias will depend on the application and the sensory edge device in use. For a given application, the images will have (1) a pixel color distribution bias depending on the environment where the image is captured and (2) a pattern bias due to the prevalence of certain common objects in the images. Apart from the image set bias, the IoT application may have its own bias for certain objects and features which are relevant for the ML analysis task.
A. Color Distribution Bias
Image color bias will exist to some extent in any domain-specific IoT application. Apart from the application-level color distribution bias, there may be bias attributed to the physical location of the device. Such location bias can be more easily observed for stationary devices. Harnessing the bias for each device separately may be beneficial, but in this paper we limit our study to the application-level image color distribution bias. We plot the pixel color distributions for the forest fire dataset [11] and the building crack dataset [12] in Fig. 2. We can clearly observe that certain regions of the red, green, and blue spectrum are more represented than others. This bias appears even more prominent if we consider the joint red-green-blue distribution. If we can take advantage of this bias by limiting the color space tuned to the specific application, then we may be able to compress more.
B. Common Pattern Bias
The images captured and analyzed by task-specific IoT applications will have pattern (image segment outline) bias because of the nature of the objects present in the images. For a building crack detection application, the images will consist of cracked and uncracked surfaces (Fig. 3), and for a forest fire surveillance application, the images will consist of trees and occasional fires (Fig. 4). Just like color distribution bias, common pattern bias will exist both at the application level and at the device location level. If we can capture and store these domain-specific repeating patterns in a dictionary, for example, then we can potentially save space by storing dictionary entry indices instead of concrete data.

Fig. 4: DT guided segmentation for a sample forest fire detection image [11].
C. Region of Interest Bias
Certain objects/regions in an image may hold more importance depending on the IoT task. If the image can be compressed based on application-specific requirements, then we can save important regions at higher quality while sacrificing other regions. For example, assume an IoT application designed to detect green cars among green and blue cars. Using common pattern bias knowledge alone, we cannot distinguish between green and blue cars; both will be preserved at the same level of quality. With the extra region of interest bias knowledge, however, we can save space by learning to represent only the green cars at high quality.
IV. Methodology
In this section, we present our learning guided compression technique (MAGIC), targeted at coarse-grained ML vision tasks in intelligent IoT ecosystems. Fig. 1 illustrates the overall flow. Just like any other compression technique, there is a procedure for encoding the image and a procedure for decoding the image. Additionally, to take advantage of the bias present in the application domain, we propose a knowledge acquisition procedure. In this paper, we focus on the first aspect of domain knowledge learning, namely, color distribution bias. The other two areas of domain knowledge (pattern bias and ROI bias) are not strictly learned: the common pattern dictionary (for segmentation bias) is statically generated, and the pattern prediction model (for ROI bias) is trained based on automated supervision. However, the algorithms are implemented such that future inclusion of human supervision and learning in the other two domain knowledge areas can be easily performed. We will now describe the three major steps of MAGIC in greater detail.

Algorithm 1 Knowledge Acquisition
procedure learn(bDim, iterLimit, pw, imgList, grid, th, cb)
  Initialize colorFreq = ∅, trainX = ∅, trainY = ∅
  patDict = generatePatternDict(imgList, bDim)
  for each img ∈ imgList do
    pointArr = ∅
    pointArr = gridSpray(pointArr, grid, img.rows, img.cols)
    edgePoints = cannyEdgeDetection(img)
    pointArr.append(edgePoints)
    iter = 0
    while iter < iterLimit do
      pointArr = split(pointArr, img, th)
      iter = iter + 1
    prunePoint(pointArr, pw)        ▷ In every (pw x pw) window, keep at most 1 point
    triangleList = delaunay_triangulation(pointArr)
    for each t ∈ triangleList do
      avgColor = findAvgColor(t, img)
      if avgColor in colorFreq then
        colorFreq[avgColor] = colorFreq[avgColor] + 1
      else
        colorFreq[avgColor] = 1
    blockList = tiling(img, bDim)
    j = 0
    while j < length(blockList) do
      dictInd = assignDictInd(blockList, j, pointArr, patDict)
      trainX.append(blockList[j])
      trainY.append(dictInd)
      j = j + 1
  colorDict = weighted_kmeans(colorFreq, k = 2^cb)
  model = trainPointPredictionModel(trainX, trainY)
  return colorDict, model, patDict

A. Knowledge Acquisition
Before compression is carried out, the knowledge acquisition procedure is used to analyze a set of sample images from the given use case and learn common features that can be reused during compression. This learning stage allows for more efficient image compression. To capture the application-specific domain knowledge, we use the following constructs and techniques.
1) Color Quantization Dictionary:
We construct a dictionary of the most frequently occurring colors for a specific application. Colors are then represented as entries in the dictionary instead of the standard 24-bit RGB value. The number of entries in the dictionary can be controlled by the user. To construct the color dictionary, we first extract the color distribution from a set of domain-specific sample images and then apply unsupervised machine learning (k-means) to extract the colors which are strong representatives of the entire color space. The color quantization dictionary is used during the encoding and decoding phases for representing the image. Algo. 1 describes in detail how the color quantization dictionary is constructed.

Algorithm 2 Triangle Split
procedure split(pointArr, img, th)
  triangleList = delaunay_triangulation(pointArr)
  for each t ∈ triangleList do
    stdDevColor = calculateColorStdDev(img, t)
    if stdDevColor > th then
      pointArr.append(barycenter(t))
  return pointArr
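The dictionary construction can be sketched as a small frequency-weighted k-means in NumPy. The function name mirrors the weighted_kmeans step of Algo. 1, but the implementation and the toy data below are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def weighted_kmeans(colors, weights, k, iters=20, seed=0):
    """colors: (n, 3) float array of RGB values; weights: (n,) frequencies.
    Returns a (k, 3) array of representative colors (the dictionary)."""
    rng = np.random.default_rng(seed)
    centers = colors[rng.choice(len(colors), size=k, replace=False)]
    for _ in range(iters):
        # Assign each observed color to its nearest current center.
        d = np.linalg.norm(colors[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # Move each center to the frequency-weighted mean of its cluster.
        for c in range(k):
            m = assign == c
            if m.any():
                w = weights[m]
                centers[c] = (colors[m] * w[:, None]).sum(0) / w.sum()
    return centers

# Toy color frequency table with two obvious clusters: near-black, near-white.
colors = np.array([[0, 0, 0], [10, 10, 10], [250, 250, 250], [240, 240, 240]], float)
weights = np.array([5.0, 1.0, 1.0, 5.0])
dictionary = weighted_kmeans(colors, weights, k=2)
```

With k = 2^cb, the returned rows become the dictionary entries, and each triangle color is later replaced by the index of its nearest row.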
2) Common Pattern Dictionary:
Compressing an image with MAGIC involves segmenting the image into representative triangles using Delaunay triangulation (DT). The triangle segments are determined from the points sprayed on the 2D image plane. Hence, patterns in an image segment can be represented as a set of points in a 2D plane. The forest fire images in Fig. 4 illustrate this process. The common pattern dictionary is a data structure for saving the regularly occurring spray point patterns that occur in an image segment. The patterns are indexed in the dictionary such that a higher index is associated with more complex details. The pattern dictionary can be statically generated to increase compression robustness across different image domains, or learned during the knowledge acquisition phase to be more in tune with the application domain.
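A statically generated dictionary of this kind, in which entry i holds exactly i randomly sprayed points, can be sketched as follows. The generator, its parameter names, and the small entry count below are our illustrative reading; Section V reports a 4096-entry dictionary over 64x64 blocks.

```python
import random

def generate_pattern_dict(num_entries=4096, block_dim=64, seed=0):
    """Entry i holds exactly i (row, col) points sprayed uniformly at random
    inside a (block_dim x block_dim) block; higher index -> finer detail."""
    rng = random.Random(seed)  # fixed seed so encoder and decoder agree
    pat_dict = []
    for i in range(num_entries):
        pat_dict.append([(rng.randrange(block_dim), rng.randrange(block_dim))
                         for _ in range(i)])
    return pat_dict

# A small dictionary for illustration: 16 entries over 64x64 blocks.
pat_dict = generate_pattern_dict(num_entries=16, block_dim=64)
```

Because the generator is deterministic given the seed, sender and receiver can rebuild identical dictionaries and exchange only log2(num_entries) bits per block.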
3) Machine Learning Model for Pattern Prediction:
We train a machine learning model that learns to represent the segments of an image as a set of patterns from the common pattern dictionary. Similar to other compression techniques, we operate on 'blocks' of an image and must partition the image. Each block needs to be assigned a point spray pattern entry from the common pattern dictionary during encoding. The assignment can be based on how much texture detail the image block has or on the importance of the image block for a given application. MAGIC employs the trained ML model (pattern prediction model) for assigning an image block to an entry from the common pattern dictionary.

Iterative heuristic-driven DT segmentation methods have time complexity O(I M log M), where I is the number of iterations and M is the maximum number of points used for computing DT. Our pattern prediction model can provide the points in O(1), followed by a single DT of complexity O(M log M). Therefore, the pattern prediction model has two benefits: (1) the ML guided assignment of an image block to a specific pattern dictionary entry is faster than determining the segmentation pattern of the image block by iterative heuristic means, and (2) the ML model can be trained to retain more details for specific image blocks which may be important for the specific visual task.
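The single O(M log M) triangulation pass that follows the model's prediction can be performed with an off-the-shelf Delaunay routine. Below is a minimal sketch using SciPy's Delaunay; the library choice and the toy point set are our assumptions for illustration.

```python
import numpy as np
from scipy.spatial import Delaunay

# Points for one 64x64 block: the four block corners plus one
# (hypothetical) interior point supplied by the pattern dictionary.
points = np.array([[0, 0], [0, 63], [63, 0], [63, 63], [20, 30]], float)

# One Delaunay pass yields the non-overlapping triangles used for
# segmentation; each row of `simplices` indexes three input points.
tri = Delaunay(points)
triangles = points[tri.simplices]  # shape: (num_triangles, 3, 2)
```

A convex block with h hull points and i interior points always yields 2i + h - 2 triangles, so the example above produces four triangles around the interior point.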
4) Knowledge Acquisition Algorithm:
Before communication can start between a sender entity and a receiver entity, we must construct the above three components during the knowledge acquisition phase. The pattern prediction model (1) must reside on the sender (encoder) side. The common pattern dictionary (2) and color quantization dictionary (3) should reside on both the sender and receiver sides.

Algo. 1 defines the knowledge acquisition process used to construct these components. We collect a set of sample images (the learning dataset) that approximately represents the nature of the images to be communicated. In line 3, the common pattern dictionary is generated. For this iteration of MAGIC, the generation is such that the entry indexed i has exactly i points sprayed randomly in a (bDim x bDim) block. For each image, we construct the pointArr (the set of points on the 2D image plane), which determines the segmentation. The pointArr is initially populated with grid points sprayed uniformly based on the parameter grid (line 6, using Algo. 3) and edge points determined by an edge detection algorithm (line 7); in our case, we use Canny edge detection. We add more points to the pointArr by repeatedly splitting triangles whose standard deviation of pixel intensity is greater than th (lines 10-12, using Algo. 2). This process is done to capture more information, but we note that it may in some cases result in unnecessary details and ultimately less compression. Therefore, we keep at most 1 point in the pointArr for every (pw x pw) non-overlapping window (line 13). We then perform DT to obtain the triangle list (line 15). For each triangle in the triangle list, we obtain the average color and update the colorFreq, which holds the frequency of each triangle color encountered across all the images (lines 16-21). cb (the number of bits for representing colors) is a user input to control the size of the color quantization dictionary.
We divide the image into blocks of dimension (bDim x bDim) and compute the common pattern dictionary (patDict) entry index which best corresponds to the point spray pattern of each block (line 25). The dictInd and the RGB block (blockList[j]) act as the label and input data, respectively, for training our point prediction model (lines 26-27). We cluster the entries (weighted by their frequency) in the colorFreq using the k-means algorithm [31]; the number of clusters is 2^cb. The cluster representatives are assigned an index and collectively form the color quantization dictionary (colorDict). In this way, we employ unsupervised machine learning to leverage domain-specific color distribution information. The model training process depends on the ML model architecture selected for the domain-specific point prediction task. After the knowledge acquisition phase completes, the application is ready to encode (compress) and decode images.

Algorithm 3 Grid Spray Points
procedure gridSpray(pointArr, grid, rows, cols)
  i = 0
  while i < rows do
    j = 0
    while j < cols do
      pointArr.append((i, j))
      j = j + grid
    i = i + grid
  return pointArr

Algorithm 4 Image Encoding
procedure encode(bDim, d, img, model, colorDict, patDict, grid)
  blockList = tiling(img, bDim)
  Initialize pointArr = ∅, labelsArr = ∅, bIndex = 0
  for each block ∈ blockList do
    label = predict(block, model, bDim) / d
    labelsArr.append(label)
    points = patDict[label]
    for each p(r, c) ∈ points do
      p.c = p.c + (bIndex % bDim) * bDim
      p.r = p.r + (bIndex / bDim) * bDim
    pointArr.append(points)
    bIndex = bIndex + 1
  pointArr = gridSpray(pointArr, grid, img.rows, img.cols)
  triangleList = delaunay_triangulation(pointArr)
  colorList = ∅
  for each t ∈ triangleList do
    avgColor = findAvgColor(t, img)
    quantColor = findClosestMatch(avgColor, colorDict)
    colorList.append(quantColor)
  encImg = cast_to_bits(img.rows, img.cols, grid, bDim, labelsArr, colorList)
  return encImg

B. Encoding Procedure
Algo. 4 defines the image encoding process at the sender side. We divide the given image into blocks based on the dimension specified by bDim (line 2). For each block, we predict the pattern dictionary entry to use with the help of the point prediction model (line 5). The label predicted by the ML model is divided by the input d, a tunable parameter that allows for dynamic image quality; higher values of d are associated with higher compression rates. The predicted labels for each block are appended to the labelsArr (line 6). For the label predicted for a specific block, we fetch the associated point spray pattern from the common pattern dictionary (patDict) and append the points to the pointArr after computing their absolute positions with respect to the image (lines 8-11). The pointArr is next populated with grid points sprayed uniformly based on the parameter grid (line 13, using Algo. 3). We perform DT to obtain the triangleList in line 14. For each triangle in the triangleList, we compute the average color (avgColor) and find its closest match (quantColor) from the color quantization dictionary (colorDict). The quantColor is appended to the colorList. The final encoded image consists of the following, converted and packed as bits:
• img.rows: the number of pixel rows in the image (16 bits).
• img.cols: the number of pixel columns in the image (16 bits).
• grid: the number of pixels to skip between 2 sprayed grid points (16 bits).
• bDim: the dimension of the image block to use (16 bits).
• labelsArr: log2(patDict size) bits for each entry.
• colorList: log2(colorDict size) bits for each entry.
The encoded image (encImg) is returned.

C. Decoding Procedure

Algorithm 5 Image Decoding
procedure decode(encImg, colorDict, patDict)
  rows, cols, grid, bDim, labelsArr, colorList = unpack(encImg)
  Initialize bIndex = 0, pointArr = ∅
  for each label ∈ labelsArr do
    points = patDict[label]
    for each p(r, c) ∈ points do
      p.c = p.c + (bIndex % bDim) * bDim
      p.r = p.r + (bIndex / bDim) * bDim
    pointArr.append(points)
    bIndex = bIndex + 1
  pointArr = gridSpray(pointArr, grid, rows, cols)
  triangleList = delaunay_triangulation(pointArr)
  i = 0
  recImg = array of zeros of dimension (rows, cols)
  while i < size(triangleList) do
    trueColor = colorDict[colorList[i]]
    drawTriangle(triangleList[i], trueColor, recImg)
    i = i + 1
  return recImg
Algo. 5 defines the image decoding process at the receiver side. Based on the encoding format, rows, cols, grid, bDim, labelsArr, and colorList are extracted from the encoded image (encImg) in line 2. For each label in the labelsArr, we fetch the associated point spray pattern from the pattern dictionary and append the points to the pointArr after computing their absolute positions with respect to the image and the block index (bIndex) (lines 6-8). The pointArr is next populated with grid points sprayed uniformly based on the parameter grid (line 11, using Algo. 3). We perform DT to obtain the triangleList in line 12. We initialize a blank image with the obtained dimensions in line 14. For each triangle in the triangleList, we obtain the RGB color (trueColor) from the color quantization dictionary using the corresponding entry from the colorList (line 16) and color the pixels in recImg for the given triangle using trueColor (line 17). The final decoded/recovered image (recImg) is returned.

Fig. 5: Comparison of building crack detection accuracy vs BPP of JPEG 2000, WebP, and MAGIC (proposed).

Fig. 6: Comparison of fire detection accuracy vs BPP of JPEG 2000, WebP, and MAGIC (proposed).

V. Results
MAGIC compression is designed to excel in autonomous task-specific IoT applications where the analysis of the images is done by machine learning models. To quantitatively analyze the effectiveness of MAGIC for IoT applications, we pick two use cases:
1) Forest fire surveillance [11].
2) Infrastructure analysis [12].
In the next few subsections, we describe the experimental setup and compare the accuracy of MAGIC compressed images to JPEG 2000 and WebP under different quality factor (QF) settings. We use ImageMagick's convert command for JPEG 2000 and WebP compression, which takes a quality factor from 1 to 100, with 1 resulting in the highest compression [32]. We explore the effect the MAGIC input parameters pw, d, and cb have on the rate of compression and accuracy. Finally, we introduce a computation/transmission energy cutoff for analyzing the energy efficiency of MAGIC.

A. Experimental Setup
The neural network architecture for the domain-specific ML models is shown in Fig. 9. We obtain separate model weights by training on each dataset and each set of knowledge acquisition parameters (controlling the level of compression) using Keras [33]. The input to the neural network is the flattened per-pixel local entropy features of the 64x64 image blocks. The entropy of a pixel is defined as the number of bits required to represent the local grey-scale distribution of a defined neighbourhood [34]. A higher entropy value is correlated with higher diversity and higher information density. We use a neighbourhood of 5 pixels to train our models. Fig. 7 shows a visual representation of the entropy feature of a sample image. The output of the neural network domain-specific point prediction model is used to compute the entry in the common pattern dictionary that is to be assigned to the input image block.

Fig. 7: Entropy feature of a sample forest fire dataset [11] image, visualized by scaling the values between 0 and 255. Brighter pixels have higher entropy.

TABLE II: Classification results for the fire dataset [11] and building crack dataset [12] at low and ultra-low BPP.

For both the building crack detection and forest fire detection tasks, we use a statically generated point spray pattern dictionary containing 4096 entries such that entry i has exactly i points sprayed randomly in a 64x64 block. Hence, using an entry with a high value of i is equivalent to capturing more information in the image block.

B. Evaluation up to Lowest Quality Factor

1) Infrastructure Analysis:
We construct two randomly sampled, disjoint sets of 2000 images for knowledge acquisition and evaluation, respectively. Each set contains 1000 images from the positive (with crack) class and 1000 images from the negative (no crack) class. For knowledge acquisition parameters (Algo. 1), we use block dimension (bDim) 64, number of iterations (iterLimit) 10, prune window size (pw) (4 and 8), grid dimension (grid) ceil((rows + cols)/20), threshold (th) 5, and cb. We compress the evaluation set images using MAGIC with compression parameters (Algo. 4): block dimension (bDim) 64, d (1 up to 12 in separate instances), grid dimension (grid) ceil((rows + cols)/
20), along with the domain-specific point prediction model (model) and the color quantization dictionary obtained from the knowledge acquisition stage. To compare with MAGIC, we compress the same images with JPEG 2000 and WebP from QF 1 to 10. We obtain a separate dataset for each JPEG 2000, WebP, and MAGIC setting. Fig. 10 shows sample images from the compressed datasets. For each dataset, we extract the features from the second fully connected (fc2) layer of pretrained VGG-16 [35] to train and test a support vector machine for the classification task using 30-fold cross-validation (20/80 test/train splits). As seen in Fig. 5, MAGIC is able to compress beyond JPEG 2000 QF=1 while maintaining nearly the same classification accuracy. The MAGIC images in the dataset compressed with d = 12 and pw = 8 are on average 22.09x smaller (1.06% accuracy loss) than the source dataset (ACC=98.97%, BPP=0.9479), 2.51x smaller (0.24% accuracy loss) than JPEG 2000 QF=1 (ACC=98.15%, BPP=0.1080), and 1.98x smaller (1.69% accuracy loss) than WebP QF=1 (ACC=99.60%, BPP=0.0851).
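The classification protocol above can be sketched as follows. The random matrix stands in for the 4096-dimensional VGG-16 fc2 features (extracting real features requires a deep learning framework and the image datasets, both omitted here); the split scheme mirrors the 30 repetitions of 20/80 test/train splits:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import ShuffleSplit, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4096))   # stand-in for VGG-16 fc2 features
y = rng.integers(0, 2, size=100)   # crack / no-crack labels

# 30 random 20/80 test/train splits, as in the evaluation above
cv = ShuffleSplit(n_splits=30, test_size=0.2, random_state=0)
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=cv)
print(f"mean accuracy: {scores.mean():.3f}")
```

In the actual pipeline, the same features and splits are reused across the source, JPEG 2000, WebP, and MAGIC datasets so that accuracy differences are attributable to the compression alone.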
2) Forest Surveillance:
From the forest fire dataset [11], we extract 643 images, of which 227 have fire and 416 have no fire. We ignore images that are not relevant to forests. We use 20 images from the dataset (10 from each class) to perform the knowledge acquisition procedure. As knowledge acquisition parameters (Algo. 1), we use block dimension (bDim) 64, number of iterations (iterLimit) 10, prune window size (pw) (5 and 8), grid dimension (grid) ceil((rows + cols)/20), threshold (th) 5, and cb
8. The domain-specific point prediction model is trained in the same manner as for the infrastructure analysis task. We compress the remaining 623 images (excluding the knowledge acquisition learning set) using MAGIC with compression parameters (Algo. 4): block dimension (bDim) 64, d (1 through 12), grid dimension (grid) ceil((rows + cols)/20), along with the domain-specific point prediction model (model) and the color quantization dictionary obtained from the knowledge acquisition stage. Again, we obtain a separate dataset for each JPEG 2000 (QF 1 to 10), WebP (QF 1 to 10), and MAGIC setting. Fig. 8 shows sample images from the fire dataset for JPEG 2000, WebP, and MAGIC.

Fig. 8: Comparison of source, WebP, JPEG 2000, and MAGIC compressed fire detection images.

Fig. 9: Lightweight neural network architecture used for spray point prediction.

We extract the features for each dataset as for the building crack dataset and carry out classification using a support vector machine with 30-fold cross-validation (20/80 test/train splits). As seen in Fig. 6, we observe the same trend as for the previous dataset. The MAGIC images compressed with d = 8 and pw = 8 are on average 42.65x smaller (2.99% accuracy loss) than the source dataset (ACC=97.17%, BPP=1.864), 2.32x smaller (1.20% accuracy loss) than JPEG 2000 QF=1 (ACC=95.38%, BPP=0.1014), and 5.85x smaller (3.18% accuracy loss) than WebP QF=1 (ACC=97.36%, BPP=0.2559).

C. Evaluation beyond Lowest Quality Factor
WebP and JPEG 2000 are unable to compress beyond QF=1 without some level of pre-processing. MAGIC, on the other hand, naturally achieves a very large compression range. In Table II, we evaluate MAGIC at extreme levels of compression using smaller cb bit sizes. We can compress well beyond the source at ∼13% accuracy loss for the fire dataset, and up to ∼69x more than the source for the crack dataset.

D. MAGIC Time & Energy Analysis
MAGIC, in its current implementation, takes longer to compress images than JPEG 2000 and WebP. However, as shown above, MAGIC achieves a higher compression rate while still performing well on coarse-grained machine vision classification tasks.

Fig. 10: Comparison of source, WebP, JPEG 2000, and MAGIC compressed building crack images.

To explore the potential energy savings of MAGIC compression, we introduce a threshold, C/T Cutoff (inspired by Sadler et al. [18]), for determining the computation-to-transmission energy consumption ratio beyond which MAGIC becomes beneficial for overall energy consumption in a given resource-constrained computing system. The C/T Cutoff for MAGIC compression (for a specific set of parameters) can be computed using Equation 1, where E_M is the average MAGIC encoding time, E_C is the average encoding time of the competitor method (JPEG 2000, WebP), I_M is the average image size of MAGIC, I_C is the average image size of the competitor method, and f is the CPU clock frequency. The setup time during encoding is due to loading the libraries and initializing the Python environment. In an amortized analysis for a batch operation, the setup time can be considered negligible. For MAGIC compression (for a specific set of parameters) to save energy when compared to other compression standards, the operating device must have a C/T value greater than MAGIC's C/T Cutoff. Tables III and IV show the C/T Cutoffs for different MAGIC compression settings for the building crack detection and forest fire detection datasets, respectively. We use f = 3.7 GHz for computing the C/T cutoff values. Any device with a C/T value greater than the cutoff will benefit (in terms of operational power consumption) from using MAGIC with respect to the method being compared against (JPEG 2000, WebP).
For example, in Table IV, with MAGIC (pw=8, cb=2, d=1) the JPEG 2000 (JP2K) C/T cutoff is 0.497, which means the energy for 1 byte of transmission must be greater than the execution energy of 0.497 million clock cycles (CC) in a system for MAGIC to have higher energy savings than JPEG 2000.

C/T Cutoff = |(E_M − E_C) / (I_M − I_C)| × f    (1)

TABLE III: C/T Cutoff of different MAGIC settings for the building crack detection dataset [12]. Columns: MAGIC with compression parameters (pw=8, cb=2, d=1), (pw=8, cb=8, d=2), (pw=4, cb=8, d=10), (pw=4, cb=8, d=2); Source; JP2K QF-1; WebP QF-1.
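Equation 1 can be encoded directly. The argument names below follow the definitions in the text; the M/C subscripts for MAGIC vs. competitor are our notation (the extraction dropped the original subscripts), and the example numbers are hypothetical:

```python
def ct_cutoff(e_magic, e_comp, size_magic, size_comp, f_hz):
    """C/T Cutoff in clock cycles per byte (Eqn. 1): the extra encoding
    time of MAGIC, converted to cycles, divided by the bytes it saves
    per image relative to the competitor method."""
    return abs((e_magic - e_comp) / (size_magic - size_comp)) * f_hz

# hypothetical numbers: MAGIC encodes 0.4 s slower but emits images
# 400 bytes smaller, on a 1 MHz device
cutoff = ct_cutoff(0.5, 0.1, 100, 500, f_hz=1e6)
print(cutoff)  # cycles-per-byte transmission cost needed to break even
```

A device whose per-byte transmission energy exceeds the energy of this many clock cycles would come out ahead using MAGIC at these settings.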
[Table III reports Avg BPP, Avg Size (Byte), Encode + Setup Time (sec), Encode Time (sec), Decode Time (sec), Detection Accuracy (%), and the C/T Cutoff vs. Source, JP2K-QF1, and WebP-QF1 (×10⁶ CC/Byte) for each setting.]
VI. Discussion
In this section, we investigate the properties of the current embodiment of MAGIC. We note that the MAGIC framework can be improved along many dimensions; we present preliminary studies of these possibilities and explore future extensions of MAGIC.
A. Variation in Compression Ability
To maintain consistent overall system performance, the image compression technique in use should produce compressed sizes with low variability. We compress the images using JPEG 2000, WebP, and MAGIC to generate box plots showing the variation of BPP for the sampled distributions in Fig. 12 and Fig. 11. We observe that MAGIC provides lower variation in BPP than JPEG 2000 and WebP. Due to the different parameters in the knowledge acquisition and encoding phases, specifically pw, MAGIC has fine control over the compressed image size. Hence, MAGIC can provide steady performance even in biased scenarios where other techniques may not compress well.

B. Improving Prediction Accuracy
Post-processing the MAGIC images or using a more powerful pattern prediction model can improve prediction accuracy by about 1-2%. Images compressed using MAGIC contain triangulation artifacts. One way to remove these artifacts is to recursively subdivide the triangles and compute the approximate color of each sub-triangle from the colors of the triangle and its neighbours. Using this technique, we were able to increase classification accuracy; however, it adds computation at the decoder end. If the decoder system resides in the cloud, this step can be used to squeeze out extra performance.

As explained earlier, we use entropy features for training and running our neural network models, but we have noticed that VGG-16 fc2 features perform slightly better. Using a VGG-inspired large convolutional neural network for the domain-specific point prediction task also improves performance slightly. However, we intentionally use simple entropy features and a small neural network to boost speed and to reduce energy consumption and space requirements. In applications where time, space, and energy are not constrained, more complex feature extraction methods and larger neural network architectures can be used for domain-specific point prediction.
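The geometric half of the artifact-smoothing idea above (recursive triangle subdivision) can be sketched as follows; the neighbour-aware color blending is omitted, and the vertex coordinates are hypothetical:

```python
def subdivide(tri):
    """Split one triangle (three (x, y) vertices) into 4 sub-triangles
    by connecting the midpoints of its edges."""
    a, b, c = tri
    mid = lambda p, q: ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
    # three corner triangles plus the central midpoint triangle
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

tris = subdivide(((0, 0), (64, 0), (0, 64)))
print(len(tris))  # each sub-triangle would then get a blended color
```

Applying this recursively (and averaging each sub-triangle's color with its neighbours) softens the flat-shaded look of the decoded triangles at the cost of extra decoder computation.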
C. Time Complexity and Performance Improvements
Time complexity analysis of the encoder (Algo. 4) and decoder (Algo. 5) algorithms simplifies to O(N + M log M + TR). The major contributors in encoding are O(N) for tiling (line 2), O(M log M) for DT (line 14), and O(TR) for triangle color calculation (line 17; the pixels associated with a triangle are determined by searching in a rectangle circumscribing the triangle), where N is the number of pixels in the image, M is the number of points sprayed, T is the number of triangles, and R is the dimension of the bounding rectangle of the biggest triangle. For decoding, the contributors are O(N) for predicted point absolute position computation (lines 6-8), O(M log M) for DT (line 12), and O(TR) for triangle color assignment/drawing (line 17). In both algorithms, we expect the O(M log M) DT step to consume the most time.

Time complexity analysis of the knowledge acquisition algorithm (Algo. 1) simplifies to O(KN log N + KIM log M + KITR + SVC + PQ). The major contributors are O(KN log N) for Canny edge detection across all K images (line 7), O(KIM log M + KITR) for the split operation across all K images (line 11), O(SVC) for color dictionary computation using the k-means algorithm (line 29), and O(PQ) for training the point prediction model (line 30). N, M, T, and R hold the same meaning as before; additionally, K is the number of images in imgList, I is the iterLimit, S is the iteration limit for the k-means algorithm, V is the number of points in the colorFreq map, C is the number of centroids specified for k-means, P is the number of training samples in trainX and trainY, and Q is the number of training epochs for the point prediction model.

TABLE IV: C/T Cutoff of different MAGIC settings for the forest fire detection dataset [11]. Columns: MAGIC with compression parameters (pw=8, cb=8, d=12), (pw=8, cb=8, d=4), (pw=8, cb=8, d=3), (pw=8, cb=8, d=1); Source; JP2K QF-1; WebP QF-1.
[Table IV reports Avg BPP, Avg Size (Byte), Encode + Setup Time (sec), Encode Time (sec), Decode Time (sec), Detection Accuracy (%), and the C/T Cutoff vs. Source, JP2K-QF1, and WebP-QF1 (×10⁶ CC/Byte) for each setting.]
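The Delaunay triangulation (DT) step that dominates the complexity analysis above can be exercised with SciPy; this is a stand-in, since the paper does not name its DT implementation, and the point count is arbitrary:

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)
pts = rng.random((500, 2)) * 64   # M sprayed points inside a 64x64 block
tri = Delaunay(pts)               # expected O(M log M) for 2D point sets

# T, the number of triangles, grows only linearly with M in the plane
print(len(tri.simplices), "triangles from", len(pts), "points")
```

Each row of `tri.simplices` indexes the three vertices of one triangle, which is the structure the encoder colors and the decoder redraws.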
Fig. 11: Comparison of BPP variance for MAGIC, JPEG 2000, and WebP for the building crack dataset.

The runtime performance of both the encoder and decoder can be improved through parallelization, hardware implementation, and code tweaking. Many of the block operations, such as block feature extraction and point spray pattern prediction, can be easily parallelized. Hardware implementation can provide the most speedup and may help reduce energy consumption as well. Future work will focus on improving the time and energy performance of MAGIC using these means.
D. Manual Region-of-Interest Guided Compression
As previously described, ROI bias can be automatically captured by training the pattern prediction model with supervised images. Beyond learning the region-of-interest bias, MAGIC offers manual ROI-based compression. With this feature, users can specify additional regions of an image to be retained at higher quality. In Fig. 13, we see an example of ROI-guided image compression where the fire region is designated manually as a region of interest. Note that the image region with the fire retains much more information than the remaining regions.

Fig. 12: Comparison of BPP variance for MAGIC, JPEG 2000, and WebP for the fire dataset.
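One way the manual ROI could bias compression is by raising the spray-dictionary index of blocks inside the user-marked region (since entry i sprays i points, a larger index retains more detail). The block grid, baseline index, and boost value below are hypothetical, not the paper's Algo. 4 parameters:

```python
import numpy as np

# hypothetical 4x4 grid of 64x64 blocks; each cell holds the index of
# the spray-pattern dictionary entry assigned to that block (0..4095)
labels = np.full((4, 4), 128)    # assumed baseline detail level
roi = np.zeros((4, 4), dtype=bool)
roi[1:3, 1:3] = True             # user-marked fire region
labels[roi] = 4095               # near-maximum point count inside the ROI
print(labels)
```

Blocks outside the ROI keep the model-predicted (low) index, so the extra bits are spent only where the user asked for fidelity.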
E. Extension of MAGIC to Video Compression
A video can be thought of as a collection of images, so MAGIC can be extended to process videos as well. Depending on the sampling rate of the image sensor, adjacent video frames often have very little content difference. Taking this into consideration, we can save more in terms of space, computation, and transmission. The two main components of a MAGIC-encoded image are labelsArr and colorDict. As shown in Eqn. 2, we can represent frame[N] by reusing the labelsArr of frame[N − 1], where OP is the set of obsolete point spray patterns which are no longer present in the new frame and NP is the set of new point spray patterns which are introduced in the new frame. Similarly, as shown in Eqn. 3, colorDict[N − 1] can be modified by removing the obsolete triangle colors (OC) and introducing the colors of the new triangles (NC) in frame N. Future work will investigate and formalize the MAGIC flow applied to video.

labelsArr[N] = labelsArr[N − 1] − OP + NP    (2)
colorDict[N] = colorDict[N − 1] − OC + NC    (3)

Fig. 13: Region-of-Interest Guided MAGIC Compression on a sample Fire Detection Dataset image [11].
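The set-style updates of Eqns. 2 and 3 can be sketched with plain dictionaries; the block keys and entry values here are hypothetical stand-ins for labelsArr contents:

```python
# labelsArr for consecutive frames: block position -> dictionary entry
prev = {(0, 0): 17, (0, 1): 256, (1, 0): 1024}   # frame N-1
curr = {(0, 0): 17, (0, 1): 300, (1, 0): 1024}   # frame N

OP = {k: v for k, v in prev.items() if curr.get(k) != v}  # obsolete patterns
NP = {k: v for k, v in curr.items() if prev.get(k) != v}  # new patterns

# frame N is reconstructed from frame N-1 plus the small (OP, NP) delta,
# so only the delta needs to be transmitted
rebuilt = {k: v for k, v in prev.items() if k not in OP}
rebuilt.update(NP)
assert rebuilt == curr
```

The colorDict update of Eqn. 3 follows the same pattern with obsolete (OC) and new (NC) triangle colors.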
VII. Conclusion
The increasing use of intelligent edge devices in diverse IoT applications, specifically for a multitude of computer vision tasks, calls for innovative image compression techniques that meet the unique requirements of these applications. These applications primarily require a high compression ratio while maintaining machine vision accuracy at an acceptable level. The MAGIC framework presented in this paper addresses this need. We have shown that effective use of domain knowledge learned with ML can provide high compression in resource-constrained edge applications while preserving the features needed for machine vision. The proposed framework is flexible for application in diverse domains and scalable to large image sizes. Our experiments for coarse-grained ML tasks using two datasets highlight the effectiveness of MAGIC. We achieve up to 42.65x higher compression than the source (beyond JPEG 2000 and WebP) while achieving similar accuracy. We compute the computation/transmission energy cutoffs to demonstrate at what level MAGIC compressed images are more energy efficient than standard techniques. Further, we show low compression variance compared to standard image compression techniques. With the use of a common pattern dictionary, the proposed ML-based compression procedure can be easily extended for recognizing coarse-grain patterns in edge devices. Moreover, it can potentially be extended to video compression, where domain knowledge is expected to play an even stronger role. Future work will investigate these extensions as well as further improvement in the performance of MAGIC in a variety of edge applications.
References

[1] A. Al-Kaff, D. Martin, F. Garcia, A. de la Escalera, and J. M. Armingol, "Survey of computer vision algorithms and applications for unmanned aerial vehicles," Expert Systems with Applications, 2018.
[2] Á. Restás, "Thematic division and tactical analysis of the UAS application supporting forest fire management," in Viegas, X.D., Ed., Advances in Forest Fire Research, 2014.
[3] C. Koch, K. Georgieva, V. Kasireddy, B. Akinci, and P. Fieguth, "A review on computer vision based defect detection and condition assessment of concrete and asphalt civil infrastructure," Advanced Engineering Informatics.
[4]–[6] …, Future Generation Computer Systems, …
[7] …, IEEE Wireless Communications, vol. 26, no. 3, pp. 132–139, 2019.
[8] W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard. Springer Science & Business Media, 1992.
[9] M. Rabbani, "JPEG2000: Image compression fundamentals, standards and practice," Journal of Electronic Imaging, 2002.
[10] Google, "WebP," https://developers.google.com/speed/webp, accessed: 02-28-2020.
[11] Dataturks, "Forest fire," https://dataturks.com/projects/qdyzl1013/Forest%20Fire, accessed: 02-20-2020.
[12] Ç. F. Özgenel and A. G. Sorguç, "Performance comparison of pretrained convolutional neural networks on crack detection in buildings," in ISARC: Proceedings of the International Symposium on Automation and Robotics in Construction, 2018.
[13] I. Aydin and N. A. Othman, "A new IoT combined face detection of people by using computer vision for security application," IEEE, 2017, pp. 1–6.
[14] C. Yuan, Z. Liu, and Y. Zhang, "Aerial images-based forest fire detection for firefighting using optical remote sensing techniques and unmanned aerial vehicles," Journal of Intelligent & Robotic Systems, vol. 88, no. 2-4, pp. 635–654, 2017.
[15] S. Li, H. Tang, S. He, Y. Shu, T. Mao, J. Li, and Z. Xu, "Unsupervised detection of earthquake-triggered roof-holes from UAV images using joint color and shape features," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 9, pp. 1823–1827, 2015.
[16] D. Duarte, F. Nex, N. Kerle, and G. Vosselman, "Towards a more efficient detection of earthquake induced facade damages using oblique UAV imagery," The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 42, p. 93, 2017.
[17] B. Arshad, R. Ogie, J. Barthelemy, B. Pradhan, N. Verstaevel, and P. Perez, "Computer vision and IoT-based sensors in flood monitoring and mapping: A systematic review," Sensors, vol. 19, no. 22, p. 5012, 2019.
[18] C. M. Sadler and M. Martonosi, "Data compression algorithms for energy-constrained devices in delay tolerant networks," in Proceedings of the 4th International Conference on Embedded Networked Sensor Systems, 2006, pp. 265–278.
[19] Z. Liu, T. Liu, W. Wen, L. Jiang, J. Xu, Y. Wang, and G. Quan, "DeepN-JPEG: A deep neural network favorable JPEG-based image compression framework," in Proceedings of the 55th Annual Design Automation Conference, 2018.
[20] M. Weber, C. Renggli, H. Grabner, and C. Zhang, "Lossy image compression with recurrent neural networks: From human perceived visual quality to classification accuracy," arXiv preprint arXiv:1910.03472, 2019.
[21] D. Marwood, P. Massimino, M. Covell, and S. Baluja, "Representing images in 200 bytes: Compression via triangulation," in IEEE International Conference on Image Processing, 2018.
[22] R. J. Clarke, Digital Compression of Still Images and Video. Academic Press, Inc., 1995.
[23] G. G. Langdon, "An introduction to arithmetic coding," IBM Journal of Research and Development, vol. 28, no. 2, pp. 135–149, 1984.
[24] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, 1952.
[25] A. J. Hussain, A. Al-Fayadh, and N. Radi, "Image compression techniques: A survey in lossless and lossy algorithms," Neurocomputing, vol. 300, pp. 44–69, 2018.
[26] O. Rippel and L. Bourdev, "Real-time adaptive image compression," in Proceedings of the 34th International Conference on Machine Learning, vol. 70. JMLR.org, 2017.
[27] G. Toderici, D. Vincent, N. Johnston, S. Jin Hwang, D. Minnen, J. Shor, and M. Covell, "Full resolution image compression with recurrent neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[28] M. Li, W. Zuo, S. Gu, D. Zhao, and D. Zhang, "Learning convolutional networks for content-weighted image compression," in The IEEE Conference on Computer Vision and Pattern Recognition, June 2018.
[29] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, "Variational image compression with a scale hyperprior," arXiv preprint arXiv:1802.01436, 2018.
[30] Z. Liu, X. Xu, T. Liu, Q. Liu, Y. Wang, Y. Shi, W. Wen, M. Huang, H. Yuan, and J. Zhuang, "Machine vision guided 3D medical image compression for efficient transmission and accurate segmentation in the clouds," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[31] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[32] The ImageMagick Development Team, "ImageMagick." [Online]. Available: https://imagemagick.org
[33] F. Chollet et al., "Keras," https://keras.io, 2015.
[34] S. van der Walt, J. L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J. D. Warner, N. Yager, E. Gouillart, T. Yu, and the scikit-image contributors, "scikit-image: Image processing in Python," PeerJ, vol. 2, p. e453, 2014. [Online]. Available: https://doi.org/10.7717/peerj.453
[35] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.