Modern CNNs for IoT Based Farms

Abstract

The recent introduction of ICT in agriculture has brought a number of changes in the way farming is done. This means the use of the Internet of Things (IoT), Cloud Computing (CC), Big Data (BD) and automation to gain better control over the process of farming. As the use of these technologies in farms has grown exponentially, with massive data production, there is a need to develop and use state-of-the-art tools in order to gain more insight from the data within reasonable time. In this paper, we present an initial understanding of the Convolutional Neural Network (CNN), the recent architectures of state-of-the-art CNNs and their underlying complexities. Then we propose a classification taxonomy tailored for agricultural applications of CNNs. Finally, we present a comprehensive review of research dedicated to applications of state-of-the-art CNNs in agricultural production systems. Our contribution is two-fold. First, for end users of agricultural deep learning tools, our benchmarking findings can serve as a guide to selecting an appropriate architecture. Second, for agricultural software developers of deep learning tools, our in-depth analysis explains the complexities of state-of-the-art CNNs and points out possible future directions to further optimize running performance.


Patrick Kinyua Gikunda and Nicolas Jouandeau

Department of Information Technology, Dedan Kimathi University of Technology, Kenya
Le Laboratoire d'Informatique Avancée de Saint-Denis (LIASD), University of Paris 8, France


Keywords: Convolutional Neural Network, Farming, Internet of Things

arXiv preprint [cs.CY], July.

The global population is set to touch the 9.6 billion mark by the year 2050 [4]. The continuous population growth means an increase in demand for food to feed the population [5]. Agriculture is the practice of cultivating land and breeding animals and plants to provide food and other products in order to sustain and enhance life [6]. Due to extreme weather conditions, rising climate change and the environmental impact of intensive farming practices [8], farmers are now forced to change their farming practices. To cope with the new farming challenges, farmers are forced to practice smart farming [9], which offers solutions for farm management and environment management for better production. Smart farming focuses on the use of information and communication technology (ICT) in the cyber-physical farm management cycle for efficient farming [10].

Current ICT technologies relevant for use in smart farming include IoT [11], remote sensing [12], CC [13] and BD [14]. Remote sensing is the science of gathering information about objects or areas from a distance, without physical contact with the objects or areas being investigated. Data collected through remote sensing and distributed devices is managed by cloud computing technology, which offers the tools for pre-processing and modelling huge amounts of data coming from various heterogeneous sources [15]. Together, these four technologies could create applications that provide solutions to today's agricultural challenges. The solutions include the real-time analytics required to carry out agile actions, especially in case of suddenly changed operational or environmental conditions (e.g. a weather or disease alert). The continuous monitoring, measuring, storing and analysing of various physical aspects has led to the phenomenon of big data [16].
To get insight for practical action from this large volume of data requires tools and methods that can process multidimensional data from different sources while keeping the processing time reasonable.

One of the successful data processing tools applied to this kind of large dataset is the biologically inspired Convolutional Neural Network (CNN), which has achieved state-of-the-art results [17] in computer vision [18] and data mining [19]. As deep learning has been successfully applied in various domains, it has recently entered the domain of agriculture [10]. CNN is a subset of Deep Learning (DL) methods [20], defined as a deep, feed-forward Artificial Neural Network (ANN) [21]. The CNN convolutions allow data to be represented in a hierarchical way [22]. The common characteristic of CNN models is that they follow the same general design principles: successively applying convolutional layers to the input, and periodically downsampling the spatial dimensions while increasing the number of feature maps. These architectures serve as rich feature extractors which can be used for image classification, object detection, image segmentation and many other advanced tasks. This study investigates the agricultural problems that employ the major state-of-the-art CNN architectures that have participated in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [23] with the highest accuracy in a multi-class classification problem. The ImageNet [26] classification challenge has played a critical role in advancing the CNN state of the art [17]. The motivations for carrying out the study include: a) CNNs have better precision than other popular image-processing techniques in the large majority of problems [27]; b) CNNs have entered the agricultural domain with promising potential [28]; c) all the CNN models that have achieved the best top-5 error are successful when applied in other computer vision domains, with remarkable results [27].
This review aims to provide insight into the use of state-of-the-art CNN models in relation to smart farming and to identify smart farming research and development challenges related to computer vision. The analysis will therefore primarily focus on the success of state-of-the-art CNN models in smart farms, with the intention of providing this relevant information to future researchers. From that perspective, the research questions to be addressed in the study are:
(a) What is the role of CNN in smart farming?
(b) What type of state-of-the-art CNN architecture should be used?
(c) What are the benefits of using state-of-the-art CNN in IoT based agricultural systems?

The rest of the paper is organized as follows: in Section 2, we present an overview of existing state-of-the-art CNN architectures, including their recent updates. Then we propose a taxonomy to provide a systematic classification of agricultural issues for CNN application. In Section 3, we present the existing state-of-the-art CNNs and their scope of application in agri-systems. We conclude the paper in Section 4.

In order to address the research questions, a bibliographic analysis of the domain under study was carried out, covering the period between 2012 and 2018. It involved two steps: a) collection of related works and b) detailed review and analysis of the works. The period was chosen because CNN is a rather recent phenomenon. In the first step, a keyword-based search was run using all combinations of two groups of keywords, of which the first group addresses CNN models (LeNet, AlexNet, NIN, ENet, GoogLeNet, ResNet, DenseNet, VGG, Inception) and the second group refers to farming (i.e. agriculture, farming, smart farming). The analysis was done while considering the following points for each work: a) the smart farm problem addressed, b) the dataset used, c) the accuracy based on the author's performance metric, d) the state-of-the-art CNN model used. It is important to note that the use of state-of-the-art deep learning has great potential, and there have been recent small comparative studies to analyse and compare the most efficient architecture to use in agricultural systems. They include a comparison between LeNet, AlexNet and VGGNet on automatic identification of center pivot irrigation [29] and a comparison between VGG-16, Inception-V4, ResNet and DenseNet for plant disease identification [30].
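The keyword-based search described above pairs every CNN model name with every farming term. A minimal sketch of how such query strings could be generated is given below; the two keyword groups come from the text, while the function name `build_queries` and the quoted query format are assumptions made only for illustration and do not describe the tool actually used in the study.

```python
from itertools import product

# Keyword groups as listed in the text.
CNN_MODELS = ["LeNet", "AlexNet", "NIN", "ENet", "GoogLeNet",
              "ResNet", "DenseNet", "VGG", "Inception"]
FARMING_TERMS = ["agriculture", "farming", "smart farming"]


def build_queries(models, terms):
    """Return one quoted search string per (model, farming term) pair.

    The query format is hypothetical; a real search engine may need
    different quoting or boolean operators.
    """
    return [f'"{model}" "{term}"' for model, term in product(models, terms)]


queries = build_queries(CNN_MODELS, FARMING_TERMS)
print(len(queries))  # 9 models x 3 farming terms
print(queries[0])
```

With 9 model names and 3 farming terms, the search covers 27 keyword pairs.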

CNNs typically perform best when they are large, meaning when they have deeper and more highly interconnected layers [31]. The primary drawback of these architectures is the computational cost: large CNNs are typically impractically slow, especially on embedded IoT devices [32]. There are recent research efforts on how to reduce the computation cost of deep learning networks for everyday application while maintaining the prediction accuracy [33]. In order to understand the application of the state-of-the-art CNN architectures in agricultural systems, we reviewed the accuracy and computational requirements from relevant literature, including recent updates of networks, as shown in Fig. 1. The classical state-of-the-art deep network architectures include LeNet [34], AlexNet [35], NIN [36], ENet [37], ZFNet [38], GoogLeNet [39] and VGG-16 [40]. Modern architectures include Inception [41], ResNet [42] and DenseNet [43].

Fig. 1. Top-1 accuracy vs. the computational cost. The size of the circles is proportional to the number of parameters. Legend: the grey circles at the bottom right represent the number of parameters in millions [32].

LeNet-5 is a pioneering 7-layer convolutional network by LeCun et al. [34], used to recognise hand-written digits in 32x32 pixel greyscale input images. High-resolution images require more convolutional layers, so the model is constrained by the availability of computing resources.

AlexNet is similar to LeNet-5 but deeper, with five convolutional layers and more filters per layer [35]. It outperformed LeNet-5 and won the ILSVRC challenge by reducing the top-5 error from 26.2% to 15.3%. It uses the Rectified Linear Unit (ReLU) [1] instead of the hyperbolic tangent (tanh) [2] to add non-linearity, which accelerates training by about 6 times. Dropout was employed to reduce over-fitting in the fully-connected layers. Overlapping pooling was used to reduce the size of the network while reducing the top-1 error by 0.4% and the top-5 error by 0.3%.

Lin et al. [36] created Network in Network (NIN), which inspired the Inception architecture of GoogLeNet. In their paper, they replaced the linear filters with nonlinear multilayer perceptrons, which gave better feature extraction and accuracy. They also replaced the fully connected layers with activation maps and global average pooling. This move helped reduce the number of parameters and the network complexity.

Paszke et al. [37] introduced the Efficient Neural Network (ENet) for running on low-power mobile devices while achieving state-of-the-art results. The ENet architecture is largely based on ResNets. The structure has one master branch and several branches that separate from the master and later rejoin it. Each branch consists of three convolutional layers. The first 1x1 projection reduces the dimensionality, while the latter 1x1 projection expands it. In between these convolutions, a regular, asymmetric, dilated or full convolution takes place. Batch normalization [60] and PReLU [61] are placed between all convolutions. Spatial Dropout is used as a regularizer in the bottleneck. Max pooling on the master branch is added only when the bottleneck is downsampling.

Zeiler and Fergus [38] created ZFNet, which won the ILSVRC 2013 [25] image classification task. It achieved a top-5 error rate of 14.8%, an improvement over AlexNet. They did this by tweaking the hyper-parameters of AlexNet while maintaining the same structure, with additional deep learning elements. We found no record of ZFNet being used in agricultural systems, despite the accuracy improvement.

GoogLeNet, the 2014 ILSVRC image classification winner, was inspired by LeNet but implemented a novel Inception module. The Inception cell performs a series of convolutions at different scales and subsequently aggregates the results. This module is based on several very small convolutions in order to drastically reduce the number of parameters. Tremendous effort has gone into improving the performance of the architecture:
a) Inception v1 [39] performs convolution on an input with 3 different filter sizes (1x1, 3x3, 5x5). Additionally, max pooling is performed. The outputs are concatenated and sent to the next Inception module.
b) Inception v2 and Inception v3 [41] factorize the 5x5 convolution into two 3x3 convolution operations to improve computational speed. Although this may seem counterintuitive, a 5x5 convolution is 25/9 = 2.78 times more expensive than a 3x3 convolution, so stacking two 3x3 convolutions (2 x 9 = 18 weights per filter position instead of 25) in fact leads to a boost in performance.
c) In Inception v4 and Inception-ResNet [44], the initial set of operations was modified before introducing the Inception blocks.

Simonyan and Zisserman [40] created VGGNet while investigating the effect of convolutional network depth on accuracy in the large-scale image recognition setting. VGGNet took second place after GoogLeNet in the competition. The model is made up of 16 weight layers (13 convolutional and 3 fully connected), similar to [35] but with many more filters.
There have been a number of updates to the VGGNet architecture, starting with the pioneer VGG-11 (11 layers), which obtained a 10.4% error rate [40]. VGG-13 (13 layers) obtained a 9.9% error rate, which means the additional convolutional layers help the classification accuracy. A 16-layer variant with three additional 1x1 convolutional layers obtained a 9.4% error rate, which means the additional 1x1 convolutional layers also help the classification accuracy. A 1x1 convolution increases the non-linearity of the decision function without changing the dimensions of input and output; it performs a projection mapping in the same high dimensionality. This approach is used in NIN [36], GoogLeNet [39] and ResNet [42]. The updated VGG-16, with 3x3 convolutions in place of the 1x1 layers, obtained an 8.8% error rate, which means the network was still improving as layers were added. VGG-19 (19 layers) was developed to further improve the performance, but it obtained 9.0%, showing no improvement even after adding more layers.

When deeper networks start converging, a degradation problem is exposed: with increasing network depth, accuracy gets saturated and then degrades rapidly. The Deep Residual Neural Network (ResNet) created by Kaiming He et al. [42] introduced a novel architecture that inserts shortcut connections, which turn the network into its residual counterpart. This was a breakthrough which enabled the development of much deeper networks. The residual function is a refinement step in which the network learns how to adjust the input feature map for higher quality features. Following this intuition, the residual block was refined into a pre-activation variant [45], in which the gradients can flow through the shortcut connections to any earlier layer unimpeded. Each ResNet block is either 2 layers deep (used in small networks like ResNet-18 and ResNet-34) or 3 layers deep (ResNet-50, 101, 152). This technique is able to train a network with 152 layers while still having lower complexity than VGGNet. It achieves a top-5 error rate of 3.57%, which beats human-level performance on this dataset. Although the original ResNet paper focused on creating a network architecture that enables deeper structures by alleviating the degradation problem, other researchers have since pointed out that increasing the network's width (channel depth) can be a more efficient way of expanding the overall capacity of the network.

DenseNet, a logical extension of ResNet, improves efficiency by concatenating each layer's feature maps to every successive layer within a dense block [43]. This allows later layers within the network to directly leverage the features from earlier layers, encouraging feature reuse within the network. For each layer, the feature maps of all preceding layers are used as inputs, and its own feature maps are used as inputs into all subsequent layers. This helps alleviate the vanishing-gradient problem, encourages feature reuse and reduces the number of parameters.
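The cost arguments in this section (a 5x5 convolution against two stacked 3x3 convolutions, and the 1x1 reduce / 1x1 expand bottleneck used in ENet and in the 3-layer ResNet blocks) can be checked with simple weight counts. The sketch below is only an illustration: the channel counts 64 and 256 are hypothetical values chosen for the example, biases are ignored, and the count of weights is used as a proxy for compute under the assumption that the feature map size stays fixed.

```python
def conv_weights(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


# Hypothetical channel counts, chosen only for illustration.
C = 64
WIDE = 256

# Inception v2/v3 style factorization: one 5x5 versus two stacked 3x3.
one_5x5 = conv_weights(5, C, C)
two_3x3 = 2 * conv_weights(3, C, C)
ratio_5x5_to_3x3 = conv_weights(5, C, C) / conv_weights(3, C, C)

# Bottleneck: 1x1 reduce, 3x3, 1x1 expand, versus two plain 3x3 layers
# applied directly to the wide feature map.
bottleneck = (conv_weights(1, WIDE, C)
              + conv_weights(3, C, C)
              + conv_weights(1, C, WIDE))
plain_pair = 2 * conv_weights(3, WIDE, WIDE)

print(f"5x5 / 3x3 cost ratio: {ratio_5x5_to_3x3:.2f}")
print(f"one 5x5: {one_5x5} weights, two 3x3: {two_3x3} weights")
print(f"bottleneck: {bottleneck} weights, two plain 3x3: {plain_pair} weights")
```

Under these assumptions, the two stacked 3x3 layers need 72% of the weights of a single 5x5 layer, and the bottleneck needs roughly 17 times fewer weights than two plain 3x3 layers at 256 channels.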

Many agricultural CNN solutions have been developed depending on specific agricultural issues. For the purpose of this study, a classification taxonomy tailored to CNN application in smart farming was developed, as shown in Fig. 2. In this section, we categorize the use of state-of-the-art CNNs based on the agricultural issue they solve:
(a) Plant management includes solutions geared towards crop welfare and production. This includes classification (species), detection (disease and pest) and prediction (yield production).
(b) Livestock management addresses solutions for livestock production (prediction and quality management) and animal welfare (animal identification, species detection, and disease and pest control).
(c) Environment management addresses solutions for land and water management.
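For readers who want to tag reviewed works against this taxonomy, the three categories above can be written down as a nested mapping. The sketch below is one possible encoding, not part of the original study: the variable name `TAXONOMY`, the helper `leaf_paths` and the intermediate key "management" under environment management are hypothetical labels introduced for illustration.

```python
# Nested encoding of the taxonomy listed above.
# Key names are illustrative; only the category contents come from the text.
TAXONOMY = {
    "plant management": {
        "classification": ["species"],
        "detection": ["disease", "pest"],
        "prediction": ["yield production"],
    },
    "livestock management": {
        "livestock production": ["prediction", "quality management"],
        "animal welfare": ["animal identification", "species detection",
                           "disease and pest control"],
    },
    "environment management": {
        "management": ["land", "water"],  # hypothetical grouping key
    },
}


def leaf_paths(tree, prefix=()):
    """Yield every root-to-leaf path in the nested taxonomy as a tuple."""
    for name, child in tree.items():
        if isinstance(child, dict):
            yield from leaf_paths(child, prefix + (name,))
        else:
            for leaf in child:
                yield prefix + (name, leaf)


paths = list(leaf_paths(TAXONOMY))
for path in paths:
    print(" > ".join(path))
```

Each printed path, such as "plant management > detection > disease", is a label that could be attached to a reviewed paper.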

Fig. 2. Proposed classification taxonomy for CNN use in smart farm

Table 1 shows the use of state-of-the-art CNNs in agriculture, in particular in the areas of plant and leaf disease detection, animal face identification, plant recognition, land cover classification, fruit counting and identification of weeds. Apart from the row number, it consists of 5 columns that show: the problem description, the size of the data used, the accuracy according to the metrics used, the state-of-the-art CNN used and the reference literature.

Amara et al. [54] use the LeNet architecture to classify banana leaf diseases. The model was able to effectively classify the leaves after several experiments. The approach was able to classify leaf images with different illumination, complex background, resolution, size, pose and orientation. We also reviewed the use in agricultural applications of the CaffeNet architecture [59], which is a 1-GPU version of AlexNet. The success of AlexNet at LSVRC 2012 [24] encouraged much of the computer vision community to explore further the application of deep learning in computer vision. Mohanty et al. [7] combined AlexNet and GoogLeNet to identify 14 crop species and 26 diseases (or absence thereof) from a dataset of 54,306 images. The approach records an impressive accuracy of 99.35%, demonstrating the feasibility of the state-of-the-art CNN architectures. Other areas where AlexNet has been used with a high accuracy record include identifying plants using different plant views [49], identifying plant species [48], identifying obstacles in the farm [50] and leaf disease detection [3]. Because of its improved utilization of computing resources, GoogLeNet has been used in fruit counting [53] and plant species classification [7]. VGGNet has been used in classifying weeds [56], detecting obstacles in the farm [51], fruit detection [47] and animal face recognition [55]. Like ResNet, DenseNet is a recent model, which explains why it has not been employed significantly in farming; nevertheless, it has been used for thistle identification in winter wheat and spring barley [52]. Since ResNet is such a recent model, it has only been used by one author, in fruit counting [53]. Many of the CNNs developed for agricultural use depend on the problem or challenge they solve.

Table 1. Use of state-of-the-art CNN in Smart Farm

| No. | Smart farm problem description | Data used | Accuracy | CNN framework used | Article |
|---|---|---|---|---|---|
| 1 | Fruit detection | Images of three fruit varieties: apples (726), almonds (385) and mangoes (1154) | F1 (precision score) of 0.904 (apples), 0.908 (mango), 0.775 (almonds) | VGGNet | [46] |
| 2 | Detection of sweet pepper and rock melon fruits | 122 images | 0.838 (F1) | VGGNet | [47] |
| 3 | Recognize different plant species | Data set of 44 classes | 99.60% (CA - correct prediction) | AlexNet | [48] |
| 4 | Recognize different plant | 91 759 images | 48.60% (LC - correct species classification) | AlexNet | [49] |
| 5 | Identify obstacles in row crops and grass mowing | 437 images | 99.9% in row crops and 90.8% in grass mowing (CA) | AlexNet | [50] |
| 6 | Identify crop species and diseases | 54 306 images | 0.9935 (F1) | AlexNet + GoogLeNet | [7] |
| 7 | Detect obstacles that are distant, heavily occluded and unknown | 48 images | 0.72 (F1) | AlexNet + VGG | [51] |
| 8 | Leaf disease detection | 4483 images | 96.30% (CA) | CaffeNet | [3] |
| 9 | Identify thistle in winter wheat and spring barley images | 4500 images | 97.00% (CA) | DenseNet | [52] |
| 10 | Predict number of tomatoes in images | 24 000 images | 91% (RFC - ratio of total fruits counted) on real images, 93% (RFC) on synthetic images | GoogLeNet + ResNet | [53] |
| 11 | Classify banana leaf diseases | 3700 images | 96% (CA), 0.968 (F1) | LeNet | [54] |
| 12 | Identify pig face | 1553 images | 96.7% (CA) | VGGNet | [55] |
| 13 | Classify weed from crop species based on 22 different species in total | 10413 images | 86.20% (CA) | VGGNet | [56] |
| 14 | Detecting and categorizing the criticalness of Fusarium wilt of radish based on thresholding a range of color features | 1500 images | | GoogLeNet | [57] |
| 15 | Fruit counting | 24 000 images | 91% accuracy | Inception-ResNet | [53] |
| 16 | Automatic plant disease diagnosis for early disease symptoms | 8178 images | Overall improvement of the balanced accuracy from 0.78 to 0.87 from previous 2017 study | Deep ResNet | [58] |
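Table 1 mixes two kinds of scores: classification accuracy (CA) and F1. To make the difference concrete, the sketch below computes both from binary confusion counts using the standard textbook definitions. The function name and the example counts are hypothetical, and the reviewed papers may compute their metrics differently (for example, averaged over several classes).

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and accuracy from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives. Zero denominators yield 0.0 instead of an error.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical imbalanced example: 12 diseased leaves among 100 images.
scores = classification_metrics(tp=8, fp=2, fn=4, tn=86)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example the accuracy is 0.94 while F1 is only about 0.73, which is why accuracy and F1 values reported by different authors in Table 1 should not be compared directly.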

Despite remarkable achievements in the use of state-of-the-art CNNs in agriculture in general, there exist grey areas in relation to the smart farm that future researchers may look at. These areas may include real-time image classification, interactive image classification and interactive object detection. State-of-the-art CNN is a relatively new technology, which explains why the number of studies found on its use in smart farms is relatively small. However, it is important to note that models built from state-of-the-art architectures have an impressive record of better precision.

In this paper, we aimed at establishing the potential of state-of-the-art CNNs in IoT based smart farms. In particular, we first discussed the architectures of state-of-the-art CNNs and their respective prediction accuracy at the ILSVRC challenge. Then a survey on the application of the identified CNNs in agriculture was performed, to examine the particular application in a smart farm, the technical details of the architecture employed and the overall prediction accuracy achieved according to the authors' precision metrics. The study shows continuous accuracy improvement of the state-of-the-art CNN architectures as the computer vision community puts effort into perfecting the methods. The findings indicate that state-of-the-art CNNs have achieved better precision in all the agricultural cases reviewed, scoring higher accuracy than other image-processing techniques in the majority of problems. Considering that state-of-the-art CNNs have achieved state-of-the-art results in prediction in general and high precision in the few farming cases observed, there is great potential in using these methods in smart farming. It has been observed that many authors apply more than one architecture in order to optimize the performance of the network without compromising the expected accuracy. This approach is very efficient in the observed cases, and we recommend a similar hybrid approach when building robust IoT based networks which are computationally fair to mobile devices. This study aims to motivate researchers to experiment with and apply the state-of-the-art methods to smart farm problems related to computer vision and data analysis in general.