Modern CNNs for IoT Based Farms
Patrick Kinyua Gikunda and Nicolas Jouandeau
Department of Information Technology, Dedan Kimathi University of Technology, Kenya, [email protected]
Le Laboratoire d'Informatique Avancée de Saint-Denis (LIASD), University of Paris 8, France, [email protected]
Abstract.
The recent introduction of ICT in agriculture has brought a number of changes in the way farming is done. This means the use of the Internet of Things (IoT), Cloud Computing (CC), Big Data (BD) and automation to gain better control over the process of farming. As the use of these technologies in farms has grown exponentially, with massive data production, there is a need to develop and use state-of-the-art tools in order to gain more insight from the data within a reasonable time. In this paper, we present an initial understanding of the Convolutional Neural Network (CNN), the recent architectures of state-of-the-art CNNs and their underlying complexities. Then we propose a classification taxonomy tailored for agricultural applications of CNNs. Finally, we present a comprehensive review of research dedicated to applications of state-of-the-art CNNs in agricultural production systems. Our contribution is twofold. First, for end users of agricultural deep learning tools, our benchmarking findings can serve as a guide to selecting an appropriate architecture. Second, for agricultural software developers of deep learning tools, our in-depth analysis explains the complexities of state-of-the-art CNNs and points out possible future directions to further optimize running performance.
Keywords:
Convolutional Neural Network, Farming, Internet of Things
The global population is set to reach 9.6 billion by the year 2050 [4]. Continuous population growth means an increase in the demand for food [5]. Agriculture is the practice of cultivating land and breeding animals and plants to provide food and other products in order to sustain and enhance life [6]. Due to extreme weather conditions, climate change and the environmental impact of intensive farming practices [8], farmers are now forced to change their farming practices. To cope with these new challenges, farmers are turning to smart farming [9], which offers solutions for farm management and environment management for better production. Smart farming focuses on the use of information and communication technology (ICT) in the cyber-physical farm management cycle for efficient farming [10].

Current ICT technologies relevant for use in smart farming include IoT [11], remote sensing [12], CC [13] and BD [14]. Remote sensing is the science of gathering information about objects or areas from a distance, without physical contact with the objects or areas being investigated. Data collected through remote sensing and distributed devices is managed by cloud computing technology, which offers tools for pre-processing and modelling huge amounts of data coming from various heterogeneous sources [15]. Together, these four technologies could create applications that provide solutions to today's agricultural challenges. The solutions include the real-time analytics required to carry out agile actions, especially in case of suddenly changed operational or environmental conditions (e.g. a weather or disease alert). The continuous monitoring, measuring, storing and analysing of various physical aspects has led to the phenomenon of big data [16].
Getting insight for practical action from data of this scale requires tools and methods that can process multidimensional data from different sources while keeping the processing time reasonable.

One successful data processing tool applied to this kind of large dataset is the biologically inspired Convolutional Neural Network (CNN), which has achieved state-of-the-art results [17] in computer vision [18] and data mining [19]. As deep learning has been successfully applied in various domains, it has recently entered the domain of agriculture [10]. The CNN is a Deep Learning (DL) method [20], defined as a deep, feed-forward Artificial Neural Network (ANN) [21]. CNN convolutions allow data to be represented in a hierarchical way [22]. CNN models share the same general design principles: they successively apply convolutional layers to the input, periodically downsampling the spatial dimensions while increasing the number of feature maps. These architectures serve as rich feature extractors which can be used for image classification, object detection, image segmentation and many other advanced tasks. This study investigates the agricultural problems that employ the major state-of-the-art CNN architectures that have participated in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [23] with the highest accuracy in a multi-class classification problem. The ImageNet [26] classification challenge has played a critical role in advancing the CNN state of the art [17]. The motivations for carrying out the study include: a) CNNs have better precision than other popular image-processing techniques in the large majority of problems [27]; b) CNNs have entered the agricultural domain with promising potential [28]; c) all the CNN models that have achieved the best top-5 error have been applied successfully in other computer vision domains, with remarkable results [27].
This review aims to provide insight into the use of state-of-the-art CNN models in relation to smart farming and to identify smart farming research and development challenges related to computer vision. The analysis therefore focuses primarily on the successful use of state-of-the-art CNN models in smart farms, with the intention of providing this information to future researchers. From that perspective, the research questions addressed in the study are:
(a) What is the role of CNN in smart farming?
(b) What type of state-of-the-art CNN architecture should be used?
(c) What are the benefits of using state-of-the-art CNN in IoT based agricultural systems?

The rest of the paper is organized as follows: in Section 2, we present an overview of existing state-of-the-art CNN architectures, including their recent updates. Then we propose a taxonomy to provide a systematic classification of agricultural issues for CNN application. In Section 3, we present the existing state-of-the-art CNNs and their scope of application in agri-systems. We conclude the paper in Section 4.
In order to address the research questions, a bibliographic analysis of the domain under study was carried out, covering work from 2012 to 2018. It involved two steps: a) collection of related works and b) detailed review and analysis of the works. The period was chosen because CNN is a rather recent phenomenon. In the first step, a keyword-based search was performed using all combinations of two groups of keywords, of which the first group addresses CNN models (LeNet, AlexNet, NIN, ENet, GoogLeNet, ResNet, DenseNet, VGG, Inception) and the second group refers to farming (i.e. agriculture, farming, smart farming). Each work was analysed with respect to: a) the smart farm problem addressed, b) the dataset used, c) the accuracy based on the author's performance metric, and d) the state-of-the-art CNN model used. It is important to note that the use of state-of-the-art deep learning has great potential, and there have been recent small comparative studies that analyse and compare the most efficient architecture to use in agricultural systems. They include a comparison between LeNet, AlexNet and VGGNet on automatic identification of center pivot irrigation [29] and a comparison between VGG-16, Inception-V4, ResNet and DenseNet for plant disease identification [30].
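The keyword-based search described above pairs every CNN model name with every farming term. A minimal sketch of how such query pairs could be enumerated is given below; the function name `build_queries` and the quoted `AND` query syntax are assumptions made for illustration, not the search tool or syntax used in the study.

```python
from itertools import product

# Keyword groups as listed in the text.
cnn_models = ["LeNet", "AlexNet", "NIN", "ENet", "GoogLeNet",
              "ResNet", "DenseNet", "VGG", "Inception"]
farming_terms = ["agriculture", "farming", "smart farming"]


def build_queries(models, terms):
    """Return one query string per (model, term) pair.

    The '"a" AND "b"' form is a hypothetical query syntax.
    """
    return [f'"{model}" AND "{term}"' for model, term in product(models, terms)]


queries = build_queries(cnn_models, farming_terms)
print(len(queries))   # 9 models x 3 terms = 27 combinations
print(queries[0])
```

Enumerating the pairs explicitly makes the search reproducible: each of the 27 combinations can be logged together with the number of works it returned.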
CNNs typically perform best when they are large, meaning when they have deeper and more highly interconnected layers [31]. The primary drawback of these architectures is the computational cost: large CNNs are typically impractically slow, especially for embedded IoT devices [32]. There are recent research efforts on how to reduce the computational cost of deep learning networks for everyday application while maintaining the prediction accuracy [33]. In order to understand the application of state-of-the-art CNN architectures in agricultural systems, we reviewed the accuracy and computational requirements reported in the relevant literature, including recent updates of the networks, as shown in Fig 1. The classical state-of-the-art deep network architectures include LeNet [34], AlexNet [35], NIN [36], ENet [37], ZFNet [38], GoogLeNet [39] and VGG 16 [40]. Modern architectures include Inception [41], ResNet [42] and DenseNet [43].
Fig. 1.
Top-1 accuracy vs. computational cost. The size of the circles is proportional to the number of parameters. Legend: the grey circles at the bottom right represent the number of parameters in millions [32].
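The top-1 and top-5 figures quoted in this section follow the usual convention: a prediction counts as correct under top-k if the true class is among the k classes with the highest scores. The sketch below illustrates that definition on hypothetical scores for a three-class toy problem, where k = 2 plays the role that k = 5 plays on ImageNet; the function name and data are assumptions for illustration only.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k best-scored classes."""
    misses = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Hypothetical class scores for four samples over three classes.
scores = [
    [0.1, 0.6, 0.3],  # true class 1 is ranked first
    [0.5, 0.2, 0.3],  # true class 2 is ranked second
    [0.2, 0.7, 0.1],  # true class 2 is ranked third
    [0.8, 0.1, 0.1],  # true class 0 is ranked first
]
labels = [1, 2, 2, 0]

top1 = top_k_error(scores, labels, k=1)
top2 = top_k_error(scores, labels, k=2)
print(top1, top2)
```

The top-k error can never exceed the top-1 error, which is why top-5 figures are always the lower of the two numbers reported for a given network.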
LeNet-5 is a pioneering 7-layer convolutional network by LeCun et al. [34] for classifying digits. It was used to recognise hand-written numbers digitized as 32x32 pixel greyscale input images. Higher-resolution images require more convolutional layers, so the model is constrained by the available computing resources.

AlexNet is similar to LeNet-5 but deeper, with five convolutional layers followed by three fully connected layers, and with more filters [35]. It outperformed LeNet-5 and won the ILSVRC challenge by reducing the top-5 error from 26.2% to 15.3%. It uses the Rectified Linear Unit (ReLU) [1] instead of the hyperbolic tangent (tanh) [2] to add non-linearity, which accelerates training by a factor of 6. Dropout was employed to reduce over-fitting in the fully connected layers. Overlapping pooling was used to reduce the size of the network while reducing the top-1 error by 0.4% and the top-5 error by 0.3%.

Lin et al. [36] created Network in Network (NIN), which inspired the Inception architecture of GoogLeNet. In their paper, they replaced the linear filters with nonlinear multilayer perceptrons, which gave better feature extraction and accuracy. They also replaced the fully connected layers with activation maps and global average pooling. This move helped reduce the number of parameters and the network complexity.

Paszke et al. [37] introduced the Efficient Neural Network (ENet) for running on low-power mobile devices while achieving state-of-the-art results. The ENet architecture is largely based on ResNets. The structure has one master branch and several branches that separate from the master and are merged back into it. Each branch consists of three convolutional layers. The first 1x1 projection reduces the dimensionality, while the last 1x1 projection expands it. In between these projections, a regular, asymmetric, dilated or full convolution takes place. Batch normalization [60] and PReLU [61] are placed between all convolutions. Spatial Dropout is used as the regularizer in the bottleneck. Max pooling is added on the master branch only when the bottleneck is downsampling.

Zeiler and Fergus [38] created ZFNet, which won the ILSVRC 2013 [25] image classification task. It achieved a top-5 error rate of 14.8%, an improvement over AlexNet. They did this by tweaking the hyper-parameters of AlexNet while maintaining the same structure, with additional deep learning elements. We found no record of the use of ZFNet in agricultural systems despite the accuracy improvement.

GoogLeNet, the 2014 ILSVRC image classification winner, was inspired by LeNet but implemented a novel Inception module. The Inception cell performs a series of convolutions at different scales and subsequently aggregates the results. The module is based on several very small convolutions in order to drastically reduce the number of parameters. Considerable effort has gone into improving the performance of the architecture:
a) Inception v1 [39] performs convolution on an input with 3 different filter sizes (1x1, 3x3, 5x5). Additionally, max pooling is performed. The outputs are concatenated and sent to the next Inception module.
b) Inception v2 and Inception v3 [41] factorize a 5x5 convolution into two 3x3 convolution operations to improve computational speed. Although this may seem counterintuitive, a 5x5 convolution is 2.78 times more expensive than a 3x3 convolution (25 versus 9 multiplications per output position), so stacking two 3x3 convolutions in fact leads to a boost in performance.
c) In Inception v4 and Inception-ResNet [44], the initial set of operations performed before the Inception blocks was modified.

Simonyan and Zisserman created VGGNet while investigating the effect of convolutional network depth on accuracy in the large-scale image recognition setting. VGGNet took second place after GoogLeNet in the competition. The model is made up of 16 weight layers (13 convolutional and 3 fully connected) and is similar to AlexNet [35] but with many more filters.
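The 2.78 figure quoted above for the Inception v2 and v3 factorization can be checked with a few lines of arithmetic. The sketch below counts multiplications per output position, assuming for simplicity equal input and output channel counts and ignoring biases; the helper name `conv_cost` is an illustrative assumption.

```python
def conv_cost(kernel_size, channels=1):
    """Multiplications per output position for a square convolution.

    Assumes (for illustration) the same number of input and output channels.
    """
    return kernel_size * kernel_size * channels * channels


single_5x5 = conv_cost(5)        # 25
single_3x3 = conv_cost(3)        # 9
stacked_3x3 = 2 * conv_cost(3)   # 18, covers the same 5x5 receptive field

ratio_5x5_to_3x3 = single_5x5 / single_3x3
saving = 1 - stacked_3x3 / single_5x5

print(f"5x5 vs one 3x3: {ratio_5x5_to_3x3:.2f}x")
print(f"two 3x3 vs one 5x5: {saving:.0%} fewer multiplications")
```

Under these assumptions the stacked pair needs 18 multiplications where the single 5x5 filter needs 25, and the channel count cancels out of both ratios.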
There have been a number of updates to the VGGNet architecture, starting with the pioneering VGG-11 (11 layers), which obtained a 10.4% error rate [40]. VGG-13 (13 layers) obtained a 9.9% error rate, which means the additional convolutional layers help the classification accuracy. A first VGG-16 variant (16 layers) obtained a 9.4% error rate, which means the three additional 1x1 convolutional layers help the classification accuracy. A 1x1 convolution increases the non-linearity of the decision function without changing the dimensions of input and output; it performs a projection mapping within the same high-dimensional space. This approach is used in NIN [36], GoogLeNet [39] and ResNet [42]. Replacing these 1x1 convolutions with 3x3 convolutions in the updated VGG-16 lowered the error rate to 8.8%, which means the network was still improving. VGG-19 (19 layers) was developed to further improve the performance, but it obtained 9.0%, showing no improvement even after adding more layers.

When deeper networks start converging, a degradation problem is exposed: with increasing network depth, accuracy gets saturated and then degrades rapidly. The Deep Residual Neural Network (ResNet) created by Kaiming He et al. [42] introduced a novel architecture that inserts shortcut connections, which turn the network into its residual counterpart. This was a breakthrough which enabled the development of much deeper networks. The residual function is a refinement step in which the network learns how to adjust the input feature map for higher quality features. Following this intuition, the residual block was refined into a pre-activation variant [45], in which the gradients can flow unimpeded through the shortcut connections to any earlier layer. Each ResNet block is either 2 layers deep (used in small networks like ResNet 18 and 34) or 3 layers deep (ResNet 50, 101 and 152). This technique makes it possible to train a network with 152 layers while still having lower complexity than VGGNet. It achieves a top-5 error rate of 3.57%, which beats human-level performance on this dataset. Although the original ResNet paper focused on creating a network architecture that enables deeper structures by alleviating the degradation problem, other researchers have since pointed out that increasing the network's width (channel depth) can be a more efficient way of expanding the overall capacity of the network.

DenseNet, a logical extension of ResNet, improves efficiency by concatenating each layer's feature map to every successive layer within a dense block [43]. This allows later layers within the network to directly leverage the features from earlier layers, encouraging feature reuse within the network. For each layer, the feature maps of all preceding layers are used as inputs, and its own feature maps are used as inputs to all subsequent layers. This helps alleviate the vanishing-gradient problem, encourages feature reuse and reduces the number of parameters.
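The difference between the ResNet shortcut (add the input to the block output) and DenseNet connectivity (concatenate all earlier feature maps) can be made concrete on a single pixel's channel vector. The sketch below is a toy illustration, not either paper's implementation: `toy_layer` is a hypothetical stand-in for a convolution followed by ReLU, and the channel counts and growth rate are arbitrary.

```python
import random

random.seed(0)


def toy_layer(x, out_channels):
    """Hypothetical stand-in for conv + ReLU on one pixel's channel vector."""
    weights = [[random.gauss(0.0, 0.1) for _ in x] for _ in range(out_channels)]
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def residual_block(x):
    # ResNet style: output = x + F(x), so the channel count is unchanged.
    fx = toy_layer(x, len(x))
    return [a + b for a, b in zip(x, fx)]


def dense_block(x, num_layers, growth_rate):
    # DenseNet style: every layer reads the concatenation of all earlier
    # feature maps and appends growth_rate new channels to it.
    features = list(x)
    for _ in range(num_layers):
        features = features + toy_layer(features, growth_rate)
    return features


x = [random.gauss(0.0, 1.0) for _ in range(16)]
res_out = residual_block(x)
dense_out = dense_block(x, num_layers=4, growth_rate=12)

print(len(res_out))    # 16 channels, same as the input
print(len(dense_out))  # 16 + 4 * 12 = 64 channels
```

In the dense block the original input survives unchanged at the front of the output, which is the feature reuse the text describes; the residual block instead mixes the input into a sum.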
Many agricultural CNN solutions have been developed for specific agricultural issues. For the purpose of this study, a classification taxonomy tailored to CNN application in smart farming was developed, as shown in Fig 2. In this section, we categorize the use of state-of-the-art CNNs based on the agricultural issue they solve:
(a) Plant management includes solutions geared towards crop welfare and production. This includes classification (species), detection (disease and pest) and prediction (yield production).
(b) Livestock management addresses solutions for livestock production (prediction and quality management) and animal welfare (animal identification, species detection, and disease and pest control).
(c) Environment management addresses solutions for land and water management.
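For developers who want to tag reviewed works by category, the taxonomy above can be written down as a small nested mapping. The sketch below is one possible encoding; the dictionary layout and the helper `categories_for` are illustrative assumptions and not part of the proposed taxonomy itself.

```python
# Categories, sub-areas and tasks copied from the list above.
TAXONOMY = {
    "plant management": {
        "classification": ["species"],
        "detection": ["disease", "pest"],
        "prediction": ["yield production"],
    },
    "livestock management": {
        "livestock production": ["prediction", "quality management"],
        "animal welfare": ["animal identification", "species detection",
                           "disease and pest control"],
    },
    "environment management": {
        "land and water management": [],
    },
}


def categories_for(term):
    """Return (category, sub-area) pairs whose sub-area or tasks mention term."""
    term = term.lower()
    matches = []
    for category, areas in TAXONOMY.items():
        for area, tasks in areas.items():
            if term in area or any(term in task for task in tasks):
                matches.append((category, area))
    return matches


print(categories_for("disease"))
print(categories_for("water"))
```

A lookup on "disease" returns both a plant and a livestock entry, which reflects that the same CNN task can appear under more than one branch of the taxonomy.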
Fig. 2. Proposed classification taxonomy for CNN use in smart farm

Table 1 shows the use of state-of-the-art CNNs in agriculture, in particular in the areas of plant and leaf disease detection, animal face identification, plant recognition, land cover classification, fruit counting and identification of weeds. Besides the row number, it consists of 5 columns that show the problem description, the size of the data used, the accuracy according to the metrics used, the state-of-the-art CNN used and the reference literature.

In their paper, Amara et al. [54] use the LeNet architecture to classify banana leaf diseases. The model was able to classify the leaves effectively after several experiments. The approach classified leaf images with different illumination, complex backgrounds, resolution, size, pose and orientation. We also reviewed the use in agricultural applications of the CaffeNet architecture [59], which is a 1-GPU version of AlexNet. The success of this model at ILSVRC 2012 [24] encouraged many in the computer vision community to explore the application of deep learning in computer vision further. Mohanty et al. [7] combined AlexNet and GoogLeNet to identify 14 crop species and 26 diseases (or absence thereof) from a dataset of 54,306 images. The approach records an impressive accuracy of 99.35%, demonstrating the feasibility of the state-of-the-art CNN architectures. Other areas where AlexNet has been used with high accuracy include identifying plants using different plant views [49], identifying plant species [48], identifying obstacles in the farm [50] and leaf disease detection [3]. Because of its improved utilization of computing resources, GoogLeNet has been used in fruit counting [53] and plant species classification [7]. VGGNet has been used in classifying weeds [56], detecting obstacles in the farm [51], fruit detection [47] and animal face recognition [55]. Like ResNet, DenseNet is a recent model, which explains why it has not been employed significantly in farming; nevertheless, it has been used for thistle identification in winter wheat and spring barley [52]. Since ResNet is such a recent model, it has only been used by one author, in fruit counting [53]. Many of the CNNs developed for agricultural use depend on the problem or challenge they solve.

Table 1. Use of state-of-the-art CNN in Smart Farm

| No. | Smart farm problem description | Data used | Accuracy | CNN framework used | Article |
|---|---|---|---|---|---|
| 1 | Fruit detection | Images of three fruit varieties: apples (726), almonds (385) and mangoes (1154) | F1 score of 0.904 (apples), 0.908 (mango), 0.775 (almonds) | VGGNet | [46] |
| 2 | Detection of sweet pepper and rock melon fruits | 122 images | 0.838 (F1) | VGGNet | [47] |
| 3 | Recognize different plant species | Data set of 44 classes | 99.60% (CA, correct prediction) | AlexNet | [48] |
| 4 | Recognize different plant | 91 759 images | 48.60% (LC, correct species classification) | AlexNet | [49] |
| 5 | Identify obstacles in row crops and grass mowing | 437 images | 99.9% in row crops and 90.8% in grass mowing (CA) | AlexNet | [50] |
| 6 | Identify crop species and diseases | 54 306 images | 0.9935 (F1) | AlexNet + GoogLeNet | [7] |
| 7 | Detect obstacles that are distant, heavily occluded and unknown | 48 images | 0.72 (F1) | AlexNet + VGG | [51] |
| 8 | Leaf disease detection | 4483 images | 96.30% (CA) | CaffeNet | [3] |
| 9 | Identify thistle in winter wheat and spring barley images | 4500 images | 97.00% (CA) | DenseNet | [52] |
| 10 | Predict number of tomatoes in images | 24 000 images | 91% (RFC, ratio of total fruits counted) on real images, 93% (RFC) on synthetic images | GoogLeNet + ResNet | [53] |
| 11 | Classify banana leaf diseases | 3700 images | 96% (CA), 0.968 (F1) | LeNet | [54] |
| 12 | Identify pig face | 1553 images | 96.7% (CA) | VGGNet | [55] |
| 13 | Classify weed from crop species based on 22 different species in total | 10413 images | 86.20% (CA) | VGGNet | [56] |
| 14 | Detecting and categorizing the criticalness of Fusarium wilt of radish based on thresholding a range of color features | 1500 images | | GoogLeNet | [57] |
| 15 | Fruit counting | 24 000 images | 91% accuracy | Inception-ResNet | [53] |
| 16 | Automatic plant disease diagnosis for early disease symptoms | 8178 images | Overall improvement of the balanced accuracy from 0.78 to 0.87 from previous 2017 study | DeepResNet | [58] |
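Table 1 mixes results reported as classification accuracy (CA) and as F1 score, and the two are not directly comparable. The sketch below shows the standard definitions for a binary detection task using hypothetical confusion-matrix counts; the numbers are invented for illustration and do not come from any of the reviewed works.

```python
def classification_accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 60 diseased leaves and 140 healthy leaves.
tp, fn = 40, 20    # diseased leaves detected / missed
fp, tn = 10, 130   # healthy leaves flagged / correctly passed

ca = classification_accuracy(tp, tn, fp, fn)
f1 = f1_score(tp, fp, fn)
print(f"CA = {ca:.3f}, F1 = {f1:.3f}")
```

With these counts the accuracy is 0.85 while the F1 score is about 0.73, because accuracy is lifted by the many correctly passed healthy leaves. This is one reason the CA and F1 rows of the table should be compared with care.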
Despite the remarkable achievements in the use of state-of-the-art CNNs in agriculture in general, there are grey areas in relation to smart farms that future researchers may look at. These areas may include real-time image classification, interactive image classification and interactive object detection. State-of-the-art CNNs are a relatively new technology, which explains why the study found relatively few uses of them in smart farms. However, it is important to note that models built from state-of-the-art architectures have an impressive record of better precision.

In this paper, we aimed at establishing the potential of state-of-the-art CNNs in IoT based smart farms. In particular, we first discussed the architectures of state-of-the-art CNNs and their respective prediction accuracy at the ILSVRC challenge. Then a survey on the application of the identified CNNs in agriculture was performed, to examine the particular application in a smart farm and to list the technical details of the architecture employed and the overall prediction accuracy achieved according to the author's precision metrics. The study shows a continuous improvement in the accuracy of state-of-the-art CNN architectures as the computer vision community puts effort into perfecting the methods. The findings indicate that state-of-the-art CNNs have achieved better precision in all the cases applied in the agricultural domain, scoring higher accuracy in the majority of the problems as compared to other image-processing techniques. Considering that state-of-the-art CNNs have achieved state-of-the-art results in prediction in general and high precision in the few farming cases observed, there is great potential in using these methods in smart farming. It has been observed that many authors apply more than one architecture in order to optimize the performance of the network without compromising the expected accuracy.
This approach is very efficient in the observed cases, and we recommend a similar hybrid approach when building robust IoT based networks that are computationally affordable for mobile devices. This study aims to motivate researchers to experiment with and apply the state-of-the-art methods to smart farm problems related to computer vision and data analysis in general.