Application of the Neural Network Dependability Kit in Real-World Environments
Amit Sahu, Noelia Vállez, Rosana Rodríguez-Bobada, Mohamad Alhaddad, Omar Moured, Georg Neugschwandtner
Amit Sahu, fortiss GmbH, Munich, [email protected]
Noelia Vállez, Ubotica Technologies, Dublin, [email protected]
Rosana Rodríguez-Bobada, Ubotica Technologies, Dublin, [email protected]
Mohamad Alhaddad, ISSD Bilişim Elektronik A.Ş., Ankara, [email protected]
Omar Moured, ISSD Bilişim Elektronik A.Ş., Ankara, [email protected]
Georg Neugschwandtner, fortiss GmbH, Munich, [email protected]
Abstract—In this paper, we provide a guideline for using the Neural Network Dependability Kit (NNDK) during the development process of NN models, and show how the algorithm is applied in two image classification use cases. The case studies demonstrate the usage of the dependability kit to obtain insights about the NN model and how they informed the development process of the neural network model. After interpreting neural networks via the different metrics available in the NNDK, the developers were able to increase the NNs' accuracy, trust the developed networks, and make them more robust. In addition, we obtained a novel application-oriented technique to provide supporting evidence for an NN's classification result to the user. In the medical image classification use case, it was used to retrieve case images from the training dataset that were similar to the current patient's image and could therefore act as a support for the NN model's decision and aid doctors in interpreting the results.
I. INTRODUCTION
Neural networks have obtained state-of-the-art results in vision-based perception (e.g. YOLO [1]). This makes them essential for applications like autonomous driving, medical image processing, etc. However, the safety-critical nature of these applications limits the applicability of neural networks in the real world due to their black-box nature (weights and parameters). At fortiss, we tackled this challenge by developing a neural network dependability kit (NNDK) [2]. NNDK offers dependability metrics for developers to observe the learning (training) process, formal reasoning to avoid risky behaviour (risk properties), and runtime monitoring to indicate the network's application to unknown examples.
Theoretical analysis of NNDK and its application on standard datasets has been discussed in previous research papers [5]–[7]. In this paper, we discuss two use cases of applying NNDK during the development phase in real-world environments:
• Medical Imaging: Diabetic Retinopathy Detection by Ubotica
• Smart Tunnels: Incident Detection by ISSD
fortiss worked with the development teams at Ubotica and ISSD and supported the application of the NNDK in the development process as part of the FED4SAE project.
NNDK offers support (dependability metrics [5]) during different phases of the product life cycle:
1) Data preparation: scenario k-projection coverage
2) Training and validation: neuron activation pattern (NAP) metric, risk properties using formal reasoning
3) Testing and generalization: neuron k-projection coverage, perturbation loss metric
4) Operation: runtime monitoring using NAP
NNDK techniques were applied during these phases and, based on the results, the next steps were recommended. After following up on these recommendations, better outcomes in terms of the dependability metrics were obtained. The outcomes influenced the development decisions and the developers were able to achieve more efficient (faster prediction and low memory usage) and effective (higher accuracy) operation of the neural network.

This publication is part of a project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 761708.

II. DEVELOPMENT PROCESS WITH NNDK TECHNIQUES
As stated in the previous section, NNDK techniques can be applied during different phases of the product life cycle. In the following, we describe the generic development phases that are commonly used by ML development teams along with the NNDK metrics that will allow these teams to quantify and improve the quality of the process in the respective phases.
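As a concrete warm-up for the phase descriptions that follow, the combinatorial idea behind k-projection coverage (used for scenarios in data preparation and, in a variant, for neurons in testing) can be sketched in a few lines. This is a minimal, self-contained sketch of the concept, not the NNDK API; the scenario attributes and values below are hypothetical:

```python
from itertools import combinations, product

def k_projection_coverage(scenarios, domains, k=2):
    """Fraction of all k-way attribute-value combinations that appear in at
    least one scenario. `scenarios` is a list of dicts (attribute -> value);
    `domains` maps each attribute to its possible values."""
    attrs = sorted(domains)
    covered, total = set(), 0
    for attr_subset in combinations(attrs, k):
        # count every possible value combination for this k-subset of attributes
        total += len(list(product(*(domains[a] for a in attr_subset))))
        for s in scenarios:
            covered.add((attr_subset, tuple(s[a] for a in attr_subset)))
    return len(covered) / total

# hypothetical driving-scenario attributes
domains = {"weather": ["clear", "rain", "snow"],
           "light": ["day", "night"],
           "road": ["straight", "curve"]}
scenarios = [{"weather": "clear", "light": "day", "road": "curve"},
             {"weather": "rain", "light": "night", "road": "straight"}]
print(k_projection_coverage(scenarios, domains, k=2))  # -> 0.375 (6 of 16 pairs)
```

Because the denominator grows only with k-way combinations rather than with the full Cartesian product, this gives the "relative form of completeness" mentioned below while staying tractable.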
Data Preparation: Data collection is the most basic and essential task as every other phase is affected by its quality. High-quality datasets cover all scenarios and situations. However, obtaining a complete set of scenarios can result in a combinatorial explosion. Hence, to quantify the coverage of the scenarios by the dataset, NNDK offers scenario k-projection coverage. It is based on the concept of combinatorial testing [8] and provides a relative form of completeness against the combinatorial explosion of scenarios [9]. It can also suggest new scenarios for labeling that will maximally improve the dataset.

Fig. 1: A simple GSN based interface to select the relevant solutions (Sn) from NNDK based on the required goals (G), including assumptions (A) and the agreed strategies (S) [2].

Training and Validation:
NN model training methods already involve metrics, in line with the application tasks, which widely differ based on their requirements. Hence, NNDK metrics were more directed towards validation and avoiding risky behaviour. For example, formal reasoning could ensure that the model satisfies the risk properties to ensure predictable/reliable behaviour under conditions that are similar, but possibly different, to the ones experienced in the test cases. Due to computational limits, the tool provides a layer-wise analysis of the NN model. Since most NN architectures have deep layers, it provides analysis over a selected shallow layer and can check the possibility of erroneous behaviour on the boxed domain of its neurons. In the case of erroneous behaviour, further techniques (such as Counter Example Guided Abstraction Refinement, CEGAR) can be used to identify counter-examples that can help in training a better model.
In addition, one can look into the histograms of the Neuron Activation Pattern (NAP) of each class. One starts by selecting a target layer and obtaining its on-off activation pattern. As the training progresses, data points of the same class should have a similar number of neuron (feature) activations. Hence, the histogram of a good model is a narrow graph. Observing the histogram as the training progresses lets one keep this property in check.
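The NAP histogram just described reduces to counting, per sample, how many neurons of the monitored layer are switched on, and then binning those counts per class. A minimal NumPy sketch (not the NNDK API) of that reduction:

```python
import numpy as np

def nap_activation_counts(activations, labels, n_classes):
    """activations: (N, D) post-ReLU values at the monitored layer.
    A neuron is 'on' if its activation is > 0. Returns, per class, the
    active-neuron count of each sample; plotting each class's counts as a
    histogram yields the NAP histogram discussed in the text."""
    on_counts = (activations > 0).sum(axis=1)
    return [on_counts[labels == c] for c in range(n_classes)]

# tiny illustrative layer with 3 neurons and 3 samples
acts = np.array([[1.0, 0.0, 2.0],
                 [0.0, 0.0, 1.0],
                 [3.0, 1.0, 0.0]])
labels = np.array([0, 0, 1])
per_class = nap_activation_counts(acts, labels, n_classes=2)
# class 0 samples activate 2 and 1 neurons; the class 1 sample activates 2
```

A narrow spread of counts within a class (a "narrow graph") indicates that samples of that class consistently use a similar set of features.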
Testing and generalization:
Test cases should cover all paths of the software and be robust to noisy environments. In the case of NNs, these paths correspond to neuron activations. However, checking all possible neuron activations can result in combinatorial explosion. Therefore, in addition to scenario k-projection coverage, NNDK also offers neuron k-projection coverage over a pre-selected layer. This measures the completeness of the test set to cover the whole neuron layer under analysis.
Also, since real environments are prone to different kinds of noise based on hardware (Gaussian), weather (snow, haze), or even deliberate (adversarial noise), NNDK offers a perturbation loss metric to measure how the system performs in these situations. The metric creates these noisy data points and tests the model's robustness (generalization) to noisy environments. These noisy data points can further be used for dataset augmentation. Training with an augmented dataset increases the model's robustness and can be measured using the perturbation loss metric.
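The perturbation loss idea above can be sketched as the average relative drop in the model's confidence after a noise function is applied. The sketch below uses two of the noise types named in the text (Gaussian, salt & pepper); the `model_confidence` wrapper is a hypothetical stand-in for the model under test, not an NNDK function:

```python
import numpy as np

def gaussian_noise(image, rng, sigma=0.1):
    """Additive Gaussian noise, clipped to the [0, 1] image range."""
    return np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)

def salt_and_pepper(image, rng, amount=0.05):
    """Set a random `amount` of pixels to pure black or white."""
    out = image.copy()
    mask = rng.random(image.shape)
    out[mask < amount / 2] = 0.0        # pepper
    out[mask > 1 - amount / 2] = 1.0    # salt
    return out

def perturbation_loss(model_confidence, images, noise_fn, rng):
    """Average relative loss of confidence after applying `noise_fn`.
    `model_confidence(batch)` returns the predicted-class confidence per
    image (a hypothetical model wrapper)."""
    clean = model_confidence(images)
    noisy = model_confidence(np.stack([noise_fn(im, rng) for im in images]))
    return float(np.mean((clean - noisy) / np.clip(clean, 1e-8, None)))
```

Images whose confidence collapses under a given noise type are exactly the candidates for dataset augmentation mentioned above.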
Operation:
The behaviour of the NN model during training and in the operation environment can be widely different if the two datasets do not have an equivalent data distribution. Hence, we can only expect adequate performance from the NN model when it is applied to a data point with prior similarities to the training data. NNDK keeps track of data points by recording their Neuron Activation Pattern (NAP) on a pre-selected (mostly the penultimate) layer. During runtime, the NAP of the data point (in operation) is compared with the NAP database of the training data using the Hamming distance. If the pattern is not found in the database, a warning is generated that the model's output is not supported by the training data.
As a guideline for applying NNDK as an ad-hoc solution limited to certain dependability aspects of the development process, the Goal Structured Notation (GSN) diagram shown in Figure 1 shows the contribution of NNDK metrics towards the overall goal [2] of a safe system. In this diagram, we tried to cover many safety situations that will be further extended as we progress with the development of NNDK. The figure acts as an aid to select the solution (Sn) provided by NNDK based on the associated goals (G), assumptions (A), and strategies (S). For example, one can start from the top goal G1 of having an intended function delivered by the NN model. One strategy S1 would be to ensure that no undesired behaviour occurs. This can be refined into the different phases of the development cycle. During the operation phase (goal G5), strategy S3 would be comparing the decision with prior similarities. Using this final strategy, NNDK provides solutions in terms of runtime monitoring (Sn10).
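The runtime monitoring scheme described above (record the training NAPs, then compare each operational sample's NAP by Hamming distance) can be sketched as follows. This is a simplified sketch of the idea, not the NNDK implementation; the class name and threshold are assumptions:

```python
import numpy as np

class NAPMonitor:
    """Record the on/off Neuron Activation Patterns of the training data at a
    chosen layer; at runtime, warn when a sample's pattern is farther than
    `max_dist` (Hamming distance) from every recorded pattern."""

    def __init__(self, max_dist=0):
        self.max_dist = max_dist
        self.patterns = None

    def record(self, train_activations):
        # binarize (> 0 means 'on') and deduplicate the training patterns
        self.patterns = np.unique(train_activations > 0, axis=0)

    def supported(self, activation):
        # smallest Hamming distance to any training pattern
        dists = np.count_nonzero(self.patterns != (activation > 0), axis=1)
        return bool(dists.min() <= self.max_dist)

monitor = NAPMonitor(max_dist=0)
monitor.record(np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]))
monitor.supported(np.array([3.0, 0.0, 1.0]))  # pattern seen in training
monitor.supported(np.array([1.0, 1.0, 1.0]))  # unseen: would trigger a warning
```

With `max_dist=0` the monitor only accepts patterns observed verbatim during training; relaxing the threshold trades warning frequency against sensitivity.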
III. SUCCESS STORIES OF NNDK APPLICATION
The case studies in this paper were made in 2019/2020 as part of the FED4SAE project. NNDK was applied at all stages to generate reports, which were used for guiding the NN model development.
Due to the close collaboration between fortiss and the development teams, the simplified process from Figure 1 was not used. Instead, the metrics were selected following an in-depth discussion and analysis of the respective requirements and development processes.
A. Medical Imaging: Diabetic Retinopathy
The objective of the development was to create a working prototype that demonstrates the classification of retinal fundus images for the presence of Diabetic Retinopathy (DR) indicators. Some of these indicators are tiny abnormal leaky blood vessels (neovascularization), pale fatty deposits on the retina (exudates), and damaged nerve fibers ("cotton wool spots").
The initial model was InceptionV3, whose last layer was changed using transfer learning to classify the fundus images. The last layer consisted of 2048 neurons (features) which were mapped using a fully connected layer to two classes. Since NNDK requires the selection of a feature layer to quantify the learning process and the last layer (of the model) included only features obtained using training on the ImageNet dataset, new layers were added to the model. Two extra fully connected layers with size 1024 and 512 were added and trained on Kaggle's Diabetic Retinopathy Detection challenge dataset.
We selected relevant metrics by considering each development phase sequentially:
1) Since the dataset consisted of only images with few labels, the scenario coverage metric for data preparation was not considered.
2) Due to limited understanding of the features obtained, formal reasoning was not applied during the training and validation phase as it requires the creation of risk properties. The Neuron Activation Pattern (NAP) metric for both classes (positive, negative) was applied and visualized using the histogram graph.
3) For the testing and generalization phases, both neuron k-projection coverage and the perturbation loss metric were applied.
4) In the operation phase, runtime monitoring was employed to warn if the model's output is not supported by the training data.
The tables and figures that explain the results of these metrics are attached in Appendix A for more details.
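The head extension described above (two extra fully connected layers of 1024 and 512 units between the 2048-d backbone features and the 2-class output, so that the 512-unit penultimate layer can serve as the NNDK feature layer) can be sketched as a forward pass. This is a hypothetical plain-NumPy sketch with random placeholder weights; only the layer sizes come from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

# placeholder weights for the assumed head: 2048 -> 1024 -> 512 -> 2
W1, b1 = rng.standard_normal((2048, 1024)) * 0.01, np.zeros(1024)
W2, b2 = rng.standard_normal((1024, 512)) * 0.01, np.zeros(512)
W3, b3 = rng.standard_normal((512, 2)) * 0.01, np.zeros(2)

def head_forward(features):
    """features: (N, 2048) pooled InceptionV3 outputs.
    Returns (logits, penultimate): the 512-unit penultimate activations are
    what the NAP metric and runtime monitor would observe."""
    h1 = relu(features @ W1 + b1)
    penult = relu(h1 @ W2 + b2)  # the monitored 512-neuron feature layer
    return penult @ W3 + b3, penult

logits, penult = head_forward(rng.standard_normal((4, 2048)))
```

In the actual development, these layers sat on top of the InceptionV3 backbone and were trained on the Kaggle data; the sketch only fixes the shapes so the monitored layer is explicit.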
Here, we only explain important points and the insights from the above metrics that were useful in the development process.
Important points and insights from the NAP metric histogram for both classes (positive, negative) are as follows:
• The penultimate layer (512 neurons) was selected for analysis as it was assumed to have the most high-level features.
• The graph for both classes peaked at high neuron activations (around 2000 from 2048) for the initial model.
• As the new model was trained, the graph showed reductions in the number of activations for the negative cases and promotion of activations for the positive cases.
• Since identifying the presence of a disease (positive class) should be more feature-oriented than the absence of the disease (negative class), the training process was more trustworthy.
Important points and insights from the neuron k-projection coverage:
• In the initial model, the coverage was high over the dataset as the ImageNet-trained model considered all the features before classifying.
• In the new model, as the training progressed, the coverage increased first and then kept going downwards.
• The increment was attributed to the learning of specific features from the dataset and the decrement was attributed to the unlearning of features from the ImageNet dataset.
• In the final model, the reduced coverage suggested that even with a smaller set of neurons in the layer, the classification will most likely remain the same. Hence, in later stages of development, pruning of the network architecture would be useful. After 74% pruning, the model still achieved similar accuracy (reduction by 0.1) with less than half of the prediction time as compared to the non-pruned model.
Important points and insights from the perturbation loss metric:
• After applying different kinds of noise (s&p, Poisson, blur, brightness, gain, Gauss) on the final model, Gauss and s&p were found to be most effective.
• Positive data points had an average loss in confidence score of around 50% from Gaussian noise but negative data points only suffered a 10% loss.
• Negative data points had an average loss of around 80% from s&p noise but positive data points only suffered a 5% loss.
• Developers speculated that the effect of Gaussian noise on positive data points was due to hiding the definition of retinal lesions. This reduced the number of detected features and therefore the classification.
• Developers speculated that the effect of s&p noise on negative data points was due to misunderstanding of the small noise points as small lesions of diabetic retinopathy.
• The unaffected class cases support their speculation as well.
Important points and insights from runtime monitoring:
• Since the dataset was small and unbalanced (9,316 positive cases and 25,810 negative cases), it was unlikely that a fully comprehensive model including all types of cases would be trained.
• In order to differentiate from the cases that were not part of the training, runtime monitoring using the NAP metric was employed.
• As expected, there were a few cases that were not found in the patterns from the training dataset.
• These non-priors were classified with high confidence (>= 92%). This shows the limitation of neural networks in identifying unknown cases using the confidence score.
• Furthermore, since there were no samples with a mismatch between the label predicted and the label of the closest pattern match, the model effectively assigned each class according to the patterns learnt during training.
In addition to applying NNDK for dependability metrics, we discovered a novel application to use these metrics for case-based reasoning. NNs only provide a confidence score, which (as seen in the runtime monitoring case) can be deceptive for unknown cases. Hence, in addition to providing the confidence score, during operation, a set of similar images from the training dataset would be beneficial. In the case of DR, Ubotica used this feature in their application to provide doctors with similar case images using the NAP. Neural networks first try to extract features in the NN layer and then use this layer for classification. One can also use the visualization of these features on the images from the training dataset to serve as evidence for the NN model's decision. A doctor can analyze these supporting images to agree or disagree with the model's decision.
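The case-based reasoning idea above amounts to a nearest-neighbour lookup in NAP space: rank the training images by the Hamming distance between their penultimate-layer pattern and the patient image's pattern, and show the closest ones as supporting evidence. A minimal sketch (function and variable names are hypothetical, not Ubotica's or NNDK's API):

```python
import numpy as np

def similar_cases(query_activation, train_activations, train_image_ids, top_k=3):
    """Return the `top_k` training images whose NAP at the penultimate layer
    is closest (Hamming distance) to the query image's NAP, together with
    the distances, as (image_id, distance) pairs."""
    query = query_activation > 0
    dists = np.count_nonzero((train_activations > 0) != query, axis=1)
    order = np.argsort(dists, kind="stable")[:top_k]
    return [(train_image_ids[i], int(dists[i])) for i in order]

# tiny illustrative example with a 3-neuron layer
acts = np.array([[1.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0]])
ids = ["case_a", "case_b", "case_c"]
matches = similar_cases(np.array([2.0, 0.0, 3.0]), acts, ids, top_k=2)
```

A doctor would then inspect the returned training images alongside the prediction, which is exactly the supporting-evidence role described in the text.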
B. Object Detection: Smart Tunnels
The objective of the development was an Automatic Incident Detection (AID) system for road tunnels using neural networks. As an improvement over the current OpenCV [10] based solution, it was anticipated that the final system would be able to both detect and track the movement of the various vehicles or pedestrians with higher accuracy. The development team was able to achieve pedestrian detection and stationary vehicle detection in the final application. The application was validated in an operational tunnel where it performed better than the legacy system.
A standard object detection model, YOLOv3 [3], was used for this task. A transfer learning model from YOLOv3 for the two classes was evaluated but the improvements were not significant.
Following the process according to Section II, relevant metrics were selected by considering each development phase sequentially:
1) A tunnel is a very controlled and constant environment. For example, except close to the entrance and exit, the lighting and precipitation never change. Thus, the diversity in scenarios is minimal and coverage can be controlled manually. Hence, the scenario coverage metric for data preparation was not considered.
2) Due to limited understanding of the features obtained and the use of a standard NN model (YOLOv3), formal reasoning was not applied during the training and validation phase as it requires the creation of risk properties specific to the use case. The neuron activation pattern (NAP) metric for both classes (positive, negative) was applied and visualized using the histogram graph for validating the training results.
3) For the testing and generalization phases, both neuron k-projection coverage and the perturbation loss metric were applied.
4) During the operation environment, runtime monitoring was not deemed relevant due to the presence of cross-checking by human tunnel operators.
In this use case, important points and insights from the dependability metrics were centered on network pruning and noise sensitivity.
Neuron k-projection coverage and NAP metrics showed that very few neurons were activated for both classes (pedestrian, stationary vehicle). This suggested that network pruning would be a good next step. This was an essential insight as efficiency of the inference step was a key requirement in this use case due to the high number of images that need to be processed continuously.
Applying the perturbation loss metric, it was discovered that the images are highly prone to noise, resulting in a 95% average loss of confidence. To deal with this problem, dataset augmentation was done and new models were trained.
Two models emerged from such training: one was more robust to weather-related noise, and the other was more robust to hardware-related noise. Weather-related noise like snow or haze can occur in some cameras at the entry and exit. Hardware-related noise like Gaussian (due to high temperature, low illumination), salt & pepper (due to sharp disturbance in signal), and Poisson (due to electromagnetic properties of light particles) can still occur inside the tunnel. Therefore, as hardware noise affects more cameras in a tunnel, the final selected model was the one that was more robust against it.
C. Conclusion
We started with a brief explanation of the role that NNDK can play in the development of a neural network model, followed by an explanation of the usage of the metrics in the context of different development phases. We explained the results and use cases of the NNDK looking at two real-world applications: classification of medical images for disease markers and detecting pedestrians and stationary vehicles on road tunnel surveillance camera feeds. Additionally, we introduced a novel technique to report similar data points from prior similarities in the training dataset using the Neuron Activation Pattern (NAP) metric. This new technique provides support for the NN model's decision and helps with interpreting the results. As seen from the results on real-world applications, NNDK helped in increasing the accuracy, developing trust in the models, and making them more robust.

ACKNOWLEDGMENT

The authors are grateful for the valuable insight gained from the discussions with Finian Rogers of Intel R&D Ireland throughout the project.
REFERENCES

[1] A. Bochkovskiy, C.-Y. Wang, and H. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv, vol. abs/2004.10934, 2020.
[2] C.-H. Cheng, C.-H. Huang, and G. Nührenberg, "nn-dependability-kit: Engineering Neural Networks for Safety-Critical Systems," arXiv, vol. abs/1811.06746, 2018.
[3] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv, vol. abs/1804.02767, 2018.
[4] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, et al., "Microsoft COCO: Common Objects in Context," in Proceedings of the European Conference on Computer Vision, pp. 740-755, 2014.
[5] C. Cheng, G. Nührenberg, C. Huang, H. Ruess, and H. Yasuoka, "Towards Dependability Metrics for Neural Networks," in 2018 16th ACM/IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE), Beijing, 2018, pp. 1-4, doi: 10.1109/MEMCOD.2018.8556962.
[6] C.-H. Cheng, G. Nührenberg, and H. Ruess, "Maximum Resilience of Artificial Neural Networks," in D. D'Souza and K. Narayan Kumar (eds.), Automated Technology for Verification and Analysis (ATVA), Lecture Notes in Computer Science, vol. 10482, Springer, 2017.
[7] C.-H. Cheng, G. Nührenberg, and H. Yasuoka, "Runtime Monitoring Neuron Activation Patterns," in Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 300-303, doi: 10.23919/DATE.2019.8714971, 2019.
[8] C. J. Colbourn, "Combinatorial aspects of covering arrays," Le Matematiche, vol. 59(1,2), pp. 125-172, 2004.
[9] J. Lawrence, R. N. Kacker, Y. Lei, D. R. Kuhn, and M. Forbes, "A survey of binary covering arrays," The Electronic Journal of Combinatorics, vol. 18(1):84, 2011.
[10] G. Bradski and A. Kaehler, "OpenCV Library," Dr. Dobb's Journal of Software Tools, 3, 2000.

APPENDIX A
MEDICAL IMAGING: DIABETIC RETINOPATHY
Most people with diabetes are unaware of having diabetes complications. However, most complications can be detected in their early stages by screening programs. In particular, Diabetic Retinopathy (DR) is a medical condition derived from diabetes in which damage occurs to the retina. Individuals who present for screening have retinal fundus images taken using a specialist camera. These images are read by screeners at a later date to detect the presence of DR indicators.
An InceptionV3 was initially trained to distinguish between DR and NonDR images. The dataset used was the one published in Kaggle's Diabetic Retinopathy Detection challenge, and the training was based on a transfer learning approach where the ImageNet weights were loaded and only the dense layers were trained.
The architecture was then modified by adding two dense layers of size 1024 and 512 to better facilitate the analysis. In this case, the complete architecture was trained. The model selected was the one corresponding to the checkpoint from epoch 110.
Finally, the new model underwent a pruning process to reduce its size while roughly maintaining the accuracy. The approach followed consists of iteratively deleting 2% of the network channels in a loop. The channels to be removed each time are the channels which have the highest Average Percentage of Zeros (APoZ). In addition, to allow the model to compensate for the pruned channels, 10 training epochs are run between pruning iterations.
Tables I and II show the k-projection coverage results of the initial and new models including some intermediate results. Similarly, Figure 2 depicts the NAP metric results. The perturbation loss results of both the initial and the new model are shown in Figure 3. Finally, Figure 4 and Table IV show how runtime monitoring is used in this particular scenario and the number of images that have a certain real class-predicted class-closest pattern class combination, respectively.
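The APoZ-based channel selection used in the pruning loop above can be sketched in a few lines: compute, for each channel, the fraction of post-ReLU activations that are exactly zero, then mark the 2% of channels with the highest score for removal (retraining between iterations, as described, is omitted here). A sketch of the criterion only, with hypothetical function names:

```python
import numpy as np

def apoz(activations):
    """Average Percentage of Zeros per channel.
    `activations` is an (N, C, H, W) array of post-ReLU feature maps."""
    return (activations == 0).mean(axis=(0, 2, 3))

def channels_to_prune(activations, fraction=0.02):
    """Indices of the `fraction` of channels with the highest APoZ, i.e. the
    channels that are least often active and are removed in one pruning
    iteration (10 retraining epochs would follow each iteration)."""
    scores = apoz(activations)
    n = max(1, int(round(fraction * scores.size)))
    return np.argsort(scores)[-n:]
```

Channels that are almost always zero contribute little to the classification, which is why removing them (with retraining in between) roughly preserves accuracy while shrinking the model.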
Fig. 2: NAP metric results. (a) initial model; (b) new model (epoch = 1); (c) new model (epoch = 60); (d) new model (epoch = 110).

TABLE I: Results of k-projection coverage with k=1 and k=2 from the initial model
TABLE II: Results of k-projection coverage with k=1 and k=2 from the new model and 5 of its checkpoints

Model                  1-projection (DR)     1-projection (NonDR)   2-projection (DR)          2-projection (NonDR)
New model epoch=01     625/1024 (0.610351)   627/1024 (0.612304)    188987/523264 (0.361169)   190487/523264 (0.364036)
New model epoch=20     856/1024 (0.835937)   827/1024 (0.807617)    336990/523264 (0.644015)   307938/523264 (0.588494)
New model epoch=40     787/1024 (0.768554)   781/1024 (0.762695)    288822/523264 (0.551962)   281457/523264 (0.537887)
New model epoch=60     735/1024 (0.717773)   718/1024 (0.761171)    257012/523264 (0.491170)   247240/523264 (0.472495)
New model epoch=80     607/1024 (0.592773)   594/1024 (0.580078)    181143/523264 (0.346178)   173959/523264 (0.332449)
New model epoch=110    600/1024 (0.58593)    610/1024 (0.595703)    176847/523264 (0.595703)   182512/523264 (0.348795)
TABLE III: Accuracy results of the pruned models. In the model name wXX-wYY, YY indicates the checkpoint from the training epochs between pruning iterations and XX the checkpoint from the training epochs after performing the pruning.

% Pruning   Pruned model     DR Acc   NonDR Acc   Acc
0%          new-model-w110   0.630    0.830       0.770
22%         w07-w03          0.602    0.845       0.781
38%         w25-w10          0.640    0.795       0.755
56%         w07-w05          0.579    0.844       0.775
74%         w02-w10          0.591    0.832       0.769
TABLE IV: Distribution of the number of images using a Hamming distance of d=0 and according to the real class, the predicted class, and the class of the closest pattern found

Real    Predicted   Closest pattern   Images   Confidence
DR      DR          DR                534      0.95
DR      DR          NonDR             0        -
DR      DR          Both              28       0.53
DR      DR          Not found         12       0.98
DR      NonDR       DR                0        -
DR      NonDR       NonDR             318      0.87
DR      NonDR       Both              21       0.51
DR      NonDR       Not found         2        0.92
NonDR   DR          DR                402      0.87
NonDR   DR          NonDR             0        -
NonDR   DR          Both              34       0.52
NonDR   DR          Not found         7        0.94
NonDR   NonDR       DR                0        -
NonDR   NonDR       NonDR             2080     0.91
NonDR   NonDR       Both              39       0.51
NonDR   NonDR       Not found         6        0.96
Fig. 3: Perturbation loss results of the new model (epoch = 110)

Fig. 4: Runtime monitoring process
APPENDIX B
OBJECT DETECTION: SMART TUNNELS

Thousands of vehicular tunnels are in constant use worldwide. As the main safety mechanism, these tunnels deploy CCTV cameras which are wired back to a central control room. Here, human operators monitor the vehicular flow to ensure the safety of the tunnel and its users. However, human operators alone cannot handle the monitoring task at large scales due to vigilance decrements. Hence, they are supplemented with automatic incident detection (AID) systems.
At ISSD, the previously deployed solution (SPECTO) employed algorithms from OpenCV. The project's challenge was to leverage its performance while delivering a reasonable reduction in the product life cycle cost. Hence, the AID solution was enhanced with object detection using neural networks (Smart Tunnel). The final system was able to detect vehicles and pedestrians with better performance and much lower hardware costs. The comparison of the two systems via scorecard is shown in Figure 7.
NNDK metrics were analysed in a structured way (Section II) and the relevant ones were evaluated to understand the working of the NN model (YOLOv3). The neuron activation pattern metric was applied as follows:
1) A convolutional layer ( th) with shape (255, 32, 52) was selected for observation.
2) An average filter was applied over it to obtain a vector of size 255.
3) Objects were filtered based on the confidence thresholds: 80%, 95%.
4) For each class (vehicle, pedestrian), neuron activations of detected objects with the above confidence thresholds were noted.
NAP results for the pedestrian class are shown in Figure 5, and for the vehicle class in Figure 6.

Fig. 5: NAP for pedestrian. Left is with classification confidence 95% while right is with 80%.

Neuron k-projection coverage was applied on the same layer and for k=2, results are summarized in Table V.

TABLE V: Results of k-projection coverage with k=2

Input size (images)   Active patterns   Coverage (%)
40                    32893/129540      25.39
509                   33149/129540      25.58
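The averaging step used above to turn a convolutional feature map into a NAP can be sketched as follows: spatially average each of the 255 channels to a single value, then binarize the resulting vector into an on/off pattern. This is a sketch of steps 1-2 only; the zero threshold is an assumption:

```python
import numpy as np

def conv_layer_nap(feature_map, threshold=0.0):
    """Reduce a (C, H, W) convolutional feature map to a C-dimensional on/off
    activation pattern: spatially average each channel, then threshold."""
    channel_means = feature_map.mean(axis=(1, 2))  # e.g. (255, 32, 52) -> (255,)
    return channel_means > threshold

# random stand-in for the selected YOLOv3 layer's post-activation output
fmap = np.random.default_rng(2).standard_normal((255, 32, 52)).clip(min=0)
pattern = conv_layer_nap(fmap)  # 255-element boolean NAP
```

The resulting 255-element patterns, collected per detected object above the 80% or 95% confidence thresholds, are what feed the per-class NAP histograms in Figures 5 and 6.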
Fig. 6: NAP for vehicle. A is with classification confidence 95% while B is with 80%.

For synthetic noise analysis, four models were evaluated:
• YOLOv3 was the standard model that was trained on the COCO dataset [4].
• Model A was trained on ISSD's tunnel pedestrian dataset using transfer learning on YOLOv3.
• Model B was trained on both ISSD's tunnel pedestrian dataset and the synthetic noise dataset which was created with NNDK's noise generation package.
• Model C was the modified version of model A obtained by further applying transfer learning (10 extra epochs) with the noise dataset to tune its parameters accordingly.
Model performance analysis against synthetic noise using the average perturbation loss metric is shown in Table VI. Models A, B, and C are compared with the perturbation loss on the standard YOLOv3. The metric measures the efficacy of synthetic noise; therefore, the lower the metric, the better the model.

TABLE VI: Performance analysis of models against synthetic noise